  1. Oct 2022
    1. Author Response:

      Reviewer #1 (Public Review):

      This is an excellent paper with extensive data and important results. The authors present convincing data that resurgent sodium current from Nav1.8 and Nav1.9 channels is mediated, at least in part, by FHF proteins. [...] Altogether, the results in the paper make a major contribution to understanding the molecular events involved in generating resurgent current in Nav1.8 and Nav1.9 channels. The paper contains an impressive amount of data building on an equally impressive foundation of techniques developed in previous work and the results are clear and convincing.

      We thank the reviewer for the positive comments.

      There are some aspects of the presentation that could be improved. Line 74 "…and show for the first time that INaR can be reconstituted in heterologous systems by coexpressing full-length A-type FHFs with VGSC α-subunits." It seems debatable whether the expression of Nav1.8 in ND7/23 cells constitutes truly "heterologous" expression. After all, ND7/23 cells are an immortalized DRG cell line. At the least, the authors need to explain why ND7/23 cells were used for the Nav1.8 expression and to acknowledge that ND7/23 cells may express proteins in addition to the transfected Nav1.8 and FHF that could be important for the generation of resurgent current. Did they ever attempt to express Nav1.8 and FHF4A in HEK or CHO cells? There should also be a reference to the literature (I suppose Lee et al PLoS One, 14:e0221156y, 2019) showing that ND7/23 cells do not express endogenous Nav1.8 currents.

      We agree with the reviewer that ND7/23 cells are not the typical heterologous cell line. While ND7/23 cells are partly derived from rat DRG neurons, multiple reports have shown that they do not express Nav1.8. We now cite the paper from John et al., 2004, “Heterologous expression and functional analysis of rat Nav1.8 (SNS) voltage-gated sodium channels in the dorsal root ganglion neuroblastoma cell line ND7-23”, along with Lee et al., 2019, which, as the reviewer points out, showed that these cells do not express Nav1.8 message but do express mouse Nav1.6 and Nav1.7. Based on these published studies, we believe that “heterologous expression” is appropriate, but we have clarified the use of the cell line in several places. We have attempted to transiently express Nav1.8 and FHF4A in HEK293 cells, but no Nav1.8 currents were elicited under whole-cell recording configuration. We have not attempted to express Nav1.8+FHF4A in CHO cells because the literature indicates that transient transfection of recombinant Nav1.8 in CHO cells has yielded no or low-level functional currents (Zhou et al., 2019; John et al., 2004). We now explicitly state in the discussion that “ND7/23 cells are derived from the fusion of rat DRG neurons with the N18Tg2 mouse neuroblastoma cell lines, and thus may express proteins in addition to the transfected VGSCs and FHF that could be important for the differential effects on resurgent currents and long-term inactivation that we observed with Nav1.8 and Nav1.6.”

  2. Sep 2022
    1. Author Response

      Reviewer #2 (Public Review):

      Suggestions to improve the paper:

      Major Issues

      1) I do not think that the introduction accurately reflects the state of the field with respect to single cell omics and nerve injury. The CCI model is different than the SNI model, which has been used in most previous studies, in terms of the nature of the injury, and the resolution of pain after the injury. I do not think it is accurate to claim that the CCI model is somehow more relevant clinically, because both models are just that. It is also not really true that co-mingling, un-injured neurons have not been profiled before. The Renthal paper did this, but using a different model. There is value in what the authors have done here, but they can state it more clearly in the introduction. In particular, most published studies have only used male mice, so the sex differences aspect of this work is important. In that regard, the authors did not cite any of the growing literature on sex differences in neuropathic pain mechanisms.

      We revised the introduction and discussion to address the comments. Specifically, we revised the related information about animal models (Pages 4-5). Although Renthal et al. examined co-mingling, “un-injured” neurons using a sciatic crush injury model, they did not find cell-type specific changes in uninjured neurons. The reason for this is unclear, but we speculate that it may be partially due to differences in the techniques (e.g., tissue processing, cell sorting, sequencing depth) and animal models (CCI versus crush injury). Compared to sciatic CCI induced by loose ligation of the sciatic nerve, crush injury would injure most nerve fibers (~50% of L3-5 DRG neurons are axotomized in this model). Therefore, the remaining “uninjured” neurons available for sequencing may be far fewer than in the CCI model. In addition, we used Pirt-EGFPf mice to establish a highly efficient purification approach to enrich neurons for scRNA-seq and therefore largely increased the number of genes detected in DRG neurons. Comparatively, the neuronal selectivity and number of genes detected were lower in the previous study, which may have resulted in fewer DEGs and a decreased ability to detect the aforementioned changes. We include a brief discussion (Page 24).

      We appreciate the reviewer’s suggestion and have cited studies of sex differences in neuropathic pain mechanisms (Pages 5, 25). Although our findings suggest that peripheral neuronal mechanisms may also underlie sexual dimorphisms in neuropathic pain, Renthal et al. reported no differences in subtype distributions or injury-induced transcriptional changes between males and females after sciatic nerve crush injury (Renthal et al., 2020). We also discussed the differences between the current findings and previous work, and emphasized the sex differences aspect of this work in the discussion (Page 25).

      2) I am curious about the choice to only use samples from 7 days after CCI. One of the advantages of the CCI model is that pain resolves at about 35-60 days, depending on how the ligations are done, and this allows one to look at how transcriptional programs change in DRG neurons after pain resolves. This would give some new insight, at least in comparison to the very comprehensive profiling done in the sciatic nerve crush model by Renthal and colleagues.

      We thank the reviewer for this comment. We provided the rationale for day 7 post-CCI (Page 22). It is the time point when neuropathic pain-like behavior is fully developed in most animals, and it is the post-injury time point examined in many previous studies. The reviewer is correct that an advantage of the CCI model is that pain resolves at about 35-60 days. Although a time course would be meaningful, it was not our intention to fully characterize time-dependent transcriptional changes using scRNA-seq, which is costly and requires great effort for data analysis, and this is beyond the scope of the current study. We will address this in a future study, and have provided a brief discussion (Page 22).

      3) An alternative interpretation of the ATF3 expression is that the dissociation protocol causes this upregulation. ATF3 induction may be rapid and could occur due to the technique the authors chose to use. This could be acknowledged.

      We agree and acknowledged this in our original discussion (Page 22).

      4) I think the authors are a bit over-confident in their call of "injured" and "un-injured" neurons based on Sprr1a expression. This is really the only grounds for calling these neurons injured or uninjured. The fact is that the CCI model does not provide a clear way to determine injured and uninjured neurons contributing to neuropathic pain. This is an advantage of the SNL model, as shown in many classic papers from the Chung lab.

      We included a brief discussion about Sprr1a (Page 22). Although Atf3 is a classic marker of injured neurons in some previous studies, a recent study suggested that Sprr1a may be a better standard to define “injured” neurons (Nguyen et al., 2017). Although injured and uninjured neurons can be readily separated in the SNL model, they are mostly from different DRGs and are not intermingled in the same DRG. Since glia-neuron and neuron-neuron interactions may occur between cells within the same DRG after injury, these interactions may profoundly affect neuronal excitability and gene expression. Accordingly, we chose the CCI model for the current study to determine whether injured and uninjured neurons contribute to neuropathic pain. We included a brief discussion (Pages 5, 23, 24).

      5) There are now two papers on human DRG neurons that are available. One was recently published in eLife, and the other is available on Biorxiv, and has been there since Feb 2021. I expected the authors to make some comparisons of cell types that are changing in CCI with populations that are found in humans. Would similar effects be expected? Are these cell types represented in the human DRG?

      The study of human DRG is important, and recent studies elegantly characterized its neurochemical and physiological properties. Previous findings have suggested some notable differences between human and rodent DRGs. Importantly, many markers and methods used for classifying subpopulations of rodent DRG neurons do not apply well to human DRG neurons. In addition, data from human DRG came from patients with different etiologies, not from peripheral nerve injury as in the animal study. Due to these differences, we feel that it is difficult to make a direct comparison of cell types that are changing in CCI with corresponding human DRG neurons.

      Minor Issues

      1) Does the 40 µm cell strainer eliminate some larger diameter cells from the analysis?

      We think this is unlikely, as large-diameter cells such as NF1 and NF2 clusters were also observed in our dataset. Importantly, we examined the cell strainer by washing it out inversely and did not find single cells. In addition, all subtypes identified in other studies were also found in our study. Nevertheless, NF neurons may be underrepresented because not all NF neurons are GFP-positive in Pirt-EGFPf mice. In Pirt-EGFPf mice, expression of the knockin EGFPf is under the control of the endogenous Pirt promoter. Anti-GFP antibody staining revealed that GFP is expressed in 83.9% of all neurons. However, Pirt-negative neurons are mainly NF200+ and have large-diameter cell bodies. In addition, compared to small neurons, large neurons are more easily lost during FACS sorting. We included a brief discussion of this potential limitation, as the NF population may be underrepresented in our sample set (Page 21).

    1. Author Response

      Reviewer #2 (Public Review):

      Zhong et al conducted a scRNA-seq analysis to uncover the features in multiple myeloma (MM) based on the Revised International Staging System (R-ISS) stage. They contributed 11 scRNA-seq datasets, including 9 MM samples and 2 healthy BMMC samples, and validated their findings using the deconvolution method in large cohorts.

      In addition, they newly identified and validated a subset of GZMA+ cytotoxic multiple myeloma cells. The experiments were nicely conducted and the datasets generated in this study might benefit many other studies.

      Major comments:

      1) Several studies on scRNA-seq in MM have been reported, but different from that reported in this study. The authors might discuss the insight gained from their study.

      Thanks for your comments. Several scRNA-seq studies in MM have disclosed some of its heterogeneity. For example, Jang JS et al identified the molecular pathways during MM progression (MGUS, SMM, NDMM, and RRMM) [Blood Cancer J. 2019 Jan 3;9(1):2.]. Jean Fan et al devised a computational approach called HoneyBADGER to identify copy number variation and loss of heterozygosity in individual cells from single-cell RNA-sequencing data [Genome Res. 2018 Aug; 28(8):1217-1227.]. These studies verified that MM is highly heterogeneous, but the specific mechanism was not clear. Furthermore, these studies did not examine heterogeneity among the different stages of the R-ISS staging system, a prognostic stratification system that is widely used internationally. Therefore, we focused on the specific clusters, markers, and cross-talk patterns among the three stages of MM to reveal the potential mechanism of heterogeneity.

      2) The author claimed Proliferating plasma cells were increased in EBV-positive MM patients. It would be interesting to examine the abundance of EBV RNA levels in the scRNA-seq datasets. Several tools, such as viral-track or PathogenTrack, might be used to conduct such analysis.

      We thank the reviewer for these suggestions and comments. As suggested, we used PathogenTrack to identify pathogens in MM patients and added the results of this analysis in the file ‘Data for reviewers-1(PathogenTrack).xlsx’. However, the algorithm did not identify EBV reads in the scRNA-seq datasets. To verify our conclusion, we collected samples from more MM patients and examined EBV, MKI67, and PCNA. Our results showed that EBV-positive samples had significantly higher MKI67 and PCNA expression than EBV-negative samples (Lines 193 to 195, Page 6; Figure 5B and 5C).

      3) Methods used for deconvolution are missing.

      We thank the reviewer for the comments and suggestions. In our study, we didn’t use an analytical tool named CIBERSORT, and thus we didn’t use deconvolution in the manuscript. Our unclear description may have caused this misunderstanding.

      Reviewer #3 (Public Review):

      The authors constructed a single-cell transcriptome atlas of bone marrow in normal and R-ISS-staged MM patients. A group of malignant PC populations with high proliferation capability (proliferating PCs) was identified. Some intercellular ligand receptors and potential immunotargets such as SIRPA-CD47 and TIGIT-NECTIN3 were discovered by cell-cell communication. A small set of GZMA+ cytotoxic PCs was reported and validated using public data.

      For scRNA-seq data analysis, the authors did QC and filtering and removed low quality cells, including some doublets and followed by batch effect correction. Malignant PC populations were identified using the copy number analysis tool "inferCNV".

      The authors have done lots of analysis. But I think the results can be improved if they can do more analyses. I would recommend to 1) analyze doublets; 2) remove cell cycle effect; 3) GO and pathway analysis for genes with copy number change; 4) do cell-cell communication with more cell type/clusters.

      Thanks for your suggestion and comment.

      1) We applied Scrublet to computationally infer and remove doublets in each sample individually, with an expected doublet rate of 0.06 and default parameters used otherwise. The doublet score threshold was set by visual inspection of the histogram in combination with automatic detection. This description was added to the Materials and Methods section as ‘We applied Scrublet [74] to computationally infer and remove doublets in each sample individually, with an expected doublet rate of 0.06 and default parameters used otherwise. The doublet score threshold was set by visual inspection of the histogram in combination with automatic detection.’ in Lines 731-734, Page 27.
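
      As a rough illustration of the per-sample thresholding step described above, the following is a minimal sketch in plain Python. It is not Scrublet itself and does not reproduce its scoring; the doublet scores, sample names, thresholds, and the helper names `flag_doublets` and `summarize_sample` are all hypothetical. It only shows how a chosen score threshold turns into doublet calls, and how the called fraction can be compared with the expected doublet rate of 0.06 quoted in the response.

```python
EXPECTED_DOUBLET_RATE = 0.06  # value quoted in the response


def flag_doublets(scores, threshold):
    """Mark cells whose doublet score exceeds the chosen threshold."""
    return [score > threshold for score in scores]


def summarize_sample(scores, threshold):
    """Summarize doublet calls for one sample at one threshold."""
    flags = flag_doublets(scores, threshold)
    called_rate = sum(flags) / len(flags)
    return {
        "n_cells": len(scores),
        "n_doublets": sum(flags),
        "called_rate": called_rate,
        # Positive values mean more cells were called than the prior expects.
        "excess_over_expected": called_rate - EXPECTED_DOUBLET_RATE,
    }


# Hypothetical scores and thresholds (assumptions, not data from the study).
# Each sample is handled on its own, mirroring "in each sample individually".
samples = {
    "sample_A": ([0.05, 0.08, 0.10, 0.12, 0.07, 0.09, 0.55, 0.06, 0.11, 0.04], 0.30),
    "sample_B": ([0.03, 0.04, 0.61, 0.07, 0.72, 0.05, 0.09, 0.08, 0.06, 0.10], 0.35),
}

for name, (scores, threshold) in samples.items():
    print(name, summarize_sample(scores, threshold))
```

      A called rate far above or below the expected rate would be one reason to revisit the threshold by inspecting the score histogram, as the response describes.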

      2) As we focused on differences in the proliferative capacity of myeloma cells, the cell cycle reflects these differences well. Therefore, we provide the cell cycle data. This description was added to the main text as ‘Next, we analysed the cell cycle of six PC clusters, and distinguished them from other clusters, PCs in cluster 6 (PCC6) were presumably enriched in G2/M stage (Figure. 3B)’ in Lines 142-144, Page 5.

      3) We have performed GO and pathway analysis for genes with copy number changes, and provided the results in the file ‘Data for reviewers-2 and 3 (InferCNV for PCC4 and PCC6)’. Based on this, we found that oxidative phosphorylation was the most significantly enriched pathway for both PCC4 and PCC6. Cell-cell communication with more cell types/clusters is provided as supplementary data in the file ‘Data for reviewers-3 (Overall T cells interaction ligand-receptor pairs dotplot, Overall T cells interaction ligand-receptor, Overall T cells interaction map)’.

      Data analysis of public data was insufficient to prove the small set of GZMA+ cytotoxic PCs. More data analysis or wet experiment proof is required.

      Thanks for your suggestion. The subset of cytotoxic PCs was identified in this study. These PCs expressed NKG7 and GZMA. Furthermore, NKG7 showed a higher expression level than GZMA. Therefore, we validated it using Multi-parameter Flow Cytometry (MFC) and Immunofluorescence in MM samples. We identified a new subset of NKG7+ cytotoxic PCs and found that the percentage of NKG7+ PCs differed markedly among the stage I, II and III groups. This description was added to the main text as ‘In another MM single-cell dataset focusing on PC heterogeneity of symptomatic and asymptomatic myeloma (dataset GSE117156) [19], one cluster, C21, exclusively expressing NKG7 corresponded to PC18 in our dataset (Fig 2C-2D). In GSE117156 of all 42 samples, the cell proportion varied from 0% to 30.95% of all PCs, with an average percentage of 4.28% (Figure. 2E). Next, immunofluorescence confirmed the expression of NKG7 in cytoplasm of PCs (CD138 positive) from patients with MM (Figure. 2F). Finally, twenty MM patients (stage I: three patients, stage II: 10 patients and stage III: seven patients) were enrolled for multi-parameter flow cytometric (MFC) analysis. The results showed that the percentage of NKG7+ PCs displayed obvious diversities among stage I, II and III groups (Figure. 2G and Figure. S2). The average percentage of NKG7+ population was 2.73% in stage I, 8.89% in stage II and 0.58% in stage III (Figure. 2G and Figure. S3). In summary, we characterized a NKG7+ PC population (PC18), which may provide a novel perspective for the cytotherapy of MM.’ in Figure 2 and S3 and Lines 118-130, Pages 4-5.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors argue that xgO secretes Spi and Col4a1 to induce MAPK-dependent L5 differentiation. However, no loss-of-function condition for these putative ligands was tested. Since they speculated that expression of Spi and Col4a1 alone may not lead to a sufficient level of MAPK activity, the results of their loss of function conditions have to be included in the paper.

      We agree with Reviewer #2 completely. Our manuscript now includes spi and Col4a1 loss-of-function data specifically in xgO, which has strengthened our manuscript considerably and allows us to draw stronger conclusions as to the roles of Spi and Col4a1.

      The authors found ectopic L5 neurons when apoptosis was repressed (Fig. 1). It is likely that cells that fail to differentiate to L5 are removed by apoptosis, but this link was not clearly demonstrated in the paper. As a result, there is a gap between the data in Fig. 1 (section 1 in the text) and the other part of the paper. The relationship between Fig. 1 and the other data should be carefully discussed. In my opinion, the first section of Results should be moved after the last section so that the results of Fig. 1 are explained as a potential mechanism to remove cells that failed to differentiate to L5.

      We have restructured the manuscript as suggested.

      Reviewer #3 (Public Review):

      There is considerable overlap with Fernandes et al 2017 Science paper: (1) That EGFR signalling is required for L5 neuron survival had been shown in their Fernandes et al 2017 Science paper, as over-expression of p35 rescued apoptosis caused by EGFRDN. Now, using Dronc mutants in the current manuscript is an equivalent experiment. (2) In Fernandes et al 2017 Science, they over-express activated MAPK in lamina neurons (Fig.1G), and in the current, they over-express its target Pnt-P1 (Fig.1I) - equivalent experiment. (3) Figure S1 reports Lamina>MAPKACT rescues Bsh and Spl2 positive neurons. These data are similar to those reported in Fernandes et al 2017 Science, where they showed the rescue of lamina neurons with this same genotype. (4) rho3 mutants cannot secrete Spi and L1-4 cannot differentiate and only a few L5 do (Fernandes et al 2017 Science), they then rescued this phenotype including L5s by over-expressing EGFRACT or Ras in wrapping glia (Figure 2F-I). With the submitted manuscript, they rescue with rho3 overexpression in photoreceptors - genetically different, but rather similar, as together they demonstrate that rescue of L5 requires rho or spi. These close similarities reduce the appeal and novelty of the current manuscript.

      We agree with the reviewer that our previous work established that MAPK signalling was necessary and sufficient to drive premature neuronal differentiation in the lamina. Therefore, we have removed the data related to this point, which were previously contained in Figure S3A-C of our prior submission; namely laminats>AopACT and DroncI24; UAS-AopACT MARCM clones.

      However, this manuscript makes substantially different points from the previous paper regarding the roles of EGFR activity and survival. Although Fernandes et al. (2017) did show that lamina neurons differentiated prematurely in lamina>MAPKACT, here we evaluate apoptosis and lamina neuron sub-type identities and show that the ‘extra’ LPCs do not die but differentiate into L5s under these conditions. This is a key message of our manuscript and was neither evaluated nor reported before. Additionally, the Dronc mutants used here reveal that preventing apoptosis is not sufficient to drive differentiation of the additional LPC in each column, addressing a different point and not simply reproducing prior data showing that EGFR promotes LPC survival.

      Similarly, although we previously established that photoreceptor-derived Spi was received by wrapping glia, the involvement of photoreceptor-derived Spi in L5 differentiation had not been thoroughly explored, and the involvement of xgO is novel.

      Establishing the cells expressing spi, argos, Col4a1 and Ddr is key to supporting the hypothesis. The authors claim that they confirmed the best screen candidates by testing their expression using enhancer trap lines. What is the evidence that these enhancer trap reporters reproduce the endogenous expression patterns of these genes? A description of their location in the loci and potential drawbacks should be provided and discussed.

      We now clarify whether enhancer traps used in our study were validated previously and provide in situ hybridization chain reaction data where enhancer traps were not previously validated.

      Fig.4A and Fig.S3K do not demonstrate that aos-lacZ and Ddr-lacZ are in L5 neurons, and showing this with Bsh and Spl2 as they do for other data would support the claim that L5 neurons receive Col4a1 and distal L5 neurons can receive aos.

      We now use L5-specific markers with aos-lacZ. For Ddr-Gal4>UAS-lacZ the entire lamina was labelled, and we provide new in situ hybridization chain reaction data showing that Ddr is expressed throughout the lamina.

      Fig.S3M uses HCR in situ to show that spi mRNA is found in xgO glia. However, the given images are not convincing. Since in situs detect mRNA, wouldn't the nuclear signal correspond to two sites of transcription, whereas a more abundant signal would be expected in the cytoplasm? Instead, the nucleus contains as many spots as the surrounding background and there is no clear signal in the cytoplasm. The authors must provide separate channels and convincing evidence that spi mRNA is present in xgO glia or remove/weaken the claim (ie use only the GAL4 evidence).

      We understand that the main concern around the spi HCR included in our manuscript relates to the fact that the signal detected in the nucleus was more abundant than just two puncta, as would be expected from two sites of transcription.

      The reviewers are correct that only two puncta corresponding to active sites of transcription would be expected in the nucleus when detected by single molecule FISH (smFISH). However, here we are not using smFISH but HCR with maximal amplification. This results in signal proportional to the relative abundance of transcripts (Choi et al., 2018; Trivedi et al., 2018) and as such all transcripts, including those moving away from the transcription site in the nucleus, are also detected by this method. Other groups who have used this method also report the same (Andrews et al., 2020; Duckhorn et al., 2022; Schwarzkopf et al., 2020; Zhuang et al., 2020). We used this form of HCR over single molecule HCR (smHCR or digital-HCR), which uses limited amplification (Trivedi et al., 2018), as these other methods require diffraction-limited spot detection, which would be very challenging in our system.

      We apologise for not explaining the HCR protocol sufficiently and have included more details in the Materials and Methods.

      In addition to using HCR to detect spi expression in xgO in controls and when EGFR signalling is blocked in xgO, we now also provide new data to show Col4a1 and Ddr expression using HCR, to lend support to enhancer traps that were not validated previously. We found that both spi and Col4a1 expression in xgO decreased when EGFR signalling was blocked in xgO and provide single channel images in Figure 3 – figure supplement 1.

      With this clarification, we hope the reviewers will reconsider the inclusion of these data as we feel it is important to show that xgO express these ligands in an EGFR signalling-dependent manner, especially in light of the spi and Col4a1 loss-of-function data detailed above. Nonetheless, if the reviewers still feel that these data should be removed from the manuscript, we will be happy to do so.

      Involvement of Spi does not seem to have been entirely resolved. They show that over-expression of rho3 in photoreceptors in rho3 mutants rescued L5 neurons, suggesting that Spi from photoreceptors can rescue L5 neurons. As this is slightly different from what they saw before, what is the penetrance of these phenotypes? These phenotypes have not been quantified (other than providing sample size) and the incomplete penetrance of phenotypes could explain both observations.

      Spi secreted from photoreceptor axons is insufficient to induce L5 neuronal differentiation directly as it is unable to do so when EGFR signalling is blocked in xgO (Figure 1F,H, Figure 1 – figure supplement 1N). Therefore our results argue that xgO are a critical mediator of photoreceptor signals. Since restoring rho3 expression in photoreceptors in rho3 background rescues neuronal differentiation of all lamina neurons, these results imply that the signalling relays through both wrapping glia and xgO have been reactivated.

      We have quantified the number of L5s per column in rho3 heterozygotes, rho3 homozygotes and in rho3 homozygotes when rho3 expression was restored in photoreceptors only (Figure 3C). Importantly, compared to rho3 heterozygotes, the number of L5s per column in rho3 homozygotes was significantly reduced (Figure 3C; one-way ANOVA with Dunn’s multiple comparisons test with rho3/-; GMR as control; P****<0.0001), whereas they were fully rescued in rho3; GMR>rho3 (Figure 3C; one-way ANOVA with Dunn’s multiple comparisons test with rho3/-; GMR as control; P>0.05).

      They claim that whereas L5 neurons are lost in xgO>EGFRDN over-expressing glia, concomitant over-expression of Spi rescues L5 neurons. Also, over-expression of spi with xgO>spi clearly results in ectopic L5 neurons. However, in Fig.3P they show rescue with membrane-tethered m.spi and not secreted s.spi. Why was secreted s.spi not used instead? How does membrane-tethered spi from glia reach to rescue distal L5 neurons?

      Spi is initially produced as an inactive transmembrane precursor (mSpi) that needs to be cleaved into its active secreted form (sSpi) (Tsruya et al., 2002). This requires the intracellular trafficking protein Star and Rhomboid proteases (Tsruya et al., 2002; Urban et al., 2002; Yogev et al., 2008). mSpi thus represents wild-type (unprocessed) Spi. Whereas misexpression of sSpi results in secretion of active Spi from any cell type, misexpression of mSpi results in secretion of active Spi only from cells capable of processing mSpi to sSpi.

      Thus, mis-expressing mSpi to rescue L5 neurons in the xgO>EGFRDN background also demonstrates that xgO are capable of processing mSpi into sSpi, which is a more stringent experimental condition and gives us more confidence in our results. We also performed these experiments with sSpi and observed an equivalent and statistically significant rescue (included in the quantifications in Figure 3 – figure supplement 1C). We have also clarified the use of these reagents in the text as follows:

      Page 6, lines 166-168:

      “Spi is initially produced as an inactive transmembrane precursor (mSpi) that needs to be cleaved into its active secreted form (sSpi) (Tsruya et al., 2002). This requires the intracellular trafficking protein Star and Rhomboid proteases (Tsruya et al., 2002; Urban et al., 2002; Yogev et al., 2008).”

      And Page 8, lines 221-223:

      “Note that expressing either sSpi or wild-type (unprocessed) mSpi (referred to as Spiwt) in xgO rescued L5 numbers (Figure 3 – Figure supplement 1C), indicating that xgO are capable of processing mSpi into the active form (sSpi).”

      To support the involvement of spi in promoting survival of proximal L5 in wildtype, a loss of function experiment would be required e.g. xgO>spi-RNAi, and visualise apoptosis with Dcp1 and remaining L5 neurons.

      We knocked down spi and Col4a1 simultaneously in xgO and observed a statistically significant decrease in the number of L5 neurons relative to controls (Figure 3T-W and Figure 3 – figure supplement 2A-B). Under these conditions we also observed Dcp1 positive cells in the most proximal row of the lamina, which were never observed in controls. This suggests that Spi and Col4a1 promote L5 neuronal differentiation and survival.

      Quantifications are incomplete in places and statistical analysis is incorrect in places. For genotypes that are not quantified in graphs (ie cell number), sample sizes have been provided, but phenotypic penetrance has not (Fig.1F dronc-/-; Fig.2K, L rho3 and rescue) and this is required to report variability.

      We apologise for these omissions. We have quantified the rho3 mutant and rescue phenotypes. The Dronc mutant phenotype was fully penetrant and we have stated this explicitly in the text.

      Fig.2I, J: A quantification is provided within the text for apoptosis caused by xgO>EGFRDN, with 5.93±0.18 Dcp1 cells per column (N=19). However, this number alone does not mean much unless it is compared to Dcp1 in wild-type. Apoptosis in wild-type is shown but not quantified in Fig.2I. A comparison of Dcp1 counts in control and xgO>EGFRDN is required and validated with statistical analysis.

      We thank the reviewer for pointing out this mistake. We have now added the graph to the figure (Figure 2D) and have stated this explicitly in the text as follows:

      Page 5-6, line 151-156 (Figure 2D):

      “We used an antibody against the cleaved form of Death Caspase-1 (Dcp-1), an effector caspase, to detect apoptotic cells (Akagawa et al., 2015) and, indeed, observed a significant increase in the number of Dcp-1 positive cells in the lamina when EGFR signalling was blocked in the xgO (132.8 cells/unit volume ± 19.48 standard error of the mean) compared to controls (49.14 cells/unit volume ± 4.53) (Figure 2A-B, 2D, P<0.0005, Mann-Whitney U Test).”

      Fig.S3L, P: authors claim that over-expression of spi in xg°>EGFRDN does not rescue nuclear dpMAPK in xg°, but it does in L5 neurons. However, the quantification of these data in Fig.S3L shows that nuclear:cytopl dpMAPK levels are not statistically significantly different from xg°>EGFRDN. No evidence has been provided of how this single piece of data supports both contradictory claims. The authors must either quantify accurately and separately dpMAPK in xg° glia and L5 neurons - it is unclear how this could be done from the data provided - or remove or modify the claim to adjust accurately to the data.

      We have now quantified dpMAPK levels in both xgO and L5s in these conditions.

      Statistical analysis needs revising. It is unclear why they use non-parametric tests throughout, are data always not normally distributed? The use of bar charts, means, and s.e.m. combined with non-parametric tests does not faithfully represent the data, and box plots or other displays (eg volcano or dot plots, etc) that show the distribution would be more appropriate. And multiple comparison corrections are required. For example, if Fig.S3F is a Kruskal-Wallis ANOVA (should be, but it is not stated explicitly), then this requires multiple comparison tests to a fixed control (post hoc Dunn test), and the figure legend should provide the p-value for the ANOVA. Fig.3K, P use the Mann-Whitney test, whereas these graphs both have more than 2 sample types and therefore should be Kruskal-Wallis ANOVA (if distributions are not normal, if they are normal they should be One Way ANOVA), and Dunn post hoc comparison to fixed control, box plots, and no s.e.m as above.

      Thank you for flagging that we had not reported our statistical analyses appropriately. We apologise for this and have made sure to explicitly state the statistical test performed for multiple and pairwise comparisons with the P values as detailed by Reviewer 3. These are highlighted throughout the text with track-changes. We have also changed all our graphs to box and whisker plots showing the entire distribution of the data as well as the interquartile range, as recommended.

      Much of the data in our manuscript are proportions generated from cell counts and, by definition, are limited to numerical values between 0 and 1 (inclusive). As with count data (i.e. discrete numbers such as from cell counts), parametric statistics are generally inappropriate for proportion data because the data violate assumptions about normality (Douma and Weedon, 2019). Therefore, we used non-parametric tests throughout the manuscript except for Figure 1 – Figure Supplement 1R, where the appropriate assumptions were met.
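
      The rank-based workflow requested by the reviewer (an omnibus Kruskal-Wallis test followed by Dunn comparisons to a fixed control) can be illustrated with a minimal standard-library sketch. This is not the authors' analysis code: the function names, the Bonferroni adjustment, and the example proportions are assumptions made for illustration, and the p-value for H itself is omitted because it requires a chi-square distribution function from a statistics library.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_dunn(groups, control):
    """groups: dict name -> list of values (hypothetical layout).

    Returns the tie-corrected Kruskal-Wallis H and Bonferroni-adjusted
    two-sided Dunn p-values for each group versus `control`.
    """
    names = list(groups)
    pooled = list(chain.from_iterable(groups[n] for n in names))
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    # Tie term: sum of (t^3 - t) over every set of tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_sum = sum(t ** 3 - t for t in tie_counts.values())

    mean_rank, start = {}, 0
    for n in names:
        size = len(groups[n])
        mean_rank[n] = sum(ranks[start:start + size]) / size
        start += size

    h = 12 / (n_total * (n_total + 1)) * sum(
        len(groups[n]) * mean_rank[n] ** 2 for n in names
    ) - 3 * (n_total + 1)
    h /= 1 - tie_sum / (n_total ** 3 - n_total)

    # Dunn z-test on mean ranks, each group against the fixed control.
    var_base = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    others = [n for n in names if n != control]
    adjusted = {}
    for n in others:
        se = math.sqrt(var_base * (1 / len(groups[n]) + 1 / len(groups[control])))
        z = (mean_rank[n] - mean_rank[control]) / se
        p = math.erfc(abs(z) / math.sqrt(2))
        adjusted[n] = min(1.0, p * len(others))
    return h, adjusted


# Made-up proportions, for illustration only.
example = {
    "control": [0.10, 0.20, 0.30],
    "mutant": [0.70, 0.80, 0.90],
    "rescue": [0.15, 0.25, 0.35],
}
h_stat, dunn_p = kruskal_dunn(example, control="control")
```

      Because only ranks enter the calculation, the bounded 0 to 1 range of proportion data does not violate any distributional assumption of the test.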

  3. Aug 2022
    1. Author Response

      Reviewer #2 (Public Review):

      This work will be of potential interest to biologists studying aging. While transposable elements have been reported to have higher expression as organisms age, it was previously unclear if their expression can exacerbate aging phenotypes or if they are a byproduct of aging. The authors present evidence in this manuscript that artificially increasing transposable element expression during the whole Drosophila life cycle can worsen aging phenotypes.

      Strengths

      The authors provide direct evidence that expression of their gypsy construct across the whole life of animals decreases fly lifespan (Figure 4), and that this outcome is dependent on reverse transcriptase (Figure 6).

      Monitoring TE mobilization can be difficult in general and is often expensive when using a sequencing approach. The authors accurately monitor gypsy mobilization from their ectopic copy through qPCR and sequencing.

      Weaknesses

      Experiment design, data interpretation, and story structure:

      The current model proposes that TE increases activity in aged animals and potentially contributes to the aging process. However, this paper artificially drives gypsy activation throughout the whole fly life cycle. Under this design, TE may already bring deleterious effects from early developmental stages or young age, thus ultimately shortening their life cycle. To truly test the function of TE during the aging process, the authors need to temporally control gypsy expression and only express their construct in aged animals.

      Figure 1: I am not sure I got any convincing messages from this figure. First, flies at 30 days of age should not be considered as old. Second, the authors try to claim that TE expression increased with aged FOXO mutants. However, there is no data to show the comparison between aged wild-type and FOXO mutants (panel e is young wt vs young FOXO null). Meanwhile, Figure 1 has nothing to do with Gypsy. How could this figure fit into the story?

      It is clear that we did not do a good job explaining this section. First, we did not mean to imply that the 30-day flies are old. They are simply older than the 5-day flies. The 30-day timepoint was chosen to match previous experiments and data sets in the literature. It was also chosen to minimize any survivor bias that could occur by doing the assay in very old flies. We have clarified this in the text and figures.

      Second, it is the number of transposons that show an increase in expression in the dFOXO null animals that we mean to highlight (18 for dFOXO vs only 2 for wDAH). Panel e is meant to illustrate that the transposon landscapes, even in young flies differ by genotype making a direct transposon to transposon comparison impossible. We have added text to clarify these points.

      Third, we also do not mean to imply that anything here is specific for gypsy. The work going forward in the paper uses gypsy as a tool because it is one of the better understood retrotransposons, there existed a validated active clone of the transposon and it has already been implicated in aging in the fly. We took gypsy as a model retrotransposon. We have added text to clarify here.

      Figure 3: While the data presented in this Figure is sound, it is unclear how this data fits into the overall narrative that transposon activity drives aging.

      Figure 3 is a continuation of the characterization of our ectopic gypsy. We wanted to rule out that there is a “hotspot” of insertion that would account for any phenotypes we observe. We find no hotspot in the males we use for analysis suggesting it is the act of transposition, not a specific target gene that is important. We have added to the text to clarify the motivation for these experiments.

      Figure 5: It is interesting to see the copies of gypsy are not increased after 5 days. Does gypsy still mobilize after this young age? If yes, the authors should observe increased gypsy gDNA in later time points, unless the cells having gypsy new insertions keep dying. The authors should specifically check tissues with low cell turnover (such as brain) or high cell turnover (such as gut).

      Reviewer 2 makes a great observation. In fact, using primer pairs that specifically detect the ectopic gypsy, we consistently see insertion numbers go down in very old animals (figure 5a&b). With our current understanding of retrotransposition, we should not be able to see loss of insertions unless the host cells are being lost from the analysis. We favor the idea that the reviewer suggests; that the cells that have high levels of insertion are dying and disappearing from the analysis. We think this is also reflected in the bias for intergenic or intronic sequences in our insertion mapping of figure 3. In an attempt to address this question we did measure insertions in heads versus bodies. In male flies aged 14 days there was no difference in the average number of insertions (although the variability was greater in heads). This data is reported in Supplemental Figure 6a.

      Figure 8: Using Ubiquitin GAL4 to drive both gypsy and FOXO expression could dilute the expression of each individual gene. Thus, it is possible the rescue effect seen by expressing FOXO in addition to gypsy may just be due to lower gypsy expression. Including qPCR data showing gypsy expression levels in Ubi>gypsy, UAS FOXO flies compared to Ubi>gypsy flies would be helpful.

      We included this data in Figure 2b and 8c. Unfortunately, we did not clearly direct the reader to compare the values. Comparing Figure 2b with Figure 8c shows that the RNA level of the ectopic gypsy is comparable in both cases, and perhaps even slightly higher in the UAS-FOXO case. We have added a sentence to make this clear.

      It is unclear if FOXO can rescue TE-specific aging phenotypes. While it appears that FOXO overexpression rescues the decrease in lifespan caused by gypsy expression, the authors did not test if FOXO overexpression could rescue the effects of gypsy in the paraquat resistance assays or rhythmicity experiments.

      We include in this revision data showing dFOXO overexpression rescues the paraquat resistance and lowers the levels of overall insertions in the animals.

    1. Author Response

      Reviewer #2 (Public Review):

      This is a nice study that pulls together a new reference genome and several levels of new popgen and RNAseq data and new analyses to provide interesting new insight on some of the evolutionary forces affecting the evolution of the Zal2/Zal2m system which underpins the stripe-colour polymorphism in the white-throated sparrow. The data are well balanced between homozygous Zal2/Zal2 and heterozygous Zal2/Zal2m birds, and at a technical level, the authors do a good job of accounting for difficulties in disentangling the Zal2 and Zal2m chromosomes in heterozygous birds.

      The authors convincingly show that Zal2m has signs of degeneration, similarly to what has been shown in the fire ant Sb supergene and young Y chromosomes. They show this using multiple approaches (increase in repetitive elements, reduced genetic diversity, increased non-synonymous substitutions...). But they also show that part of Zal2m (which is rare in homozygous form) has something interesting going on, with higher local diversity and evidence of balancing selection. Analysis of allelic-biased expression shows signatures of degeneration, but also that allelic bias is associated with expression differences between morphs.

      The paper is generally well written and includes much novel insight on a timely topic and system.

      Weaknesses in this study might come from:

      • Not fully considering differences in the effects of repetitive elements on apparent genotypes (e.g, segmental duplications or jumping of repetitive elements which may have occurred in Zal2m and which lead reads to appear to be somewhere they are not).

      • Difficulties in accounting for variation in recombination rates along the genome, where low recombination can lead to patterns that look like selective sweeps.

      • Some ambiguities in interpreting allelic biases as adaptive where many of them can simply be collateral effects of the supergene architecture.

      Many of the patterns seen and interpretations offered are similar to what is known from young sex chromosomes, such as in Drosophila, but also the anther rust mating type loci and the fire ant Sb social supergene. The haploid systems of anther rust fungus and fire ant males are able to examine such patterns in more depth than what is readily accessible here.

      We appreciate an overall balanced perspective on our work. We addressed the specific concerns. Some of the limitations of the current system include the lack of high quality long-range sequence data from the species, thus making it difficult to resolve structural variation and repetitive sequences.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of the paper provide new evidence of how prefrontal cortex of mutant mice used as a disease model of schizophrenia differs from wild type littermates. By analyzing local network dynamics at the level of specific cell type, authors shed new light on the circuit mechanisms that underlie changes in network dynamics in these mice.

      The claims in the submitted manuscript are supported by the data. I have a few comments and questions that need to be clarified.

      We thank the reviewer for highlighting the novelty of our work and its relevance (…shed new light on the circuit mechanisms that underlie changes in network dynamics in these mice…) for the field and the validity of our data (….claims in the submitted manuscript are supported by the data).

      1) Average firing rates

      Authors claim that they saw a significant reduction in interneuron firing rates in Disc1 mutant mice compared to control mice Fig.1c. However, the difference could be general and not interneuron specific. Due to the high firing rates of interneurons, the statistical test will work better on interneurons than on pyramidal cells as pyramidal cells average firing rates are lower. What I suggest to do is to take interneuron cells that fire at a lower rate (lower 33% for example) and compare the control and Disc1 groups. Also I would suggest to take pyramidal cells that have higher firing rates (upper 33% for example) and compare firing rates across the same groups. One would like to see if these differences are not due to changes in firing rates per se.

      We thank the reviewer for pointing out this important aspect. In our original analysis, we did not take into account that additional differences in the PYR population might be present but ‘masked’ by the overall lower firing rate of that neuronal population. As suggested by the reviewer, we separately considered the firing rate of the ‘top 33%’ of the PYR population, which did not significantly differ between genotypes (p=0.958, n=209 control and 245 Disc1 PYRs, Welch’s test). As suggested, we moreover considered the ‘bottom 33%’ of INT firing rates, for which the significantly lower rates of Disc1-mutant INTs remained (control: 4.2 ± 0.6 Hz vs. Disc1: 1.8 ± 0.2 Hz, n=26 and 34 neurons, p=0.013, Mann-Whitney U-test). Since only a few INTs were recorded per session in some cases (ranges: Disc1: 2-12/session; control: 2-19/session), we performed this analysis on the basis of individual cells (see also our reassessment of the main statistical comparisons in response to #1 by reviewer 2 and #4 by reviewer 3). These data are now reported in the new Fig. 1 – figure supplement 3 and referred to in line 76 ff. (line 72 ff. without tracked changes) of the revised manuscript.
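
      The tercile comparison described above can be sketched as follows. This is a hedged illustration using only the Python standard library: it assumes that the top or bottom third is taken separately within each genotype (the response does not state this), the helper names are hypothetical, and only the Welch t statistic and its degrees of freedom are computed, not the p-value.

```python
import math
from statistics import mean, variance


def tercile(rates, which):
    """Return the lowest ("bottom") or highest ("top") third of the values."""
    ordered = sorted(rates)
    k = max(1, len(ordered) // 3)
    return ordered[:k] if which == "bottom" else ordered[-k:]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up firing rates in Hz, for illustration only.
control_pyr = [0.5, 1.0, 1.5, 2.0, 4.0, 6.0]
disc1_pyr = [0.4, 0.9, 1.6, 2.2, 4.5, 5.5]
t_stat, dof = welch_t(tercile(control_pyr, "top"), tercile(disc1_pyr, "top"))
```

      Restricting both genotypes to the same rate range is what separates a cell-type-specific difference from one that is only detectable because the underlying rates are high.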

      2) Optogenetic tagging

      Authors indicate that light triggered and spontaneous spike waveform are similar Fig.1d. This is nice, but would be better to see all the tagged neurons. I would suggest showing all optically tagged neurons spike features. Authors can impose with a different color spike features of tagged neurons in Fig.1a. I suspect that since all PVI are narrow spiking and they must fall into the area of blue colored cells in Fig.1a.

      Following the reviewer’s suggestions, we included the average waveforms with and without light for all opto-tagged PVIs in the revised Fig. 1f. Moreover, we included the kinetic features of opto-tagged PVIs in Fig. 1a (red dots), and separately for control and Disc1-mutant mice in the new Figure 1 – figure supplement 2. As predicted by the reviewer, the PVIs indeed cluster with the other putative INTs. We would moreover like to point to our new analysis in response to #2 of reviewer 2 addressing the spike kinetics of optotagged PVIs versus untagged putative INTs, which are similar in their trough-to-peak duration and asymmetry index. These data are shown in the novel Fig. 1 – figure supplement 2.

      3) It was not clear why authors assessed only firing rates in last 25ms (line 348-349). If they have a clear justification for this they should provide it. But why not use the latency of the first spike also as an additional metric. A well tagged cell will respond to light pulse with short latency (within 5 ms). My concern is that non PVI cells may increase firing rate after 25ms of stimulation of PVI cells due to disinhibition.

      Despite the latency to the first spike being frequently used as a method to detect ChR2-positive neurons, the laser stimulation produced significant photoartefacts in our hands. We were therefore concerned that spikes that happen shortly after the onset of the light pulse might be missed, and hence the latency to the first spike might be misinterpreted. Selecting a later time point in the stimulation interval allowed us to assess the firing rate during light application without the interference by artefacts. Nevertheless, we fully agree with the reviewer’s concern that ChR2-negative non-PVIs might increase their rate due to disinhibition, and that these neurons might thus be falsely classified as PVIs. However, we are confident that that is not the case. First, optotagged PVIs cluster well within the population of electrophysiologically identified INTs (see our response to your first remark on ‘optogenetic tagging’) and were indistinguishable from this population in terms of spike kinetics (see our response to #2 of reviewer 2 and the new Fig. 1 – figure supplement 2), suggesting that no disinhibited PYRs were included in the optotagged sample of cells. Second, we performed an additional analysis to address the time course of firing rate changes in optotagged PVIs. We computed smoothed spike trains (convolved with a 5 ms SD Gaussian kernel), and extracted the average firing rate of each optogenetically identified PVI centered on the onset of the light pulses. This analysis revealed a rapid increase in firing rate upon light delivery, arguing against disinhibitory network effects. These new data are now shown in the new Fig. 1 – figure supplement 5 and reported in line 89 (85 without tracked changes) of the revised manuscript.
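
      The smoothing step described above (spike trains convolved with a 5 ms SD Gaussian kernel and aligned to light onset) can be illustrated with a minimal standard-library sketch. The 1 ms bin width, the -50 to +100 ms window, the kernel truncation at 4 SD, and all function names are assumptions for illustration, not details taken from the manuscript.

```python
import math


def gaussian_kernel(sd_ms, bin_ms=1.0, n_sd=4):
    """Unit-area Gaussian kernel truncated at n_sd standard deviations."""
    half = int(round(n_sd * sd_ms / bin_ms))
    weights = [math.exp(-0.5 * (i * bin_ms / sd_ms) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def peri_pulse_rate(spike_times_ms, pulse_onsets_ms, window_ms=(-50, 100),
                    sd_ms=5.0, bin_ms=1.0):
    """Mean firing rate (Hz) around light onset, smoothed with a Gaussian."""
    n_bins = int((window_ms[1] - window_ms[0]) / bin_ms)
    counts = [0.0] * n_bins
    # Simple nested loop; adequate for a sketch, slow for long recordings.
    for onset in pulse_onsets_ms:
        for t in spike_times_ms:
            idx = int(math.floor((t - onset - window_ms[0]) / bin_ms))
            if 0 <= idx < n_bins:
                counts[idx] += 1
    # Convert counts per bin to spikes per second, averaged over pulses.
    rate = [c / len(pulse_onsets_ms) / (bin_ms / 1000.0) for c in counts]
    kernel = gaussian_kernel(sd_ms, bin_ms)
    half = len(kernel) // 2
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + j - half
            if 0 <= src < n_bins:
                acc += w * rate[src]
        smoothed.append(acc)
    return smoothed


# Hypothetical unit that fires 3 ms after each of two light pulses.
rate_hz = peri_pulse_rate([1003.0, 2003.0], [1000.0, 2000.0])
```

      In such a trace, a directly light-driven cell would show a rise within a few milliseconds of onset, whereas a disinhibited cell would be expected to rise later.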

      4) Spike cross-correlations

      The authors show that spike transmission probability from PYR to PVI is reduced in Disc1 mice compared to the controls Fig.2d and Fig.2e, but what happens to PVI to PYR spike transmission probability? Is it different in those groups? Answering this question is important since the authors discuss this topic in line 185-193.

      Inhibitory synaptic interactions are indeed detectable by spike-train cross-correlation. However, we find these to be harder to quantitatively interpret than excitatory connections. Those interactions are not visible as spike transmission but rather as a reduction in spike transmission. Reliable estimates of the reduction in spike rate of postsynaptic PYRs require very large spike numbers of postsynaptic neurons that need to be sampled. For instance, Senzai et al., 2019 (Neuron 101: 500-513.e5) identified inhibitory interactions in continuous recordings lasting up to 68 h. Since we did not explicitly design our experiments to investigate inhibitory interactions, our recordings were substantially shorter than the required length. Using the method of Senzai et al., 2019 to identify inhibitory interactions, we detected only 5 INT-INT interactions (in the pooled Disc1-mutant and control data set). This low number does not allow the quantification of potentially reduced spike transmission. Thus, attempts to quantify inhibitory interactions properly would require a substantial amount of additional long-duration recordings. While the point raised by the reviewer is highly relevant and should be investigated in future, we think that given the extensive amount of experimentation needed to address this question, it is beyond the scope of the current manuscript.

      5) Authors could try to link oscillations with spike transmission probabilities. On line 180 authors discuss that lower synchrony between PVI might be responsible for observed reduction in gamma power in Disc1 mutant mice. With the available data authors could test this hypothesis. They can look at spike cross correlations in their pool of INT and PVI (if they have pairs of PVI recorded in the same session) population.

      We thank the reviewer for this excellent suggestion! We computed the cross-correlations for all simultaneously recorded putative INTs and quantified the baseline-subtracted mean cross-correlation within 10 ms around zero time lag. This analysis revealed weaker cross-correlation in Disc1-mutant mice (p=0.026, Mann-Whitney U test, tested on averages from n=7 control and Disc1 mice with at least 2 INTs recorded simultaneously), suggestive of reduced synchronization of putative INTs at short time lags. These new data are now included in the new Fig. 4 and reported in line 201 ff. (185 ff. without tracked changes) of the revised manuscript.
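
      A baseline-subtracted cross-correlation measure of this kind can be sketched with the standard library alone. The response does not specify how the baseline was defined or whether "within 10 ms" means ±5 ms or ±10 ms; the sketch assumes a 10 ms window centred on zero lag and a baseline taken from the flanks at lags of 30 ms or more. All names and parameters are hypothetical.

```python
from statistics import mean


def cross_correlogram(ref_ms, comp_ms, max_lag_ms=50.0, bin_ms=1.0):
    """Histogram of spike-time lags (comp - ref) within +/- max_lag_ms."""
    n_bins = int(round(2 * max_lag_ms / bin_ms))
    counts = [0] * n_bins
    for r in ref_ms:
        for c in comp_ms:
            lag = c - r
            if -max_lag_ms <= lag < max_lag_ms:
                counts[int((lag + max_lag_ms) // bin_ms)] += 1
    return counts


def central_excess(counts, max_lag_ms=50.0, bin_ms=1.0,
                   half_width_ms=5.0, baseline_from_ms=30.0):
    """Mean count near zero lag minus the mean count in the flanks."""
    centres = [-max_lag_ms + (i + 0.5) * bin_ms for i in range(len(counts))]
    central = [n for n, lag in zip(counts, centres) if abs(lag) <= half_width_ms]
    flanks = [n for n, lag in zip(counts, centres) if abs(lag) >= baseline_from_ms]
    return mean(central) - mean(flanks)


# Two hypothetical interneurons; the second fires 1 ms after the first.
ccg = cross_correlogram([100.0, 200.0, 300.0], [101.0, 201.0, 301.0])
excess = central_excess(ccg)
```

      Computing one such value per simultaneously recorded pair and averaging within each mouse gives one number per animal for the genotype comparison.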

      6) An alternative way to link oscillations with lower spike transmission probabilities in PYR-PVI pairs is to use synchrony triggered LFP analysis. One could take all time points when PVI and PYR cells fired acausal spikes within 2ms window and look at the LFP around this time point. Then take the average of the synchrony-triggered LFP and look at the power spectrum.

      The proposal to link spike transmission with LFP power is indeed intriguing. As suggested by the reviewer, we extracted the 60-90 Hz-filtered LFPs triggered by INT spikes that followed a spike in a presynaptic PYR by <2 ms and measured the average gamma amplitude in a window of 20 ms around the INT spike. This analysis revealed comparable gamma amplitudes in Disc1 and control pairs. This finding suggests that local PYR-INT loops are still capable of producing gamma oscillations, and that the gamma oscillation defect of Disc1 mice is likely not caused by such a local defect. To investigate the relationship between INT spike timing and gamma oscillations more generally, we further extracted gamma amplitudes of spike-triggered LFPs using all available spikes of the INTs. Moreover, we compared the data to gamma amplitudes measured at randomly selected time points. ANOVA followed by Tukey tests performed on the level of mouse averages indicated that INT spiking-associated gamma amplitudes were significantly larger than those measured at random time points in wild-type mice (p=0.001), whereas the same was not true for Disc1-mutant mice (p=0.591). Furthermore, this analysis revealed significantly reduced spike-triggered high gamma amplitudes in Disc1-mutant compared to control mice (p=0.011). While these results argue against a driving role of local connection alterations in gamma defects, they generally confirm the impaired synchrony of INT spiking relative to gamma oscillation that we observed in our analysis of phase coupling. These data are now shown in the new Fig. 4, which summarizes all new analyses regarding gamma oscillations and phase-coupling, and in figure 4 – figure supplement 2. The new results are described in the main text of the revised manuscript in line 188 ff. (172 ff. without tracked changes).

      Considering the reduced short time scale synchronization of INTs (see our new results towards the reviewer’s #5) and reduced gamma amplitude of INT spike-triggered LFPs, it is possible that impaired synchronization among prefrontal INTs might contribute to the observed reduction in gamma power of Disc1-mutant mice (thereby, essentially, reflecting impaired INT gamma (ING)). Additionally, reduced long-range excitatory drive maintaining local gamma oscillations might be a contributing factor. For example, recent work showed that high gamma oscillations in the mPFC occur synchronized with gamma oscillations in the olfactory bulb (Karalis & Sirota, 2022, Nat Commun 13:467). It remains to be investigated whether local INTs are rhythmically driven by input from the olfactory bulb (in a multi-synaptic pathway including olfactory cortex) and to what extent that drive maintaining afferent gamma might be altered in Disc1-mutant mice. While the current data set does not allow a systematic evaluation of these possibilities, they should be further explored in future experiments.

      7) Cell assembly analysis

      The authors used 10ms for testing synchronization among pairs of PYR neurons in Fig.4a but 25ms for analysis of assembly dynamics. I think the authors justified why they used 25ms bin size, but it was not clear why they used 10ms? Could the authors clarify the reasons behind this decision?

      The synchronization analysis was originally applied to PYRs converging on a common postsynaptic INT. English et al. (Neuron 95:505-520, 2017) systematically tested the effect of presynaptic cooperativity on spike transmission in the hippocampus (their Fig. 5). Their analysis revealed a maximum in cooperativity at ~10 ms. To maximize the sensitivity of our approach, we thus focused on 10 ms for this analysis. However, we agree that using the same time window as for assembly extraction is a reasonable proposal, in particular since we find no difference in the synchronization of identified presynaptic PYRs (Fig. 3e of the revised manuscript). Thus, we have recomputed cross-correlations using a 25 ms bin size. To further improve the analysis, we restricted it to neurons with at least 1000 spikes and simplified the quantification of excess spiking by using the ‘coincident_spikes’ function of the Python package neuronpy.utils.spiketrain. Excess synchrony is now estimated by quantifying the number of coincident spikes between a reference and a comparison spike train detected in a 25 ms time window, normalized by the number of coincidences expected by chance (2 × frequency of the comparison train × synchrony window × number of spikes in the reference train).
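
      The normalization described above can be made concrete with a standalone sketch. It deliberately does not call the neuronpy package, whose exact signature is not reproduced here; instead it reimplements the stated formula under two assumptions: a coincidence is a reference spike with at least one comparison spike within ± the synchrony window, and the comparison rate is the spike count divided by the recording duration. The function name and arguments are hypothetical.

```python
from bisect import bisect_left, bisect_right


def excess_synchrony(ref_s, comp_s, window_s=0.025, duration_s=None):
    """Observed coincidences divided by the chance expectation.

    Chance expectation = 2 * comparison rate * window * number of
    reference spikes, following the formula given in the response.
    """
    comp_sorted = sorted(comp_s)
    if duration_s is None:
        # Fall back to the span covered by both trains.
        duration_s = max(max(ref_s), comp_sorted[-1]) - min(min(ref_s), comp_sorted[0])
    coincident = 0
    for t in ref_s:
        lo = bisect_left(comp_sorted, t - window_s)
        hi = bisect_right(comp_sorted, t + window_s)
        if hi > lo:
            coincident += 1
    comp_rate = len(comp_sorted) / duration_s
    expected = 2 * comp_rate * window_s * len(ref_s)
    return coincident / expected


# Made-up spike times in seconds over a 10 s recording.
ratio = excess_synchrony([1.0, 2.0, 3.0, 4.0], [1.01, 2.5, 3.02, 9.0], duration_s=10.0)
```

      A ratio near 1 indicates coincidences at the level expected from the firing rates alone, while values above 1 indicate excess synchrony.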

      By using this improved analysis with a 25 ms time window, we could replicate our original finding of enhanced synchronization of PYR spiking. However, when we averaged the data on the basis of individual mice as suggested in #1 of reviewer 2 and #4 of reviewer 3, we could not observe this effect (irrespective of whether we used the new, coincident spikes-based analysis or the original excess synchrony analysis at either 10 or 25 ms synchrony window). This result is now stated in line 215 ff. (199 ff. without tracked changes) of the revised manuscript.

      Reviewer #2 (Public Review):

      This is an interesting paper, in which the authors assessed spiking and network deficits in a well-established mouse model of schizophrenia. This mouse model carries a genetic deletion of the Disrupted-in-schizophrenia-1 (Disc1) gene, which is highly penetrant in the human condition. The authors combined behavioral analyses with state-of-the-art electrophysiological recordings in vivo, coupled to optogenetic tagging, to study a subnetwork formed by a major inhibitory neuron subclass (the parvalbumin (PV)-expressing interneuron) and principal excitatory pyramidal neurons in the medial prefrontal cortex. This work indicates reduced firing rates of PV cells in Disc1-KO mice, likely due to reduced coupling with pyramidal neurons, leading to alterations in local network activity. Indeed, the authors found that Disc1-KO mice exhibited reduced levels of gamma oscillations and somewhat hypersynchronous networks.

      Taking advantage of novel techniques and analytical strategies, the manuscript provides rich, novel insight into the neurobiology of a mouse model of this severe psychiatric condition. The data is of high quality, the findings interesting and the manuscript is well written.

      Overall, the results support the authors' conclusions, although some additional analyses are necessary to corroborate their interpretations.

      Although the paper does not give information on how PV cell dysfunctions are engaged during cognitive tasks, this study can be considered as an important first step in advancing our knowledge on the basic dysfunctions of cortical networks in this model of schizophrenia.

      We thank the reviewer for praising the ‘high quality’ of our work, and the ‘rich, novel insights’ on the neurobiology of a mouse model of a psychiatric disorder.

      1) The major findings stem from the analysis of the spiking activity of individual neurons recorded using either silicon probes or arrays of tetrodes. Both techniques allow simultaneous recording of many neurons from a single animal; therefore, from a statistical point of view neurons recorded from one animal are pseudo replicas and cannot be considered as independent measurements. Throughout the manuscript, the authors perform two-sample tests on the pooled data from all recorded neurons to compare differences between genotypes; therefore, artifactually increasing the power of statistical tests. Comparisons between genotypes should be performed using each mouse as an independent measurement.

      To be able to compare the data on the basis of mouse averages, we performed additional recordings, which resulted in a final data set of 9 Disc1 and 7 control mice. We recomputed the main results of this study based on mouse averages. First, consistent with our original cell-by-cell analysis, we found significantly reduced firing rates of putative INTs but not of PYRs (line 72 (69 without tracked changes)). Moreover, we confirmed our results on decreased spike transmission probability at PYR-INT connections (line 121 (107 without tracked changes)), decreased spike transmission in the resonance window (line 163 (147 without tracked changes)), reduced high gamma power (line 173 ff. (157 ff. without tracked changes)), lower phase-coupling of INT spikes to high gamma oscillations (line 178 (162 without tracked changes)), and reduced strength of assembly activations in Disc1 compared to control mice (line 229 ff. (211 ff. without tracked changes)). Similarly, we performed new analysis on INT-INT synchronization and INT spike-triggered gamma amplitudes (as requested by reviewer 1 #5 & 6), which showed significant effects on the level of mouse averages (line 188 ff. (line 172 without tracked changes)). Second, our original finding on significant differences in the synchronization of individual PYR-PYR pairs could not be reproduced on the level of individual mice. This is reported in line 215 (199 without tracked changes) of the revised manuscript. Finally, the analyses based on optogenetically identified PVIs did not allow comparison by mouse averages due to the low number of experiments (n=3 mice each). Given that the vast majority of our conclusions is based on electrophysiologically identified INTs, with optogenetic identification experiments being only confirmatory in nature, and that performing additional experiments for optogenetic identification of PVIs would be very laborious, we report the results of these analyses as comparisons between neurons or connected pairs. This is clearly stated at the respective sections throughout the revised manuscript. We hope that the reviewer can agree with our decision.
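
      The shift from pooled neurons to mouse averages can be illustrated with a short standard-library sketch. The record layout, the function names, and the use of an exact (enumerated) Mann-Whitney p-value are assumptions for illustration; the response does not specify which test was applied to every per-mouse comparison. Exact enumeration is feasible here because group sizes such as 7 and 9 mice yield only 11,440 relabelings.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def mouse_means(records):
    """records: iterable of (mouse_id, value); returns one mean per mouse."""
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return [mean(values) for values in by_mouse.values()]


def u_statistic(a, b):
    """Mann-Whitney U for sample a (ties contribute 0.5)."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


def exact_mwu_p(a, b):
    """Two-sided exact p-value by enumerating every group relabeling."""
    pooled = list(a) + list(b)
    centre = len(a) * len(b) / 2
    observed = abs(u_statistic(a, b) - centre)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total


# Made-up interneuron firing rates (Hz) tagged with a hypothetical mouse ID.
control = [("c1", 10), ("c1", 12), ("c2", 11), ("c3", 13), ("c3", 15)]
disc1 = [("d1", 4), ("d1", 6), ("d2", 7), ("d3", 8), ("d3", 8)]
p_value = exact_mwu_p(mouse_means(control), mouse_means(disc1))
```

      With three mice per group the smallest attainable two-sided p-value is 0.1, which illustrates why a comparison by mouse averages is not informative for the n=3 optotagging data set.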

      2) The superficial layers of the mPFC are difficult to reach with a vertical approach of the probes due to the presence of a large blood vessel located medially in the frontal dura. Therefore, the authors are most likely reaching mPFC deep layers where PYR neurons produce fast spikes at high rates. If this is the case, this would make it difficult to sort the spiking of PYR from that of INs based on the spike kinetics and rate. The authors used opto-tagging of PVIs in a set of experiments. It would be reassuring to confirm that the spike waveform and kinetics that they extracted from PVIs are similar to those they assigned as INTs in their experiments with no opto-tagging. Identified PVIs should be statistically different from putative PYRs (not responding to light). Although opto-tagging of PVIs can solve this issue, the amount of cells isolated remains low and the number of animals is not stated. Opto-tagged cells are subsequently used for further analyses but the statistical value of those remain unclear. Since the entire interpretation of the rest of the results depend on this result, this must be clarified.

      As correctly pointed out by the reviewer, we indeed targeted deep layers of the mPFC (~0.4 mm lateral of the midline; see also the histological information about the recording sites that is now included in Figure 1 – figure supplement 1), where higher spike rates are expected compared to superficial layers. To assess whether this might have influenced the identification of putative INTs, we separately plotted the duration and asymmetry index used to classify the neurons into PYRs and putative INTs for Disc1 and control mice. This analysis yielded well separated clusters in both cases. In addition, as suggested by the reviewer, we compared the kinetic properties (spike duration and asymmetry index) and rates of PYRs, putative INTs, and optotagged PVIs. In both genotypes, ANOVA followed by Tukey post-hoc testing revealed significant differences between the PYRs and both groups of INTs, both for rate (smaller in PYRs) and kinetic properties (longer spikes of PYRs), while we found no difference between putative INTs and PVIs. These results thus suggest that the method used to identify INTs works reliably. These new data are now shown in the revised Fig. 1a and the new Figure 1 – figure supplement 2 and mentioned in line 89 ff. (85 without tracked changes) of the revised manuscript.

      We agree that the number of experiments using PVI opto-tagging is low (n=3 mice per genotype; this information is now included in the main text in line 93 ff. (88 ff. without tracked changes)). However, our analysis of spike transmission probability using the population of untagged putative fast-spiking INTs revealed similar results as for the sample of optogenetically identified PVIs. We view the PVI optotagging experiment as an additional confirmation that the difference in firing rate and spike transmission most likely did not arise from sampling from different INT types in Disc1 and control mice, as pointed out in line 80 (76 without tracked changes) of the revised manuscript. The limitation of the low number of PVIs in our study is critically reflected in the revised discussion in line 249 ff. (229 without tracked changes).

      3) Proportion of gamma coupled neurons. The authors mention the use of pairwise phase consistency (PPC). PPC is a good method to measure phase coupling independent of differences in firing rates. However, it is not entirely clear how PPC is used to measure the extent of phase locking. In the methods, the authors mention that they ran the PPC analysis after determining significant phase locking with Rayleigh's test. Moreover, they provide PPC values for high gamma oscillations but not for other frequency ranges. Perhaps, it would be better to test significant coupling of all units by nonrandom spike-phase distributions crossing a confidence interval, estimated by Monte Carlo methods from independent surrogate data set. These can be obtained upon randomly jittering each spike times. Indeed, PPC values estimated by the authors for high gamma are higher for PYR than INT (Fig. 1- Fig. Suppl 4 b). This is at odds with previously published observations in V1 (e.g. Perrenoud et al., PLoS Biol. 2016 PMID: 26890123). Given the existing reports of reduced excitatory transmission in DISC-1 mice, phase locking of PYR to other frequency bands might be affected.

      Following the reviewer’s suggestion we have revised our phase-coupling analysis. First, Perrenoud et al. (2016) show that gamma oscillations occur in short bursts of high power. To better reflect the coupling of putative INTs to those transient gamma events, we restricted the phase-coupling analysis to epochs within the largest quintile of gamma amplitude (assessed by the envelope of the gamma-filtered signal obtained by Hilbert transformation). Second, instead of the Rayleigh test, we obtained for each unit randomized spike trains by shuffling the inter-spike intervals (500 iterations). Significant phase locking was then obtained by testing whether two consecutive bins of the phase histogram exceeded the 95th percentile of the random distribution. This analysis was performed separately for the low (20-40 Hz) and high gamma bands (60-90 Hz) for both putative INTs and PYRs. Third, the depth of phase coupling was assessed by PPC for all significantly phase-coupled neurons. While this metric is more robust against changes in spike rates than traditional measures, it is still not completely independent of them. Perrenoud et al., for instance, showed using spike sub-sampling that the reliability in estimating PPC depends on spike rate (with >1000 spikes being optimal). However, our data set of PYRs contained fewer than 1000 spikes during high gamma events (mean Disc1: 657 ± 32, mean control: 840 ± 43). To better account for the effect of rate dependence, we restricted the analysis to neurons with >250 spikes. To further limit the potential impact of different spike counts across neurons, we used random subsampling with a fixed spike number of 250 (100 iterations per cell), computed PPC in each iteration, and averaged over the PPC estimates per cell. Finally, in response to the reviewer’s point 1, the results of all neurons (PYR and INT separately) were then averaged for each mouse.
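
      The subsampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard definition of pairwise phase consistency as the mean cosine of the phase difference over all spike pairs, evaluated through the equivalent resultant-vector identity, and it takes spike phases (in radians) as already extracted from the gamma-filtered signal. The function names and the `rng` parameter are hypothetical.

```python
import cmath
import math
import random


def ppc(phases):
    """Pairwise phase consistency of spike phases given in radians.

    Equals the mean of cos(theta_j - theta_k) over all pairs j < k,
    computed as (|sum(exp(i*theta))|^2 - N) / (N * (N - 1)).
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    resultant = sum(cmath.exp(1j * p) for p in phases)
    return (abs(resultant) ** 2 - n) / (n * (n - 1))


def subsampled_ppc(phases, n_spikes=250, n_iter=100, rng=None):
    """Average PPC over random subsamples with a fixed spike count.

    Returns None for units that do not exceed n_spikes, mirroring the
    ">250 spikes" inclusion criterion described in the text.
    """
    if len(phases) <= n_spikes:
        return None
    rng = rng or random.Random()
    estimates = [ppc(rng.sample(phases, n_spikes)) for _ in range(n_iter)]
    return sum(estimates) / n_iter
```

      Fixing the spike count per iteration gives every unit's PPC estimate the same sampling variability, which is the motivation stated above for subsampling before averaging per cell and then per mouse.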

      Consistent with our original analysis, we found a significantly reduced proportion of phase-coupled INTs but unaltered PPC of significantly coupled INTs to the high gamma band. Moreover, we observed no significant effects for low gamma oscillations or for the phase-coupling of PYRs to either low or high gamma bands. These results are now shown in the new Fig. 4 and the new Figure 4 – figure supplement 1, and are described in line 170 ff. (154 without tracked changes) of the revised manuscript. In addition, we provide a detailed explanation of the revised phase coupling analysis, including a formal description of how PPC is computed, in the Methods section of the revised manuscript in line 524 ff. (486 without tracked changes).

      Using the revised phase-coupling analysis, we observed comparable PPC values of significantly coupled PYRs (0.013) and INTs (0.014) to high gamma in control mice. While the improved analysis thus resolved the paradoxical finding of lower PPC in INTs, we did not observe weaker phase-coupling of PYRs as reported in Perrenoud et al. (2016). A possible explanation for this discrepancy might be genuine differences in gamma coupling of the PYR population between visual cortex (Perrenoud et al., 2016) and the prefrontal cortex (our study), which will require further investigation in the future.

      Reviewer #3 (Public Review):

      In the present study, the authors aim to assess network activity alterations in the prefrontal cortex of mice with a deletion variant in the schizophrenia susceptibility gene DISC1 ("DISC1 mutants"). Using silicon probe in vivo recordings from the prefrontal cortex, they find that mutant mice show reduced firing rates of fast-spiking interneurons, reduced spike transmission efficacy from pyramidal cells to interneurons, and enhanced synchronization and activation of cell assemblies. The authors conclude that "interneuron pathology is linked with the abnormal coordination of pyramidal cells, which might relate to impaired cognition in schizophrenia."

      The cellular and circuit basis of psychiatric disorders has received strong interest in the recent past. In particular, alterations of the "excitation-inhibition balance" in cortical circuits has been the focus of extensive scrutiny (reviewed in pmid 22251963). Specifically, in both human samples as well as in mouse models, disruption of interneuron development and function have been implicated in the pathogenesis of schizophrenia. In the DISC1 mouse model, studies have reported disrupted interneuron development (e.g. pmid 23631734, 27244370), reduced numbers of GABAergic neurons (e.g. pmid 18945897), reduced inhibition from GABAergic neurons ex vivo (e.g. pmid 32029441), and reduced firing rates of fast-spiking neurons in vivo in the basal forebrain (pmid 34143365).

      The present manuscript makes a potentially important contribution to this question by probing the microcircuitry of the prefrontal cortex in vivo in the DISC1 mouse model of schizophrenia. It goes beyond previous work in assessing circuit dynamics in vivo in more detail, albeit with indirect methods. The experiments and analysis have generally carefully been performed, though the statistical analysis raises some questions. The advances made by the present work compared to previous studies could be delineated more clearly.

      We thank the reviewer for praising the analysis of our data ‘…have generally carefully been performed..’ and the ‘important contribution’ of our work to the field.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use both in vitro signaling assays, knockdown in chick neural tube patterning assays and some limited use of Plexin mutant mice. The in vitro work convincingly demonstrates that misexpression of several Plexins is sufficient to enhance HH signaling in a way that depends on the Plexin GAP domain.

      We thank the reviewer for the positive evaluation of our work.

      Not addressed is how the GAP activity promotes HH signaling.

      The reviewer raises an interesting point that we hope to address in future mechanistic studies.

      The in vivo data are extremely interesting. However, alternative interpretations of the data are not assessed and need to be before the conclusions favored by the authors can be asserted.

      We agree with the reviewer.

      Reviewer #2 (Public Review):

      This is interesting work that expands our knowledge of Hedgehog signaling. The work is well-done, well-written, and the figures are clear. I have comments that would help strengthen some of the experiments and improve the manuscript. In particular, the in vivo loss of function experiments could be measured in additional ways (using additional endpoints) to provide a convincing case of the role that Plexins play in Hh signaling in vivo.

      We thank the reviewer for their favorable assessment and appreciate their recommendations to add additional in vivo loss of function experiments, which are addressed in the response to Essential Revisions.

      1) The authors show that the effect of SmoM2 or Gli1 overexpression on Hh pathway activity can be potentiated by Plexins. They then conclude that "These data suggest that PLXNs function downstream of HH ligand at the level of GLI regulation...". It is unclear to me how this experiment allows them to conclude this, as the effect of Plexins could be downstream of Gli1, through the regulation of the transcription machinery, for example.

      See response to Essential Revisions.

      2) Are primary cilia formed normally and present at normal frequency in cells with loss or over-expression of Plexins? This could help understand better how Plexins act to modulate the Hh pathway.

      See response to Essential Revisions.

      3) Are Gli1 protein levels affected by Plexins?

      We have not directly examined GLI1 protein levels. Future studies will investigate the consequence of PLXNs on levels, processing and localization of all GLI proteins based on the findings from this study.

      4) In order to provide a convincing case for the role that Plexins play in Hh signaling in vivo, the in vivo Plexin loss of function experiments should be assessed in additional ways to Gli1-lacZ (Figure 6). Also, proliferation should be measured (as previously shown to be Hh-dependent).

      See response to Essential Revisions.

      5) Data showing whether Plexins bind Shh (or not) should be presented.

      The reviewer raises an interesting point. However, the data with the Plxna1∆ECD construct, which lacks the entire extracellular domain, suggest that PLXN binding to SHH is not required for HH pathway promotion (see Figure 3). Instead, our experiments suggest that PLXN functions downstream of HH ligand (see Figure 3).

      6) The authors show that increased Plexin activity in chick neural tubes increases cell migration into the neural tube lumen. Is this effect of Plexins Gli-dependent?

      See response to Essential Revisions.

      7) In the chick neural tube experiments, how can the authors conclude that Plexin promotes Gli-dependent cellular responses since their data show that Plexin is not significantly affecting the fate (NKX6.1 and PAX7) of the cells? I was confused by this. The image shows a change, but the quantification does not.

      See response to Essential Revisions.

      8) Could loss of function experiments in chick neural tube using RNAi against multiple Plexins be performed? This would provide a very convincing case of the requirement of Plexins for Shh signaling.

      While we appreciate the reviewer’s suggestion, this experiment would be technically very challenging, given that several PLXNs are expressed in the chicken neural tube (Mauti et al. 2006), and we would likely need to achieve robust knockdown of multiple Plxns to reveal a phenotype. Instead, we have relied on knockdown approaches in cell culture and genetic deletion in mice to assess the consequences of PLXN loss-of-function on HH signaling.

      9) Figure 1 panels H-I need a negative control for siRNAs.

      As noted in the methods (lines 571-573) and in the results (lines 128-131), negative controls for siRNAs were included in each experiment.

      10) Figure 3B needs to control for Plxn1ΔECD expression levels (by western). Can higher activation of the pathway be explained by higher Plexin protein expression?

      While higher PLXNΔECD protein levels are one possible explanation for the increase in HH pathway activity, the subsequent data with the GAP domain and FYN kinase mutants (in the context of PLXNΔECD) would argue that this is not simply a matter of protein expression, but instead is due to the previously demonstrated increase in GAP activity caused by deletion of the PLXN extracellular domain.

      Reviewer #3 (Public Review):

      The main strengths of this study are the compelling data derived from the use of well-established cell-based assays of Hedgehog signalling and novelty of the finding that Plexins can modulate the response of cells to Hedgehog. The experiments are well designed and carefully controlled.

      We thank the reviewer for their favorable assessment.

      The main weaknesses are as follows:

      1) Plxna2 is expressed at levels lower than a3, b2 and d1, but it is not explained why this gene was knocked out in cell lines in preference to the other three.

      We initially generated Plxna1-/-;Plxna2-/- MEFs simply due to the availability of these animals (i.e., we do not have Plxnb2 or Plxnd1 mutant mice in our colony). We then utilized siRNA to achieve a further loss of Plxna3, Plxnb2, and Plxnd1.

      2) Most of the analysis and the main conclusions of the study are based on the 3T3 experiments. The data supporting the in vivo significance of these findings are less strong:

      First, using electroporation of the chick neural tube, they revealed that constitutive Plexin activity can replicate only a subset of the effects of Gli over-expression. It would be relevant to know if ectopic cell migration can be caused by levels of Gli activity lower than those sufficient to induce Nkx6.1 expression - I am not sure if this is already known.

      See response to Essential Revisions above.

      Second, the authors investigate the consequences of loss of plexin function in the hippocampus, using mouse Plxna1 and Plxna2 mutants. This is a bit puzzling given that their own cell-based assays show that loss of either or both of these proteins has no impact on the response of 3T3 cells to ligand.

      We agree with the reviewer that these were surprising results. However, the profile of PLXN expression in the hippocampus is distinct from that of NIH/3T3 cells, and the relative abundance of PLXNs also differs in these sites. Therefore, it is difficult to assess the relative importance of individual PLXNs in the hippocampus, other than by analyzing individual Plxn mutant animals. Future studies investigating the individual and combined contributions of PLXNs to HH-dependent embryogenesis will be of high significance.

      Moreover, a previous study cited by the authors (Cheng et al 2001) reported that Plxna3 shows the highest and most widespread expression in the CNS and in the hippocampus in particular. Plxna2, by contrast is expressed at much lower levels whilst Plxna1 was detected principally in mature pyramidal cells. It is not clear why the authors chose to focus on these particular Plexins and to what extent the requirements for Plexin function have been rigorously tested.

      We agree that investigating the role of other PLXNs in the CNS will be of great value. However, as noted above, we only had access to Plxna1 and Plxna2 mutants at the time of this study.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors asked to what extent early visual and visuomotor experience is essential for developing the ability to recalibrate the visuo-motor system flexibly. This kind of recalibration crucially underpins everyday actions, allowing the brain to issue effective feed-forward motor control commands that correctly account for temporary changes in sensory-motor mappings (e.g. when using tools, carrying objects, wearing new glasses). To address the role of experience in developing these recalibration abilities, they used the unusual clinical population of late-operated cataract patients: children and adolescents who initially had many years of sensory experience that is atypical in that it lacked effective pattern vision. They used a standard sensory-motor task in which participants point to targets with and without displacement of the visual image via a prism lens: after the prism displacement, the visuo-motor mapping needs to be recalibrated to enable effective pointing. They compared late-operated cataract patients with controls matched in age, controls matched in both age and visual acuity (via added visual blur), as well as an extensive broader comparison group of typically developing 6- to 17-year-olds. Their key findings were that recalibration was less effective - both in the initial effect and in the subsequent after-effect - in the patient group than in control groups; this was not related to chronological age but was related to time post-operation, such that performance came to match controls after around 2 years of improved visual experience. The authors conclude that flexible sensory recalibration abilities normally rely on extensive sensory-motor experience in childhood, and suggest that the underlying computational problem is establishing the correct correspondences between sensory and motor coordinate frames. This may be achieved through extended exposure to the sensory consequences of self-generated movements.

      Strengths of the approach include use of the established (although rare and difficult to access) model population of late-operated cataract patients and a well-established experimental task (pointing after displacement of the visual image by viewing through prism lenses). The task has a known typical time-course of behaviour - supplemented here by an extensive additional study on typical development using the exact same main task, which even alone would be a meaningful contribution to literature on sensory-motor development. The procedure, measures, analysis, and the approach to control groups are careful and rigorous. The findings are rich in showing not only an initial deficit in patient vs control groups but also an approximate time course for further learning and development after which point (by ~2 years) the patients come to match controls. A challenge is the heterogenous group, in terms of age at operation and ages at testing and follow-up. However, this is very usual and almost inevitable in the literature with this kind of population, and is dealt with well in the analyses. The approach is also well supplemented by repeated follow-up of a portion (actually more than half) of the group.

      One potential issue is the role of baseline pointing precision differences across the groups. It would be useful to better understand the potential role of the reduced pointing precision that was found in the cataract group (Supplemental Figure 1B). It is not surprising that, following visual deprivation, this group's predictive feedforward visuo-motor control was less precise than that of controls, even in the baseline measures before any prism manipulation, and even when the controls' vision is comparably blurred. It seems likely (although is not shown) that during the adaptation phase and the post-adaptation phase, the variability of individuals around their (gradually shifting) mean pointing location would also be higher than in controls. I wonder how large an explanatory role there could be simply for this noisier initial visuo-motor mapping in the patient group. It might be said that, on each trial, they intend to carry out a feedforward plan with a certain endpoint, but because of noise, they are on average substantially further from that endpoint than comparable controls are. So, during recalibration, while controls are dealing mainly with cancelling out one kind of error - the constant error due to the prism adaptation - the cataract patients are also dealing with more variable errors due to their own noisier visuo-motor system. In theory, could this alone - higher initial noise in the system - explain the difference? This seems like a simpler explanation than that the system has developed differently in substantial ways to do with its abilities to learn and adapt. One starting point for checking in to this would be asking if initial pointing variability predicts recalibration (perhaps controlling for visual acuity), both at first test and in the repeated participants. 
Another would be looking into ways to perturb controls' baseline pointing performance further (perhaps with something like an unexpected added weight rather than more visual blurring) so that their variable pointing errors were matched to the cataract group.

      We thank Reviewer 1 for drawing our attention to this important point. The Reviewer is right in suggesting that precision at baseline (measured as the variance of the pointing errors in the pre-prism phase) might predict recalibration abilities (as measured by the recalibration index irecal). Indeed, we found that the variance of the errors in the pre-prism phase correlates with irecal in cataract-treated participants. Thus, the higher sensorimotor noise in cataract-treated participants (indicating more uncertainty) slows down their rate of recalibration. This finding is in accordance with Burge and colleagues (2008), who found that higher uncertainty (in their case in the form of visual blur leading to more motor variability) slows down the adaptation rate. We have now reported this analysis in the Results section and discussed the contribution of sensorimotor noise to recalibration in the Discussion. However, higher sensorimotor noise cannot alone explain the performance of the cataract-treated individuals. Indeed, the subset of participants tested a second time after surgery (4 to 16 months after the first post-surgery test) presented better recalibration ability (i.e., higher irecal), although their precision at baseline did not increase accordingly, but stayed basically unchanged. Moreover, in their second test, their precision at baseline did not correlate with the subsequent irecal.

      In the Discussion, we added the greater sensorimotor noise as a factor contributing to recalibration. However, as it does not by itself explain the improvement of recalibration performance over time, we still discuss the contribution of their lack of experience with the sensorimotor mapping to their recalibration performance.

      Another question is how well the contrast sensitivity function (CSF) as a whole (not just the maximum acuity point) was matched - this is dealt with only briefly. I am not sure to what extent the blurring manipulation would be expected to change the shape of the CSF as a whole to be in line with that of patients, and to what extent other aspects of the CSF besides the maximum acuity point determine the precision and accuracy of ballistic pointing movements under the experimental and lighting conditions used in the study. Depending on the answers to these questions, the concern could be that visual differences relevant to control of pointing remained across the patient and blurred control groups.

      We have now provided more information on this point in the Methods section and in the Supplementary Information. In a pilot study, we determined the range of distances between the blurring screen and the visual target that would be needed to reproduce, in controls, the range of visual acuity values of the cataract-treated participants. Nonetheless, to ensure the procedure would lead to the desired contrast sensitivity function (CSF) for each participant, we also tested the visual acuity of the sighted controls. We visually inspected the CSF of each sighted participant (tested with visual blur) and we included in the study only those whose CSF matched the desired CSFs in terms of both cutoff frequency and shape. In other words, when the CSF of a sighted control did not match that of the to-be-matched cataract-treated participant (in the cutoff frequency and/or in the shape of the function), that sighted control was not included in the study. This led to excluding 8 sighted controls before reaching the final sample of 20 controls, individually matched to the cataract-treated participants. We have now reported these further details in the paragraph entitled ‘Procedure to blur vision in sighted controls’ (Materials and Methods). Moreover, we have provided a Figure in the Supplemental Material, showing the mean CSF in the group of cataract-treated participants and in the group of sighted controls tested with visual blur (Figure supplement 1). In that figure, it is possible to appreciate that we ensured matching the two groups not only for the cutoff frequency, but also for the shape of the whole function. However, we have now also mentioned in the Discussion that we cannot exclude that visual differences other than spatial visual acuity, which we did not consider, between the cataract-treated group and the controls tested with visual blur might have influenced the recalibration performance.

      Another more minor or technical issue is some lack of detail in how the calibration index, which feeds into most of the key analyses, is calculated. It is likely that many different ways of doing this would lead to similar conclusions, but it should be clear, including for the sake of replicability.

      While the index is briefly mentioned in the Results section, we have now explained it in detail in the Materials and Methods section. This recalibration index combines the amount of recalibration in the prism phase and at the beginning of the post-prism phase (Adaptation and Initial Aftereffect, respectively). Adaptation was calculated as the error reduction in the prism phase (the induced prism distortion, 11.31°, minus the average of the last three pointing errors of the prism phase, cf. Fortis et al. (2010)). Initial Aftereffect was calculated as the magnitude of the aftereffect exhibited right after prism removal (i.e., average of the first three pointing errors of the post-prism phase). The Initial Aftereffect was correlated with the amount of Adaptation in the prism phase (see Materials and Methods) and thus provides converging information, which can be summarised in a single recalibration index to increase statistical power. That is, the recalibration index irecal was calculated as the average between Adaptation and the (negative) Initial Aftereffect. This index is normalized by the induced prism distortion (i.e., the index is divided by 11.31°), so that it ranges between 0 and 1. Further details are provided in the Materials and Methods section.
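
      The calculation described above can be sketched as follows. This is a minimal illustration under a stated assumption about sign conventions that the response does not spell out: pointing errors are taken as positive in the direction of the prism shift, so that the aftereffect errors after prism removal are negative. The function and variable names are hypothetical.

```python
from statistics import mean

PRISM_SHIFT_DEG = 11.31  # induced prism distortion stated in the response


def recalibration_index(prism_errors, post_prism_errors, shift=PRISM_SHIFT_DEG):
    """Combine Adaptation and Initial Aftereffect into one normalized index.

    Assumes errors (in degrees) are signed positive in the direction of the
    prism shift, so aftereffect errors are negative.
    """
    if len(prism_errors) < 3 or len(post_prism_errors) < 3:
        raise ValueError("need at least three trials in each phase")
    # Adaptation: prism shift minus the mean of the last three prism-phase errors.
    adaptation = shift - mean(prism_errors[-3:])
    # Initial Aftereffect: mean of the first three post-prism errors.
    initial_aftereffect = mean(post_prism_errors[:3])
    # Average of Adaptation and the negative aftereffect, divided by the shift.
    return (adaptation + (-initial_aftereffect)) / 2 / shift
```

      Under this convention, a participant who fully cancels the prism error and then shows a full-sized aftereffect scores 1, and a participant who shows neither scores 0. Overshooting in either phase would push the value outside that range.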

      Reviewer #2 (Public Review):

      It is very interesting that recalibration effects in the cataract-reversal group increase over time. However, it seems as if the conclusion that it takes about two years to reach recalibration effects comparable to those of typically sighted controls is based on repeated measurements of two participants tested 2 and 3 years after their surgery as well as on singular measurements of two participants tested 10 years after their surgery. Close inspection of Figure 1F suggests that four participants reached comparable levels in their second testing session already about 6 months after surgery. Consistently, the confidence interval of the time constant b is rather large (it also seems to differ between the main text and the figure caption). Given this high degree of uncertainty around the time estimate it would be advisable to not report and discuss a fixed duration of two years but rather focus on the increase of recalibration effects and report an interval during which recalibration effects might reach asymptotic levels.

      We thank Reviewer 2 for drawing our attention to this important point. Following this advice, we have now discussed the high inter-subject variability in the recalibration performance over time, and we have discussed the uncertainty inherent in the estimate of the rate of improvement leading to a performance comparable to healthy controls within about 2 years. We acknowledge that this estimate is very uncertain (see Results and Discussion).

      It is important to note that the exponential fit on all measurements (Figure 1F, dark green curve) is not driven by the 2 participants tested more than 10 years after surgery: when excluding them from the exponential fit, the time constant b (b=1.5, 95% CI=[0.39, 2.67]) is comparable to the one obtained in the whole sample.

      We have also reported the linear correlation between time since surgery and recalibration index in the first testing session without the 2 participants tested more than 10 years after surgery, as they would drive the correlation. Note that the effect of time since surgery is evident even when removing them from this analysis (main text, red line in Figure 1 F, and Material and Methods). Importantly, also the linear fit on the first test session alone (excluding the participants tested more than 10 y after surgery) provides converging evidence of the fact that the performance level of controls (tested with visual blur) is reached at roughly 2 years from surgery, as visible in Figure 1F (red regression line crossing dashed line of controls).

      Regarding the time constant b previously reported in the figure caption, this was related to the inset shown in Figure 1F in the last submission (i.e., the exponential fit on the difference between each pair of cataract participants and controls). We have now removed this inset and the corresponding fit from the figure and figure caption to avoid confusion.

      Having longitudinal data from several participants is great and can provide interesting insights. However, to get an idea about the role of visuo-motor experience it would be helpful to not collapse across the different time points for the second evaluation in the depiction of the data and their analysis. Moreover, it would be helpful to have an idea of the degree of variability across repeated measurements in control participants.

      We decided to report these data in two ways: 1) In agreement with Reviewer 2, we showed these longitudinal data in their different time points (Figure 1F), so that the progression of the recalibration ability over time after surgery would be more transparent and easier to appreciate; 2) We still present these data also collapsed in Figure 1E, because we believe this representation helps clarity and completeness: given that we also included the pre-surgical assessment in that figure, it is easier to visually appreciate the differences between pre- vs. first post-surgical assessment and second post-surgical assessment in the re-tested participants. We also rearranged the text accordingly. However, if the Reviewer still believes that this way of reporting the results is unclear or redundant, we will remove Figure 1E. Unfortunately, we were unable to collect comparable repeated measures from the control children with the same temporal gap between the first and second test.

      Visuo-motor adaptation and aftereffects are related but clearly separate phenomena not least because visual feedback about the position of the finger was only present during the adaptation phase. Combining both effects into one index potentially obscures differential effects of developmental vision on the processes underlying either phenomenon. This concern is supported by the result that the manipulation of visual precision in typically developed controls affected visuo-motor adaptation and aftereffects differentially. Thus, it would be preferable to drop the combined index and analyze adaptation and aftereffects separately throughout. This will have the additional advantage of allowing for direct comparisons of both effects to those reported in the extensive literature on the topic.

      We are grateful to Reviewer 2 for bringing this important point to our attention. We have now run all the correlational analyses separately for adaptation (i.e., error reduction in the prism phase) and aftereffect (mean systematic error in the post-prism phase). We have described these analyses in the Results section and in the Material and Methods section. However, as these separate analyses led to comparable results for adaptation and aftereffect, we did not report them in detail in the main text, where they would be redundant. While each of them can be examined in detail in the Supplemental Materials (Figure 1–figure supplement 4), in the main text we avoided this redundancy by combining them into a unified measure, the recalibration index (irecal). Reviewer 2 is right in highlighting the difference between adaptation and aftereffect. Note, however, that the recalibration index does not include the entire aftereffect (which may have a different time constant, as it may well be distinct from the adaptation), but only the amplitude of the initial three trials of the aftereffect after removing the prism (i.e., the mean of the first three pointing errors of the post-prism phase). This initial amplitude of the aftereffect (which we now call “Initial Aftereffect”) is highly correlated with the amount of recalibration in the prism phase. We have now discussed this point in the Results section. In other words, the recalibration index did not include the aftereffect in the entire post-prism phase (i.e., the systematic error across all trials of the post-prism phase). In fact, we agree with the Reviewer that including the development of the aftereffect across all trials of the post-prism phase would have potentially shown a different phenomenon, namely the effect of proprioception while reinstating the usual sensorimotor mapping. Indeed, unlike in the prism phase, the pointing task in the post-prism phase was performed in the absence of any optical distortion and in the absence of visual feedback. The development of the aftereffect across all trials of the post-prism phase is analysed in the main text and in Figure supplement 3, while the correlations between each factor (age, visual acuity, etc.) and the mean aftereffect across all trials of the post-prism phase are reported in Figure supplement 4. We have now also clarified all these points in the main text and in the Materials and Methods.
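      The distinction between the two aftereffect measures can be illustrated with a minimal sketch. The pointing errors below are hypothetical, the function names are ours for illustration only, and the way the initial aftereffect is combined with adaptation into the recalibration index is not specified here, so it is deliberately left out.

```python
def initial_aftereffect(post_prism_errors, n_trials=3):
    """Mean pointing error over the first n_trials of the post-prism phase."""
    if len(post_prism_errors) < n_trials:
        raise ValueError("not enough post-prism trials")
    return sum(post_prism_errors[:n_trials]) / n_trials


def mean_aftereffect(post_prism_errors):
    """Mean systematic pointing error across all post-prism trials."""
    return sum(post_prism_errors) / len(post_prism_errors)


# Hypothetical pointing errors (arbitrary units) after prism removal;
# the error shrinks as the usual sensorimotor mapping is reinstated.
post_prism_errors = [-6.0, -5.0, -4.0, -2.0, -1.0, 0.0]

initial = initial_aftereffect(post_prism_errors)   # first three trials only
overall = mean_aftereffect(post_prism_errors)      # whole post-prism phase
```

      In this toy example the initial aftereffect is larger in magnitude than the mean across all trials, because the later trials already reflect the decay of the aftereffect.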

      The absence of a significant statistical effect does not provide evidence for the absence of the effect. This problem arises in several instances throughout the paper. For example, a non-significant Kruskal-Wallis test does not indicate a similar distribution of baseline pointing errors. A figure showing the distribution of pointing errors from this phase provides far more convincing evidence (l. 134). A non-significant t-test does not provide evidence for the absence of a relation between the change in recalibration effects and visual acuity (l. 225). Here, it would be correct to state that there was no statistically significant difference between visual acuity at the two different post-tests.

      The problem that the absence of statistical effects does not allow for any conclusions is even more evident for the correlational analyses, which are severely underpowered. The non-significant correlations should be reported in the supplement rather than in a prominent position in the manuscript and all conclusions based on non-significant correlations must be dropped.

      We have now modified the text and Figure 1 accordingly, by rephrasing the text and removing the non-significant correlations from the figure.

      Figures 1C and 1F suggest that the significant correlation between the time since surgery and recalibration effects might be driven by outliers. The analysis should be repeated without outlier data to make sure that the effect is present in the data.

      As reported in the first response to Reviewer 2, we have now re-run the analyses without the participants tested more than 10 years after surgery. The effect of time since surgery is present even when these outliers are removed (see main text and Figure 1F).

      The abstract makes rather general claims about the influence of developmental vision on recalibration and plasticity which are not supported by the data. All conclusions should be restricted to the visuo-motor domain, which in my view will not impact their importance.

      We thank the Reviewer for the comments, and we have adapted the abstract accordingly.

      Given that most participants had residual light perception, it would be more accurate to consistently speak of absent pattern vision rather than visual deprivation.

      We have rephrased the text accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      “Strengths of the paper include the use of the novel promoter (which is stated to have ~50-fold higher abundance in SGCs than astrocytes) and the dataset itself, which is for the most part thorough and convincing.”

      We thank the Reviewer for appreciating the novelty of the study, and that in general, our findings are well-supported and convincing.

      “Concerning specificity, CNS involvement through effects on other cell types is not totally ruled out in these studies, and effects on the same cell type but in other ganglia (parasympathetic and sensory) might be expected to impact sympathetic function. For example, as Vit (2008) reported that following shRNA knockdown of Kir4.1 in trigeminal ganglia hypersensitivity to mechanical stimulation could affect autonomic activity. The authors tested for the influence of parasympathetic using pupillary constriction, and it is somewhat surprising that there is no deficit if neuronal death and dysfunction are as profound in parasympathetic ganglia as shown here for the superior cervical ganglia.”

      We include new results to show that the elevated heart rate in BLBP:iDTA mice is prevented by chemically ablating sympathetic nerves using 6-OHDA (Figures 3E-F, revised manuscript). Since 6-OHDA does not cross the blood-brain barrier in adult mice after i.p. injections (Kostrzewa and Jacobowitz, 1974), we conclude that these results point to a peripheral locus for the cardiovascular defect in BLBP:iDTA mice. Since 6-OHDA is selective for sympathetic nerves, we also reason that the potential loss of sensory or parasympathetic satellite glia does not contribute to the increased heart rate in BLBP:iDTA mice. As the Reviewer notes, we also found that BLBP:iDTA mice fully constrict their pupils in response to light, indicative of normal parasympathetic function.

      We do not exclude the possibility of defects in sensory or parasympathetic ganglia in BLBP:iDTA mice. A comprehensive analysis of these two systems will warrant significant effort, which we respectfully state is outside the scope of this initial study where we have focused on satellite glia in the sympathetic nervous system. However, our results in the revised manuscript and in the original submission provide evidence that enhanced sympathetic tone is responsible for driving the autonomic defects (elevated heart rate and pupil dilation) observed in BLBP:iDTA mice.

      To the Reviewer’s point about CNS involvement in the autonomic dysfunction in BLBP:iDTA mice, we cannot completely exclude the possibility since BLBP is also expressed in astrocytes, albeit at much lower levels (45-fold less) compared to satellite glial cells. However, as discussed in our original submission, astrocyte ablation results in severe motor deficits in mice including limb paralysis, ataxia, as well as smaller body weights (Schreiner et al., 2015), none of which were observed in BLBP:iDTA mice. These results suggest that astrocytes are minimally perturbed in BLBP:iDTA mice.

      “Physiological effects of DTX but not Kir4.1 deletion increased sympathetic activity, whereas increased heart rate was also observed following chemical activation of SGCs using DREADD ligands (Xie et al., 2017). This opposite action is not discussed at length but is attributed to "context-dependence." Inconsistent results with stimuli believed to target the same substrate are worthy of additional consideration by the authors.”

      The Reviewer asks about the differences between our findings and a previous study (Xie et al., 2017) where the authors reported increased heart rate with chemo-genetic manipulation in Gfap-hM3Dq mice. In contrast, we observe increased heart rate with satellite glia ablation in BLBP:iDTA mice. We are unable to reconcile the apparent differences for the following reasons:

      (i) The experimental manipulations in the two studies are very different; acute chemo-genetic manipulation (over a time-scale of minutes) versus genetic ablation of satellite glial cells (over two weeks), making it difficult to directly compare behavioral outcomes (heart rate) at the whole animal level.

      (ii) The Reviewer does not distinguish between activation of the Gq-GPCR signaling pathway in satellite glial cells using DREADD ligands in the Xie et al. study versus “activation” of satellite glia. It remains unknown how activation of this signaling pathway affects satellite glia physiology and functions. Indeed, it remains unclear what “activation” or “silencing” even mean for satellite glial cells: it is not known how calcium mobilization affects these glial cells, nor how this, in turn, affects neuronal activity.

      We have read the manuscript by Xie et al. carefully and could not find any direct evidence for exactly how DREADD-based activation of the Gq-GPCR signaling pathway in satellite glial cells could activate sympathetic neurons. The authors speculate that activation of the Gq-GPCR signaling pathway in satellite glia could activate sympathetic neurons via modulating glutamate transporters or inwardly rectifying K+ channels expressed in satellite glia. However, there are no glutamatergic neurons in sympathetic ganglia, and whether glutamate transporters in peripheral satellite glia have a role in glutamate uptake analogous to CNS astrocytes remains to be established. Further, if activation of the Gq-GPCR signaling pathway in satellite glia led to increased K+ uptake, as occurs in astrocytes, then this would result in reduced sympathetic neuron activity. However, Xie et al. observed an increase in sympathetic tone in Gfap-hM3Dq mice.

      (iii) The cellular specificity of the Cre driver lines used in the two studies differs. We, and others, have shown that BLBP is a highly specific satellite glial cell marker (this study; Mapps et al., 2022; Avraham et al., 2020; 2021; 2022). Notably, using single-cell sequencing, we and others do not detect GFAP in mouse satellite glial cells under normal or reactive conditions (Mapps et al., 2022; Mohr et al., 2021; Jager et al., 2020), although it is found in rat satellite glial cells (Mohr et al., 2021), raising the question of the suitability of GFAP as a satellite glia-specific marker in mice.

      We have included a modified version of this Discussion (pages 18-19, revised manuscript).

      “An alternative conclusion from the finding that the similar cellular level changes in sympathetic neurons induced by DTX and Kir4.1 cKO led to distinct changes in autonomic tone is that the neuronal phenotype does not dictate whole animal physiology”

      We made this same point in our original submission: while BLBP:iDTA and Kir4.1 cKO mice show similar neuronal phenotypes at the cellular level, the loss of a single gene, Kcnj10 (for Kir4.1), from satellite glial cells is not sufficient to drive behavioral changes (pupil size and heart rate) at the whole animal level as seen with satellite glial cell ablation. We reason that there are Kir4.1-independent mechanisms that contribute to neuronal excitability to drive network-level changes.

      “Spatial buffering is given as the proposed benefit of Kir4.1 channels to the sympathetic neurons. However, this concept arose from studies in which clearance of local extracellular space was limited, and astrocytes were appreciated to be connected to a vast syncytium allowing siphoning away from the high levels near active neurons. The organization in peripheral ganglia differs in three major respects: Despite narrow extracellular space, there is no true barrier to diffusion of K ions from the neurons (one factor that makes drug targeting peripheral neurons appealing), SGCs are very thin (and thus without spatial consequence to uptake), and the coupling among the SGCs is local to those surrounding individual neurons, with very little coupling under normal conditions to other distal SGC-neuron units.”

      Briefly, previous studies indicate that satellite glial cells are capable of influencing neuronal excitability by dissipating extracellular K+ increases, largely by acting via Kir4.1 channels (Tang et al., 2010). In addition, we include in the revised Discussion (page 16, revised manuscript) other ways by which Kir4.1 loss in satellite glial cells could indirectly influence neuronal excitability, notably by promoting membrane depolarization of satellite glia or through regulation of diffusible signals such as BDNF.

      Reviewer #2 (Public Review):

      “Mapps and colleagues' potentially very interesting and important work investigates the biological function of satellite glial cells (SGCs) in the nervous system. SGCs are extremely understudied, even compared to other glia, which are themselves generally understudied compared to neurons. Thus, discoveries concerning what SGCs do would be of very high importance.”

      We thank the Reviewer for appreciating the novelty and importance of the study.

      “Major Concerns:”

      “I have major concerns related to the experimental approach to ablate SGCs, specifically in sympathetic ganglia, the successful accomplishment of which underlies the entire study.”

      “• Most troublingly, in Figure 1E, Sox2(+) cells do not appear to be gone 14 days post-tamoxifen injection, just dimmer. The cell bodies in the treated panel also appear much dimmer than controls (on a separate note, these cell bodies do not appear to be atrophied, as shown in Figure 2A, which is also confusing). Although Figure 1B shows decreased Blbp levels by IF, there is no quantification. Coupled with the data that tamoxifen administration does not cause any change in phagocytic immune cells, as one might expect if a population of cells was ablated, this raises concerns about whether the experimental paradigm is working as expected. The authors need to convincingly show that SGCs are ablated from sympathetic ganglia for any subsequent critical claims to be supported.”

      We include new data to demonstrate the loss of satellite glial cells in BLBP:iDTA mice. Briefly:

      (i) There is a significant increase in the number of TUNEL+; Sox10+ satellite glial cells in BLBP:iDTA sympathetic ganglia at 5 days post-tamoxifen injection (Figures 1E, G, revised manuscript).

      (ii) Expression of several satellite glia-specific transcripts, including BLBP, is markedly decreased in BLBP:iDTA sympathetic ganglia, revealed by q-PCR analyses (Figure 1-figure supplement 2H, revised manuscript).

      (iii) We re-analyzed the images of Sox2-labeling by generating binary images, which removes the dependence on pixel values and simply records the presence/absence of a signal. Quantification revealed a substantial decrease in Sox2-labeled cells (33% decrease) in the BLBP:iDTA ganglia compared to controls at 14 days post-tamoxifen injections (Figure 1-figure supplement 2D-G in the revised manuscript). The 33% decrease in satellite glial cells, when analyzed in this manner, is lower than the 54% loss we had initially reported, suggesting that we may have included some proportion of cells that had down-regulated Sox2 expression but had not been ablated. We have clarified this point in the Results section (page 7, revised manuscript).

      Together, our results indicate that a significant proportion of SGCs are being ablated in BLBP:iDTA sympathetic ganglia.
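      The logic of the binary-image re-analysis in point (iii) can be sketched as follows. This is a toy illustration under stated assumptions, not the image-analysis pipeline used in the study: the pixel values, the threshold, the use of 4-connectivity, and the function names are all hypothetical.

```python
def binarize(image, threshold):
    """Convert a grayscale image to a 0/1 mask, discarding intensity information."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def count_objects(mask):
    """Count 4-connected groups of foreground pixels using an iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def percent_decrease(control_count, mutant_count):
    """Percentage reduction in object count relative to control."""
    return 100.0 * (control_count - mutant_count) / control_count


# Hypothetical toy images: bright labeling in control, dim labeling in mutant.
control_image = [
    [200, 0,   0,   180, 0, 0, 0],
    [210, 0,   0,   0,   0, 0, 90],
    [0,   0,   0,   0,   0, 0, 0],
    [0,   150, 160, 0,   0, 0, 0],
    [0,   0,   0,   0,   0, 0, 0],
]
mutant_image = [
    [60, 0,  0, 0, 0, 0, 0],
    [70, 0,  0, 0, 0, 0, 55],
    [0,  0,  0, 0, 0, 0, 0],
    [0,  65, 0, 0, 0, 0, 0],
    [0,  0,  0, 0, 0, 0, 0],
]
THRESHOLD = 50  # hypothetical background cutoff

control_count = count_objects(binarize(control_image, THRESHOLD))
mutant_count = count_objects(binarize(mutant_image, THRESHOLD))
decrease = percent_decrease(control_count, mutant_count)
```

      Because the mask only records presence or absence of signal, dim but surviving cells in the toy mutant image are still counted, so the computed decrease reflects missing objects rather than reduced intensity.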

      “Given the lack of changes in phagocytes in the experimental approach, it would also be important to show what happens to this large population of dead SGCs to understand the environment of the ganglia more fully and to interpret cellular and behavioral phenotypes better”

      Using IBA-1 labeling, we show that macrophage density is unaffected in BLBP:iDTA sympathetic ganglia. However, we do not draw any conclusions about the reactivity of the macrophages or their ability to engulf dying satellite glia. It is also possible that persisting satellite glia in BLBP:iDTA ganglia are involved in clearing apoptotic cells, as reported by Wu et al., 2009. We respectfully state that addressing the precise mechanisms by which dying satellite glial cells are cleared from the mutant ganglia is outside the scope of the current study.

      “• If the transgenic approach can be shown to be working to ablate SGCs as expected, it would be important to demonstrate that Blbp is not driving diphtheria toxin in any other cell type. The authors rule out a role for Schwann cells based on prior RNAseq and reporter mouse studies, but they do not verify these findings in their system. RNAseq can lack sufficient depth, and reporter mice do not always faithfully recapitulate endogenous expression. Similarly, the authors rule out astrocytic contributions based on lack of phenocopy but do not directly examine CNS tissue to support this claim.”

      We have provided additional evidence in the revised manuscript to support the cellular specificity of BLBP expression in satellite glial cells using single-cell RNA sequencing data from our lab and others (Mapps et al., 2022; Avraham et al., 2020; 2021; 2022) as well as the use of genetic reporter mice. Our single-cell RNA sequencing analyses also revealed that Schwann cells are scarce in sympathetic ganglia compared to sensory ganglia (Mapps et al., 2022).

      As discussed in our response to Reviewer #1, DTA expression is restricted to BLBP-positive satellite glial cells and cannot be taken up by other cell types since this would require the DT receptor, which is not endogenously expressed in mice.

      In response to Reviewer #1 above and in the revised manuscript, we also discuss that although we cannot exclude the involvement of astrocytes in the behavioral defects observed in BLBP:iDTA mice, BLBP is 45-fold enriched in satellite glia compared to astrocytes. Genetic ablation of astrocytes results in severe motor deficits in adult mice including limb paralysis, ataxia, and smaller body weights (Schreiner et al., 2015), which are not present in BLBP:iDTA mice. Importantly, we provide new data to show that chemical ablation of sympathetic nerves using 6-OHDA, which does not cross the blood-brain barrier in adult mice, prevents the cardiac dysfunction in BLBP:iDTA mice, indicating a peripheral locus. Together with the evidence that we have provided for sympathetic neuronal defects at the morphological and cellular levels in BLBP:iDTA mice, we conclude that behavioral defects arise primarily from dysfunction of peripheral sympathetic neurons.

      “Additionally, I have some other concerns related to the data/data interpretation that should be clarified:

      • In Figure 1C, the authors note that TUNEL-labeled cells have large ovoid nuclei and are likely neuronal. A double-label would demonstrate this claim with more certainty than cell shape.”

      We have performed these additional experiments. We show that the majority of apoptotic cells at 5 days post-tamoxifen injections are satellite glial cells (Figure 1E, G, revised manuscript). Although there was a trend toward enhanced neuronal apoptosis at this early stage, the number of apoptotic neurons in BLBP:iDTA ganglia was not statistically different from that in controls (Figure 1F, H, revised manuscript). We also did not observe a significant loss of neurons at 5 days post-tamoxifen injections (Figure 2-figure supplement 1B, revised manuscript). However, by 14 days post-tamoxifen injections, there is a significant loss (24% decrease) of sympathetic neurons (Figure 2G, revised manuscript). Together, these results suggest that sympathetic neuron loss occurs secondarily to the loss of satellite glial cells in BLBP:iDTA mice.

      “Related to this experiment, the quantification in Figure 1D does not appear to match the image shown in Figure 1C. Many TUNEL(+) cells are shown in the Blbp:iDTA image compared to control outside of the ganglionic borders, but this was not mentioned in the manuscript.”

      As discussed above in response to Reviewer #1, the quantification in Figure 1D represents the total number of TUNEL-positive cells in the entire superior cervical ganglia (approximately 24-32 tissue sections of 12 µm thickness each), while the images in Figure 1C show a single tissue section from the ganglia.

      The Reviewer also notes that we observe TUNEL-positive cells outside the ganglia. However, this is evident in both control and mutant ganglia. This may represent a normal turnover of cells in tissues outside sympathetic ganglia (fat deposits and arteries). In the revised manuscript, we provide new images that show TUNEL labeling outside both the control and mutant ganglia (Figures 1C and 4D, revised manuscript), and also clarify this point in the text (page 6, revised manuscript).

      “• It is unclear from the Materials and Methods section if the mice are all on a congenic background. For example, how standard is pupil size from mouse to mouse, and is this more variable if mice are not congenic? This may be an issue given that the pupil measurements used as a read-out of sympathetic function have no baseline comparison, just control, and BLBP:iDTA animals.”

      We compared pupil size in BLBP:iDTA mice and their litter-mate controls, which are of the same genetic background. Our values of basal pupil sizes in mice after dark adaptation are consistent with previously reported results (Keenan et al., 2016). Pupil size tends to be similar in darkness across mice of different genetic backgrounds (Keenan et al., 2016).

      Reviewer #3 (Public Review):

      “In this manuscript, Mapps et al. report on the very interesting finding that satellite glia deletion significantly impacts sympathetic neuron function and survival. .. This is a very novel finding that reveals an important role for satellite glia in sympathetic physiology. It is comprehensive and well controlled. There are just a few issues that the authors should consider.”

      We thank the Reviewer for finding the study to be novel, comprehensive, and well-controlled.

      “In Fig. 1C-D, how many dpi was the TUNEL assay performed? It would be helpful to know how quickly the neurons die after glial depletion and if the cell death continues or plateaus. The authors should also co-label using neuronal and glial markers to evaluate whether the apoptotic cells are primarily neurons or glia. They report a loss of neurons, but how much of that is reflected in the TUNEL labeling is not clear.”

      Figures 1C-D represent the results of TUNEL labeling done at 5 days post-tamoxifen injections. As discussed above in responses to Reviewers #1 and #2, we include new data in the revised manuscript, using co-labeling with TUNEL and sympathetic neuron/satellite glia markers, to show early apoptosis of satellite glia, but not of neurons, at 5 days post-tamoxifen injections in BLBP:iDTA mice (Figures 1E-H, revised manuscript). We also did not observe a significant decrease in neuronal numbers at 5 days post-tamoxifen injections (Figure 2-figure supplement 1B, revised manuscript). However, by 14 days post-tamoxifen injections, there is a significant loss (24% decrease) of sympathetic neurons (Figure 2G, revised manuscript). These results indicate that satellite glia apoptosis occurs first, followed by the loss of sympathetic neurons in BLBP:iDTA mice. At the moment, we do not know if the neuronal death continues and/or reaches a plateau at later stages. This is a good point, which we will investigate in future analyses.

      “In Figs. 1C and 5C TUNEL analysis, there are quite a few TUNEL+ puncta outside of the ganglia, suggesting that there may be apoptosis in other adjacent tissues when the glia are removed or Kir4.1 is deleted. The authors should comment on that if it were something consistently observed.”

      We observe TUNEL-positive cells immediately adjacent to the ganglia in both control and mutant mice (BLBP:iDTA or Kir4.1 cKO mice). This may represent a normal turnover of cells in tissues outside sympathetic ganglia (fat deposits and arteries). In the revised manuscript, we provide new images that show TUNEL labeling outside both the control and mutant ganglia (Figures 1C and 4D, revised manuscript), and also clarify this point in the text (page 6, revised manuscript).

      “The loss of neurons upon glial cell loss or Kir4.1 deletion is interesting. The authors discuss how neuron death could occur, but did they observe TUNEL+ cells in regions where the glia had been deleted? Given that the diphtheria toxin did not ablate all glia, were the neurons left with little or no surrounding glia more likely to die? This may be difficult to tell, but from the images in 1E, it looks like some neurons lack nearby glia. This would be a potential explanation for why only a fraction of the neurons died; those neurons with associated glia may be more protected.”

      The Reviewer makes an interesting point that the neurons without attached satellite glial cells may be the most vulnerable to apoptosis. We were unable to conclusively make this correlation after looking through multiple images of TUNEL labeling in BLBP:iDTA ganglia. Interestingly, we found a similar loss (22%) of sympathetic neurons in Kir4.1 cKO mice in the absence of obvious disruptions in satellite glia association with sympathetic neurons (see Figure 4C and Figure 4-figure supplement 1C, revised manuscript). Thus, while the loss of neuron-satellite glia contacts may contribute to the neuronal death, there are also other mechanisms that could be involved in neuronal apoptosis.

      “It would be helpful to clarify a bit more what the control mice used for comparison were. From the text, it seems as if they were the same mice but not treated with tamoxifen. Were they given diphtheria toxin?”

      The Reviewer is correct that we used Fabp7-CreER2;ROSA26eGFP-DTA mice that were injected either with vehicle (corn oil), to serve as controls, or with tamoxifen, for satellite glia ablation. Mice did not have to be injected with diphtheria toxin, since tamoxifen injections would drive Cre-mediated DTA expression in BLBP-positive satellite glia.

      “In addition, did the authors check for any effects of tamoxifen alone? Given that estrogen can affect many physiological parameters, including cardiac function, tamoxifen alone could have some effect, e.g., Kuo et al., PMID: 20392827.”

      We thank the Reviewer for this point. We measured heart rate by electrocardiogram recordings in adult (P45) wild-type C57Bl/6J mice or control (ROSA26eGFP-DTA) mice that did not express Cre, injected with either corn oil or tamoxifen using the same paradigm as for BLBP:iDTA mice and litter-mate controls (sub-cutaneous injections, 180 mg/kg body weight for 5 consecutive days). We show that tamoxifen injection alone does not elicit any effects on heart rate in wild-type or control mice in the absence of Cre (Figure 3-figure supplement 1C-D, revised manuscript).

      “Interestingly, TH levels in BLBP:iDTA mutant axons appeared to be similar to that in controls, despite the marked reduction in TH mRNA and protein levels in neuronal cell bodies (Figure S2A). The Kaplan lab (PMC7164330) showed that TH mRNA trafficking and local synthesis play an important role in synthesizing catecholamines in the axon and presynaptic terminal. Although a bit beyond the scope of this study, it would be interesting to determine whether TH mRNA transport is altered by deletion of the glia. The authors might check to see if TH transcripts are reduced in axons by something like RNAscope.”

      We thank the Reviewer for this interesting point. The Reviewer likely meant that TH levels are up-regulated in axons since BLBP:iDTA mutant axons maintain TH expression despite the reduction of TH in neuronal soma. We tried assessing Th mRNA in axons in vivo using single molecule fluorescence in situ hybridization (smFISH). However, despite several attempts, we were not successful in getting the TH RNAscope probe to work. In the revised manuscript, we discuss enhanced Th mRNA trafficking and local translation as a possible mechanism for maintenance of axonal TH levels in BLBP:iDTA sympathetic neurons (Discussion, pages 19-20, revised manuscript), and cite the Gervasi et al., 2016 study.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Jan Kubanek attempts to derive an 'effective decision strategy' that is optimal (and therefore normative) given certain constraints resulting from computational capacity limitations. The author first points out that neoclassical economics (i.e., expected utility theory, EUT) provides normative predictions for decisions to maximize utility. Next, he (correctly) points out that finding the optimal solutions to decision problems requires computational resources that are unlikely to exist in actually existing decision-makers (animals and humans). He claims that this fact is the most severe problem for concluding that EUT is an accurate description of actual human or animal decision processes. I disagree with him on this point as I will lay out in more detail below. Next, the author attempts to find an 'efficient' (i.e., computationally reasonable) decision strategy that comes close to the original normative framework. He claims that such a strategy is EDM, whereby decisions are made by allocating relative effort in proportion to the relative reward of each option.

      Overall, I find this paper hard to judge. The considerations described in this paper are certainly interesting and I have no reason to presume that the mathematical derivations described are wrong (without having made an effort to follow and check it in detail). Still, I find the paper, in the end, sterile and I fear it will have only limited impact. I think the manuscript should be expanded in three different directions to make it more relevant for the neuroscientific understanding of decision making.

      First, the author needs to show that EDM can also explain other known violations of EUT related to the axiom of regularity (i.e., preferences between two options should not be affected by the presence of inferior options). This seems relevant because these behavioral effects robustly violate the choice allocation strategy of EDM.

      Second, EDM is so abstract that the actual structure and capacity of the nervous system are nearly irrelevant. The author should consider more deeply the computational requirements and capacities of different types of brains; fruit flies, frogs, and primates, and the consequences of these differences for what is (or should be) achievable in terms of optimal behavior.

      Third, the paper contains no test for EDM. This is in part because EDM is at no point compared to the predictions of alternative theories.

      I thank the Reviewer for these constructive comments, which are addressed below.

      My specific concerns are as follows:

      (1) The author claims that the most severe problem of EUT is that it is computationally implausible. However, I disagree.

      It could be claimed that EUT describes an (unattainable) optimal state that actual brains try to accomplish with limited resources. (In essence, the current paper follows this strategy).

      Correct, EDM stems from Expected Utility Theory subjected to specific biological considerations, as shown in Figure 1.

      Given this origin, the paper now makes more appropriate statements regarding the biologically-relevant shortcomings of EUT:

      i) Abstract: "the apparatus requires a large number of evaluations of the decision options as well as neural representations and computations that are not biologically plausible." → "the apparatus requires a large number of evaluations of the decision options as well as neural representations and computations that are difficult to implement at the biological level"

      ii) Introduction: "To address these biologically implausible requirements, …" → "To address the biological constraints, …"

      I think the situation is much direr. During the last 70 years, a small army of psychologists and behavioral economists have described a large number of violations of EUT's normative predictions: the Allais paradox, framing effects, the behavioral tendencies summarized in Prospect theory, and others. These differences between behavior and normative predictions are important because they violate basic assumptions of the normative theory.

      Prospect theory can be readily incorporated into EDM.

      This has resulted in the following paragraph in the Discussion:

      "Notably, the 𝑢𝑖 and 𝑒𝑖 variables can incorporate additional factors such as the probability of an outcome, as in prospect theory (Kahneman and Tversky, 1979). A previous study (Kubanek, 2017) demonstrates that prospect theory’s incorporation of probabilities into utilities does not change the relationship between the differential formulation of Equation 1 and the fractional formulation of Equation 2, which is crucial for EDM. Moreover, the 𝑢𝑖 and 𝑒𝑖 variables can be entirely subjective. So long as the representations are comparable by the brain (e.g., through relative firing rates; Figure 6), the 𝑒𝑖 = 𝑢𝑖 strategy provides an efficient allocation of the decision-maker’s resources."

      (2) The most interesting case of such violations is a set of well-known behavioral effects that occur in the context of multi alternative-multi attribute decision making. They are known as the attraction, similarity, and compromise effects (there is a large literature; more recently: Dumbalska T, Li V, Tsetsos K, Summerfield C. A map of decoy influence in human multi alternative choice. Proc Natl Acad Sci U S A. 2020 Oct 6;117(40):25169-25178. doi: 10.1073/pnas.2005058117. Epub 2020 Sep 21.) These biases have received so much attention because they violate a very basic axiom of EUT. Choices between two options should not be affected by the presence of a third option that is inferior to both of them. However, that is exactly what happens in these choice biases. The effects have been shown in many species ranging from humans to amphibians to invertebrates. As far as I can see, EDM cannot explain how choice allocation between two options A and B that have equal value would be changed by the inclusion of a new option D so that is of lower value than A or B in such a way that D is not chosen at all, but A is chosen more often than B if D is similar in attributes to A (the 'attraction' effect). If I am mistaken, the inclusion of an explanation of how this would work would be of major importance.

      The new Figure 6 provides a starting point for addressing these effects.

      Specifically, this comment has resulted in the following Discussion paragraph:

      "In EDM, the relativistic representation of utility at the neural level (black bars in Figure 6) involves divisive normalization. Divisive normalization is a common operation performed by neural circuits (Carandini and Heeger, 2012). The specific form of this operation may be crucial for explaining attraction, similarity, and compromise effects observed in multi-alternative, multi-attribute decision environments (Noguchi and Stewart, 2014; Dumbalska et al., 2020). For instance, it has been found that a transformation of utilities by specific monotonic functions prior to divisive normalization can explain these behavioral effects parsimoniously (Dumbalska et al., 2020). On this front, monotonic transformations and divisive normalization are performed by several kinds of feedforward and feedback neural circuits (Lek et al., 1996; Carandini and Heeger, 2012). Nonetheless, how exactly individual attributes of decision options are encoded at the neural level should be investigated using large-scale neuronal recordings."
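      The normalization step mentioned in this paragraph can be illustrated with a minimal sketch. It is not the model of Dumbalska et al. (2020) and not the paper's implementation; the function name, the optional `transform` argument, and the semisaturation constant `sigma` are assumptions introduced here for illustration only.

```python
def divisive_normalization(utilities, transform=lambda u: u, sigma=0.0):
    """Divide each (optionally transformed) utility by the pooled total.

    `transform` stands in for a monotonic function applied before
    normalization; `sigma` is an assumed semisaturation constant.
    """
    transformed = [transform(u) for u in utilities]
    pool = sigma + sum(transformed)
    if pool <= 0:
        raise ValueError("pooled activity must be positive")
    return [t / pool for t in transformed]


# Two equally valued options, then the same two plus an inferior third one.
two_options = divisive_normalization([2.0, 2.0])
three_options = divisive_normalization([2.0, 2.0, 1.0])
```

      In this plain form, adding the third option rescales both original values but leaves their ratio unchanged, which is one way to see why the specific form of the transformation and normalization matters for decoy effects.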

      (3) EDM as described in this manuscript is completely static, that is it ignores actual computational processes that underlie decision making. This is in opposition to an important modern branch of decision research that has stressed the importance of understanding processes (and their limitations) to understand how choices are made. Examples are: (1) Roe RM, Busemeyer JR, Townsend JT. Multialternative decision field theory: a dynamic connectionist model of decision making. Psychol Rev. 2001 Apr;108(2):370-92. doi: 10.1037/0033-295x.108.2.370. PMID: 11381834.; (2) Tsetsos K, Usher M, Chater N. Preference reversal in multiattribute choice. Psychol Rev. 2010 Oct;117(4):1275-93. doi: 10.1037/a0020580. PMID: 21038979. The relationship between EDM and algorithmic implementations should be explored.

      This point has been addressed in the following ways:

      1) EDM is now implemented at the algorithmic level while positioned within stochastic choice environments.

      2) The performance of EDM in the stochastic environments is reported in a new Figure 4.

      3) The performance of EDM within the stochastic and deterministic environments is now compared in a new Figure 5. The figure shows that both environments support the same principal conclusions.

      4) Figure 4b provides mechanistic examples of the individual effort allocations by EDM and alternative strategies.

      5) The Discussion includes a new paragraph that places EDM within a broader context of algorithmic implementations:

      "In deterministic environments, EDM comprises a single stage that embodies Equation 7. This rule is analogous to the evolutionarily stable “relative reward sum” in ecology (Harley, 1981; Hamblin and Giraldeau, 2009) and “local fractional income” in neuroscience (Sugrue et al., 2004). In dynamic and stochastic environments, the strategy should additionally incorporate an integration stage that mitigates the effect of noise and thus provides meaningful estimates of the worth of each option. Several approaches can be used to keep track of dynamic, stochastic environments and thus estimate their relative worth 𝑢𝑖. The most compact are related to reinforcement learning, in which previous payoffs are discounted exponentially using a “learning rate.” This approach has been applied in ecology (Harley, 1981; Hamblin and Giraldeau, 2009), computer science (Sutton and Barto, 1998), neuroscience (Sugrue et al., 2004; Corrado et al., 2005), and was also applied here when assessing performance in stochastic environments. One benefit of this free parameter is that decision-makers can adapt the learning rate to the speed of change or the level of stochasticity of specific decision situations (Iigaya et al., 2019)."
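      The integration stage described above can be sketched as a leaky integrator. This is an illustration under stated assumptions, not the code used in the paper: the function names, the per-step payoff vector (zero for options that paid nothing), and the default learning rate are all hypothetical.

```python
def update_estimates(estimates, payoffs, learning_rate=0.1):
    """Exponentially discount old payoffs: new = (1 - a) * old + a * payoff."""
    if not 0.0 < learning_rate <= 1.0:
        raise ValueError("learning_rate must lie in (0, 1]")
    return [(1.0 - learning_rate) * old + learning_rate * payoff
            for old, payoff in zip(estimates, payoffs)]


def relative_worth(estimates):
    """Turn the running estimates into fractions that sum to one."""
    total = sum(estimates)
    if total <= 0:
        # No payoff observed yet: treat all options as equally valuable.
        return [1.0 / len(estimates)] * len(estimates)
    return [value / total for value in estimates]


# Two options; the first pays on step one, the second on step two.
estimates = update_estimates([0.0, 0.0], [1.0, 0.0], learning_rate=0.5)
estimates = update_estimates(estimates, [0.0, 1.0], learning_rate=0.5)
fractions = relative_worth(estimates)
```

      A larger learning rate weights recent payoffs more heavily, which is the adaptation to fast-changing environments that the paragraph refers to.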

      (4) Most importantly, what is missing is a clear prediction for a finding (behavioral or neuronal) that would be predicted only by EDM, and not by any other theory of decision making. Without such a proposed test, the idea has no scientific merit.

      The paper includes new analyses and text that provide predictions that are specific to EDM. Specifically, this point has been addressed in three ways:

      1) The three predictions that are specific to EDM are now made explicit in a new Figure 5. The figure also provides quantitative support of EDM through performance evaluations across these predictions.

      2) The Results include the following text regarding the key defining properties of EDM: "Figure 5 summarizes and expands on the defining properties of EDM. First, the main finding of this article is that EDM is characterized by high performance following a single evaluation of decision options (Figure 5a). Second, Figure 3a suggested that the proportional allocation of effort to relative utilities (𝛽 → 1) may represent an optimum, at least across the space of effort-utility contingencies tested. Figure 5b-top additionally evaluates the impact of this exponent in the stochastic choice situations. This figure replicates the findings of Figure 3a in that 𝛽 = 1 lies near the optimum, with 𝛽 = 1.0 and 𝛽 = 1.2 providing an average gain of 94.0% and 94.1%, respectively. Thus, the proportional allocation of effort to relative utilities is another defining trait of EDM, and this strategy provides near-optimal performance in all decision situations tested. And third, the effort allocation strategy in EDM, 𝑒harvest = 𝑢(𝑒eval), is invoked once regardless of the number of decision options. This is in contrast to optimization, whose convergence time scales with problem dimensionality, i.e., the number of options. The single-evaluation EDM strategy maintains performance across the number of options under the VI schedules (Figure 5c-top; slope 0.67% per option, 𝐹 = 4.46, 𝑝 = 0.073), although it does incur a performance loss (Figure 5c-bottom; slope -0.95% per option, 𝐹 = 34.66, 𝑝 = 0.00061) in the deterministic cases. Notably, to attain the performance of EDM, the theoretical maximizing agents required substantially more evaluations in situations involving a large number of options (Figure 5c gray; top: slope 2.8 evaluations per option, 𝐹 = 21.80, 𝑝 = 0.0023; bottom: slope 3.4 evaluations per option, 𝐹 = 250.2, 𝑝 = 9.7 × 10⁻⁷)."
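      The role of the exponent 𝛽 can be made concrete with a short sketch. The functional form below (effort proportional to utility raised to `beta`) is an assumption made for illustration; the quoted text confirms only that 𝛽 = 1 corresponds to proportional allocation. The function name and the `total_effort` parameter are hypothetical.

```python
def allocate_effort(utilities, beta=1.0, total_effort=1.0):
    """Split total_effort across options in proportion to utility ** beta.

    Assumed form for illustration: beta = 1 gives proportional allocation,
    beta = 0 ignores utilities, and larger beta favors the best option.
    """
    if any(u < 0 for u in utilities):
        raise ValueError("utilities are assumed to be non-negative")
    weights = [u ** beta for u in utilities]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one option needs positive weight")
    return [total_effort * w / total for w in weights]


proportional = allocate_effort([3.0, 1.0])          # beta = 1
indifferent = allocate_effort([3.0, 1.0], beta=0.0)
sharpened = allocate_effort([3.0, 1.0], beta=2.0)
```

      The allocation is computed in a single pass over the options, with no iterative search, which mirrors the single-evaluation property emphasized in the quoted Results text.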

      3) The Discussion now includes a dedicated paragraph on the testability of the EDM predictions at the behavioral and neural levels:

      "EDM is testable at the behavioral (Figure 5) and neural (Figure 6) levels. At the behavioral level, EDM possesses three distinctive characteristics. First, EDM obtains high reward rapidly (Figure 5a). This characteristic can be tested in choice environments that minimize noise as an additional factor (e.g., Figure 7a), providing performance versus evaluations plots analogous to Figure 2. Second, EDM allocates relative effort to relative utilities proportionally (Figure 5b). This characteristic can be tested in situations in which utilities can be measured precisely, e.g., through the volume of fluid rewards in animal experiments or money in human experiments. And third, EDM allocates effort rapidly regardless of the number of options. This is because the 𝑒harvest = 𝑢 (𝑒eval) strategy is agnostic to the number of options. This characteristic can be tested by varying the number of decision options and quantifying the number of times a decision-maker evaluates the options. At the neural level, EDM only requires the encoding of relative utilities of the recently sampled options. This relative code can be implemented using firing rates of the neuronal pools representing each alternative (Figure 6 top row, middle column). Indeed, this representation has been found in the primate brain. Specifically, the relative value associated with EDM, termed “fractional income,” captures firing rates of neurons in monkey area LIP (Sugrue et al., 2004)."

      Reviewer #2 (Public Review):

      In this article, Kubanek shows how simple, local decision strategies approximate optimal foraging behavior using analytical methods and model simulations. To ground the argument beyond model simulations, Kubanek generalizes previous theoretical frameworks in economics and foraging to show how evaluating relative utility and effort are sufficient to find optimal behavior. A particular strength of this study is its principled approach to linking general economic theory with foraging theory and deriving the conditions under which local behavioral strategies provide general and efficient means to the optimality problem. Re-casting utility and effort in relative terms offers attractive possibilities to apply these formulations in describing a range of phenomena. I, therefore, believe this short report will be of interest to a multidisciplinary audience from economics, psychology behavior theory, foraging, and neuroscience. The author's main claims are supported by their evidence.

      Potential weaknesses of the study include:

      1) Predictions from the proposed EDM framework are stated in vague terms and could be formulated more concretely and, if possible, included in the model simulations.

      The predictions are now stated explicitly in a new Figure 5 and the associated text, and supported through performance evaluations within deterministic and stochastic choice environments.

      Moreover, a new Figure 6 provides a representational and computational account of EDM, and summarizes the main point of the paper that EDM combines high performance with simple, biologically plausible evaluation.

      2) The specificity of the EDM model and related model is only briefly touched on. The EDM argument could be strengthened by making the relation to other behavioral models more explicit.

      The new Figure 5 now compares the key characteristics of EDM with a set of related and more complex models, and evaluates their performance across these characteristics. The relations to other behavioral models are further specified in two new Discussion paragraphs.

      3) Many behavioral situations, including the study by Sugrue et al. (2004) that is often cited in this paper, involve reward contingencies with a high level of uncertainty and non-stationary environments. While the author mentions these situations at the end of the discussion, it remains vague how precisely EDM performs in, or relates to, decision strategies that deal with such environments.

      The article now also includes stochastic choice environments, in addition to the original deterministic choice environments. This has resulted in the new Figure 4, Figure 5, and the associated text.

      The results in the stochastic environment corroborate those obtained in the deterministic environments.

    1. Author Response

      Reviewer #1 (Public Review):

      Neural stem cells express cascades of transcription factors that are important for generating the diversity of neurons in the brain of flies and mammals. In flies, nothing is known about whether the transcription factor cascades are built from direct gene regulation, e.g. factor A binding to enhancers in gene B to activate its expression. Here, Xin and Ray show that one temporal factor, Slp1/2, is regulated transcriptionally via two molecularly defined enhancers that directly bind two other transcription factors in the cascade as well as integrating Notch signaling. This is a major step forward for the field, and provides a model for subsequent studies on other temporal transcription factor cascades.

      Thanks for the positive comments!

      Reviewer #2 (Public Review):

      The manuscript addresses an important question concerning the mechanisms regulating temporal transitions in Drosophila neural progenitors called neuroblasts. Here, they concentrate on a specific transition between the transcription factors Ey and Slp1/2 that are sequentially expressed within a cascade involving at least 6 temporal transcription factors. Using a combination of new transgenes, bioinformatics and genome-wide profiling of transcription factor binding sites (Dam-ID), they functionally characterize two enhancers of the Slp1/2 genes that are active during this transition. This led to the identification of the Notch pathway as an important facilitator of the transition. They also show that Notch signaling requires cell cycle progression and that Slp1/2 is a direct target of Ey, validating the importance of transcriptional cross-regulatory interactions among the temporal transcription factors to trigger progression.

      In my opinion, the study is very interesting, representing the first careful analysis of enhancers involved in temporal transitions in neural progenitors, and leading to new insights into the mechanisms promoting temporal progression.

      Thanks for the positive comments!

      Reviewer #3 (Public Review):

      In this manuscript, the authors present data to suggest that transcriptional activation of the Slp1/2 temporal factors in the medulla neuroblasts of the developing Drosophila optic lobe is dependent on two enhancer elements. The authors concluded that these two enhancers were able to be activated by Ey and Scro, two other factors identified to be involved in the temporal cascade of the medulla NB. The authors show that cell cycle progression is necessary for Notch signaling, and that Notch signaling activates and sustains the temporal transcription factor cascade. The authors use GFP reporter assays to correlate the enhancer activity to Slp1/2 expression and used DamID to show in-vivo binding of Su(H) and Ey to the enhancer fragments.

      I agree with the authors that it is important to define the mechanisms by which Notch, cell cycle control and these temporal transcription factors function through their cis-regulatory elements to establish this self-propagating cascade to generate diverse cell types during neurogenesis. However, the findings in this study offer limited new insights toward reaching this goal for a myriad of reasons. First, studies in invertebrate and vertebrate neurogenesis have agreed on the conceptual framework that transcriptional control plays a key role in regulating the generation of diverse cell types. The data showing the patterns of slp1/2 transcript simply reaffirm the proposed model as well as recently published single-cell transcriptomic analyses of fly optic lobe neuroblasts. Second, it remains unclear how physiologically relevant the enhancer analyses presented in this study are to the regulation of Slp1/2 expression, as the data can only suggest that they act redundantly to each other. It is also troubling to see that mutating binding sites of a single transcription factor appears to completely abolish enhancer activity while Slp1/2 protein expression is delayed in mutant clonal analyses. Third, the authors do not offer any explanation for how Notch signaling contributes to the timing of Slp1/2 expression, considering that Notch signaling should be active during the entire life of the neuroblast based on canonical Notch target gene expression. What role do Ey and Scro play in this timely enhancer activation, as both appear to be necessary to activate the enhancers along with Notch? Fourth, many studies including the Okamoto et al., 2016 study cited in this study have contributed to our appreciation of the role of proper cell cycle control in promoting generation of diverse neurons in vertebrate neurogenesis. It is unclear to me if findings from the current study contribute to significant advancement on this regulatory link.

      Thanks for raising these concerns. Here are our responses:

      First, we agree that there have been great advances in this field including classical studies in the ventral nerve cord, recent studies on type II lineages and medulla including our own scRNA-seq study of medulla neuroblasts. These studies have revealed the sequential expression of transcription factors in neuroblasts of different ages, and proposed that these transcription factors form a transcriptional cascade based on the cross-regulations among them. However, these cross-regulations were based on mutant phenotypes, and in most cases, the cis-regulatory elements of these TTFs have not been characterized, and it hasn’t been studied whether these cross-regulations are direct or not. Little is known about exactly how the timing of the transition is regulated and coordinated with cell-cycle control. We have addressed these questions and identified two enhancer elements for slp1/2, and demonstrated that the previous TTF Ey, another TTF Scro, and Notch signaling directly regulate slp expression. Further we demonstrated that Notch signaling is dependent on cell cycle progression in neuroblasts, and supplying Notch signaling rescues the delay in Slp expression in cell cycle mutants. We believe this study has provided important insights in this field and is another step forward.

      Second, now we provide evidence that deletion of both enhancers specifically abolished Slp1 and Slp2 expression in medulla neuroblasts.

      Regarding the concerns about binding site mutation:

      1) Ey: With loss of Ey, Slp is completely lost. The Ey binding site mutation phenotype is consistent with loss of Ey phenotype.

      2) Su(H): For the u8772 220bp enhancer, mutating all four predicted Su(H) binding sites did abolish the reporter expression. During the revision, we generated another construct, in which we mutated the two predicted Su(H) binding sites that are perfect matches to the consensus, and found that this dramatically reduced the reporter expression. For the d5778 850bp enhancer, mutation of Su(H) binding sites caused strong glial expression, which prevented us from precisely assessing the neuroblast expression. Thanks to the excellent advice from Reviewer #3, we used repo-Gal4 and GFP-RNAi to remove the glial expression. This approach turned out to be very informative. We found that mutation of four or six out of six predicted Su(H) binding sites actually did not decrease the reporter expression in neuroblasts, suggesting that Notch signaling does not activate the d5778 850bp enhancer through these binding sites. We think this explains why this enhancer drives a delayed expression compared to the 220bp enhancer and the endogenous Slp. In addition, this also explains why, with loss of Notch signaling, endogenous Slp expression is only delayed but not completely lost: although the 220bp enhancer-driven expression is completely lost, the d5778 850bp enhancer still directs a delayed expression of Slp, and this expression is not dependent on Notch signaling.

      3) Scro: Mutation of Scro binding sites caused a decreased expression level of the reporter, consistent with the scro RNAi phenotype on Slp, which is a decreased expression level.

      Third, regarding how Notch signaling, which is active throughout the life of the neuroblast, can act to activate Slp expression at a specific time: we tested genetic interactions between Ey, Scro, and Notch in the regulation of Slp expression. We found that with loss of Ey, supplying constitutively active Notch or Scro is not sufficient to rescue Slp expression. Thus Ey, as the previous TTF, may be required to prime the slp locus, so that Notch signaling and Scro can act to further activate Slp expression. Therefore, Notch signaling requires Ey to specifically further activate Slp at the correct time. We have added these experimental results and discussion.

      Fourth, the Okamoto et al., 2016 study actually concluded that cell cycle progression is not required for the temporal progression. In their experimental setup, they supplied Notch to maintain the undifferentiated status of cortical neural progenitors when they blocked cell cycle progression. They observed that temporal transition still happened, and they concluded that cell cycle progression is not required for temporal transitions. However, they didn’t consider the possibility that Notch signaling, which is itself dependent on cell cycle progression, actually rescued the possible phenotype caused by arrest of cell cycle progression. Our results demonstrated that in the Drosophila medulla, supplying Notch signaling can rescue the delay in the transition to the Slp stage in cell-cycle arrested neuroblasts, and further showed that the mechanism is direct transcriptional regulation. We believe that publication of our results will be informative to the vertebrate study, prompting vertebrate researchers to reconsider the role of cell cycle progression and Notch signaling in temporal progression.

    1. Author Response

      Reviewer #1 (Public Review):

      Edmondson et al. develop an efficient coding approach to study resource allocation in resource constrained sensory systems, with a particular focus on somatosensory representations. Their approach is based on a simple, yet novel insight. Namely - to achieve output decorrelation when encoding stimuli from regions with different input statistics, neurons in the sensory bottleneck should be allocated to these regions according to jointly sorted eigenvalues of the input covariance matrix. The authors demonstrate that, even in a simple scenario, this allocation scheme leads to a complex, non-monotonic relationship between the number of neurons representing each region, receptor density and input statistics. To demonstrate the utility of their approach, the authors generate predictions about cortical representations in the star-nosed mole, and observe a close match between theory and data.
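      The allocation rule summarized above (sort the eigenvalues of all regions jointly, then keep the largest ones that fit through the bottleneck) can be sketched as follows. This is an illustration only: the function name, the region labels, and the example spectra are hypothetical and are not taken from the paper, where the eigenvalues derive from each region's input covariance.

```python
def allocate_bottleneck(region_eigenvalues, bottleneck_size):
    """Count how many of the top `bottleneck_size` eigenvalues each region owns.

    `region_eigenvalues` maps a region label to the eigenvalues of that
    region's input covariance (assumed to be precomputed).
    """
    tagged = [(value, region)
              for region, values in region_eigenvalues.items()
              for value in values]
    # Joint sort across regions, largest eigenvalue first.
    tagged.sort(key=lambda pair: pair[0], reverse=True)
    counts = {region: 0 for region in region_eigenvalues}
    for _, region in tagged[:bottleneck_size]:
        counts[region] += 1
    return counts


# Made-up spectra for two regions that differ in density and activity.
spectra = {"region_a": [8.0, 4.0, 2.0, 1.0], "region_b": [5.0, 1.5]}
narrow = allocate_bottleneck(spectra, bottleneck_size=1)
medium = allocate_bottleneck(spectra, bottleneck_size=3)
wide = allocate_bottleneck(spectra, bottleneck_size=5)
```

      Even with these toy numbers, the share of output neurons assigned to each region changes with the bottleneck width, because it depends on how the two spectra interleave.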

      Strengths:

      These results are certainly interesting and address an issue which to my knowledge has not been studied in-depth before. Touch is a sensory modality rarely mentioned in theoretical studies of sensory coding, and this work contributes to this direction of research.

      A clear strength of the paper is that it demonstrates the existence of non-trivial dependence between resource allocation, bottleneck size and input statistics. Discussion of this relationship highlights the importance of nuance and subtlety in theoretical predictions in neuroscience.

      The proposed theory can be applied to interpret experimental observations - as demonstrated with the example of the star-nosed mole. The prediction of cortical resource allocation is a close match to experimental data.

      We thank the reviewer for the feedback. Indeed, demonstrating an ‘interesting’ effect in even such a simple model was one of the main aims.

      Weaknesses:

      The central weakness of this work is the strong assumptions, which are not clearly stated. As a result, the consequences of these assumptions are not discussed in sufficient depth, which may limit the generality of the proposed approach. In particular:

      1) The paper focuses on a setting with vanishing input noise, where the efficient coding strategy is to reduce the redundancy of the output (for example through decorrelation). This is fine, however, it is not a general efficient coding solution as indicated in the introduction - it is a specific scenario with concrete assumptions, which should be clearly discussed from the beginning.

      2) The model assumes that the goal of the system is to generate outputs whose covariance structure is an identity matrix (Eq. 1). This corresponds to three assumptions: a) variances of output neurons are equalized, b) the total amount of output variance is equal to M (i.e. the number of output neurons), c) the activity of output neurons is decorrelated. The paper focuses only on assumption c), and does not discuss the consequences or biological plausibility of assumptions a) and b).
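      The three assumptions listed in point 2 can be written out explicitly. The notation below is an assumption made for illustration (a linear encoder W acting on inputs x with covariance Σ_x, producing M outputs y); the paper's own symbols in Eq. 1 may differ.

```latex
% Assumed notation: y = W x, with \Sigma_x the input covariance and M output neurons.
\begin{align}
  \Sigma_y &= W \Sigma_x W^{\top} = I_M
  && \text{(identity output covariance)} \\
  (\Sigma_y)_{ii} &= 1 \quad \text{for all } i
  && \text{(a) equalized output variances} \\
  \operatorname{tr}(\Sigma_y) &= M
  && \text{(b) total output variance equals } M \\
  (\Sigma_y)_{ij} &= 0 \quad \text{for } i \neq j
  && \text{(c) decorrelated outputs}
\end{align}
```

      Written this way, (b) follows from (a), so the substantive assumptions beyond decorrelation are the equalization of variances and the choice of their common scale.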

      We have clarified the assumptions in the revised version. The original version did not distinguish clearly between assumptions that were necessary to allow study of the main effect, and assumptions that were included to present a full model but that could have been chosen otherwise without affecting the results.

      This has now been made much clearer. Regarding the noise issue (point 1), we have clarified the main strategy pursued by the model, namely decorrelation; we acknowledge other possible strategies; and we make clear whether and how noise could be incorporated into the model. Regarding the biological plausibility of our assumptions (point 2),

      Reviewer #2 (Public Review):

      The authors propose a new way of looking at the amount of cortical resources (neurons, synapses, and surface area) allocated to process information coming from multiple sensory areas. This is the first theoretical treatment of attempting to answer this question with the framework of efficient coding that states that information should be preserved as much as possible throughout the early sensory stages. This is especially important when there is an explicit bottleneck such that some information has to be discarded. In this current paper, the bottleneck is quantified as the number of dimensions in a continuous space. Using only the second-order statistics of the stimulus, and assuming only the second-order statistics carrying information, the authors use variance instead of Shannon's information. The result is a non-trivial analysis of ordering in the eigenvalues of the corresponding representations. Using clever mathematical approximations, the authors arrive at an analytical expression -- advantageous since numerical evaluation of this problem is tricky due to the long thin tails of the eigenvalues of the chosen covariance function (common in decaying translation-invariant covariances). By changing the relative stimulus power (activity ratio), receptor density (effectively the width of the covariance function), and the truncation of dimensions (bottleneck width), they show that the cortical allocation ratio, surprisingly, is a non-trivial function of such variables. There are a number of weaknesses in this approach, however, it produced valuable insights that have a potential to start a new field of studying such resource allocation problems all across different sensory systems in different animals.

      Strengths

      • A new application of the efficient coding framework to a neural resource allocation problem given a common bottleneck for multiple independent input regions. It's an innovation (initial results presented at NeurIPS 2019) that brings normative theory with qualitative predictions that may shed new light on seemingly disproportionate cortical allocations. This problem did not have a normative treatment prior to this paper.

      • New insights into allocation of encoding resources as a function of bottleneck, stimulus distribution, and receptor density. The cortical allocation ratios have nontrivial relations that were not shown before.

      • An analytical method for approximating ordered eigenvalues for a specific stimulus distribution.

      Weaknesses

      The analysis is limited to noiseless systems. This may be a good approximation in the high signal-to-noise ratio regime. However, since the analysis of allocation ratio is very sensitive to the tail of eigenvalue distribution (and their relative rank order), not all conclusions from the current analysis may be robust. Supplemental figure S5 perhaps paints a better picture since it defines the bottleneck as a function of total variance explained instead of number of dimensions. The non-monotonic nonlinear effects are indeed mostly in the last 10% or so of the total variance.

      We agree that the model is most likely to apply in the low-noise regime, as stated in the Discussion. The robustness of the results is indeed a concern: we encountered some difficulties when calculating model results numerically due to the issue pointed out by the reviewer, and this led us to focus on an analytical approach in the first place. However, to test model robustness we have now included numerical results for several other covariance functions to demonstrate that, at least qualitatively, the results presented in the paper are not simply a consequence of the particular correlation structure we investigated.

      In the case where the stimulus distribution is Gaussian, the proposed covariance implies that the stimulus distribution is limited to spatial Gaussian processes with Ornstein-Uhlenbeck prior with two parameters: (inverse) length-scale and variance. While this special case allowed the authors to approach the problem analytically, it is not a widely used natural stimuli distribution as far as I know. This assumed covariance in the stimulus space is quite rough, i.e., each realization of the stimulus is spatially continuous but isn't differentiable. In terms of texture, this corresponds to rough surfaces. Of course, if the stimulus distribution is not Gaussian, this may not be the case. However, the authors only described the distribution in terms of the covariance function, and the manuscript lacks additional detail to fill in this gap.

      We would argue that somewhat ‘rough’ covariance structure might be relatively common, for example in vision objects have clear borders leading to a power law relation and similarly in touch objects are either in contact with the skin or they are not. In either case, we have now extended the analysis to test several other covariance functions numerically. We found that, qualitatively, the main effects described in the paper were still present, though they could differ quantitatively. Interestingly, the convergence limit appeared to depend on the roughness/smoothness of the covariance function, indicating that this might be an important factor.

      The neural response model is unrealistic: Neuronal responses are assumed to be continuous with arbitrary variance. Since the signal is carried by the variance in this manuscript, the resource allocation counts the linear dimensions that this arbitrary variance can be encoded in. Suppose there are 100 neurons that encode a single external variable, for example, a uniform pressure plate stimulus that matches the full range of each sensory receptor. For these stimulus statistics, the variance of all neurons can be combined into a single cortical neuron with 100 times the variance of a single receptor neuron. In this contrived example, the problem is that the cortical neuron can't physiologically have 100 times the variance of the sensory neuron. This study lacks the power constraint that most efficient coding frameworks have (e.g. Atick & Redlich 1990).

      We agree that the response model, as presented, is very simplistic. However, the model can easily be extended to include a variety of constraints, including power constraints, without affecting the results at all. Unfortunately, we did not make this clear enough in the original version. The underlying reason is that decorrelation does not uniquely specify a linear transform and the remaining degrees of freedom can be used to enforce other constraints. As the allocation depends only on the decorrelation process (via PCA), we do not explicitly calculate receptive fields in the paper and any additional constraints (power, sparsity) would affect the receptive fields only and so were left out in the original specification. We have now added clearer pointers for how these could be included and why their inclusion would not affect the present results.
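      The point that decorrelation leaves degrees of freedom open can be illustrated with a minimal sketch. It assumes a small, hypothetical exponentially decaying covariance (six receptors, arbitrary decay rate) and NumPy; it is not the authors' code. Any orthogonal rotation of a PCA-whitening transform also decorrelates the inputs, so the rotation is free to be chosen by additional constraints such as power or sparsity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical receptor covariance: exponential decay with distance (illustrative values).
positions = np.arange(6)
cov = np.exp(-0.5 * np.abs(positions[:, None] - positions[None, :]))

# PCA whitening: W = diag(eigenvalues^-1/2) @ eigenvectors.T
eigvals, eigvecs = np.linalg.eigh(cov)
W = np.diag(eigvals ** -0.5) @ eigvecs.T

# Any orthogonal matrix Q yields another decorrelating transform Q @ W.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
W_rotated = Q @ W

def output_cov(transform):
    """Covariance of the transformed signal."""
    return transform @ cov @ transform.T

is_decorrelated_pca = np.allclose(output_cov(W), np.eye(6))
is_decorrelated_rot = np.allclose(output_cov(W_rotated), np.eye(6))
print(is_decorrelated_pca, is_decorrelated_rot)
```

      Both transforms yield an identity output covariance while having different rows (receptive fields), which is consistent with the argument that constraints acting only on the receptive fields need not change the eigenvalue-based allocation.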

      The star-nosed mole shows that the usage statistics (translated to activity ratio) better explains the cortical allocation than the receptor density. However, the evidence presented for the full model being better than either factor is weak.

      We agree that the results do not present definitive evidence that the model directly accounts for cortical allocations and as we state in the paper, much stronger tests would be needed. Our idea here was to test whether, in principle, the model predictions are compatible with empirical evidence and therefore whether such models could become plausible candidates for explaining neural resource allocation problems. This seems to be the case, even though the evidence in favour of the ‘full model’ versus the ‘activity only’ model is indeed not overwhelming (though this might be expected as the regional differences in activity levels are much greater than those in density). We have now added additional tests to show that the results are not trivial. We would also like to note that it is not obvious that the ‘full’ model would perform better than the ‘activity only’ model: for either we choose the best-fitting bottleneck width (as the true bottleneck width is unknown), and therefore the degrees of freedom are equal (with both activity levels and densities fixed by empirical data).

      Reviewer #3 (Public Review):

      This work follows on a large body of work on efficient coding in sensory processing, but adds a novel angle: How do non-uniform receptor densities and non-uniform stimulus statistics affect the optimal sensory representation?

      The authors start with the motivating example of fingers and tactile receptors, which is well chosen, as it is not overstudied in the efficient coding literature. However, the connection between their model and the example seems to break down after a few lines when the authors state that they treat individual regions as independent, and set the covariance terms to zero. For fingers, for example, that would seem highly implausible, because we typically grasp objects with more than one finger, so that they will be frequently coactivated.

      Our aim was to take a first stab at a model that could theoretically account for neural resource allocation under changes in receptor density and activity levels, and by necessity this initial model is rather simple. Choosing a monotonically decreasing covariance function along with some other simplifications allowed us to quantify the most basic effects, and do so analytically. Any future work should take more complex scenarios into account. Regarding the sense of touch, we agree that the correlational structure of the receptor inputs will be more complex than assumed here; however, whether and how this would affect the results is less clear: across all tactile experiences (not just grasps, but also single-finger activities like typing), cross-finger correlations might not be large compared to intra-finger ones. Unfortunately, there is currently relatively little empirical data on this. That said, we agree with the broader point that complex correlational structure can be found in sensory systems and would need to be taken into account when efficiently representing this information.

      The bottleneck model posited by the authors requires global connectivity as they implement the bottleneck simply by limiting the number of eigenvectors that are used. Thus, in their model, every receptor potentially needs to be connected with every bottleneck neuron. One could also imagine more localized connectivity schemes that would seem more physiologically plausible given the observed connectivity patterns between receptors and relay neurons (e.g. in LGN in the visual system). It would be very interesting to know how this affects the predictions of the theory.

      We agree that the model in its current form is not biologically plausible. While individual receptive fields can be extremely localised, the initial allocation of neurons to regions we describe in the paper relies on a global PCA, and it is not clear how this might be arrived at in practice under biological constraints. However, our aim here was to specify a normative model that generates the optimal allocation and thereby answer what the brain should be doing under ideal circumstances. Future work should definitely ask whether and how these allocations might be worked out in practice and how biological constraints would affect the solutions.
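      The global-PCA allocation discussed here can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: each region gets a hypothetical exponentially decaying covariance, activation is assumed to scale the variance, and all parameter values are invented. Because the regions are treated as independent, the joint covariance is block diagonal, so its eigenvalues are simply the union of the per-region eigenvalues; the bottleneck keeps the largest ones and the allocation is the count per region.

```python
import numpy as np

def region_eigenvalues(n_receptors, length, activation, decay=1.0):
    """Eigenvalues of an exponentially decaying covariance for one region."""
    positions = np.linspace(0.0, length, n_receptors)
    dist = np.abs(positions[:, None] - positions[None, :])
    cov = activation * np.exp(-decay * dist)  # assumption: activation scales variance
    return np.linalg.eigvalsh(cov)

def allocate(bottleneck, regions):
    """Count how many of the top `bottleneck` eigenvalues come from each region."""
    values, labels = [], []
    for idx, params in enumerate(regions):
        ev = region_eigenvalues(**params)
        values.append(ev)
        labels.append(np.full(ev.size, idx))
    values = np.concatenate(values)
    labels = np.concatenate(labels)
    top = np.argsort(values)[::-1][:bottleneck]  # indices of the largest eigenvalues
    return np.bincount(labels[top], minlength=len(regions))

# Two hypothetical regions: one dense with low activation, one sparse with high activation.
regions = [
    dict(n_receptors=40, length=1.0, activation=1.0),
    dict(n_receptors=20, length=1.0, activation=2.0),
]
narrow = allocate(10, regions)
full = allocate(60, regions)
print(narrow, full)
```

      With the bottleneck equal to the total number of receptors, the allocation reduces to the receptor counts; narrower bottlenecks can deviate from that ratio, which is the regime the paper analyses.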

      The representation of the results in the figures is very dense and due to the complex interplay between various factors not easy to digest. This paper would benefit tremendously from an interactive component, where parameters of the model can be changed, and the resulting surfaces and curves are updated.

      We have aimed to make the figures as clear as possible, but do appreciate that the results are relatively complex as they depend on multiple parameters. The code for re-creating the figures is available on Github (https://github.com/lauraredmondson/expansion_contraction_sensory_bottlenecks), making it easy to explore scenarios not described in the paper.

      For parts of the manuscript, not all conclusions made by the authors seem to follow directly from the figures. For example, the authors interpret Fig. 3 as showing that activation ratio determines more strongly whether a sensory representation expands or contracts than density ratio. This is true for small bottlenecks, but for relatively generous ones it seems the other way around. The authors' interpretation, however, fits better with the next paragraph, where they argue that the sensory resources should be relatively constant across the lifespan of an animal, and only stimulus statistics adapt. However, there are notable exceptions: in a drastic example, zebrafish change the sensory layout of their retina completely between larva and adult.

      We have amended the text for this section in the paper to more closely reflect the conclusions that can be drawn from the figure. These are summarised below.

      The purpose of Fig. 3B is to show that knowledge of the activation ratio provides more information about the possible regime of the bottleneck allocations. We cannot tell the magnitude of the expansion or contraction from this information alone, or where in the bottleneck the expansion or contraction would occur. Typically, when we know the activation ratio only, we can tell whether regions will be expanded or contracted or whether both occur over all bottleneck sizes. For a given activation ratio (for example, a = 1:2, as shown in Fig. 3B), we know that the lower activation region can be either contracted only or both expanded and contracted over the course of the bottleneck. In this case, regardless of the density ratio, the lower activation region cannot be contracted only. Conversely, for any density ratio (see dashed horizontal line in Fig. 3B), allocations can be in any regime.

      In the final part of the manuscript, the authors apply their framework to the star nosed mole model system, which has some interesting properties; in particular, relevant parameters seem to be known. Fitting to their interpretation of the modeling outcomes, they conclude that a model that only captures stimulus statistics suffices to model the observed cortical allocations. However, additional work is necessary to make this point convincingly.

      We have now included a further supplementary figure panel providing more details on the fitting procedure and results for each model. Given that we fit over a wide range of bottleneck sizes, where allocations for each ray can vary widely (see Figure 6, supplement 1A), we tested an additional model to confirm that the model requires accurate empirical density and/or activation values for each ray to provide a good fit to cortical data. Here we randomise the values for the density and activation of each ray within the possible range of values for each. We find that with this randomisation of the values the model performs poorly on fitting even with a range of bottleneck sizes. This suggests that the model can only be fitted to the empirical cortical data when using the empirically measured values.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a new analytical method (rhapsodi) to impute genotypes on human gamete data. The authors characterize the specificity and sensitivity of the approach and benchmark it against the current tool to analyze gamete data. rhapsodi is more efficient and versatile than the current approach, and thus represents an important technical feat. The last analysis of the manuscript is a reanalysis of the SpermSeq dataset, a massive sequencing effort to characterize recombination in human sperm haplotype data. rhapsodi fails to find any deviations from random segregation and challenges the notion that there are distorters in the human genome. In general, the manuscript represents an important technical piece but the results could be better contextualized to provide a perspective of what are the implications of the findings for our understanding of human recombination and segregation distortion.

      Thank you for appreciating the technical importance of our work for improving the analysis of transmission distortion (TD) based on low-coverage single-cell sequencing data from gametes. We agree that the results (in regard to the method performance, statistical power, and implications for human TD) should be better contextualized, which we address in a point-by-point manner below.

      Reviewer #2 (Public Review):

      This paper describes a new and powerful method of inferring gametic haplotypes using low-coverage sperm sequencing data, rhapsodi. It is a highly useful tool, and the authors demonstrate its robustness using simulations and comparisons to the current gold standard, Hapi. The authors also use the results of rhapsodi on a sample of low-coverage human sperm sequencing data to assess the evidence for moderate transmission distortion (TD), a pattern that previous studies using pedigrees have sought to identify without replicable success. The work's main strength lies in the method the authors have developed and their clear and thorough description and validation of its use. The rhapsodi method clearly performs substantially better than Hapi in several relevant use cases, and in some instances it is usable when Hapi would fail to run or require unreasonable resources. This study, then, provides a highly useful tool to researchers wishing to phase donor haplotypes, infer gamete genotypes, and estimate rough locations of recombination breakpoints using Sperm-seq data.

      Thank you for engaging with our method and for noting its use cases and performance.

      A major limitation is the lack of consideration of strong TD. Under this scenario, there may be "allelic dropout" in the low-coverage Sperm-seq data; without information on the parental genotype from somatic cells, over-transmission of one allele would appear to be absence of the alternate allele (i.e., the donor would be erroneously inferred to be homozygous). Some known examples of TD in other species are extremely strong; e.g., the SD locus in Drosophila can cause distortion as strong as k=0.99. Such cases seem highly likely to be missed using Sperm-seq + rhapsodi, and a lack of power to detect them would influence both ability to observe individual cases of TD as well as the authors' test for a global signal of biased transmission. Since the provided simulations only include scenarios up to 70% transmission of one allele, the paper does not address this potential limitation.

      The authors claim that their work conclusively excludes the presence of ongoing TD in their sample of human males, which, if they are from the same populations as former studies, may provide additional evidence against ongoing TD in these human populations. However, whereas earlier studies were only highly powered for extremely strong TD, the current method appears to be highest powered for intermediate levels of TD, strong enough to generate differences from binomial expectations, but not so strong that one allele might be missing in the low coverage pool of sperm serving as input to rhapsodi. This claim, then, may be better framed as a lack of evidence for TD of intermediate strength in current samples, rather than the strict adherence to Mendelian transmission indicated in the title.

      This is an interesting and important point, and we agree that extreme TD would produce apparent tracts of homozygosity across the sample of sperm genomes. Without external knowledge of heterozygous sites in the donor genome, such SNPs would be unobserved within the sperm sequencing data. To address this possibility, we performed additional simulations of very strong TD (transmission rate, k = 0.99; Figure 4-figure supplement 3; lines 416-434; lines 1062-1083). These simulations demonstrate that despite the homozygosity of the causal SNP, recombination in flanking regions recovers heterozygosity but still manifests extreme and detectable TD. Specifically, across 2,200 simulations (100 independent simulations x 22 chromosomes; k = 0.99) with parameters matching a typical Sperm-seq donor, we identified the TD signature in all 2,200 cases (Power = 1) despite homozygosity (and thus filtering) of the causal SNP in 89% of cases (1958 / 2200). This high power also holds for donor samples with higher (Power = 1) and lower (Power = 1) coverages, respectively.

      In summary, even though it is the case that the causal SNP and nearby flanking SNPs “drop out” of the data, recombination occurs as one extends out from these regions in both directions, and very strong signals (well beyond genome-wide significance thresholds) are detectable within these heterozygous regions. While we cannot attribute the signal to the true causal SNP, this limitation is not unique to our study, but is a general limitation of any study design (including pedigree and pooled sequencing studies) that must contend with linkage disequilibrium.

      Nevertheless, as highlighted by Reviewer 3, the use of the term “strict” in the title may be too subjective. TD of 5% or less could be considered strong from a population genetic perspective, but undetectable based on binomial variance and our stringent multiple testing corrections. We have therefore removed the word “strict” from the title and moderated the adjectives we use when describing the strength of detectable TD throughout the paper. We also enumerate various forms of TD that would be undetectable based on our study design in the Discussion (lines 581-586; lines 603-638).

      Reviewer #3 (Public Review):

      The authors reanalyze an existing dataset of single-cell Sperm-seq data to search for signals of transmission distortion. They develop an improved genotype imputation method and use this approach to phase donors and characterize the landscape of ancestry across each sperm genome. Using these data, the authors determined that there are no regions in any of the male donors' genomes that display a significant excess of TD. The main biological claim of the paper is that there is a strict adherence to Mendelian transmission ratios in human males.

      The computational approaches for accurately phasing and reconstructing haplotypes in individually lightly sequenced gametes are a potentially useful advance that I expect may be valuable for geneticists analyzing similar datasets. The quality of software documentation and usability is high. I have concerns about the appropriateness of the comparisons selected for this approach, and the algorithm does not appear particularly novel.

      I have no doubt about the authors' basic conclusion that there are no strong male TD loci in the male donors examined. However, I find their statements about "strict adherence to Mendelian ratios" and many references to strong statistical power to be oversold. The power of this study is still quite limited relative to the strength of TD that we would expect to find in human populations.

      Thank you for your comments and for engaging with our manuscript so closely. We agree that additional discussion of statistical power, the strength of TD that can be detected, and the uses of our software are necessary, and these changes have substantially strengthened our revised manuscript.

      Major Concerns:

      There are really two distinct papers here. One is about improved imputation and crossover analysis from sperm-seq data and one is about TD. The bulk of the methodological development is a rework of the approach for genotype imputation and haplotype phasing in Sperm-seq. Yet, the major conclusions are focused on a scan for TD. I am left wondering if analyzing these data using the original method in the Bell et al paper would have produced different conclusions about either? If not, is there a systematic bias such that one would find an excess of false detections of TD? Phasing slightly more markers is not a particularly compelling link between these sections because even fairly sparsely distributed markers that are correctly phased would certainly be fine in a scan for TD within a single individual due to linkage. If this cannot be shown I wonder if this work would be better split into two manuscripts with one more technical paper describing the differences in recombination maps associated with rhapsodi and the other as a brief report stating that strong TD is probably uncommon in human males.

      While we agree that there are two important aspects of our study, we feel that the combination of a generalizable method as well as an application to test an important biological hypothesis is a strength of our work.

      For additional context, Dr. Bell is a co-author on our study and collaborated with us in part based on the motivation to build a reproducible software toolkit for similar analyses. Bell et al. (2020) did not implement their method as generalizable software, but rather as a set of analysis scripts tested only with their data and computing environment. Unlike our method (rhapsodi) and the comparison approach (Hapi), those scripts were not written as user-friendly software and are therefore less likely to be used by the research community.

      It is not surprising that rhapsodi outperforms Hapi since Hapi was designed for a very different quantity of samples and sequencing depths. I appreciate the authors' point that Hapi performed better than other methods in comparisons run by the Hapi authors. However, they were looking at very few gametes (10 or so, I believe). For that reason, this comparison is not appropriate to address the application to the datasets used in this paper. The authors should include an analysis comparing rhapsodi against hapcut2, PHMM and other methods that are appropriate for the full scale and sequencing depth of the data. Additionally, the original Bell paper used a phasing + HMM approach of some kind for exactly this data. Why wasn't that approach considered as a point of comparison?

      While your point is well taken, we do not believe that a direct comparison between rhapsodi and PHMM would provide additional insight. In the publication describing PHMM (Hou et al. 2013), their algorithm was designed for datasets containing lower numbers of cells (11-41) sequenced to higher coverage per cell (0.4-0.9) relative to the data analyzed by rhapsodi. PHMM is therefore, like Hapi, optimized for a more narrow range of parameters than rhapsodi. Across this range of parameters, Hapi uniformly performs better than PHMM. Other tools such as hapcut2 may be designed to work with lower coverages and higher cell numbers than PHMM and Hapi, but are designed for use exclusively with diploid genomes. rhapsodi is therefore the first haploid phasing tool that can work with large numbers of low-coverage cells and there is no existing software that operates in the same niche. While the parameter spaces of Hapi and rhapsodi only partially overlap, Hapi therefore remains the most appropriate point of comparison.

      In addition to the point about analysis scripts versus a generalizable software package, we note two major differences between the steps employed in Bell et al. 2020 and rhapsodi’s method:

      1) For phasing, Bell et al. (2020) used Hapcut2 in an “off-label” way that required artificial assignment of alleles from the same sperm cell to the same “read” for input. This approach ignores the positional information that was already encoded in the alignment and may not take full advantage of the co-inheritance patterns of the SNP alleles. The phasing method implemented in rhapsodi is a principled approach tailored to the structure of the input data and knowledge of the biological process of meiosis.

      2) For crossover discovery, Bell et al. (2020) handled genotype error by encoding an “error” state in the HMM. In our method, we assign gamete-level genotypes via HMM-based imputation prior to detecting recombination breakpoints. We believe dealing with the error prior to crossover discovery is a simpler approach that better leverages the strengths of HMMs.

      With respect to the method for imputation, no comparison is made to known recombination maps nor do the authors make any comparison across the maps derived from each donor. Reporting an improved method without it motivating novel biological conclusions is not compelling in itself. I suggest the authors expand that analysis to consider these and related questions. E.g., are there males whose recombination maps differ in specific regions? Are those associated with known major chromosomal abnormalities? Is this map consistent with estimates from LD, pedigrees, Bell et al?

      We agree that evaluating the inferred crossover landscape in relation to published maps would be useful as a technical evaluation of our method, though we respectfully disagree with the suggestion to expand the scope of the manuscript to the analysis of inter-individual variability in the crossover landscape—topics that were the main focus of Bell et al. (2020). The distinction between our work and that study was addressed in our responses to previous comments.

      To address the suggestion to compare to existing maps, we counted the number of inferred recombination events for each 1 Mbp genomic bin, pooling across the donors. We compared this result with a published male-specific recombination map inferred from trio sequencing data (Halldorsson et al. 2019) and observed a strong correlation with our map (R = 0.9; Figure 5-figure supplement 5). We have incorporated this in “Results: Application to data from human sperm” (lines 372-377; lines 385-391) and note the potential biological and technical reasons for the observed discrepancies (lines 391-399). One such technical reason for the observed modest discrepancy appears to be related to the sample sequencing depth of coverage. Rather than pooling the number of inferred recombination events for each bin across all donors, we repeated the correlation analysis in a donor specific manner. Then, we fit a linear regression model with the sample-specific sequencing depths of coverage as the predictor and the sample-specific correlations as the response variable. We found that the sample-specific correlation with the deCODE map was positively associated with depth of coverage (lines 391-399).
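      The binning-and-correlation step described above can be sketched as follows. This is a minimal illustration with invented crossover positions and invented reference rates, not the authors' pipeline; the function names are hypothetical and only the 1 Mbp bin size comes from the text.

```python
import math

BIN_SIZE = 1_000_000  # 1 Mbp bins, as in the comparison described above

def bin_counts(positions_bp, chrom_length_bp, bin_size=BIN_SIZE):
    """Count crossover positions falling in each fixed-width genomic bin."""
    n_bins = math.ceil(chrom_length_bp / bin_size)
    counts = [0] * n_bins
    for pos in positions_bp:
        counts[min(pos // bin_size, n_bins - 1)] += 1
    return counts

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical crossover midpoints pooled across donors on a 4 Mbp toy chromosome.
crossovers = [150_000, 900_000, 1_200_000, 3_400_000, 3_900_000, 3_950_000]
inferred = bin_counts(crossovers, chrom_length_bp=4_000_000)

# Hypothetical per-bin rates from a reference map (made-up numbers).
reference = [1.8, 1.1, 0.2, 2.9]
r = pearson_r(inferred, reference)
print(inferred, round(r, 3))
```

      Repeating the same calculation per donor, rather than on pooled counts, gives the donor-specific correlations that the response then regresses on sequencing depth.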

      Most of the validations presented are based on simulated data. This is fine and has some advantages, but real data imposes challenges that these analyses do not address. My understanding is that the Bell et al. (2020) paper includes a donor with a phased diploid genome. A comparison of rhapsodi's phasing accuracy against that genome should be included.

      Bell et al. 2020 included only sperm donors with previously unknown genomes, and phased their genomes via the sperm sequencing data. They validated their phasing approach in two ways: 1) via simulated data and 2) via comparing to the phase generated by Eagle (Loh et al 2016, Nat Gen) for one donor genome, specifically comparing the phase of neighboring sites phased with both approaches. Importantly, such population-based approaches achieve only local phasing of common variation, as opposed to the chromosome-scale phasing achieved via gamete sequencing. Nevertheless, we acknowledge that real data exhibits features that are not captured by simulated data. We tried to capture the most significant potential contributors from real data (e.g., genotyping errors) in our simulations. Our newly added comparisons to the Halldorsson et al. (2019) map help address this concern (Figure 5-figure supplement 5).

      The main biological conclusion about a "strict adherence to Mendelian expectations across sperm genomes" is an overstatement. Statistical power of this study is still limited relative to the strength of TD that would be expected within human populations. One reason is the multiple testing correction. Another is that 1000-3000 draws from a binomial distribution with expected p = 0.5 is just not sufficient to overcome binomial sampling variance. In light of this concern and the central conclusion of this paper, the authors' discussion of power is inadequate. The main text really should contain explicit discussion of the required genotype ratio skew for TD in each donor to be detected with good power. Given previous pedigree studies, it is not surprising that no significant TD was discovered that exceeded the necessary ~10% effect sizes to be detectable. Recent, much more powerful analyses in mice, Drosophila and plants, indicate that strong TD is probably uncommon and even weak effects can be detected but are uncommon.

      Thank you for these detailed suggestions regarding statistical power. Our manuscript is greatly improved by these updates to the power analysis and our comparison to alternative methods for investigating TD.

      Specifically, we added additional simulations of TD at different rates (including very strong TD, as also noted in response to Reviewer 1) to demonstrate the range in which our study would be able to detect TD in this sample, considering the burden of multiple testing (Figure 4-figure supplement 3).

      We added to the section titled “Results: Statistical power to detect moderate and strong TD” a statement about the strength of TD that would be detectable within the Sperm-seq dataset (lines 400-415). Briefly, the 25 donors have an average of 1711 gametes each (range 969-3377). Based on this sample size, we have Power = 0.681 to detect deviations of 0.07 (i.e., 57% transmission of one allele in a single donor) and Power = 0.912 to detect deviations of 0.08, accounting for multiple hypothesis testing across the genome and across donors (p-value threshold = 1.78 × 10⁻⁷). For an individual with 950 gametes, we have Power = 0.637 to detect deviations of 0.09 and Power = 0.84 to detect deviations of 0.1.
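      A power calculation of this kind can be approximated with a short standard-library sketch. It assumes an exact two-sided binomial test with a symmetric rejection region, which is an assumption made here for illustration; the authors' procedure may differ, so the output is not expected to reproduce the reported values exactly. The sample size and threshold are taken from the text above.

```python
import math

def binom_pmf(n, p):
    """Return [P(X = x) for x in 0..n] with X ~ Binomial(n, p), for 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    return [
        math.exp(log_n_fact - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                 + x * log_p + (n - x) * log_q)
        for x in range(n + 1)
    ]

def td_power(n_gametes, transmission_rate, alpha):
    """Power of a symmetric two-sided exact binomial test of p = 0.5."""
    null = binom_pmf(n_gametes, 0.5)
    alt = binom_pmf(n_gametes, transmission_rate)
    size = power = 0.0
    # Add outcome pairs from most to least extreme until the null mass would exceed alpha.
    for lo in range(n_gametes // 2 + 1):
        group = {lo, n_gametes - lo}
        group_null = sum(null[x] for x in group)
        if size + group_null > alpha:
            break
        size += group_null
        power += sum(alt[x] for x in group)
    return power

ALPHA = 1.78e-7   # threshold quoted in the response
N_GAMETES = 1711  # average number of gametes per donor quoted in the response
power_057 = td_power(N_GAMETES, 0.57, ALPHA)
power_058 = td_power(N_GAMETES, 0.58, ALPHA)
print(power_057, power_058)
```

      Under these assumptions the power rises steeply between transmission rates of 0.57 and 0.58, which matches the qualitative picture in the response: with a genome-wide threshold, deviations somewhat below 0.07 are largely undetectable at this sample size.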

      Based on these calculations, we agree that the term “strict” is subjective and may be considered an over-statement depending on the point of comparison, and we have modified the title accordingly.

      This manuscript would benefit from a much clearer examination of statistical power and a detailed comparison of the power of this approach vs pedigree-based analyses as well as bulk gamete sequencing approaches. Although the authors are correct that all scans for TD in human genomes have been pedigree or single-cell based, more powerful alternatives are known. These are based on sequencing pools of individuals or gametes (e.g., Wei et al. 2017, Corbett-Detig et al. 2019). Each of those studies has been able to identify signatures of segregation distortion below the thresholds required for significance in this study. These and related works should be acknowledged in both the introduction and discussion. Although I appreciate that the ability to phase the genome in a single experiment may be appealing, phasing diploid genomes via Hi-C or Omni-C is straightforward and the advantages in statistical power suggest that approaches using pools of gametes are preferable for well-powered scans for TD.

      Thank you for your suggestions regarding contextualizing the statistical power of single-gamete sequencing-based approaches. Our steps to address these comments have strengthened our manuscript and made the paper more applicable to future research.

      The single-cell nature of the low-coverage (~0.01x) Sperm-seq data allowed us to augment our sample size 100-fold at each SNP in a way that is not possible with a pooled sequencing approach. Pooled sequencing methods may augment statistical power for detecting TD by 1) combining information from nearby SNPs and 2) assuming different sperm are sampled at each site. This approach has relied on external knowledge of haplotypes (e.g., obtained through sequencing of inbred strains of Drosophila). This permits aggregation of alleles supporting one haplotype or the other across adjacent SNPs, which can increase statistical power. The same statistical test for TD cannot be applied to bulk sequencing data from human sperm (e.g., Bruess et al. 2019, Yang et al. 2021) without external knowledge of the parental haplotypes. One potential approach for circumventing this issue would be local phasing using patterns of LD from a reference panel, but this would limit the analysis to common SNPs within relatively small windows that can be adequately phased with such methods.

      It is not immediately obvious that pooled sequencing studies have greater power for discovering TD than single-cell studies. None of the pooled sequencing studies mentioned by the reviewer performed similarly exhaustive power analyses, and the power analyses that were performed in pooled sequencing studies were done in systems with different levels of heterozygosity, different genome sizes, different sample sizes of donor individuals, etc. All of these factors affect the multiple testing burden, making it impossible to compare directly to a study in humans. Given the above considerations, we believe that an in-depth analysis of the statistical power of pooled sequencing approaches for discovering TD in humans lies outside the scope of our study.

      We have nevertheless updated our manuscript to discuss the strengths of pooled sequencing methods as an approach for investigating TD, citing relevant studies in both the Introduction (lines 37-46) and Discussion (lines 508-529; lines 557-580). We acknowledge that these methods have been successfully applied in other species (e.g., Wei et al. 2017, Corbett-Detig et al. 2019) and note their potential to improve statistical power. We also note the steps that would be necessary for making these methods applicable for TD scans in humans as new datasets are produced.

      We added a general power analysis of pedigree studies (Figure 4-figure supplement 4A) to illustrate the large sample sizes necessary to detect weak TD. To demonstrate the large sample size required for a pedigree study to achieve strong statistical power, we plot the number of informative transmissions of each SNP in the two pedigrees from Meyer et al. 2012 for which data was publicly accessible (Figure 4-figure supplement 4B).
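
      To illustrate why weak TD demands large sample sizes, the following is a minimal sketch of a power calculation for a two-sided exact binomial test against the Mendelian expectation of 0.5. It is not the analysis behind the figure supplement: the function names, the choice of test, and the significance level `alpha` are assumptions made for illustration only.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space so large n does not overflow.
    Requires 0 < p < 1."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def critical_count(n, alpha):
    """Smallest c with P(X >= c | p = 0.5) <= alpha / 2.
    Returns n + 1 when no count is extreme enough to reject."""
    tail, c = 0.0, n + 1
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, 0.5)
        if tail > alpha / 2:
            break
        c = k
    return c

def power(n, rate, alpha):
    """Probability of rejecting the null (transmission rate 0.5) when the true
    transmission rate is `rate`, given n informative transmissions."""
    c = critical_count(n, alpha)
    upper = sum(binom_pmf(k, n, rate) for k in range(c, n + 1))
    lower = sum(binom_pmf(k, n, rate) for k in range(0, n - c + 1))
    return upper + lower

# Hypothetical scenario: a distorter transmitted 55% of the time, tested at alpha = 0.05.
for n in (100, 1000, 10000):
    print(n, round(power(n, 0.55, 0.05), 3))
```

      A genome-wide scan would use a far stricter `alpha` than the 0.05 used here, which pushes the required number of informative transmissions higher still.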

      Importantly, in a single-gamete sequencing study, the number of informative transmissions is equal to the number of genotyped gametes for all heterozygous SNPs. In a pedigree-based study, the number of informative transmissions varies across SNPs, as not all parent-offspring trios will include one or more parent heterozygous for a given SNP. For example, the Hardy-Weinberg expected proportion of heterozygous parents for a common SNP with an allele frequency of 0.5 is 2pq = 0.5. Meanwhile, variants at lower frequencies will possess smaller proportions of heterozygotes, thus capturing fewer informative transmissions and limiting statistical power. One implication of this distinction is that pedigree-based studies rely on distorter alleles that act across multiple families, effectively restricting such scans to variants that are common in the population. This contrasts with single-gamete sequencing studies, which provide equal power for detecting TD involving common and rare alleles, provided that they are heterozygous in the sampled donor individual. We note this in the Discussion (lines 508-529).
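
      The Hardy-Weinberg arithmetic above can be sketched as follows. This is an illustrative simplification with hypothetical function names: it counts every transmission from a heterozygous parent as informative and ignores trios in which the transmitted allele is ambiguous, so the pedigree figure should be read as an upper bound.

```python
def het_fraction(p):
    """Hardy-Weinberg expected fraction of heterozygotes, 2pq, for allele frequency p."""
    return 2 * p * (1 - p)

def pedigree_informative(n_trios, p):
    """Expected number of transmissions from heterozygous parents across n_trios
    parent-offspring trios (two parents per trio). Simplifying assumption:
    every such transmission is informative."""
    return 2 * n_trios * het_fraction(p)

def gamete_informative(n_gametes, donor_is_het):
    """In a single-gamete study, every genotyped gamete is informative at a SNP
    that is heterozygous in the donor, regardless of population allele frequency."""
    return n_gametes if donor_is_het else 0

# Hypothetical comparison: 1000 trios versus 1000 gametes from one heterozygous donor.
for p in (0.5, 0.1, 0.01):
    print(p, het_fraction(p), pedigree_informative(1000, p), gamete_informative(1000, True))
```

      The pedigree count shrinks with allele frequency while the single-gamete count does not, which is the distinction drawn in the paragraph above.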

      As noted by the reviewer, single-cell sequencing allows both phasing and examination of TD in a single study, allowing the investigation of meiotic recombination and its potential relationship with TD and fertility profiles. We have added text in the Conclusion (lines 659-693) to address this important point. Because of this study design, we are uniquely positioned to detect TD caused by any rare alleles we do capture; this contrasts with pedigree-based studies, where a distorter would need to be acting across multiple families to be detectable (thus restricting these scans to common variants). We have noted this in the Discussion (lines 521-529).

    1. Author Response

      Reviewer #1 (Public Review):

      McLachlan and colleagues find surprisingly widespread transcriptional changes occurring in C. elegans neurons when worms are prevented from smelling food for 3 hours. Focusing most of the paper on the transcription of a single olfactory receptor, the authors demonstrate many molecular pathways across a variety of neurons that can cause many-fold changes in this receptor. There is some evidence that the levels of this single receptor can adjust behavior. I believe that the wealth of mostly very convincing data in this paper will be of interest to researchers who think about sensory habituation, but I think the authors' framing of the paper in terms of hunger is misleading.

      There is a lot to like about this paper, but I just cannot get over how off the framing is. Unless I am severely misunderstanding, the paper is about sensory habituation, but the word habituation is not used in the paper. Instead, we hear very often about hunger (6x), state (92x), and sensorimotor things (23x). This makes little sense to me. The worms are "fasted" (111x) for 3 hours, but most of the expression changes are reversed if the worms can smell, but not eat, the food. And I've heard about the fasted state, noting that worms don't eat more food after this type of "fasting". So what is with all of this hunger/state discussion?

      We think that the most straightforward interpretation of our data is that both sensory experience and internal nutritional state modulate str-44 expression. However, we agree that in the previous manuscript draft there was a disproportionate emphasis on state (as compared to sensory experience). The revised manuscript corrects this. However, several results in the manuscript do suggest that state is important, so we have not removed this from the manuscript. The lines of evidence that suggest this are:

      (1) Animals exposed to inedible aztreonam-treated food show an increase in str-44 expression compared to animals exposed to untreated, ingestible food. Thus, food ingestion acts to suppress str-44 expression (Figure 1E).

      (2) Animals exposed to food odor in the absence of food show an intermediate level of str-44 expression between “on bacteria” and “off bacteria” controls (Figure 1E). This incomplete suppression suggests that food odors alone cannot explain the suppression of str-44 expression in well-fed animals.

      (3) Animals that lack intestinal rict-1, a component of the TORC2 nutrient-sensing complex, show an increase in str-44 expression, which suggests that nutrient sensing in the intestine impacts str-44 expression (Figure 5).

      (4) When animals are off food, osmotic stress inhibits the upregulation of str-44 (Figure 1G), reduces the enhanced behavioral sensitivity to butyl acetate (Figure 2G), and reduces the enhanced AWA activity in response to food (Figure 3). This physiological stressor provides a competing state that also impacts str-44 expression.

      We apologize for not adequately describing how three hours of fasting impacts C. elegans behavior in the initial submission. This is obviously a key piece of information and we have corrected this in the revised manuscript. [lines 68-70; 123-126] Regarding pharyngeal pumping rates, C. elegans typically exhibits pharyngeal pumping at a near-maximal rate on the OP50 laboratory diet even when well-fed. Consequently, even much longer starvation times will fail to induce more feeding under these conditions. However, many other feeding-related behaviors do change with three hours of fasting, such as velocity on and off food, turning rates, roaming/dwelling behavior on OP50 food, and sensitivity to odorants. Thus, three hours of fasting is sufficient to impact several food search behaviors.

      To more directly address whether sensory habituation in AWA alters str-44 expression, we performed an additional experiment. We exposed wild-type animals to the str-44 odorants butyl acetate or propyl acetate and measured str-44 expression. If habituation explains this effect (e.g. repeated exposure of an odorant reduces transcription/translation of the receptor), we would expect that exposure to these odorants would reduce str-44 expression in “off bacteria” animals. However, we observed no differences between odor-exposed animals and controls. [Figure 4-figure supplement 2B; lines 414-421]

      And the discussion of internal states is often naïve. In the second paragraph of the introduction, we are told that "Recent work has identified specific cell populations that can induce internal states", beginning with AgRP neurons, which have been known to control the hunger state in mammals for nearly 40 years (Clark J. T., Kalra P. S., Crowley W. R., Kalra S. P. (1984). Neuropeptide Y and human pancreatic polypeptide stimulate feeding behavior in rats. Endocrinology 115 427-429. Hahn T. M., Breininger J. F., Baskin D. G., Schwartz M. W. (1998). Coexpression of Agrp and NPY in fasting-activated hypothalamic neurons. Nat. Neurosci. 1 271-272). Instead, the authors cite three papers from 2015, whose major contribution was to show that AgRP activity surprisingly decreases when animals encounter food. These papers absolutely did not identify AgRP neurons as inducing internal states or driving behavioral changes typical of hunger (Aponte, Y., Atasoy, D., and Sternson, S. M. (2011). AGRP neurons are sufficient to orchestrate feeding behavior rapidly and without training. Nat. Neurosci. 14, 351-355. doi: 10.1038/nn.2739; Krashes, M. J., Koda, S., Ye, C., Rogan, S. C., Adams, A. C., Cusher, D. S., et al. (2011). Rapid, reversible activation of AgRP neurons drives feeding behavior in mice. J. Clin. Invest. 121, 1424-1428. doi: 10.1172/jci46229). Nor did Will Allen's work in Karl Deisseroth's lab discover neurons that drive thirst behaviors.

      We agree that this introductory paragraph did not do justice to the literature and improperly cited only relatively recent work. We have addressed this oversight. [lines 48-53]

      Later in the same paragraph, we hear that: "However, animals can exhibit more than one state at a time, like hunger, stress, or aggression. Therefore, the sensorimotor pathways that implement specific motivated behaviors, such as approach or avoidance of a sensory cue, must integrate information about multiple states to adaptively control behavior." This is undoubtedly true, but it's not clear what it has to do with any of the data in this paper - I don't even think this is really about hunger, much less the interaction between hunger and other drives.

      To summarize: I think the authors could give the writing of the paper a serious rethink. I want to stay far away from telling people how to write their papers, so if the authors insist on framing this obviously sensory paper as being about hunger and sensorimotor circuitry I think they should at least explain to their readers why they are doing that in light of the evidence against it (and I think they should state clearly that worms don't actually eat more in this fasted state).

      Please see the comments above that address these concerns.

      I was also surprised by how unsurprised the authors seemed by the incredibly widespread changes they observed after 3 hours away from food. Over 1400 genes change at least 4-fold? That seems like a lot to me. But the authors, maybe for narrative reasons, only comment on how many of them are GPCRs (16.5%, which isn't that much of an overrepresentation compared to 8.5% in the whole genome). For me, these widespread and strong changes are much of the takeaway from this paper. But it does make you wonder how important the activity of one particular GPCR (selected more or less randomly) could be to the changes the worm undergoes when it can't smell food.

      We agree with the reviewer that given the widespread gene expression changes in fasted animals, the changes in AWA are only a small part of the picture. We have added a discussion of this to the revised manuscript. In addition, we provide some discussion of how our gene expression profiling results relate to others in the field. For example, animals that lack the fasting-responsive transcription factor DAF-16 have been shown to have >3,000 genes differentially expressed relative to controls (Kaletsky, Lakhina et al., 2016). Given the large number of genes changing in those data and in our data, it is possible that transcriptional changes are extremely widespread during fasting. [lines 588-593]

      str-44 is very convincingly upregulated when worms can't smell food, but it's clear from the data that this upregulation has very little to do with the actual lack of eating, and more with the lack of being able to sense bacteria for 3 hours. In Figure 1E, when worms are fasted, but in the presence of bacteria, receptor levels are largely unchanged (there are 5 outliers, out of ~50 samples). Since receptor expression doesn't change in this case even though the worms are in the fasted state, it cannot be "state-dependent" - unless the state is not having smelled food for the last 3 hours. And, in my opinion, that would divorce the word "state" from its ordinary meaning.

      We have more closely examined that dataset, but we don’t feel that it would be accurate to say that the aztreonam (inedible) condition matches the fed. The highest points in the aztreonam-treated condition are most visible on the plot, but the effect is driven by the bulk of the data. Even if we remove the top 5 datapoints from the aztreonam condition, the effect is still statistically significant. Moreover, we performed this experiment over multiple days and the effect was present on each day. However, the reviewer’s point is well taken that sensory experience is equally (if not more) important for str-44 regulation and the text of the initial manuscript did not properly reflect this. As described above, we have modified the revised manuscript so that it is more balanced.

      The authors argue that str-44 expression modulates food-seeking behavior in fasted worms by causing them to preferentially seek out butyl and propyl acetate. However, the behavioral data to back this up has me a little worried. For example, take Figures 2F and 2G. They are the exact same experiment: comparing how many worms choose 1:10,000 butyl acetate compared to ethanol when the worms are either fasted or fed. In the first experiment (2F), ~70% chose butyl acetate for fasted worms and ~60% for fed worms. But in the replicate, ~60% choose butyl acetate for fasted worms and ~50% for fed worms. A 10% variability in baseline behavior is fine (but not what I would call a huge state change), but when the difference between conditions is the same size as baseline variability I start to disbelieve. Can the authors explain this variability? Or am I misunderstanding?

      We and others often observe large variance in C. elegans chemotaxis behavior over time because of small changes in environmental variables such as temperature, humidity, and pressure, so it is standard to always run wild-type controls together with all experimental groups and compare within day. The experiment in Figure 2F was conducted before the others in Figure 2G and Figure 4F. However, we remain highly confident in this result – we observed a difference in fed vs starved every time that we ran this experiment, which (in sum total for wild-type) was on 6 different days, with at least 3 plates per day (40-200 worms per plate).

      And I'll say it just one last time, I think the authors are overselling their results...or at least the str-44 and AWA results (they are dramatically underselling the results that show the widespread changes in the expression level of 10% of the genome in response to not smelling food for 3 hours):

      "Our results reveal how diverse external and internal cues... converge at a single node in the C. elegans nervous system to allow for an adaptive sensorimotor response that reflects a complete integration of the animal's states."

      This implies that str-44 expression in AWA is the determinant of whether a worm will act fasted or fed. I have already expressed why I don't believe this is the case (inedible bacteria experiment, Figure 1E), but just because things like osmotic stress suppress the upregulation of str-44, that doesn't mean that it is the site of convergence. It could be any of the other 1400 genes that changed 4+ fold with bacterial deprivation. And even in terms of the actual AWA neuron, it was chosen because it showed modest upregulation of chemoreceptors (1.8 fold compared to ~1.5 fold in ASE and ASG), even though chemoreceptors were highly upregulated in other neurons as well.

      We agree that AWA chemoreceptors alone are unlikely to explain all of the behavioral changes observed in an animal that has been removed from food, and we certainly did not intend to imply that str-44 expression in AWA is the central determinant of whether the animal acts as though it is fasted or fed. Rather, we have shown that str-44 expression can explain some of these behavioral changes. We have added language throughout the manuscript to indicate that we expect other fasting-regulated genes to be of importance. See also: response to Essential Revision #1.

      Overall, and despite my critiques (and possibly tone), I really like this paper and think there really is a lot of interesting data in there.

    1. Author Response:

      Reviewer #1 (Public Review):

      Here, the authors used multiple F1 crosses and the resulting embryonic fibroblasts to perform molecular profiling with ATAC-seq and a combination of ChIP-seq, Hi-ChIP, and CUT&RUN on multiple modified histones and transcription factors proteins. The resulting data are a good resource for quantifying allelic bias in protein-DNA binding and chromatin accessibility.

      The authors claim there's "enrichment of SNPs/indels within a 150 bp window" in enhancers (Fig. 2H), but this enrichment looks quite middling. Can they quantify the level of enrichment and is it significant?

      We have added a quantification of the enrichment of SNPs in the allele-specific enhancers compared to shared enhancers (Lines 1382-1385). The average number of SNPs within the central 150 bp of enhancers is 4.468 for enhancers with allele-specific H3K27ac levels and 3.203 for enhancers with shared H3K27ac levels.

      For these shared enhancers, we subsampled the shared sites to generate a set with an identical distribution of H3K27ac levels to that observed on the active allele of the allele-specific set. This helps to control for potential differences in mappability of each allele given that the allele-specific set has more SNPs, on average, and SNPs are necessary to identify allele-specific reads (discussed in Lines 1261-1264).
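
      One way to build such a distribution-matched control set is to bin the signal levels and draw the same number of shared sites from each bin as the allele-specific set contains. The sketch below assumes this binning strategy; the response does not describe the actual procedure, and the function name, the `(site_id, level)` tuples, and the `n_bins` and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict

def matched_subsample(target_levels, pool, n_bins=10, seed=0):
    """Draw sites from `pool` so that their binned signal levels follow `target_levels`.

    target_levels: signal levels of the reference set (e.g. allele-specific sites).
    pool: list of (site_id, level) tuples to subsample from (e.g. shared sites).
    Returns a list of (site_id, level) tuples. If a bin of the pool holds fewer
    sites than the target requires, all of them are taken and the match is inexact.
    """
    lo, hi = min(target_levels), max(target_levels)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def bin_of(level):
        return min(int((level - lo) / width), n_bins - 1)

    # Number of reference sites in each bin.
    wanted = defaultdict(int)
    for level in target_levels:
        wanted[bin_of(level)] += 1

    # Pool sites grouped by bin, ignoring levels outside the reference range.
    available = defaultdict(list)
    for site_id, level in pool:
        if lo <= level <= hi:
            available[bin_of(level)].append((site_id, level))

    rng = random.Random(seed)
    picked = []
    for b, count in wanted.items():
        candidates = available[b]
        picked.extend(rng.sample(candidates, min(count, len(candidates))))
    return picked
```

      Matching on binned levels only approximates an identical distribution; finer bins tighten the match at the cost of more bins with too few candidates.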

      This enrichment is also clearly significant (p-value < 2.2 × 10^-16, Pearson’s Chi-squared test). We have added this information to the corresponding figure legend in the revised manuscript (Lines 1381-1382).

    1. Author Response

      Reviewer #3 (Public Review):

      This work addressed some of the limitations in the production of the CVS-N2c strain of the rabies virus. CVS-N2c exhibits lower cytotoxicity and more efficient transsynaptic spread than the more widely used SAD-B19 strain, but its use as a circuit tracing tool has been held back by its slow packaging process and low resultant titers. By demonstrating that rabies packaging cell lines do not affect retrograde labelling efficiency and by creating a pseudotyping cell line that can amplify pseudotyped virus from a small amount of starting material, Sumser et al. have achieved an improvement in the speed, titer, and native coat contamination of CVS-N2c preparations whilst generating a new set of viral vectors that will help to implement a range of circuit mapping tools.

      While many of the results from N2c evaluation experiments shown here (including bicistronic rabies usage, time courses of functional characterisation with GCaMP/channelrhodopsin, Cre-OFF labelling) have been previously demonstrated in other N2c and SAD-B19 rabies studies, the suite of vectors described in the manuscript will serve as a useful resource for the community. However, some key aspects of these vectors, specifically the propensity for the starter AAV for off-target labelling, are not characterized.

      We thank the reviewer for his / her positive comments (“improvement”, “useful resource”).

      1) The six DIO AAV vectors described here, and the Flp-dependent AAV-FRT-EF1a-TVA-2A-N2cG do not contain recombinase leak prevention mechanisms such as the "ATG-out" approach, where the initiating codon and Kozak sequence are moved outside of the recombinase recognition sites to reduce inverted ORF expression. Even with these measures in place, DIO constructs are prone to recombinase-independent reversion, proposed to occur during AAV production from spontaneous recombination (Fischer et al., 2019). This presents an issue for the sensitive TVA/EnvA system, where only a small amount of TVA expression can mediate off-target rabies infection in non-Cre expressing cells. The dilution of AAV vectors can have a strong effect on the amount of non-specific labelling (Lavin et al., 2020). As the bicistronic TVA-N2cG vectors used here do not allow for individual dilutions of TVA with respect to N2cG, which is required at higher expression levels for efficient transsynaptic spread, it is especially important to test these vectors for leak expression. A sensitive test for starter leak would be to inject the AAV and rabies virus in WT mice.

      We are aware of these potential issues with the AAV targeting system. We can confirm that in our hands, all vectors have been thoroughly vetted, and no leak has been found for any of them. For each new viral batch produced (37 in total) we have conducted experiments in which pseudotyped particles at working concentrations (1–5 × 10^8 TU/ml) were injected into naïve brains and have found only minimal non-specific labeling (less than 1–2 cells per 5–10 100-µm-thick sections examined). We have added a sentence to the Materials and Methods section (p. 5 of the revised manuscript).

      Furthermore, we have tested the effects of extremely high viral titers (100–500-fold higher than the titers used throughout the manuscript). We found that even with these extremely high virus concentrations, contamination was minimal. Furthermore, we found that the negligible contamination in these experiments likely arises from direct penetration of pseudotyped particles into damaged cells and cell processes along the needle tract. We have added a new Figure 1 – figure supplement 2 and added a sentence on p. 5 of the revised manuscript.

      We agree that the system might benefit from additional safety measures. However, these potential safety measures could also introduce new problems. In finding the right balance, one should take into account that such leak events are rare, and their random occurrence should make it possible to distinguish them from the main effects by replication.

      2) The manuscript reports the use of a bicistronic N-P system to express the optogenetic actuator ChIEF together with a fluorescence protein. While the results of the bicistronic experiments show that both proteins are successfully expressed, control experiments using other expression strategies would strengthen the claim that the bicistronic N-P system is superior.

      In our work, we compare the activation effectiveness of the optogenetic actuator to previously published results from two separate manuscripts (Reardon et al., 2016; Osakada et al., 2011), using comparable light intensity. Our results demonstrate that our vectors achieve higher AP success rates with substantially shorter light pulses. While we have not conducted a direct comparison, we think that the improvement presented in the bicistronic vectors is sufficiently substantiated, and the logic behind this improvement sufficiently sound. Since this is a side point in our manuscript but relevant information for the community, we believe that it is worth mentioning even without a direct comparison.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors used single cell RNA-seq to assess the heterogeneity of megakaryocytes, thereby identifying a distinct CXCR4 high subpopulation that was also enriched in inflammatory genes and other chemokines or cytokines. They sort CXCR4 high cells and are able to investigate specific functional properties of this megakaryocyte population. This work complements prior studies which have suggested immune modulatory roles for certain megakaryocyte subsets such as the work of Pariser and colleagues (JCI 2021) on the antigen presentation capacity of lung megakaryocytes or the work by Liu et al (Advanced Science 2021) on immune surveillance gene expression in megakaryocytes (MKs).

      The strengths of the paper are:

      1) Analysis of scRNA-seq to identify MK subsets with validation

      2) The use of sorted CXCR4 cells to interrogate the specific in vitro functions of this immune modulatory subset (using CXCR4 low MKs as a comparison) such as phagocytosis assays

      3) Elegant use of the PF4-Cre DTR model to ablate MKs while replenishing CXCR4 high cells as a means to assess functional effects of this subset in vivo which is a reasonable approach in the absence of a Cre that would specifically delete this subset.

      We appreciate the positive feedback from this reviewer.

      Potential weaknesses are:

      1) The unclear distinction between previously identified immune modulatory MK subsets such as the lung MKs which have antigen-processing capacity (Pariser et al) and the currently identified MK5 subset. The authors indicate that the MK5 subset has transcriptomic similarities to the previously described antigen-processing MK subset but this does not explain whether MK5 and/or CXCR4 high subset is indeed the primary. This is an important question because it would help address whether the immune modulatory roles are all concentrated in one MK subset or whether different MK subsets may play distinct roles in innate and adaptive immunity. For example, in Fig 3, there is a broad claim that MKs can modulate innate and adaptive immunity but it is not clear whether this claim is valid only for the specific MK5/CXCR4 subset.

      We totally agree with this argument. Our revised data showed that CXCR4high MKs, but not CXCR4low MKs, were able to phagocytose bacteria (Revised Fig 3F), process and present ovalbumin (OVA) antigens on their cell surface (Revised Fig 3G) to activate CD8+ OT-I T cells (Revised Fig 3H) and B3Z T cells (Revised Fig 3-S2), a T cell hybridoma which expresses TCR that specifically recognizes OVA. These revised data showed that CXCR4high MKs are an antigen processing and presentation subset in MKs.

      2) It would be helpful to understand whether the CXCR4 status of MKs can change over time. Are the CXCR4 high cells generated in infection (Fig 5) generated by the conversion of CXCR4 low cells (or non MK5 cells)? Or do CXCR4 high / MK5 cells differentiate from MK progenitors directly?

      Thanks for the suggested experiment. Our revised data showed that inflammatory treatment, including interferon γ, LPS, and L. monocytogenes could not increase CXCR4 expression in CXCR4low MKs (Revised Fig 4H and Fig 4-S4D). This experiment suggested that CXCR4high MKs might not be reprogramed from CXCR4low MKs. Furthermore, our HSPC tracing experiment showed that CXCR4high MKs were generated from HSPCs as efficiently as CXCR4low MKs during the acute inflammation-induced emergency megakaryopoiesis (Revised Fig 5E-G).

      Reviewer #2 (Public Review):

      Wang J. et al. examines bone marrow megakaryocyte (MK) heterogeneity, and the role that a specific subpopulation plays in the mouse immune response to Listeria monocytogenes infection. Using single cell RNA-sequencing (scRNAseq) the authors identified a bone marrow MK subpopulation, characterized by high CXCR4 expression. This subset referred to as MK-derived immune-stimulating cell (MDIC) population has immune-stimulatory properties and supports the migration and activation of innate immune cells potentially via TNFα and IL-6 secretion.

      In agreement with recent studies mapping in situ myelopoiesis which occurs near bone marrow sinusoidal vessels upon acute inflammatory stress with L. monocytogenes (Zhang J. et al Nature 2021), the authors observed a significant association of myeloid cells with perivascular CXCR4high MK but not with the more abundant CXCR4low MK subset. This study also revealed that MK in vivo deletion leads to a significant increase in the bacterial load in extramedullary hematopoietic organs accompanied by a reduction in the number of myeloid cells, although it is unclear if a similar MDIC population exists outside the bone marrow. Accordingly, it is unclear the effect of MK depletion in the context of L. monocytogenes infection in bone marrow myelopoiesis.

      Notably, in a rescue experiment, MDIC infusion was able to partially rescue the bacterial clearance defect in MK depleted and infected mice, further confirming the important role of MDICs in regulating bacterial immune responses.

      Using Pf4-cre reporter mice the authors further evaluated the capacity of bone marrow MDIC to enter circulation and migrate into organs upon bacterial infection potentially in response to an increase in CXCL12 expression in extramedullary organs. Finally, in agreement with recent studies (Haas S. et al Cell Stem Cell 2015), Wang et al. discovered that upon inflammatory stress, emergency hematopoietic stem cell-derived megakaryopoiesis is activated to restore platelets lost upon inflammation-induced thrombocytopenia but also to regulate immune response to bacterial infection.

      Overall, this study builds on recently published work regarding MK heterogeneity which technically is very challenging to investigate. Although it's suggested that MDIC greatly overlap with the recently described CD53+LSP1+ MK immune population (Sun S. et al Blood 2021), the extent to which these subsets overlap is still unclear. Likewise, the relationship between bone marrow MDIC and previously described lung MK subsets, thought to be enriched in immune function, remains unclear. Nevertheless, the authors performed a detailed characterization of bone marrow MDIC in homeostasis and in acute inflammatory stress, providing new evidence and mechanistic clues on the mechanisms by which MK subsets regulate immune function to bacterial infection.

      While this manuscript has many strengths, some of the author's conclusions and claims require further technical support and discussion. In particular:

      1) The potential mechanism via TNFα and IL-6 secretion is very interesting, however further data is necessary to support the author's claim. First, it's unclear if steady-state MDIC MK express TNFα and IL-6. If so, does this expression change upon infection?

      MDIC MKs (now referred to as CXCR4high MKs) expressed TNFα and IL-6 during the steady state, and maintained their expression levels upon L. monocytogenes infection (Revised Fig 2J).

      Second, mechanistically it would be important to evaluate or at least discuss how MDIC sense bacterial infection and respond by secreting TNFα and IL-6.

      Thanks for the suggestion. In this revision, we have included a brief discussion of previous studies reporting that MKs express multiple inflammation receptors, which enable MKs to sense inflammation signals and express cytokines, as follows: “MKs were reported to express multiple inflammation receptors, such as Fcγ receptors (Markovic et al., Br J Haematol 1995), Toll-like receptors (Beaulieu et al., Blood 2011; Ward et al., Thromb Haemost 2005), interleukin receptors (Navarro et al., J Thromb Haemost 1991; Yang et al., Br J Haematol 2000), and IFN receptors (Negrotto et al., J Thromb Haemost 2011), which might enable MKs to receive inflammation signals and express cytokines.” (Line 15-19, Page 13).

      Third, in Fig 2L and 2M it's missing a control for the effect of anti-TNFα and anti-IL-6 on phagocytes activity in the absence of MKs.

      Thanks for the suggested control. In this revision, we have confirmed the phagocytosis activity of immune cells by flow cytometry assays as suggested by this reviewer, in which we included the anti-TNFα and anti-IL-6 controls in the absence of MKs (Revised Fig 2M, N). Our revised data consistently showed that CXCR4high MKs enhanced the phagocytosis activity of neutrophils and macrophages through a TNFα and IL-6 dependent manner.

      Fourth, in Fig 2J and 2K it's unusual to evaluate TNFα and IL-6 levels by imaging.

      We agree with the argument. In this revision, we have further evaluated the expression of TNFα and IL-6 by flow cytometry, which consistently showed that CXCR4high MKs had higher expression levels of TNFα and IL-6 than CXCR4low MKs (Revised Fig 2J).

      2) The authors further explored the potential role of MKs in regulating adaptive immune function against bacterial infection, however these studies were very superficial and further studies are needed to substantiate this claim.

      We totally agree with this argument. In this revision, we have deleted the claim that MKs regulate adaptive immune function. Furthermore, our revised data showed that CXCR4high MKs were able to phagocytose bacteria (Revised Fig 3F), and to process and present ovalbumin (OVA) antigens on their cell surface (Revised Fig 3G) to activate CD8+ OT-I T cells (Revised Fig 3H) and B3Z T cells (Revised Fig 3-S2), a T cell hybridoma that expresses a TCR specifically recognizing OVA. These revised data suggested that CXCR4high MKs had antigen processing and antigen presentation capacity, and thus might contribute to the regulation of adaptive immune function. We have included a brief discussion (Line 2-5, Page 14).

      3) Overall, the study relies heavily on subjective imaging quantification. The identification of CXCR4high and low MK subsets does not seem entirely objective and it is prone to inaccuracies due to the technical difficulty of bone imaging. The usage of other surface marker(s) for the MDIC subset would significantly improve the study. Accordingly, many of the experiments should be accompanied and/or replaced by flow cytometry analyses such as the phagocytosis experiments in Fig 2; quantification of MKs in Fig 4 H, I and N.

      We totally agree with this argument, and we have discussed that additional markers are warranted to further enrich CXCR4high MKs (Line 5-9 Page 14). Furthermore, we have further confirmed our imaging quantifications by flow cytometry, such as the bacterial phagocytosis ability of immune cells and CXCR4high MKs (Revised Fig 2M, N, Fig 2-S2A, B and Fig 3F) and the number of Tomato+ CXCR4high MKs in the liver, spleen, and lung (Revised Fig 4I, O and Fig 4-S4I).

      4) Regarding MK-deletion experiments, studies from the Passegue lab have shown that this will cause persistent bone marrow myeloid granulocyte/macrophage progenitor (GMP) formation during 5FU stress, most likely due to the reduction in the levels of PF4 and TGFb1 and the effect on hematopoietic stem cells. What happens to bone marrow myelopoiesis upon MK-deletion and bacterial infection? The authors describe a significant reduction in the liver and spleen but it's unclear the effect on the bone marrow. It would be helpful to discuss this point.

      Our revised results showed MK ablation increased the number of hematopoietic stem and progenitor cells and myelopoiesis in the bone marrow upon infection (Revised Fig 3-S1A-D). However, myeloid cells were reduced in the liver and spleen after MK ablation and bacterial infection (Revised Fig 3D-E). This further suggested the important role of CXCR4high MKs in promoting the migration and function of myeloid cells. We have included a brief discussion on this point (Line 10-14, Page 14).

      Reviewer #3 (Public Review):

      Overall this is an interesting study that adds significant knowledge to our understanding and characterization of Mks as immune cells. The identification of CXCR4hi Mks as immune regulatory cells is potentially important, particularly in the bacteria model used in this study.

      We appreciate the positive feedback of this reviewer.

      At this stage, the authors have however made a number of conclusions not yet supported by the data. In particularly differentiating the role of Mks versus the platelets they produce is not clear, so many conclusions about MDIC in immune responses need to be better supported and differentiated from platelet functions.

      We agree with this argument. We cannot exclude the role of platelets in immune responses. Our revised data showed that CXCR4high MKs produced fewer platelets (Revised Fig 1-S6D) but had more robust abilities in phagocytosis and antigen processing and presentation (Revised Fig 3F-H and Fig 3-S2), and stimulating innate immune cells by secreting cytokines (Revised Fig 2E-N and Fig 2-S2) than CXCR4low MKs. Furthermore, infusion with CXCR4high MKs, but not CXCR4low MKs, partially rescued the host-defense responses in MK ablated mice, which further supported the role of CXCR4high MKs in immune responses. However, the infusion rescue experiment with CXCR4high MKs did not fully rescue the host-defense responses in MK ablated mice (Revised Fig 3K-L). This is partially due to the reduced platelets in MK ablated mice as platelets are known for immune responses. We have discussed this possibility in the current version (Line 16-17, Page 9).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript addresses the role of the p75NTR neurotrophin receptor in the development of cerebellar granule precursor cells (CGPs). This cell type is notable for having high levels of p75NTR expression in a discrete developmental window yet the specific role of the receptor in this setting has remained obscure.

      The authors show that although p75NTR expression correlates with the CGP proliferative state, expression of p75NTR is not required to maintain the proliferative state. Rather, migration CGPs in culture and within cerebellar slices is optimal only when p75NTR levels are reduced and the authors conclude that the expression of p75NTR normally reduces CGP migration. They examine signalling mechanisms that lie downstream of p75NTR to elicit this effect and show that RhoA, previously shown to be activated by p75NTR, is required to block CGP migration, that RhoA activity is lower in p75NTR-/- CGPs than in wild-type counterparts, and that RhoA inhibitors enable CGP migration, even in cells overexpressing p75NTR.

      This is an important study that uses a combination of descriptive methods and chemical and genetic gain- and loss- function approaches to demonstrate that a p75NTR-RhoA signaling pathway normally functions to limit CGP migration during development. The paper is logical and well written and the data presentation is excellent.

      Some points to consider:

      Figure 2A introduces the CGP cultures and shows that p75NTR levels are high in cells exposed to SHH. However, these results are difficult to interpret in the absence of controls showing p75NTR levels at the time of plating - does the SHH exposure increase p75NTR expression? Or prevent its decrease?

      We agree with the question raised by the reviewer, and we have done this experiment. In these results, we observed an increase in p75NTR expression after 24 h or 48 h of Shh exposure compared with the levels of the receptor at the time of plating. However, there are a few caveats to the interpretation of these results that make it difficult to establish whether Shh increases the receptor or prevents its decrease:

      1. When quantifying the levels of p75NTR in vivo, we obtained a granule cell population that includes both proliferating and differentiated granule cells; this mixed population is present at the time of plating. When a primary culture is established, a large percentage of the cells do not survive, and the majority of the dying cells would be differentiated cells, which introduces a bias toward proliferating cells at 24 and 48 h in vitro. The proportion of proliferating to differentiated cells would therefore differ between the in vivo sample and the cultures after 24 or 48 h.

      2. The concentration of the mitogen most likely would be very different between the cells in vivo (the time of plating) and the exposure to Shh in vitro, introducing a second bias. It might be that the increase in p75NTR is a consequence of more cells proliferating since they respond to higher concentration of the mitogen.

      3. We know Shh induces proliferation in CGN, and this is accompanied by an increase in p75NTR. Therefore, the increase of p75NTR might be due to more proliferating cells, but not necessarily an induction of the expression of the receptor.

      I recognize the convenience of using the p75NTR-GFP construct to track migration but was surprised that the potential confounds of this approach were not examined or even mentioned. Does p75NTR-GFP activate RhoA more or less than the wild-type receptor? What experiments have been performed to ensure that this construct is an effective mimic of the wild-type receptor? Would it be possible to co-transfect p75NTR and GFP as an alternative approach?

      The p75NTR/GFP construct has been used in the field for an extensive period and the biology of the construct has been well characterized (i.e. cellular localization and translocation, activation and signaling). Co-transfecting the slices with p75NTR and GFP on separate constructs doesn’t necessarily mean that each cell receives both constructs, a problem that does not arise with a fusion protein. Although we cannot completely rule out the possibility of subtle intracellular differences between the endogenous p75NTR and the construct, we are confident that this result is supported by the other experiments in the manuscript (the conditional deletion of p75, results with the p75NTR -/-, and the use of inhibitors). A paper showing that the p75-GFP construct is correctly sorted: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2139957/

      The authors discuss previous findings that indicate that p75NTR can play a pro-migratory role but oddly do not place their results in other contexts where p75NTR has been shown to block migration. CGPs have been quite widely used to dissect the role of p75NTR in the Rho-dependent migration blockade induced by MAG and other myelin components and interesting insights on receptor components (e.g. LINGO1) and signalling mechanisms (e.g. RhoA) that mediate these effects. The results reported here should be discussed in the context of these previous findings.

      We agree with the reviewer and we will discuss our findings in this context.

    1. Author Response

      Reviewer #1 (Public Review):

      “My main quibble is with the framing. There are many places throughout the manuscript where the authors claim that there is a great deal of controversy about the extent of the branching of these neurons.”

      We agree with our reviewer that some of our statements were misleading. Thus, we have rephrased the Introduction (Pages 3-5; Lines 54-99), Results (Page 20; Lines 356-363) and Discussion (Pages 26-32; Lines 472-478, 492-498, 525, 530-532, 551-578, 586, 596-602) sections to focus on the controversial issues of the simultaneous projection to NAc and VTA by the same prefrontal cortical population.

      Reviewer #2 (Public Review):

      “The scope of this work is somewhat smaller than the recent reconstruction and molecular subtyping of ~6300 neurons performed by others: a comprehensive paper on this very topic ("Single-neuron projectome of mouse prefrontal cortex", Nat Neurosci 25:515”

      The Gao et al. (2022) study published during our reviewing process, indeed, confirms some of our findings. A common finding of ours and Gao et al (2022) is that mPFCNAc and mPFCVTA neurons form distinct classes within the mPFC projecting neuronal population. In addition, there is a small PT-like mPFC population, located rather in the L5b (showing RBP4 expression in our study), which sends branching axons to both NAc and VTA.

      Still, we think that the novelty of our work has remained significant.

      1) We have provided easy-to-use, widely available molecular approaches to investigate mPFC territories and laminar organization. Using the detailed expression pattern of neurochemical markers revealed via multiple immunohistochemical techniques and confocal microscopy, we were able to delineate borders between mPFC regions and layers with considerably high precision. Brain atlases are mostly used for this purpose. However, in a cortical region like the mPFC, territory borders and shapes, as well as laminar thickness and depth, change greatly along the antero-posterior and dorso-ventral axes. Therefore, markers that are stable from experiment to experiment are necessary to identify the exact location of neurons, recording sites, optic fibre positions, etc., and we believe the present study provides them.

      2) Using the presented direct molecular composition, we have identified genetic markers for selective examination of layer- (and, to some extent, region-) specific mPFCNAc and mPFCVTA populations.

      3) We have also provided evidence for the utility of this characterization using Cre mouse lines. The use of Calb1-, Rbp4-, Ntsr1- and FoxP2-Cre strains, which are widely used in cortical studies, in an intersectional approach allows scientists to selectively investigate each of these mPFC populations, even in a target-selective manner.

      Major points:

      ”…But what I think is lacking here is a characterization in a region-by-region manner of the laminar organization of the cell types you either identify by retrograde label (CAV-Cre anatomy, for example) or by molecular approaches (how the lamination of Ntsr1+ neurons vary between the areas you lump together here as PFC).”

      ”…I think this subdivision might help by defining these areas in stereotaxic coordinates and giving some idea of how defined cell types (defined by Cre driver or retrograde label or other marker) might vary in their laminar distribution across these areas. Maybe I am wrong, but my perception of Fig 1 and 2 is mainly that the laminar pattern of cortical labeling from VTA and NAc varies somewhat depending on where you assess it in cortex...”

      We have plotted the location of the retrogradely labeled mPFCNAc and mPFCVTA cells which clearly shows their characteristic laminar and regional distributions. These panels are added to Figure 1-2.

      The lamination of NTSR1 neurons is remarkable, indeed. As can also be seen on the GENSAT website (http://www.gensat.org/imagenavigator.jsp?imageID=48699), L6 of PrL, IL and the more ventral regions lacks NTSR1-expressing cells, while L6 of the cingulate and motor cortices contains a moderate density of this cell type.

      In our experiment, in which we aimed to target PrL-IL-MO (as the majority of the retrogradely labeled mPFCNAc and mPFCVTA cells were located there), we could not detect any viral labeling in any of the layers ventral to PrL cortex (Figure 3). The labeling in the cingulate cortex is missing, probably due to the lack of AAV diffusion into its deep layer, L6. However, as can be seen in Figure 3–figure supplement 2H, NTSR1-expressing L6 neurons in M1 are also present in our trials. Altogether, this means that, among the cortical regions that contain mesolimbic-projecting neurons, NTSR1 is present only in L5a of PrL cortex. A shift in the distribution of Ntsr1-expressing cells is present between PrL and Cg cortices.

    1. Author Response

      Reviewer #1 (Public Review):

      Marchetti and colleagues present an ex vivo culture method that enables live imaging studies of Drosophila adult midguts for periods up to 3 days. Important technical innovations include defining an optimized tissue culture media, placing midguts at an air-liquid interface for better oxygenation of the tissue, performing inducible gene expression while imaging, and performing multiplexed imaging of up to 12 midguts in a single culture. Using this ex vivo method, the authors show that midgut progenitor cells proliferate and differentiate in response to epithelial damage ex vivo. The authors exogenously activate the cell cycle, which enables them to trace and analyze the division behaviors of multiple generations of stem cell progeny, including the distances between sibling cell nuclei over time. Finally, proof-of-principle imaging of adult renal tubules and egg chambers is provided.

      Altogether, this ex vivo method offers important new advantages that will attract broad interest in the Drosophila community. My main concerns are (1) midgut viability when imaging is performed with a confocal microscope and (2) uncertainty about the authors' classification of asymmetric and symmetric fate outcomes.

      Major comments:

      1) The ability to perform continuous imaging over periods as long as 72 hours is a significant achievement. As the authors point out, this time scale is several fold longer than the 16 hour viability window of in vivo imaging (Martin 2018). The multiday timescale is important because it could, in principle, enable live imaging of many fundamental, but slow, cellular behaviors such as enterocyte differentiation.

      If I understand correctly, the multi-day experiments in the study were captured using imaging conditions that were particularly gentle: (1) widefield epifluorescence, (2) an interval of 1 hour between time points, and (3) at most two fluorescent channels. By comparison, the 16 hour viability window defined in Martin 2018 was based on confocal imaging at 7-15 minute intervals in 3-4 channels. Can the authors provide information on midgut viability when their culture method is combined with confocal microscopy at minutes-long time intervals? This information is important to help users assess the types of questions that ex vivo culture can address given the widespread use of confocal imaging and the necessity of subcellular spatial or better temporal resolution for certain types of questions.

      Viability during prolonged imaging is greatly dependent on the conditions used. If acquisition parameters are carefully set, intestines can be imaged using a confocal microscope at 15 minute intervals for 48 hours. A movie has been added to demonstrate this (Video 23), and it has been discussed on lines 951-953. We also tested our live-imaging protocol on a Zeiss Lattice Lightsheet 7 microscope and observed no visible signs of phototoxicity after a 48 hour session with two channels (nlsGFP and His2Av.mRFP) and 60µm z-stacks captured at 15 minute intervals. With regard to phototoxicity, it is hard to compare the different microscopes we tested quantitatively. As phototoxicity depends on the amount of light shone on a sample, we would need to carefully quantify the amounts of light delivered to the samples to determine the optimal frame rates and durations of different imaging methods, and we haven’t done this. Of course, since widefield, confocal, and lightsheet microscopes image samples in very different ways, comparing imaging quality is also a challenge. Therefore, all we can say with confidence is that, in our experience, if imaging parameters are set to minimize light exposure, all three microscope types are viable for long-term live-imaging. We have added some comments about microscopes to the Results section, lines 139-145, which we think will be helpful to users.

      2) The real-time tracing of multiple stem cell lineages through up to three generations is an impressive first for the midgut. The lineage trees are fascinating to examine. However I am unsure that division fate outcomes can be classified as symmetric or asymmetric using the data that are shown.

      2a) Some divisions (9 of 25) were classified as asymmetric because exactly one sibling cell divided at a time point >6 hours before the end of the movie (lines 271-277). In my view, these fate outcomes are ambiguous because it cannot be excluded that the other sibling is a stem cell that would have divided after the movie ended. Although 6/7 sibling pairs that the authors observed exhibited temporally correlated divisions, failure to observe temporally correlated divisions is not a basis for concluding that sibling fates are asymmetric.

      2b) Other divisions (8 of 25) were classified as asymmetric based on both the criteria in 2a and the observation that the non-dividing sibling showed increased nuclear size and decreased GFP intensity (lines 277-279). I agree with these criteria, but to my eye, the images in Fig 6 and Video 16 do not clearly show these changes. The nuclear size of the non-dividing sibling in panel G is not significantly different from the (presumably 4N) nuclei of the symmetrically dividing siblings in panel E. The GFP signal of the non-dividing sibling diminishes at the end of the movie, but without the His2Av::mRFP channel, I cannot tell whether the cell has lost GFP or, alternatively, has disappeared from view.

      2c) The midguts lack markers to distinguish enteroblasts, enteroendocrine precursor cells, and stem cells. Without these, several types of fate outcomes are indiscernible: Asymmetric divisions that produce a stem cell and an enteroendocrine precursor (which remains diploid and can divide again), symmetric divisions (of enteroendocrine precursors) that produce two enteroendocrine cells (c.f. Chen 2018), and symmetric divisions (of stem cells) that produce two enteroblasts (c.f. de Navascues 2010, Guisoni 2017). Additionally, Kohlmaier 2015 has shown that stg/cycE manipulation results in divisions of SuH+ enteroblasts; these enteroblast divisions cannot be distinguished from stem cell divisions in the movies.

      Can the authors either provide additional data that resolves the ambiguity of these fate classifications, or, alternatively, revise the text to describe these data in terms of division timing, displacement, and the other cell behaviors that are observed? In the latter case, speculation about fate outcomes could be added to the discussion.

      As we were unable to collect additional data that could distinguish cell fates in our live lineage analyses, we have revised the text to describe division events based on the daughter cells’ actual behavior rather than their presumed cell identities and fates. We now refer to sister cells in lineages as either “co-dividing” or “non-co-dividing”, and define these terms (lines 277-283 and 301-307).

      Reviewer #2 (Public Review):

      This work provides a new protocol for extended culture of Drosophila midguts ex vivo and live-imaging for up to three days. This paper reveals a significant improvement of explanted intestines survival compared to previous protocols by optimizing the dissection procedure; by modifying the culture medium so that it approximates the adult hemolymph and by fine-tuning the live-imaging setup. In addition, this new protocol allows temperature-sensitive gene expression or knock-down ex vivo. By successfully performing intestinal stem cell lineage tracing experiments and cell tracking over time, authors demonstrate the potential and the robustness of this system in understanding key intestinal processes such as stem cell proliferation and cell differentiation over time. Interestingly, preliminary results demonstrate the possible use of this protocol for extended culture of other organs and its implication in other areas of research.

      The relevance of the new protocol proposed by the authors in the improvement of the extended culture and live-imaging of intestines is well supported by the data. However, additional key control experiments would be needed to increase the confidence in using this protocol for the study and understanding of key intestinal processes.

      1) Authors tested the viability of explanted intestines over time by assessing different aspects such as the cell death or by testing the ability of cells to proliferate and differentiate. Adding control experiments to assess the state of the trachea and the visceral muscles, two major components of intestinal processes, would be needed.

      We have now added considerations about trachea (lines 96 and 112) and visceral muscle (lines 984-996) to the main text.

      2) Authors tested GFP expression at permissive temperature in explanted intestines using intestinal stem cells or enterocytes GAL4 driver and detected no differences compared to in vivo condition.

      Did the authors quantify the number of intestinal stem cells at different time points in explanted intestines? Did they see a difference compared to in vivo conditions?

      Same questions for other cell types?

      This analysis could be a good control to further validate the use of this system for the study and understanding of key intestinal processes.

      Since in undamaged intestines we did not observe cell death, the number of cells of any type remains constant in our cultured intestines. Therefore, cell composition is the same before (i.e. in vivo) and after (i.e. ex vivo) dissection. When using the esgTS>UAS-GFP line, we did not observe any difference in GFP+ cell numbers between intestines shifted to the permissive temperature in vivo or ex vivo.

      3) Authors tested the ability of explanted intestines to regenerate following intestinal damage induced by SDS feeding. SDS feeding results in stem cell proliferation and progenitors differentiation in explanted intestines. Adding control experiments comparing stem cell proliferation and cell differentiation upon control feeding or upon SDS treatment in explanted intestines versus in in vivo conditions would reinforce the use of this system.

      A new figure (Figure 3 – figure supplement 1) has been added to provide an in vivo SDS treatment comparison to our ex vivo observations. Results are discussed in the main text at lines 210-220.

    1. Author Response

      Reviewer #2 (Public Review):

      Ivica et al., conducted a series of electrophysiological and cryo-electron microscopy studies to investigate what differentiates partial agonist versus full agonist effects at the glycine receptor, a member of the cys-loop receptor superfamily. To this end, they used aminomethanesulfonic acid (AMS), a novel partial agonist that possesses efficacy intermediate between the high efficacy agonist glycine and the partial agonists beta-alanine and taurine. AMS was shown to possess a maximal channel open probability of 0.85, compared to 0.96 for glycine and only 0.12 for taurine. Cryo-EM structures of glycine receptors that had bound glycine, AMS, or taurine differed, with only glycine and AMS yielding a compact conformation that differed from that seen after taurine binding. This is thought to be partially responsible for the different efficacies of these ligands. This study was performed meticulously, with compelling evidence provided supporting the authors' primary hypothesis.

      The authors should consider defining what they mean by glycine being a "full agonist". In previous publications, they have argued that, since efficacy is a ratio of rates of transitions among different states of receptors, what anyone currently defines as a full agonist is in reality just the highest efficacy ligand discovered to date. There isn't any problem with the use of the term "full agonist" per se, since it is a concise way of comparing the high efficacy of glycine versus other ligands at the GlyR, but the reader would be served by having this clarified.

      This is correct. We have changed “full agonist” into “highly efficacious” wherever possible (all tracked) and added in the introduction that we shall refer to glycine as a full agonist (Line 54). In the interest of conciseness, we have tried to keep this as light-handed as possible. A complete explanation of what a full agonist is would have to include various caveats such as the fact that it depends on the receptor isoform and that glycine itself becomes partial in loss-of-function hyperekplexia mutants.
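
      To illustrate the point that efficacy is a ratio of transition rates rather than a categorical property, the maximum open probabilities quoted in the reviewer's summary (0.96 for glycine, 0.85 for AMS, 0.12 for taurine) can be converted into an effective efficacy. The sketch below assumes a deliberately simplified two-state shut/open gating step, for which the maximum open probability is E / (1 + E). This simplification, and the function name, are assumptions made for illustration only; they are not the kinetic scheme fitted in the manuscript, which may include intermediate states.

```python
def efficacy_from_max_popen(p_open_max: float) -> float:
    """Effective efficacy E under an assumed two-state shut<->open step.

    For such a step, max P_open = E / (1 + E), so E = P / (1 - P).
    This is an illustrative simplification, not the authors' fitted model.
    """
    if not 0.0 < p_open_max < 1.0:
        raise ValueError("maximum open probability must lie strictly between 0 and 1")
    return p_open_max / (1.0 - p_open_max)


# Maximum open probabilities as quoted in the reviewer's summary.
MAX_POPEN = {"glycine": 0.96, "AMS": 0.85, "taurine": 0.12}

if __name__ == "__main__":
    for agonist, p_max in MAX_POPEN.items():
        print(f"{agonist}: effective E ~ {efficacy_from_max_popen(p_max):.2f}")
```

      Under this assumption the three agonists fall at roughly 24, 5.7 and 0.14 on a single continuous scale, so "full agonist" simply labels the highest value measured so far.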

      Is there a qualitative rather than just a quantitative difference between high and low efficacy agonists at glycine receptors, in that only low efficacy compounds can interact with the loop B serine 174 residue and only high efficacy ligands yield compact binding pockets? In other words, should ligand efficacy be considered a continuum at the GlyR, or should it be considered more quantal in nature, with different agents occupying discrete categories? Explicitly addressing this issue would likely be of interest to the reader.

      The reviewer raises a good point. In this receptor, differences in the size of the binding pockets with agonists of different efficacy were relatively small (see our reply to point 1 of reviewer #1). The numbers have been added to the text on lines 239 and following.

      We did ask ourselves whether the interactions of the partial agonists with loop B could also make parts of the receptor more rigid, and thus reduce efficacy in a quantal fashion, but this is highly speculative at the moment. Answering this question will require the availability of more structures showing the receptor bound to other agonists, and possibly MD.

    1. Author Response

      Reviewer #1 (Public Review):

      a) A "hidden gem" in the work is an exploration of whether lamotrigine directly enhances HCN function and finding it did not. While an important negative result, this was not demonstrated in native tissue, leaving the question open regarding direct effects on the native channel in neurons.

      The point is well taken, and we have added this caveat in the relevant section (page 17).

      b) One weakness of the study is the data from the set of experiments exploring impact of overexpression of the variants in neurons. This technique can be highly variable and the data interpretation in this case would benefit from more rigor.

      It is indeed very difficult to rigorously compare expression patterns obtained using different viruses. To address the reviewer’s concerns, we carried out the following additional experiments and analyses:

      i. We repeated the viral injection experiments using two different AAV serotypes for each series (HA-WT, HA-GD, and HA-MI in AAV2/8; HA-WT, HA-GD, and HA-MI in AAV2/9) to ensure that our results are reproducible and independent of virus preparation.

      ii. We evaluated multiple independent injection sites in each series, ensuring that an adequate number of repetitions was executed under the same conditions (equal virus titer, injection volume, time before animal perfusion, tissue processing, and imaging).

      iii. We presented our results in a series of new figures (Figure 7 and Figure 7 – figure supplements 1 and 2) with added panels showing equivalent vs. boosted laser intensities and gain conditions, where necessary, and parvalbumin protein counter-labeling for reference.

      c) There are minor questions about statistical methods for comparing and concluding about the significance of differences between some experimental groups.

      We have now added statistical analysis supporting all our comparisons and conclusions regarding differences between groups (please see the detailed response to Reviewer #1, Recommendations for the Authors, points g,j,l, and q).

      d) An important conceptual gap remains unanswered by the study. Given the phenotypic similarities between patients with sequence variation in Na+ channel and HCN genes, as well as evidence of reduction of other channels or pumps in this case and the strong co-localization of Na+ channels and HCN channels in the PV+ neurons thought critical in the epilepsy of the HCN sequence variants, is it possible that Na+ channels are impacted as a secondary effect of HCN channel dysfunction here?

      This is certainly a possibility, and indeed one that we very much favor. We have added a new analysis of AP morphology (Figure 5 – figure supplement 1) and performed a microarray-based experiment to screen for changes in Na+ channel expression (Source Data 1). While these experiments yielded negative results, they do not definitively rule out potential cell-type specific alterations in the function of Na+ channels or other conductances. A more thorough experimental examination of this important question will have to await future studies. We have added text to underscore how changes in other conductances may indeed impact neurons’ intrinsic properties in our mice (pages 10-11).

      Reviewer #2 (Public Review):

      a) It is not clear whether the mouse equivalent of the severe developmental disability seen in humans was present in mice.

      We have added new behavioral experiments, which show impairment in some cognitive abilities in Hcn1GD/+ mice but not in Hcn1MI/+ mice, consistent with the more severe developmental disability observed in patients carrying the p.G391D variant compared to patients carrying the p.M153I variant (new Figure 3 and text on pages 6 and 7).

      b) (…) there is no demonstration of hyperexcitability at a cellular or network level, so we do not know how HCN1 mutation predisposes to seizures. In fact, hippocampal pyramidal neurons were shown to be hypoexcitable, at least to one method of action potential generation. There is a suggestion that parvalbumin-positive interneurons may be affected, but there is no evaluation of their excitability. It is possible that HCN1 mutation is directly causing neuronal hyperexcitability, but this would only be uncovered by studying HCN1 channel effects on pyramidal neuron dendrite excitability (where they are mostly localized); synaptic function; or on interneuron excitability. There is also no direct demonstration of the effects of channel mutation on HCN1-mediated current (Ih) in native neurons, so we cannot assess how channel biophysics is altered.

      We agree with the Reviewer that there are indeed limitations to the interpretation of our study. Each of these important questions will need additional experimentation before they can be answered definitively. We have added text to underscore such limitations in the Results (pages 10 and 17) and Discussion (pages 20-21) sections. In future studies, we plan to evaluate both the excitability of interneurons through genetic labeling of PV+ cells and patch-clamp recordings, as well as evaluate their synaptic function. Voltage-clamp recordings in pyramidal neurons and possibly dendritic recordings may also be attempted. However, each of these lines of experimentation will require considerable time to complete, particularly because of the difficulty in obtaining patch-clamp recordings from hippocampal slices from the mouse mutants. So we ask that we be allowed to leave them to a future study.

      Reviewer #3 (Public Review):

      a) The authors characterize cerebellum-dependent functional deficits in the mutant mice, basing their studies on the high expression levels of HCN1 in cerebellum, citing Notomi & Shigemoto. They do not present phenotypic deficits in function ascribed to hippocampus or cortex. (…) Therefore, it would be excellent if the authors presented functional tests of hippocampus- or cortex-dependent behaviors, regardless of the outcome in Fig. 2. At a minimum, they should modify the text and downplay the cerebellar emphasis.

      Following the Reviewer’s helpful recommendations, we have added new behavioral experiments testing short-term and long-term memory (see new Figure 3) and modified the panels in Fig 2. The manuscript text has been revised accordingly (pages 6 and 7).

      b) The authors base their proposed mechanism for the pro-epileptic effects of the mutation on the notion that HCN1 Channels are localized to axons only of PV interneurons. Whereas this fact may be true for the adult, during development, axonal targeting is not unique to basket-type interneurons. It is observed in the developing hippocampal circuit, in medial entorhinal cortex neurons innervating dentate gyrus granule cells, i.e., the perforant path. Have the authors looked at axonal targeting in this region in the mutant mice during appropriate developmental stages? Its absence might modulate the firing of GCs, specifically during development (Bender et al., J Neurosci 2007). At a minimum this point merits discussion, particularly in view of the developmental nature of the epilepsies described.

      The Reviewer correctly points out that HCN1 channels are present not only in the axons of PV+ interneurons but also in the axons of certain subclasses of excitatory neurons (see Huang et al., 2011, 2012, and 2019). Regarding axons from medial entorhinal cortex neurons innervating dentate gyrus granule cells, i.e., the perforant path, there is an interesting difference between mice and rats. While HCN1 channel subunits at this site are downregulated in adult rats, they persist in adult mice. This can be seen in the immunostainings shown in Figure 5A (formerly 4A) of the manuscript. Similar to hippocampal PV+ axons in CA3 (Figure 7A, formerly 6A), it can be noted that HCN1 expression in the perforant path is considerably decreased in Hcn1GD/+ mice compared to wildtype and Hcn1MI/+ mice.

      c) In this context, there are distinct developmental profiles for the 4 HCN subunits, including HCN1, and these profiles might contribute to age-specific defects leading to seizures. This point merits discussion.

      We thank the reviewer for raising this important point and have added text underscoring the potential contribution of altered HCN1 channel function to brain development (page 19) to address this issue, and in accord with the comments raised by Reviewer #1 above (see point p).

      d) Whereas the focus of this paper is on the role of genetic mutations in HCN1 in epilepsy, the paper may be enriched by being placed in the context of the overall contributions of HCN1 channels to human epilepsy, including "acquired epilepsy" via potential epigenetic changes in the expression of normal HCN channels (Bender et al., 2003 and others).

      We agree with the Reviewer and now refer to these datasets in the Introduction, citing the excellent review by Brennan et al., 2016 (page 4).

    1. Author Response

      Reviewer #1 (Public Review):

      1) In the future it will be interesting to determine how these changes in the bone marrow relate to the different subsets or recruited macrophages present in obese tissues. For example, whether monocytes in the bone marrow preferentially generate CD9+Trem2+ Lipid associated macrophages recently described in obese adipose tissue (Jaitin et al, Cell, 2019) or if they are equally capable of generating monocyte-derived tissue resident macrophages in obese tissues.

      We appreciate and concur with the Reviewer’s suggestion for future analysis of how bone marrow monocytes compare with macrophages in adipose tissue. This is a long-term plan and will be the subject of a full new study of interest to the immunometabolism community. As a preamble to that future study, and considering that Jaitin et al. identified CD9 and Trem2 in lipid-laden macrophages, we have tentatively explored whether bone marrow-derived macrophages (BMDM) from mice fed HFD and LFD differ in their expression of these markers. In these exploratory experiments, however, HFD did not produce a statistically significant change in the expression of either marker in the BMDM.

      2) The main strength of this paper is in the identification of the changes in the monocyte subsets abundance early after feeding a HFD and in uncovering the metabolic changes in and between these two monocyte subsets in obese mice. One concern regarding the data as a whole is that, while the authors have nicely indicated the number of samples/mice in each figure, there is no mention of how many times each experiment was performed.

      We now state explicitly how many times each experiment was performed, and we have added this information to the figure legends.

      Additionally, the inclusion of the different gating strategies used particularly for the first figures would be advantageous to fully appreciate the findings being presented. This is particularly relevant for the identification of the Ly6Chi and Ly6Clo BM monocytes.

      We now present the gating strategy at the beginning of the Results as Figure 2 – figure supplement 1. In Figure 2 – figure supplement 1B, control flow experiments without anti-Ly6C or anti-CD11b are shown gated for the Gr1(-) vs CD115(+) subset, confirming the proper positioning of the Ly6Clo and Ly6Chi gates. In Figure S1C, the gating strategy shows that CX3CR1 segregates with the Ly6Clo sorted monocytes, and CCR2 segregates with the Ly6Chi monocytes, as expected. We hope that this complements the information on the identification of the Ly6Chi and Ly6Clo monocytes. In the future, a more complete analysis of the FACS-sorted Ly6Clo and Ly6Chi cells could be performed using RNAseq, which, however, was not feasible for this study.

      3) The alternative explanation (to Ly6Clo conversion to Ly6Chi monocytes) could be that there are some progenitors remaining in these cultures that give rise to Ly6Chi monocytes following exposure to the conditioned media. (…) It is important to confirm that the sorted cells are a pure population of Ly6Clo monocytes with no contamination from progenitors that are also Ly6Clo.

      We appreciate the suggestion. To address this interesting possibility, in new experiments we used markers of myeloid progenitor cells (CD117+;Sca1-, followed by gating for CD16/32 vs. CD34 to identify GMP, CMP and MEP populations). The new findings show that GMP represent the majority of progenitors present in the FACS-identified Ly6Clo and Ly6Chi monocyte populations derived from complete bone marrow cells. In this analysis, we find that GMP are present at low abundance (140 GMP per 1000 Ly6Clo cells, in results from 3 separate mice; new Figure 7E). This finding complements our original observation that there are relatively few progenitor cells in the in vitro-generated monocyte samples, detected by FACS analysis (Figure S5). Despite their relatively low abundance, we cannot discount that some could become Ly6Chi cells. However, the 18-hour exposure of FACS-sorted Ly6Clo cells to WATA-CM would have allowed for only about one doubling event of precursor cells. If so, the progenitors could not fully account for the entirety of the change in proportion of Ly6Clo in favour of Ly6Chi.

      Supporting this argument, when treating in vitro-generated monocytes with WATA-CM, the slight increase in CMP progenitors did not manifest as an increase in the downstream GMP progenitor numbers (now in Figure 7 – figure supplement 1), which are upstream of the Ly6C monocyte lineage.

      To more directly explore the growth potential of progenitor cells, we have now used the Colony Forming Assay to determine the ability of progenitors to give rise to more differentiated monocyte precursor colonies upon incubation with the various conditioned media. In vitro-generated bone marrow-derived monocytes were exposed to control-, WATA- and BATA-CM. The new results, shown in the new Figure 7C,D, indicate that whereas white adipocyte media (WATA-CM) did not expand colonies from CMP (GEMM), GMP (CFU-G and CFU-M) or MEP (BFU-E) progenitors, brown adipocyte media (BATA-CM) slightly expanded colonies derived from CMP/GEMM and skewed GMP cell differentiation toward CFU-M (which lead to monocytes) and away from CFU-G (which lead to granulocytes and ‘neutrophil-like’ monocytes; Yáñez et al., 2017). The new text on pages 15-16, lines 341-365 reads as follows:

      “To buttress the above results, we also assessed the colony forming potential of in vitro-generated monocytes that received pre-treatment of WATA-CM or BATA-CM, to assess the potential for expansion of progenitor cells present in the samples. Colonies were identified as BFU-E (giving rise to erythroid cells); CFU-GEMM (giving rise to large mixed cultures of granulocyte, erythroid, macrophage, megakaryocyte; also, known as CMP); CFU-G (giving rise to granulocytes) or CFU-M (giving rise to macrophages). BATA-CM promoted growth of CMP/CFU-GEMM cultures over control Media ( p<0.01 Two-way ANOVA, Tukey’s multiple comparisons, Figure 7C) and biased granulocyte/macrophage progenitors (widely known as GMP) towards macrophage over granulocyte differentiation, relative to control Media ( p<0.01) or WATA-CM (p<0.05 or **p<0.001, Figure 7C). The total numbers of colonies that grew after 7-10 days of culturing of each pre-treated cohort of monocytes were not different across the three treatments, although trended upwards with BATA-CM (Figure 7D). These results indicated that while BATA-CM promoted expansion of selected populations (CMP/CFU-GEMM and CFU-M), consistent with the increase in BrdU incorporation shown above, WATA-CM was without effect relative to control Media. The above findings suggest that while the proportional increase in Ly6Clow monocytes induced by BATA-CM involves cell proliferation, the proportional increase in Ly6Chigh monocytes induced by WATA-CM does not. As a complementary approach, BM cells were analyzed by flow analysis for the presence of monocyte progenitors within the Ly6Clow or Ly6Chigh monocyte subsets. GMP progenitor cells were essentially the only progenitors detected by this approach in the Ly6Clow monocyte subset, and they represented 140 GMP per 1000 Ly6Clow cells (Figure 7E). 
During the incubation time of 18 h with conditioned medium, we anticipate that the progenitors could theoretically undergo only one doubling and are therefore unlikely to account for the full changes in Ly6Clow cell numbers produced by WATA-CM. Collectively, the results in Figures 7C-E indicate that WATA-CM treatment did not result in an appreciable expansion of progenitor cells or colony formation. Therefore, alternative mechanisms were explored that might contribute to the WATA-CM induced shift towards Ly6Chigh monocyte preponderance, particularly the possible conversion of one subset into the other.”

      Altogether, we feel that these previous and new results do not support the possibility that the bulk of the increase in Ly6Chi cells is due to progenitor differentiation in response to WATA-CM. We therefore lean towards the interpretation that Ly6Clo cells convert to Ly6Chi, but agree that this potential mechanism will require further analysis in the future.

    1. Author Response

      Reviewer #2 (Public Review):

      Taguchi et al. carried out a functional and structural analysis of microtubule dynamics inhibition by the C. elegans kinesin-4 KLP-12. The authors found that both the motor domain and the tail of KLP-12 are necessary to precisely control axon length in C. elegans. The authors showed that a minimal dimer of KLP-12 is motile along the microtubule lattice and reduces microtubule growth rate in vitro; further biochemistry assay demonstrated that the KLP-12 motor domain can similarly bind the microtubule lattice and free tubulin. The authors then solved the crystal structure of KLP-12 motor domain in complex with tubulin and compared their structure data with that of Kif5B (a motile kinesin that does not depolymerize microtubules) and Kif2C (not actively motile but depolymerizes microtubules). They found that the structure of KLP-12 is more similar to that of Kif5B than that of Kif2C, whereas the curvature of tubulin in complex with KLP-12 is between the curvatures of tubulin in complex with Kif5B and Kif2C. The high-resolution structural data from this study suggest how kinesin-4 can be motile along the microtubule lattice and at the same time stop the microtubule dynamics at its plus end; the mild effect of KLP-12 on protofilament bending may be crucial in enabling the inhibition of both the polymerization and depolymerization of the microtubules.

      Overall, this is a very nice study, although some aspects of data analysis or interpretation need to be extended or clarified.

      We sincerely appreciate the kind and fair evaluation of this reviewer.

      1) Microtubule dynamics may be inhibited by reducing growth rate, inducing pausing, or altering catastrophe. To make their results more solid, the authors should examine whether KLP-12 impacts microtubule pausing and/or catastrophe. Such additional metrics may help strengthen the results and further the insight into the role of tubulin curvature in microtubule dynamics.

      We thank the reviewer for this constructive suggestion. We evaluated each factor and found that growth rate was the most affected, whereas depolymerization rate was not significantly affected. The frequency of MT catastrophe events was slightly reduced (Figure 2G). This is similar to the results for KIF21A- or KIF5-bound microtubules, suggesting that the property is conserved across a broad range of kinesins. The frequency of rescue events was reduced as well (Figure 2I). One possibility is that KLP-12 suppresses microtubule polymerization. Another possibility is that this is an indirect effect of the reduced frequency of MT catastrophe events. We have included these findings in the Results section (pages 8-9, lines 187-204; Figure 2).

      2) Structural comparison may be sensitive to the resolution of protein structures that are compared. The authors solved the crystal structure of KLP-12 at a resolution of 2.9 A, which is different from that of Kif4, Kif5B, or Kif2C from previous structure studies (1.7, 3.2, and 3.2 A). The values of root-mean-square distance between protein structures tend to increase if the two proteins that are being compared have been resolved at different resolutions. To strengthen their structural comparison results, the authors should account for the effect of different crystallographic resolutions on their root-mean-square distance evaluations.

      We agree that the resolution of protein structures is important for the rmsd comparison. Thus, we have re-calculated the rmsd values for a fair comparison using the main chain residues (page 13, lines 310-312; Figure 4A).
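
      To illustrate what restricting an rmsd calculation to main-chain atoms involves, here is a minimal sketch. It is not the authors' pipeline: the function name `main_chain_rmsd`, the dictionary layout, and the toy coordinates are hypothetical, and the two structures are assumed to be already superposed and to share residue numbering. The motivation assumed here is that side-chain positions tend to be less well determined at lower resolution, so excluding them makes structures solved at different resolutions more comparable.

```python
import math

# Backbone atom names in standard PDB nomenclature.
MAIN_CHAIN = {"N", "CA", "C", "O"}

def main_chain_rmsd(atoms_a, atoms_b):
    """RMSD over main-chain atoms shared by two pre-superposed structures.

    Each argument maps (residue_number, atom_name) -> (x, y, z) in angstroms.
    Side-chain atoms and atoms missing from either structure are ignored.
    """
    keys = [k for k in atoms_a if k in atoms_b and k[1] in MAIN_CHAIN]
    if not keys:
        raise ValueError("no shared main-chain atoms")
    squared = sum(
        sum((p - q) ** 2 for p, q in zip(atoms_a[k], atoms_b[k]))
        for k in keys
    )
    return math.sqrt(squared / len(keys))

# Toy example (hypothetical coordinates): the backbone is shifted by 1 A
# along x, while the CB side-chain atom is displaced by 5 A.
structure_a = {
    (1, "N"): (0.0, 0.0, 0.0),
    (1, "CA"): (1.5, 0.0, 0.0),
    (1, "C"): (2.0, 1.4, 0.0),
    (1, "O"): (1.3, 2.4, 0.0),
    (1, "CB"): (2.1, -1.2, 0.5),
}
structure_b = {
    key: (x + (5.0 if key[1] == "CB" else 1.0), y, z)
    for key, (x, y, z) in structure_a.items()
}

rmsd = main_chain_rmsd(structure_a, structure_b)
```

      In this toy case the large side-chain displacement does not contribute, so the result reflects only the uniform 1 Å backbone shift.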

      3) Structural comparison may also be sensitive to what the proteins are in complex with. The authors solved the structure of KLP-12 that is in complex with GTP-tubulin, which may be different from the structure of KLP-12 that is free of tubulin, or in complex with GDP-tubulin. Previous studies had solved the structure of Kif4 which is free of tubulin (Chang et al 2013), and the structures of Kif5B (Gigant et al 2013) and Kif2C in the presence of GDP (Wang et al 2017). To strengthen their results, the authors should clarify how these differences between the previous and the current structural studies impact their structural comparison results.

      As the reviewer suggested, kinesin conformations are affected by the nucleotide state of the motor, by complex formation with tubulin or microtubules, and by the nucleotide state of the tubulin or microtubule. Thus, we have compared the KLP-12–GTP-tubulin complex with the available kinesin-4, kinesin-1, and kinesin-13 structures. These comparisons are shown in the revised Figure 4 and Figure 4 – figure supplement 1, demonstrating what is specific to KLP-12 and what is common among kinesin-4 family members.

    1. Author Response

      Reviewer 2 (Public Review):

      1) The periodic components of the simulated power did not overlap as is often seen in empirical data, they were confined to 1-40 Hz (e.g. no gamma activity was simulated), and the simulations did not include a knee in the aperiodic component. This means that it is unclear whether SPRiNT would work as well in more complex or excessively noisy datasets. The non-sinusoidal waveform shape of the periodic component in the rodent data reiterates this concern.

      We are grateful that the Reviewer raised these important considerations about the practical value of SPRiNT in more complex data scenarios.

      We wish to clarify that in the simulations reported, although two simultaneous periodic components would not share the same centre frequency, a substantial number of realizations of the simulations made these components overlap with centre frequencies separated by less than 5 Hz (6% of all simultaneously simulated peaks; n = 8166). We now provide an example of two overlapping spectral peaks in the revised version of Figure 3 – figure supplement 1C.

      In preparing the revised manuscript, we also studied how the spectral overlap of periodic components would determine the peak detection rate: we found that the peak detection rate increases with the separation between two consecutive peaks along the frequency spectrum, but that it is independent of the presence of other peaks if they are at least 8 Hz apart from each other (Figure 3 – figure supplement 1D).

      As correctly mentioned by the Reviewer, the original synthesized data did not comprise components beyond a maximum frequency of 40 Hz, nor did they include a knee in their aperiodic component. In the revised manuscript, we now report new results obtained from the analysis of 1000 synthesized time series that comprise two periodic components (including one periodic component between 30-80 Hz) and a knee in their aperiodic component (Figure 3 – figure supplement 2). The relevant additions to the Methods section are pasted below:

      "We also simulated 1000 time series with aperiodic activity featuring a static knee (Figure 3 – figure supplement 2). Aperiodic exponents were initialized between 0.8 and 2.2 Hz⁻¹. Aperiodic offsets were initialized between -8.1 and -1.5 a.u., and knee frequencies were set between 0 and 30 Hz. Within the 12-36 s time segment of the simulated time series (onset randomized), the aperiodic exponent and offset underwent a linear shift of a random magnitude in the range of -0.5 to 0.5 Hz⁻¹ and -1 to 1 a.u., respectively. The duration of the linear shift was randomly selected for each simulated time series between 1 and 20 s; the knee frequency was constant for each simulated time series. We added two oscillatory (rhythmic) components (amplitude: 0.6-1.6 a.u.; standard deviation: 1-2 Hz) of respective peak centre frequencies between 3-30 Hz and between 30-80 Hz, with the constraint of a minimum peak separation of at least 2.5 peak standard deviations. The onset of each periodic component was randomly assigned between 5-25 s, with an offset between 35-55 s. (Lines 773 to 784)"
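
      For readers who want a concrete picture of the kind of spectrum these simulations target, the following is a minimal sketch that draws parameters from the ranges quoted above and evaluates the specparam-style model (aperiodic component with a knee plus Gaussian peaks, in log10 power) at a single time point. It is not the authors' simulation code, which generates full time series. The function names are hypothetical, the separation constraint is assumed to be measured in units of the larger of the two peak standard deviations, and the knee parameter is assumed to relate to the knee frequency as knee = knee_frequency ** exponent.

```python
import math
import random

def sample_peaks(rng):
    """Draw two peaks as (centre_freq_hz, amplitude, sd_hz), one per band.

    Draws are rejected until the centre frequencies are at least
    2.5 standard deviations apart (using the larger SD; an assumption).
    """
    while True:
        low = (rng.uniform(3, 30), rng.uniform(0.6, 1.6), rng.uniform(1, 2))
        high = (rng.uniform(30, 80), rng.uniform(0.6, 1.6), rng.uniform(1, 2))
        if high[0] - low[0] >= 2.5 * max(low[2], high[2]):
            return [low, high]

def log_power(freq, offset, knee, exponent, peaks):
    """log10 power at `freq`: knee-type aperiodic term plus Gaussian peaks."""
    aperiodic = offset - math.log10(knee + freq ** exponent)
    periodic = sum(
        amp * math.exp(-((freq - cf) ** 2) / (2 * sd ** 2))
        for cf, amp, sd in peaks
    )
    return aperiodic + periodic

rng = random.Random(0)  # fixed seed so the sketch is reproducible
peaks = sample_peaks(rng)
offset = rng.uniform(-8.1, -1.5)
exponent = rng.uniform(0.8, 2.2)
knee_frequency = rng.uniform(0, 30)
knee = knee_frequency ** exponent  # assumed knee-frequency convention

freqs = [float(f) for f in range(1, 101)]  # 1-100 Hz analysis range
spectrum = [log_power(f, offset, knee, exponent, peaks) for f in freqs]
```

      With a knee of zero the aperiodic term reduces to a straight line in log-log space, which is the fixed (no-knee) case used in the original simulations.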

      We analyzed these data with SPRiNT within the 1-100 Hz frequency range. These new results indicate that SPRiNT performs in a satisfactory manner on data with components distributed over a broader frequency range, with a knee in their aperiodic component.

      Below are the related edits to the revised Results section:

      "SPRiNT did not converge to fit aperiodic exponents in the range [-5, 5] Hz⁻¹ only on rare occasions (<2% of all time points). We removed these data points from further analysis. The simulated aperiodic exponents and offsets were recovered with MAEs of 0.22 and 0.42, respectively; static knee frequencies were recovered with a MAE of 3.55 × 10⁴ (inflated by large outliers in absolute error; median absolute error = 11.72). Overall, SPRiNT detected the peaks of the simulated periodic components with 56% sensitivity and 99% specificity. The spectral parameters of periodic components were recovered with equivalent performances in the lower (3-30 Hz) and, respectively, higher (30-80 Hz) frequency ranges: MAEs for centre frequency (0.32, resp. 0.32), amplitude (0.27, resp. 0.22), and standard deviation (0.35, resp. 0.29). (Lines 244 to 252)"
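
      The gap between the mean and the median absolute error for knee frequency can be made concrete with a small sketch. The values below are invented for illustration only (they are not from the study), and the helper names are hypothetical. The sketch shows how a single failed fit inflates the MAE while leaving the median almost untouched, and how sensitivity and specificity follow from detection counts.

```python
from statistics import mean, median

def abs_errors(true_values, estimates):
    """Element-wise absolute error between ground truth and estimates."""
    return [abs(t - e) for t, e in zip(true_values, estimates)]

def sensitivity(true_pos, false_neg):
    """Fraction of simulated peaks that were detected."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of peak-free samples correctly left without a detection."""
    return true_neg / (true_neg + false_pos)

# Hypothetical knee-frequency estimates; the last fit diverged.
true_knee = [5.0, 10.0, 15.0, 20.0, 25.0]
est_knee = [6.0, 12.0, 14.0, 23.0, 100025.0]

errors = abs_errors(true_knee, est_knee)
mae = mean(errors)                    # dominated by the single outlier
median_abs_error = median(errors)     # robust to the outlier

# Hypothetical counts chosen to reproduce the quoted rates.
sens = sensitivity(true_pos=56, false_neg=44)
spec = specificity(true_neg=99, false_pos=1)
```

      Reporting both statistics, as the quoted passage does, lets the reader separate typical estimation error from rare divergent fits.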

      We also now discuss possible limitations in the Discussion:

      "Finally, SPRiNT’s performances were slightly degraded when spectrograms comprised an aperiodic knee (Figure 3 – figure supplement 2). This is due to the specific challenge of estimating knee parameters. Nevertheless, the spectral knee frequency is related to intrinsic neuronal timescales and cortical microarchitecture (Gao et al., 2021), which are expected to be stable properties within each individual and across a given recording. Thus, we recommend estimating (and reporting) aperiodic knee frequencies from the power spectrum of the data with specparam, and specifying the estimated value as a SPRiNT parameter. (Lines 480 to 486)"

      The Reviewer’s point on non-sinusoidal waveform shapes is also well taken, but we would like to emphasize that they challenge all current methods, including but not limited to SPRiNT and specparam (Donoghue et al., 2021). Indeed, SPRiNT and specparam perform a parametric decomposition of the spectrally transformed data, regardless of whether periodic components of a true sinusoidal nature are present. Non-sinusoidal periodic time series, such as the sawtooth waveforms observed in the rodent data analyzed in the manuscript, comprise spectral peaks as harmonic components (here of a theta-band fundamental rhythm). For this reason, we opted to focus our analyses and discussion of these data on the temporal dynamics of their aperiodic components.

      2) Furthermore, the SPRiNT and specparam parameters were fixed and arbitrary, and it is unclear how robust the current results are with respect to changes in these parameters.

      Here too, we appreciate the Reviewer’s insight and concern.

      We explored a subset of the simulations with SPRiNT using alternative settings for STFT (Figure 2 – figure supplement 3) and observed overall satisfactory performances. We now report the relevant results in an addition to the Supplemental Materials, as pasted below:

      "SPRiNT settings for higher temporal resolution (time range: 1-59 s, in 0.25 s steps; frequency range: 1-40 Hz, in 1 Hz steps) provided slightly larger estimation errors of exponent (MAE = 0.15) and offset (MAE = 0.20) relative to original settings (exponent, offset MAE = 0.11, 0.14, respectively). Alpha peaks were recovered with slightly lower sensitivity (98% at time bins with maximum peak amplitude; original 99%) and specificity (9% spurious detections; original 4%), and with greater errors in centre frequency (MAE = 0.43), amplitude (MAE = 0.24), and bandwidth (MAE = 0.53) compared to original settings (centre frequency, amplitude, bandwidth MAE = 0.33, 0.20, 0.42, respectively). Down-chirping beta oscillations were detected with lower sensitivity (93% sensitivity at time bins with maximum peak amplitude, original 98%; 86% specificity, original 98%), and with greater errors in centre frequency (MAE = 0.57), amplitude (MAE = 0.22), and bandwidth (MAE = 0.57) compared to original settings (centre frequency, amplitude, bandwidth MAE = 0.43, 0.17, 0.48, respectively). SPRiNT settings for higher frequency resolution (time range: 2-58 s, in 0.5 s steps; frequency range: 1-40 Hz, in 0.5 Hz steps) provided comparable estimation errors of exponent (MAE = 0.13) and offset (MAE = 0.16) relative to original settings (exponent, offset MAE = 0.11, 0.20, respectively). Alpha peaks were recovered with similar sensitivity (99% at time bins with maximum peak amplitude; original 99%) but lower specificity (21% spurious detections; original 4%), and with comparable errors in centre frequency (MAE = 0.35), amplitude (MAE = 0.23), and bandwidth (MAE = 0.41) to original settings (centre frequency, amplitude, bandwidth MAE = 0.33, 0.20, 0.42, respectively). 
Down-chirping beta oscillations were detected with comparable sensitivity (99% sensitivity at time bins with maximum peak amplitude, original 98%) but lower specificity (78%, original 98%), and with greater errors in centre frequency (MAE = 0.50), amplitude (MAE = 0.21), and bandwidth (MAE = 0.59) relative to original settings (centre frequency, amplitude, bandwidth MAE = 0.43, 0.17, 0.48, respectively). (Lines 1190 to 1213)"
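
      The trade-off explored in these alternative settings follows from a basic property of the short-time Fourier transform: without zero padding, the spacing between frequency bins is the reciprocal of the window length. The sketch below illustrates this. The window lengths (1 s and 2 s) are inferred from the quoted frequency steps under that no-padding assumption, the 60 s recording duration is assumed, and `stft_grid` is a hypothetical helper, not a SPRiNT function. SPRiNT's additional averaging of neighbouring windows is not modelled.

```python
def stft_grid(window_s, step_s, duration_s):
    """Return (frequency bin spacing in Hz, number of window positions).

    Assumes no zero padding, so bin spacing is 1 / window length, and
    windows that must fit entirely inside the recording.
    """
    freq_step_hz = 1.0 / window_s
    n_windows = int((duration_s - window_s) / step_s) + 1
    return freq_step_hz, n_windows

# Higher temporal resolution: short windows, fine time steps.
fast = stft_grid(window_s=1.0, step_s=0.25, duration_s=60.0)

# Higher frequency resolution: longer windows, coarser time steps.
fine = stft_grid(window_s=2.0, step_s=0.5, duration_s=60.0)
```

      Doubling the window length halves the frequency step but also halves the number of independent time estimates available over the same recording, which is consistent with the mixed changes in sensitivity and specificity reported above.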

      We now provide in the Discussion practical recommendations for setting the method's parameters, which will depend on the specific objectives of a given study. We had intended the rationale for the settings used in the manuscript to serve as a guideline for future users, and we believe that the specific recommendations now added increase the practical value of the manuscript.

      Reviewer 3 (Public Review):

      1) Based on the simulated data, SPRiNT seems to be very efficient and robust, and it is also superior to the wavelet-specparam approach. However, while the simulations are very extensive, I find that they are constructed in a manner that may induce biases as the comparison is conducted between SPRiNT and a single, fixed wavelet-based approach. Like any spectral analysis technique, wavelets possess their own trade-off between temporal and frequency resolutions. As the wavelet analyses are conducted using a fixed set of parameters, it may be that some of the differences between the methods stem from how well they are suited for detecting the simulated activity that is constructed using a certain standard deviation of their oscillatory frequencies. It would be valuable to evaluate whether changing the wavelet-analysis parameters or the width of the simulated oscillations would change how the alternative methods compare. It is of course clear that the STFT based approach would remain computationally superior, but it would be interesting to see whether the other differences would remain as robust after the above more detailed evaluation of the methods. Related to the method comparison, it also appears that the outlier removal within SPRiNT markedly improves the quantification of the periodic components. This matter could be discussed more within the manuscript.

      We appreciate the concerns expressed by this Reviewer regarding our choice of wavelet parameters.

      To respond to these concerns, we have performed new analyses with the wavelet-specparam approach using two alternative time-frequency resolutions: FWHM of 2 s at 1 Hz, and FWHM of 4 s at 1 Hz (Figure 2 – figure supplement 2).

      The changes observed remain qualitatively moderate, and performance remains below that obtained with SPRiNT. The new results are displayed in Figure 2 – figure supplement 2 and described in the following revisions to Supplemental Materials:

      "Wavelet settings of finer resolution in time and coarser in frequency (time range: 3-57 s, in 0.005 s steps; central frequency = 1 Hz, FWHM = 2 s; frequency range: 1-40 Hz, in 1 Hz steps) yielded lower estimation errors of exponent (MAE = 0.12) and offset (MAE = 0.35) compared to original settings (exponent, offset MAE = 0.19, 0.78). Alpha peaks were recovered with higher sensitivity (97% at time bins with maximum peak amplitude, original 95%) and specificity (32% spurious detections, original 47%), although with greater errors in centre frequency (MAE = 0.61), amplitude (MAE = 0.25), and bandwidth (MAE = 0.94) compared to original settings (centre frequency, amplitude, bandwidth MAE = 0.41, 0.24, 0.64, respectively). Down-chirping beta oscillations were detected with lower sensitivity (29% sensitivity at time bins with maximum peak amplitude, original 62%) but higher specificity (97%, original 90%), and with greater errors in centre frequency (MAE = 0.63), amplitude (MAE = 0.17), and bandwidth (MAE = 1.59) relative to original settings (centre frequency, amplitude, bandwidth MAE = 0.58, 0.16, 1.05, respectively). When wavelet settings prioritized resolution in frequency over time (time range: 4-56 s, in 0.005 s steps; central frequency = 1 Hz, FWHM = 4 s; frequency range: 1-40 Hz, in 1 Hz steps) relative to original settings, the errors in estimates of exponent (MAE = 0.16) and offset (MAE = 0.47) parameters were reduced (original exponent, offset MAE = 0.19, 0.78, respectively). Alpha peaks were recovered with higher sensitivity (99% at time bins with maximum peak amplitude, original 95%) and similar specificity (46% spurious detections, original 47%), and with smaller errors in centre frequency (MAE = 0.33), amplitude (MAE = 0.20), and bandwidth (MAE = 0.43) compared to original settings (centre frequency, amplitude, bandwidth MAE = 0.41, 0.24, 0.64, respectively). 
In contrast, down-chirping beta oscillations were detected with slightly higher sensitivity (79% at time bins with maximum peak amplitude, original 62%) and specificity (91%, original 90%), and with lower errors on centre frequency (MAE = 0.37), amplitude (MAE = 0.14), and bandwidth (MAE = 0.71) compared to original settings (centre frequency, amplitude, bandwidth MAE = 0.58, 0.16, 1.05, respectively). (Lines 1155 to 1179)"

      We now discuss the outlier peak removal process and its benefits/drawbacks more extensively in the revised Discussion. The relevant section is pasted below:

      "SPRiNT’s optional outlier peak removal procedure increases the specificity of detected spectral peaks by emphasizing the detection of periodic components that develop over time. This feature is controlled by threshold parameters that can be adjusted along the time and frequency dimensions. So far, we found that applying a semi-conservative threshold for outlier removal (i.e., if less than 3 more peaks are detected within 2.5 Hz and 3 s around a given peak of the spectrogram) reduced the false detection rate by 50%, without affecting the true detection rate substantially (a <5% reduction; Figure 3 and Figure 3 – figure supplement 3). Setting these threshold parameters too conservatively would reduce the sensitivity of peak detection. (Lines 487 to 494)"

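      The neighbour-count rule quoted above can be illustrated with a minimal Python sketch. It is not SPRiNT's implementation: the function name `remove_outlier_peaks`, the representation of a peak as a (time, centre frequency) pair, the single-pass design, and the reading of "within 2.5 Hz and 3 s" as symmetric windows around each peak are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (time in seconds, centre frequency in Hz).
Peak = Tuple[float, float]


def remove_outlier_peaks(
    peaks: List[Peak],
    min_neighbours: int = 3,   # assumed meaning of "3 more peaks"
    freq_range: float = 2.5,   # Hz, assumed to be a +/- window
    time_range: float = 3.0,   # seconds, assumed to be a +/- window
) -> List[Peak]:
    """Keep only peaks with enough neighbouring peaks in time and frequency."""
    kept = []
    for i, (t, f) in enumerate(peaks):
        # Count the other peaks that fall inside the time-frequency window.
        neighbours = sum(
            1
            for j, (t2, f2) in enumerate(peaks)
            if j != i and abs(t2 - t) <= time_range and abs(f2 - f) <= freq_range
        )
        if neighbours >= min_neighbours:
            kept.append((t, f))
    return kept


# Illustrative data: five alpha-band peaks close in time, one isolated peak.
example_peaks = [(0.0, 10.0), (0.5, 10.2), (1.0, 9.8), (1.5, 10.1), (2.0, 10.0), (30.0, 25.0)]
filtered = remove_outlier_peaks(example_peaks)
```

      In this sketch, raising `min_neighbours` or narrowing the windows makes the rule more conservative, which is the trade-off against sensitivity that the quoted passage describes.
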
      2) As for the investigation of real data, there are a few aspects that in my opinion could be investigated more thoroughly. Based on the findings it appears that the fine-grained time-resolved parametrization yields added value, especially in eyes-open rest where the fluctuation of alpha center frequency dissociates the different age groups, whereas the other time-resolved findings are not as unambiguously supportive of the need for fine-grained time-resolved analysis. Regarding the first point (fluctuation of alpha center frequency), the finding that the amount of fluctuation within the alpha frequency is distinct across age groups is very interesting. On the methodological side, an open question is whether SPRiNT is required for making this observation. That is, is this effect observed only when applying the specparam-based parametrization (and outlier removal) after STFT or would the same observation have been made simply by estimating the fluctuations directly from the STFT-based spectral estimates? As for using SPRiNT to determine the properties of aperiodic activity, presently it is not clear whether the approach yields added value compared to the more direct use of specparam. That is, the present findings show that the mean aperiodic slope dissociates both different age groups and resting-state conditions (eyes-open vs. -closed). It would be appropriate to test whether the same observation would be made by using specparam in the more standard way by first obtaining one spectral estimate across the whole one-minute time windows and then parametrizing this estimate. This type of testing would yield insights into whether there is a difference between SPRiNT, which builds on dynamic but noisier spectral estimates and allows outlier removal, and the standard approach, which benefits from more stable spectral estimates, for the present data and possibly for other questions.
As for the rodent movement data, the evidence is clear that the aperiodic exponent differs between resting and movement state. However, the fundamental meaning of the change of the exponent at transition points is not explored. Does this change simply reflect the speed of the animal/amount of movement that changes across the time period before and after rest and movement onsets? That is, does the transition curve align with the movement curve or does it represent something more complex? This aspect could be evaluated and discussed more extensively. Together, the above additional evaluations would be beneficial for determining whether there is value in looking at aperiodic activity in a time-resolved manner, and whether a fine-grained analysis is needed or whether a more static analysis that takes the tasks/states into account would fare equally well or even better.

      We appreciate all concerns raised here by the Reviewer. We intended to report that age-related changes of spectral features in healthy aging (Cellier et al., 2021; Donoghue et al., 2020; Hill et al., 2022; Ostlund et al., 2022; Schaworonkow & Voytek, 2021) can be replicated using summary statistics of SPRiNT outcomes. Our intention was not to showcase these effects as novel. To clarify our purpose and the novelty of the proposed approach, we have revised Figure 4 accordingly and now emphasize the genuinely novel aspects of our findings from the time-resolved parameterization of the spectrogram.

      We further investigated the benefits of using SPRiNT to detect age-related changes in the temporal variability of alpha-peak frequency. Using STFT, we replicated the same effect trends whereby older individuals exhibit greater temporal variability of alpha-peak frequency. One asset of the SPRiNT approach is the interpretability of the effect, because it detects genuine peak components in the spectrogram and corrects their parameters for possible confounds from concurrent aperiodic components. Individual alpha peak frequency derived from STFT is based on instantaneous fluctuations of signal power in the alpha band, regardless of the actual presence of a periodic component.

      As for apparent discrepancies between the SPRiNT and specparam outcomes, we found that only the specparam-derived alpha amplitude, not SPRiNT’s, was predictive of age group. Please see our response to Reviewer 1’s first comment for a detailed interpretation of this outcome.

      Concerning the rodent data, we followed this Reviewer’s suggestion of determining whether aperiodic exponent was related to movement speed at the transitions between movement and rest (and vice versa). Indeed, we found that variability in aperiodic exponent proximal to transitions between movement and rest was partially explained by instantaneous movement speed (see Figure 5 – figure supplement 3). Below, we have revised the Results and Discussion sections accordingly:

      "We tested whether changes in aperiodic exponent proximal to transitions of movement and rest were related to movement speed and found a negative linear association in both subjects for both transition types (EC012 transitions to rest: β = -9.6 × 10⁻³, SE = 4.7 × 10⁻⁴, 95% CI [-1.1 × 10⁻², -8.6 × 10⁻³], p < 0.001, R² = 0.29; EC012 transitions to movement: β = -7.3 × 10⁻³, SE = 4.3 × 10⁻⁴, 95% CI [-8.1 × 10⁻³, -6.4 × 10⁻³], p < 0.001, R² = 0.18; EC013 transitions to rest: β = -1.1 × 10⁻², SE = 2.3 × 10⁻⁴, 95% CI [-1.2 × 10⁻², -1.1 × 10⁻²], p < 0.001, R² = 0.32; EC013 transitions to movement: β = -1.2 × 10⁻², SE = 3.2 × 10⁻⁴, 95% CI [-1.3 × 10⁻², -1.2 × 10⁻²], p < 0.001, R² = 0.26; Figure 5 – figure supplement 3). (Lines 403-410)"

      Changes in aperiodic exponent were partially explained by movement speed (Figure 5 – figure supplement 3), which could reflect increased processing demands from additional spatial information entering entorhinal cortex (Keene et al., 2017) or increased activity in cells encoding speed directly (Iwase et al., 2020). (Lines 556-560)
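
      For readers who want to see how statistics of this kind (β, SE, 95% CI, R²) arise, the following is a minimal sketch of a simple linear regression of aperiodic exponent on movement speed. It assumes ordinary least squares with a normal-approximation confidence interval; the response does not state which model the authors fitted, and the function name `simple_linear_regression` and the example values are hypothetical.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y on x (one predictor plus intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # beta
    intercept = mean_y - slope * mean_x

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)

    # Standard error of the slope from the residual variance (n - 2 df).
    se = math.sqrt(ss_res / (n - 2) / sxx)
    # Assumption: normal approximation (1.96) rather than a t quantile.
    ci95 = (slope - 1.96 * se, slope + 1.96 * se)
    r2 = 1.0 - ss_res / ss_tot
    return {"slope": slope, "intercept": intercept, "se": se, "ci95": ci95, "r2": r2}


# Hypothetical values: movement speed and aperiodic exponent per time bin.
speed = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
exponent = [1.52, 1.45, 1.41, 1.33, 1.31, 1.24]
fit = simple_linear_regression(speed, exponent)
```

      A negative `fit["slope"]` in such a fit corresponds to the reported pattern of lower exponents at higher speeds, and `fit["r2"]` to the share of exponent variability explained by speed.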

    1. Author Response

      Reviewer #1 (Public Review):

      Although a bunch of studies have been carried out to see whether calcium supplementation is a prerequisite for the promotion of bone health or prevention of bone diseases, this is the first trial to see its effect on the population whose age is reaching peak bone mass. Outcomes are clear and justified by sound methodology. Also, the message from this systematic review could directly influence the clinical decision on who might gain benefit from calcium supplementation.

      We are very grateful for your considerate comments and your recognition of our work in this study. Your suggestions really helped us to improve the clarity of this manuscript.

      Strengths of this study are:

      1) This is the first systematic review by meta-analysis to focus on people at the age before achieving peak bone mass (PBM) and at the age around the PBM. 2) Detailed subgroup and sensitivity analyses drew consistent and clear results.

      Thank you very much for your comments. We are very grateful for your recognition of our work in this study.

      Limitations of this study are:

      1) Substantial intertrial heterogeneity should be considered in terms of dose effect of calcium supplementation and differences between both sexes etc.

      Thank you very much for your kind comments. We performed subgroup analyses to explore whether different doses of calcium supplementation had different effects, and the results are shown in Table 4a and 4b at the end of this Author Response. The results showed that the intertrial heterogeneity in the subgroup with doses of calcium supplementation greater than or equal to 1000 mg/day was significantly smaller than that in the subgroup with doses less than 1000 mg/day, suggesting that different doses of calcium supplementation across trials may be a potential source of the substantial intertrial heterogeneity.

      Similarly, we also performed subgroup analyses by sex. Of all included trials, 23 trials focused on women only, and 20 trials involved both men and women; however, these 20 trials did not report the results for men and women separately. We therefore divided the included trials into two subgroups: trials with women only and trials with both men and women. The corresponding results of subgroup analyses are shown in Table 5a and 5b at the end of this Author Response. The results showed that the subgroup with both men and women seemed to be less heterogeneous than the subgroup with women only, suggesting that sex may be a possible source of the observed heterogeneity.

      In addition, we were also aware of the large heterogeneity between trials and explored the possible sources through several additional approaches. Firstly, instead of using fixed-effects models, we have chosen random-effects models to summarize the effect estimates. Secondly, we performed meta-regression analyses by age, population regions, calcium doses, baseline intake and sample sizes to explain the intertrial heterogeneity. The results of meta-regression are provided in Table 6 at the end of this Author Response. The results suggested that this heterogeneity could be explained partially by differences in regions of participants.

      We have updated the results and discussions about potential sources of heterogeneity in the revised manuscript, as follows:

      In general, the heterogeneity between trials was obvious in the analysis for BMD (P<.001, I²=86.28%) and slightly smaller for BMC (P<.001, I²=79.28%). The intertrial heterogeneity was significantly distinct across the sites measured. Subgroup analyses and meta-regression analyses suggested that this heterogeneity could be explained partially by differences in age, duration, calcium dosages, types of calcium supplement, supplementation with or without vitamin D, baseline calcium intake levels, sex and region of participants. (See Lines 293-298 on Page 20 in the Main Text)

      Several limitations need to be considered. First, there was substantial intertrial heterogeneity in the present analysis, which might be attributed to the differences in baseline calcium intake levels, regions, age, duration, calcium doses, types of calcium supplement, supplementation with or without vitamin D and sexes according to subgroup and meta-regression analyses. To take heterogeneity into account, we used random effect models to summarize the effect estimates, which could reduce the impact of heterogeneity on the results to some extent. (See Lines 394-399 on Page 24 in the Main Text)
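
      To make the heterogeneity statistics discussed here concrete, the sketch below pools study-level effects with a random-effects model and reports Cochran's Q, the between-study variance, and I². It assumes the DerSimonian-Laird estimator, which the response does not name; the function name `dersimonian_laird` and the example values are hypothetical and are not data from the included trials.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird) with Q, tau^2 and I^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "q": q, "tau2": tau2, "i2": i2}


# Hypothetical effect sizes and variances for four trials.
result = dersimonian_laird([0.20, 0.55, 0.10, 0.80], [0.02, 0.03, 0.01, 0.05])
```

      Because the random-effects weights include the between-study variance, large and small trials are weighted more evenly than under a fixed-effect model, and the pooled confidence interval widens when heterogeneity is high.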

      2) Rarity of RCTs focused on the 20-35-year age group.

      Thank you very much for raising this point. We have comprehensively searched databases for eligible studies and found only three RCTs (Islam et al; Barger-Lux et al; Winters-Stone et al) focused on the 20-35-year age group. We did notice this fact as well. Because of this, we intend to perform a randomised controlled trial to evaluate the effects of calcium supplementation in this age group. In fact, this trial has already been started and is currently ongoing (Registration number: ChiCTR2200057644, http://www.chictr.org.cn/showproj.aspx?proj=155587).

      In this open-label, randomized controlled trial, we will randomly assign (1:1) 116 subjects (age 18-22 years) to receive either calcium supplementation with milk (500 mL/day, containing about 500 mg/day of calcium) or no supplementation for 6 months. The primary outcomes are bone mineral density and bone mineral content at the lumbar spine, femoral neck and total hip. The secondary outcomes are clinical indicators related to bone health, such as serum osteocalcin, bone-specific alkaline phosphatase, urinary deoxypyridinoline, etc. We will conduct the trial with great care and diligence and look forward to its results.

      Reviewer #2 (Public Review):

      This systematic review and meta-analysis titled 'The effect of calcium supplementation in people under 35 years old: A systematic review and meta-analysis of randomized controlled trials' provides good evidence for the importance of calcium supplementation at the age around the plateau of PBM. The statistical analyses were good overall and the manuscript was generally well written.

      We are very grateful for your considerate comments and for your recognition of our work in this study. Your suggestions really helped us to improve the clarity of this manuscript.

      One concern in this study is that RCTs included were substantially heterogenous in subjects, calcium types, duration, vitamin D supplements, etc. According to the inclusion criteria, RCTs with calcium or calcium plus vitamin D supplements with a placebo or no treatment were included in this study. However, no information about vitamin D supplementation was provided. Therefore, it seems unclear whether the effect of improving BMD or BMC is due to calcium alone or calcium plus vitamin D.

      We are extremely grateful for your great patience and for your kind suggestions. Following your suggestions, we have added the corresponding analyses regarding calcium supplementation with or without vitamin D supplementation. Among the included RCTs, 32 trials used calcium-only supplementation (without vitamin D supplementation) and 11 trials used calcium plus vitamin D supplementation. The detailed information is provided in Tables 1 and 2 at the end of this Author Response. We have added subgroup analyses by vitamin D supplementation as you suggested, and the corresponding results are provided in Table 3a and 3b at the end of this Author Response.

      When we pooled the data from the two subgroups separately, we found that calcium supplementation with vitamin D had greater beneficial effects on both the femoral neck BMD (MD: 0.758, 95% CI: 0.350 to 1.166, P < 0.001 VS. MD: 0.477, 95% CI: 0.045 to 0.910, P = 0.031) and the femoral neck BMC (MD: 0.393, 95% CI: 0.067 to 0.719, P = 0.018 VS. MD: 0.269, 95% CI: -0.025 to 0.563, P = 0.073) than calcium supplementation without vitamin D. However, for both BMD and BMC at the other sites (including lumbar spine, total hip, and total body), the observed effects in the subgroup without vitamin D supplementation appeared to be slightly better than in the subgroup with vitamin D supplementation. Therefore, these results suggested that calcium supplementation alone could improve BMD or BMC, although additional vitamin D supplementation may be beneficial in improving BMD or BMC at the femoral neck.

      We have added relevant parts in the main text of the revised manuscript. (See Lines 258-263 on Pages 12-13 and Lines 367-374 on Page 23 in the Main Text)

      As you mentioned, there is large intertrial heterogeneity in this study, for which reason we chose a random-effects model to obtain more conservative results. In addition, we performed subgroup analyses by calcium dose, sex, age, duration, region, baseline calcium intake, and type of calcium supplement in order to explore possible sources of heterogeneity.

      The results of subgroup analyses by dose of calcium supplementation are shown in Table 4a and 4b at the end of this Author Response. For both BMD and BMC at the lumbar spine and whole body, the intertrial heterogeneity was significantly smaller in the subgroup with a calcium supplementation dose greater than or equal to 1000 mg/day than in the subgroup with a dose less than 1000 mg/day, suggesting that different doses of calcium supplementation may be a potential source of the heterogeneity.

      The results of subgroup analyses by sex are shown in Table 5a and 5b at the end of this Author Response. The intertrial heterogeneity was significantly smaller in the subgroup with both men and women than in the subgroup with women only, also suggesting that sex could be a possible source of the heterogeneity.

      The results of subgroup analyses by age (pre-peak vs. peri-peak) are shown in Table 7a and 7b at the end of this Author Response. The intertrial heterogeneity was significantly smaller in the peri-peak subgroup than in the pre-peak subgroup, also suggesting that age may be a potential source of the heterogeneity.

      The results of subgroup analyses by intervention duration (less than 18 months vs. 18 months or more) are shown in Table 8a and 8b at the end of this Author Response. For both BMD and BMC at the lumbar spine and total hip, the intertrial heterogeneity was smaller in the subgroup with an intervention period of less than 18 months than in the subgroup with an intervention period greater than or equal to 18 months, suggesting that intervention duration might be a potential source of the heterogeneity.

      Table 9a and 9b at the end of this Author Response show the results of subgroup analyses by population region. The intertrial heterogeneity was significantly smaller in the Asian subgroup than in the Western subgroup, also suggesting that population region may be a source of the heterogeneity.

      Table 10a and 10b at the end of this Author Response show the results of subgroup analyses by dietary calcium intake levels at baseline. The intertrial heterogeneity was smaller in the subgroup with a dietary calcium intake level greater than or equal to 714 mg/day than in the subgroup with a dietary calcium intake level lower than 714 mg/day, also suggesting that dietary calcium intake levels at baseline could be a potential source of the heterogeneity.

      Table 11a and 11b at the end of this Author Response show the results of subgroup analyses by types of calcium supplements. For both BMD and BMC at the lumbar spine, the intertrial heterogeneity was smaller in the subgroup with calcium supplementation than in the subgroup with dietary calcium, also suggesting that types of calcium supplements might be a source of the heterogeneity.

      In conclusion, the observed heterogeneity might be due to differences in sex, age, region of subjects, calcium dose, intervention duration, type of calcium supplementation, baseline dietary calcium intake, and supplementation with or without vitamin D. We have updated the discussion on heterogeneity in the revised manuscript. (See Lines 394-397 on Page 24 in the Main Text)

      Thanks again for your comments. We have tried to analyze and explain the large heterogeneity through a variety of approaches; however, some inadequacies may remain. Please let us know if further corrections are needed, and we will do our best to revise this section on heterogeneity.

      Reviewer #3 (Public Review):

      This paper will be welcome for clinicians and researchers related to the field. The authors, applying a well-structured meta-analysis, showed that calcium supplementation or calcium intake during 20-35 years is better than the <20 years. The clinical impact is directly associated with improving the bone mass of the femoral neck, and thus proposes a window of intervention for osteoporosis treatment. The manuscript is very well prepared and represents a thorough analysis of available randomized controlled clinical trials, but a few issues require additional consideration.

      We are very grateful for your considerate comments and for your recognition of our work in this study. Your comments are invaluable and have been very helpful in revising and improving our manuscript.

      After a careful read of the literature, it is important to highlight that the paper is a statistically robust study with a well-delineated meta-analysis of youth-adult subjects. However, I would like to better understand why the authors didn't use other datasets such as WHO Global Index Medicus (Index Medicus for Africa, the Eastern Mediterranean Region, South-East Asia, and Western Pacific, and Latin America and the Caribbean Literature on Health Sciences, Index Medicus), ClinicalTrials.gov, and the WHO ICTRP.

      Thank you so much for your thoughtful advice and your generosity in recommending these datasets to us. Based on your advice, we thoroughly searched these databases (the detailed search terms are provided in Appendix 1 at the end of this Author Response) and identified 23 potentially related studies and registered trials. After careful screening and review, however, no new studies were included in this meta-analysis: some had not been completed and were still recruiting subjects, and others were duplicates of RCTs we had already included. The detailed screening process and the reasons for exclusion are shown in Figure 1. These three additional global databases will provide us with more comprehensive information for our future studies. Thank you very much for your suggestions and guidance.

      Figure 1. Flow chart of search and selection

      References:
      1. ID: emr-156089 (https://pesquisa.bvsalud.org/gim/resource/en/emr-156089)
      2. ID: wpr-270003 (https://pesquisa.bvsalud.org/gim/resource/en/wpr-270003)
      3. ID: lil-243754 (https://pesquisa.bvsalud.org/gim/resource/en/lil-243754)
      4. ID: sea-23757 (https://pesquisa.bvsalud.org/gim/resource/en/sea-23757)
      5. ID: NCT00067925 (https://clinicaltrials.gov/ct2/show/NCT00067925?term=NCT00067925&draw=2&rank=1)
      6. ID: NCT00979511 (https://clinicaltrials.gov/ct2/show/NCT00979511?term=NCT00979511&draw=2&rank=1)
      7. ID: NCT00065247 (https://clinicaltrials.gov/ct2/show/NCT00065247?term=NCT00065247&draw=2&rank=1)
      8. Matkovic V, Landoll JD, Badenhop-Stevens NE, et al. Nutrition influences skeletal development from childhood to adulthood: a study of hip, spine, and forearm in adolescent females. J Nutr. 2004;134(3):701S-705S. doi:10.1093/jn/134.3.701S
      9. Barger-Lux MJ, Davies KM, Heaney RP. Calcium supplementation does not augment bone gain in young women consuming diets moderately low in calcium. J Nutr. 2005;135(10):2362-2366. doi:10.1093/jn/135.10.2362
      10. Cornes R, Sintes C, Peña A, et al. Daily Intake of a Functional Synbiotic Yogurt Increases Calcium Absorption in Young Adult Women. J Nutr. 2022;152(7):1647-1654. doi:10.1093/jn/nxac088
      11. ID: NCT00063011 (https://clinicaltrials.gov/ct2/show/NCT00063011?term=NCT00063011&draw=2&rank=1)
      12. ID: NCT00063024 (https://clinicaltrials.gov/ct2/show/NCT00063024?term=NCT00063024&draw=2&rank=1)
      13. ID: NCT01857154 (https://clinicaltrials.gov/ct2/show/NCT01857154?term=NCT01857154&draw=2&rank=1)
      14. ID: NCT00067600 (https://clinicaltrials.gov/ct2/show/NCT00067600?term=NCT00067600&draw=2&rank=1)
      15. ID: NCT00063037 (https://clinicaltrials.gov/ct2/show/NCT00063037?term=NCT00063037&draw=2&rank=1)
      16. ID: NCT00063050 (https://clinicaltrials.gov/ct2/show/NCT00063050?term=NCT00063050&draw=2&rank=1)
      17. ID: TCTR20190624002 (https://trialsearch.who.int/Trial2.aspx?TrialID=TCTR20190624002)
      18. ID: JPRN-UMIN000024182 (https://trialsearch.who.int/Trial2.aspx?TrialID=JPRN-UMIN000024182)
      19. ID: NCT02636348 (https://trialsearch.who.int/Trial2.aspx?TrialID=NCT02636348)
      20. ID: ACTRN12612000374864 (https://trialsearch.who.int/Trial2.aspx?TrialID=ACTRN12612000374864)
      21. ID: NCT01732328 (https://trialsearch.who.int/Trial2.aspx?TrialID=NCT01732328)
      22. ID: ISRCTN28836000 (https://trialsearch.who.int/Trial2.aspx?TrialID=ISRCTN28836000)
      23. ID: ISRCTN84437785 (https://trialsearch.who.int/Trial2.aspx?TrialID=ISRCTN84437785)

      We have also updated the literature search section and the flow chart in the main text of the revised manuscript, as follows:

      We applied search strategies to the following electronic bibliographic databases without language restrictions: PubMed, EMBASE, ProQuest, CENTRAL (Cochrane Central Register of Controlled Trials), WHO Global Index Medicus, ClinicalTrials.gov, WHO ICTRP, China National Knowledge Infrastructure and Wanfang Data in April 2021 and updated the search in July 2022 for eligible studies addressing the effect of calcium or calcium supplementation, milk or dairy products with BMD or BMC as endpoints. (see Lines 80-85 on Page 5 and Figure 1 in the Main Text)

      The manuscript compares two sources of participants (in line 233) evaluating the effect of improvements on the femoral neck being "obviously stronger in Western countries than in Asian countries". But, I didn't identify if the searches were conducted applying language restrictions. This is important because we can be considering the entire world or specific countries.

      We are extremely grateful for your great patience and for your kind suggestions. We did not apply any language restrictions during the search process, as documented in the protocol of PROSPERO (CRD42021251275, https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=251275). Following your suggestion, we have added a description of this in the revised manuscript. (See Lines 80-81 on Page 5 in the Main Text)

      During the search process, we identified five eligible articles from the Chinese databases, including China National Knowledge Infrastructure (CNKI, https://www.cnki.net) and WanFang Data (https://www.wanfangdata.com.cn). However, we confirmed that these five studies were duplicates of articles from PubMed (PMID: 15230999; PMID: 17627404; PMID: 18296324; PMID: 20044757; PMID: 20460227). For possibly relevant studies published in languages other than Chinese or English, the full text was downloaded, translated using the DeepL translation website (https://www.deepl.com/translator), and then carefully reviewed. Ultimately, all included studies that met the inclusion and exclusion criteria were published in English. In view of this, after a systematic and comprehensive search, especially with the addition of your suggested databases, we can reasonably assume that our current study has incorporated the original research in this field worldwide, rather than only that from specific countries or regions.

      To explore whether the effects of calcium supplementation differ across population regions, we performed subgroup analyses. Prior to the analysis, we hypothesized that the effect might be slightly better, or at least not worse, in populations with lower baseline dietary calcium intakes (lower baseline BMD/BMC levels) than in populations with higher baseline dietary calcium intakes (higher baseline BMD/BMC levels). However, the results showed that the improvements in BMD at the femoral neck and total body and in BMC at the femoral neck and lumbar spine were obviously stronger in Western countries than in Asian countries. These findings run contrary to the common expectation that, under normal circumstances, the effects of calcium supplementation should be more obvious in people with lower calcium intakes than in those with higher calcium intakes. Therefore, this issue needs to be tested and confirmed in future trials.

      The manuscript does not describe which version was used with the RoB tool.

      Thank you for your suggestion. Following your comment, we have completed the description of the RoB tool in the Methods section, as follows:

      The quality of the included RCTs was assessed independently by two reviewers (SYL, HNJ) based on the Revised Cochrane Risk-of-Bias Tool for Randomized Trials (RoB 2 tool, version 22 August 2019), and each item was graded as low risk, high risk and some concerns. (See Lines 101-103 on Page 6 in the Main Text)

      Figures and Supplementary: No critique.

      Thanks for your kind comments and for your recognition of our work in this study.

      Appendix 1

      Search strategy

      • WHO Global Index Medicus

      (tw:(calcium)) OR (mj:(calcium)) OR (tw:(calcium carbonate)) OR (tw:(calcium citrate)) OR (tw:(calcium pills)) OR (tw:(calcium supplement)) OR (tw:(Ca2)) OR (tw:(dairy product)) OR (tw:(milk)) OR (tw:(yogurt)) OR (tw:(cheese)) OR (tw:(dietary supplement)) AND (tw:(bone density)) OR (tw:(bone mineral density)) OR (tw:(bone mineral content))

      • ClinicalTrials.gov

      (calcium) OR (calcium supplementation) OR (milk) OR (dairy product) OR (yogurt) OR (cheese) Applied Filters: Interventional (clinical trial); Child (birth–17); Adult (18–64)

      • WHO ICTRP

      (calcium) OR (milk) OR (dairy) OR (yogurt) OR (cheese) in the Intervention

    1. Author Response

      Reviewer #2 (Public Review):

      The authors explored if and how Piezo1 regulated mechanical stiffness and inflammatory signals, thereby directing the differentiation of TH1 and Treg cells in cancer. They showed the genetic deletion of Piezo1, a mechanosensory ion channel, in dendritic cells, promoted tumor growth in a mouse model. Piezo1ΔDC mice showed an increase in Tregs and a decrease in IFNg+ Th1 cells in the MC38 tumor tissue. They showed TGFbR2-pSmad3 and IL-12Rb2-pStat4 signaling axis were involved in this process. Moreover, they suggested cooperation between Piezo1-SIRT1-HIF1a-glycolysis metabolism pathway and calcium-Piezo1-calcineurin-NFAT signaling pathway in DCs.

      The authors have never directly tested the relationship between Piezo1 and DC stiffness. The authors claimed that "Piezo1 integrates innate inflammatory signals and mechanical stiffness signals". But what they showed were independent experiments of inflammatory stimulus (LPS) or stiffness stimulus (50kPa hydrogel). Do these two stimuli work together to induce Piezo1 signaling and contribute to Piezo-mediated differentiation of Th1 and Treg cells?

      Following the reviewer’s suggestions and comments, we have included new data showing that different stiffness conditions and/or LPS can change Piezo1 expression in human DCs (Fig. 7C). Accordingly, we have also revised the title and text and added to the discussion in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The tools and approaches in this manuscript are of broad interest, not only to protein engineers but also to the many researchers using genome-editing reagents. However, putting the work in the context of previous research, both through changing the writing and additional experiments, will be critical for taking advantage of that widespread applicability.

      Strengths:

      Overall, the data support the conclusions of the manuscript.

      The most exciting product of this work is an engineered nuclease, Nsp2-SmuCas9, that has high activity and specificity in human cells and a relaxed PAM preference for a single C base. This chimeric enzyme can efficiently induce indels at endogenous sites. While other works have presented nucleases with minimal PAM preferences, Nsp2-SmuCas9 is a useful alternative and may be preferred. It is also more compact than the standard SpCas9, making it appealing for gene therapy applications.

      Technologically, the presented approach of screening orthologs for new specificities and making chimeras to achieve further diversity is a good way to develop new genome-editing reagents. The authors used appropriate methods, such as GUIDE-seq, to complete their goals. Extending beyond the GFP-activation assay to determine activity at endogenous targets enhanced the value of the results.

      Conceptually, it was important information to the field that proteins with very high sequence identity (93%) can have divergent PAM preferences. Through their engineering, the authors clearly demonstrate the advantage of characterizing such close orthologs with diverse amino acids in the area of PAM recognition.

      Weaknesses:

      1) An overall weakness with the work is that it is not clear how the activity level of the relaxed PAM enzyme, Nsp2-SmuCas9, compares to existing enzymes. Is it much better than the SpCas9 that has almost no PAM preference (SpRY) or the NGN PAM (SpG)? How does it compare to the most commonly used SpCas9 nuclease, which is known to be active in a wide variety of biological contexts? The activity assessment at endogenous sites seemed to have a long timeline, as the indel rate was measured 5 days after transfection. Clarifying the effectiveness of this new nuclease would increase the impact of this work.

      We sincerely thank the reviewer for the constructive comments on our manuscript. Following the reviewer’s suggestions, we compared the editing efficiency of Nsp2Cas9, Nsp2-SmuCas9, SpCas9, SpCas9-NG, and SpCas9-RY side-by-side. Overall, the editing efficiency was low in this experiment, probably due to low transfection efficiency. The results revealed that SpCas9 was the most active enzyme. Nsp2Cas9, SpCas9-NG, and SpCas9-RY displayed similar activity. Nsp2-SmuCas9 displayed lower activity than the other Cas9 variants (Figure 5C).

      2) In the presentation of the manuscript, there are several weaknesses. First, while it is true that allele-specific disruption is an important application of new CRISPR proteins, there are many other reasons why they would be useful. The specific focus on this single application throughout the abstract, introduction and discussion takes away from the widespread utility of these new tools. The writing would be more compelling if it targeted a broader audience. Allele-specific targeting is also possible beyond the PAM site if the mutation is in a position with high specificity.

      Many thanks for the reviewer’s suggestions. Following them, we now emphasize the widespread utility of these new tools throughout the abstract, introduction, and discussion of the revised manuscript. Allele-specific targeting is mentioned only in the discussion.

      3) Second, the introduction is further missing a discussion of other research engineering new PAM specificity or even completely removing specificity. A more convincing narrative would include reasoning for why characterizing naturally occurring orthologs is a powerful and important approach. This information is in the discussion, but it would be helpful for the reader if these points were in the introduction.

      Many thanks for the reviewer’s comments. Following the reviewer’s suggestions, we added a discussion of other research on engineering new PAM specificities to the introduction. We also included the reasoning for why characterizing naturally occurring orthologs is a powerful and important approach.

      “Engineered Cas9 variants with flexible PAMs can increase targeting scope. For example, SaCas9 was engineered to accept an NNNRRT PAM [1]; SpCas9 was engineered to accept almost all PAMs [2], but this strategy is time-consuming, and often comes at a cost of reduced on-target activity. Another strategy is to harness natural Cas9 nucleases for genome editing. We have developed several closely related Cas9 orthologs for genome editing [3, 4]. The advantage of developing tools from closely related Cas9 orthologs is that they can exchange the PAM-interacting (PI) domain. If an ortholog recognizes a particular PAM but does not work efficiently in human cells, we can use this ortholog PI to replace another ortholog PI to generate a chimeric Cas9.”

      4) A second concern with the presentation and analysis of the findings is a minimal connection to the structural context of the discoveries. Many readers will likely be interested in how the specificity shifts are occurring in these orthologs, which could be remedied by supplementary figures of homology models.

      We fully agree with the referee that structural models would help readers better understand the specificity shifts occurring in these orthologs. We have generated calculated structural models of these orthologs in complex with sgRNA and DNA using the crystal structure of Nme1Cas9 (PDB ID: 6JDV). Some specificity shifts can be well explained by these structural models. When the amino acid near position 5 of the PAM is histidine, its side chain forms a potential hydrogen bond with the 6-hydroxyl group of guanine. Replacement of this guanine by cytosine or thymine would cause a major clash, whereas adenine lacks the hydroxyl group needed to form a hydrogen bond with the histidine (Figure 2-figure supplement 2A). Likewise, an aspartate at position 5 of the PAM would favor specific recognition of cytosine via hydrogen bonding with its 4-amine group, but not of other bases, which would either result in a major clash or abolish the hydrogen bond (Figure 2-figure supplement 2B). A similar explanation applies to the apparent specificity between glutamine and adenine at position 8 of the PAM on the target sequence (Figure 2-figure supplement 2C).

      5) Along the same lines, further structural analysis of the failures would be helpful for those embarking on similar projects. Are there any differences in the sequence or structure of the 4/29 orthologs that were not functional in the GFP-activation assay compared to those that were?

      Sequence alignment indicates that the four inactive orthologs possess intact active sites. In the predicted structural models of these orthologs, we did not observe local conformational variations that preclude the interaction with sgRNA or DNA. We speculate that specific modifications of Cas9s in mammalian cells may occur, leading to the loss of enzymatic activities of the 4 orthologs.

      [Figure: Calculated structural models of AseCas9, Hpa1Cas9, MspCas9, and PlaCas9, showing the overall calculated structures with sgRNA and dsDNA.]

      6) Similarly, it was surprising that the Nsp2-NarCas9 chimera was not active, and it would be helpful if the authors could speculate based on the differences between SmuCas9 and NarCas9, such as at the interface of the domains that were fused. Structural models of the fusions would help the reader to visualize the strategy. Exploring the failures and challenges is important for understanding the generalizability of the presented approach.

      Following the reviewer’s comments, we used SWISS-MODEL to generate structural models of Nsp2-NarCas9, Nsp2-SmuCas9, and NarCas9, with the crystal structure of the highly homologous Nme1Cas9 in complex with sgRNA and dsDNA (PDB ID: 6JDV) as the template. By superimposing these models, we noticed that residues G1035, K1037 and T1038 of the Nsp2-NarCas9 chimera protrude towards the DNA molecule, which would prevent DNA binding and thereby abolish editing activity (Figure 4-figure supplement 2A). In comparison, Nsp2-SmuCas9 and NarCas9, which are both active, show no protrusion at the corresponding position (Figure 4-figure supplement 2B-C).

      7) Finally, the final sequence of Nsp2-SmuCas9 fusion, as well as other enzymes such as the failed Nsp2-NarCas9, are not obvious in the manuscript. I may have missed them, but I also did not see the primers used in the Methods section. Addgene submission is also encouraged and would be of great value to the scientific community.

      Thank you for your suggestions. The final sequences of Nsp2-SmuCas9 and the other enzymes are provided in Supplemental file 1, which also lists the primers used for the chimeric proteins. We will submit the plasmids to Addgene soon.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of this study adopted Cas9-mediated enrichment of the target locus and Nanopore long-read sequencing to accurately count repeat numbers in the CNBP gene, which has been notoriously difficult to call precisely. They also compared their results with those of the conventional approach, validating their method. It is an interesting read and shows a pathway that a clinic could take in the near future.

      However, this paper's novel contributions need to be emphasised as there are some papers that utilized Nanopore sequencing to elucidate short repeats (https://pubmed.ncbi.nlm.nih.gov/35245110/; https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-020-00853-3).

      The reviewer is correct that ONT sequencing had already been used to analyse the microsatellite within the CNBP gene (Stevanovski et al. 2021; Mitsuhashi et al. 2021); however, this was confined to CNBP alleles in the normal range. Moreover, the approaches used present some critical drawbacks. Mitsuhashi and colleagues exploited ONT whole-genome sequencing, which is not applicable in routine practice because of its very high cost. Stevanovski and colleagues used the recently introduced “Read Until” feature of ONT sequencing to analyse microsatellites in 37 disease-associated loci. This feature allows selective sequencing of pre-defined target DNA molecules, thus enabling targeted sequencing with advantages similar to those of the Cas9-mediated sequencing presented here. However, the enrichment levels achieved by “Read Until” (5x) are consistently lower than those obtained with the Cas9 approach (500x), owing to higher background. This may be an important issue when dealing with extremely long CNBP alleles, which can be disadvantaged in sequencing compared with shorter contaminating fragments (Shruti V Iyer, BioRxiv 2022).

      These aspects, which underlie the advantages of the Cas9-mediated sequencing presented here, are now reported in the “Discussion” section (lines 337-348).

      Another issue is the clinical utility of the approach. Although it is precise, it is not totally clear whether this accuracy is required in clinical practice, as the repeat status does not completely correlate with phenotypic severity.

      The genotype-phenotype issue in DM2 is still an open question and relies on a single study from Day et al. (2003; PMID:12601109), in which Southern blot analysis was used to determine the length of the DM2 mutation. Because of the extremely large size of the CCTG expansions and the somatic instability of the repeat, Southern blot fails to detect the DM2 mutation in about 20% of known carriers, whose expansion length remains undeterminable. Moreover, detectable expanded alleles can appear as single discrete bands, multiple bands, or smears with no indication of the degree of mosaicism. The absence of a precise genotype-phenotype correlation may thus be largely due to the technical difficulties of analysing such expansions in detail. Although this gap in knowledge means that the clinical utility of the presented approach would not be immediate, we believe that the use of long-read sequencing in large cohorts of DM2 patients could definitively clarify whether the length, the composition and the degree of mosaicism of the DM2 mutation are associated with the severity of the DM2 clinical phenotype and/or with the age at disease onset.

      Considerations related to the clinical utility of the approach have now been included in the “Discussion” section (lines 294-300 and lines 420-428).

      Lastly, the familial cases (A1-A4) are not clearly described. What are their relationships, and why are their copy numbers not exactly the same? Is this because of extreme recombination and variation even within a family, or does it simply reflect limited accuracy?

      Cases A1-A4 derive from a large consanguineous DM2 family, whose pedigree is now reported in Figure S1. The extreme variability in the (CCTG) and (TCTG) copy numbers within the family is typical of DM2 patients, as reported in Day et al., 2003. A tendency towards contraction rather than expansion of the CCTG array can also be observed in this family, in agreement with literature data. The meiotic instability of the (CCTG)n and (TCTG)n distal tract is probably due to unequal recombination events and errors during DNA replication/repair of this highly repetitive region, which give rise to somatic and germinal mosaicism. The variability in the number of (TG)v, by contrast, likely reflects the limited accuracy of the method, as discussed for the healthy alleles (Table 2). The 5’ (TG)v and (TCTG)w arrays are indeed expected to be polymorphic in the general population but stable within the same individual and across meiotic transmissions. Consistently, we now show in Figure S4 that all family members have an equivalent pattern of TG repetitions. Such small inconsistencies probably reflect ONT sequencing errors and could be addressed by using the most recent base-calling algorithm and, eventually, the more accurate Q20+ chemistry. In line with the Reviewer’s observations, all these aspects are now discussed in more depth in the manuscript, with the support of the additional Figures S1 and S4 (see Results lines 215-219 and Discussion lines 364-368).

      They lack a validation cohort, with prospective patients.

      The reviewer is correct: this is a pilot study on a limited number of DM2 patients. We are aware that validation in a larger cohort of DM2 patients would be desirable to further confirm our results. This limitation of the study is clearly indicated in the “Discussion” section (lines 385-392). Unfortunately, the majority of available DNA samples derive from retrospective analyses, and the DNA quantity/quality was not always sufficient for ONT sequencing. We are planning to collect at least 30 new DNA samples from prospective DM2 cases, either sporadic or familial. However, the limited number of DM2 patients referred to our centre (about 1-2 pts/month) and the low incidence of DM2 in the Italian population (Vanacore et al., 2016) will make this collection and validation unfeasible in the short term.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Dr Riley and colleagues reports a novel link between the molecular clock operative in skeletal muscle and titin mRNA, which encodes an essential regulator of sarcomere length and muscular strength. Surprisingly, this clock-mediated regulation of titin occurs at the level of splicing, as demonstrated by SDS-VAGE analyses of skeletal muscle from muscle-specific Bmal1KO mice compared to their Bmal1wt counterparts. Concomitant with the switch in the predominant titin isoform, skeletal muscle of muscle-specific Bmal1KO mice exhibited irregular sarcomere length. Moreover, the authors show that this shift in titin splicing is causal for such sarcomere length irregularity and for altered sarcomere length in muscle from mice with compromised clock function. Importantly, the authors provide compelling evidence that Rbm20, which encodes an RNA-binding protein that mediates splicing of titin, is cooperatively regulated by the Bmal1-Clock heterodimer and MyoD via an enhancer element in intron 1 of Rbm20, thus identifying Rbm20 as a novel direct clock-regulated gene in skeletal muscle. Strikingly, rescue of Rbm20 in muscle-specific Bmal1KO animals results in rescue of the titin splicing pattern and protein size, suggesting that Rbm20 mediates the regulatory effect of Bmal1 on titin splicing and represents a mechanistic link between the clock and a regulator of sarcomere length and regularity.

      We thank reviewer 1 for the very kind comments. We agree that the circadian regulation of titin in any capacity is surprising. We are excited about the implications of our work for cardiac muscle and its therapeutic potential in human skeletal muscle.

      Reviewer #2 (Public Review):

      In this work the authors investigated whether deleting the BMAL1 gene, an integral component of the cellular clock that drives the circadian rhythms of cells, affects the giant protein titin. They report that deleting BMAL1 in skeletal muscle alters the splicing of titin and that this might underlie an increase in sarcomere length dispersion. They show that the effect is through the titin splicing factor RBM20. This work has high novelty and has the potential to add to our understanding of muscle physiology. It is unclear whether splicing of skeletal muscle titin indeed undergoes a circadian rhythm. This could be easily checked using protein gels or RNA seq in muscle samples collected at different times of the day.

      We appreciate the question and recognize that our original manuscript did not clearly outline that the circadian clock regulates both rhythmic and non-rhythmic gene expression. In this study, the target of the muscle clock is expression of Rbm20 mRNA which is not a rhythmically expressed gene in muscle. This has now been addressed in the manuscript.

      Based on the estimated titin turnover and incorporation rates of titin (Cadar et al., 2014), we do not believe that skeletal muscle titin splicing undergoes a circadian rhythm. However, we believe our data highlights the growing recognition of the molecular clock in regulating non-rhythmic processes. We have added data from a chronic phase advance model of circadian disruption with wildtype mice and identify that disrupted circadian rhythms are sufficient to change Rbm20 expression in skeletal muscle (Figure 5).

      This work would be more convincing if the sarcomere length dispersion was investigated in greater detail. Showing this in one muscle type only (TA), in muscles fixed at one length only, and not showing sarcomere length dispersion in the rescue experiment of Figure 6, is rather limited.

      We agree that an analysis of sarcomere length dispersion across joint angles would be interesting, but we think it is beyond the scope of this study. As noted above, the premise of this study emerged from our early work, in which we found that skeletal muscle from 2 different genetic mouse models of circadian disruption, Bmal1 KO mice as well as Clock mutant mice, exhibits decreased maximum specific force with significant disruptions to sarcomere structure (Andrews et al., PNAS, 107 (44) 19090-19095 2010). The primary focus of this study was to address the mechanistic link between the muscle circadian clock and its transcriptional targets, with a focus on sarcomere structure; our first clue came from the expression of titin isoforms. We included analysis of sarcomere length as an outcome measure because it is a fundamental feature of skeletal muscle, it has links to mechanical function, and it is a structure that can be modified by titin spliceforms.

      A small increase in sarcomere length variation as suggested in Figure 2 is unlikely to have a great functional consequence. If it were, how can muscles that express naturally long titin isoforms (soleus, EDL, diaphragm, etc), function well?

      We did not intend to suggest that we see an increase in sarcomere length in Figure 2 and have clarified the figure and text accordingly. The change we see is in the variability of sarcomere length; we do not see any change in the average sarcomere length. The topic of titin spliceform specialization and its contribution to sarcomere structure and function across different muscle groups (soleus vs. EDL vs. diaphragm) is a really interesting question but is beyond the scope of this study.

      Reviewer #3 (Public Review):

      This manuscript uses an inducible and skeletal muscle-specific Bmal1 knockout mouse model (iMSBmal1-/-) that was published previously by the same group. In this study, they utilized the same mouse model and further investigated the effect of the core molecular clock gene Bmal1 on isoform switching of the giant sarcomeric protein titin and the sarcomere length change resulting from titin isoform switching. Lance A. Riley et al found that iMSBmal1-/- mouse TA muscle expressed more of the longer titin, due to additional exon inclusion in Ttn mRNA, compared to iMSBmal1+/+ mice. They observed that sarcomere length did not change significantly but was more variable in iMSBmal1-/- muscle compared to iMSBmal1+/+ muscle. In addition, they identified significant exon inclusion in the proximal Ig region, so they measured the length of the proximal Ig domain and confirmed that it was significantly longer in iMSBmal1-/- muscle. Subsequently, they experimentally generated a shorter titin in C2C12 myotubes and observed that the shorter titin led to a shorter sarcomere length. Since RBM20 is a major regulator of Ttn splicing, they determined the RBM20 expression level and found that RBM20 expression was significantly lower in iMSBmal1-/- muscle. The reduced RBM20 expression was regulated by the molecular clock-controlled transcription factor MyoD1. By performing a rescue experiment in vivo, the authors found that rescue of RBM20 in iMSBmal1-/- TA muscle restored titin isoform expression; however, they did not measure whether sarcomere length was restored. These data provide new information that the molecular cascades in the circadian clock mechanism regulate RBM20 expression and downstream titin isoform switching and sarcomere length change. Although the conclusion of this manuscript is mostly supported by the data, some aspects of the experimental design and data analysis need to be clarified and extended.

      Strengths:

      This paper links the circadian rhythms to skeletal muscle structure and function through a new molecular cascade: the core clock component Bmal1-transcription factor MyoD1-RBM20 expression-titin isoform switching-sarcomere length change.

      Utilization of muscle-specific Bmal1 knockout mice could rule out confounding factors from the molecular clock in other cell types.

      The authors performed the RNA sequencing and label free LC-MS analyses to determine the exon inclusion and exclusion through a side-by-side comparison which is a new approach to identify individual alternative spliced exons via both mRNA level and protein level.

      We agree that the side-by-side analysis of RNA-seq and LC-MS data is novel and provides a foundation for others wanting to study both titin mRNA and protein. In this version, we have expanded this work to include samples from our Rbm20 rescue model (Figure 6). Similar to our approach in the muscle-specific Bmal1 knockout model, these results confirm our RNA-seq results and indicate that LC-MS is a suitable method to measure titin protein isoforms. We note that while more work is needed to confirm the broad utility of the LC-MS approach, it may be a suitable alternative to RNA-seq for measuring region-specific, and possibly exon-specific, changes in titin isoform expression.

      Weaknesses:

      Both RBM20 expression and titin isoform expression vary between skeletal muscles. The authors only measured their expression in TA muscle. It is not clear why the authors chose only TA muscle.

      The reviewer, like Reviewer 2, raises a good point about muscle specificity, as this is a significant challenge for research in the field of skeletal muscle. As we noted above, our primary focus was on the TA because our goal was to study the molecular links between the muscle circadian clock and titin expression, together with analysis of a structural outcome, sarcomere length variability. This muscle is well suited to the combination of approaches employed. We recognize the limits of using a single muscle, but we note that the ChIP-seq data that provided the initial clues that CLOCK and BMAL1 bind to a site within intron 1 of the Rbm20 gene came from gastrocnemius and not TA muscle samples. Our targeted ChIP-PCR confirms that CLOCK and BMAL1 bind to the same intron 1 location in TA muscle samples. In addition, we have included data from quadriceps and TA muscles in our chronic jet lag model, in which we use an environmental manipulation to disrupt the muscle clocks. We believe that the edits to the text and the inclusion of these data strengthen our findings and extend them to other muscles and to circadian disruption beyond a genetic knockout model.

      The sarcomere length data are self-contradictory. The authors stated in Line 149 that sarcomere length was not significantly changed in muscle-specific KO mice; however, in Line 163, the measurements showed significantly longer sarcomeres in muscle-specific KO muscle. The significance is also indicated in Figures 2C and 3B.

      We apologize for the miscommunication. The significance indicated in Figure 2C refers to a significant difference in the variability of sarcomere length, not a significant difference in sarcomere length itself. The significance in Figure 3B indicates a sarcomere length that is slightly but significantly longer than control, as well as a significant difference in sarcomere length variability. To make this distinction clear, we have changed the symbol for significantly different variability from * to # in both Figures 2C and 3B. We hope this clarifies our findings.
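
      The distinction drawn here, between a difference in mean sarcomere length and a difference in its variability, can be illustrated with a minimal Python sketch. The values and variable names below are hypothetical, chosen only for illustration, and are not data from the study.

```python
from statistics import mean, variance

# Hypothetical sarcomere lengths in micrometres.
# Illustrative values only; NOT measurements from the study.
control = [2.40, 2.42, 2.38, 2.41, 2.39]
knockout = [2.30, 2.50, 2.35, 2.45, 2.40]


def summarize(lengths):
    """Return the sample mean and the sample variance of a list of lengths."""
    return mean(lengths), variance(lengths)


control_mean, control_var = summarize(control)
knockout_mean, knockout_var = summarize(knockout)

# A ratio above 1 means the knockout group is more dispersed than control.
variance_ratio = knockout_var / control_var

print(f"control:  mean={control_mean:.3f}, variance={control_var:.5f}")
print(f"knockout: mean={knockout_mean:.3f}, variance={knockout_var:.5f}")
print(f"variance ratio (knockout / control) = {variance_ratio:.1f}")
```

      In this toy example both groups share the same mean while the sample variance differs, which is the situation a separate symbol for variability is meant to flag. A formal comparison would use a dedicated test for equality of variances, which this sketch does not implement.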

      Manipulating titin size using U7 snRNPs to change sarcomere length, and overexpressing RBM20 to switch titin size, are concepts that have already been proven. These data do not directly support the impact of muscle-specific Bmal1 KO on Ttn splicing and RBM20 expression.

      We agree that the use of U7 snRNPs does not directly support the impact of muscle-specific Bmal1 KO on titin splicing and RBM20 expression; however, that was not the goal of this set of experiments. Several papers have recently indicated titin’s role as a sarcomeric ruler (Tonino 2017, Brynnel 2018), but none of them investigated the proximal Ig domain that we identified as affected by circadian clock disruption. Because of this, we thought it necessary to show that this region specifically contributes to sarcomere length using our cell culture model. Further, we think this point strengthens our study, as it suggests that, in the absence of a clock effect, altering the proximal Ig domain of titin directly alters sarcomere length, adding to the growing evidence that titin acts as a sarcomeric ruler. We have edited the text of the results and the discussion to clarify this point.

      There is no evidence to show if interrupted circadian rhythms in mice change RBM20 expression and ttn splicing, which is critical to validate the concept that circadian rhythms are linked to Ttn splicing through RBM20.

      We recognize this concern and have performed a new study in which we used chronic jet lag in normal adult C57BL6 mice as a model to disrupt the muscle clock (Wolff, Duncan and Esser, JAP 2013). These new data have been added in Figure 5 and show that altering the lights-on/lights-off schedule every 4 days for 8 weeks, mimicking repeated jet lag, disrupts Rbm20 expression in TA and gastrocnemius muscle (note that these are new data for both the muscle and clock fields). Concomitant with the changes in clock gene expression we reported in 2013, we found that mRNA expression of Rbm20 is altered as well. These findings confirm that disruption of the clock in normal muscle is sufficient to alter expression of Rbm20.

    1. Author Response

      Reviewer #2 (Public Review):

      Detomasi et al investigated the mechanism behind allorecognition in the filamentous fungus Neurospora crassa. Previous work had identified two proteins, cwr-1 and cwr-2, that control recognition of haplotype and the ensuing cell wall dissolution and subsequent fusion of hyphae. This work is a systematic study of the role of cwr-1 in the allorecognition of six haplogroups. Cwr-1 is predicted to be a chitin-active lytic polysaccharide monooxygenase belonging to the AA11 enzyme family. The activity of the isolated cwr-1 enzyme on chitin is confirmed, and it is shown that the catalytic domain is sufficient to confer an allorecognition checkpoint. Surprisingly, and in contrast to previously published data, enzyme activity of cwr-1 is not required. This is shown by the introduction of mutant cwr-1 lacking key residues for activity into a cwr-1 deletion strain, followed by screening for fusion events with an incompatible cwr-2 allele.

      The strength of this study is the rigor with which all experiments have been designed and carried out. The data sets from the biological assays are complete and treated appropriately with statistical tools. The enzymology is for the most part very comprehensive, including, e.g., full-length mass spectrometry to verify the mutant enzymes. The enzyme activity assays using chitin as substrate are carried out to a high standard using HPAEC detection of soluble products.

      Because of the highly surprising conclusion that the active domain but not the active site is required, the weakness of the manuscript lies in the inability to explain this finding.

      The term "moonlighting" to describe the phenomenon, is not a very good one. I would recommend changing the title of the paper accordingly. There are already published studies that describe proteins (termed X325) that are highly similar to lytic polysaccharide monooxygenases (both in the overall fold and in the coordination of a single copper atom) that have a clear biological function but no detectable catalytic activity.

      Moonlighting proteins comprise a subset of multifunctional proteins in which one polypeptide chain exhibits more than one physiologically relevant biochemical or biophysical function (Jeffery CJ. 1999 Moonlighting Proteins. Trends Biochem. Sci. 24, 8–11 doi:10.1016/S0968-0004(98)01335-8). CWR-1 fits this description. It is predicted to be a PMO, and it indeed utilizes chitin as a substrate and yields the expected PMO-derived oxidative products; it is thus clearly definable as a PMO. CWR-1 is also directly involved in allorecognition and, as our paper shows, the chitin catalytic activity has nothing to do with allorecognition. So, as defined, CWR-1 is a “moonlighting” protein. There does not appear to be another PMO that falls under this definition. The closest one would be GbpA; however, activity on polysaccharides present on mucin has not been ruled out, so we decided to remove this comment from the manuscript. The “X325” family PMO-type proteins have an alternate activity, but not two separate activities within the same polypeptide. Although we do not yet know what physiological role the chitin activity plays in N. crassa, knowing this is not required to conform to the definition. Thus, CWR-1 conforms to the moonlighting definition. We slightly changed the title of the manuscript to be less cumbersome.

    1. Author Response

      Reviewer #3 (Public Review):

      Liu et al. investigated the role of Epac2, the "other", less studied cAMP effector (compared to the classical PKA), in dopamine release and cocaine reinforcement using slice electrochemistry, behavior, and in vivo imaging in dopamine neuron-specific Epac2 conditional knockout mice (confirmed by elegant single-cell RT-PCR). Epac2 genetic deletion (Epac2 cKO) or pharmacological inhibition (using the Epac2 antagonist ESI-05, i.p.) reduced cocaine self-administration (under both fixed and progressive ratio schedules) but not sucrose self-administration, supporting an essential role for Epac2 in cocaine reinforcement but not natural reward. Cyclic voltammetry on striatal slices demonstrated that evoked DA release was reduced in Epac2 cKO mice and enhanced by the Epac2 activator S-220 or the PKA activator 6-Bnz independently. Using in vivo chemogenetics and fiber photometry (with the DA fluorescent sensor GRABDA2M), the authors showed that DCZ activation of VTA DA neurons expressing rM3D(Gs) increased NAc DA release and cocaine SA in Epac2 cKO mice (rescuing), whereas inhibition of VTA DA neurons expressing hM4D(Gi) decreased DA release and cocaine SA in WT mice (mimicking). Based on these experiments, the authors concluded that Epac2 in midbrain DA neurons contributes to cocaine reinforcement via enhancement of DA release.

      The experiments are generally rigorous and the conclusions are mostly well supported by data, but some aspects of behavioral experiments and data analysis need to be clarified or extended.

      1) The chemogenetic rescue experiments in Fig. 7 suggested that enhancing DA release in Epac2 cKO mice rescued cocaine SA in mutant mice, but did not necessarily demonstrate that Epac2 mediates this process, thus a causal mechanistic link is missing. This is an important point to clarify because the central theme of the work is that Epac2 regulates cocaine SA via DA release. In addition, it's unclear if chemogenetic activation of DA neurons also enhances sucrose reward. A potentially positive result would not affect the conclusion that enhancing DA release can rescue cocaine SA in mutant mice but will affect the interpretation and specificity of the rescue data.

      The reviewer’s viewpoint is well taken. We agree that Gs-DREADD activation may restore the Epac2-cKO-induced decrease in dopamine release, but not other deficits caused by Epac2 deletion. We acknowledge the limitations of our DREADD experiments (see our response to Reviewer 2 above). Please also see our response to question 2 below.

      In the revised manuscript, we provided representative temporal patterns of FR1 sucrose self-administration in WT and Epac2-cKO mice, which did not display significant differences between genotypes (see newly added Figure 3 – figure supplement 2). To prevent excessive sucrose intake, sessions ended if the maximum number (64) of reinforcers was earned during the 1-hour training session. Almost all wild-type and Epac2-cKO mice had approached this maximum level near the end of the 10-day training. While testing if chemogenetic activation of VTA dopamine neurons enhances sucrose self-administration is, in principle, a good idea, such enhancement would likely lead to a ceiling effect, making the detection of potential differences between genotypes difficult.

      2) Relatedly, chemogenetic inhibition experiments in Fig 8 showed that inhibiting DA neurons reduced DA release and cocaine SA in WT mice, which suggested that the strength of DA transmission was a regulator of cocaine SA. This is expected given the essential role of DA transmission in reward in general, but it did not provide strong insights regarding the specific roles of Epac2 in the process.

      An ideal experiment would be to examine whether viral expression of Epac2 in VTA dopamine neurons in Epac2-cKO mice could restore cocaine self-administration to the level of WT mice. However, our lab is not equipped to do this type of study at its current capacity, but we are very interested in exploring this exciting experiment in the future.

      3) Fig 7B. The DCZ-induced enhancement of DA release in the fiber photometry recording seems to last only ~30 min, well short of the duration of a cocaine SA session (3 hrs). It's unclear how this transient DA release enhancement could cause the prolonged cocaine SA behavior.

      We appreciate the insight from the Reviewer. We have included the time course of dopamine transients following DCZ injection (now Fig. 6B,C). Although the DCZ-induced enhancement of DA transients was most robust during the first 30 min, an enhancement persisted for the duration of fiber photometry recording (1 hour after DCZ injection). In the original study in which DCZ was developed as a DREADD ligand (Nagai et al., 2020), in vivo two-photon imaging of somatosensory cortex neurons that co-expressed Gq-DREADD (hM3Dq) and GCaMP6 revealed that i.p. injection of DCZ led to a rapid increase in GCaMP6 activity in mice that peaked at about 10 min and plateaued for at least 150 min (see Fig. 4 in that paper). Although Gs-DREADDs may respond to DCZ differently, it appears that DCZ induces long-lasting activation of DREADDs expressed in the brain. We have added a brief discussion in the Results section of the revised manuscript (page 12, lines 262-265).

      4) Fig. 9. working hypothesis: hM4D(Gi) and hM3D(Gs) are shown to inhibit and enhance synaptic vesicle docking, which is not accurate. These DREADDs presumably regulate neuronal excitability, which in turn affects SV release.

      We agree with the reviewer and have removed synaptic vesicle docking from the model (now Figure 8).

    1. Author Response

      Reviewer #1 (Public Review):

      “The authors suggest that they uncovered two distinct phases of how the posterior axial identity is controlled; the first involving TBXT/Wnt to generate posterior 'uncommitted progenitors', which then go on to generate NCCs, and the second involving FGF to impart posterior axial identity onto CNS/spinal cord cells.”

      Based on our new data we have slightly modified our model: (i) TBXT controls posterior axial identity acquisition in NMP precursors and both their trunk NC and CNS spinal cord derivatives; (ii) this early, TBXT-driven posteriorisation phase appears to be WNT dependent; (iii) a subsequent TBXT/WNT-independent phase of Hox cluster regulation occurring during the transition of NMPs towards their NC/spinal cord derivatives is controlled predominantly by FGF signalling. This model is shown in Figure 9 in the revised manuscript.

      “I am not convinced that their data show this; it is equally possible that NMPs are heterogeneous and the effects observed simply reflect a differential response of cells or selection. Since the authors largely analyse their data by qPCR it is difficult to disentangle this.”

      We believe that the inclusion of new data defining the emergence of NMP derivatives at the single cell level through analysis of key trunk lineage-specific markers (HOXC9, SOX10, SOX1, SOX2) via immunostaining and image analysis/flow cytometry (see Figure 3-figure supplement 1, Figure 4C-D, Figure 5-figure supplement 1, Figure 7D-E in revised manuscript) should address the reviewer’s point. See also our response to the editorial comments above. It should be noted that the vast majority of day 3 hESC-derived NMPs (>95%) is positive for TBXT protein expression based on antibody staining and thus the starting population for the generation of trunk NC/spinal cord progenitors can be considered largely homogeneous when it comes to the expression of this transcription factor.

      “The authors include some expression data in mouse to support their in vitro findings. However, these need to be explained and integrated better.”

      We hope that breaking down figure 4 and the related text into two parts has improved the integration of the in vivo data in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      “The fact that the regimes are distinct makes the comparisons of neural crest versus spinal cord difficult to interpret as the cells have been exposed for different amounts of time to WNT and FGF when they assess the Hox code in neural crest or spinal cord cells. Especially because the spinal cord induction protocol involves four additional days of culture with FGF and CHIR, and the cells after seven days are not mature neural progenitors.

      To address this point, we employed “neutral”, extrinsic signal-free culture conditions that drive NMPs towards a mixture of early pre-neural spinal cord progenitors and mutually exclusive SOX1+HOXC9+ CNS spinal cord and SOX10+HOXC9+ NC populations. This facilitated the effective assessment of cell fate and posterior axial identity acquisition simultaneously in both NMP-derived spinal cord and NC cells, during discrete time windows of TBXT knockdown (Figure 4 in revised manuscript). For details see our response above.

      Likewise, the authors have previously shown that such a treatment induces the expression of dorsal neural tube/early neural crest markers”.

      Although we have no evidence of SOX10 expression in cultures generated from NMPs following WNT and FGF agonist treatment for 4 days indicating absence of definitive NC cells, we opted to remove the “CNS” references when describing this cell population to accommodate for the possibility that it may be NC-potent given its previously described dorsal neural tube/early NC character (Cooper et al, 2022; Wind et al., 2021).

      “It would be good to see some quality controls on the percentages of neural crest progenitors or spinal cord neural progenitors that they get in each signalling regime. Can the authors separate neural progenitor cells and neural crest cells (for example by FACS sorting with specific markers) to confirm the cell-type specific expression of the HOX genes in these experiments?”.

      As mentioned above, we have now included immunostaining data quantifying thoroughly the induction of trunk SOX1+HOXC9+ CNS spinal cord and SOX10+HOXC9+ NC cells under different culture conditions/TBXT levels (see Figure 4C-D, Figure 5-figure supplement 1, Figure 7 and Figure 7-figure supplement 1).

      “In the neural crest differentiation protocol, there is a slight, non-significant upregulation of neural progenitor markers following TBXT knockdown, can the authors quantify the percentage of neural cells in their cultures to see how much of the observed effect is specific to neural crest cells?”

      We have quantified the emergence of SOX1+ CNS spinal cord progenitor cells in NMP-derived trunk NC cultures using both FACS/intracellular staining and immunostaining/image analysis but their numbers are too small (2-3% of total cells with no statistically significant difference between control and TBXT knockdown cells, see Figure 3-figure supplement 1) to extract any meaningful conclusions on the effect of TBXT depletion on them. However, quantification of SOX1+HOXC9+ cells generated from NMPs upon culture in “neutral” basal conditions revealed that TBXT depletion results in a decrease in their number in addition to its established impact on trunk NC (see Figure 4C-D in revised manuscript).

      “Previous work from the lab showed that a 3-day FGF/CHIR treatment of hESCs followed by a two-day incubation on basal medium is sufficient to induce neural progenitors that express Hox genes of posterior identity (PMID: 25157815). Can the authors draw the same conclusions for the spinal cord cells with this protocol if they deplete TBXT during the first three days and assay at day 7 the cells on basal medium, or if they deplete TBXT during the last four days of the protocol? The comparison of the 3-day FGF/CHIR regime followed by basal medium treatment versus the continuous FGF/CHIR for a 7-day period may help clarify the temporal and cell-type specific effects of the HOX code via TBXT/FGF on the neural crest and/or spinal cord cells”.

      We have carried out this experiment as suggested by the reviewer (Figure 4C-D/line numbers 226-256 in the revised manuscript), for details see our responses above.

      “In their data, it seems that anterior HOX genes (PG1-5) as well as other posterior HOX (PG6-9) are expressed in wild-type posterior neural crest and early spinal cord cells. Can HOX genes that mark posterior cranial, vagal or trunk identities be co-expressed in trunk neural crest or spinal cord cells? Is it possible that the differentiations generate cells that have different axial identities? I wonder if this interpretation comes from the normalization. Perhaps the authors could clarify if the levels of expression of the 3' Hox genes are higher or lower than 5' Hox genes in their differentiations”.

      Co-expression of HOX paralogous group (PG) (1-5) and (6-9) transcripts does occur in the posterior part of the mouse embryo around E9.5, both in the NMP-containing tailbud region (Gouti et al, 2017) as well as in differentiated posterior neural/neural crest cells e.g. for Hoxb1 expression in E9.5 mouse embryos see (Arenkiel et al, 2003; Glaser et al, 2006); for Hoxc9 expression see (Bel et al, 1998). Thus, the presence of HOXPG(1-5) transcripts in HOXC9+ trunk NC cells is not surprising and in line with what has been reported previously in other studies describing the generation of posterior NC/spinal cord cell types from hESC/NMPs (Frith et al., 2018; Hackland et al, 2019; Lippmann et al, 2015; Mouilleau et al, 2021). Alternatively, the simultaneous detection of transcripts belonging to both HOXPG(1-5) and HOXPG(6-9) could indicate the co-emergence of a separate population of posterior cranial/cardiac/vagal NC cells during trunk NC differentiation. Moreover, the detection of HOX transcripts does not always correlate with corresponding protein positivity (Faustino Martins et al, 2020) pointing to the existence of post-transcriptional/-translational mechanisms controlling HOX protein expression. Unfortunately, we have not identified reliable (in our hands) antibodies against HOXPG(1-5) members that we can use together with HOXC9 in order to distinguish between these possibilities.

      “In the experiments where the authors assess if TBXT binds directly or indirectly to the HOX clusters, the authors compare pluripotent cells with hNMPs. This data confirms that TBXT acts as an activator in hNMPs and that it binds to regions in the HOX clusters. Do the HOX regions overlap with known enhancers for the HOX genes for neural crest or spinal cord?”

      We have included new ATAC-seq data mapping chromatin accessibility in day 8 trunk NC cells generated from TBXT-depleted and control hESC-derived NMPs. These data, combined with the ATAC-seq and TBXT ChIP-seq analyses from day 3 hESC-derived NMPs, indicate that TBXT controls chromatin accessibility in trunk NC-specific enhancers within HOX clusters, both directly through genomic binding, and indirectly possibly by influencing expression of other key transcriptional regulators such as CDX2. For details see Figure 8-figure supplement 2 and Appendix Table S9 and line numbers 458-482 in the revised manuscript.

      “As they see distinct temporal phases of TBXT activity on spinal cord progenitors versus neural crest cells, the authors should test if there are changes in accessibility or TBXT binding in neural crest and spinal cord cells in the HOX locus and/or genome-wide. This comparison may help identify cell-type specific TBXT targets (perhaps acting with distinct coactivators) that are key in the two distinct phases of posterior axial identity control”.

      As mentioned above, we have added new ATAC-seq data from analysis of trunk NC cells derived from TBXT knockdown shRNA hESC-derived NMPs in the presence and absence of Tet. These data can be found in Figure 8-figure supplement 2 and Appendix Table S9 in the revised manuscript. As expected, ATAC-seq analysis of pre-neural CNS spinal cord progenitors generated from TBXT knockdown shRNA hESC-derived NMPs in the presence and absence of Tet showed no significant differences in chromatin accessibility between the two conditions, in line with our gene expression data (Figure 6 in revised manuscript). These data were not included in the new manuscript version but they are publicly available as part of our revised GEO submission (GSE184227). Mapping of TBXT genomic binding in NMP-derived trunk NC cells/spinal cord progenitors is not feasible due to the very low/absent expression of TBXT protein in these cell populations. See also our response to the editor’s suggestions.

      “In the experiments where the authors examine the signalling pathway dependence of HOX expression during the transition in the neural crest differentiation protocol, it appears that CHIR/LDN treatment induces the highest levels of HOX expression (FIG 3F). Also, there is an increased expression of SOX1 while SOX10 expression is not detected "pointing to a role for BMP signalling in steering NMPs/dorsal pre-neural progenitors toward a NC fate in agreement with previous observations". The results may indicate that WNT and BMP inhibition may induce HOX gene expression in neural cells irrespective of FGF. How do the authors interpret this? How does it affect their final model where FGF (and not WNT) drives the expression of HOX genes in late pre-neural spinal cord progenitors?”.

      Based on our data and published work, we speculate that during the transition of hESC-derived NMPs towards trunk NC cells, cultures still exhibit autocrine and/or paracrine FGF signalling even in the absence of exogenous FGF agonist supplementation. This is supported by previous reports showing the expression of the active, phosphorylated version of the FGF effector ERK1/2 in differentiating pluripotent stem cells cultured in FGF-free media (Diaz-Cuadros et al, 2020; Stavridis et al, 2007; Ying et al, 2003). This endogenous FGF activity is probably sufficient for the maintenance of HOX gene expression in these cells, while exogenous BMP signalling stimulation is required for the induction of a NC fate. Given the reported antagonism between these two pathways during early neural/NC induction (Anderson et al, 2016; Marchal et al, 2009), treatment with the BMP inhibitor LDN193189 results in FGF signalling potentiation, which in turn leads to increased HOX gene expression and a switch toward a CNS neurectodermal fate at the expense of NC. Further work is needed to mechanistically dissect this hypothesis, which is beyond the scope of this manuscript.

      “The identity of the cells in the inhibition of WNT or FGF treatments during the final four days towards spinal cord cells experiments is unclear. It would be very useful if the authors could characterize what cell types emerge after the treatments. In principle, I would expect that these treatments would generate different progenitor types (FGF inhibition may presumably give rise to mesoderm cells, whereas WNT inhibited may be pre-neural). Why would the authors expect these different cell types to have similar levels of expression of WNT targets or Hox genes?”

      The inclusion of the new immunostaining data and the quantification of the proportions of SOX2+HOXC9+ cells emerging upon various WNT/FGF inhibitor treatments (Figure 7D-E in revised manuscript) has now enabled us to define the role of these signalling pathways in controlling HOX gene expression specifically in pre-neural spinal progenitors, thus confirming our conclusions from the qPCR data without any bias introduced from contaminating, non-neural HOXC9+ cells.

    1. Author Response

      Reviewer #1 (Public Review):

      Jones et al. investigated the relationship between scale free neural dynamics and scale free behavioral dynamics in mice. An extensive prior literature has documented scale free events in both cortical activity and animal behavior, but the possibility of a direct correspondence between the two has not been established. To test this link, the authors took advantage of previously published recordings of calcium events in thousands of neurons in mouse visual cortex and simultaneous behavioral data. They find that scale free-ness in spontaneous behavior co-occurs with scale free neuronal dynamics. The authors show that scale free neural activity emerges from subsets of the larger population - the larger population contains anticorrelated subsets that cancel out one another's contribution to population-level events. The authors propose an updated model of the critical brain hypothesis that accounts for the obscuring impact of large populations on nested subsets that generate scale free activity. The possibility that scale free activity, and specifically criticality, may serve as a unifying theory of brain organization has suffered from a lack of high-resolution connection between observations of neuronal statistics and brain function. By bridging theory, neural data, and behavioral dynamics, these data add a valuable contribution to fields interested in cortical dynamics and spontaneous behavior, and specifically to the intersection of statistical physics and neuroscience.

      Strengths:

      This paper is notably well written and thorough.

      The authors have taken a cutting-edge, high-density dataset and propose a data-driven revision to the status-quo theory of criticality. More specifically, due to the observed anticorrelated dynamics of large populations of neurons (which doesn't fit with traditional theories of criticality), the authors present a clever new model that reveals critical dynamics nested within the summary population behavior.

      The conclusions are supported by the data.

      Avalanching in subsets of neurons makes a lot of sense - this observation supports the idea that multiple, independent, ongoing processes coexist in intertwined subsets of larger networks. Even if this is wrong, it's supported well by the current data and offers a plausible framework on which scale free dynamics might emerge when considered at the levels of millions or billions of neurons.

      The authors present a new algorithm for power law fitting that circumvents issues in the KS test that is the basis of most work in the field.

      Weaknesses:

      This paper is technically sound and does not have major flaws, in my opinion. However, I would like to see a detailed and thoughtful reflection on the role that 3 Hz Ca imaging might play in the conclusions that the authors derive. While the dataset in question offers many neurons, this approach is, from other perspectives, impoverished - calcium intrinsically misses spikes, a 3 Hz sampling rate is two orders of magnitude slower than an action potential, and the recordings are relatively short for amassing substantial observations of low probability (large) avalanches. The authors carefully point out that other studies fail to account for some of the novel observations that are central to their conclusions. My speculative concern is that some of this disconnect may reflect optophysiological constraints. One argument against this is that a truly scale free system should be observable at any temporal or spatial scale and still give rise to the same sets of power laws. This quickly falls apart when applied to biological systems which are neither infinite in time nor space. As a result, the severe mismatch between the spatial resolution (single cell) and the temporal resolution (3 Hz) of the dataset, combined with filtering intrinsic to calcium imaging, raises the possibility that the conclusions are influenced by the methods. Ultimately, I'm pointing to an observer effect, and I do not think this disqualifies or undermines the novelty or potential value of this work. I would simply encourage the authors to consider this carefully in the discussion.

      R1a: We quite agree with the reviewer that reconciling different scales of measurement is an important and interesting question. One clue comes from Stringer et al’s original paper (2019 Science). They analyzed time-resolved spike data (from Neuropixel recordings) alongside the Ca imaging data we analyzed here. They showed that if the ephys spike data was analyzed with coarse time resolution (300 ms time bins, analogous to the Ca imaging data), then the anticorrelated activity became apparent (50/50 positive/negative loadings of PC1). When analyzed at faster time scales, anticorrelations were not apparent (mostly positive loadings of PC1). This interesting point was shown in their Supplementary Fig 12.

      This finding suggests that our findings about anticorrelated neural groups may be relevant only at coarse time scales. Moreover, this point suggests that avalanche statistics may differ when analyzed at very different time scales, because the cancelation of anticorrelated groups may not be an important factor at faster timescales.
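
      The PC1 comparison described above (roughly 50/50 positive/negative loadings at coarse time bins versus mostly positive loadings at fine bins) can be illustrated with a minimal sketch. It assumes a spike-count matrix of shape (neurons, time bins); the function names `rebin` and `pc1_majority_sign_fraction` are hypothetical and are not taken from Stringer et al. or from our analysis code.

```python
import numpy as np

def rebin(counts, factor):
    """Coarse-grain a (neurons, time) count matrix by summing `factor` consecutive bins."""
    n_neurons, n_time = counts.shape
    n_keep = (n_time // factor) * factor          # drop the incomplete last bin
    return counts[:, :n_keep].reshape(n_neurons, -1, factor).sum(axis=2)

def pc1_majority_sign_fraction(counts):
    """Fraction of neurons sharing the majority sign of their PC1 loading.

    ~0.5 means balanced positive/negative loadings (anticorrelated groups),
    ~1.0 means nearly all neurons load with the same sign.
    """
    z = counts - counts.mean(axis=1, keepdims=True)
    sd = z.std(axis=1, keepdims=True)
    z = z / np.where(sd > 0, sd, 1.0)             # z-score each neuron
    # Left singular vectors of the (neurons, time) matrix are the neuron loadings.
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    frac_pos = np.mean(u[:, 0] > 0)
    # The overall sign of a principal component is arbitrary, so report the majority.
    return max(frac_pos, 1.0 - frac_pos)
```

      Applying `pc1_majority_sign_fraction` to `rebin(counts, factor)` for several values of `factor` would give one number per time resolution, which is the kind of comparison summarized above.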

      In our revised manuscript, we explored this point further by analyzing spike data from Stringer et al 2019. We focused on the spikes recorded from one local population (one Neuropixel probe). We first took the spike times of ~300 neurons and convolved them with a fast rise/slow fall kernel, like a typical Ca transient. Then we downsampled to a 3 Hz sample rate. Next, we deconvolved using the same methods as those used by Stringer et al (OASIS nonnegative deconvolution). And finally, we z-scored the resulting activity, as we did with the Ca imaging data. With this Ca-like signal in hand, we analyzed avalanches in four ways and compared the results. The four ways were: 1) the original time-resolved spikes (5 ms resolution), 2) the original spikes binned at 330 ms time resolution, 3) the full population of slow Ca-like signal, and 4) a correlated subset of neurons from the slow Ca-like signal. Based on the results of this new analysis (now in Figs S3 and S4), we found several interesting points that help reconcile potential differences between fast ephys and slow Ca signals:

      1. In agreement with Sup Fig 12 from Stringer et al, anticorrelations are minimal in the fast, time-resolved spike data, but can be dominant in the slow, Ca-like signal.

      2. Avalanche size distributions of spikes at fast timescales can exhibit a nice power law, consistent with previous results with exponents near -2 (e.g. Ma et al Neuron 2019, Fontenele et al PRL 2019). But, the same data at slow time scales exhibited poor power-laws when the entire population was considered together.

      3. The slow time scale data could exhibit a better power law if subsets of neurons were considered, just like our main findings based on Ca imaging. This point was the same using coarse time-binned spike data and the slow Ca-like signals, which gives us some confidence that deconvolution does not miss too many spikes.

      In our opinion, a more thorough understanding of how scale-free dynamics differs across timescales will require a whole other paper, but we think these new results in our Figs S3 and S4 provide some reassurance that our results can be reconciled with previous work on scale free neural activity at faster timescales.
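
      The spike-to-Ca-like conversion described in this response can be sketched for a single neuron as follows. This is a minimal illustration under stated assumptions, not the analysis code: the kernel time constants (`tau_rise`, `tau_decay`) and the function name `ca_like_signal` are hypothetical, the output sampling is only approximately 3 Hz (67 bins of 5 ms), and the OASIS deconvolution step is omitted here rather than reimplemented.

```python
import numpy as np

def ca_like_signal(spike_times, duration, dt=0.005,
                   tau_rise=0.05, tau_decay=1.0, fs_out=3.0):
    """Turn one neuron's spike times (seconds) into a z-scored, slow Ca-like trace."""
    # 1) Bin spikes at fine resolution (5 ms by default).
    n_bins = int(round(duration / dt))
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, duration))

    # 2) Convolve with a fast-rise / slow-fall kernel (difference of exponentials).
    t = np.arange(0.0, 5 * tau_decay, dt)
    kernel = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    kernel /= kernel.sum()
    trace = np.convolve(counts, kernel)[:n_bins]   # causal filtering

    # 3) Downsample to roughly fs_out by averaging blocks of fine bins.
    step = int(round(1.0 / (fs_out * dt)))
    n_keep = (n_bins // step) * step
    slow = trace[:n_keep].reshape(-1, step).mean(axis=1)

    # 4) (Deconvolution would go here; omitted in this sketch.)

    # 5) z-score.
    sd = slow.std()
    return (slow - slow.mean()) / sd if sd > 0 else slow - slow.mean()
```

      Stacking the output for every neuron on a probe would give a (neurons, time) matrix comparable in format to the Ca imaging data.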

      Reviewer #2 (Public Review):

      The overall goal of the paper is to link spontaneous neural activity and certain aspects of spontaneous behavior using a publicly available dataset in which 10,000 neurons in mouse visual cortex were imaged at 3 Hz with single-cell resolution. Through careful analysis of the degree to which bouts of behavior and bouts of neural activity are described (or not) by power-law distributions, the authors largely achieve these goals. More specifically, the key findings are that (a) the size of bouts of whisking, running, eye movements, and pupil dilation are often well-fit by a power-law distribution over several decades, (b) subsets of neurons that are highly correlated with one of these behavioral metrics will also exhibit power-law distributed event sizes, (c) neuron clusters that are uncorrelated with behavior tend to not be scale-free, (d) crackling relationships are generally not found (i.e. size with duration exponent (if there is scaling) was not predicted by size power-law and duration power-law), (e) bouts of behavior could be linked to bouts of neural activity. In the second portion of the paper, the authors develop a computational model with sets of correlated and anti-correlated neurons, which can be accomplished under a relatively small subset of connection architectures: out of the hundreds of thousands of networks simulated, only 31 generated scale-free subsets/non-scale-free population/anti correlated e-cells/anti-correlated i-cells in agreement with the experimental recordings.

      The data analysis is careful and rigorous, especially in the attention to fitting power laws, determining how many decades of scaling are observed, and acknowledging when a power-law fit is not justified. In my view, there are two weaknesses of the paper, related to how the results connect to past work and to the set-up and conclusions drawn from the computational modeling, and I discuss those in detail below. While my comments are extensive, this is due to high interest. I do think that the authors make an important connection between scale-free distributions of neural activity and behavior, and that their use of computational modeling generates some interesting mechanistic hypotheses to explore in future work.

      My first general reservation is in the relationship to past work and the overall novelty. The authors state in the introduction, "according to the prevailing view, scale-free ongoing neural activity is interpreted as 'background' activity, not directly linked to behavior." It would be helpful to have some specific references here, as several recent papers (including the Stringer et al. 2019 paper from which these data were taken, but also papers from McCormick lab and (Anne) Churchland lab) showed a correlation between spontaneous activity and spontaneous facial behaviors. To my knowledge, the sorts of fidgety behavior analyzed in this paper have not been shown to be scale-free, and so (a) is a new result, but once we know this, it seems that (e) follows because we fully expect some neurons to correlate with some behavior.

      R2a: We agree with the reviewer that our original introductory, motivating arguments needed improvement. We have now rewritten the last 2 paragraphs of the introduction. We hope we have now laid out our argument more clearly, with more appropriate supporting citations. In brief, the logic is this:

      1. Previous theory, modeling, and experiments on the topic of scale-free neural activity suggest that this phenomenon is an autonomous, internally generated thing, independent of anything the body is doing.

      2. Relatively new experiments (including those by Churchland’s lab and McCormick’s lab: Stringer 2019; Salkoff 2020; Clancy 2019; Musall 2019) suggest a different picture with a link between spontaneous behaviors and ongoing cortical activity, but these studies did not address any questions about scale-free-ness.

      3. Moreover, these new experiments show that behavioral variables only manage to explain about 10-30% of ongoing activity.

      4. Is this behaviorally-explainable 10-30% scale-free, or do the scale-free aspects of cortical dynamics fall within the other 70-90%? Our goal is to find out.

      Digging a bit more on this issue, I would argue that results (b) and (c) also follow. By selecting subsets of neurons with very high cross-correlation, an effective latent variable has emerged. For example, the activity rasters of these subsets are similar to a population in which each neuron fires with the same time-varying rate (i.e., a heterogeneous Poisson process). Such models have been previously shown to be able to generate power-law distributed event sizes (see, e.g., Touboul and Destexhe, 2017; also work by Priesemann). With this in mind, if you select from the entire population a set of neurons whose activity is effectively determined by a latent variable, do you not expect power laws in size distributions?

      Our understanding is that not all Poisson processes with a time-varying rate will result in a power law. It is quite essential that the fluctuations in rate must themselves be power-law distributed. As a clear example of how this breaks down, consider a Poisson rate that varies according to a sine wave with fixed period and amplitude. In this case, the avalanche size distribution is definitely not scale-free; it would have a clear typical scale. Another point of view on this comes from some of the simplest models used to study criticality – e.g. all-to-all connected probabilistic binary neurons (like in Shew et al 2009 J Neurosci). These models do generate spiking with a time-varying Poisson rate when they are at criticality or away from criticality. But, only when the synaptic strength is tuned to criticality is the time-varying rate going to generate power-law distributed avalanches. I think the Priesemann & Shriki paper made this point as well.
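
      The sine-wave counterexample can be checked numerically with a short sketch. It assumes an avalanche is a contiguous excursion of population activity above a threshold, with size equal to the summed excess over that threshold; this definition, the function names, and all parameter values are illustrative assumptions rather than the definitions used in the manuscript.

```python
import numpy as np

def avalanche_sizes(activity, threshold):
    """Sizes of contiguous excursions of a 1-D activity trace above `threshold`."""
    above = activity > threshold
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)           # first bin of each excursion
    ends = np.flatnonzero(edges == -1)            # one past the last bin
    excess = activity - threshold
    return np.array([excess[s:e].sum() for s, e in zip(starts, ends)])

def sine_rate(n_bins, dt, period, base=1.0, depth=0.5):
    """Sinusoidally modulated firing rate with fixed period and amplitude."""
    t = np.arange(n_bins) * dt
    return base * (1.0 + depth * np.sin(2.0 * np.pi * t / period))

def sine_poisson_counts(rng, n_bins, dt, period, n_neurons=1000, base=1.0):
    """Population spike counts drawn from a Poisson process with the sine rate."""
    return rng.poisson(sine_rate(n_bins, dt, period, base=base) * n_neurons * dt)
```

      For the noise-free rate, every excursion above the mean rate spans one half period and has the same size, so the size distribution collapses onto a single characteristic scale instead of spanning decades. Applying `avalanche_sizes` to the output of `sine_poisson_counts` adds Poisson noise around that envelope, whose period and amplitude are still fixed.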

      My second reservation has to do with the generality of the conclusions drawn from the mechanistic model. One of the connectivity motifs identified appears to be i+ to e- and i- to e+, where potentially i+/i- are SOM and VIP (or really any specific inhibitory type) cells. The specific connections to subsets of excitatory cells appear to be important (based on the solid lines in Figure 8). This seems surprising: is there any experimental support for excitatory cells to preferentially receive inhibition from either SOM or VIP, but not both?

      R2b: There is indeed direct experimental support for the competitive relationship between SOM, VIP, and functionally distinct groups of excitatory neurons. This was shown in the paper by Josh Trachtenberg’s group: Garcia-Junco-Clemente et al 2017. An inhibitory pull-push circuit in frontal cortex. Nat Neurosci 20:389–392. However, we emphasize that we also showed (lower left motif in Fig 8G) that a simpler model with only one inhibitory group is sufficient to explain the anticorrelations and scale-free dynamics we observe. We opted to highlight the model with two inhibitory groups since it can also account for the Garcia-Junco-Clemente et al results.

      In the section where we describe the model, we state, “We considered two inhibitory groups, instead of just one, to account for previous reports of anticorrelations between VIP and SOM inhibitory neurons in addition to anticorrelations between groups of excitatory neurons (Garcia-Junco-Clemente et al., 2017).”

      More broadly, I wonder if the neat diagrams drawn here are misleading. The sample raster, showing what appears to be the full simulation, certainly captures the correlated/anti-correlated pattern of the 100 cells most correlated with a seed cell and 100 cells most anti-correlated with it, but it does not contain the 11,000 cells in between with zero to moderate levels of correlation.

      R2c: We agree that our original model has several limitations and that one of the most obvious features lacking in our model is asynchronous neurons (the limitations are now discussed more openly in the last paragraph of the model subsection). In the data from the Garcia-Junco-Clemente et al paper above there are many asynchronous neurons as well. To ameliorate this limitation, we have created a modified model that accounts for asynchronous neurons together with the competing anticorrelated neurons (now shown and described in Fig S9). We put this modified model in supplementary material and kept the simpler, original model in the main findings of our work, because the original model provides a simpler account of the features of the data we focused on in our work – i.e. anticorrelated scale-free fluctuations. The addition of the asynchronous population does not substantially change the behavior of the two anticorrelated groups in the original model.

      We probably expect that the full covariance matrix has similar structure from any seed (see Meshulam et al. 2019, PRL, for an analysis of scaling of coarse-grained activity covariance), and this suggests multiple cross-over inhibition constraints, which seem like they could be hard to satisfy.

      R2d: We agree that it remains an outstanding challenge to create a model that reproduces the full complexity of the covariance matrix. We feel that this challenge is beyond the scope of this paper, which is already arguably squeezing quite a lot into one manuscript (one reviewer already suggested removing figures!).

      We added a paragraph at the end of the subsection about the model to emphasize this limitation of the model as well as other limitations. This new paragraph says:

      While our model offers a simple explanation of anticorrelated scale-free dynamics, its simplicity comes with limitations. Perhaps the most obvious limitation of our model is that it does not include neurons with weak correlations to both e+ and e- (those neurons in the middle of the correlation spectrum shown in Fig 7B). In Fig S9, we show that our model can be modified in a simple way to include asynchronous neurons. Another limitation is that we assumed that all non-zero synaptic connections were equal in weight. We loosen this assumption allowing for variable weights in Fig S9, without changing the basic features of anticorrelated scale-free fluctuations. Future work might improve our model further by accounting for neurons with intermediate correlations.

      The motifs identified in Fig. 8 likely exist, but I am left with many questions of what we learned about connectivity rules that would account for the full distribution of correlations. Would starting with an Erdos-Renyi network with slight over-representation of these motifs be sufficient? How important is the homogeneous connection weights from each pool assumption - would allowing connection weights with some dispersion change the results?

      R2e: First, we emphasize that our specific goal with our model was to identify a possible mechanism for the anticorrelated scale-free fluctuations that played the key role in our analyses. We agree that this is not a complete account of all correlations, but this was not the goal of our work. Nonetheless, our new modified model in Fig S9 now accounts for additional neurons with weak correlations. However, we think that future theoretical/modeling work will be required to better account for the intermediate correlations that are also present in the experimental data.

      We confirmed that an Erdos-Renyi network of E and I neurons can produce scale-free dynamics, but cannot produce substantial anticorrelated dynamics (Fig 8G, top right motif). Additionally, the parameter space study we performed with our model in Fig 8 showed that if the interactions between the two excitatory groups exceed a certain tipping point density, then the model behavior switches to behavior expected from an Erdos-Renyi network (Fig 8F). Finally, we have now confirmed that some non-uniformity of synaptic weights does not change the main results (Fig S9). In the model presented in Fig S9, the value of each non-zero connection weight was drawn from a uniform distribution [0,0.01] or [-0.01,0] for excitatory and inhibitory connections, respectively. All of these facts are described in the model subsection of the paper results.
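
The weight-sampling rule described above can be sketched as follows. This is an illustrative construction only: the network size, connection density, absence of self-connections, and the function name `random_weights` are assumptions, while the uniform ranges [0, 0.01] and [-0.01, 0] come from the text.

```python
import numpy as np

def random_weights(n_exc, n_inh, density, w_max=0.01, seed=0):
    """Erdos-Renyi weight matrix W[post, pre]; sign is set by the presynaptic cell type."""
    rng = np.random.default_rng(seed)
    n = n_exc + n_inh
    connected = rng.random((n, n)) < density        # each connection exists with prob. `density`
    np.fill_diagonal(connected, False)              # no self-connections (assumption)
    magnitude = rng.uniform(0.0, w_max, size=(n, n))
    sign = np.ones(n)
    sign[n_exc:] = -1.0                             # last n_inh presynaptic cells are inhibitory
    return connected * magnitude * sign[np.newaxis, :]

# Hypothetical sizes: 80 excitatory and 20 inhibitory cells, 10% connection density.
W = random_weights(n_exc=80, n_inh=20, density=0.1)
```

Columns belonging to excitatory cells then hold weights in [0, 0.01] and columns belonging to inhibitory cells hold weights in [-0.01, 0], with zeros wherever no connection exists.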

      As a whole, this paper has the potential to make an impact on how large-scale neural and behavioral recordings are analyzed and interpreted, which is of high interest to a large contingent of the field.

      Reviewer #3 (Public Review):

      The primary goal of this work is to link scale free dynamics, as measured by the distributions of event sizes and durations, of behavioral events and neuronal populations. The work uses recordings from Stringer et al. and focuses on identifying scale-free models by fitting the log-log distribution of event sizes. Specifically, the authors take averages of correlated neural sub-populations and compute the scale-free characterization. Importantly, neither the full population average nor random uncorrelated subsets exhibited scale-free dynamics, only correlated subsets. The authors then work to relate the characterization of the neuronal activity to specific behavioral variables by testing the scale-free characteristics as a function of correlation with behavior. To explain their experimental observation, the authors turn to classic e-i network constructions as models of activity that could produce the observed data. The authors hypothesize that a winner-take-all e-i network can reproduce the activity profiles and therefore might be a viable candidate for further study. While well written, I find that there are a significant number of potential issues that should be clarified. Primarily, I have three main concerns: 1) The data processing seems to have the potential to distort features that may be important for this analysis (including missed detections and dynamic range), 2) The analysis jumps right to e-i network interactions, while there seems to be a much simpler, and more general explanation that seems like it could describe their observations (which has to do with the way they are averaging neurons), and 3) that the relationship between the neural and behavioral data could be further clarified by accounting for the lop-sidedness of the data statistics. I have included more details about my concerns below.

      Main points:

      1) Limits of calcium imaging: There is a large uncertainty that is not accounted for in dealing with smaller events. In particular there are a number of studies now, both using paired electro-physiology and imaging [R1] and biophysical simulations [R2], that show that small neural events are often not visible in the calcium signal. Moreover, this problem may be exacerbated by the fact that the imaging is at 3Hz, much lower than the more typical 10-30Hz imaging speeds. The effects of this missing data should be accounted for, as it could be a potential source of large errors in estimating the neural activity distributions.

      R3a: We appreciate the concern here and agree that event size statistics could in principle be biased in some systematic way due to missed spikes due to deconvolution of Ca signals. To directly test this possibility, we performed a new analysis of spike data recorded with high time resolution electrophysiology. We began with a forward-modeling process to create a low-time-resolution, Ca-like signal, using the same deconvolution algorithm (OASIS) that was used to generate the data we analyzed in our work here. In agreement with the reviewer’s concern, we found that spikes were sometimes missed, but the loss was not extreme and did not impact the neural event size statistics in a significant way compared to the ground truth we obtained directly from the original spike data (with no loss of spikes). This new work is now described in a new paragraph at the end of the subsection of results related to Fig 3 and in a new Fig S3. The new paragraph says…

      Two concerns with the data analyzed here are that it was sampled at a slow time scale (3 Hz frame rate) and that the deconvolution methods used to obtain the data here from the raw GCaMP6s Ca imaging signals are likely to miss some activity (Huang et al., 2021). Since our analysis of neural events hinges on summing up activity across neurons, could it be that the missed activity creates systematic biases in our observed event size statistics? To address this question, we analyzed some time-resolved spike data (Neuropixel recording from Stringer et al 2019). Starting from the spike data, we created a slow signal, similar to that we analyzed here, by convolving with a Ca-transient, downsampling, deconvolving, and z-scoring (Fig S3). We compared neural event size distributions to “ground truth” based on the original spike data (with no loss of spikes) and found that the neural event size distributions were very similar, with the same exponent and same power-law range (Fig S3). Thus, we conclude that our reported neural event size distributions are reliable.

      However, although loss of spikes did not impact the event size distributions much, the time-scale of measurement did matter. As discussed above and shown in Fig S4, changing from 5 ms time resolution to 330 ms time resolution does change the exponent and the range of the power law. However, in the test data set we worked with, the existence of a power law was robust across time scales.
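
The forward-modeling steps described above (convolve with a Ca transient, downsample, z-score) can be sketched as follows. This is a simplified illustration under stated assumptions: the exponential kernel, its 1 s decay time, the 2 Hz toy firing rate, and the function name `calcium_like_signal` are hypothetical, and the deconvolution step that the authors performed with OASIS is omitted here.

```python
import numpy as np

def calcium_like_signal(spike_counts, dt, decay_time, frame_dt):
    """Convolve binned spikes with an exponential transient, average into frames, z-score."""
    kernel_t = np.arange(0.0, 5.0 * decay_time, dt)
    kernel = np.exp(-kernel_t / decay_time)              # assumed Ca-transient shape
    fluor = np.convolve(spike_counts, kernel)[:len(spike_counts)]
    bins_per_frame = int(round(frame_dt / dt))
    n_frames = len(fluor) // bins_per_frame
    # Downsample by averaging all fine bins that fall inside one imaging frame.
    frames = fluor[:n_frames * bins_per_frame].reshape(n_frames, bins_per_frame).mean(axis=1)
    return (frames - frames.mean()) / frames.std()

rng = np.random.default_rng(1)
dt = 0.005                                               # 5 ms spike bins
spikes = rng.poisson(2.0 * dt, size=60000)               # toy 2 Hz Poisson neuron, 300 s
slow = calcium_like_signal(spikes, dt=dt, decay_time=1.0, frame_dt=0.33)
```

Applying an event-size analysis to both `spikes` and `slow` would then allow the kind of comparison against ground truth that the paragraph above describes.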

      2) Correlations and power-laws in subsets. I have a number of concerns with how neurons are selected and partitioned to achieve scale-free dynamics. 2a) First, it's unclear why the averaging is required in the first place. This operation projects the entire population down in an incredibly lossy way and removes much of the complexity of the population activity.

      R3b: Our population averaging approach is motivated by theoretical predictions and previous work. According to established theoretical accounts of scale-free population events (i.e. non-equilibrium critical phenomena in neural systems) such population-summed event sizes should have power law statistics if the system is near a critical point. This approach has been used in many previous studies of scale-free neural activity (e.g. all of those cited in the introduction in relation to scale-free neuronal avalanches). One of the main results of our study is that the existing theories and models of critical dynamics in neural systems fail to account for small subsets of neurons with scale-free activity amid a larger population that does not conform to these statistics. We could not make this conclusion if we did not test the predictions of those existing theories and models.

      2b) Second, the authors state that it is highly curious that subsets of the population exhibit power laws while the entire population does not. While the discussion and hypothesizing about different e-i interactions is interesting I believe that there's a discussion to be had on a much more basic level of whether there are topology independent explanations, such as basic distributions of correlations between neurons that can explain the subnetwork averaging. Specifically, if the correlation to any given neuron falls off, e.g., with an exponential falloff (i.e., a Gaussian Process type covariance between neurons), it seems that similar effects should hold. This type of effect can be easily tested by generating null distributions using code bases such as [R3]. I believe that this is an important point, since local (broadly defined) correlations of neurons implying the observed subnetwork behavior means that many mechanisms that have local correlations but don't cluster in any meaningful way could also be responsible for the local averaging effect.

      R3c: We appreciate the reviewer’s effort, trying out some code to generate a statistical model. We agree that we could create such a statistical model that describes the observed distribution of pairwise correlations among neurons. For instance, it would be trivial to directly measure the covariance matrix, mean activities, and autocorrelations of the experimental data, which would, of course, provide a very good statistical description of the data. It would also be simple to generate more approximate statistical descriptions of the data, using multivariate gaussians, similar to the code suggested by the reviewer. However, we emphasize, this would not meet the goal of our modeling effort, which is mechanistic, not statistical. The aim of our model was to identify a possible biophysical mechanism from which emerge certain observed statistical features of the data. We feel that a statistical model is not a suitable strategy to meet this aim. Nonetheless, we agree with the reviewer that clusters with sharp boundaries (like the distinction between e+ and e- in our model) are not necessary to reproduce the cancelation of anticorrelated neurons. In other words, we agree that sharp boundaries of the e+ and e- groups of our model are not crucial ingredients to match our observations.

      2c) In general, the discussion of "two networks" seems like it relies on the correlation plot of Figure 7B. The decay away from the peak correlation is sharp, but there does not seem to be significant clustering in the anti-correlation population, instead a very slow decay away from zero. The authors do not show evidence of clustering in the neurons, nor any biophysical reason why e and i neurons are present in the imaging data.

      R3d: First a small reminder: As stated in the paper, the data here is only showing activity of excitatory neurons. Inhibitory neurons are certainly present in V1, but they are not recorded in this data set. Thus we interpret our e+ and e- groups as two subsets of anticorrelated excitatory neurons, like those we observed in the experimental data. We agree that our simplified model treats the anticorrelated subsets as if they are clustered, but this clustering is certainly not required for any of the data analyses of experimental data. We expect that our model could be improved to allow for a less sharp boundary between e+ and e- groups, but we leave that for future work, because it is not essential to most of the results in the paper. This limitation of the model is now stated clearly in the last paragraph of the model subsection.

      The alternative explanation (as mentioned in (b)) is that there is a more continuous set of correlations among the neurons with the same result. In fact I tested this myself using [R3] to generate some data with the desired statistics, and the distribution of events seems to also describe this same observation. Obviously, the full test would need to use the same event identification code, and so I believe that it is quite important that the authors consider the much more generic explanation for the sub-network averaging effect.

      R3e: As discussed above, we respectfully disagree that a statistical model is an acceptable replacement for a mechanistic model, since we are seeking to understand possible biophysical mechanisms. A statistical model is agnostic about mechanisms. We have nothing against statistical models, but in this case, they would not serve our goals.

      To emphasize our point about the inadequacy of a statistical model for our goals, consider the following argument. Imagine we directly computed the mean activities, covariance matrix, and autocorrelations of all 10000 neurons from the real data. Then, we would have in hand an excellent statistical model of the data. We could then create a surrogate data set by drawing random numbers from a multivariate gaussian with the same statistical description (e.g. using code like that offered by reviewer 3). This would, by construction, result in the same numbers of correlated and anticorrelated surrogate neurons. But what would this tell us about the biophysical mechanisms that might underlie these observations? Nothing, in our opinion.
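
The surrogate construction in this thought experiment can be sketched as follows. This is a toy illustration, not the reviewer's code base [R3] and not the authors' analysis: the function name `gaussian_surrogate` and the four-neuron toy data are assumptions, and the sketch matches only means and covariances, not autocorrelations.

```python
import numpy as np

def gaussian_surrogate(activity, seed=0):
    """Draw surrogate activity (time x neurons) from a multivariate Gaussian with
    the same mean vector and covariance matrix as the input data."""
    rng = np.random.default_rng(seed)
    mean = activity.mean(axis=0)
    cov = np.cov(activity, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=activity.shape[0])

# Toy data: two neurons follow a shared latent signal, two follow its negative.
rng = np.random.default_rng(2)
latent = rng.standard_normal(5000)
toy = np.column_stack([latent, latent, -latent, -latent])
toy = toy + 0.5 * rng.standard_normal((5000, 4))

surrogate = gaussian_surrogate(toy)
corr = np.corrcoef(surrogate, rowvar=False)   # pairwise correlations of the surrogate
```

The surrogate reproduces the correlated and anticorrelated pairs by construction, and nothing in the procedure refers to how those correlations arise, which is the point the argument above makes.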

      2d) Another important aspect here is how single neurons behave. I didn't catch if single neurons were stated to exhibit a power law. If they do, then that would help in that there are different limiting behaviors to the averaging that pass through the observed stated numbers. If not, then there is an additional oddity that one must average neurons at all to obtain a power law.

      R3f: We understand that our approach may seem odd from the point of view of a central-limit-theorem-type argument. However, as mentioned above (reply R3b) and in our paper, there is a well-established history of theory and corresponding experimental tests for power-law distributed population events in neural systems near criticality. The prediction from theory is that the population summed activity will have power-law distributed events or fluctuations. That is the prediction that motivates our approach. In these theories, it is certainly not necessary that individual neurons have power-law fluctuations on their own. In most previous theories, it is necessary to consider the collective activity of many neurons before the power-law statistics become apparent, because each individual neuron contributes only a small part to the emergent, collective fluctuations. This phenomenon does not require that each individual neuron have power-law fluctuations.

      At the risk of being pedantic, we feel obliged to point out that one cannot understand the peculiar scale-free statistics that occur at criticality by considering the behavior of individual elements of the system; hence the notion that critical phenomena are “emergent”. This important fact is not trivial and is, for example, why there was a Nobel prize awarded in physics for developing theoretical understanding of critical phenomena.

      3) There is something that seems off about the range of \beta values inferred with the ranges of \tau and \alpha. With \tau in [0.9,1.1], then the denominator 1-\tau is in [-0.1, 0.1], which the authors state means that \beta (found to be in [2,2.4]) is not near \beta_{crackling} = (\alpha-1)/(1-\tau). It seems as if the opposite is true, as the range of possible values of \beta_{crackling} is huge due to the denominator, and so \beta is in the range of possible \beta_{crackling} almost vacuously. Was this statement just poorly worded?

      R3g: The point here is that the theory of crackling noise predicts that the fit value of beta should be equal to (1-alpha)/(1-tau). In other words, a confirmation of the theory would have all the points on the unity line in the rightmost panels of Fig 9D and 9E, not scattered by more than an order of magnitude around the unity line. (We now state this explicitly in the text where Fig 9 is discussed.) Broad scatter around the unity line means the theory prediction did not hold. This is well established in previous studies of scale-free brain dynamics and crackling noise theory (see for example Ma et al Neuron 2019, Shew et al Nature Physics 2015, Friedman et al PRL 2012). A clearer single example of the failure of the theory to predict beta is shown in Fig 5A,B, and C.
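
The comparison between fitted and predicted beta can be sketched as a small check. This is an illustration only: the function names, the 10% tolerance, and the numerical exponents below are hypothetical and are not values from the paper; only the relation beta = (1-alpha)/(1-tau) comes from the reply above.

```python
def predicted_beta(alpha, tau):
    """Crackling-noise prediction for beta, as stated in the reply: (1 - alpha) / (1 - tau)."""
    return (1.0 - alpha) / (1.0 - tau)   # undefined when tau == 1

def agrees_with_theory(beta_fit, alpha, tau, tolerance=0.1):
    """True when the fitted beta lies within a relative tolerance of the prediction."""
    beta_pred = predicted_beta(alpha, tau)
    return abs(beta_fit - beta_pred) <= tolerance * abs(beta_pred)

# Hypothetical exponents for illustration.
on_unity_line = agrees_with_theory(beta_fit=2.1, alpha=1.5, tau=1.25)
off_unity_line = agrees_with_theory(beta_fit=2.1, alpha=1.5, tau=1.02)
```

Because the denominator shrinks as tau approaches 1, small changes in tau move the prediction by large factors, so the test of the theory is whether each data set's fitted beta matches its own prediction, which corresponds to points lying on the unity line.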

      4) Connection between brain and behavior:

      4a) It is not clear if there is more to what the authors are trying to say with the specifics of the scale free fits for behavior. From what I can see those results are used to motivate the neural studies, but aside from that the details of those ranges don't seem to come up again.

      R3h: The reviewer is correct, the primary point in Fig 2 is that scale-free behavioral statistics often exist. Beyond this point about existence, reporting of the specific exponents and ranges is just standard practice for this kind of analysis; a natural question to ask after claiming that we find scale-free behavior is “what are the exponents and ranges”. We would be remiss not to report those numbers.

      4b) The primary connection between neuronal and behavioral activity seems to be Figure 4. The distribution of points in these plots seems to be very lopsided, in that some plots have large ranges of few-to-no data points. It would be very helpful to get a sense of the distribution of points, which are a bit hard to see given the overlapping points and super-imposed lines.

      R3i: We agree that this whitespace in the figure panels is somewhat awkward, but we chose to keep the horizontal axis the same for all panels of Fig 4B, because this shows that not all behaviors, and not all animals, had the same range of behavioral correlations. We felt that hiding this would be a bit misleading, so we kept the white space.

      4c) Neural activity correlated with some behavior variables can sometimes be the most active subset of neurons. This could potentially skew the maximum sizes of events and give behaviorally correlated subsets an unfair advantage in terms of the scale-free range.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, the authors are trying to ascertain how emigrated SVZ cells can be beneficial - via neuroreplacement or neuroprotection. They provide evidence for the latter and also show that it is primarily precursors and not differentiated cells that migrate to photo-thrombotic cortical models of stroke.

      The writing is lucid and the flow of the experiments logical. The images and quality of data are high and the depth of investigation appropriate (eg 100 cells examined per marker in Figure 1). The methods are clearly described. They appropriately control for changes in cortical lesion size. The photo-thrombotic lesion is a good choice in terms of controlling lesion placement and size.

      A distinctive advantage of this paper is they show that reducing SVZ cytogenesis in the stroke model diminishes recovery, especially behavioural (single seed reaching behavior). This essential experiment has been remarkably under-utilized in the field.

      The 2-photon imaging of dendritic spines after stroke combined with multi-exposure speckle imaging is a technical tour-de-force especially since they combine it with ganciclovir-induced loss of cytogenesis and behavioural assays. Importantly, they show that SVZ cells are needed for full spine plasticity.

      They are correct to examine the SVZ response in aging as it diminishes dramatically in animal models but in humans is associated with more strokes. As expected, they show reduced SVZ proliferation after stroke. This was associated with significantly worse performance in the seed-reaching task and depleting SVZ precursors with ganciclovir did not make it worse.

      The viral VEGF delivery rescue experiment is fantastic. Behavior, blood vessel growth, and spine density are all rescued.

      The idea that SVZ cells are beneficial via mechanisms other than cell replacement is not really that new. For example, neural stem cells from the SVZ have been shown to reduce inflammation and thereby be neuroprotective as the authors themselves acknowledge and cite (Pluchino et al., 2005).

      The fact that it is primarily precursor cells that migrate towards the stroke does not mean that cell replacement does not occur. The precursors could gradually differentiate (even after 6 weeks post-injury) into more mature cells that do replace cells lost to injury. Also, the two events are not mutually exclusive.

      Our findings indicate that there is no appreciable differentiation of SVZ-derived cells up to 6 weeks after stroke. By this time, we find complete recovery of behavioral deficits. While it is conceivable that cells may differentiate after this timepoint, such a phenomenon would not be contributing to recovery.

      Overall this is an interesting addition to the literature and methodologically it is quite strong. It is sure to generate follow on studies showing how different growth factors may be secreted by SVZ cells in various models of neurological disease.

      Reviewer #3 (Public Review):

      Williamson et al. have investigated the role of cells derived from a neural stem cell (NSC) region of the adult mouse brain called the subventricular zone (SVZ) in a model of stroke. The authors labeled SVZ cells with Nestin-CreER and the Ai14 (tdTomato) reporter, induced cortical infarcts 4 weeks later, then analyzed brains 2 weeks thereafter. Most of the tdTomato+ cells in the peri-infarct regions were not neurons but less differentiated neural precursor cells. They then ablated proliferating NSCs in the SVZ with GFAP-TK mice and ganciclovir (GCV) administration, and this reduced SVZ-derived peri-stroke cells and impaired motor recovery. Older mice have less proliferation in the SVZ, and these older mice have fewer peri-infarct SVZ-derived cells and worse recovery than younger mice. Using multi-exposure speckle imaging (MESI) and 2 photon imaging, the authors found that ablation of proliferating SVZ cells reduced vascular remodeling and synaptic turnover in peri-infarct areas. Immunohistochemical analysis revealed the expression of VEGF, BDNF, GDNF, and FGF2. The authors selected VEGF for functional studies, conditionally knocking out VEGF in SVZ cells and finding that this reduced recovery and neuronal spine density. Finally, the authors expressed VEGF by AAV vectors in mice with ablated SVZ, finding that VEGF could improve repair and recovery after stroke.

      The results presented in the paper support some of the authors' general conclusions and may be of interest to investigators of adult mouse SVZ. The use of genetic labels for lineage analysis and studies of VEGF conditional knockout in SVZ cells are technical strengths of the study. The results support the idea that VEGF in SVZ cells is important for recovery from stroke in younger adult mice. However, the impact of the work may be somewhat limited, as outlined below.

      1. It is already well known that VEGF is an important aspect of stroke recovery (at least in rodent models), and that ectopic expression of VEGF can be beneficial. Showing that some of the VEGF in peri-stroke regions might come from SVZ-derived cells would be a relatively incremental discovery.

      We disagree. In our view, the identification of SVZ-derived cells as a major cellular source of an important trophic factor for recovery is itself an important finding. We also demonstrate that VEGF produced by this cell population is necessary for effective neural repair and recovery, while replacement of VEGF is sufficient to induce repair and recovery in mice lacking this cell population. Moreover, these findings provide a compelling explanation for the worsening of recovery and diminishment of repair that occurs with age (i.e., loss of VEGF signaling from the neural stem cell lineage). Finally, the demonstration that replacing VEGF rescues deficits that accompany loss of SVZ stem cells provides rationale for the replacement of neural stem cell lineage factors as a potential treatment.

      The molecular mechanism aside, our main goal was to understand the function of SVZ cytogenesis in stroke recovery. Our findings that 1) the majority of cells arising from the SVZ after stroke remain in an undifferentiated state, 2) these cells facilitate neuronal and vascular reparative processes in order to promote recovery, and 3) very few new neurons are produced during the recovery phase, provide a new and unexpected understanding of the purpose of post-injury cytogenesis. The dogma of past literature is that neural stem cells produce new neurons that mature and integrate into damaged circuits after injury. An implication of this dogma is that neuronal replacement after stroke is an important treatment target. Accordingly, substantial effort has been devoted to developing cell transplantation and conversion treatment strategies to create new neurons. Our study reframes the function of newborn cells in stroke recovery and provides a compelling rationale against treatment strategies aimed at replacing neurons, instead demonstrating that trophic factor mediated repair and remodeling of spared tissue is sufficient for profound recovery of function. To emphasize these important findings, we have expanded our discussion (lines 338-378).

      2. Furthermore, while it seems clear that the VEGF conditional knockout (VEGF-cKO) in SVZ cells reduces behavioral recovery and certain histological measures, it is not clear that these impairments are due to a lack of VEGF delivery from the SVZ cells. It is possible that VEGF-cKO changed the proportion of SVZ cells that arrive in the peri-stroke region. It is also possible that VEGF-cKO makes these cells impaired in the expression of other trophic factors.

      We disagree with this interpretation. We used a cell type-specific, inducible knockout of VEGF to examine the function of VEGF produced by the adult neural stem cell lineage after stroke. We show in Figure 6E that numbers of lineage traced cells are not different between control and cKO mice. We have added new data, as Figure 6 – figure supplement 2, showing that the proportion of Ascl1-expressing cells is not different between groups, indicating that there is no change in the amount of differentiation. We have also added staining demonstrating that VEGF cKO cells still express GDNF, BDNF, and FGF-2 (Figure 6 – figure supplement 2). Notably, we did not detect a decrease in these three proteins in peri-infarct regions in mice in which neural stem cells were ablated, suggesting that while SVZ-derived cells do produce them, their production is small relative to other cell types. VEGF is thus unique in that large quantities of it are produced by SVZ-derived cells. We also provide evidence for direct effects of VEGF on other cell types (rather than cell-autonomous effects of VEGF regulating other factors in SVZ-derived cells that then act on other cell types) since restoring VEGF in mice with ablated neural stem cells rescued repair and recovery. Importantly, even if VEGF cKO led to perturbed expression of some other proteins, our conclusion would still be that VEGF produced by SVZ-derived cells is crucial for promoting repair and recovery.

      3. The cytogenic response to stroke was not characterized in much detail at the cellular level. Essentially only one time point (2 weeks) was selected for immunohistochemistry (Fig. 1), and so the dynamics of this response cannot be evaluated. Does the proportion of cell types change over time? Are migratory cells more homogeneous and then diversify after arrival to the peri-stroke region? At longer time points, do these SVZ-derived cells still exist? Such an analysis is important to the story since the behavior was evaluated at a range of time points (3-28 days after stroke), and recovery was noted as early as 7 days. Are SVZ-derived cells already at the peri-stroke area after 7 days? If they are not already there, then how would the recovery be explained? The behavioral recovery also continues to improve at 28 days; are SVZ-derived cells still present in large numbers at that time? How would the authors explain continued recovery if the SVZ-derived cell population drops away after 2 weeks?

      Thank you for these suggestions – we agree that these are important points to address. In our original submission we provided evidence that there is no significant differentiation into neurons even 6 weeks after stroke. We have added additional data (Figure 1-figure supplement 2) in which we assessed expression of a range of markers at 6 weeks post-stroke, a time by which recovery is typically complete in this model. These new data show that cell type distribution of lineage traced cells at 6 weeks is highly similar to the 2-week timepoint and show that little differentiation occurs during the course of recovery. We have also included new data quantifying numbers of SVZ-derived cells in peri-infarct cortex at 1, 2, and 6 weeks post-stroke. These data show that lineage traced cells are abundant by day 7 and numbers increase progressively to 6 weeks, suggesting that these cells are present when we initially detect functional improvement, survive, and are continuously produced during the course of recovery. Finally, we have added data showing that the proportion of Ascl1-expressing cells does not change from 1 to 6 weeks, which is consistent with the idea that there are no dynamic changes in cell phenotype during recovery.

      1. The SVZ-derived peri-stroke cells were not characterized in much detail at the molecular/transcriptomic level. The authors studied 4 trophic factors by antibody staining, but there are many other potential genes that may contribute to the effect. Transcriptomic analyses of SVZ-derived peri-stroke cells (e.g., by single-cell RNA-seq) may provide deeper insights into potential mechanisms.

      We acknowledge that single-cell RNA sequencing of SVZ-derived cells may reveal other interesting molecular mechanisms, but such a study could easily stand on its own. There are also technical limitations, such as the population of cells being relatively small, that would create difficulty in generating such a dataset. Instead, we focused on protein-level expression of a small cohort of factors that could be potentially involved based on our initial findings and past literature. In particular, we focused on examining proteins known to potently drive angiogenesis, axon outgrowth, and/or synapse formation given our findings of deficient vascular and synaptic repair in mice lacking cytogenesis. Even if single cell sequencing provided new molecular targets, the ensuing workflow would mirror what we have done in our study (validation of protein level expression, loss-of-function manipulation, gain-of-function manipulation). The magnitude of deficits in VEGF cKO mice did not completely match that seen in mice in which neural stem cells were ablated, making it likely that SVZ-derived cells also contribute to recovery by other mechanisms. We have added to the discussion: “Importantly, these past studies have identified an array of factors produced by precursor cells depending on context. It is possible that multiple factors produced by SVZ-derived cells promote recovery after stroke. This is suggested by our finding that recovery is worse in mice with ablated neural stem cells compared with VEGF cKO mice. Thus, future studies could examine other molecular targets. These efforts could be aided by techniques such as single-cell RNA sequencing.” (lines 338 - 343).

      1. The significance of this work for understanding stroke in human patients is unclear since the adult human brain SVZ is essentially devoid of neurogenic stem cells. Thus, although some of the observations in this paper are interesting, the cytogenic response to stroke described here may not occur in human patients.

      We disagree for several reasons. First, while there is an ongoing debate on whether neurogenesis or neural stem cells persist in adult humans, this debate has not been resolved. At least in our opinion, the preponderance of evidence is in support of persistent cytogenesis and neural stem cells in the adult human SVZ due to convincing data across many studies (e.g., PMID: 10328940, 9809557, 14973487, 11333968, 10870078, 24561062). While it appears that neural stem cell proliferation declines with aging, as in rodents, there is evidence of increased SVZ proliferation and cytogenesis in response to stroke in adult and elderly humans (e.g., PMID: 20054008, 16924107, 17167100, 20104652). Thus, although it is exceedingly difficult to study in humans, it is likely that neural stem cells persist in the adult human brain and can respond to injury by producing new cells. One reason for the somewhat sparse evidence of post-stroke cytogenesis in humans may be that the focus of past studies has been on finding new neurons. Importantly, our study demonstrates that other cell types, especially undifferentiated precursors, arise from the SVZ after stroke in far greater numbers than neurons, which may spur further examination of the phenomenon in humans.

      Second, while stroke incidence increases with age, stroke is not uncommon in the young. Moreover, incidence of stroke in the young is increasing (PMID: 32015089). It is generally accepted that young humans have intact neural stem cells, and the phenomenon we describe in our study shows clear benefits of SVZ cytogenesis in young mice.

      Third, our study provides evidence that the “neurogenic” capacity of neural stem cells may not be important for the beneficial functions of cytogenesis after injury. The overwhelming majority of studies in humans and mice have focused on “neurogenesis”. Our study demonstrates that undifferentiated precursors constitute the majority of SVZ-derived cells after stroke and identifies Ascl1 as a marker of them, which may be useful for identifying these cells in humans.

      Fourth, if neural stem cell numbers are substantially reduced in aged humans, as in rodents, our study provides a clear rationale for the development of treatments to restore stem cell numbers/activation or limit their decline with aging.

      Fifth, our study not only identifies VEGF as a mechanism by which SVZ-derived cells promote repair and recovery after stroke, but also demonstrates that replacing VEGF is sufficient to improve repair and recovery in mice lacking neural stem cells. Thus, even if the argument that cytogenesis does not occur in adult or elderly humans is true, our study shows that identification and replacement of factors produced by the neural stem cell lineage, such as VEGF, could be a reasonable treatment strategy with clear translational potential.

      In order to more clearly state these points in the manuscript, we have expanded our discussion of them.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Scalabrino et al. show persistent cone-mediated RGC signaling despite changes in cone morphology and density with rod degeneration in the CNGB1 mouse model of retinitis pigmentosa. The authors use a linear-nonlinear receptive field model to measure functional changes (spatial and temporal filters and gain) across the RGC populations with space-time separable receptive fields. At mesopic and photopic conditions, receptive field changes were minor until rod death exceeded 50%, while response gain decreased with photoreceptor degeneration. Using information theory, the authors evaluated the fidelity of RGC signaling and demonstrated that mutual information decreased with rod loss, but cone-mediated RGC signaling was relatively stable and was more robust for natural movies than for artificial stimuli. This work reveals the preservation of cone function and a robustness in encoding natural movies across degeneration. This manuscript is the first demonstration of using information theory to evaluate the effects of neural degeneration on sensory coding. The study uses a systematic evaluation of rod and cone function in this model of rod degeneration to make the following findings: (1) cone function persists for 5-7 months, (2) spatial and temporal changes to the ganglion cell receptive fields were not monotonic with time, (3) mutual information between spikes and photopic stimuli remained relatively constant up to 3-5 months, and (4) information rates were higher for natural movies than for checkerboard noise stimuli.

      The strengths of this paper include the following:

      A systematic evaluation of potentially confusing data. The authors do an excellent job of organizing the results in terms of light levels and time points. The results themselves are confusing and difficult to compare across metrics, but the data are presented as clearly as possible. The work is especially well executed and presented.

      The insight that cone responses remain relatively stable despite rod loss. The study clearly demonstrates that despite cone loss and morphological changes, cone-mediated responses remain robust and functional.

      The application of information theory to degeneration is the first of its kind and the study clearly shows the utility of the metric.

      The results are thoughtfully interpreted.

      We thank the reviewer for these comments.

      The weaknesses of this study include the following:

      The inability to follow the same ganglion cell types over time is a major weakness that could confound the interpretation in terms of whether the changes are happening from artifacts of the recording method or from dynamic changes in the pooled population of ganglion cells. Is there even a single cell class, for example the ON-OFF direction-selective ganglion cells, that this group has so well quantified on the MEA, that the study could track over time, in addition to examining the pooled population changes over time? Tracking a single cell type for each of the metrics would make the population data more convincing or could clearly show that not all ganglion cells follow the population trend.

      As suggested by the reviewer, we have added a cell type that is tracked through all the analyses: ON brisk sustained RGCs. Example receptive field mosaics, temporal receptive fields, and spike train autocorrelation functions for WT and 4M Cngb1neo/neo animals are shown in Figure 2-figure supplement 1E-F. These RGCs follow the trends displayed by the larger populations of RGCs in each analysis. We chose this cell type because they are readily identified by their spike train autocorrelation functions compared to other RGC types and they have approximately space-time separable receptive fields (RFs). There are many text changes associated with adding an analysis of the ON brisk sustained RGCs (see lines 202-207; 227-229; 264-267, etc.).

      We chose not to focus on direction selective RGCs because we are analyzing the spatial and temporal RFs of RGCs in Figures 3-5 and direction-selective RGCs do not have space-time separable RFs (see example in Figure 2C-D). Thus, those cells could not be used to track those receptive field properties across degeneration. Also, we did not collect responses to drifting gratings or bar responses across a range of speeds or contrasts, so we are unable to reliably distinguish the different types of direction-selective RGCs (e.g., ON vs ON-OFF) from these data.

      While the non-monotonic changes are interesting, they are also difficult to make sense of. Can the authors speculate in the Discussion about what underlying mechanisms could give rise to the non-monotonic changes? In the absence of potential mechanisms, the concern of recording artifacts arises.

      Thank you for raising this point. We have added some speculation for the cause of these non-monotonic changes in the Discussion (lines 455-462). “While we do not know why non-monotonic changes are occurring for some RF properties, they largely occurred in the 3-5M range. During this time, there is a transient decrease in the rate of rod death (4-5M) and cone death begins (Figure 1). Consequently, there may be complex changes to retinal circuitry as the retina reacts to a temporary stabilization in rod numbers and an acceleration in cone death. Intracellular studies of the light-driven synaptic currents impinging onto bipolar cells and RGCs during this time will be important for understanding the origin of these non-monotonic changes in RF properties.”

      The mutual information calculation seems to be correlated with the spike rate despite the argument made in Fig 10E-F. Can the authors show this directly by calculating the bits per spike in Figures 8 and 9? Of all the metrics, the gain function and the mutual information seem to be more consistent with each other. Can the authors demonstrate or refute a connection between the spike rate and information rates?

      We added a supplementary figure to each of the information figures (see Figures 8-10, figure supplement 1) showing that the trends hold after dividing the information rate by the spike rate. Certainly, changing spike rates are contributing, but there are also clear changes in the bits/spike plots (Figure 8-figure supplement 1D; Figure 9-figure supplement 1D; Figure 10-figure supplement 1D).
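
      The normalization described above (information rate divided by spike rate) can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study; the function name and the example values are hypothetical.

```python
def bits_per_spike(info_rate_bits_per_s, spike_rate_hz):
    """Normalize an information rate (bits/s) by the mean spike rate (spikes/s)."""
    if spike_rate_hz <= 0:
        raise ValueError("spike rate must be positive")
    return info_rate_bits_per_s / spike_rate_hz


# Hypothetical cells: the second conveys fewer bits per second overall,
# but each of its spikes carries more information.
high_rate_cell = bits_per_spike(12.0, 6.0)  # 2.0 bits/spike
low_rate_cell = bits_per_spike(6.0, 2.0)    # 3.0 bits/spike
```

      Plotting both quantities side by side separates changes in how often a cell fires from changes in how informative each spike is.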

      Can the authors provide an explanation for why the mutual information calculation remains stable despite lower SNR and lower gain, especially after the contributions of oscillations have been ruled out?

      The mutual information depends more strongly on the precision of spiking (both in terms of time and spike number within a small time bin) than the mean spike rate (averaged over the stimulus). Diminishing the total number of spikes (because of reduced gain) will have a relatively small effect on the information rate if the spike trains continue to exhibit low variability (high precision). Indeed, spike generation by RGCs is distinctly sub-Poisson (Berry, Warland, and Meister 1997), indicating it can exhibit relatively high information rates even when spike rates are relatively low. We clarified this in Results at lines 493-496.
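
      One common way to quantify the spike-count precision referred to above is the Fano factor (variance of the spike count across repeated trials divided by its mean), which equals 1 for a Poisson process and is below 1 for sub-Poisson firing. The sketch below is illustrative only and is not taken from the study; the function name and the spike counts are hypothetical.

```python
from statistics import mean, variance


def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across repeated trials.

    Values below 1 indicate sub-Poisson (more regular than Poisson) firing.
    """
    m = mean(spike_counts)
    if m == 0:
        raise ValueError("mean spike count must be positive")
    return variance(spike_counts) / m


# Hypothetical spike counts in one time bin over six stimulus repeats.
precise_cell = [3, 4, 3, 4, 3, 4]   # few spikes, but highly repeatable
variable_cell = [0, 8, 1, 7, 0, 8]  # similar mean count, far less repeatable

precise_ff = fano_factor(precise_cell)    # below 1: sub-Poisson
variable_ff = fano_factor(variable_cell)  # above 1: over-dispersed
```

      Under these assumptions, a cell can fire few spikes and still have a low Fano factor, which is the regime in which reduced gain need not substantially reduce the information rate.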

      Lack of age-matched WT controls to accompany the different time points. It is known that photoreceptor degeneration can occur naturally in WT mice. Though the authors have used controls pooled from across the ages used in the CNGB1 mutants, it would be informative to know if there are age-dependent changes in any of the metrics for WT mice.

      WT recordings were pooled from retinas of littermate control mice between 2 and 7 months of age (n=3 at 2M; n=1 each at 4M, 6M, and 7M). We have added data points from individual retinal recordings to the figure supplements for Figures 2-6 and 8-10 to illustrate the consistency between these recordings, which allowed us to confidently pool the results.

      Can the authors elaborate on why cone function persists despite the rod loss and morphological changes? This is unique for other models of rod loss and is worth extra discussion.

      This is something we are also very interested in, but outside the scope of this study. The Sampath Lab (co-author and collaborator) has data from single cell recordings in late stage rd10 retinas that show abnormal cone signaling (and structure similar to the 7M Cngb1neo/neo cones), yet relatively normal cone bipolar cell and horizontal cell responses. Thus, somehow there is either compensation or a high level of redundancy in the transmission of signals from cones to 2nd-order neurons that makes the responses of the 2nd-order neurons robust to deteriorating cone function. These results suggest our observations in Cngb1neo/neo mice are not unique to this model of RP. Future experiments are needed to understand how this compensation is occurring.

      Reviewer #2 (Public Review):

      In this study, the authors assess the decline of retinal function in a mouse model of slow photoreceptor degeneration - the Cngb1neo/neo. Rod loss occurs between 1-7 months and complete cone loss occurs by 8-9 months. The authors characterize cone loss in the first 7 months and find that 70% of cones are still there at 7 months, though their outer segments are highly degraded. They then use MEA recordings to characterize retinal function using a variety of measures. First, they use spike-triggered averaging to determine the spatial and temporal receptive fields, restricting this analysis to RGCs that have separable spatial and temporal receptive fields. They find that both rod and cone receptive fields are surprisingly intact over the first 5 months, identifying primarily a reduction in contrast response functions (and a reduction in the number of rods that are light responsive-though this is not quantified). Second, they show that oscillatory activity does not appear until after photoreceptors are completely deteriorated-in sharp contrast to other PR degeneration models (e.g. rd10) in which oscillatory activity appears while there are still light-evoked responses. Third, they use information theory to assess the reliability of signaling. When examining the 10% of RGCs with the highest information rates they see a significant decrease at mesopic light levels, while information rates were mostly stable at photopic light levels. Finally, they showed that at photopic light levels, the mutant retinas conveyed more information about natural movies than a repeating checkerboard, and this was maintained across light levels.

      My primary question is whether this represents a significant advance. There have been many studies regarding the changing retinal circuits in various rodent models of photoreceptor degeneration. The authors make a few arguments regarding the uniqueness of this study.

      One is that this is a novel analysis that is not limited to particular cell types but rather characterizes the retina as a "whole". But this point is also its weakness. First, one cannot speak to the retina as a "whole" since they state that there is a reduction in the number of light-responsive cells across degeneration - yet they do not quantify it. This seems incredibly important to know because, even presuming the remaining cells have perfect receptive field structure, if only 10% of cells are left, assessing the receptive fields of only the remaining cells is clearly not a characterization of the retention of visual function.

      We never claim that we have assessed the “retina as a whole”. We do state that we are measuring certain features of RGC signaling that reflect the “net changes” induced by photoreceptor degeneration (e.g., changes in photoreceptor function, retinal rewiring, homeostatic mechanisms, etc.) on those features. In fact, we are explicit that we are only measuring certain RF properties in certain RGC types, such as the linear spatial and temporal RFs in cells with space-time separable RFs: Figure 2 makes this point explicitly. We do not measure changes in direction-selectivity, object motion sensitivity, orientation selectivity, edge detection, looming detection, luminance encoding, chromatic opponency, contrast adaptation, motion reversal signaling, etc., because doing so would produce a manuscript with at least one figure for every RGC type (e.g., 45 figures). This would clearly be an unreasonable amount for a single study.

      We agree with the Reviewer that explicitly quantifying the number of light responsive RGCs is important, and we now include this information as a function of degeneration time point in Figure 2-figure supplement 1. Under photopic conditions, this fraction is quite stable until 5M and then begins to deteriorate. We also observe a decrease in the number of RGCs with space-time separable RFs at 5M (Figure 2F), suggesting (but not proving) that these RGCs are representative of changes across all RGCs. We also describe these findings in the Results (lines 167-174).

      Second, it is hard to assess whether this mouse model is better than existing models for human disease. Their phenotype is different than the rat model of this same disease. It also shows a lack of oscillatory activity that is apparent in rd models.

      We are not making the claim that this model is better than other models. Each model has value. However, because the degeneration in this model is relatively slow, it may be more representative of changes that occur in slower forms of human retinal degeneration (emphasis on “may be”). This is a discussion point, not something that we are aiming to prove. We also believe the utility of a model depends on the questions being asked. In this case, we aimed to track changes over time during photoreceptor loss to better understand the extent to which retinal output is impaired.

      Also, retinitis pigmentosa is a heterogeneous disease with a spectrum of phenotypes that may or may not be genotype specific. A patient with a PDE6B mutation presents with a different phenotype than a patient with a CNGB1 mutation, despite both having an RP diagnosis. It is a fallacy to assume a mouse is the exact same as a human, just as it is incorrect to assume clinical presentations are identical for all patients for one broad disease that is known to have a diverse set of underlying causes. Studying a range of models is thus essential to understanding the disease. Given that mutations causing RP have different impacts on retinal signaling, we believe it is important to contextualize findings to their mutation. We make this point in Discussion: Comparison to previous studies of RGC signaling in retinitis pigmentosa (beginning on line 436).

      Finally, the model we study does not lack oscillatory activity; it simply arises later than in rd1 or rd10 mice and does so only after all the photoreceptors have died (Figure 7). To our knowledge, it is not clear when or even if RGCs exhibit oscillations in human patients with RP. We discuss why oscillations might arise at different time points in different genetic models of RP in lines 555-570.

      Reviewer #3 (Public Review):

      In the manuscript by Scalabrino et al. a rigorous characterization of the functionality of retinal ganglion cells in a mouse model of rod photoreceptor degeneration is presented. The authors analyzed the degeneration of cone photoreceptors, which is known to be linked to rod degeneration. Based on the time course of cone degeneration they investigated the functional properties of retinal ganglion cells aged between 1 month and seven months.

      The most interesting finding is robust preservation of functional properties, as reflected in little changes of the receptive fields (spatial and temporal characteristics) or signaling fidelity/information rate. In contrast to other mouse models, the present one shows no oscillatory activity until a complete loss of cone photoreceptors occurred at an age of nine months.

      Although the receptive fields of retinal ganglion cells remain nearly intact, the number of ganglion cells with identifiable receptive fields decreases significantly with age (Fig.2F). Could the authors comment, if this might imply a "patchy" vision?

      Visual field loss is a predominant clinical observation in patients with retinitis pigmentosa, including those with Cngb1 mutations. We connect to this observation in the Discussion at lines 521-529: “At the latest stages of photoreceptor degeneration in the Cngb1neo/neo mice (5-7M), we did observe a decrease in the fraction of RGCs with spike rates that were strongly modulated by checkerboard noise (Supplemental Figure 2). It is possible these RGCs were losing their light response completely, or that changes in their light response properties made them relatively unresponsive to checkerboard noise. If the former, it is possible that light responsive RGCs are becoming sparser at the later stages of degeneration which may result in inhomogeneous, or “patchy”, visual sensitivity described by RP patients (see reviews by Hull et al., 2017; Nassisi et al., 2021).”

      Reviewer #4 (Public Review):

      Scalabrino et al. report the remarkable persistence of cone-driven retinal ganglion cell responses in a mouse model of retinitis pigmentosa (i.e., Cngb1 KO mice). The authors first map the time course of primary rod and secondary cone degeneration in Cngb1 KO mice. Approximately 30% of rods are gone at one month (1M), and all rods are lost by 7M in Cngb1 KO retinas. The cone morphology changes progressively as rods degenerate; cone outer segments shrink and are largely absent by 5M. Cones die between 8-9M. Scalabrino et al. next perform multielectrode array recordings from wild-type and Cngb1 KO retinas from 1M to 5M in mesopic and photopic stimulus conditions. They find that spatiotemporal receptive fields remain relatively stable in the face of photoreceptor degeneration, whereas contrast gain gradually decreases. Oscillatory spontaneous ganglion cell activity emerges late (~9M) in Cngb1 KO mice compared to other retinal degeneration models. Finally, the authors analyze mutual information between stimuli (white noise and naturalistic movies) and ganglion cell spike trains and find that the encoding of the most informative ganglion cells is preserved relatively late into photoreceptor degeneration and that information rates decline less in photopic vs. mesopic conditions and for naturalistic movies vs. white noise stimuli.

      Overall, this is an exciting study that shows remarkable preservation of cone-driven ganglion cell light responses in advanced stages of a retinitis pigmentosa model when most rods have died, and cone morphologies are dramatically altered. The results are presented clearly in the text and figures and are scholarly discussed. Nonetheless, the authors should address a few specific comments to clarify and better support some of the conclusions they draw.

      Specific comments:

      1) In describing the results on information encoding, the authors write and show data (panels A of Figures 8-10) that suggest that most ganglion cells, even in recordings from wild-type retinas, respond unreliably to white noise stimuli and naturalistic movies. Why does such a large fraction of cells have such low repeat reliability? Does this reflect unreliable spike detection and sorting, poor cell or tissue health, or true variability in the responses of healthy retinal ganglion cells? The latter does not seem to align with results from patch-clamp recordings targeted to specific ganglion cell types. The limited repeat reliability also raises questions about how well the linear-nonlinear model, which the authors use to compare responses between wild-type and Cngb1 KO mice of different ages, predicts the responses of these cells. Comparing model parameters (receptive field size, temporal filtering, and contrast sensitivity) between genotypes and ages only makes sense if the model is a good description in the acquired datasets.

      We agree with the reviewer that this is an important point to be clear about. In Figures 8-10 some RGCs exhibit high repeatability and some exhibit low repeatability, as quantified by their information rates. The reviewer is concerned about those cells with low repeatability and the ability to capture their responses with an LN model. This is a valid concern, but to be clear, we are not fitting an LN model to cells with low information rates. In Figures 3-6, where an LN model is being used to estimate the spatial and temporal components of the RFs, we are fitting a subset of all the RGCs: those with space-time separable RFs (see Figure 2). Those particular cells exhibit high information rates and highly reproducible responses, and an LN model captures ~60% of the explainable variance in the spike rate (see Figure 2-figure supplement 1A-B; also see lines 157-151). This is typical for LN models that approximately predict the responses of RGCs to checkerboard noise. Thus, we think the LN model reasonably captures the responses of cells for which we use the LN model. The information rate estimates include these cells as well as other cells that are not well described by an LN model. Note, the LN model is not used to calculate the mutual information rates. We have added text in the Results (lines 324-327) to clarify this.
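
      For readers unfamiliar with the term, the general structure of a linear-nonlinear (LN) model can be sketched as below. This is a simplified, temporal-only illustration and not the model fitted in the study: the rectifying nonlinearity, the `gain` and `threshold` parameters, the function name, and the example values are all assumptions made for the sketch.

```python
def ln_predict(stimulus, temporal_filter, gain=1.0, threshold=0.0):
    """Predict a firing rate from a 1-D stimulus with a linear-nonlinear model."""
    n = len(temporal_filter)
    rates = []
    for t in range(len(stimulus)):
        # Linear stage: weight the most recent stimulus samples by the filter.
        drive = 0.0
        for lag in range(n):
            if t - lag >= 0:
                drive += temporal_filter[lag] * stimulus[t - lag]
        # Nonlinear stage: rectify the drive and scale it by the gain.
        rates.append(gain * max(0.0, drive - threshold))
    return rates


# Hypothetical stimulus and filter: halving the gain scales the predicted
# response without changing the shape of the linear filter.
stimulus = [1.0, 0.0, -1.0, 2.0]
temporal_filter = [1.0, 0.5]
full_gain = ln_predict(stimulus, temporal_filter)             # [1.0, 0.5, 0.0, 1.5]
half_gain = ln_predict(stimulus, temporal_filter, gain=0.5)   # [0.5, 0.25, 0.0, 0.75]
```

      Separating the filter from the nonlinearity in this way is what allows receptive field shape and response gain to be compared independently across conditions.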

      In addition, the information rates we estimated in mouse are consistent with past studies from guinea pig (Koch et al, 2004 and Koch et al, 2006). We think cells with very low repeatability are not well driven by checkerboard noise or the particular 10s natural movies we showed. We have updated the example neurons to better reflect the reliability of the cells near the median of the MI distributions in Figures 8-10.

      2) The authors should, maybe in figure supplements and parts of the main figures, break results down by recordings. Inter-experimental variability has been well documented (e.g., Shah et al. Neuron 2022, Zhao et al Sci Rep 2020), and it would be reassuring to see that the conclusions drawn by the authors are supported by statistics in which n = number of recordings (e.g., there is a somewhat difficult to explain broadening of temporal filters in 4M Cngb1 KO retinas that recover by 5M).

      We agree that inter-experiment variability can be large and is important to control for. We now show all the analyses broken down by experiment in Supplemental Figures (2, 3, 4, 5, 6, 8, 9, and 10) for each analysis. None of the trends we describe or highlight in the manuscript were driven by inter-experiment variability.

      3) At different points in their manuscript, the authors conclude that their results "suggest that homeostatic mechanisms in the retina serve to compensate for deteriorating photoreceptors" (or similar). I think that this may well be the case. However, in its present form, the study provides no evidence that retinal circuits in Cngb1 KO mice change to preserve function compared to the alternative that the observed stability is evidence for functional redundancy or resilience in retinal circuits (as they are) without the need for adjustments. Distinguishing between these alternatives would be conceptually important. For example, Care et al. Cell Rep 2019 and Care et al. Cell Rep 2020 used partial stimulation to activate fewer photoreceptors and compare light responses in downstream neurons to those in retinas with fewer photoreceptors. Other studies have directly observed changes in circuit wiring in models of retinal degeneration. If the authors cannot provide experimental evidence for homeostatic changes, it would be good to reflect this in the interpretation and discussion.

      The reviewer raises a terrific point and potential alternative interpretation. We agree. We have not been able to identify an equivalent analysis to that in Care et al. 2019 that we can run that will cleanly distinguish between these two possibilities, without doing many more experiments across timepoints of degeneration. We have thus rewritten portions of the Introduction and the Discussion to recognize the potential of this alternative interpretation.

      Introduction (lines 39-44): Alternatively, homeostatic plasticity or redundancy in retinal circuitry may compensate for photoreceptor loss (Care et al., 2020; Lee et al., 2021; Shen et al., 2020). Such mechanisms could facilitate reliable signaling at the level of retinal output, despite deterioration in photoreceptor function. Identifying the extent to which changes in photoreceptor morphology impact retinal output will inform treatment timepoints for gene therapies aimed at halting rod loss to preserve cone-mediated vision.

      Discussion (lines 514-520): There are two potential classes of mechanisms for this compensation. First, homeostatic plasticity has been documented in models of photoreceptor loss in which the retina remodels to preserve signal transmission (Care et al., 2019; Keck et al., 2013, 2011, 2008; Leinonen et al., 2020; Shen et al., 2020). Alternatively, functional redundancy within the circuit could explain how robust retinal signaling is retained longer than the changes in cone morphology would suggest (Care et al., 2020). This study did not distinguish between the two compensation models.

      4) The authors do not attempt to classify retinal ganglion cells into functional types as functional changes from degeneration may confound such classifications. However, it would be beneficial to separate some categorical response types (direction-selective ON-OFF and ON ganglion cells, maybe orientation-selective [horizontal, vertical, ON, OFF] ganglion cells) and compare how their responsiveness, reliability, and information encoding change with degeneration. This would provide additional insights and address concerns that changes caused by degeneration may be obscured by the differences between ganglion cell types in the present analysis.

      We agree. We now track ON brisk sustained RGCs across degeneration time points for the RF analyses and mutual information analyses. These RGCs are likely the ON sustained alpha cells because they generate very large spikes on the MEA, as would be expected for cells with large somata. Example receptive field mosaics, temporal receptive fields, and spike train autocorrelation functions for WT and 4M Cngb1neo/neo animals are shown in Figure 2-figure supplement 1E-F. These RGCs follow the trends displayed by the larger populations of RGCs in each analysis. We chose this cell type because they are readily identified by their spike train autocorrelation functions compared to other RGC types and they have approximately space-time separable receptive fields (RFs). There are many text changes associated with adding an analysis of the ON brisk sustained RGCs (see lines 202-207; 227-229; 264-267, etc.).

      We chose not to focus on direction selective RGCs because we are analyzing the spatial and temporal RFs of RGCs in Figures 3-5 and direction-selective RGCs do not have space-time separable RFs (see example in Figure 2C-D). Thus, those cells could not be used to track those receptive field properties across degeneration. Also, we did not collect responses to drifting gratings or bar responses across a range of speeds or contrasts, so we are unable to reliably distinguish the different types of direction-selective RGCs (e.g., ON vs ON-OFF) from these data.

    1. Author Response:

      Reviewer 2 (Public Review):

      Weaknesses 1. I had difficulty following the ANOVA results for Figure 1. I assume ANOVA was performed with factors of session and block. However, a single F statistic is reported. I do not know what this is referring to. It would be more appropriate to either perform repeated measures ANOVA with session and block as factors for each dependent variable or even better, multiple analyses of variance for the three dependent measures concurrently. Then report the univariate ANOVA results for each dependent measure. The graphs in Figure 1 (C-E) suggest a main effect of block, but as reported, I cannot tell if this is the case. Further, why was sex not included as an ANOVA factor?

      For the sake of transparency, we had included plots showing sessions split by each block, whereas the statistics related to the right-side bar plots, in which data are collapsed across risk (which was done to minimize effects from ‘missing’ data). We appreciate that this may have caused confusion. In the revised manuscript, we specify the exact figure for each statistical result, have added a better description in the methods, and have updated the statistics (Table 1) with the ANOVA and post-hoc results.

      Previously we had used a mixed effects model because one subject did not complete any risk trials in session 3, but in the revised manuscript we decided to remove that subject's sessions to permit RM ANOVA. As requested by the reviewer, we performed a multivariate analysis on risk and no risk trials. Because of the repeated measures design, we opted to use the multRM function of the MANOVA.RM package developed by Friedrich et al. (2019), which permits multivariate analysis on repeated measures data with minimal assumptions and bootstrapped p-values for small sample sizes. We found significant interactions for session (or treatment) and risk (see tables below). This justifies the two-way univariate ANOVA, which is now reported in the rest of the manuscript. Sex differences were not included in the ANOVA because the study was not intended to assess sex differences but, rather, was designed according to NIH requirements for inclusion of males and females.

      Note: The MATS test was used based on the authors' recommendations in Friedrich et al. (2019) for when the test violates singularity, which was reported. To replicate, use a random seed of 8675309.

      Package link: https://rdrr.io/github/smn74/MANOVA.RM/man/multRM.html

      Publication: Friedrich, S., Konietschke, F., & Pauly, M. (2019). Resampling-based analysis of multivariate data and repeated measures designs with the R package MANOVA.RM. R J., 11(2), 380.

      1. The authors describe session 1 as characterized by 'overgeneralization' due to increased reward latencies. I do not follow this logic. Generalization typically refers to a situation in which a response to one action or cue extends to a second, similar action or cue. In the authors' design, there is only one cue and one action. I do not see how generalization is relevant here.

      This wording has been changed to “non-specific” in the results and discussion.

      1. The authors consistently report dmPFC and VTA 'neural activity'. The authors did not record neural activity. The authors recorded changes in fluorescence due to calcium influx into neurons. Even if these changes have similar properties to neural activity measured with single-unit recording, the authors did not record neural activity in this manuscript.

      We do not imply that we recorded unit activity in these studies and state in the introduction that fiber photometry is an indirect measure of neural activity. We have also reworded much of the text in the manuscript to use “calcium activity.”

      1. Comparing the patterns in Figures 2 and 3, it appears that dmPFC change in fluorescence was similar in non-shocked and shock trials up until shock delivery. However, the VTA patterns differ. No cue differences were observed for sessions 1-3 on shock trials (Figure 3A), yet differences were observed on non-shocked trials (Figure 2F). Further, changes in fluorescence between sessions 1 and 2/3 appeared to emerge just as foot shock would have been delivered. A split should be evident in Figure 3B - but it is not. Were these differences caused by sampling issues due to foot shock trials being rarer?

      We agree, although some of this could be because footshock trials were collapsed across blocks 2 and 3 (as no differences in shock response were observed between blocks). Nevertheless, we have added a caveat about cue responses to the results (see page 11-bottom and 15-top). Regarding the lack of a split in Figure 3A: this difference may be due to shock onset time. The permutation tests indicate that the differences in action activity in Figure 2B emerge about 0.5 – 0.8 seconds after action, which is when the shock begins. So it is not surprising that the results in 2F do not match well with 3A, given the rapid and robust response to the footshock.
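
      For readers unfamiliar with permutation comparisons of the kind mentioned above, the following is a minimal sketch of a two-sample permutation test at a single time point. It is an illustration only, not the analysis code used in the study: the function name, the choice of the absolute difference in means as the test statistic, the number of permutations, and the seed are all assumptions made for this example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration: group labels are shuffled
    repeatedly and the shuffled mean difference is compared with the
    observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up fluorescence values at one time bin for two sessions.
    session_1 = [0.1, 0.2, 0.0, 0.3, 0.1]
    session_3 = [0.9, 1.1, 0.8, 1.2, 1.0]
    print(permutation_p_value(session_1, session_3))
```

      Applied bin by bin across a peri-event trace, a test of this general form indicates when two conditions begin to diverge; any correction for multiple time bins would need to be added separately.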

      1. Similar to Figure 1, I could not follow the ANOVA results for the effects of diazepam treatment on trials completed, action latency and reward latency (Figure 4). Related, from what session do the bar plot data in Figure 4B come from? Is it the average of the 6% and 10% blocks? I cannot tell.

      Please see our response to comment 1 for the analysis relevant to this comment. Yes, the bar plot data are the average of the 6% and 10% risk blocks. A better explanation of what the bar plot data represent is now provided in the methods.

      1. For the diazepam experiment, did all rats receive saline and diazepam injections in separate sessions? If so, were these sessions counterbalanced? And further, how did the session numbers relate to sessions 1-3 of the first study? All of these details are extremely relevant to interpreting the results and comparing them to the first study, as session # appeared to be an important factor. For example - the decrease in dmPFC fluorescence to reward during the No-Risk block appeared to better match the fluorescent pattern seen in sessions 1 and 2 of the first experiment. In which case, the saline vs. diazepam difference was due to saline rats not showing the expected pattern of fluorescence.

      Subjects received saline and diazepam in separate sessions. Furthermore, diazepam was not tested until animals had at least 3 sessions of training (range of sessions 4-8). Clarification has been added to the methods.

      The new AUC analysis for Reviewer 1 allows for better assessment of the potential differences between earlier sessions and saline (see Figure 7 – supplements 2 and 3). We also found the effect with reward and diazepam perplexing and somewhat modest. However, even after comparing only Saline and Session 3 PFC AUC data, we found no significant effect of session or session*risk interaction for action or reward (F values < 1.3, p values > .27).

      1. The authors seem convinced that fiber photometry is a surrogate for neural activity. Although significant correlation coefficients are found during action and reward, these values hover around 0.6 for the dmPFC and 0.75 for the VTA. Further, no correlations are observed for cue periods. A strength of the calcium imaging approach is that it permits the monitoring of specific neural populations. This would have been very valuable for the VTA, in which dopamine and GABA neurons must show very different patterns of activity. Opting for fiber photometry and then using a pan-neuronal approach fails to leverage the strength of the approach.

      The parent paper (Park & Moghaddam, 2017) used unit recording in this task (including reporting data from dopamine and non-dopamine VTA units). We assure the reviewer that we do not claim that fiber photometry is a perfect surrogate for direct recording of neural activity. However, a key question we wanted to answer in this study was whether the response of PFC and VTA to the footshock changes during task acquisition (please see last paragraph of introduction), hence the choice to use fiber photometry. We note in the results and discussion that this approach is not optimal for detecting cue or other rapid responses (see pages 15 and 23).

      Reviewer 3 (Public Review):

      Probably the biggest overall issue is that it is unclear what is being learned specifically. There is no probe test at the end to dissociate the direct impact of shock from its learned impact. And the blocks are not signaled in some other way. And though there seems to be some evidence that the shock effects get more pronounced with a session, it is not clear if the rats are really learning to associate specific shock risks with the particular trials. Indeed with so few sessions and so few actual shocks, this seems really unlikely, especially since without an independent cue, the shock and its frequency is the cue for the block switch. It seems especially unlikely that there is a strong dichotomy in the rats' model of the environment between 6% and 10% blocks. This may be quite relevant for understanding foraging under risk. But I think it means some of the language in the paper about contingencies and the like should be avoided.

      While the parent paper (Park & Moghaddam, 2017) delved more deeply into this question, we agree that what exactly is learned may be difficult to ascertain. To address this (please also see response to reviewer #1’s first comment), we have toned down our use of “contingency learning” throughout the manuscript and use the word contingency in relation to the underlying reinforcement/punishment schedules.

      The second issue I had was that I had some trouble lining up the claims in the results with what appeared to be meaningful differences in the figures. Just looking at it, it seems to me that VTA shows higher activities at higher shocks, particularly at the time of reward but also when comparing safe vs risky anyway for the cue and action periods. DmPFC shows a similar pattern in the reward period. […] But these results are not described at all like this. The focus is on the action period only and on ramping? I don't really see ramping. It says "Anxiogenic contingencies also did not influence the phasic response to reward...". But fig 3 seems to show clearly different reward responses? The characterization of the change is particularly important since to me it looks like the diazepam essentially normalizes these features of the response. This makes sense to me […].

      We initially believed that many of the differences in reward (with the exception of Session 2 in the PFC) were from carryover of differences in the peri-action period. However, upon quantifying these responses again using AUC change scores to adjust for pre-event differences in the signal, we observed small reward-related increases (data are in Figure 7 – supplements 2/3) and have updated the results and the discussion.

      Although some lessening of reward response may be apparent across the diazepam session in the VTA (Figure 7 – supplement 2/3G), we do not have statistical support for this as no significant differences were observed in permutation comparisons to saline and only session 3 deviated from the first session for the reward period in the AUC analyses.
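
      As a rough illustration of how an AUC change score can adjust for pre-event differences in a signal, the sketch below subtracts the area under the trace in a window before the event from the area in an equal-length window after it. This is a generic sketch under stated assumptions, not the computation used in the manuscript: the function names, the equal pre/post window length, and the trapezoidal integration are all choices made for this example.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area


def auc_change_score(times, signal, event_time, window):
    """Post-event AUC minus pre-event AUC over equal-length windows.

    Hypothetical helper: `window` is the window length in the same units
    as `times`; the event sample itself is included in both windows.
    """
    pre = [(t, v) for t, v in zip(times, signal)
           if event_time - window <= t <= event_time]
    post = [(t, v) for t, v in zip(times, signal)
            if event_time <= t <= event_time + window]
    pre_auc = trapezoid_auc([t for t, _ in pre], [v for _, v in pre])
    post_auc = trapezoid_auc([t for t, _ in post], [v for _, v in post])
    return post_auc - pre_auc


if __name__ == "__main__":
    # Made-up trace: flat baseline, then a step increase after the event at t = 0.
    times = [-2, -1, 0, 1, 2]
    signal = [1, 1, 1, 3, 3]
    print(auc_change_score(times, signal, event_time=0, window=2))
```

      Because the pre-event area is subtracted, a trace that is uniformly elevated before and after the event yields a score near zero, so only event-locked changes contribute.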

    1. Author Response

      Reviewer 3

      This paper builds on a previous publication from the same group that described a compartmentalisation model of beta-cell fuel metabolism in which plasma membrane-localized pyruvate kinase is sufficient to close KATP channels required for insulin secretion. In this current manuscript, the authors identified the PK isoforms involved in this process using tissue-specific KO mouse models. Using excised patch-clamp experiments, they demonstrated that, although redundant in their function, both the constitutively active PKm1 and the allosterically regulated PKm2 are associated with the PM and locally regulate KATP channel closure. Further, the authors showed that the mitochondrial PEP carboxykinase (PCK2) is essential for amino acids to promote an increase in cytosolic ATP/ADP and closure of KATP channels. Therefore, this study very nicely demonstrates that the distinct response of PK isoforms to the mitochondrial and glycolytic sources of PEP impacts beta cell nutrient preference and affects the oscillatory cycle regulating secretion. These findings do provide new mechanistic information about the control of the regulated secretory pathway and will be of interest to a broader audience.

      Strength

      The major strength of the study is the use of tissue/isoform-specific KO mouse models. Although limited by constitutive KOs with compensatory increase in other isoforms, the authors have achieved what they set out to do, i.e. identify the PK isoform involved in the regulation of PM ATP generation and regulation of KATP channel closure. Their experimental rigor, including the ability to perform the excised patch clamp experiments and the use of PKa to show the specific effect of the allosterically regulated PKm2, is also a strength.

      Weakness

      It is not clear from the manuscript which "littermate controls" are used in all the experiments. Given the limitations of the Cre-lox system, it is really important to clearly show what controls have been used and their phenotypes (and the rationale for pooling the different controls, if that is what is done here).

      Response: We apologize that this was unclear. Littermate Ins1-Cre controls were used for the PKm1-βKO and PKm2-βKO models, whereas floxed controls (i.e. Pck2f/f) were used for the PCK2-βKO. This is described in the first paragraph of the results section as well as the methods.

      The data add to our understanding of the role of PM-localised PK in the regulated exocytosis pathway; however, the claim that these findings question the canonical model of mitochondrial ATP coupled to KATP channel closure is not fully supported by the data, especially given that glucose-induced insulin secretion is not affected in any of the KO models.

      Response: Had we performed experiments that fully block flux through pyruvate kinase, we expect that glucose-stimulated insulin secretion would be impaired if not eliminated. However, as this experiment would prevent both glycolytic and mitochondrial ATP production, it would not address whether a mitochondrially-derived increase in cytosolic ATP/ADP is the primary mechanism of KATP channel closure, as proposed by the canonical model.

      Due to the redundant response of PKm1 and PKm2 to glucose, isoform-specific deletion allowed us to test whether a fuel-stimulated rise in cytosolic ATP/ADP is sufficient to close KATP channels in healthy β-cells (i.e. in the absence of glucose intolerance) using a protocol of low glucose and amino acid stimulation. Our findings show that raising cytosolic ATP/ADP with amino acids is insufficient to close KATP channels in the absence of PK activity or PCK2, indicating that KATP channels are regulated primarily by PEP that provides ATP via plasma membrane-associated PK rather than mitochondrially-derived ATP. In these experiments, insulin secretion shows the expected reduction in both the β-cell PCK2 and PKm1 knockout models.

      In the discussion, we point out that mitochondrially-derived ATP is likely important for buffering this plasma membrane-compartmentalized KATP closure by pyruvate kinase. However, on balance, our data argue that glycolysis preferentially closes KATP channels, as previously shown in cardiac myocytes (Lamp and Weiss, Science 1987).

    1. Author Response:

      Reviewer #1:

      Charpentier et al. use facial recognition technology to show that mothers in a group of mandrills lead their offspring to associate with phenotypically similar offspring. Mandrills are a species of primate that live in large, matrilineal troops, with a single, dominant male that fathers the majority of the offspring. Male breeder turnover and extra-pair mating by females can lead to variation in relatedness between group members and the potential for kin-selected benefits from preferentially cooperating with closer relatives within the group. The authors argue that the strategy of influencing the social network of their offspring could be favoured by "second-order kin selection", a mechanism by which inclusive fitness benefits are accrued to female actors through kin-selected benefits to their offspring. This interpretation is supported by a theoretical model.

      The paper highlights a previously unappreciated mechanism for favouring association between non-kin in social groups and also contributes a nice insight into the complexity of social interactions in a relatively understudied wild primate species. The conclusions are strengthened by data showing associations between mothers were not influenced by the facial similarity of their offspring -- this suggests that mothers are making decisions based on the appearance of offspring and not their mothers.

      Some remaining questions regarding the strength of the authors' interpretation exist: Given the challenges of studying mandrills in the field, the fact that the study reports data from a single group is understandable but potential issues remain with the independence of data points. There may be an additional issue arising from the fact that this troop is semi-captive.

      The study group is not semi-captive. Instead, it originated from two release events of a few captive individuals into the wild (in 2002 and 2006). The population is now composed of more than 250 individuals and all of them, except for 7 founder females (<3%), were born in the wild. In addition, the study group is not fed and occasionally wanders into a fenced protected area. Fences of the park do not represent a boundary for mandrills and most of the time (ca. 80% of days), the study group ranges outside the park. We have clarified this misunderstanding.

      Regarding the independence of data points, we would be grateful if this reviewer could clarify her/his thoughts. As a tentative response, we indeed have access to a single (although large) study group, but that is unfortunately often the case when studying primates or other large mammals. Regarding our study questions, we have clearly demonstrated increased nepotism among paternally related mandrills in two different social groups (Charpentier et al. 2007: semi-captive mandrills; Charpentier et al. 2020: wild mandrills). More generally, we do not see any parsimonious explanation for why the studied mandrills would behave differently, or would have experienced selective pressures that shaped their genetic structure and social organization differently, compared to other wild mandrill groups.

      The number of genotyped offspring is relatively small (n = 15) and paternity is inferred from the identity of the dominant male. However, the authors also refer to the fact that it's normal for female mandrills to mate with several males during ovulation.

      Indeed, both sexes mate promiscuously during the mating season. We have very recently (June 2022) obtained new genetic profiles for a subset of the study infants (it took two years to obtain these data). We have now increased our sample size of infants with a known father, from 15 to 32. With these new data, we were able to distinguish between four categories of infant-infant dyads: those sharing the same father (PHS), those not sharing the same father (not PHS), those conceived during the same alpha male tenure, and those that were not (both infants with unknown dads). The graph below shows the average facial distance among individuals for each of these four categories. It shows that infants conceived during the same alpha male tenure are significantly more similar to each other than infants sired by different fathers or during the tenure of different alpha males, but they are also significantly less similar to each other than infants born to the same father (the four categories are all significantly different from each other, except when comparing infants born to different fathers with those conceived during different alpha male tenures). As suggested by this reviewer, the fact that females mate predominantly with the alpha male, but to some extent also with other males, likely explains the difference between “same father” and “same alpha male tenure”. Importantly, however, considering all infants conceived during the same alpha male tenure as “PHS” is highly conservative. It is thus likely that knowing the paternity of every infant would produce even clearer effects (and indeed, increasing the data set from 15 to 32 strengthened this result). We have now updated this result (first model) based on this new sample.

      What evidence is there to support a beneficial effect of nepotism in this species?

      In mandrills, females who affiliate more (groom more/associate more) with their groupmates (kin or non-kin) during juvenility also reproduce 1 year earlier than those females that are poorly socially integrated (Charpentier et al. 2012). These results are similar to what is known in many mammalian species (for a review, see Snyder-Mackler et al. 2020). However, the positive effects of a rich social life are generally triggered by all group members, not only close kin. Nevertheless, if beneficial social relationships impact the direct fitness of individuals, as reported in mandrills and other species, then kin selection theory predicts that these effects should further translate into indirect fitness benefits.

      We have now added this relevant reference (Charpentier et al. 2012) in the revised version of our manuscript and present the results of this early study on mandrills.

      What form could nepotism take and does it necessarily have to involve full sibs?

      We are unsure why this reviewer is mentioning full-sibs here. For this reviewer's information, of the 2556 study dyads (model 1 on the impact of maternal and paternal origins on facial distance), only one dyad was a full-sib pair. Full-sibs are therefore very rare in the study population due to male migration patterns and generally short alpha male tenures.

      If a female did not associate with offspring as shown here, would nepotistic interactions simply arise between her offspring and offspring that were less facially similar?

      We guess that facial similarity would not be a predictor of spatial association anymore. Indeed, we think that young mandrills do not use self-referent phenotype matching, precluding the self-evaluation of those infants that look like them. However, as stated below, we cannot fully exclude the possibility that other social partners, such as fathers, may also influence infant-infant relationships, although we think that this alternative mechanism is less parsimonious than the one we propose and test.

      Reviewer #2:

      This paper uses data on patterns of spatial association and facial similarity in mandrills to develop a new hypothesis for the evolution of kin recognition based on facial cues. Previous work on this system has shown that, among females, paternal half-sibs resemble each other visually more than maternal half-sisters do. The authors hypothesise that this paternally inherited facial similarity provides opportunities for kin selection, but it is unclear how offspring themselves could recognise kin using phenotype matching since they are unable to see their own face. One answer to this puzzle is that third parties -- mothers -- may promote social interactions between their own offspring and other offspring that resemble them since these other offspring are likely to share the same father. In support of this hypothesis, the authors find that mothers and offspring show spatial proximity to infants that are facially more similar than average. They also use an analytical evolutionary model to confirm the logic of this hypothesis. The model shows that mothers can gain inclusive fitness benefits by encouraging reciprocal social interaction among their offspring and other paternally-related offspring. They term this idea 'second-order' kin selection and identify a range of other circumstances in which it might play an important role in shaping the evolution of social behaviour.

      The main strengths of the paper are the interesting mandrill data and the cutting-edge methods used to analyse facial similarity, which have stimulated the development of a theoretically interesting hypothesis about the evolution of facially based kin recognition. The theoretical model enhances the generality and rigour of the work. The paper will be of wide interest and the concept of second-order kin selection may be applicable to other social circumstances, such as interactions among in-laws in close-knit family groups. Thus, I can see that this paper will be a stimulus for future work.

      We are grateful for these positive comments.

      The data are, I think, rather overinterpreted in terms of the degree to which they support the hypothesis. The spatial proximity data are interesting, but on their own, they are not definitive support for the hypothesis or model. A more critical approach to the hypothesis, clearly setting out the limitations of the data, and what tests in future could be used to falsify the hypothesis or model, would make for a stronger paper.

      We agree with this general comment and have addressed it by (1) adding a model on grooming relationships between females and infants, (2) toning down our interpretation throughout the manuscript, and (3) proposing future directions of research.

      Overall the authors have presented data that support a fascinating new mechanism by which natural selection can influence social interactions among the members of family groups, in potentially surprising ways. I also find it remarkable that 60 years after the development of the kin selection theory new implications of this theory are still being uncovered. The concept of second-order kin selection may prove important in understanding the evolution of social organisation and behaviour in species that live in groups containing a mixture of kin and non-kin, such as many primates and of course humans.

      We are grateful to this reviewer for this very positive comment. We fully agree with the fact that 60 years after the kin selection theory has emerged, we are still discovering further implications!

      Reviewer #3:

      This is a very interesting and impressive manuscript. It is complex in its multiple components, and in some ways that makes it a difficult manuscript to evaluate. There is a lot in it, including empirical analyses of a face dataset and of behavioral association data, combined with a theoretical model.

      We are very grateful for this positive comment and are glad that you liked our manuscript.

      The three main findings are: 1) Paternal siblings look alike (similar to, and building on, a recent manuscript the authors published elsewhere); 2) Infants that are more facially similar tend to associate; and 3) mothers tend to be found in association with other unrelated infants that look more like their own infants. Such results are interesting, and indeed one potential interpretation, perhaps even the most likely, is that mothers are behaving in such a way that promotes association between their own infants and the paternal kin of their infants.

      Nonetheless, the evidence provided is logically only consistent with the authors' hypothesis, rather than being strong direct evidence for it. As such, the current framing and indeed the title, "Primate mothers promote proximity between their offspring and infants who look like them", are both problematic. (In addition, the title should be about mandrills, not "primates", since this manuscript does not provide evidence from any other species.) The evidence provided is consistent with the hypothesis, but also consistent with other potential hypotheses. The evidence given to dismiss other potential hypotheses is not strong, and rests on the fact that many males are not around all year to influence things, and that "males that were present during a given reproductive cycle are not responsible for maintaining proximity with either infants or their mothers (MJEC and BRT, pers. obs.)".

      We agree with this comment. After examining several alternative mechanisms in the light of the natural history of mandrills, we are confident that the proposed mechanism is at work in that species, although we cannot firmly exclude some of these alternative mechanisms. To address this comment, we have changed the title of our manuscript, which now reads “Mandrill mothers associate with infants who look like their own offspring using phenotype matching”. We have also included an additional model on grooming relationships (see response to R1) and have toned down the interpretation of our results throughout our revised manuscript. Finally, we have further discussed alternative scenarios, in particular the one involving fathers (see details above).

      My opinion is that these are really interesting analyses and data, which are being somewhat undermined by the insistence that only one hypothesis can explain the observed association patterns. It could easily be presented differently, as a demonstration that paternal siblings look alike and that they associate. The authors could then go on to explore different possible explanations for this using their association data, make the case that maternal behavior is the most plausible (but not the only) explanation, and present their model of how such behavior could bring fitness benefits.

      In my view, such a presentation would be both more cautious and more appropriate, without in any way reducing the impact or importance of the data. In the current iteration, I think there are issues because the data do not provide sufficient support for the surety of the title and conclusion, as presented.

      We think that the current organization of our manuscript was not that different from the one proposed here and follows a reasoning already proposed in a former manuscript (Charpentier et al. 2020). Indeed, we first start by reminding the reader what we already know from that previous study: paternal siblings look alike and they associate. We then go on to explore different mechanisms. That being said, and as suggested, we have been more cautious in interpreting our results, which are indeed only correlative.

    1. Author Response

      Reviewer 1

      In this manuscript, Hansen and coworkers make use of the powerful, single-molecule assay CoSMoS to study the recognition of the 5' splice site by the U1 snRNP. Specifically, they investigate how 5' splice site oligos interact with purified U1 snRNP to isolate 5' splice site binding from other factors, including the CBC, BBP, and any other factors in whole cell extract that may impact binding; previous studies have investigated binding in vivo or in cellular extracts or with limited quantitative capabilities. The authors find evidence for a reversible, two-step, binding reaction in which a short-lived interaction precedes a long-lived interaction and in which binding depends on the 5' splice site sequence and the 5' end of U1. The data further suggest a compelling kinetic framework for how U1 surveys nascent transcripts for a bona fide 5'SS; specifically, both authentic and inauthentic 5' splice sites form the short-lived complexes but whereas the inauthentic complex preferentially dissociates, the authentic complex preferentially proceeds to a stable complex. Using oligos with different mutations to limit base-pairing they find that at least six potential base-pairs are required for association but that a stretch of seven base-pairs, with a maximum of one mismatch, is required for the long-lived interaction, with residues near the 5' splice site playing more important roles and with length being a stronger predictor of complex lifetime than thermodynamics, with implications for splice site predictions.

      The work focuses on the determinants and mechanism of the first and a pivotal step in splicing, in a manner that completes recent structural advances. The work extends findings presented in a previous publication from the lab (Larson and Hoskins, 2017) studying binding of U1 snRNP to the 5' splice site in extract. In that study, the authors provided early evidence of two-step U1 snRNP binding in the absence of the cap binding complex or the branch point binding protein, with a more stable state following a weaker state; although factors in the extract may have influenced binding, the results are not qualitatively different here. The authors also showed some evidence in the previous study that longer binding depended on crossing a threshold and did not increase further with greater stabilization. Still, this new work is of high quality with conclusions justified by the data and of significant interest to the splicing field and of general interest to those investigating binding of snRNPs to nucleic acid.

      Specific Points:

      1. To test and define the role of protein in the snRNP, the authors need to investigate the roles of Yhc1 and Luc7 in 5' splice site binding in this assay, particularly with respect to defining the basis of asymmetry and snRNP destabilization.

      See Reviewer Comment #1.

      1. The similarity or difference of the two-step recognition mechanism described here to the recognition mechanisms of other nucleic acids by other RNP complexes is unclear. The authors need to put their findings into a larger context, relating their findings to studies of analogous systems described in the literature.

      See Reviewer Comments #2 and #4

      1. It is important that the authors address whether they can rule out that the exclusively long-lived complexes skip the short-lived conformation.

      See Reviewer Comment #5. Overall, a model with reversible connection between the unbound state and the long-lived bound state (U->B*) is less likely to explain our data.

      1. Given the co-transcriptional nature of many splicing events, the authors should discuss how recruitment by RNAP II might impact the two-step process. For example, fast dissociation by short duplexes might be countered by retention of U1 locally via RNAP II.

      See Reviewer Comment #4.

      Reviewer 2

      In this work, the authors use co-localization single-molecule spectroscopy (CoSMoS) to dissect the sequence-directed nature of pre-mRNA 5' splice site recognition by U1 snRNP using purified, surface-tethered U1 snRNP complexes and truncated substrate RNA oligonucleotides containing the 5' splice site (SS) consensus sequence. The senior author previously has extensively published on related findings using the CoSMoS approach (PMIDs 23569281, 24075986, 27244240), and the current work is a logical extension. Here the authors find that the U1 snRNP reversibly selects a suitable 5' SS in a sequence-dependent, two-step mechanism. They derive a kinetic selection scheme that suggests initial base pairing at particular positions, followed by a commitment to a longer-lived complex that enters the chosen 5' SS into the splicing cycle. This type of scheme is widespread among nucleic acid-binding enzymes and sometimes referred to as "conformational proofreading". The work could be further strengthened by making more connections to existing kinetic selection schemes for other enzymes.

      In the following, major suggestions for improvements are summarized.

      1. The model described in the paragraphs starting with line 262 through 280 to interpret the observation of long and short complex lifetimes is not entirely clear. There are at least two potential models that can be considered to fit the observations: a linear and a circular model. A linear model would be one where U1 and substrate RNA are not associated (state 1), then they partially associate (state 2), and finally they isomerize to the completely associated/fully hybridized complex (state 3). The circular model is the same, except that it would additionally allow switching between states 1 and 3 directly (bypassing the partially associated state). To differentiate between these two scenarios, the authors would have to vary the concentration of the RNA probe and see if there is a uniform change in a single kon rate or if two kon rates start to appear. These rate subpopulations would be much easier to detect by fitting with hidden Markov models. It would seem unjustified to decide between these two models without obtaining such additional supporting data.

      See Reviewer Comment #5.
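
      The difference between the linear and circular schemes raised in this comment can be illustrated with a small first-passage calculation. The following is a minimal sketch under stated assumptions, not the authors' analysis: the state labels (U, B, B*), the function name, and every rate constant are hypothetical. It computes the mean time a complex remains bound before returning to U; setting the direct B* to U rate to zero gives the linear scheme, and a positive value gives the circular one.

```python
def mean_bound_times(k_bu, k_bs, k_sb, k_su=0.0):
    """Mean time (s) to reach the unbound state U from B and from B*.

    Hypothetical three-state scheme (all rates in 1/s, assumed values):
      B  -> U   with rate k_bu   (dissociation of the short-lived complex)
      B  -> B*  with rate k_bs   (conversion to the long-lived complex)
      B* -> B   with rate k_sb   (reversal to the short-lived complex)
      B* -> U   with rate k_su   (direct dissociation; 0 in the linear scheme)

    The mean first-passage times satisfy
      (k_bu + k_bs) * T_B  - k_bs * T_S          = 1
      -k_sb * T_B          + (k_sb + k_su) * T_S = 1
    which is solved here in closed form.
    """
    det = k_bu * k_sb + k_bu * k_su + k_bs * k_su
    if det <= 0:
        raise ValueError("no route back to U: mean bound time is infinite")
    t_from_b = (k_sb + k_su + k_bs) / det
    t_from_bstar = (k_bu + k_bs + k_sb) / det
    return t_from_b, t_from_bstar


# Illustrative rates only; they are not taken from the manuscript.
linear = mean_bound_times(k_bu=0.1, k_bs=0.05, k_sb=0.01)
circular = mean_bound_times(k_bu=0.1, k_bs=0.05, k_sb=0.01, k_su=0.01)
print("linear   (from B, from B*):", linear)
print("circular (from B, from B*):", circular)
```

      With these made-up rates, adding a direct B* to U route shortens the mean bound time from both starting states. Mean lifetimes alone do not identify the scheme, which is why the reviewer asks for additional data.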

      1. In the section describing U1/5'SS duplex destabilization in U1 snRNP (line 281) an underlying assumption is that the binding of two RNAs (in the absence of the spliceosomal proteins) would share the same characteristics or trends as two identical RNAs incorporated into the U1 snRNP. While this may be a rhetorical device to increase the clarity/connection between the concepts of predicted binding free energies and the residence time of hybridized oligonucleotides, it does not address the possible reasons for the discrepancy observed in RNA oligonucleotide versus U1 snRNP binding. The authors should point to a reference and derive a physical model from the available cryo-EM structures to show that the U1 snRNA is, most likely, being constrained by its associated proteins in such a way that it increases the binding affinity to complementary RNA oligonucleotides.

      See Reviewer Comment #2

      1. While the two-factor authentication metaphor of Figure 7 is charming, it seems off-topic. Instead, the authors should review the literature for examples of short, exploratory binding events involving an RNA:protein complex, followed by more stable, accommodated binding events, see e.g., the work by Sarah Woodson on 30S ribosomal subunit assembly and on Hfq function, work on kinetic proofreading of the ribosome, work on Cas9-based recognition of its target site, and many others. A potential descriptive framework to be used here is that of "conformational proofreading".

      See Reviewer Comment #4.

      1. There is significant concern that the single molecule sampling rate used to acquire the CoSMoS data is too slow to accurately measure the shortest lifetimes observed, which are only ~10 seconds long. According to the Nyquist sampling criterion, the sampling rate needs to be (at least) twice the frequency of the event being measured, implying that the authors cannot meaningfully observe any lifetime shorter than ~10 seconds given their limited sampling rate. Further considering that at minimum two consecutive data points are needed for observing a 10 second lifetime, artifacts (e.g., camera noise) could make up a disproportionate amount of the signal observed in their data for these short lifetimes. For an accurate measurement, the authors need to repeat the experiments at a higher sampling rate to make sure that there are no faster, transient interactions than those currently reported, and that the values reported are accurate.

      See Reviewer Comment #6.
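
      The sampling concern in this comment can be quantified roughly. The sketch below is an illustration under simplifying assumptions, not a description of the authors' acquisition settings: dwell times are taken to be exponentially distributed, an event is counted as detectable only if it lasts at least a chosen number of frame intervals, and the frame intervals and function names are hypothetical.

```python
import math


def fraction_shorter_than(duration_s, mean_lifetime_s):
    """Fraction of exponentially distributed dwells shorter than duration_s."""
    if duration_s < 0 or mean_lifetime_s <= 0:
        raise ValueError("duration must be >= 0 and mean lifetime must be > 0")
    return 1.0 - math.exp(-duration_s / mean_lifetime_s)


def fraction_missed(frame_interval_s, mean_lifetime_s, min_frames=2):
    """Fraction of events too short to span min_frames frame intervals.

    Simplifying assumption: an event is detectable only if it lasts at
    least min_frames * frame_interval_s.
    """
    return fraction_shorter_than(min_frames * frame_interval_s, mean_lifetime_s)


# Hypothetical frame intervals for a population with a 10 s mean lifetime.
for interval in (5.0, 1.0, 0.2):
    missed = fraction_missed(interval, mean_lifetime_s=10.0)
    print(f"frame interval {interval:4.1f} s -> {missed:.1%} of events missed")
```

      Under these assumptions, a frame interval comparable to the mean lifetime discards a large share of events, and the shortest dwells are lost first, which biases any fitted lifetime upward.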

      1. The authors have chosen to extrapolate rates via exponential fitting to dwell time distributions. This is a reductive approach that ignores the relationship between consecutive events. It is strongly recommended that the authors consider using a hidden Markov modeling (HMM) approach instead. HMMs have long become the gold standard in single molecule biophysics. Even better, a Bayesian approach could help analyze entire datasets at the same time. In this reviewer's opinion, the ebFRET software package from the Gonzalez lab at Columbia University could, for example, work well here.

      See Reviewer Comment #7
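
      For context on what "exponential fitting to dwell time distributions" involves, the simplest case can be sketched as follows. This is a minimal illustration, assuming a single-exponential population with no censoring or truncation; it is not the authors' fitting procedure, and the function name and dwell times are hypothetical. For such data, the maximum-likelihood rate is the reciprocal of the mean dwell time, with an approximate standard error of rate / sqrt(n).

```python
import math


def fit_single_exponential(dwell_times_s):
    """Maximum-likelihood rate (1/s) and its approximate standard error.

    Assumes independent, single-exponential dwell times with no censored
    or truncated events. The order of events is ignored.
    """
    n = len(dwell_times_s)
    if n == 0 or any(t <= 0 for t in dwell_times_s):
        raise ValueError("need at least one dwell time, all strictly positive")
    rate = n / sum(dwell_times_s)
    std_err = rate / math.sqrt(n)
    return rate, std_err


# Hypothetical dwell times in seconds.
rate, std_err = fit_single_exponential([5.0, 10.0, 15.0])
print(f"rate = {rate:.3f} +/- {std_err:.3f} per second")
```

      Because the estimate depends only on the sum of the dwell times, it cannot use information carried by the sequence of events, which is the limitation the reviewer contrasts with hidden Markov modeling.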

      1. The manuscript would be majorly strengthened if the authors were testing their hypothesis that Yhc1 and Luc7 contribute to U1 snRNA:5'SS stabilization, by generating (e.g., temperature sensitive) mutant strains that allow them to interfere with this function of the two proteins, either in purified U1 snRNPs or whole cell extracts. Alternatively, the authors could choose to test the role of trans-acting factors such as BBP/Mud2. Without such data, and given the extensive work the authors have previously performed to already demonstrate that U1 snRNP binds to a 5'SS reversibly, with fast and slow dissociation events, one can argue that the current work falls somewhat short in providing major new biological insights. More generally, the plethora of recent cryo-EM structures gives a wonderful opportunity to ask incisive mechanistic questions, which the authors do not fully leverage.

      See Reviewer Comment #1.

      Reviewer 3

      This study of U1 snRNP interaction with the 5'ss is an interesting and exciting piece of work. In particular, the data support two important conclusions of general importance to the field: 1) the association of the U1 snRNP with the 5'ss is largely determined by the snRNP itself and does not require other splicing factors and 2) the ability to form "productive" (i.e. long-lived) interactions between the U1 snRNP and the 5'ss cannot be accurately predicted by base-pairing potential alone. This second point is particularly important as many algorithms for predicting splicing efficiency are based on base-pairing strength between the U1 snRNA and the 5'ss sequence. The data immediately suggest two additional questions.

      1. The authors repeatedly speculate that the benefit of basepairing toward the 3' end is due to the activity of Yhc1. If this model is true, these 3' end basepairs should not influence binding for a U1 snRNP with a mutant Yhc1. Since the authors have used mutant Yhc1 in other studies it seems possible to test this prediction.

      See Reviewer Comment #1.

      1. Since splice sites are often "found" in the context of alternative or pseudo/near-cognate splice sites, it would be interesting to know how the "rules" identified in the experiments presented in this study influence splice site competition and whether both the short- and long-lived states are subject to competition or, rather, only the short-lived complexes. Is it possible to repeat the CoSMoS experiment with two oligomer sequences of different colors?

      See Reviewer Comment #3.

      1. Finally, the authors should say more about the particular requirement for basepairing at position 6, especially in the context of the experiments in Figure 5. This is particularly striking as this position is not well conserved in natural 5'ss, at least compared to position 5.

      See Reviewer Comment #8

    1. Author Response

      Reviewer 1

      Bailon-Zambrano and colleagues were trying to answer the general question: what contributes to phenotypic variation when a gene of strong effect is mutated?

      The work has several major strengths for answering this interesting question. First, they decided to study mef2ca in zebrafish for which they had previously shown that mutants displayed highly variable facial phenotypes. To learn how phenotypic variation depends on phenotypic severity, they realized they had to study more alleles, and so induced two more alleles to have three different types of molecular lesions (start codon mutation, premature stop codon, and full coding gene deletion). Investigating these alleles showed that increasingly severe alleles had more variation among individuals in the population but not necessarily more variation between the left and right sides of the face within individuals.

      Over several years, these investigators had spent considerable effort to select lines of fish that segregate the start-codon mutation and have either severe or weak effects on facial phenotypes. They wondered: what factors were selected out of the original genetic background that would increase or decrease phenotypic severity? They hypothesized that one or more of the five mef2 paralogs in zebrafish might help to ameliorate the phenotype in the low line or reciprocally intensify the phenotype in the high line. They studied expression of the mef2 paralogs in neural crest cells by single-cell transcriptomics. They found that paralogs were downregulated in the high-penetrance line with respect to an unselected line, a result expected if expression of the paralogs contributed to buffering phenotypic severity. This experiment has two weaknesses, first that the method only examined neural crest cells but we know that signals from the ectodermal and endodermal epithelia contribute to craniofacial morphologies by diffusible signals. If genes regulating craniofacial morphologies that act in epithelia had genetic variation that contributes to severity, those genes would not be investigated in these crest-only experiments. A minor problem (which is associated with the expense of the experiment) is that the scRNA-seq experiments compared only the high and unselected lines, not the low line. To address both problems, the investigators performed qPCR on RNAs extracted from whole heads of genetically mef2ca-wild types from the high and low line. In these qPCR experiments, however, they did not investigate the unselected line. Leaving out the low line in one approach and leaving out the unselected line in the other approach somewhat weakens the strength with which one can draw conclusions (e.g., the qPCR conclusion assumes that the unselected line would be intermediate between the two selected lines) but is unlikely to change the basic conclusions the authors drew. In addition, using whole heads in the qPCR experiments, while it has the advantage that it includes epithelia, does not distinguish between genes expressed only in the crest and genes expressed in other cell types, and these experiments did not test for any genes known to affect craniofacial development that are epithelium-specific.

      In response to this comment, and those below, we removed the scRNA-seq comparing neural crest cells from unselected and high-penetrance strains. We replaced those data with important new results that considerably advance our model. We found significant paralog expression variation among unselected zebrafish families (Fig. 4D). These results strongly suggest that our breeding selected upon standing paralog variation in the unselected parental strains. See more below.

      Finally, in key experiments that are a major strength of the work and require significant effort, the researchers systematically made mutations in four of the five zebrafish mef2 paralogs (mef2aa, mef2b, mef2cb, and mef2d, all except mef2ab, which didn't become mutated despite significant effort) in the genetic background of the low-penetrance strain and studied them in single homozygotes, in double mutants, and in various heterozygous combinations. These important experiments showed that some paralogs provided significant buffering in the low-penetrance strain, the strain that up-regulated expression of these paralogs. It would be helpful in the discussion to mention that mef2ab couldn't be mutated and a phrase added about what that means for the general conclusions - in the opinion of this reviewer, the impact of this is not great but it should be acknowledged.

      We acknowledge that mef2ab couldn’t be mutated and consider what that means for the general conclusions in the text.

      A strength of the experiments is that the workers quantified effects of various genotypes by focusing on the length of the symplectic, a convenient element for quantification both within single individuals and among fish in a population. It would be helpful to have a statement on the evidence that this measure is a good representative for other aspects of the phenotype.

      We provide new data indicating that the symplectic cartilage length is significantly correlated with another mef2ca-associated phenotype (Fig. 1-figure supplement 2). See more below.

      Finally, the paper presents a model for understanding the results presented that does a good job of summarizing the data and, importantly, suggests ways to move the analysis deeper. Missing from the description of the model is a discussion about whether the genetic variation that was selected and ultimately upregulated mef2 paralogs is in regulatory elements of the mef2 paralogs themselves or whether it might be in trans-acting transcriptional regulators that simultaneously regulate all mef2 paralogs due to the authors' hypothesized 'cryptic vestigial' functions.

      We considerably revised the discussion, thoroughly considering both these possibilities.

      This work is likely to have a significant impact on the fields of developmental biology, the interpretation of human mutational variation (in for example the concept of phenotypic expansion), and the way people think about the evolution of new morphologies over time. A brief comparison of the authors' results and interpretations to those of C.H. Waddington's concept of genetic assimilation would provide improved historical context and broaden the potential impact of the work.

      We now include a discussion of our study in the context of Waddington’s genetic assimilation.

      Reviewer 2

      Bailon-Zambrano et al study the possible mechanisms that contribute to the oft-observed phenomenon that an individual mutation may be associated with variable expression of a phenotype. They focus on loss-of-function of the mef2ca gene of zebrafish, which is needed for the normal development of several craniofacial structures. They demonstrate that recessive putative loss-of-function mutant alleles of the mef2ca gene of zebrafish are associated with a range of expressivity. By focusing on one aspect of the mutant phenotype, the length of the symplectic cartilages that support the jaw, they find a correlation between the average strength of the phenotype of an allele (measured as reduction in length) and the extent of variability between mutant individuals that carry the allele. I am concerned about this conclusion and generalizations that may be drawn from focus on a single quantifiable character, the symplectic cartilage. Perhaps there is always a fixed variation in the length of this cartilage. As stronger alleles produce shorter cartilage pieces, variations in size may appear to be of greater significance when affecting shorter average length.

      We now show that the symplectic cartilage length is a good proxy for other craniofacial phenotypes (Fig. 1-figure supplement 2). Further, we clarify in the text that we use the coefficient of variation (standard deviation/mean), which is the accepted best practice for determining and comparing variation. We also use the F-test statistic, which is the standard statistical method to test for equality of two variances. This test tells us whether the variances of two datasets are significantly different.
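
      The two statistics named in this response can be illustrated as follows. This is a minimal sketch with hypothetical cartilage lengths, not the authors' data or code, and the function names are assumptions. It computes the coefficient of variation for each group and the variance ratio that forms the F statistic; turning that ratio into a p-value requires the F distribution with the returned degrees of freedom, which is left out here.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def variance_ratio(sample_a, sample_b):
    """F statistic (ratio of sample variances) and its degrees of freedom."""
    f_stat = statistics.variance(sample_a) / statistics.variance(sample_b)
    return f_stat, (len(sample_a) - 1, len(sample_b) - 1)


# Hypothetical lengths (arbitrary units): same absolute spread, different means.
long_group = [10.0, 12.0, 14.0]
short_group = [4.0, 6.0, 8.0]

print("CV, long group: ", coefficient_of_variation(long_group))
print("CV, short group:", coefficient_of_variation(short_group))
print("F statistic and degrees of freedom:", variance_ratio(long_group, short_group))
```

      In this made-up example the two groups have equal variances but different coefficients of variation, so the two measures answer different questions: the variance ratio compares absolute spread, whereas the coefficient of variation expresses spread relative to the mean.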

      The authors hypothesize that one factor that contributes to the varied phenotypic expression of an allele (expressivity) is the co-expression of paralogs that may provide wildtype function and thus partially or wholly rescue the mutant phenotype. They test this hypothesis by "fixing" conditions where a single mutation may be expressed with low or high penetrance. By selective breeding based on phenotype, they create two sets of strains that carry an identical mef2ca mutation: one strain has high penetrance of the mutant phenotype and the other low penetrance. They then investigate the factors that are likely responsible for the high vs low penetrance. Historically we would call these factors "genetic modifiers". There is extensive literature on the nature of genetic modifiers and there are many current screens in both mice and Drosophila to identify genetic modifiers and uncover their nature, but there is little reference to these studies in the current manuscript. Further, there is previously published work that hypothesizes that one important function of paralogs in multicellular organisms is to provide a buffer to stabilize levels of gene expression needed for developmental decisions.

      Following this reviewer’s suggestion, we now include many new references (increased from ~50 to >80) incorporating much of the important work leading up to our study. These include referencing both genetic modifier mutagenesis screens, paralogous buffering in other systems, and “natural” modifier studies that set the stage for our work.

      The authors find that paralogs of the mef2ca gene are expressed in cells that normally express mef2ca, and that these paralogs are expressed at higher levels in the mutant strain with low penetrance than in the mutant strain with high penetrance. They say that selection for high penetrance of the mef2ca mutant phenotype "leads to down-regulation" of paralog expression. As the authors only show that paralog expression is at lower levels in high penetrance vs low penetrance strains, it is not clear what they mean by "down-regulation". Perhaps their breeding scheme has only "captured" what is natural variation and there is no active mechanism of "down-regulation". The authors need to clarify what they mean.

      Thank you for this suggestion. We clarified that we do not mean active down or up regulation but rather selection on preexisting genetic variation. This conclusion is supported by new data (Fig. 4D).

      The authors also find that individuals from the high penetrance strains that don't carry the mef2ca mutation (they are wildtype for this gene) sometimes exhibit mef2ca mutant characters. They suggest the reduced paralog expression is responsible for the occasional emergence of the mef2ca mutant characters. In contrast with this suggestion, the authors later claim the paralogs "have no function" in craniofacial development. The authors need to clarify their thoughts about what is paralog function in craniofacial development and why reduced paralog function might contribute to the expression of mef2ca mutant characters. This topic is worthy of discussion.

      We considerably revised our discussion of this topic, including our interpretation that the decreased expression of mef2ca in the high penetrance strain led to the phenotypes we observe in mef2ca wild types from this strain. We are also more careful with our language, stating that the paralog mutants are indistinguishable from wild types, rather than stating that paralogs do not function in craniofacial development. In fact, they do function in craniofacial development, as buffers. Thank you for this suggestion, which strengthened our manuscript.

      The authors claim there is both up-regulation of paralogs in low penetrance strains and down-regulation of paralogs in high penetrance strains. As they only compare steady state levels of expression in each strain, they can only reasonably conclude that there are differences - they seem to imply a mechanism and they need to be clear about what they are thinking.

      Excellent point. In the revised manuscript, we are clear that there is not active up or down regulation but rather selection upon preexisting variation.

      They hypothesize that paralog expression in the low penetrance strain masks the effects of loss of mef2ca. They test this by creating CRISPR-engineered mutations of two paralogs and examining the effects of the paralog mutations in wildtype fish or in fish carrying the mef2ca mutation. They find the putative loss-of-function mutations in the paralogs have no effect in wildtype backgrounds and conclude these paralog genes have no function in craniofacial development. However, the paralog mutations enhance the mutant phenotype in fish that carry the mef2ca mutation. This provides strong evidence consistent with the model that the elevated expression of the paralogs functions to reduce the severity of the phenotype associated with the mef2ca mutation.

      Reviewer 3

      In this elegant genetic study, Bailon-Zambrano et al. draw on classical genetic concepts to address the clinically pertinent question of how genetic variants in the same gene can yield wildly different phenotypes in different individuals. They focus on the Mef2c gene, which is required for craniofacial and cardiac development in humans and model organisms yet shows highly variable phenotypes across and within individuals. Previous work by this lab had established that zebrafish mef2ca craniofacial phenotypes are highly variable and, importantly, that this variability is heritable and can be selectively bred for low vs. high penetrance. The authors hypothesize that vestigial expression of paralogous genes variably compensates for loss of mef2ca, such that individuals with higher levels of paralogous genes will show lessened severity and vice versa. To test their hypothesis, they methodically quantify the penetrance, expressivity, and variability of all known mef2ca-associated craniofacial phenotypes in fish carrying 1) different mef2ca mutations, 2) the same mutation but after selecting for high vs. low penetrance for many generations, and 3) mef2ca mutations combined with mutations in paralogous genes. They find that not only does allele severity directly correlate with variation, but also that different paralogs buffer the severity and variability of different craniofacial phenotypes. Another particularly interesting finding is that some of the craniofacial phenotypes are apparent even in mef2ca wildtypes from the high penetrance strain, which they explain by the very low expression of paralogs on this background. A weakness of the study is that the authors do not directly show whether paralog expression is increased in the low-penetrance strain relative to the initial, unselected genetic background. It is therefore not clear whether the selection for low penetrance worked in this manner, as the authors imply. Overall, the authors have achieved an important step forward in understanding the genetic basis for the high variability of human faces among both healthy individuals and those with craniofacial anomalies.

      We can’t go back (over ten generations) to survey the original parental strain. However, we can use the unselected AB strain as a proxy for the initial unselected genetic background. In an important addition to the manuscript, we found significant paralog expression variation between unselected AB families (Fig. 4D). These results strongly suggest that there is cryptic, standing paralog expression variation that we selected upon. We would like to thank the reviewer for this excellent critique, which motivated these important new experiments that considerably advance our model.

    1. Author Response

      Reviewer 1

      Ting Tang et al. present the results of a species x genotype diversity experiment within BEF China. The authors assess the relative impacts of species and genotype diversity on community-level primary productivity of the trees and the potential mediation of this effect via interactions of plants with soil fungi and herbivores. The results show that both species and genotype diversity influence productivity via changes in herbivory, soil fungal diversity, and other unknown mechanisms. Most of the species diversity effects could be directly related to functional diversity, while genotype diversity effects were not well represented by the way functional diversity was measured in this study.

      Thanks for the positive comments on the paper.

      The study is based on an impressive experiment that will certainly allow achieving major insights into the role of genotype and species diversity on ecosystem functioning. However, there are some significant shortcomings in the methods that limit this study. In particular, the incomplete assessment of functional traits, herbivory, and fungal diversity across the subplots used for this study reduces statistical power. Specific measurements of traits, herbivory and fungal diversity in each plot would substantially simplify the design and the analyses and likely also reduce the unexplained variance observed in the study. However, this is nothing that can be changed now and has the likely explanation of feasibility constraints.

      Thank you for the positive comments on the paper and the understanding of the feasibility constraints. In our study, functional traits of all the seed families of the four species across all the species × genetic diversity combinations were sampled, but to reduce circularity, we used the seed-family means across all tree diversity combinations to calculate functional diversity for every subplot instead of only using the functional trait measures obtained in that particular subplot. We have taken up the suggestion to also calculate functional diversity based on trait measurements of individual trees, but also here used data across all plots to reduce circularity. Additionally, we now acknowledge the incomplete assessment of herbivory in the Methods and state that fungal diversity in plant species mixtures was sampled on plot level because of feasibility constraints.

      Lines 334–337: “To reduce circularity, we used the seed-family means across all species × genetic diversity combinations to calculate FDis values per subplot that did not only depend on the functional trait measures obtained in that particular subplot. Using traits measured in a particular subplot to calculate FDis for that subplot bears the risk that the measured traits reflect a response to the local environment, yet we want to use FDis as a predictor variable for the performance of that subplot.”

      Lines 380–382: “The mean value of herbivore damage per species × genetic diversity level was used to fill in missing values in a few subplots with tree individuals lacking herbivory data (Table S3).”

      Lines 385–388: “Soil fungal diversity was used as a proxy of unspecified trophic interactions. To be consistent with the species and genetic diversity treatment design, soil samples were taken on subplot level for the 1.1 and 1.4 diversity treatments, but, due to feasibility constraints, on plot level for the 4.1 and 4.4 diversity treatments in 2017.”
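
      The gap-filling step quoted above for lines 380–382 amounts to group-mean imputation. The sketch below illustrates the general idea only; it is not the authors' code, and the record layout, group labels, function name, and damage values are hypothetical.

```python
from collections import defaultdict


def fill_missing_with_group_mean(records):
    """Replace missing values (None) with the mean of their group.

    records is a list of (group_label, value_or_None) pairs. Returns the
    values in the original order, with each None replaced by the mean of
    the observed values in the same group.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        if value is not None:
            sums[group] += value
            counts[group] += 1

    filled = []
    for group, value in records:
        if value is None:
            if counts[group] == 0:
                raise ValueError(f"no observed values in group {group!r}")
            value = sums[group] / counts[group]
        filled.append(value)
    return filled


# Hypothetical herbivore damage per subplot, grouped by diversity level.
damage = [("1.1", 0.10), ("1.1", 0.20), ("1.1", None), ("4.4", 0.30), ("4.4", None)]
print(fill_missing_with_group_mean(damage))
```

      Imputed subplots take their group's mean exactly, so this approach keeps group means unchanged but understates within-group variance, which is worth keeping in mind when only a few subplots are affected.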

      The writing of the manuscript is generally good. However, given the somewhat diffuse results obtained for genetic diversity effects, they receive a lot of attention in the discussion, while species diversity effects are barely mentioned. This could be better balanced and also referred back to the hypotheses. For example, I miss the discussion of the very clear hypothesis that genotype diversity effects are positive in species monocultures but neutral in species mixtures. How do your results fit with this hypothesis? My general impression is that the study is very well framed, but does not stick to this frame in the discussion. I am aware that this might be a challenge with the results obtained, but worth trying.

      Thank you for the positive comments on the writing and pointing out the unclear part of the genetic diversity effects. To better connect the discussion to our hypothesis that genotype diversity effects are “more important in species monocultures than in species mixtures” (lines 114–115), we have rewritten the corresponding Discussion section.

      Lines 248–264: “In contrast to our second hypothesis, we found that the effects of genetic diversity via functional diversity and multi-trophic feedbacks were negative in species monocultures but positive in the species mixture (Fig. 5 and Fig. S3). We found genetic diversity had positive effects on tree functional diversity and soil fungal diversity, which supports the trade-offs between genetic and species diversity discussed in the previous section. However, the hypothesized positive effects of tree functional diversity on productivity turned negative in species monoculture. This result indicates that functional diversity may not have positive effects on the ecosystem functioning under low environmental heterogeneity, i.e. species monocultures in our study (Hillebrand and Matthiessen 2009). Therefore, our findings show that the different effects of genetic diversity on tree productivity between species monocultures and mixtures, not only depend on the different effects of genetic diversity on functional diversity and trophic interaction but also on the varied tree productivity consequences from functional diversity and trophic interaction on tree productivity between species monocultures and mixtures. Moreover, other aspects of tree genetic diversity seem to play an important role not only for productivity in tree species mixtures (see previous section) but also for productivity in tree species monocultures. These may include unmeasured functional traits such as root traits (Bardgett et al., 2014) or unknown mechanisms underpinning effects of tree genetic diversity.

      Given the complex results obtained, I also feel that the title and main message received in the abstract do not fully reflect the results. Genetic diversity effects on productivity, but also on herbivory and fungal diversity, are not general (e.g. Fig. 2) nor are all genetic diversity effects on productivity mediated by functional diversity and trophic feedback. I think the title and main message of the study should be articulated more precisely.

      In this study we did not find direct effects of genetic diversity on tree productivity in the binary analyses (Fig. 2), but we did find indirect effects of genetic diversity on tree productivity via functional diversity and trophic feedbacks in the path analysis (Fig. 4). Now we have pointed this out in the Discussion.

      Lines 201–204: “Although only species diversity but not genetic diversity was found to affect tree productivity in binary analyses, both kinds of diversity positively affected tree community productivity and trophic interactions via functional diversity according to our structural equation models (SEMs) depicted in the corresponding path-analysis diagrams (see Fig. 4).

      We agree that not all genetic diversity effects on productivity were mediated by functional diversity and trophic feedbacks. This may have been because we did not include all relevant functional traits and trophic interactions in this study. Nevertheless, our findings support the hypothesis that genetic diversity can affect productivity via functional diversity and trophic feedbacks and suggest more possibilities for further research. We have explained this in the Discussion.

      Lines 230–238: “Even after accounting for tree functional diversity and trophic feedbacks, we still detected a direct negative effect of tree genetic diversity on tree productivity, while the direct effect of tree species diversity was fully explained by functional diversity and trophic feedbacks. This suggests that aspects of genetic diversity that do not contribute to functional diversity or trophic interactions as measured in this study may reduce ecosystem functioning, e.g. due to trade-offs between genetic diversity and species diversity. For example, it has been shown that in species-diverse grassland ecosystems, niche-complementarity between species can increase at the expense of reduced variation within species (van Moorsel et al., 2018; van Moorsel et al., 2019; Zuppinger-Dingley et al., 2014; Zvereva et al., 2012).

      Lines 260–264: “Moreover, other aspects of tree genetic diversity seem to play an important role not only for productivity in tree species mixtures (see previous section) but also for productivity in tree species monocultures. These may include unmeasured functional traits such as root traits (Bardgett et al., 2014) or unknown mechanisms underpinning effects of tree genetic diversity.”

      Reviewer 2

      This study aims to disentangle the contributions of genetic and species diversity to tree community fitness. It confirms the role of genetic diversity in functional and ecological traits but shows how these effects change when plant species diversity is increased, which can potentially add to our understanding of the interplay between plant diversity at various levels and community and ecosystem functions. It would be desirable to emphasize whether the effects of genetic and species diversity are comparable, since they can act at complementary but different levels. It is hard to establish whether the effects of species diversity override the effects of genetic diversity by shared mechanisms, or whether a high species diversity reduces plant intraspecific interactions and the consequent effects of genetic diversity by density-dependent effects. However, this point has to be emphasized in the discussion.

      Thank you for your positive comments on this paper. In the binary analyses in this paper, we used general linear mixed-model analysis to detect the effects of genetic diversity within species. Now we have clarified this in the Methods and the Results section. However, in Fig. 2 we also indicate the significance of the main effect of genetic diversity. We do not focus on this because of the interaction between species and genetic diversity. In statistical terms, fitting genetic diversity effects separately for species monocultures and mixture (2 degrees of freedom) is equivalent to (i.e. has the same sum of squares as) fitting the main effect of genetic diversity (1 degree of freedom) and the species × genetic diversity interaction (1 degree of freedom).

      Lines 415–424: “To determine how species and genetic diversity and their interaction affected tree functional diversity and trophic interactions, linear mixed-effects models (LMMs) were fitted with two types of contrast coding. In the first, we used the ordinary 2-way analysis of variance with interaction and in the second we replaced the genetic diversity main effect and the interaction with separate genetic diversity effects for species monocultures and the species mixture (Table S6). Note that as our design was orthogonal, fitting sequence did not matter in either of the codings. However, we focused our major analysis on the second type of coding to make it consistent with our hypotheses. Main effects of genetic diversity are presented in inset panels in Fig. 2. Our second contrast coding ensured that we tested the effects of genetic diversity separately in species monocultures and species mixture, but within the same analysis.
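      The equivalence of the two contrast codings can be checked numerically. The sketch below is a simplified, fixed-effects-only illustration using ordinary least squares on simulated data; it omits the plot random effect used in the authors' mixed models, and all effect sizes, cell sizes, and variable names are assumptions made for the example.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
n_per_cell = 6                                         # hypothetical cell size
S = np.repeat([0, 0, 1, 1], n_per_cell).astype(float)  # 0 = monoculture, 1 = mixture
G = np.repeat([0, 1, 0, 1], n_per_cell).astype(float)  # 0 = low, 1 = high genetic diversity
y = 1.0 + 0.5 * S - 0.3 * G + 0.8 * S * G + rng.normal(size=S.size)

one = np.ones_like(S)
X_base = np.column_stack([one, S])                     # species diversity only
# Coding 1: genetic diversity main effect + species x genetic diversity interaction
X_factorial = np.column_stack([one, S, G, S * G])
# Coding 2: separate genetic diversity effects in monocultures and in mixtures
X_nested = np.column_stack([one, S, G * (1 - S), G * S])

# Sum of squares explained by the two genetic-diversity terms (2 df) in each coding
ss_factorial = rss(X_base, y) - rss(X_factorial, y)
ss_nested = rss(X_base, y) - rss(X_nested, y)
```

      Both design matrices span the same column space, so the two sums of squares agree up to floating-point error; only the partition of those 2 degrees of freedom into individual terms differs.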

      Lines 120–121: “Using linear mixed-model analyses, we tested the effects of species diversity and genetic diversity within species on trophic interactions and community productivity.

      Meanwhile, to emphasize that species diversity and genetic diversity could affect each other, we discussed that the trade-offs between species and genetic diversity could contribute to the effects of tree diversity on tree community productivity. We also discussed that the different effects of genetic diversity between species monocultures and mixtures may arise because different levels of species diversity create different biotic environments.

      Lines 232–238: “This suggests that aspects of genetic diversity that do not contribute to functional diversity or trophic interactions as measured in this study may reduce ecosystem functioning, e.g. due to trade-offs between genetic diversity and species diversity. For example, it has been shown that in species-diverse grassland ecosystems niche-complementarity between species can increase at the expense of reduced variation within species (van Moorsel et al., 2018; van Moorsel et al., 2019; Zuppinger-Dingley et al., 2014; Zvereva et al., 2012).

      Lines 250–260: “We found genetic diversity had positive effects on tree functional diversity and soil fungal diversity, which supports the trade-offs between genetic and species diversity discussed in the previous section. However, the hypothesized positive effects of tree functional diversity on productivity turned negative in species monoculture. This result indicates that functional diversity may not have positive effects on the ecosystem functioning under low environmental heterogeneity, i.e. species monocultures in our study (Hillebrand and Matthiessen 2009). Therefore, our findings show that the different effects of genetic diversity on tree productivity between species monocultures and mixtures, not only depend on the different effects of genetic diversity on functional diversity and trophic interaction but also on the varied tree productivity consequences from functional diversity and trophic interaction on tree productivity between species monocultures and mixtures.

      The experimental design has to be explained in more detail, in particular how plants were planted in the species monocultures. It is not stated whether the same or different species were used in the plots or in subplots. The design lacks proper replication for the treatment with high genetic diversity in species monocultures (n=2) which could lead to a biased result, especially if those plots were located in the same area.

      Thank you for the valuable comments on the experimental design. In total, we used four species and eight seed families per species for the whole experiment, and now we have added a diagram of the experimental design to the supplementary material (Fig. S5) to show the species and seed-family information for every subplot. Furthermore, we have added a table to the supplementary material to indicate the number of occurrences of each species and each seed family across all the tree diversity-treatment combinations (Table S2). The high genetic diversity in species monoculture (1.4 treatment) was replicated 2 times per species and thus had 8 replications (Fig. S5). However, because we did not have enough seedlings, we could only establish these treatments at subplot level and thus put the different species for the 1.4 treatment into only two plots. Now we have added more explanation of the plot design in the Methods part. The plot distribution was completely randomized across the experimental site and plots of the same treatments were mostly located at least 50 m from each other (see Fig. 1 from Bongers et al., 2020, pasted here further below). The reason that there are more plots for the 1.1 treatment is that typically in biodiversity experiments more plots are needed at the lowest diversity treatment because of the desire to have all seed families occurring in any mixture also present as monoculture. Regarding the point that the four diversity treatments varied between rather than within plots, we ensured that diversity effects were tested at the plot level by including plot as a random-effects term in the mixed models.

      Lines 305–323: “For each of the four species, we collected seeds from eight mother trees to allow for two replications of four-family mixtures per species. Furthermore, to avoid the effects of unequal representation of particular seed families and correlations between seed family presence and diversity treatments, we made sure that every seed family occurred the same number of times at each diversity level (see Table S2, small deviations from the rule were required where not enough seeds from a seed family could be obtained). Due to budget limitations and the number of replicates required per single seed family, the 1.1 and 1.4 diversity treatments were applied at subplot level (0.25 mu) and replicated 32 and 8 times, respectively. The 4.1 and 4.4 diversity treatments were applied at plot level (1 mu) and were replicated 8 and 6 times, respectively (Fig. S5; see also Fig. 1 in Bongers et al., 2020). To allow for simpler analysis, we obtained most community measures at subplot level also for the 4.1 and 4.4 diversity treatments and thereafter used the subplots for all tests of diversity effects on these community measures, including plots as error (i.e. random-effects) term for testing the diversity effects in the corresponding mixed models. In total, because one 1-mu plot could not be established due to logistical constraints, the number of subplots used was 92 (32 subplots of 1.1, 8 subplots of 1.4, 28 subplots of 4.1 and 24 subplots of 4.4 diversity treatment). Note that in biodiversity experiments lower richness levels represent more different communities and thus require more plots. For the highest richness level, where there is typically only one species composition, this same community is typically replicated multiple times, as we did here for the 4.4 diversity treatment.

    1. Author Response

      Reviewer 1

      Employing in vitro and Drosophila models, the authors interrogate which domain of Hsp27 binds to which region on Tau, and how these interactions facilitate the proteinaceous aggregation. They utilized various biochemical, biophysical, cellular, and genetic tools to dissect the association, and identified the structural basis for the specific recognition of Hsp27 to pathogenic p-Tau. Conceivably, Hsp27 may play some role in preventing Tau abnormal aggregation and p-Tau pathology in AD. Overall, the data support the main claim, especially, the biophysical data are very impressive. Nevertheless, the manuscript could be strengthened by complementary cellular or biochemical methods for validation. For example, the authors can use a stably transfected Tau cell line to interrogate Hsp27's role in its cellular aggregation or proteinaceous inclusions by immunoblotting. Immunofluorescent and immunohistochemical staining and IB with different antibodies may be conducted to validate the observations.

      REPLY: We sincerely thank the reviewer for the positive assessment of our work, and for providing very insightful suggestions. We appreciate the reviewer for considering our biophysical data to be impressive. We totally agree with the reviewer that the work could be strengthened by complementary cellular methods for validation. In our work, we used the Drosophila tauopathy model, where expression of human TauR406W in the Drosophila nervous system leads to age-dependent neurodegeneration recapitulating some of the salient features of tauopathy in FTDP-17 (refs. 1, 2), to interrogate the role of Hsp27 in aggregation and proteinaceous inclusions of pTau.

      In our Drosophila Tau model study, three different antibodies, namely the total Tau antibody 5A6 (ref. 3), a pTauSer262-specific antibody (ref. 4), and the hyper-phosphorylated Tau antibody AT8, which recognizes hyper-phosphorylation of Tau at the Ser202 and Thr205 sites (ref. 5), were used in western blot analysis to explore the role of Hsp27. As shown in Figure R1-1A and 1B, overexpression of Hsp27 significantly reduced the level of both pTauSer262 and hyper-phosphorylated Tau at both 2 and 10 days after eclosion (DAE). In addition, we further examined the morphology of the fly brain as well as the accumulation of hyper-phosphorylated Tau by immunofluorescence staining. Consistent with previous findings, brains with neuronal expression of TauR406W exhibited an accumulation of filamentous pTau and a reduction of brain neuropil size indicative of neurodegeneration (Figure R1-1C-F). Importantly, overexpression of Hsp27 restored the size of brain neuropil and suppressed the accumulation of filamentous pTau (Figure R1-1C-F), suggesting that Hsp27 protects against mutant TauR406W-induced neurodegeneration. Taken together, our Drosophila results show that Hsp27 protects against synaptic dysfunction in a Drosophila tauopathy model by reducing pTau aggregation, which supports our biophysical data well.

      Figure R1-1 Hsp27 reduces pTau level and protects against pTau-induced synaptopathy in Drosophila. (This figure represents Fig. 2A-F in the revised manuscript) (A) Brain lysates of 2 and 10 days after eclosion (DAE) wild-type (WT) flies (lanes 1 and 6), flies expressing human Tau with GFP (lanes 4 and 9), or human Tau with Hsp27 (lanes 5 and 10) in the nervous system were probed with antibodies for disease-associated phospho-tau epitopes S262, Ser202/Thr205 (AT8), and total Tau (5A6). Actin was probed as a loading control. Brain lysates of flies carrying only UAS elements were loaded for control (lanes 2, 3, 7, and 8). (B) Quantification of protein fold changes in (A). The levels of Tau species were normalized to actin. Fold changes were normalized to the Tau+GFP group at 2 DAE. n = 3. (C) Brains of WT flies or flies expressing Tau+GFP or Tau+Hsp27 in the nervous system at 2 DAE were probed for AT8 (heatmap) and Hsp27 (green), and stained with DAPI (blue). Scale bar, 30 μm. (D-F) Quantification of the Hsp27 intensity (D, data normalized to WT), brain optic lobe size (E), and AT8 intensity (F, data normalized to the Tau+GFP group). n = 4.

      Reviewer 2

      Abnormal accumulation and aggregation of amyloid-β protein is one of the main pathological hallmarks of Alzheimer's disease. It is well known that molecular chaperones play central roles in regulating tau function and amyloid assembly in disease. In this manuscript, Zhang, Zhu, Lu, Liu, et al. show that Hsp27, a member of the small heat shock protein family, specifically binds to phosphorylated Tau, which prevents pTau fibrillation in vitro and in a Drosophila tauopathy model. Using NMR spectroscopy and cross-linking mass spectrometry, the authors found that the N-terminal domain of Hsp27 directly binds to phosphorylation sites of pTau. Overall, the study is important and provides the demonstration of interactions between Hsp27 and pTau.

      REPLY: We sincerely thank the reviewer for the positive remarks on this work. We appreciate that the reviewer summarizes the major conclusions of our manuscript and considers our work important for the fundamental biology of the interaction between chaperones and clients, and for its implications in AD pathology.

    1. Author Response

      Reviewer 2

      The manuscript by Huisjes et al. presents an open-source platform for the storage and processing of imaging data, particularly for single-molecule imaging experiments. Compared to sequencing data, which have a more standardized format for data storage, imaging data have more diverse formats because different research labs tend to use different instruments and software (either commercial or home-built) for data collection and analysis. Manual input is almost always necessary at certain steps of data analysis. All of this creates difficulties in data storage and reproducibility. The authors provide a practical solution to this problem with the molecular archive suite, "Mars". This platform is integrated into ImageJ/Fiji, and can be used for storing detailed descriptions of experimental settings, performing standard image processing steps, and recording manual input information during data analysis. I judge this platform, if fully functional and generalizable, will be very useful to many labs that use single-molecule imaging methods in their research.

      Strength:

      1. The work presents a fairly user-friendly interface (using Fiji directly), and a fairly detailed protocol and other documentation on a very nicely designed website. I was able to download and use it based on the tutorial.

      2. It is integrated very well with Fiji, and some analysis modules are directly from existing Fiji analysis/plugins.

      Weakness:

      I invited one of my students to co-test the suite. We tried on both Mac and Windows systems, using the example FRET data set described in the manuscript and one of our own single-molecule images. We encountered some technical issues.

      We are very happy with the overall positive assessment of the reviewer that Mars could offer a common format that helps to enforce reproducible analysis workflows that can easily be shared with others.

      We are grateful for the additional feedback and testing done by the reviewer and her student. Ensuring that Mars works as expected on all computers and configurations is difficult given that we don’t have them at hand for testing ourselves. During the revision period, we have done more testing on more computer systems and we hope we have addressed the issues. We believe it will be impossible for us to guarantee that Mars works without problems on the first try for everyone. Therefore, Mars is a community partner on the Scientific Community Image Forum where users can report their problems in posts with the mars tag and we can help troubleshoot them (https://forum.image.sc/tag/mars). We believe this approach will offer the best support going forward. Nevertheless, we continue to make improvements and test to make sure all bugs we discover are addressed.

      In the revision, we completely reworked the smFRET example workflow and added two additional workflows to address all the comments from the reviewers and reviewing editor. In addition to expanding the explanations and troubleshooting information on the Mars documentation website, we also created a YouTube channel with tutorial and example videos (https://www.youtube.com/channel/UCkkYodMAeotj0aYxjw87pBQ). We go through the new dynamic smFRET workflow from start to finish in one of the videos provided (https://www.youtube.com/watch?v=JsyznI8APlQ). We hope this will make it clear what inputs and outputs are expected and how the workflow should proceed. This was done on a Mac, but we have also tested this workflow on Windows without encountering problems.

    1. Author Response

      Reviewer 1

      This article creates a formal definition of the 'informativeness' of a randomized clinical trial. This definition rests upon four characteristics: feasibility, reporting, importance, and risk of bias. The authors have conducted a retrospective review of trials from three disease areas and reported the application of their definition to these trials. Their primary finding is that about one quarter of the trials deemed to be eligible for assessment satisfied all four criteria, or, equivalently, about three quarters failed one or more of their criteria. Notably, industry‐sponsored studies were much more likely to be informative than nonindustry‐sponsored studies. It would be interesting to see a version of Figure 3 that categorizes by industry/non‐industry to see the differences in fall‐off across the four criteria.

      Thank you for this suggestion. We have added an additional figure to the supplement, eFigure 1 ‐ The Cumulative Proportion of Trials Meeting Four Conditions of Informativeness by Sponsor.

      We have also indicated the following in the legend of Table 1: (as related to study sponsor)

      Lines 332 – 334 “Included within the designation “Other” are 7 trials that received funding from the U.S. National Institutes of Health (NIH) or other U.S. Federal agencies, and 60 trials that are nonindustry and non‐NIH/U.S. Federal agency funded.”

      As the authors point out, the key limitations to this work are its inherent retrospective nature and subjectiveness of application, making any sort of prospective application of this idea all but impossible. Rather, this approach is useful as a 'thermometer' for the overall health of the type of trials satisfying the eligibility criteria of this metric. A secondary and inherent limitation of this measure is the sequential nature of the four criteria: only among the trials that have been determined to be feasible (the first criterion measured) can one measure reporting, importance, and lack of bias. And only among those trials that are both feasible and reported properly can one measure their importance and lack of bias, and so forth. Thus, except for feasibility, one cannot determine the proportion of all trials that were properly reported, were important, or evinced lack of bias.

      “Thermometer” is an apt metaphor. Please see response to Essential Revisions # 4 regarding the retrospective nature of our assessment.

      The sequential nature of our assessment is indeed a limitation for readers wanting to know the fraction of trials fulfilling each of the four criteria. This reflects a compromise between our aspirations and our labor capacity. However, we emphasize that our pre‐specified primary outcome was the fraction of trials fulfilling all informativeness criteria. We have also elaborated upon the following in our limitations section:

      Line 521 – 533 “Third, we used a longitudinal and sequential approach, since some of the conditions were only relevant once others had been met. For example, incorporation into a clinical synthesizing document can only occur once results have been reported. Our sequential approach enabled us to address our primary outcome with an economy of resources. However, our study does not enable an assessment of the proportion of trials fulfilling three of the four criteria in isolation from each other. In addition, changes in research practices or policy occurring over the last decade might produce different estimates for the proportion of randomized trials that are informative.
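      The sequential assessment described above can be pictured as a funnel. The sketch below uses four hypothetical trial records (not the study's data) in which a criterion is left as `None` once an earlier criterion has failed; the criterion names follow the order given by Reviewer 1, and the function name is an assumption made for the example.

```python
# Criteria in the order in which they are assessed
CRITERIA = ["feasibility", "reporting", "importance", "low_risk_of_bias"]

# Hypothetical trials; None means "not assessed because an earlier criterion failed"
trials = [
    {"feasibility": True,  "reporting": True,  "importance": True,  "low_risk_of_bias": True},
    {"feasibility": True,  "reporting": True,  "importance": False, "low_risk_of_bias": None},
    {"feasibility": True,  "reporting": False, "importance": None,  "low_risk_of_bias": None},
    {"feasibility": False, "reporting": None,  "importance": None,  "low_risk_of_bias": None},
]

def sequential_funnel(rows):
    """Cumulative proportion of all trials that pass each criterion and every earlier one."""
    total = len(rows)
    if total == 0:
        return {c: float("nan") for c in CRITERIA}
    remaining = list(rows)
    cumulative = {}
    for criterion in CRITERIA:
        # Only trials that survived all earlier criteria are evaluated here
        remaining = [t for t in remaining if t[criterion] is True]
        cumulative[criterion] = len(remaining) / total
    return cumulative

funnel = sequential_funnel(trials)
informative_fraction = funnel[CRITERIA[-1]]   # fraction meeting all four criteria
```

      In this layout the final entry gives the primary outcome (the fraction meeting all four criteria), whereas the marginal proportion of trials that would pass, say, the importance criterion alone cannot be recovered, because that criterion was never assessed for trials that dropped out earlier.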

      Reviewer 2

      The authors present a systematic review of 125 trials (in three disease areas: ischemic heart disease, diabetes mellitus and lung cancer) available on clinicaltrials.gov, with the goal of estimating how often clinical trials result in a meaningful impact on clinical practice (or policy or research decisions). This is a very interesting and important question which, if not approached carefully, could lead to results that are misleading and/or difficult to interpret.

      Thank you!

      To help reduce the potential for misleading results, the authors employed sensible criteria for inclusion of trials in this analysis (with trials being independently evaluated by multiple authors to determine whether they should be included). Once trials were selected, they had to be classified as "informative" or not. While such classification is, by definition, subjective, the authors attempted to make this process as objective as possible. They proposed a definition of "informative" based on four factors: feasibility (of achieving the target enrollment/completing the trial in a timely fashion), reporting (of results; either on clinicaltrials.gov or in a publication), importance (of the clinical question being addressed) and the quality of the design. As with the evaluation of trial inclusion, the authors independently evaluated each trial to determine informativeness.

      The authors provide a thorough discussion of key issues that could affect the interpretability of a trial and include a nice discussion of the limitations of their research. To me, the major limitation of this analysis (which the authors acknowledge) is that "clinically interesting/informative" is subjective. It is possible that their criteria will miss informative trials (or classify truly non‐informative trials as informative). For example, while perhaps uncommon, a trial could be classified as non‐informative due to poor design selection, but could end up being truly informative due to overwhelmingly positive results. Also, the "importance" component of their classification criteria could lead to truly important "niche" trials being misclassified as non‐informative.

      Thank you for this assessment. We agree that our measures of informativeness are imperfect. We hope that we have been forthright in the limitations of our approach. We believe that our first study limitation (lines 498 ‐ 513) outlines the issues highlighted above.

      Reviewer 3

      The paper describes an ingenious and painstakingly reported method of evaluating the informativeness of clinical trials. The authors have checked all the marks of robust, well-designed and transparently reported research: the study is registered, deviations from the protocol are clearly laid out, the method is reported with transparency and all the necessary details, code and data are shared, independent raters were used etc. The result is a methodology of assessing informativeness of clinical trials, which I look forward to using in my own content area.

      Thank you!

      My only reservation, which I submit more for discussion than for other changes, is the reliance on clinicaltrials.gov. Sadly, and despite tremendous efforts from the developers of clinicaltrials.gov (one of the founders is an author of this paper and I am well‐aware of her unrelenting work to improve reporting of information on clinicaltrials.gov), this remains a resource where many trials are registered and reported in a patchy, incomplete or downright superficial and sloppy manner. For outcome reporting, the authors compensate for this limitation by searching for and subsequently checking primary publications. However, for the feasibility surrogate this could be a problem. Also, for risk of bias, for the trials the authors had to rate themselves (i.e., ratings were not available in a high‐quality systematic review), what did the authors use, the publication or the record from the trial registry?

      Thank you. We agree that the sources of data that we relied on for this assessment are imperfect.

      We added the following to our limitations section:

      Lines 534 ‐ 535 “Fourth, our evaluation is limited by the accuracy of information contained in the ClinicalTrials.gov registration record...

      We also added the following sentence to clarify the sources of our risk of bias assessment in eMethods 9:

      Lines 996 – 998 “Information from both the primary study publication and the ClinicalTrials.gov registration record were used in our risk of bias assessments.

      In general, it seems like a problem for this sophisticated methodology might be the scarcity of publicly available information that is necessary to rate the proposed surrogates. Though the amount of work involved is already tremendous, the validity of the methodology would be improved by extracting information from a larger and more diverse pool of sources of information (e.g., protocols, regulatory documents, sponsor documents).

      In that sense, maybe it would be interesting for the authors to comment on how their methodology would be improved by having access to clinical trial protocols and statistical analysis plans. Of course, one would also need to know what was prospective and what was changed in those protocols, i.e., having protocols and statistical analysis plans prospectively registered and publicly available. Having access to these documents would open interesting possibilities to assessing changes in primary outcomes, though as the authors say that evaluation would also require making a judgement as to whether the change was justified. Relatedly, perhaps registered reports could be a potential candidate for clinical trials that would also support a more accurate assessment of informativeness, per the authors' method, provided the protocol is made openly available.

      Still related to protocols, were FDA documents consulted for pivotal trials, which again could give an indication of the protocol approved by the FDA and subsequent changes to it?

      We appreciate this comment and suggestion! And thanks for acknowledging the work it took to derive our estimates. Please see our full response to Essential Revision # 2 above.

    1. Author Response

      Reviewer 2

      The authors use the model of polyamine attraction and build on their previous observation that mated Drosophila females show increased attraction to polyamines that is outlasting a short term modulation of olfactory sensory neurons. Females do not require exposure to seminal fluid or sperm and do not need to start to ovulate for this change in preference. This is remarkable since the vast majority of female postmating changes in behavior have been shown to rely on sex peptide in the male seminal fluid. It also sets an exciting starting point for the present work, suggesting new mechanisms of how a female can adjust their behavior to mating state.

      We thank the reviewer very much for their encouraging comments.

      The authors find that females have to be able to smell odors detected by the Or system (but not polyamines) during mating in order to change their preference for polyamine.

      Mushroom body Kenyon cells are required during mating and during choice behavior for the polyamine preference of mated females. Activation of cVA-responsive PAM-b1 neurons of the mushroom body is sufficient to replace the mating experience and change polyamine preference in virgin flies. Activation of the same neurons during mating abolishes the preference development. Other specific mushroom body neurons are required during the choice behavior to promote attraction in mated females or repress it in virgins. Calcium imaging of different mushroom body neurons does not uncover a clear difference in polyamine response between mated and virgin flies. Connectome mining and genetic silencing further indicate that circuit motifs in the lateral horn are also involved in the response to polyamines and might interact with mushroom body circuits.

      While the exact circuits and mechanisms of plasticity that explain the change in postmating preference of polyamines remain to be discovered, this work makes substantial progress in identifying neurons that have a strong impact on development or expression of the preference. It is an exciting paradigm that invites further research.

      This work explores a very interesting example of state-dependent behavioral change in Drosophila. Previously, state dependent changes in sensory neurons have been demonstrated- here, the authors tackle the experimentally much more challenging task of identifying changes in higher order processing areas. The data suggests that polyamine attraction is encoded by a recurrent network of mushroom body neurons. Although the authors do not demonstrate an exact mechanism by which mating/male exposure reconfigures this polyamine attraction network, they have made a substantial advance for our understanding how odor valence is encoded in a flexible and experience dependent way by identifying and characterizing the neuronal players and their roles in induction and expression of preference behavior.

      Their experimental paradigm is special in that it is not a case of classical odor reward learning (mating could be the rewarding experience, but polyamine odor does not have to be present during mating to induce preference). It is also special in that it is a case of long lasting mating induced behavior change that is not dependent on sex peptide or other male seminal fluid proteins. The paradigm has thus great potential to uncover novel mechanisms of encoding experience and adaptively changing behavior.

      Reviewer 3

      Mating changes behavior of female fruit flies. Authors previously reported that putrescine-rich foods increase number of progenies per mated female and mated females detect putrescine with IR76b and IR41a and are attracted to putrescine odor (Hussain, Zhang et al., 2016). In another paper, authors reported that this change of putrescine preference is mediated by sex peptide receptor (SPR) and its ligand, myoinhibitory peptides (MIPs; Hussain, Ucpunar et al., 2016). In yet another paper, authors reported that two types of dopaminergic neurons (DANs) which innervate alpha prime 3 (a'3) or beta prime 1 (b'1) compartment of the mushroom body (MB) show enhanced response to cVA, the male sex pheromone 11-cis-Vaccenyl acetate (Siju et al., 2020). The present study investigated neural circuits that potentially link these observations.

      The authors first showed that putrescine-attraction in mated females is sustained over 7 days, which cannot be explained by the SPR-MIP-dependent mechanism that disappears in one week. Then they explored a factor that is transferred from males during copulation and required for putrescine-attraction in mated females. They found that blocking synaptic transmission of cVA-sensitive OR67d olfactory receptor neurons during a 24-hour period of pairing with males reduces putrescine-attraction 3-5 days later (Figure 1). On the other hand, experiments with mutant flies lacking the ability to generate eggs or sperm indicated that fertilization is not essential for the change in odor preference. In a proposed scenario, cVA transferred to the female during copulation activates DANs projecting to the b'1 and that in turn induces a shift in how the MB regulates the expression of polyamine odor preference, possibly by altering activity of MB output neurons (MBONs) in the beta prime 2 (b'2) compartment.

      Some data are in line with this scenario. Blocking synaptic transmissions of Kenyon cells during mating or odor preference test reduced attraction to putrescine (Figure 2). Activation of dopaminergic neurons projecting to the beta prime 1, gamma 3 and gamma 4 in virgin females promoted attraction to putrescine when tested 3-5 days later (Figure 3). Flies expressing shibire ts1 in the MBONs in the b'1 compartment showed reduced putrescine preference when females were mated at restrictive temperature (Figure 4). Using calcium imaging and EM connectome, authors also found candidate lateral horn output neurons that may mediate putrescine signals from olfactory projection neurons to the b'1 DANs.

      This study utilized molecular genetic tools, behavioral experiments and calcium imaging to comprehensively investigate neural circuits from sensory neurons for cVA or putrescine to the learning circuits of the MB. Addressing points detailed below will strengthen a causal link between enhanced cVA response in beta prime 1 DANs and enhanced putrescine preference in mated females.

      1) The MB is the center for olfactory associative learning. It is not so surprising that 24-hour long activation of any MB cell types have long-term consequence on fly's odor preference. As authors showed in Hussain et al., 2016 and Figure S1, mated females change preference to polyamines but not ammonium. Therefore, it is important to show odor specificity of the circuit manipulations to claim that phenomenon in mated females are recapitulated by each manipulation. Wang et al., 2003 (DOI:https://doi.org/10.1016/j.cub.2003.10.003) reported that blocking a broad set of Kenyon cells impairs innate odor attraction to fruit odors and diluted odors but not repulsion.

      We very much appreciate the thorough comments of this reviewer. We have carried out the experiments suggested in the editor’s summary. Due to time and personnel limitations caused by the lab’s move during the week of July 11, we were forced to prioritize the number and type of experiments we carried out for this revision.

      We also agree that the change in odor preference due to manipulation of KCs during test is not a very surprising result. We do, however, strongly believe that the result we obtained with the inhibition of KCs during mating was unexpected. Previous studies using associative learning paradigms suggested that KCs are not essential during learning but only during test:

      • McGuire, S. E., Le, P. T. & Davis, R. L. The Role of Drosophila Mushroom Body Signaling in Olfactory Memory. Science 293, 1330–1333 (2001).

      • Schwaerzel, M., Heisenberg, M. & Zars, T. Extinction Antagonizes Olfactory Memory at the Subcellular Level. Neuron 35, 951–960 (2002).

      • Dubnau, J., Grady, L., Kitamoto, T. & Tully, T. Disruption of neurotransmission in Drosophila mushroom body blocks retrieval but not acquisition of memory. Nature 411, 476–480 (2001).

      Only a very recent study (currently only on bioRxiv; Pribbenow et al. 2022, https://www.biorxiv.org/content/10.1101/2021.07.01.450776v2) showed that KC output is required during appetitive training, suggesting that postsynaptic plasticity in KCs is needed to establish appetitive memories.

      These new findings are in line with our results given that the KCs are likely providing the odor input to DANs and MBONs. We have included a paragraph in the discussion section.

      2) Requirement of PAM-b'1 DANs for putrescine-attraction in mated females should be demonstrated. The authors suggested existence of alternative mechanisms that may mask requirement of PAM-b'1 (Figure 3B). In a previous study, the authors reported SPR-dependent mechanism. I suggest testing the requirement of PAM-b'1 DANs in SPR mutant background or one week after mating when SPR-dependent effect on sensory neurons disappears.

      Please see response above to point 4 of the editorial summary. SPR mutants do not undergo the switch in polyamine odor preference. Therefore, SPR signaling likely represents this compensatory mechanism. Nevertheless, MBON-β’1 is required during mating for the transition from virgin to mated female behavior. In the future, we plan to analyze the relationship between SPR and this MBON in detail.

      3) Activation phenotype of MB188B-split-GAL4/UAS-dTrpA1 cannot be ascribed to activation of PAM-b'1 alone because of additional expression in DANs projecting to gamma3 and gamma4 compartments. Run the same experiment with a more PAM-b'1-specific driver line.

      Please see response to point 3 of the editorial summary. We do observe a very high preference in virgins with the genetic background MB025B>TrpA1 even in the absence of temperature-mediated activation. Therefore, the experiment, unfortunately, provided no meaningful result. We have instead adjusted the text to include a possible role of γ3 and γ4.

      4) Some of EM connections are too low to be considered (e.g. two in Figure S3 and five in Figure 5). Although these connections could be functional, previous EM connectome analysis typically set much higher threshold (e.g. 10 in Hulse et al., 2021 DOI: 10.7554/eLife.66039) to avoid considering artifacts.

      We thank the reviewer for pointing this out. We have included this reference and the customary threshold of 10 in the methods section.
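
      As a minimal sketch of how such a synapse-count threshold might be applied to a connectome edge list: the neuron names and counts below are hypothetical placeholders, not values from the manuscript, and only the cutoff of 10 comes from the text above.

```python
# Hypothetical edge list: (presynaptic neuron, postsynaptic neuron, synapse count).
# Names and counts are illustrative only and do not come from the manuscript.
MIN_SYNAPSES = 10  # cutoff cited above from Hulse et al., 2021

connections = [
    ("neuron_A", "neuron_B", 12),
    ("neuron_A", "neuron_C", 5),
    ("neuron_D", "neuron_B", 2),
    ("neuron_E", "neuron_C", 31),
]


def filter_connections(edges, min_synapses=MIN_SYNAPSES):
    """Keep only edges supported by at least `min_synapses` synapses."""
    return [edge for edge in edges if edge[2] >= min_synapses]


kept = filter_connections(connections)
for pre, post, count in kept:
    print(f"{pre} -> {post}: {count} synapses")
```

      Edges below the cutoff (here the connections with two and five synapses) are dropped before any circuit interpretation.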

      5) Data for Kenyon cells (Figure 2) and LHON (Figure 6) are interesting, but not directly related to other data regarding PAM-b'1 and MBON-b'1. Due to lack of long-term changes in MBON's odor responses in mated females (Figure 5), it is unclear what information needs to be read out from Kenyon cells and how does it affect processing of putrescine signals potentially carried by LHAD1b2.

      We agree. In the revised version of the manuscript, we now show that LHAD1b2 neurons appear to undergo a change upon mating. Please see response to the editor’s summary, point 6.

      Kenyon cell output during mating could be required for odor input and odor (i.e. cVA)-mediated activation of MBONs and DANs involved. This would be in line with our data in Fig. 1D,E where we show that ORCO and OR67d OSNs are required during mating to induce the change in behavior.

    1. Author Response

      Reviewer 1

      Comment 1

      Selecting appropriate Bioinformatics approaches to arrive at a consensus classification of SNVs can be labor intensive and misleading due to discordance in results from different programs. The authors evaluated 31 Bioinformatic or computational tools used for in silico prediction of single nucleotide variants (SNVs). They selected a filtered list of SNVs at the HBA1, HBA2, and HBB genes, and compared in silico prediction results with annotations based on evidence in literature and databases curated by an expert panel comprising coauthors of this study. They found both specificity and concordance among different tools lacking in certain aspects when thresholds are chosen to maximize the Matthews correlation (MCC) and thus proposed an improved strategy. For this, the authors focused on the top prediction algorithms and varied their decision thresholds separately for pathogenic and benign variant classification and optimized the predictive power of these tools by choosing thresholds that generated at least supporting strength likelihood ratios (LRs) to achieve balanced classification.

      The authors have likely spent significant effort annotating the list of pathogenic or benign SNVs in adult globin genes by iteratively evaluating independent annotations submitted by experts and arriving at a consensus. These annotations when added to the database of SNVs might improve the breadth of knowledge on the pathogenicity of adult globin SNVs and likely lead to an improved prediction by the existing tools. Further, setting non-overlapping thresholds for pathogenic and benign variants seems to improve the balance in the prediction of some of the tools (with certain tradeoffs) in the context of the gene and the variant class. This is consistent with the findings of Wilcox et al., while at the same time the authors have focused on globin variants and compared many more programs. Thus, while not a novel approach, the scale is expansive, and might guide future studies with the improved ACMG/AMP framework.

      Response: We would like to thank the reviewer for the positive comments and for appreciating the potential impact of the manuscript.

      Comment 2:

      However, there are certain caveats from my perspective and these need to be explained or improved.

      • The authors' approach relies heavily on the revised consensus annotations which, from my understanding, is essentially being considered as a "truth dataset", whereas variants are classified in silico according to existing annotations in the databases. The binary classification metrics compare the in silico predictions to the authors' annotations and these showed low specificity but higher sensitivity and accuracy indicating that many benign variants were misclassified as pathogenic. The authors have not clearly mentioned whether the "observed_pathogenecity" information in the input dataset in supplementary file 2 is from the Ithagenes database or the authors' reannotations. Hence, if a significant number of pathogenic variants were reannotated as benign by the expert panel, that will likely result in the tools misclassifying them as pathogenic since the tools rely on database annotations.

      Response: The IthaGenes database did not provide an annotation about the clinical impact of each globin gene variant before this study. Nevertheless, as part of this study, an initial annotation was provided based on the information already annotated in the database and the pathogenicity criteria defined for this study (pg 7). This initial set was subsequently provided to the experts that proceeded with confirmation or reclassification of each variant’s pathogenicity using a Delphi approach. The final classification was used for the study and is now also included in IthaGenes. This process is now clarified in the Methodology (pg. 7-8; lines 145-147 & 155-160), while the initial and final classification are provided in the dataset file (Supplementary File 2; sheet “Input-Full dataset”). In addition, Supplementary Figure 1 illustrates the changes in pathogenicity annotation after the expert evaluation, demonstrating that annotation of benign variants to pathogenic and vice versa was minimal.

      We consider the final annotation as the best available evidence regarding the pathogenicity of globin gene variants due to the involvement of experts responsible for molecular diagnosis of haemoglobinopathies in five countries worldwide and the use of a Delphi approach for the variant pathogenicity.

      Comment 3

      • The results and measure of success focus on different benchmarks for the two major analyses the authors performed. While they generated a lot of data, they have not attempted to explore and present all facets of the data for each analysis. For instance, to assess the predictive power of the 31 tools initially, the authors focus on benchmark metrics for binary classification such as Accuracy, Sensitivity, Specificity and MCC. However, in the later improved approach, the focus is on LRs but the effect of separate thresholds for pathogenic and benign classification on accuracy, sensitivity, and specificity and MCC are not explored in the results instead just mentioning PPV for certain variant types, tools, and genes.

      Response: In the latter improved approach, we do not use a binary prediction, but instead we trichotomize the problem to define different thresholds for each pathogenicity class. Therefore, benchmark metrics like MCC are not informative as many of the variants are not classified (i.e., they are classified as VUS) due to the grey zone between the benign and pathogenic thresholds. The benign and pathogenic thresholds are derived from two different binary classifiers, with their corresponding binary metrics provided in Supplementary File 3. We have now included the sensitivity and specificity of the pathogenic and benign classifiers, respectively, in Table 2 and we discuss them in pg 15-16 (lines 338-343).

      Moreover, Table 2 presents the analysis on the full dataset and, also, in different subsets of the dataset, based on variant type (missense/non-missense) and globin genes.
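
      The trichotomized scheme described in this response can be illustrated with a short sketch. The scores, labels, and the two thresholds below are hypothetical, and the functions are an assumed reading of "sensitivity of the pathogenic classifier" and "specificity of the benign classifier", not the authors' code.

```python
# Hypothetical (score, true label) pairs; higher scores suggest pathogenicity.
variants = [
    (0.05, "benign"), (0.20, "benign"), (0.45, "benign"),
    (0.55, "pathogenic"), (0.80, "pathogenic"), (0.95, "pathogenic"),
]


def classify(score, benign_max, pathogenic_min):
    """Three-way call: scores between the two thresholds remain VUS."""
    if benign_max >= pathogenic_min:
        raise ValueError("thresholds must not overlap")
    if score >= pathogenic_min:
        return "pathogenic"
    if score <= benign_max:
        return "benign"
    return "VUS"


def pathogenic_sensitivity(data, pathogenic_min):
    """Fraction of truly pathogenic variants at or above the pathogenic threshold."""
    scores = [s for s, label in data if label == "pathogenic"]
    return sum(s >= pathogenic_min for s in scores) / len(scores)


def benign_specificity(data, benign_max):
    """Fraction of truly benign variants at or below the benign threshold."""
    scores = [s for s, label in data if label == "benign"]
    return sum(s <= benign_max for s in scores) / len(scores)


calls = [classify(s, benign_max=0.3, pathogenic_min=0.7) for s, _ in variants]
sens = pathogenic_sensitivity(variants, pathogenic_min=0.7)
spec = benign_specificity(variants, benign_max=0.3)
```

      Variants that fall in the grey zone are left as VUS rather than forced into either class, which is why a single binary metric such as MCC does not summarise this scheme well.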

      Comment 4

      • There is a general trade-off to altering thresholds to increase specificity which leads to reduced accuracy and sensitivity. Thus, in this case, the improved approach suggested by the authors increases specificity but there is a simultaneous reduction in accuracy and sensitivity thus leading to the potentially higher misclassification of pathogenic variants as benign. One has to consider then, whether this is ideal in the case of globins where an in silico misclassification of pathogenicity can be easily verified by subsequent diagnostic testing to confirm whether the variant actually affects hemoglobin. Overclassification of pathogenicity in the case of globins is thus not necessarily a major problem since they will not directly lead to patients receiving treatment before additional confirmatory tests. However, misclassification of pathogenic variants as benign will pose greater harm to individuals at risk of disease.

      Response: The reduction of accuracy in the improved approach is not due to the misclassification of pathogenic variants as benign, but instead due to misclassification of benign or pathogenic variants as VUS. The reviewer is correct the LR-based approach decreases the accuracy of the method, but at the same time it increases the confidence of a pathogenic or benign classification. This is the basis of the Bayesian ACMG/AMP framework. We further clarify this point on pg 17 (lines 384-386).
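
      For readers less familiar with likelihood ratios, the following sketch shows how a positive LR could be computed from a confusion table at a given pathogenic threshold and compared with a "supporting" strength cutoff. The counts are hypothetical, and the value 2.08 is assumed here from the Bayesian adaptation of the ACMG/AMP guidelines; the manuscript's exact cutoffs should be taken from its Methods.

```python
import math

# Assumed "supporting" evidence cutoff from the Bayesian ACMG/AMP adaptation.
SUPPORTING_LR = 2.08


def positive_lr(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity) for calls above the pathogenic threshold."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    if false_positive_rate == 0:
        return math.inf  # no benign variant exceeds the threshold
    return sensitivity / false_positive_rate


# Hypothetical counts: 80 of 100 pathogenic and 10 of 100 benign variants
# score above the pathogenic threshold.
lr = positive_lr(tp=80, fn=20, fp=10, tn=90)
meets_supporting = lr >= SUPPORTING_LR
```

      Raising the threshold typically lowers sensitivity, so more variants are left unclassified, while the LR, and with it the confidence in each pathogenic call, increases.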

      Misclassification of benign or pathogenic variants as VUS will not have a significant impact on variant classification. However, we do not agree that overclassification of pathogenic variants is not a major issue, because many countries provide prevention programmes based on prenatal screening (see IthaMaps: https://ithanet.eu/db/ithamaps?hc=3). Therefore, a misclassified benign or VUS variant as pathogenic can lead to unnecessary abortions and preimplantation genetic diagnosis tests, and can have a devastating psychological and economic impact on the lives of affected individuals.

      Comment 5

      • This is a largely descriptive study of the performance of various programs, but the authors did not attempt to explain why according to them the various tools performed a certain way in their analysis. Thus, their rationale for proposing the improved approach of separate thresholds for pathogenic and benign variants was unclear. Attempting to understand whether there is a correlation between the type of data the tool uses, and its performance could explain the tools' prediction power and how to improve it. For instance, some of the tools are metapredictors that take as input scores from various other tools also tested in this study. Thus, there will be some redundancy in the final classification.

      Response: We thank the reviewer for the suggestion. We have now addressed this point in the revised manuscript (pages: 18 – 19, lines: 397-408 and 417-426) by adding two paragraphs discussing the low concordance rate between predictors, aligned with previous studies. We also discuss the main reasons for the superior performance observed for meta predictors.

      Comment 6

      • Expanding on the previous point, the reason for discordance in HBA genes but concordance in HBB was unclear. It might be a result of the bigger HBB dataset compared to HBA although the authors did not explore or mention whether the size of the dataset correlates with concordance. They also did not test for concordance or discordance after the separate thresholds were applied so it is not clear whether their proposed approach improves concordance for the HBA variant predictions of the top tools.

      Response: In the later improved approach, we have assessed the concordance of the top performing tools as shown in Figure 3B and 3C, but without grouping by gene. We have now added a heatmap of concordance in Supplementary Figure 4 which still shows higher concordance for pathogenic variants on HBB (also see pg , lines 360-364).

      The lower concordance that the tools exhibit for HBA1 and HBA2 can be explained by the fact that the pathogenicity of HBA1 and HBA2 variants can almost never be determined simply from the phenotype in the heterozygous state, as it can for HBB, because of the difference in gene copy number (4 alpha alleles vs. 2 beta alleles).

      This exacerbates the frequent confusion between the categorization of pathogenic at the gene level (reduced expression) versus the phenotypic level! (See MacArthur’s seminal paper MacArthur DG, et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014 Apr 24;508(7497):469-76. doi: 10.1038/nature13127. PMID: 24759409; PMCID: PMC4180223 – although you must be familiar with it). We have discussed this point in pg 18-19 (lines 409-416).

    1. Author Response

      Reviewer 1

      In this manuscript the authors set out to characterise the process of differentiation of inner cell mass cells within the mouse blastocyst into either epiblast or primitive endoderm, which is a binary fate choice, using various models. To this end, they made use of well-established reporter cell lines previously generated in their lab as well as a widely used fluorescent system (FUCCI) that allows stages in the cell cycle to be visualised and sorted. The experimental output was compared with computational models and published data generated from mouse embryos during the process of primitive endoderm and epiblast segregation. Their data uncovered interesting mechanistic insight into the dynamics of the cell cycle and how these correlate with lineage choice and amplification. The methods have been carefully considered and validated in previous work by the group and the analysis is thorough. The single cell profiling is particularly well presented, and backed up by immunofluorescence data using well-characterised lineage reporters with appropriate statistical analysis. Probably the most interesting finding, which the authors identify as unexpected, is the considerable lengthening of the G1 part of the cell cycle in cells differentiating into PrE, but coinciding with a reduction in overall cell cycle length. Also, cell cycle length from mother to daughter cells in all conditions appears not to be inherited, yet sister cells, and to a lesser extent, cousins, appear to retain similar cell cycle dynamics. This feature is attributed to differential levels of FGF, suggested by the use of PD03 or PD17 as downstream inhibitors. Not surprisingly, levels of the PrE-associated factor Hex could predict the likelihood of differentiation to PrE, but also higher levels of Hex correlated with a shorter cell cycle. 
      Also, blocking MEK/ERK signalling increased cell cycle duration as well as reducing differentiation to PrE in the culture conditions designed to promote differentiation to epiblast. The aims of the paper appear to be achieved and the results adequately support the authors' conclusions. A similar system to the one established here could be envisaged for downstream developmental processes, such as those involving binary decisions for specific tissue formation in organogenesis, but would require the generation and validation of different reporter cell lines.

      We thank this reviewer for their support of our manuscript.

      Reviewer 2

      In this paper, the authors show that the maintenance of pluripotent mouse stem cell cultures and their conversion towards primitive endoderm relies on selective effects of specific culture media that act on the survival and cell cycle properties of different primed subpopulations. They further demonstrate that FGF/ERK signaling underlies correlations between cell cycle length in daughter cells, and identify characteristic, lineage-specific differences in relative G1 length as cells differentiate along the endodermal lineage that is recapitulated in vivo. The study is based on a technically challenging combination of long-term time-lapse imaging with reporters for cell cycle and lineage priming and delivers new insight into the old question of whether extracellular signals regulate differentiation in cell populations by selection or by induction.

      A central conclusion of the study, that media conditions control differentiation through cell cycle regulation, is based on the analysis of time-lapse imaging of cells in PrE differentiation and pluripotency conditions. Even though the authors acknowledge that cell death contributes to population composition, the manuscript mainly talks about changes to the cell cycle. This focus on the cell cycle does not do the data justice, especially in the context of PrE differentiation: The time-lapse movie shows that there is massive cell death from 24 h onwards, to a degree that there is no net growth of the population but rather a decline in cell numbers over the course of the experiment. This impression is supported by the lineage trees, where the NEDiff cells appear to selectively die, rather than being outcompeted by PrE cells. Thus, while cell cycle regulation clearly contributes to differentiation at the population level, it remains an open question how important this effect is in the chosen differentiation paradigm, compared to selective effects that act through cell survival.

      • We acknowledge that there is a large amount of cell death in the beginning of differentiation and believe this could be a response to changes in media and cytokines. This is also observed in other differentiation and reprogramming protocols (Ying and Smith 2003; Hayashi et al. 2011; Yasunaga et al. 2005; Argelaguet et al. 2019; Rugg-Gunn 2022).

      • While our original analyses were focused on cell cycle, we did not mean to imply that survival is irrelevant to selecting the correct populations at the end of differentiation. Although there is clearly an increase in NEDiff cell death, these rates fluctuate during differentiation (Table S2) and it is difficult to draw any firm conclusions about the timing of selection. In addition, we find no signature for apoptosis in any of the scRNA-seq clusters associated with early NEDiff. However, we thank the reviewers for raising this issue and we address the issue of survival more extensively in the revised manuscript (lines 140-142 and 342-347, in addition to a new modelling section described below).

      • To further explore when relative survival is most important for effective differentiation, we have reformulated our model. Our new modelling results show that changes in the survival rates have a stronger impact on shifting the cell proportions at later stages of differentiation. Phase diagrams in Figure 3-Supplement 1 show that at day 5 reaching an appropriate proportion of PrE requires less change in the survival rates than in doubling times, suggesting that survival can have a greater impact on the ratio than cell division. However, at day 3, the modelling suggests the opposite, consistent with a first wave of cell cycle regulation and the fastest PrE division time (Fig. 2D). We have discussed this in the Results section, lines 162-168.

      • We have added survival alongside proliferation to the abstract.
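
      The interplay between doubling time and survival discussed in the modelling response above can be illustrated with a toy two-population exponential model. This is a sketch under stated assumptions, not the authors' model: the function names, starting numbers, doubling times, and death rates are all hypothetical.

```python
import math


def population(n0, doubling_time_h, death_rate_per_h, t_h):
    """Cell number under exponential growth with constant division and death rates."""
    division_rate = math.log(2) / doubling_time_h
    return n0 * math.exp((division_rate - death_rate_per_h) * t_h)


def pre_fraction(t_h, pre_params, other_params):
    """Fraction of PrE cells at time t_h; each params tuple is (n0, doubling time, death rate)."""
    n_pre = population(*pre_params, t_h)
    n_other = population(*other_params, t_h)
    return n_pre / (n_pre + n_other)


# Hypothetical parameters: PrE-primed cells start rare but divide faster and die less.
pre_params = (100, 12.0, 0.01)
other_params = (900, 16.0, 0.03)

fraction_day3 = pre_fraction(72, pre_params, other_params)
fraction_day5 = pre_fraction(120, pre_params, other_params)
print(f"PrE fraction at day 3: {fraction_day3:.2f}, at day 5: {fraction_day5:.2f}")
```

      Varying the death rates versus the doubling times in such a toy model gives a feel for how the two parameters trade off at different time points, which is the kind of comparison the phase diagrams summarise.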

      Reviewer 3

      The manuscript of Birckman and colleagues tackles the link between lineage priming, lineage specification, and cell cycle in the ESCs culture. This is an interesting piece of work, with several noteworthy findings, that elegantly explain how lineage priming can be efficiently achieved during the changing culture conditions. There are several interesting points raised by the authors, relating to lineage priming, cell specification, and cell cycle, that can be presented to the scientific community. Namely:

      • Differential regulation of the cell cycle can tip the balance between populations of cells primed to different cell fate choices (here PrE and Epi).

      • Different culture conditions favour acceleration/stimulation of the cell cycle of different cell populations.

      • Only a small population of cells from the original culture enters a differentiation process which is followed by selected expansion and/or survival of their progeny.

      • In the case of endodermal type specification (towards PrE), a shortening of the cell cycle is accompanied by the proportional relative increase of G1 phase length.

      • FGF activity is responsible for cell cycle synchronisation, required for the inheritance of similar cell cycles between sisters and cousins.

      Unfortunately, in the current version of the manuscript, the authors try to create the impression that the relationship between cell cycle, heterogeneity and cell fate found in ESCs can be directly translated to the in vivo system. It is not clear, however, how easily and reliably the information about the cell cycle in ESCs can be translated to an in vivo setting. The timeline of PrE vs Epi specification in vivo and in vitro are completely different. In embryos, PrE is specified within 24h, whereas with in vitro it takes 6 days. I cannot see how these two timelines - and also different cell cycle lengths - can be reliably compared.

      • We were aware of the difficulties inherent in this comparison and apologize if our statements were not sufficiently conditional. We have now added caveats to the results section (line 328) and a revised discussion that explicitly discusses the difference in time lines (lines 404-409).

      • On line 111, we added a clarification that our in vitro model is a tool for deconstructing “transcriptional signatures” as this is precisely what we measure both in vivo and in vitro.

      • On line 322 we clarified that even though the time frames are different, we found a G1 signature trend in our in vitro dataset similar to the one found in the published in vivo dataset.

      • On line 337 we amended the sentence to clarify that some aspects of cell cycle regulation found in vitro could be also exploited in vivo.

      • On line 414 we have added a caveat line explaining that the use of G1 to drive increasing levels of commitment is a facet of cell fate choice both in vivo and in vitro, even though the time scales are different.

    1. Author Response

      Reviewer 1

      In this article Farrell et al. leverage existing datasets which measure frailty longitudinally in mice and humans to model 'robustness' (the ability to resist damage) and 'resilience' (the ability to recover from damage), their dynamics across age, and their relative contributions to overall frailty and mortality. The concept of separating damage/robustness from recovery/resilience is valid and has many important applications including better assessment and prediction of effective intervention strategies. I also appreciate the authors' sophisticated attempts to effectively model longitudinal data, which is a challenge in the field. The use of human and mouse data is another strength of the study, and it is quite interesting to see overlapping trends between the two species.

      While I find the rationale sound and appreciate the approach taken at a high level, there are a few key considerations of the specific data used which are lacking. The authors conceptualize resilience based on studies which primarily use short time scales and dynamic objective measures (ex. complete blood cell counts in Pyrkov et al.) often in conjunction with an acute stress stimulus. For example, they heavily cite Ukraintseva et al. who define resilience as "the ability to quickly and completely recover after deviation from normal physiological state or damage caused by a stressor or an adverse health event."

      Resilience and robustness are typically studied at short time-scales, with small numbers of continuous health attributes. We study transitions of binary health attributes, which we call damage and repair, and which we suggest should be thought of as resilience and robustness. Our approach is well suited for studying large numbers of binary health attributes over long time-scales without acute external stimuli. How resilience and robustness in these limits (binary, large numbers, long times, intrinsic dynamics) compare with resilience and robustness as has been typically measured (continuous, short times, acute stimuli) is an interesting and important question that arises from our work.

      Given these definitions, the human data used seem to fit within this framework, but we should carefully consider the mouse data. The mouse frailty index is a very useful tool for efficiently measuring the organismal state in large cohorts. A tradeoff for quickly measuring a broad range of health domains is that the individual measurements are low resolution (categorical) and involve inherent subjectivity (which may be considered part of the measurement error). Some transitions in individual components are due to random measurement error and I believe this is especially likely with decreases (or 'resilience' transitions).

      The reason I think the resilience transitions are subject to high measurement error is that I am skeptical as to whether many of the deficits in the mouse index are reversible under normal physiologic conditions. For example, it is exceptionally unlikely for a palpable/visible tumor to resolve in an aged mouse over the time scales studied here, thus any reversal that was observed is very likely due to random measurement error. Other components which I have doubts about reversibility are alopecia, loss of fur color, loss of whiskers, tumors, kyphosis, hearing loss, cataracts, corneal opacity, vision loss, rectal prolapse, and genital prolapse.

      In summary, I applaud the authors' efforts in generating complex models to better understand longitudinal aging data. This is an important area that needs further development. I appreciate their conceptualization of resilience and robustness and think this framework has an important place in aging research. I also appreciate their cross-species approach. However, the authors may have over-conceptualized and made some assumptions about the mouse data which may not be valid. It will be important to assess the results with careful consideration of the time scales of the underlying biology and the resolution and measurement error inherent to these tools.

      For each of our mouse attributes, there are published studies demonstrating reversibility (see our new Supplementary Table 1). Nevertheless, we cannot distinguish what causes the observed discrete transitions (measurement error, stochastic fluctuations in underlying organismal features, or logistic-like continuous transitions in underlying continuous variables). We analyze the discrete data as given.

      The question of time-scale is interesting. From survival curves of individual binarized attributes, we obtain reasonable fits to exponential models (i.e. a single timescale); see Fig 5 supplement 1 and 2. For the human data there is a broad range of timescales for both robustness and resilience. For the mouse data there appears to be a similarly broad range (note the logarithmic scale) though with considerable uncertainty. We work with the data we have, so we are unable to probe shorter timescales than the measurement interval (months for mice, and years for humans). We have reinforced this caveat in the discussion.
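
      As a minimal sketch of what a single-timescale (exponential) description means, the snippet below estimates a constant transition rate from follow-up durations, some of which end without the transition being seen. The function names and the toy numbers are hypothetical and purely illustrative; this is not the fitting procedure used in the manuscript, and it ignores the fact that real transitions are only observed at discrete measurement times.

```python
import math

def exponential_rate(durations, observed):
    """Maximum-likelihood constant rate for exponential waiting times.

    durations: follow-up time of each attribute episode (e.g. months)
    observed:  True if the transition (damage or repair) was seen,
               False if follow-up ended first (right-censored)
    """
    total_time = sum(durations)
    events = sum(1 for seen in observed if seen)
    if total_time <= 0:
        raise ValueError("total follow-up time must be positive")
    return events / total_time

def survival(t, rate):
    """Probability that no transition has occurred by time t."""
    return math.exp(-rate * t)

# Toy data (assumed, not from the study): five episodes, two censored.
durations = [2.0, 4.0, 4.0, 6.0, 8.0]
observed = [True, True, False, True, False]

rate = exponential_rate(durations, observed)  # 3 events / 24 time units
timescale = 1.0 / rate                        # characteristic timescale
```

      A timescale estimated this way cannot be meaningfully shorter than the spacing between measurements, which is the caveat stated above.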

      Reviewer 2

      This study uses repeated measurements of the frailty index (FI), composed of multiple binary parameters. It is posited that newly detected changes in the number of these parameters represent damage and that the parameters that have previously been detected but are not detected currently represent damage repair. Statistical treatment then follows, deriving resilience and robustness and their changes over time. This is an interesting idea. Strengths of the study include analyses across species (mice and humans), including multiple datasets in mice.

      To be clear, our data analysis is on the binary health attributes that are used in the FI. By considering the damage/repair (binary transitions) of individual attributes, we can obtain the aggregate damage/repair rates.
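
      The idea of reading damage and repair off binary transitions can be sketched as follows. This is an illustrative counting scheme only, assuming each attribute is recorded as 0 (healthy) or 1 (damaged) at successive visits; the attribute names and values are hypothetical, and the manuscript's actual analysis is a joint statistical model rather than raw counts.

```python
def count_transitions(series):
    """Count damage (0 -> 1) and repair (1 -> 0) events in one binary series.

    Returns (damage, repair, at_risk_damage, at_risk_repair), where the
    at-risk counts are the numbers of intervals starting in state 0 or 1.
    """
    damage = repair = at_risk_damage = at_risk_repair = 0
    for prev, curr in zip(series, series[1:]):
        if prev == 0:
            at_risk_damage += 1
            damage += int(curr == 1)
        else:
            at_risk_repair += 1
            repair += int(curr == 0)
    return damage, repair, at_risk_damage, at_risk_repair

def aggregate(attributes):
    """Sum transition counts over all attributes of one individual."""
    totals = [0, 0, 0, 0]
    for series in attributes.values():
        for i, value in enumerate(count_transitions(series)):
            totals[i] += value
    return tuple(totals)

# Hypothetical individual with three attributes measured at four visits.
mouse = {
    "attribute_a": [0, 0, 1, 1],
    "attribute_b": [0, 1, 0, 0],
    "attribute_c": [1, 1, 0, 1],
}
damage, repair, at_risk_damage, at_risk_repair = aggregate(mouse)
```

      Dividing the event counts by the corresponding at-risk counts gives crude per-interval damage and repair probabilities, which is why pooling over attributes and individuals helps when each series is sparse.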

      What are the elements of FI that increase at each period of life, and what are those that decrease? For example, humped phenotype or alopecia are more likely to appear in old mice and are essentially irreversible, whereas weight loss due to infection may be more common in young mice and is reversible. Therefore, the choice of health deficits would affect the model and, for example, may artificially lead to a decreased value of what the authors call damage repair.

      More generally, information on the frailty index lacks sufficient details. I doubt that this method has sufficient accuracy to draw conclusions from as few as 32 female mice (21 + 11 animals in datasets 1 and 2) and 63 males (13 + 6 + 44 animals in datasets 1, 2 and 3). Also, only 25 enalapril-treated mice of each sex were analyzed, and only 17 exercised mice (11 females and 6 males). The number of human participants is large, but the total follow-up period is not shown, and the subjects were assessed based on 23 parameters only.

      We have not examined other choices of health attributes. While we picked standard sets from available data, we do not know whether other attributes would behave differently. It would be difficult to do our detailed modelling on single attributes in the mouse data, since the data is so sparse. Our approach was developed specifically to be able to draw conclusions from limited mouse data. Where possible we aggregate the individual mice, sex, health attributes, studies, and measurement times. The analysis of human data shows that the approach generalizes.

      While we have mostly not studied individual attributes (we have considered survival times, but without age or time effects), we would expect that some of them may have behavior that qualitatively differs from our aggregate results. If attribute selection was biased towards (or away from) qualitatively distinct behaviors that would, of course, be reflected in aggregate results. We suspect that this would be unlikely, but that any such distinctive behavior would be interesting and important to identify and understand. We have added some discussion on this point, since we cannot exclude this possibility.

      A key assumption in this work is that increased FI is equivalent to the rise in damage. However, the relationship between changes in FI and damage is unknown. One can imagine a situation when damage increases, but protection also increases. In this case, fitness may increase, decrease or remain unchanged. What is the basis for calling an increased number of health deficits damage? Is there a more reliable method to measure damage that could support the authors' claims?

      See also discussion point #1 in essential revisions. We call binary states 0 “healthy” and 1 “damaged”, but we could instead say “more healthy for most individuals” and “less healthy for most individuals” – where “healthy” means associated with desirable (low FI and low mortality) health outcomes. We have not explored other measures of organismal damage. We have not explored how interactions between variables could affect resilience or robustness for individuals. We do not think that alternative approaches would be easy to study without much more data (for mice) that is more finely resolved in time (for mice and humans). We are quite happy to have found an approach to use with binarized data, but would welcome viable alternative approaches to compare with.

      Reviewer 3

      In this work, the authors aimed at investigating two related components of aging-related processes of health deficits accumulation in mice and humans: the processes of damage (representing the robustness of an organism) and repair (corresponding to resilience), and at determining how different interventions (the angiotensin-converting enzyme inhibitor enalapril and voluntary exercise) in mice and a representative measure of socio-economic status (household wealth) in humans affect the rates of damage and repair. Two key elements in this study allowed the authors to achieve their goals: 1) the use of relevant data containing repeated measurements of health deficits from which they were able to compute the cumulative indices of health deficits in mice and humans and which are also necessary to evaluate the processes of damage and repair; 2) the methodological approach that allowed them to formulate the concepts of damage and repair, model them and estimate from the available data. This methodological framework coupled with the data resulted in important findings about the contribution of the age-related decline in robustness and resilience in health deficits accumulation with age and the differential impact of interventions on the processes of damage and repair. This provides important insights into these key components of the process of aging and this research should be of interest to both lab researchers who plan experimental studies with laboratory animals to study potential mechanisms and interventions affecting health deficits accumulation as well as researchers working with human longitudinal studies who can apply this approach to further investigate the impact of different factors on robustness and resilience and their contribution to the overall health deterioration, onset of diseases and, eventually, death.

      The key strength of this work is a rigorous analytic approach that includes joint modeling of longitudinal measurements of health deficits and mortality (in mice). This approach avoids biased inference which would be observed if longitudinal data were analyzed alone, ignoring attrition due to mortality. Another strength is a comprehensive analysis of both laboratory animal data that allows exploring the impact of different interventions on the processes of damage and repair and human data that allows investigating disparities in these processes in individuals with different socioeconomic backgrounds (represented by household wealth).

      One weakness (which is commonplace for human studies) is self-reported data on health deficits in humans which makes it difficult to compare with lab data where deficits are assessed objectively by lab researchers. The subjective nature of health deficits measurements complicates the interpretation of findings, especially about repairs of deficits. In addition, it is not clear whether the availability/absence of caregivers at different exams/interviews factors into the answers on difficulty/not difficulty with specific activities constituting health deficits and, respectively, into their change over time reflected in damage/repair estimates.

      Variability of the evaluator is expected in any longitudinal study, and amounts to a variety of measurement error. The question of whether there are age-effects in the measurement error, such as bias or age-dependent variability is interesting. For the mouse data, evaluator training is designed to minimize such errors and inter-evaluator differences are not large (Feridooni et al, 2015; Kane et al, 2017). For the human self-report data any such age-effects are unavoidable.

    1. Author Response

      Reviewer 1

      Sadeh and Clopath analyze two mouse datasets from the Allen Brain Atlas and show that sensory representations can have apparent representational drift that is entirely due to behavioral modulation. The analysis serves as a caution against over-interpreting shifts in the neural code. The analysis of data is coupled with careful modeling work that shows that the behavioral state reliably shifts sensory representations independently of stimulus modulation (rather than acting as a gain factor), and further show that it is reproducibly shifted when the behavioral state is adequately controlled for. The methods presented point towards a more careful consideration and measurement of behavioral states during sensory recordings, and a re-analysis of previous findings. The findings held up for both standard drifting grating stimuli as well as natural movies.

      The fact that neurons may have different tuning depending on the behavioral state of the animal raises obvious questions about readout. The authors show that neurons with strong behavioral shifts should simply be ignored and that this can be achieved if the downstream decoder weights inputs with more stimulus information. While questions remain about why behavior shifts representations and how that could be more effectively utilized by downstream circuits, the results presented clearly show that sensory representations might not always be simply drifting over time, and will spark some careful analysis of past and future experimental results.

      Many thanks for a clear summary of the work and emphasizing the significance of the results.

      Reviewer 2

      Studies from recent years have shown that neuronal responses to the same stimuli or behavior can gradually change with time - a phenomenon known as representational drift. Other recent studies have shown that changes in behavior can also modulate neuronal responses to a given sensory stimulus. In this manuscript, Sadeh and Clopath analyzed publicly available data from the Allen Institute to examine the relationship between animal behavioral variability and changes in neuronal representations. The paper is timely and certainly has the potential to be of interest to neuroscientists working in different fields. However, there are currently several important issues with the analysis of the data and their interpretations that the authors should address. We believe that after these concerns are addressed, this study will be an important contribution to the field.

      We really appreciate the time and the effort the reviewer(s) have taken to evaluate our results and analysis in detail. Their comments are very relevant and critical to the improvement of the manuscript. We explain below how we addressed their various comments and concerns.

      1. The manuscript raises a potential problem: while previous work suggested that the passage of time leads to gradual changes in neuronal responses, the causality structure is different: i.e., the passage of time leads to gradual changes in behavior, which in turn lead to gradual changes in neuronal responses. The authors conclude that "variable behavioral signal might be misinterpreted as representational drift". While this may be true, in its current form, the paper lacks critical analyses that would support such a claim. It is possible that both factors - time and behavior - have a unique contribution to changes in neuronal responses, or that only time elicits changes in neuronal responses (and behavior is just correlated with time). Thus, the authors should demonstrate that these changes cannot be explained solely by the passage of time and elucidate the unique contributions of behavior (and elapsed time) to changes in representations.

      This is a very important point and we addressed it with new analyses, by dedicating a new figure (Figure 1–figure supplement 5) and a new part of the Results section to it. The results of our new analyses show that strong representational drift mainly exists in those animals/sessions with large behavioral changes between the two blocks, and that in animals/sessions with small behavioral changes, such drift is minimal, despite the passage of time (see our responses below to Major comments for further details).

      2. There are also several issues with the analysis of the data and the presentation of the results. The most concerning of which is that the data shows a non-linear (and non-monotonic) relationship between behavioral changes and representational similarity. In many of the presented cases, the data points fall into two or more discrete clusters. This can lead to the false impression that there is a monotonic relationship between the two variables, even though there is no (or even opposite) relationship within each cluster. This is a crucial point since the clusters of data points most likely represent different blocks that were separated in time (or separation between within-block and across-block comparisons).

      This is an important concern. To address this, we analyzed the source of the non-monotonic relationship / opposite trend in the data and demonstrated the results in a new figure (Figure 4–figure supplement 2). Our results show that the non-monotonic relationship does not compromise the result of our previous analysis. Furthermore, it suggests that the non-monotonic / opposite trend is emerging as a result of more complex interactions between different aspects of behavior. We have also shown, in separate analyses, that the passage of time is not the main contributing factor to representational drift, rather large behavioral changes are correlated with strong drifts between the two blocks of presentation (Figure 1—figure supplement 5, and Figure 3—figure supplement 2).

      More generally, we did not intend to claim that the relationship with behavioral changes is linear and/or monotonic. We used linear analysis just to show the main trend of decrease in representational similarity with large behavioral changes. Any other analysis should assume some form of nonlinearity, but because the nonlinear relationships between behavior and activity were complex, it was not easy to assume such nonlinearity.

      We in fact tried to use two other ways of analysis, nonlinear correlations and generalized linear models (GLM), but there were issues hindering a proper use of each analysis. Nonlinear correlations assume a specific type of nonlinearity, but the nature of nonlinearity underlying the data is not clear (in fact, it looks to be different in different example non-monotonic trends in the data). We could not, therefore, assume a nonlinearity that best fitted all the data; we believe the nature of this nonlinearity, or how behavior modulates neuronal activity in a nonlinear manner, is in itself an interesting and open question for future investigation, but beyond the scope of this study. GLM did not provide useful results either, as the relationship between behavioral changes and neural activity/representational similarity was state-dependent and transitioning between nonlinear states, therefore hindering the usage of linear methods.

      We therefore opted for the simplest analysis which can show and quantify this dependence - emphasizing that further analyses are in fact needed to get to the bottom of the exact nonlinear relationship (for further details, see the responses below to Major comments).
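
      The clustering concern discussed in this exchange can be illustrated with a small synthetic example: two clusters that each show a positive within-cluster trend can still produce a strongly negative pooled correlation. All numbers below are made up for illustration and are not taken from the datasets analysed in the manuscript.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Synthetic cluster A: small behavioural change, high similarity.
change_a = [0.1, 0.2, 0.3]
similarity_a = [0.80, 0.82, 0.84]

# Synthetic cluster B: large behavioural change, low similarity.
change_b = [1.1, 1.2, 1.3]
similarity_b = [0.40, 0.42, 0.44]

r_within_a = pearson(change_a, similarity_a)
r_within_b = pearson(change_b, similarity_b)
r_pooled = pearson(change_a + change_b, similarity_a + similarity_b)
```

      Reporting within-cluster and pooled coefficients side by side is one simple way to show whether a pooled linear trend is driven by the separation between clusters.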

      3. The authors also suggest that using measures of coding stability such as 'population-vector correlations' may be problematic for quantifying representational drift because it could be influenced by changes in the neuronal activity rates, which may be unrelated to the stimulus. We agree that it is important to carefully dissociate between the effects of behavior on changes in neuronal activity that are stimulus-dependent or independent, but we feel that the criticism raised by the authors ignores the findings of multiple previous papers, which (1) did not purely attribute the observed changes to the sensory component, and (2) did dissociate between stimulus-dependent changes (in the cells' tuning) and off-context/stimulus-independent changes (in the cells' activity rates).

      That’s a very valid point. As population vector correlations are used quite often in (experimental and theoretical) works on representational drift, we wanted to highlight the pitfalls of such a metric in dissociating between sensory-evoked and sensory-independent components. However, as the reviewers have mentioned, these two aspects have been separated and addressed independently in some of the past literature in the field. For instance, as we discussed in the Discussion, Deitch et al. (Current Biology, 2021) have calculated this for different metrics, including tuning curve correlations, which can potentially alleviate this problem:

      A recent analysis of similar datasets from the Allen Brain Observatory reported similar levels of representational drift within a day and over several days [5]. The study showed that tuning curve correlations between different repeats of the natural movies were much lower than population vector and ensemble rate correlations [5]; it would be interesting to see if, and to which extent, similarity of population vectors due to behavioural signal that we observed here may contribute to this difference.

      We tried to highlight these contributions better in the revised manuscript (see further on this below in our responses to Major comments).

      4. Another important issue relates to the interchangeable use of the terms 'representational drift' and 'representational similarity'. Representational similarity is a measure to identify changes in representations, and drift is one such change. This may confuse the reader and lead to the misconception that all changes in neuronal responses are representational drift.

      We thank the reviewer(s) for raising this point. We have clarified our use of the terms representational similarity and representational drift in the revised manuscript. Specifically, we have quantified representational drift index between the two blocks according to a previously used metric (RDI; Marks & Goard, 2021) in our new analysis (Figure 1–figure supplement 5).

      For the main part of the paper, however, we have decided to base our analysis on representational similarity (RS), and to evaluate the drop of RS with changes in behavior. Our reasoning for this is twofold. First, any measure of representational drift should ultimately be a function of the representational similarity. The measure we used above, for instance, is calculated as RDI = (RS_ws - RS_bs)/(RS_ws + RS_bs) (Marks and Goard, 2021), with RS_ws and RS_bs referring to the average representational similarity within a session or between different sessions. However, RS contains more information, especially with regard to fine-tuned changes - the above metric, for instance, averages all the changes within each block of presentation. By focusing on the basic function of representational similarity, we could capture both the gross changes between the blocks as well as more nuanced changes that can arise within them, especially with regard to behavioral changes. Second, another aspect that would have been lost by only using the usual metric of representational drift is the direction of change. In our analysis, we in fact found that the average RS increased within the second block of presentation, which might be contrary to the usual direction of drift. We found this unconventional change of RS interesting and informative too, and presenting the raw RS allowed us to highlight it. For these reasons, we think representational similarity is a better metric to base our analyses upon, although we have now calculated a conventional representational drift index for comparison too.
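
      The drift index quoted above can be written out as a short sketch. The function name and the example similarity values are hypothetical; only the formula, (RS_ws - RS_bs)/(RS_ws + RS_bs), comes from the response.

```python
def drift_index(rs_within, rs_between):
    """Representational drift index from two lists of similarity values.

    rs_within:  representational similarities within a session or block
    rs_between: representational similarities between sessions or blocks
    """
    rs_ws = sum(rs_within) / len(rs_within)
    rs_bs = sum(rs_between) / len(rs_between)
    if rs_ws + rs_bs == 0:
        raise ValueError("index is undefined when RS_ws + RS_bs is zero")
    return (rs_ws - rs_bs) / (rs_ws + rs_bs)

# Made-up similarity values for illustration only.
rs_within = [0.6, 0.5, 0.7]   # mean 0.6
rs_between = [0.2, 0.4]       # mean 0.3

rdi = drift_index(rs_within, rs_between)  # (0.6 - 0.3) / (0.6 + 0.3)
```

      Because both inputs are averaged before the ratio is taken, any structure inside a block, such as a gradual rise in similarity, is invisible to the index, which is the limitation the response describes.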

      Reviewer 3

      Although it is increasingly realized that cortical neural representations are inherently unstable, the meaning of such "drift" can be difficult or impossible to interpret without knowing how the representations are being read out and used by the nervous system (i.e. how it contributes to what the experimental animal is actually doing now or in the future). Previous studies of representational drift have either ignored or explicitly rejected the contribution of what the animal is doing, mostly due to a lack of high-dimensional behavioural data. Here the authors use perhaps the most extensive open-source and rigorous neural data available to take a more detailed look at how behaviour affects cortical neural representations as they change over repeated presentations of the same visual stimuli.

      The authors apply a variety of analyses to the same two datasets, all of which convincingly point to behavioural measures having a large impact on changing neural representations. They also pit models against each other to address how behavioural and stimulus signals combine to influence representations, whether independently or through behaviour influencing the gain of stimuli. One analysis uses subsets of neurons to decode the stimulus, and the independent model correctly predicts the subset to use for better decoding. However, one caveat may be that the nervous system does not need to decode the stimulus from the cortex independently of behaviour; if necessary, this could be done elsewhere in the nervous system with a parallel stream of visual information.

      Overall the authors' claims are well-supported and this study should lead to a re-assessment of the concept of "representational drift". Nonetheless, a weakness of all analyses presented here is that they are all based on data in head-fixed mice that were passively viewing visual stimuli, such that it is unclear what relevance the behaviour has. Furthermore, the behavioural measurements available in the open-source dataset (pupil movements and running speed) are still a very low dimensional representation of what the mice were actually doing (e.g. detailed kinematics of all body movements and autonomic outputs). Thus, although the authors here as well as other large-scale neural recording studies in the past decade or so make it clear that relatively basic measures of behaviour can dramatically affect cortical representations of the outside world, the extent to which any cortical coding might be considered purely sensory remains an important question. Moreover, it is possible that lower-dimensional signals are overly represented in visual areas, and that in other areas of the cortex (e.g. somatosensory for proprioception), the line between behaviour parameters and sensory processing is blurred.

      Many thanks for the clear and insightful summary of the results, significance and caveats of our analysis. We totally agree with this critical evaluation - and suggestions for future work.

    1. Author Response

      Reviewer 2

      In the manuscript, the cellular deformation that is due to the shear stress generated in a classical microfluidic channel is used to deform detached cells that are moving in the flow. A very elegant point of the paper is that the same cells are used in the provided software to determine the fluid flow, which is a key parameter of the method. This is particularly important, as an independent way to crosscheck the fluid flow with the expected values is important for the reliability of the method. Instead of complicated shape analysis that are required in other microfluidic methods, here the authors simply use the elongation of the cell and the orientation angle with respect to the fluid flow direction. The nice thing here is that a well-known theory from R. Roscoe can be successfully used to relate these quantities to the viscoelastic shear modulus. Thanks to the knowledge of the fluid flow profile, the mechanical properties can be related to the tank treading frequency of the cells, which in turn depends on the position in the channel, and the flow speed. Hence, after knowing the flow profile, which can be determined with a sufficiently fast camera, and the actual static cell shape, it is possible to obtain frequency dependent information. Assuming then that cells do have a statistically accessible mean viscoelastic property, the massive and quick data acquisition can be used to get the shear modulus over a large span of frequencies.

      The very impressive strength of the paper is that it opens the door for basically any non-specialized cell biology lab to perform measurements of the viscoelastic properties of typically used cell types in solution. This allows global mechanical properties to be included in any future analysis, and I am convinced that this method can become a main tool for a rapid viscoelastic characterization of cell types and cell treatment.

      Although it is both elegant and versatile, there remain a couple of important open questions to be studied further before the method is as reliable as the authors suggest. A main problem is that the model and the data simply don't really work together. This is most prominent in Figure 3a. This is explained by the authors as a result of non-linear stress stiffening. Surely this is a possible explanation, but the fact that the question is not fully answered in the paper makes the whole method seem not sufficiently backed. I agree that the tests with the elastic beads are beautiful, but also here the results obtained with the microfluidic method and the AFM seem not to match sufficiently to simply use the proposed model in conjunction with a single power law approach to fully translate the single frequency data into a frequency dependent plot. There are more and more hints that two power law models are more reasonable to describe cell mechanics. If true, this would abolish the approach of exploiting only a single image to get the mechanical power law exponent and the prefactor. Despite all the excitement about the method, I have the feeling that the used models are stretched to their extreme, and the fact that the only real crosscheck (figure 3a) does not work for the power law exponent underlines this impression.

      We had assumed that the probing frequency equals the tank treading frequency. This is incorrect. As the cell undergoes a full rotation, any given volume element inside the cell is compressed twice and elongated twice. Hence, the frequency with which the cell is probed is twice the tank-treading frequency. This correction shifts the G’ and G” versus frequency curves to the right (by a factor of two), and in addition, the G” data points are shifted (increased) by a factor of two (Eq. 17). This also increases the fluidity alpha (and hence the slope of the power-law relationship) roughly by a factor of two (Eq. 22), and since the actual slope of the G’ and G” versus frequency data “cloud” is unchanged by the correction, the single power-law description now describes the data much better (see new Fig. 3a).

      Regarding the critique that models are stretched to their extreme: The Roscoe model assumes that cells behave as the visco-elastic continuum-mechanics equivalent of a Kelvin-Voigt body consisting of an elastic spring in parallel with a resistive (or viscous) dash-pot element. This then gives rise to a complex shear modulus with storage modulus G’ and loss modulus G”, measured at twice the tank treading frequency 𝜔. Roscoe makes no assumptions whatsoever about how G’ and G” might change as a function of frequency. Hence, our “raw” G’ and G” data, e.g. in Fig. 3a, are obtained without any power law assumption.

      One could leave it at that, as the reviewer suggests below, and only present the raw G’ and G” vs. frequency plots. However, this would also make it nearly impossible to compare our measurements to those obtained with other techniques that operate at different, non-overlapping time- or frequency-scales. For such a comparison to work, one needs a model to predict how G’ and G” scale with frequency.

      A commonly used and very simple model to predict how G’ and G” scale with frequency, which is also the model used by Fregin et al. and many others, is that of a Kelvin-Voigt body consisting of an elastic spring in parallel with a resistive element (dash-pot), both with a frequency-independent stiffness and resistance (viscosity), respectively. However, our data show that G’ and G” of different cells, all measured at different tank-treading frequencies, exhibit a behavior that is very unlike that of a simple Kelvin-Voigt body with a constant, frequency-independent stiffness and resistance. For such a body, G’ would be flat (power law exponent zero), and G” would increase proportionally with frequency (power law exponent of unity). This is clearly not what our data show.
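
      The frequency scaling expected for a simple Kelvin-Voigt body can be made explicit with a short sketch. It assumes the textbook form G* = E + i𝜔𝜂 for a spring in parallel with a dash-pot; the stiffness and viscosity values are hypothetical, and the helper names are ours.

```python
import math


def kelvin_voigt_moduli(omega, stiffness, viscosity):
    """G' and G'' of a spring in parallel with a dash-pot: G* = E + i*omega*eta."""
    return stiffness, viscosity * omega


def loglog_slope(func, x_lo, x_hi):
    """Slope of func on a log-log plot between x_lo and x_hi."""
    return (math.log(func(x_hi)) - math.log(func(x_lo))) / (math.log(x_hi) - math.log(x_lo))


# Hypothetical parameters (Pa and Pa*s), for illustration only.
E, ETA = 100.0, 1.0

slope_storage = loglog_slope(lambda w: kelvin_voigt_moduli(w, E, ETA)[0], 1.0, 100.0)
slope_loss = loglog_slope(lambda w: kelvin_voigt_moduli(w, E, ETA)[1], 1.0, 100.0)

print(f"log-log slope of G': {slope_storage}")
print(f"log-log slope of G'': {slope_loss}")
```

      A population of cells obeying this model would therefore show a flat G’ and a G” with unit slope, regardless of the parameter values chosen.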

      Rather, we find that G’ and G” increase with increasing frequency according to a power law, with the same exponent 𝛼 for G’ and G”. At high frequencies (beyond the range of our microfluidic method, but in the range of our AFM measurements), G” increases more strongly with frequency, akin to a Newtonian viscosity (power law exponent of unity), which we take into account in the case of the AFM measurements. A large number of publications have shown that many types of cells, including cells in suspension, follow power law rheology, regardless of the measurement method. Also the AFM measurements that we include in this study support the validity of power-law rheology.

      Power law rheology predicts a peculiar behavior: The ratio of G”/G’ in the low-frequency regime (where the high-frequency viscous term is not yet dominating) must be equal to tan(𝛼𝜋/2), for mathematical reasons (Eq. 22). With our correction (that the probing frequency is twice the tank-treading frequency), we find that Eq. 22 correctly predicts the power-law exponent of the G’ and G” vs. frequency data.

      Note that we actually do not fit a power law model (Eq. 1) to the population data of G’ and G” vs. frequency in Fig. 3a. The G’ and G” data are obtained by applying Roscoe-theory, without any further assumptions such as power-law rheology. Only the lines shown in Fig. 3a that go nicely through the data are a prediction of how a typical cell (selected from the mode of the joint probability density of alpha and k, see Fig. 3b) would behave if we had measured it at different frequencies, under the assumption that this cell follows power law rheology, based on Eq. 22. With this assumption, we can directly convert the measured G’ and G” of any cell into a stiffness k and power law exponent 𝛼 using Eqs. 21 and 22 - no fit is needed here.
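
      The fit-free conversion described above can be sketched as follows. This is a minimal illustration, assuming the generic power-law form G* = k (i𝜔/𝜔0)^𝛼 with a reference frequency 𝜔0 of 2𝜋 rad/s; the prefactor convention of Eq. 21 in the manuscript may differ by a constant factor, and the function names and input values are hypothetical. Only the ratio G”/G’ = tan(𝛼𝜋/2) is taken directly from the text.

```python
import math

OMEGA_0 = 2.0 * math.pi  # assumed reference angular frequency (1 Hz)


def power_law_parameters(g_storage, g_loss, tank_treading_hz):
    """Convert one (G', G'') pair into stiffness k and exponent alpha, without fitting."""
    # The cell is probed at twice its tank-treading frequency.
    omega = 2.0 * math.pi * (2.0 * tank_treading_hz)
    alpha = (2.0 / math.pi) * math.atan2(g_loss, g_storage)
    # |G*| = k * (omega / omega_0)**alpha under the assumed power-law form.
    k = math.hypot(g_storage, g_loss) / (omega / OMEGA_0) ** alpha
    return k, alpha


def power_law_moduli(k, alpha, omega):
    """Predict G' and G'' at angular frequency omega for a given k and alpha."""
    magnitude = k * (omega / OMEGA_0) ** alpha
    return (magnitude * math.cos(alpha * math.pi / 2.0),
            magnitude * math.sin(alpha * math.pi / 2.0))


# Hypothetical measurement of a single cell (Pa, Hz).
k, alpha = power_law_parameters(g_storage=80.0, g_loss=30.0, tank_treading_hz=3.0)

# Round trip: predicting the moduli at the probing frequency recovers the inputs.
g1, g2 = power_law_moduli(k, alpha, 2.0 * math.pi * 6.0)
print(f"k = {k:.1f} Pa, alpha = {alpha:.3f}, G' = {g1:.1f} Pa, G'' = {g2:.1f} Pa")
```

      Calling power_law_moduli with other values of omega gives the kind of extrapolated line that a single cell would trace if it followed power-law rheology.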

      Since we only measure two parameters for any given cell at twice its tank-treading frequency, namely strain and alignment angle, we can only extract two parameters for each cell (i.e., G’ and G”, or k and alpha) but not a third parameter. In essence, the reviewer expresses concerns that the G' and G" behavior of a typical cell, when extrapolated to higher or lower frequencies, may not necessarily match the frequency behavior of the entire cell population (Fig. 3a). However, our data show that a single (typical) cell that was measured at a single mid-range frequency comes remarkably close to describing the G’ and G” versus frequency behavior of all other cells.

      The reviewer suggests that a power law model with two exponents may be able to even more accurately describe the mechanics of the cell population. This is certainly correct, and in particular when cell mechanics is measured over a larger range of frequencies or strain rates, as we have done here using AFM, we find that at higher frequencies, G” deviates from a weak power law and merges into a different power law with a larger slope (i.e., power law exponent) that approaches unity or a value close to unity, akin to a Newtonian viscous term. Therefore, the single power law expression (Eq. 1) is not sufficient for the AFM data, and we use Eq. 2 instead. However, in the case of our shear stress cytometry measurements, the tank-treading frequency remains below the range where this second power law behavior becomes prominent. Therefore, the Newtonian viscosity term of Eq. 2 cannot be fitted with reasonable fidelity to the data from a single measurement.
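
      The crossover between the two regimes can be illustrated numerically. The sketch assumes that the loss modulus is the sum of a weak power-law term and a Newtonian viscous term, G” = k (𝜔/𝜔0)^𝛼 sin(𝛼𝜋/2) + 𝜇𝜔; this is our reading of the description of Eq. 2, not a transcription of it, and all parameter values are hypothetical.

```python
import math

OMEGA_0 = 2.0 * math.pi  # assumed reference angular frequency (1 Hz)


def loss_modulus(omega, k, alpha, mu):
    """G'' as a weak power-law term plus a Newtonian viscous term mu * omega."""
    return k * (omega / OMEGA_0) ** alpha * math.sin(alpha * math.pi / 2.0) + mu * omega


def local_slope(omega, k, alpha, mu, step=1.01):
    """Local log-log slope of G'' around omega, by a centered finite difference."""
    lo, hi = omega / step, omega * step
    rise = math.log(loss_modulus(hi, k, alpha, mu)) - math.log(loss_modulus(lo, k, alpha, mu))
    return rise / (math.log(hi) - math.log(lo))


# Hypothetical parameters: k in Pa, mu in Pa*s.
K, ALPHA, MU = 100.0, 0.2, 0.01

slope_low = local_slope(0.1, K, ALPHA, MU)    # power-law term dominates
slope_high = local_slope(1e6, K, ALPHA, MU)   # viscous term dominates

print(f"slope at low frequency: {slope_low:.3f}")
print(f"slope at high frequency: {slope_high:.3f}")
```

      When the measured frequencies stay in the regime where the slope is still close to 𝛼, the viscous coefficient has almost no influence on the data, which is why it cannot be fitted reliably from a single measurement.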

      In the case of polyacrylamide beads, we start to see a hint of an upward trend in G” versus frequency at tank-treading frequencies of around 10 Hz, and therefore have performed a global fit with Eq. 2 to the shear flow data where we keep the Newtonian viscous term constant for all conditions (different shear stresses and bead stiffnesses).

      The reviewer furthermore cautioned that mechanical non-linearities such as strain stiffening may distort or otherwise bias the results. As the reviewer brings up this issue in more detail below, we have addressed it there.

      Regarding the concern that “results obtained with the microfluidic method and the AFM seem not to match sufficiently to simply use the proposed model in conjecture with a single power law approach to fully translate the single frequency data into a frequency dependent plot.”:

      First, we tend to agree more with the opinion of Reviewer #1, who found it remarkable that results obtained with the microfluidic method and the AFM method are actually fairly similar. Now that we have introduced the correction that the probing frequency is twice the tank-treading frequency, the cells in suspension turn out to be softer and more fluid-like compared to the cells measured with AFM. But there are many more commonalities between the AFM data and the shear flow data, which we list above in our reply to Reviewer #1. The most relevant here is that cells show power-law behavior both when measured with AFM and with our new method.

      Second, we did not use a single power law to fit the AFM data. Rather, we used Eq. 2, which contains two power law relationships (the second power law exponent of unity for the Newtonian viscosity term is usually not explicitly written). However, the Newtonian viscosity term arises mainly from the hydrodynamic drag of the cantilever with the surrounding liquid, and less so from the cells. This hydrodynamic drag is absent in our shear flow deformation cytometry method, and moreover the tank treading frequency of most cells remains far below 10 Hz, where an additional Newtonian viscosity term does not yet come into play.

      Third, we disagree that Fig. 3a is “the only real crosscheck for the power law exponent”. The inverse relation that we see between the power law exponent and the stiffness of individual cells (Fig. 3b) has been previously reported for different cell types and methods. Moreover, we find a power law exponent close to zero for PAA beads at small strain values, which is to be expected for a predominantly elastic material such as PAA. We think that this last result is a particularly convincing experimental cross-check.

    1. Author Response

      Reviewer 2

      In this manuscript, Gao et al claim that they have constructed a gene regulatory network underlying alveologenesis and its significance to bronchopulmonary dysplasia (BPD). Using RTPCR and in situ hybridization, the authors claim that Igf1 and Igf1r are expressed in secondary crest myofibroblasts (SCMFs) and their loss of function using Gli1-creER results in alveolar simplification, a tissue level disorganization of alveoli that phenocopies BPD. Further, the authors investigate transcriptomic changes in mesenchymal and epithelial populations from control and Igf1r mutant lungs. For this, the authors developed a 47-gene panel that they claim to represent signaling modules within SCMFs and used this panel for RT-PCR analysis. These data are used to generate an interaction network to evaluate signaling partners, co-effectors mediated by IGF1 signaling in SCMFs, other fibroblasts and alveolar epithelial cells. Using this GRN, the authors concluded that Wnt5a is a key signaling molecule downstream of IGF1 signaling that regulates alveologenesis.

      While the authors' claims are salient, some of the conclusions were previously shown by others. For example, a role for Wnt5a driven Ror/Vangl2 has already been shown to be a key mediator of alveologenesis, by virtue of the same signaling effectors identified in this study (Zhang 2020 eLife).

      Response: For the network construction, we not only used our own data, but also considered perturbation analyses and gene regulation data contributed from the work reported by others in the lung research community. The goal was to decipher the genetic regulations among these genes, connect them together, and build the GRN so the lung research community can employ it as a tool to study lung development and its disease in a network-type context.

      We are aware of the excellent study by Zhang et al. 2020 and appreciate their findings related to WNT5a/Ror/Vangl2 functions in lung development. With due respect, we would like to point out that Zhang and colleagues inactivated WNT5a by Tbx4Cre which is highly expressed during the embryonic stage (i.e. Cebra-Thomas et al., 2003). This approach cannot exclude the indirect effects of the mutation that originate during the embryonic phase, on alveologenesis, which is an entirely postnatal process. In contrast, our study used a conditional CreER model to inactivate WNT5a specifically during alveologenesis (i.e. PN2). More importantly, our work identified the upstream and downstream regulatory connections of this signaling pathway and further elucidated its role and function from a network perspective.

      Additionally, the genetic loss of function studies performed here are not specific to SCMFs and instead they target broader alveolar and airway fibroblasts.

      Response: Please see our Response to General Comment #3 above for the specificity of Gli1 to SCMFs.

      The construction of a gene regulatory network is a potentially exciting tool, but this requires additional perturbations to distinct nodes identified in this work. It would be of particular interest to determine whether there is any redundancy among these nodes and what are the downstream effectors that are specific to each node. While I recognize that this is outside the scope of this work, the authors need to demonstrate the significance of at least one such node.

      Response: We agree with the general validity of the reviewer’s comment that much more can be done that is not in this first report of the alveologenesis GRN. There is always much more.

      The reviewer has raised some critically important and interesting points. We appreciate the reviewer’s acknowledgement that these very key studies are not within the scope of what is presented in this initial report on GRN construction. Each of such studies will likely require a separate report. The content of the present manuscript was chosen with the goal of disseminating a highly cogent and focused set of data that introduces the utility of GRN in alveologenesis. As noted above by the Editors, this is a novel approach in “translating publicly available data into meaningful biological insights”. We hope this clarifies the main purpose of our study and this manuscript.

  4. Jul 2022
    1. Author Response:

      Reviewer #2 (Public Review):

      The topic is interesting, the study addresses an important question, and the manuscript is fairly well written. However, in my opinion, there are a number of serious problems that need to be addressed - because there are many similar laboratory studies on this subject. In my opinion, the novelty of the main message (a neutrophil - B-cell axis governs disease tolerance during sepsis via Cxcr4) is limited. My explanation is as follows.

      Firstly, human sepsis is defined (according to Sepsis 3 criteria) as life-threatening organ dysfunction caused by a dysregulated host response to infection (Singer et al. 2016) and in agreement with the Authors, despite considerable research efforts the pathophysiology is still unknown. Besides, the mechanisms underlying the signaling and trafficking of PMN leukocytes within the affected tissues, and the roles of adhesion molecules, including chemokine receptors are still not fully understood. Indeed, diverse therapies directed against these targets have shown dramatic effects in animal models; however, in humans, their clinical impact has been modest. There are many reasons for the discordance between biological promise and bedside reality, and one of them might be the use of animal models. Extrapolation from animal models always requires an awareness of species-related and other dependent variations and I would like to stress that this study employed mice, the Author has investigated the mechanisms involved in the PMN reactions in a mouse model of intraabdominal (O18:K1 E. Coli) bacterial infection, but this fact is not mentioned in the title or abstract (only in keywords) or introduction.

      We thank the reviewer for this comment. We fully agree that the translational potential of many mouse studies is limited, but truly believe that there are conserved mechanisms between mouse and human which are worth being explored and reported. We certainly did not intend to hide that our study was performed in mice and have now clarified this by adapting the abstract as follows: “Here, we established a mouse model of long-lasting disease tolerance during severe sepsis, manifested by diminished immunothrombosis and organ damage in spite of a high pathogen burden.“ page 2, line 11

      Secondly, according to the Authors they "established and investigated a model of disease tolerance during sepsis, which enabled them to reveal the importance of B cells and neutrophils in mediating tissue tolerance in the context of severe infections". Here it should be noted that 1. LPS administration to rodents does not mimic human sepsis, and 2. the mechanism of endotoxin tolerance (or preconditioning) has been studied for decades. It is characterized by a hyporesponsive state following low-dose stimulation with a TLR4 ligand (i.e. lipopolysaccharide). Besides, endotoxin tolerance provokes cross-tolerance against other forms of injuries as well such as liver ischemia-reperfusion (J Surg Res 57, 1994).

      Ad 1. We thank the reviewer for this critical comment. While we fully agree on the fact that LPS administration does not mimic human sepsis, we are slightly puzzled by this criticism as we at no point say so in our manuscript. What we instead say is that E. coli peritonitis is a sepsis model, a claim that is widely accepted in the field.

      We are explaining this in the intro: “In this study, we investigated mechanisms of disease tolerance and tissue damage control, by comparing tolerant and sensitive hosts during a severe bacterial infection. While sensitive animals developed severe coagulopathy and tissue damage during sepsis, tolerant animals were able to maintain tissue integrity in spite of a high bacterial load. Disease tolerance was induced by the prior exposure of animals to a single, low-dose of LPS and could be uncoupled from LPS-induced suppression of cytokine responses.” page 3, line 32-34, page 4, line 1-3

      …as well as in the result section: “We thus challenged mice intravenously (i.v.) with a subclinical dose of LPS 1 day, 2 weeks, 5 weeks or 8 weeks, respectively, prior to the induction of Gram-negative sepsis by intraperitoneal (i.p.) injection of the virulent E. coli strain O18:K1.” page 5, line 7-9

      … and in our methods section, which we have adapted to make this point even more clear to the reader: Tolerance was induced by i.v. injection of 30 μg E. coli LPS (Sigma-Aldrich) at indicated times before induction of bacterial sepsis by intraperitoneal (i.p.) infection with 1-2 × 10⁴ E. coli O18:K1. E. coli peritonitis was induced as described previously (Knapp, de Vos et al. 2003, Knapp, Matt et al. 2007, Gawish, Martins et al. 2015). page 24, line 13-15

      Ad 2. We agree that LPS or endotoxin tolerance is an old topic which has been studied for a long time. However, as we explain in detail in our response to the editor, the contribution of “LPS tolerance” to “disease tolerance” is still under investigation. As explained extensively above (please refer to our response to the editor), we observe signs of LPS tolerance in our experimental setup, as LPS pre-exposed mice produce lower levels of inflammatory cytokines shortly after infection. However, our data collectively suggest that reduced early cytokine production (most likely a sign of LPS tolerance) is not the reason for improved tissue damage control in LPS-pre-treated mice. As such, we argue that the protective phenotype we observed is independent of early cytokine suppression and independent of monocytes and macrophages, hence not a result of LPS tolerance.

      Thirdly, according to the Authors they considered the possibility that B cells regulate infection-induced neutrophil functionalities because “we discovered that LPS-induced protection was still observed in splenectomized animals”. Nevertheless, the role of spleen in endotoxin effects (or more correctly, that the absence of spleen) in the development of tolerance to endotoxin was demonstrated decades ago (and studied repeatedly by the group of Agarwal (Br J Exp Pathol 53 1972).

      It seems that there has been a misunderstanding as we do not make this claim in our manuscript. To be precise the correct wording that we used in our manuscript is: “Since we discovered that LPS-induced protection was still observed in splenectomized animals, we considered the possibility that B cells regulate infection-induced neutrophil functionalities via effects exerted by sharing the same bone marrow niche. In fact, B cells, neutrophils and their precursors build up the majority of the constitutive CD45+ bone marrow cell pool, where they mature while sharing the same niche (Yang, Busche et al. 2013).“ page 19, line 17-21

      This obviously has a different meaning. As we saw that LPS-induced tissue tolerance was abrogated in (full body) B cell deficient mice, our intention was not to study the effect of splenectomy on LPS-tolerance but to make use of splenectomy to narrow down the B cell compartment which is driving the protective effect. Of course, the role of the spleen during endotoxemia and during peritonitis has been studied before, but the point we wanted to make with our finding was that splenectomy, in contrast to a full B cell deficiency (in Rag2-/- or JHT-/- animals), did not abrogate LPS-induced tissue protection.

      To address and clarify this point, we slightly modified our explanation in the results section and included some of these “old” splenectomy studies: “We then tested if splenectomy would replicate the protective effects of full B cell deficiency during sepsis and interestingly found that splenectomy was associated with reduced liver damage in naïve, sensitive mice, which is in line with other studies (Agarwal, Parant et al. 1972, Karanfilian, Spillert et al. 1983), but, in contrast to complete lymphocyte deficiency, not sufficient to abrogate LPS-induced tissue protection in tolerant animals (Figure 2G and S2F). This suggested that mature splenic B cells contributed to tissue damage during severe infections, while other, not spleen derived, B cell compartments were instrumental in driving disease tolerance.“ page 8, line 1-7

      Next, according to the Authors, "it is tempting to speculate that B cells act as important regulators of granulopoiesis and neutrophil trafficking at steady state and under inflammatory conditions". Nevertheless, previous studies have established that neutrophil trafficking is regulated mainly via CXCR4 in steady state, and the attenuation of CXCR4 signaling leads to the entry of PMNs into the circulation from the bone marrow (JCI 120(7) 2010).

      We believe that there is another misunderstanding. This comment suggests that an involvement of B cells in neutrophil regulation would be against already published knowledge about CXCR4 as a master regulator of neutrophil trafficking. This is not at all what we are claiming in our manuscript.

      A potential involvement of B cells in neutrophil regulation is not in conflict with what is already known about the important role of CXCR4. As CXCR4 signaling is critical for both B cells AND neutrophils, competition for the CXCR4 ligand Cxcl12 (SDF1) and differences in the sensitivity to altered ligand availability might serve as an explanation.

      We are actually discussing the role of CXCR4 in neutrophil trafficking extensively in the results section and in the discussion of our manuscript and have now slightly modified our wording and extended our discussion part to clarify this point:

      “…neutrophil aging, a process that is counteracted by Cxcr4 signaling, the master regulator of neutrophil trafficking between the bone marrow and the periphery (Martin, Burdon et al. 2003, Eash, Greenbaum et al. 2010, Adrover, Del Fresno et al. 2019).” page 13, line 7-9

      “Considering the reported importance of Cxcr4 signaling in neutrophil retention in the bone marrow and their release to the periphery (Martin, Burdon et al. 2003, Eash, Greenbaum et al. 2010, Adrover, Del Fresno et al. 2019),…“ page 15, line 1-2

      “Cxcr4 interaction with its ligand Cxcl12 (stromal cell-derived factor 1, SDF1) has been shown to be critical for the retention of neutrophils in the bone marrow under steady state, their release to the periphery as well as their homing back to the bone marrow when they become senescent (Martin, Burdon et al. 2003, Eash, Greenbaum et al. 2010). Importantly, Cxcr4 signaling is essential, as Cxcr4 knockout mice die perinatally due to severe developmental defects ranging from virtually absent myelopoiesis and impaired B lymphopoiesis to abnormal brain development (Ma, Jones et al. 1998). A different sensitivity to changes in SDF1 concentrations as a potential mechanism of the reciprocal regulation of lymphopoiesis and granulopoiesis has been suggested earlier (Ueda, Kondo et al. 2005).“ page 19, line 33-34, page 20, line 1-7

      Furthermore, selective inhibition of CXCR4 by AMD3100 is protective in many injury models, including mice with fecal peritonitis (Front. Immunol 2020 | https://doi.org/10.3389/fimmu.2020.00407).

      We thank the reviewer for pointing this out. In the above cited paper, injection of AMD3100 blocks neutrophil migration into the peritoneal cavity as well as into the tissue during zymosan- or fecal slurry-induced peritonitis. Interestingly, we do not see any benefit of Cxcr4 inhibition (using AMD3100) in our model, but in contrast observed tissue protection by administration of the Cxcr4 agonist ATI2341 (Figure 5F). This can be explained by a different timing (and maybe also route) of administration. In the aforementioned study, AMD3100 is injected intraperitoneally, 1h BEFORE induction of peritonitis. AMD3100 injection is known to cause an immediate release of neutrophils from the bone marrow to the blood (Liu, Li et al. 2015), which means that in this experimental setup, neutrophils are already in the circulation prior to the injection of zymosan or fecal slurry. Our experimental setup in contrast uses a therapeutic approach, as we inject AMD3100 or ATI2341 just 6h post induction of E. coli peritonitis. At 6h post infection, E. coli infection has already caused a massive increase in blood neutrophils and infiltration of neutrophils into the peritoneum and various organs. We thus believe that these experimental setups are not truly comparable, but agree with the reviewer that this needs to be addressed in our manuscript.

      We have now added the following paragraph to the discussion to address this study:

      “Given its clinical importance, Cxcr4 inhibition (using AMD3100) has been studied in different injury models, but interestingly only little is known about the therapeutic impact of Cxcr4 activation. Strikingly, activating, but not antagonizing, Cxcr4 during sepsis promoted tissue damage control in our model, which is in conflict with a study showing that Cxcr4 blockade with AMD3100 prior induction of peritonitis prevents neutrophil infiltration and tissue inflammation (Ngamsri, Jans et al. 2020). While we only see a tissue protective effect of ATI2341, but not AMD3100, we believe that this is due to differences in the timing and maybe also the route of drug administration. As we use a therapeutic approach and target Cxcr4 as late as 6h post E. coli injection, a time when there is already substantial neutrophilia in blood and organs, our data support an impact of Cxcr4 signaling on neutrophils’ tissue damaging properties and suggest that B cell driven regulation of Cxcr4 is a potential mechanism of disease tolerance and thus might be an interesting therapeutic target during severe sepsis.“ page 20, line 12-23

      Lastly, it has been previously recorded that Rag1 animals had a higher death rate to CLP than wild type animals and that this could be ameliorated by adoptive transfer of syngeneic T cells induced to overexpress the anti-apoptotic molecule Bcl-2 by gene transfer techniques (Hotchkiss, Crit Care Med 1997).

      We thank the reviewer for this comment. We are aware of the fact that the role of lymphocytes in different experimental models of peritonitis-induced sepsis has been studied before. However, the contribution of different lymphocyte populations remains controversial, likely due to different experimental setups and methodologies used in different studies. While we do not really see the connection of lymphocyte apoptosis to our study, we want to point out that the study mentioned by the reviewer does not report a higher death rate of Rag1-/- animals, but shows that there is substantial lymphocyte apoptosis during sepsis (Hotchkiss, Swanson et al. 1997). We believe that the study which the reviewer refers to is from 1999 and has been published in the Journal of Immunology (Hotchkiss, Swanson et al. 1999).

      However, as lymphocyte apoptosis is not connected to the topic of our study, we chose to discuss our data in the context of more B cell specific literature and have cited another important CLP study which showed that IFN-activated B cells are critical for survival by promoting early inflammation and neutrophil effector functions during CLP which in turn improves bacterial clearance (Kelly-Scumpia, Scumpia et al. 2011). In this setup, the increased mortality of Rag1-/- animals during CLP is likely a result of an increased pathogen load and therefore not in conflict with our data. While we appreciate that CLP is the more physiologic model, the strength of our E. coli sepsis model is that bacterial outgrowth already occurs at a maximum speed and is not further enhanced by the absence of certain immune cell types which is why we can uncouple immunopathologic effects from the pathogen load.

      We mention this study in our result section: “Given that B cells were shown to promote early production of proinflammatory cytokines such as IL-6 during sepsis in a type I IFN dependent manner …“ page 8, line 8-9

      We have further added a paragraph to our discussion to better explain the differences between CLP and the model we have used for this study: “It was demonstrated earlier that mature, splenic B2 cells promote neutrophil activation by boosting type-I IFN dependent early inflammation, which in turn improves bacterial clearance and survival during CLP (Kelly-Scumpia, Scumpia et al. 2011). While enhanced inflammation can mediate pathogen clearance during CLP, it at the same time contributes to tissue damage which is of particular importance in our model. In support of proinflammatory, tissue-damaging properties of mature B2 cell subsets, we found splenectomy similarly protective as B cell deficiency during primary sepsis and reconstitution of Rag2-/- mice with B cells to increase tissue damage. Interestingly, we did not identify an important role for the proposed IFNAR-driven inflammatory function of B cells (Kelly-Scumpia, Scumpia et al. 2011) in sepsis, and inflammation did not differ between wild type and lymphocyte deficient mice. “ page 18, line 29-34, page 19, line 1-5

      Therefore, until this point, I consider the work largely repetitive and suggest to downgrade the wording a bit due to the lack of novelty. The paper confirms these previous observations in a partially new context, which is the demonstration of a crosstalk between PMNs and B cells in the bone marrow, in which B cells influence PMN trafficking likely by modulating Cxcr4 related pathways.

      We have highlighted the novelties of our study in this response letter and our revised manuscript, and hope the reviewer agrees with us that this improved the clarity and better explains the novelties.

    1. Author Response:

      Reviewer #3 (Public Review):

      The main goals of this study by Guan, Aflalo and colleagues were to examine the encoding scheme of populations of neurons in the posterior parietal cortex (PPC) of a person with paralysis while she attempted individual finger movements as part of a brain-computer interface task (BCI). They used these data to answer several questions: 1) Could they decode attempted finger movements from these data (building on this group's prior work decoding a variety of movements, including arm movements, from PPC)? 2) Is there evidence that the encoding scheme for these movements is similar to that of able-bodied individuals, which would argue that even after paralysis, this area is not reorganized and that the motor representations remain more or less stable after the injury? 3) Related to #2: is there beneficial remapping, such that neural correlates of attempted movements change to improve BCI performance over time? 4) Can looking at the interrelationship between different fingers' population firing rate patterns (one aspect of the encoding scheme) indicate whether the representation structure is similar to the statistics of natural finger use, a somatotopic organization (how close the fingers are to each other), or be uniformly different from one another (which would be advantageous for the BCI and connects to question #3)? Furthermore, does the best fit amongst these choices to the data change over the course of a movement, indicating a time-varying neural encoding structure or multiple overlapping processes? The study is well-conducted and uses sound analysis methods, and is able to contribute some new knowledge related to all of the above questions. These are rare and precious data, given the relatively few people implanted with multielectrode arrays like the Utah arrays used in this study. 
      Even more so when considering that to this reviewer's knowledge, no other group is recording from PPC, and this manuscript thus is the first look at the attempted finger movement encoding scheme in this part of human cortex.

      An important caveat is that the representational similarity analysis (RSA) method and resulting representational dissimilarity matrix (RDM) that is the workhorse analysis/metric throughout the study is capturing a fairly specific question: which pairs of finger movements' neural correlates are more/less similar, and how does that pattern across the pairings compare to other datasets. There are other questions that one could ask with these data (and perhaps this group will in subsequent studies), which will provide additional information about the encoding; for example, how well does the population activity correlate with the kinematics, kinetics, and predicted sensory feedback that would accompany such movements in an able-bodied person?

      What this study shows is that the RDMs from these PPC Utah array data are most similar to motor cortical RDMs based on a prior fMRI study. It's innovative to compare effectors' representational similarity across different recording modalities, but this apparent similarity should be interpreted in light of several limitations: 1) the vastly different spatial scales (voxels spanning cm that average activity of millions of neurons each versus a few mm of cortex with sparse sampling of individual neurons), 2) the vastly different temporal scales (firing rates versus blood flow), 3) that dramatically different encoding schemes and dynamics could still result in the same RDMs. As currently written, the study does not adequately caveat the relatively superficial and narrow similarity being drawn between these data and the prior Ejaz et al (2015) sensorimotor cortex fMRI results, apart from (some) exposition in the Discussion.

      We agree that vastly different spatiotemporal scales (comments 1 and 2) limit the chances of finding correspondence between fMRI and single-neuron recordings. We have added motivation for our comparisons to the Results and Discussion sections.

      Revised text in the Results: “We note that our able-bodied model was recorded from human PC-IP using fMRI, which measures fundamentally different features (millimeter-scale blood oxygenation) than microelectrode arrays (sparse sampling of single neurons).”

      Revised text in the Discussion: “This match was surprising because single-neuron and fMRI recordings differ fundamentally; single-neuron recordings sparsely sample 10^2 neurons in a small region, while fMRI samples 10^4–10^6 neurons/voxel (Guest and Love, 2017; Kriegeskorte and Diedrichsen, 2016). The correspondence suggested that RSA might identify modality-invariant neural organizations (Kriegeskorte et al., 2008b), so here we used fMRI recordings of human PC-IP as an able-bodied model.” “This result does obscure a straightforward interpretation of the RSA results – why does our recording area match MC better than the corresponding implant location? Several factors might contribute, including differing neurovascular sensitivity to the early and late dynamic phases of the neural response (Figure 4e), heterogeneous neural organizations across the single-neuron and voxel spatial scales (Arbuckle et al., 2020; Guest and Love, 2017; Kriegeskorte and Diedrichsen, 2016), or mismatches in functional anatomy between participant NS and standard atlases (Eickhoff et al., 2018).”

      …3) that dramatically different encoding schemes and dynamics could still result in the same RDMs…

      Regarding point 3, we agree that RSA provides a second-order correspondence (Kriegeskorte et al., 2008a) rather than direct neuron-to-neuron comparisons. To supplement RSA, we also provide more detail on single-neuron responses for the reader in Figure 1–figure supplement 5. However, we believe that population metrics helpfully summarize the computational strategies of recorded brain regions (Cunningham and Yu, 2014; Saxena and Cunningham, 2019), so we focus on population comparisons here.
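
      The second-order nature of an RDM comparison can be illustrated with a toy example. The sketch below is not the study's analysis: the response patterns are invented, Euclidean distance is assumed as the dissimilarity measure (the metric used in the manuscript is not stated in this response), and `rotate_pairs` is a hypothetical helper. It shows that remixing the recorded units with an orthogonal transform changes every unit's tuning while leaving the RDM unchanged, which is the sense in which different encoding schemes can share one RDM.

```python
import math

def rdm(patterns):
    """Pairwise Euclidean distances between condition patterns (one row per condition)."""
    n = len(patterns)
    return [[math.dist(patterns[i], patterns[j]) for j in range(n)] for i in range(n)]

def rotate_pairs(pattern, angle):
    """Rotate consecutive pairs of units by `angle` (an orthogonal remixing of units).

    Hypothetical helper; assumes an even number of units.
    """
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for k in range(0, len(pattern), 2):
        x, y = pattern[k], pattern[k + 1]
        out += [c * x - s * y, s * x + c * y]
    return out

# Invented response patterns (arbitrary units): 3 conditions x 4 units.
patterns_a = [
    [5.0, 1.0, 0.0, 2.0],
    [4.0, 2.0, 1.0, 2.0],
    [0.0, 6.0, 3.0, 1.0],
]
# Same conditions, but every unit now has different tuning.
patterns_b = [rotate_pairs(p, math.pi / 3) for p in patterns_a]

rdm_a = rdm(patterns_a)
rdm_b = rdm(patterns_b)

# Largest difference between the two RDMs (zero up to rounding error).
max_rdm_diff = max(abs(rdm_a[i][j] - rdm_b[i][j]) for i in range(3) for j in range(3))
# Largest difference between single-unit responses (clearly non-zero).
max_unit_diff = max(
    abs(a - b) for pa, pb in zip(patterns_a, patterns_b) for a, b in zip(pa, pb)
)
```

      Matching RDMs therefore constrain the geometry of the population response, not the tuning of individual neurons, which is why single-neuron detail is a useful supplement.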

      Relatedly, the study would benefit from additional explanation for why the comparison is being made to able-bodied fMRI data, rather than similar intracortical neural recordings made in homologous areas of non-human primates (NHPs), which have been traditionally used as an animal model for vision-guided forelimb reaching. This group has an illustrious history of such macaque studies, which makes this omission more surprising.

      We agree that similar intracortical recordings from homologous areas of NHPs would be useful to construct an able-bodied model. While our lab has historically studied NHP reaching and grasping, we unfortunately did not perform any analogous experiments involving individuated finger movements. We have updated the Discussion to clarify this.

      Revised text in the Discussion: “We asked whether participant NS’s BCI finger representations resembled that of able-bodied individuals or whether her finger representations had reorganized after paralysis. Single-neuron recordings of PC-IP during individuated finger movements are not available in either able-bodied human participants or non-human primates. However, many fMRI studies have characterized finger representations (Ejaz et al., 2015; Kikkert et al., 2021, 2016; Yousry et al., 1997), and representational similarity analysis (RSA) has previously shown RDM correspondence between fMRI and single-neuron recordings of another cortical region (inferior temporal cortex) (Kriegeskorte et al., 2008b).”

      A second area in which the manuscript in its current form could better set the context for its reader is in how it introduces their motivating question of "do paralyzed BCI users need to learn a fundamentally new skillset, or can they leverage their pre-injury motor repertoire". Until the Discussion, there is almost no mention of the many previous human BCI studies where high performance movement decoding was possible based on asking participants to attempt to make arm or hand movements (to just list a small number of the many such studies: Hochberg et al 2006 and 2012, Collinger et al 2013, Gilja et al 2015, Bouton et al 2016, Ajiboye, Willett et al 2017; Brandman et al 2018; Willett et al 2020; Flesher et al 2021). This is important; while most of these past studies examined motor (and somatosensory) cortex and not PPC (though this group's prior Aflalo, Kellis et al 2015 study did!), they all did show that motor representations remain at least distinct enough between movements to allow for decoding; were qualitatively similar to the able-bodied animal studies upon which that body of work was built; and could be readily engaged by the user just by attempting/imagining a movement. Thus, there was a very strong expectation going into this present study that the result would be that there would be a resemblance to able-bodied motor representational similarity. While explicitly making this connection is a meaningful contribution to the literature by the present study (and so is comparing it to different areas' representational similarity), care should be taken not to overstate the novelty of retained motor encoding schemes in people with paralysis, given the extensive prior work.

      We agree that multiple previous BCI studies instruct participants to attempt arm/hand movements and that these studies are important to discuss. We have updated the Introduction/Discussion to include these references.

      Our work does fill in two important gaps in the existing literature. First, prior BCI studies had shown general resemblance between able-bodied and BCI movement, but previous human BCI studies had not shown whether the details of pre-injury representations are preserved. We have also updated the manuscript to describe a second motivation: that outside of the BCI community, neuroscientists do not agree on whether BCI studies of tetraplegic humans generalize to able-bodied movement, given the potential for reorganization after severe injury. In the Discussion sections of several recent BCI studies (Armenta Salas et al., 2018; Fifer et al., 2021; Flesher et al., 2016; Stavisky et al., 2019; Willett et al., 2020), the authors addressed whether the newly discovered phenomena were simply artifacts of reorganization (we believe not).

      Revised text in the Introduction: Understanding plasticity is necessary to develop brain-computer interfaces (BCIs) that can restore sensorimotor function to paralyzed individuals (Orsborn et al., 2014). First, paralysis disrupts movement and blocks somatosensory inputs to motor areas, which could cause neural reorganization (Jain et al., 2008; Kambi et al., 2014; Pons et al., 1991). Second, BCIs bypass supporting cortical, subcortical, and spinal circuits, fundamentally altering how the cortex affects movement. Do these changes require paralyzed BCI users to learn fundamentally new motor skills (Sadtler et al., 2014), or do paralyzed participants use a preserved, pre-injury motor repertoire (Hwang et al., 2013)? Several paralyzed participants have been able to control BCI cursors by attempting arm or hand movements (Ajiboye et al., 2017; Bouton et al., 2016; Brandman et al., 2018; Collinger et al., 2013; Gilja et al., 2015; Hochberg et al., 2012, 2006), hinting that motor representations could remain stable after paralysis. However, the nervous system’s capacity for reorganization (Jain et al., 2008; Kambi et al., 2014; Kikkert et al., 2021; Pons et al., 1991) still leaves many BCI studies speculating whether their findings in tetraplegic individuals also generalize to able-bodied individuals (Armenta Salas et al., 2018; Fifer et al., 2021; Flesher et al., 2016; Stavisky et al., 2019; Willett et al., 2020). A direct comparison, between BCI control and able-bodied neural control of movement, would help address questions about generalization.

      In the revised Discussion, we further contextualize our study in the prior work. In particular, as BCI studies have made fundamental neuroscience discoveries, they have had to address whether their results generalize to able-bodied individuals. Direct comparisons between able-bodied movement and tetraplegic BCI movement, like our study, help to bridge this gap.

      Revised text in the Discussion: Early human BCI studies (Collinger et al., 2013; Hochberg et al., 2006) recorded from the motor cortex and found that single-neuron directional tuning is qualitatively similar to that of able-bodied non-human primates (NHPs) (Georgopoulos et al., 1982; Hochberg et al., 2006). Many subsequent human BCI studies have also successfully replicated results from other classical NHP neurophysiology studies (Aflalo et al., 2015; Ajiboye et al., 2017; Bouton et al., 2016; Brandman et al., 2018; Collinger et al., 2013; Gilja et al., 2015; Hochberg et al., 2012), leading to the general heuristic that the sensorimotor cortex retains its major properties after spinal cord injury (Andersen and Aflalo, 2022). This heuristic further suggests that BCI studies of tetraplegic individuals should generalize to able-bodied individuals. However, this generalization hypothesis has so far lacked direct, quantitative comparisons between tetraplegic and able-bodied individuals. Thus, as human BCI studies expand beyond replicating results and begin to challenge conventional wisdom, neuroscientists have questioned whether cortical reorganization could influence these novel phenomena (see Discussions of (Andersen and Aflalo, 2022; Armenta Salas et al., 2018; Chivukula et al., 2021; Fifer et al., 2021; Flesher et al., 2016; Stavisky et al., 2019; Willett et al., 2020)). As an example of a novel discovery, a recent BCI study found that the hand knob of tetraplegic individuals is directionally tuned to movements of the entire body (Willett et al., 2020), challenging the traditional notion that primary somatosensory and motor subregions respond selectively to individual body parts (Penfield and Boldrey, 1937). Given the brain’s capacity for reorganization (Jain et al., 2008; Kambi et al., 2014), could these BCI results be specific to cortical remapping? Detailed comparisons with able-bodied individuals, as shown here, may help shed light on this question.

      The final analyses in the manuscript are particularly interesting: they examine the representational structure as a function of a short sliding analysis window, which indicates that there is a more motoric representational structure at the start of the movement, followed by a more somatotopic structure. These analyses are a welcome expansion of the study scope to include the population dynamics, and provide clues as to the role of this activity / the computations this area is involved in throughout movement (e.g., the authors speculate the initial activity is an efference copy from motor cortex, and the later activity is a sensory-consequence model).

      An interesting result in this study is that the participant did not improve performance at the task over 4,016 trials (and that the neural representations of each finger did not change to become more separable by the decoder). This was despite ample room for improvement (the performance was below 90% accuracy across 5 possible choices). The authors provide several possible explanations for this in the Discussion. Another possibility is that the nature of the task impeded learning because feedback was delayed until the end of the 1.5 second attempted movement period (at which time the participant was presented with text reporting which finger's movement was decoded). This is a very different discrete-and-delayed paradigm from the continuous control used in prior NHP BCI studies that showed motor learning (e.g., Sadtler et al 2014 and follow-ups; Vyas et al 2018 and follow-up; Ganguly & Carmena 2009 and follow-ups). It is possible that having continuous visual feedback about the BCI effector is more similar to the natural motor system (where there is consistent visual, as well as proprioceptive and somatosensory feedback about movements), and thus better engages motor adaptation/learning mechanisms.

      We agree that different BCI paradigms could better engage motor adaptation and learning, although it is interesting that participant NS did not improve her performance simply by attempting “natural” finger movements. To better caveat our findings, we have revised our manuscript as suggested.

      Revised text in the Discussion: “The stability of finger representations here suggests that BCIs can benefit from the pre-existing, natural repertoire (Hwang et al., 2013), although learning can play an important role under different experimental constraints. In our study, the participant received only a delayed, discrete feedback signal after classification (Figure 1a). Because we were interested in understanding participant NS’s natural finger representation, we did not artificially perturb the BCI mapping. When given continuous feedback, however, participants in previous BCI studies could learn to adapt to within-manifold perturbations to the BCI mapping (Ganguly and Carmena, 2009; Sadtler et al., 2014; Sakellaridi et al., 2019; Vyas et al., 2018). BCI users can even slowly learn to generate off-manifold neural activity patterns when the BCI decoder perturbations were incremental (Oby et al., 2019). Notably, learning was inconsistent when perturbations were sudden, indicating that learning is sensitive to specific training procedures. So far, most BCI learning studies have focused on two-dimensional cursor control. To further understand how much finger representations can be actively modified, future studies could benefit from perturbations (Kieliba et al., 2021; Oby et al., 2019), continuous neurofeedback (Ganguly and Carmena, 2009; Oby et al., 2019; Vyas et al., 2018), and additional participants.”

      Overall the study contributes to the state of knowledge about human PPC and its neurophysiology even years after injury when a person attempts movements. The methods are sound, but are unlikely (in this reviewer's view) to be widely adopted by the community. Two specific contributions of this study are 1) that it provides an additional data point that motor representations are stable after injury, lowering the risk of BCI strategies based on PPC recording; and 2) that it starts the conversation about how to make deeper comparisons between able-bodied neural dynamics and those of people unable to make overt movements.

    1. Author Response

      Reviewer #1 (Public Review):

      Okawa et al show that topical oral application of an agent used in SPECT imaging, hydroxymethylene diphosphonate (HMDP-DNV), displaces pre-existing nitrogen-containing bisphosphonate (N-BP) from the jawbone of mice and prevents the development of bisphosphonate-related osteonecrosis of the jaw (BRONJ), a devastating complication that rarely occurs after invasive dental procedures in N-BP treated patients. They further demonstrate pro-inflammatory genomic signaling in gingival cells of N-BP treated mice, which reverses with HMDP-DNV. The methods are well-described overall and the results are potentially important. However, limitations include the short study period and the lack of multiple time points. Additional data to address these limitations would help to strengthen the authors' conclusions. If these results are added, this work could have a high impact in the field and the data could set the stage for further testing. The significance lies in the unmet need for therapeutic options to prevent this complication, which is widely dreaded and impedes the use of often needed bisphosphonate therapy.

      We agree with this comment and performed an additional experiment to investigate long-term healing after HMDP-DNV treatment. In this experiment, the outcome of topical HMDP-DNV treatment of the tooth extraction wound in ZOL-injected mice was assessed 4 weeks after tooth extraction. Together with the reported effect of HMDP-DNV at 2 weeks after tooth extraction, this additional experiment provides (1) a longer study period and (2) more than one time point for treatment assessment.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Chaudhary et al assessed 143 children with AML, and out of 20 mitochondria-related DEGs that were chosen for validation, 16 were found to be significantly dysregulated. They show that upregulation of SDHC and CLIC1 and downregulation of SLC25A29 are independently predictive of lower survival, which was included in developing a prognostic risk score. They also show that this risk score model is independently predictive of survival better than ELN risk categorization, and high-risk patients had significantly inferior OS and event-free survival. The authors demonstrated that high-risk patients are associated with poor-risk cytogenetics, ELN intermediate/poor risk group, absence of RUNX1-RUNX1T1, and not attaining remission (p=0.016). The risk score also predicted survival in the TCGA dataset. They concluded that they have "identified and validated mitochondria-related DEGs with prognostic impact in pediatric AML and also developed a novel 3-gene based externally validated gene signature predictive of survival."

      Although this paper is interesting, it lacks novelty and does not advance the field significantly. The authors have used a similar approach in their recent paper in Mitochondrion where they showed that PGC1A driven increased mitochondrial DNA copy number predicts outcome in pediatric AML patients. Additionally, the authors have a small number of patients and chose only 20 genes for their analysis.

      We appreciate that the reviewer found our paper interesting and read our recently published article in Mitochondrion.

      In our previous work, the key finding was the predictive impact of mitochondrial DNA copy number on patients’ survival outcome in pediatric AML. In the current paper, we therefore explored the heterogeneity associated with altered mitochondrial DNA copy number by identifying dysregulated genes in patients stratified by mitochondrial DNA copy number. We have identified several mitochondria-related genes as dysregulated in pediatric AML for the first time. Furthermore, by analysing the prognostic impact of these dysregulated genes, we developed a novel mitochondria-related gene signature that can predict the survival of pediatric patients with AML, and this gene signature model had predictive ability over and above ELN. Hence, we believe there is novelty in the current work.

      The initial transcriptomic analysis was done in a limited number of patients, which remains a limitation of the study and is mentioned in the Discussion (lines 313-315). However, the validation of selected genes by RT-PCR was carried out in a consecutively recruited cohort of pediatric AML patients over more than 3 years (143 patients in total between July 2016 and December 2019), with a median follow-up of 36 months.

      As the main aim of the study was to decipher and validate mitochondria-related DEGs, the 20 genes were chosen based on their mitochondrial localization as per the mitochondrial compartment score. We have further validated these 20 genes in the external cohort from the TCGA dataset (n=179), and the external validation adds to the strength of the study.

      Reviewer #2 (Public Review):

      The design of the study is good. One of the strengths/novelties of the paper is that the authors stratified patients into 3 groups based on mitochondrial DNA (mtDNA) copy number and performed RNA-seq, where previous studies have no/little information about mtDNA copy number. Out of 143 patients, they sequenced only 3, 4, and 5 patients from 3 groups, along with 3 controls. High heterogeneity among cancer patients is unlikely to be accounted for by 3-5 samples per group, thus having limited statistical power. The authors should discuss the limitations of the small sample size and possible outcomes. The authors validated their differentially expressed genes with TCGA LAML which are mostly adult patients. A correct comparison will be with pediatric AML from other larger studies that had not stratified patients based on mtDNA.

      We appreciate that the reviewer liked our study design. We agree that the initial transcriptomic sequencing was done in only a few of the 143 patients recruited for this study.

      The small sample size of the sequencing cohort is a limitation of the study considering the heterogeneity of AML (mentioned in the revised manuscript, Discussion lines 313-315).

      However, the validation of selected genes by RT-PCR was carried out in a consecutively recruited cohort of pediatric AML patients over more than 3 years (143 patients in total between July 2016 and December 2019), with a median follow-up of 36 months. Only validated genes were used for further analysis and for developing the risk score.

      We have validated the differentially expressed genes in the publicly available TCGA adult AML dataset, and we agree that a better comparison would be with a larger cohort of pediatric AML patients. However, we could not find a publicly available database of DEGs in pediatric AML patients that also reports clinical outcomes for analysing prognostic impact. Hence the TCGA dataset was chosen.

      External validation of our gene signature risk score further adds to the strength of the study, as the gene signature appears valid for both pediatric and adult AML patients (this is mentioned in the revised manuscript, Results lines 205-207).
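
      For readers unfamiliar with gene-signature risk scores, the general idea can be sketched as a weighted sum of expression values followed by a cutoff. This is only an illustration under stated assumptions: the three genes are those named in the reviewer's summary (SDHC, CLIC1, SLC25A29), the signs of the weights follow the reported directions of effect, but the weight magnitudes, the expression values, the patient labels, and the median cutoff are all hypothetical and are not taken from the manuscript.

```python
from statistics import median

# Hypothetical weights. Only the signs reflect the reported associations
# (higher SDHC/CLIC1 and lower SLC25A29 linked to lower survival);
# the magnitudes are invented for illustration.
WEIGHTS = {"SDHC": 1.0, "CLIC1": 1.0, "SLC25A29": -1.0}

def risk_score(expression):
    """Weighted sum of the expression values of the signature genes."""
    return sum(weight * expression[gene] for gene, weight in WEIGHTS.items())

def stratify(patients):
    """Label each patient 'high' or 'low' risk using the cohort median as cutoff.

    The median cutoff is an assumption made for this sketch.
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}

# Invented relative expression values for four hypothetical patients.
patients = {
    "p1": {"SDHC": 2.0, "CLIC1": 1.5, "SLC25A29": 0.5},
    "p2": {"SDHC": 0.5, "CLIC1": 0.8, "SLC25A29": 2.0},
    "p3": {"SDHC": 1.0, "CLIC1": 1.0, "SLC25A29": 1.0},
    "p4": {"SDHC": 1.8, "CLIC1": 2.2, "SLC25A29": 0.3},
}
groups = stratify(patients)
```

      In practice the weights would typically come from a fitted survival model, and the resulting groups would then be compared on overall and event-free survival.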

    1. Author Response

      Reviewer #1 (Public Review):

      My primary criticism of this paper is that it misses the opportunity to give some key details about the statistics of neural activity during 'ripples' rather than studying identified replay events. A secondary criticism is that they limit their analyses to neurons that have place fields in both environments. I think the activity of the other 3 categories of neurons (active in Track 1 only, active in Track 2 only, and not active in either track) are also of critical interest.

      We agree with the reviewer that it is important to demonstrate that the main observations are not due to a small subset of neurons or replay events. We have described above the inclusion of Figure 1-figure supplement 6, where the threshold for replay detection is made less stringent, and the ratio of significant replay events to candidate replay events is now reported in the manuscript. To address the concern that the analysis is limited to neurons with place fields on both tracks, we have added four more subpanels to Figure 1-figure supplement 6, where we perform our regression analysis on all spatially tuned (pyramidal) neurons (Figure 1-figure supplement 6E), neurons with place fields on only one track (track 1 and track 2 neurons will be in the upper right and lower left quadrants of the plot, respectively; Figure 1-figure supplement 6F), neurons with peak amplitude <1 Hz on each track (Figure 1-figure supplement 6G) and, finally, interneurons (Figure 1-figure supplement 6H). Consistent with our previous findings, we observe significant regressions for POST replay events for all spatially tuned neurons and for neurons with place fields on only one track. Conversely, neurons that were not active on either track and interneurons are not rate modulated by experience during replay.

      It is important to note that replay detection uses all spatially tuned cells, but the regression analysis is limited to cells active on both tracks in the main analysis. The reason for this is now explained in more detail in the revised manuscript (page 5):

      “It is important to note that a significant regression would be expected when analyzing neurons with a place field only on one track, as they are expected to participate in replay events of this track, while being silent during the replay of the other track. As such, our regression analysis only analyzed place cells active on both tracks and stable across the whole run (Figure 1-figure supplement 1B and see Methods).”

      Reviewer #2 (Public Review):

      This study by Tirole et al. addresses to what extent differences in firing rate that occurs during the awake experience of two different tracks are replayed during SWRs.

      In principle, this is a topic broadly relevant to our understanding of the circuit-level mechanisms and neural coding of memory, because it can provide insight into the ways in which experience is transformed into memory traces, and in particular, whether an entire coding modality (firing rate patterns) is available for replay. However, I didn't have an easy time situating this study in the context of the existing literature. When I first read the title, I expected this work was going to address the question of if there is replay of rate-remapped experiences, which is still an understudied topic (but see Takahashi, 2015) and would be important to examine. But once I realized that the two experiences here are actually more like global remapping, it was less clear to me what is novel here.

      My best guess about what's novel is that even though on the one hand, many studies have shown a distinguishable replay of two (or more) distinct experiences, e.g. different mazes like in Karlsson et al. 2009, different arms of a T-maze in Gupta et al. 2010, the overlapping central stem element of different trajectories in various mazes (Takahashi, 2015 and work from the Jadhav lab). On the other hand, there have been extremely detailed examinations of the contributions of firing rate changes (as distinct from temporal order or synchrony) as in Farooq et al. 2019. But perhaps the authors think that the intersection of those two kinds of work has not been studied, that is, how much do firing rate changes specifically contribute to the replay of two distinct experiences? In any case, regardless of whether I understood that correctly or not, the authors need to be more explicit in the introduction and discussion in contextualizing their work. I also suspect that the current findings are a direct logical consequence of putting together these well-established previous results; this would not mean the current work isn't a useful advance, but it would moderate the novelty and general interest.

      Beyond this overall question of how the work relates to the extant literature, I have a suggested modification to the data analysis. I think that the quality of the data and the care taken in the analyses were very high in general, so I do not have any major concerns, and the conclusions are very thoroughly supported. However, I wonder if there is a way to simplify some of the analyses and make them a bit more straightforward to interpret. As the authors have realized, there is potential for a circularity in the analysis, in the sense that to compare firing rate differences for two tracks between Track and Replay, Replay events first need to be assigned to one or the other (decoded) Track. But then any firing rate differences may be contributing to the output of the decoder, rendering the analysis circular. I understand the authors use various methods like the firing-rate-insensitive method in Figure 2 to deal with this crucial issue. But wouldn't a simpler way be to leave out the cell whose firing rates are being analyzed out of the decoding step so that the labeling of Replay events is independent of that cell? This seems an intuitive and rigorous way to address the central question the authors have. Is there some reason why that isn't done?

      We thank the reviewer for this feedback and agree that it is important to emphasize the novel contributions of the manuscript (as we see them) and to clarify them further where needed. The reviewer is correct that several studies have looked at rate remapping during reactivation. We have cited some of these, and have now updated our citations in the Introduction and Discussion based on the comments here. While we avoided directly criticizing any particular study in our earlier draft of the manuscript, these previous studies are generally affected by several issues: 1) Replay detection methods were sensitive to rate modulation, creating a circular argument for the existence of rate modulation in replay [our study thoroughly addresses this with several controls]. 2) Reactivations were analyzed rather than replay, which lacks the statistical rigor of sequence detection [we have focused on replay using a strict threshold for significance]. 3) Replay/reactivations were analyzed for a single environment, making it difficult to distinguish between rate modulation and changes in the overall excitability levels of neurons maintained over behavior and sleep [our study uses two tracks to avoid this potential issue]. 4) When multiple contexts were decoded, neurons that only fired in one context were not removed from the analysis, artificially “inflating” any observed rate modulation [we have circumvented this issue by only analyzing neurons with place fields in both environments].

      The suggestion to repeat the analysis and leave one neuron out for replay detection is excellent; however, we avoided this because of the required processing time. Our complete analysis takes more than a week to run, and repeating it for each possible “leave-one-out” combination would take significantly longer (this has to be done independently for each neuron). We used multiple controls (track rate shuffle, replay rate shuffle, and rank-order correlation; Figure 2, Figure 2-figure supplement 2) to eliminate any possibility that a neuron’s firing rate could influence replay detection. Specifically, for rank-order-correlation-based replay detection, each burst of spikes is treated as a single event (the median of the spike times in the burst), which directly circumvents the problem of firing rate biasing replay event selection.
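
      To make the last point concrete, the following minimal sketch shows why a rank-order score built on burst medians is insensitive to firing rate. It is not our analysis code: the function names, spike times, and place-field positions are invented for illustration, ties are assumed absent, at least two active cells are assumed, and the shuffle-based significance test is omitted.

```python
from statistics import median

def ranks(values):
    """Ranks 0..n-1 in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for tie-free data with at least two points."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def rank_order_score(event_spikes, field_peaks):
    """Correlate burst timing with place-field order for one candidate event.

    event_spikes: {cell: [spike times in s]}; field_peaks: {cell: peak position in cm}.
    Each cell contributes one time point (the median of its spikes), however
    many spikes it fired during the event.
    """
    cells = [cell for cell, spikes in event_spikes.items() if spikes]
    burst_times = [median(event_spikes[cell]) for cell in cells]
    positions = [field_peaks[cell] for cell in cells]
    return spearman(burst_times, positions)

# Invented place-field peaks and two candidate events with the same sequence:
# one with a single spike per cell, one where cells "a" and "c" fire more.
field_peaks = {"a": 10, "b": 50, "c": 90, "d": 130}
sparse_event = {"a": [0.010], "b": [0.020], "c": [0.035], "d": [0.050]}
dense_event = {
    "a": [0.008, 0.010, 0.012],
    "b": [0.020],
    "c": [0.030, 0.035, 0.040, 0.036, 0.034],
    "d": [0.050],
}

score_sparse = rank_order_score(sparse_event, field_peaks)
score_dense = rank_order_score(dense_event, field_peaks)
```

      Both events receive the same score because the extra spikes of cells "a" and "c" do not move their burst medians, so a cell's firing rate cannot by itself push an event over the detection threshold.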

    1. Author Response

      Reviewer 1

      In general, I consider that the manuscript reflects a huge effort in terms of work done and data collection, the manuscript is very well written, and it brings new knowledge in terms of cooperative breeding and its connection with group size in ostriches. My major concerns are about the title and introduction, which are in my opinion too broad and insufficiently detailed.

      In the introduction the scientific background that led to this research is lacking, and the manuscript would benefit from a more supported introduction, which makes it difficult to understand how far this study went comparatively to previous studies. The research work was well conducted, and adjusted to the study aims. However, it would benefit from including more details on the observational data collected by the authors.

      I think the research topic is interesting, and the study was well performed, but the manuscript would benefit from a more clear approach to the working hypothesis, expected results and background theories/hypotheses.

      We are very grateful for the positive and constructive feedback. The title and introduction have been revised according to the reviewer’s suggestions. We provide a more extensive introduction to the hypotheses being tested, which are now explicitly stated. The observational data we collected have been described in more detail and we integrate our observational and experimental data more thoroughly.

      In the evaluation summary, the reviewer highlights that we did not address some aspects of groups, such as relatedness and parentage. We have now added additional analyses to show these do not change the conclusions of our study (for details please see responses to reviewer 2 who raises similar concerns more extensively). These were not originally included in the manuscript as the aim of our study was to examine how group size and composition influence the average reproductive success for any given individual, irrespective of variation in relatedness and parentage within groups.

      Reviewer 2

      This work sets out to investigate experimentally the effect of differences in group size and group composition on reproductive behavior and success in ostrich groups. Direct field observations of the relationship between group composition/group size and reproductive success, do not allow for causal inference, as there may be several reasons why patterns may arise. For example, observing individuals having a higher reproductive success in larger groups than in smaller groups may not be a direct result of a larger group size per se, but it may be that higher quality individuals manage to establish themselves more often in larger groups. Hence, experimental manipulation of group size and group condition in natural contexts is important. 96 experimental groups of ostriches were established in fenced off areas in the Karoo in South Africa, varying the number of males (1 / 3) and the number of females (1 / 3 / 4 / 6) across groups. Groups were followed for almost a year, studying a period without parental care (eggs were removed and incubated in an incubator to measure reproductive success) and a period with parental care (eggs were left in the enclosures).

      In the latter case, behavioral observations were done to study nest incubation, and sexual conflict (interruptions of incubation). The study was done for seven years, and having such data on experimental manipulations in semi-wild conditions is very valuable. The combination of behavioral analysis, with careful tracking of the fate of eggs (by daily nest checks), the experimental nature, and measuring reproductive success make for a very complete analysis of the breeding ecology of this system and can serve as a blueprint for more of such work in the fields of cooperation, group living and breeding ecology.

      Some aspects, however, deserve more attention. First, at present, the origin and familiarity and possible relatedness among the group members of the experimentally composed groups is not discussed, and it may be that these factors play a role in shaping the results. Second, the reproductive measure used was the average number of chicks per sex, but it was not calculated at the individual level. No genetic analyses were done to establish which individuals were actually successful in terms of reproduction. Since individual level selection is likely very important in this system, the results of average reproductive success need to be interpreted with great care. Third, the study was done under semi-natural conditions, meaning that the effects of other factors possibly shaping the success of group size and group composition in the wild (e.g., possible nest predation) were weakened. Finally, a closer connection between the experimental results on optimal group size, and whether this can actually be found in the dataset on natural variation in group size and group composition can be explored.

      We are very grateful for the careful review of our work and positive feedback. The suggestions and comments have been extremely helpful in revising the manuscript and have led to the following changes:

      1) We have added details about the origin and familiarity of group members, together with extra analyses verifying that our results are not confounded by variation in within-group relatedness. The study population has a nine-generation pedigree allowing us to accurately estimate relatedness between individuals. In the design phase of the experiment, relatedness amongst individuals was kept low in accordance with data from natural populations, but there were related individuals of the same sex in some groups. We tested if the average relatedness within groups influenced the average number of chicks individuals produced and found no significant relationship (Supplementary file 1 – Tables S16 and S17).

      2) We have included genotyping analyses of 3227 offspring to verify that our non-genetic estimates of average reproductive success per sex (total chicks produced by groups / number of same sex individuals) accurately reflect measures obtained using genetic estimates of individual reproductive success. Genetic and non-genetic measures were highly correlated (R > 0.95). We have added these verification analyses to the manuscript. The text has also been edited to further clarify that our aim is to estimate the average reproductive benefits for any given individual of being in a group of a particular size, rather than examining differences in reproductive success between individuals within groups, for which genetic methods are required.
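
      As a small worked illustration of the non-genetic measure defined in point 2 (total chicks produced by a group divided by the number of same-sex individuals), the sketch below uses a hypothetical function name and made-up group numbers; it is not taken from the authors' analysis code.

```python
def average_success_per_sex(total_chicks, n_males, n_females):
    """Non-genetic estimate: group chick total divided by same-sex group members."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("a breeding group needs at least one male and one female")
    return {"male": total_chicks / n_males, "female": total_chicks / n_females}


# Hypothetical group: 3 males, 6 females, 18 chicks in total.
print(average_success_per_sex(18, 3, 6))
```

      The measure assigns every same-sex group member the same value, which is why it estimates the average benefit of being in a group of a given composition rather than differences between individuals within a group.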

      3) We have clarified the advantages and limitations of experimental studies. As reviewer 2 highlights, observational studies alone do not provide causal insight into the factors influencing group size, but as reviewer 1 indicates, experimental studies can lack ecological context. Consequently, both have their merits. Experimental manipulations of entire social groups are currently lacking on large vertebrate cooperative breeders, but can be used to estimate the costs and benefits of living in different group sizes that arise independently of ecological conditions. The results of such experimental studies can be used as a benchmark against which other data can be compared, such as observational data on wild groups subject to ecological pressures, including nest predation. The discrepancies between experimental and observational data can then be used to infer the relative importance of social versus ecological factors in shaping social groups.

      4) We have added a figure (Figure 1 - figure supplement 1) and extended the discussion to better connect our experimental data with our observations of natural variation in group size.

    1. Author Response

      Reviewer 1

      This manuscript attempts to explain the well-known difference in DNA mutation rates between father vs. mother (paternal mutation is 4 times higher than maternal mutation in humans). Although the mutation rate difference was believed to arise from the number of cell divisions (male germ cells undergo many more divisions compared to female germ cells), recent studies suggested that most mutations arise from DNA damage (which will be proportional to the absolute time) rather than DNA replication-induced mutations (which will be proportional to the number of cell divisions). The authors thus revisited the question as to why the paternal mutation rate is higher (if absolute time is more important than the number of cell divisions in causing mutations). They used 'taxonomic approaches' comparing paternal/maternal mutation rates of mammals, birds, and reptiles, correlating them to specifics of reproductive mode in these species. To measure paternal vs. maternal mutation rate, they compared the mutation rates of neutrally evolving DNA sequences between the X chromosome vs. autosomes, as well as the Z chromosome (utilizing the fact that the X chromosome will spend twice as many generations in females as in males, while autosomes spend equal time. Likewise, the Z chromosome will spend twice as much time in males as in females, while autosomes spend equal time).
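
      The logic of inferring α from sex chromosomes can be made concrete with the standard expectation that follows from the time each chromosome spends in each sex (X: 2/3 in females, 1/3 in males; Z: 2/3 in males, 1/3 in females; autosomes: 1/2 in each). The sketch below assumes neutral sites whose mutation rate differs between chromosomes only through exposure to sex; the function names are hypothetical, and it omits the regression-based corrections for genomic features that the authors describe.

```python
def x_to_autosome_ratio(alpha):
    """Expected X/autosome substitution-rate ratio for male-to-female ratio alpha."""
    return (2.0 / 3.0) * (2.0 + alpha) / (1.0 + alpha)


def z_to_autosome_ratio(alpha):
    """Expected Z/autosome substitution-rate ratio for male-to-female ratio alpha."""
    return (2.0 / 3.0) * (1.0 + 2.0 * alpha) / (1.0 + alpha)


def alpha_from_x(ratio):
    """Invert the X/autosome relation; defined for 2/3 < ratio <= 4/3."""
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


def alpha_from_z(ratio):
    """Invert the Z/autosome relation; defined for 2/3 <= ratio < 4/3."""
    return (3.0 * ratio - 2.0) / (4.0 - 3.0 * ratio)


# With no sex bias (alpha = 1) both ratios equal 1; a paternal bias lowers
# the X/autosome ratio and raises the Z/autosome ratio.
for a in (1.0, 2.0, 4.0):
    print(a, x_to_autosome_ratio(a), z_to_autosome_ratio(a))
```

      Under these assumptions the X/autosome ratio is bounded between 2/3 and 4/3, so small errors in the measured ratio translate into large errors in α when the paternal bias is strong.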

      They first confirm the paternal bias across a broad range of species (amniotes), eliminating many species-specific parameters (longevity, sex chromosome karyotype (XY vs. ZW), etc.) as contributors to the paternal bias. This implies that something common to males across these species causes paternal bias. They show that in mammals, the paternal bias correlates with generation time. They propose that the total mutation is determined by the combination of the mutation rate during early embryogenesis (when both male and female have the same mutation rate) and the later mutation rate when the two sexes exhibit different mutation rates. This model seems to explain why generation time correlates well with the extent of paternal bias in mammals. However, this does not explain at all why birds do not exhibit any correlation with generation time. The speculation on this feels rather weak (although there is nothing they can do about this. Fact is fact).

      The logic behind their analysis is well laid out and seems mostly sound. Their finding is of broad interest in the field.

      • I am confused by this statement (the last sentence in the result section): "If indeed the developmental window when both sexes have a similar mutation rate is short in birds then, under our model, generation times are expected to have little to no influence on α." Based on their model, if the early period when the mutation rates are similar between sexes is gone, intuitively it feels that generation time influences α even more. Am I missing something? (If the period with the same mutation rate is gone, then females and males are mutating at different rates the whole time.)

      We apologize for the lack of clarity, as we should have made clear that here we are assuming a fixed ratio of paternal to maternal generation times. Under that assumption, if female and male germ cells are accumulating mutations at a fixed rate over time, then for each sex, the number of mutations accumulated with time is a line that goes through the origin, and the ratio of the paternal-to-maternal slopes (α) will be constant regardless of the age of reproduction. In other words, if Me=0 in equation 1, then α would be constant for any fixed ratio Gm/Gf. We have revised this sentence to be clearer; lines 334-338 now read:

      If indeed the mutation rate in the two bird sexes differs from very early on in development (i.e., if term Me ≈ 0 in equation 1), then assuming a fixed ratio of paternal-to-maternal generation times, our model predicts the sex-averaged age of reproduction will have little to no influence on α.
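
      The argument can be checked numerically with a deliberately simplified two-phase form. Equation 1 is not reproduced in this response, so the expression below is an assumption for illustration only: each sex accumulates a shared early count Me plus a sex-specific rate multiplied by that sex's generation time. The parameter names and values are hypothetical.

```python
def alpha(m_early, mu_male, mu_female, g_female, gen_time_ratio):
    """Paternal-to-maternal mutation ratio under an assumed two-phase model.

    m_early:        mutations from the early phase shared by both sexes (Me)
    mu_male/female: per-year mutation rates after the sexes diverge
    g_female:       maternal generation time (Gf)
    gen_time_ratio: fixed ratio of paternal to maternal generation times (Gm/Gf)
    """
    g_male = gen_time_ratio * g_female
    paternal = m_early + mu_male * g_male
    maternal = m_early + mu_female * g_female
    return paternal / maternal


# Illustrative values only: the male rate is four times the female rate.
for g in (10.0, 20.0, 30.0):
    with_early_phase = alpha(5.0, 2.0, 0.5, g, 1.0)
    without_early_phase = alpha(0.0, 2.0, 0.5, g, 1.0)
    print(g, with_early_phase, without_early_phase)
```

      With Me = 0 the two lines pass through the origin and their ratio does not depend on generation time, whereas with Me > 0 the shared early mutations dilute the sex difference at short generation times, so α rises as generation time increases.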

      • The authors state that this paper provides a simple explanation as to why paternal biases arise without relying on the number of cell divisions. However, it seems to me that the entire paper relies on the recent findings that mutation arises based on absolute time (instead of cell division number), and the novelty in this paper is the idea of 'two-phase mutation rates' to explain the observed numbers of paternal bias in various species. Yet it fails to explain the mutation rate difference in birds. There is not enough speculation or explanation as to what determines different mutation rates in males of various species. Although the modeling seems to be sound and there is nothing that can be done experimentally, I felt somewhat unsatisfied at the end of the manuscript.

      We agree with the reviewer that our paper does not address why the ratio of paternal-to-maternal mutation rates is lower in birds than mammals, and had stated so explicitly (lines 358-360): “Another question raised by our findings is why, after sexual differentiation of the germline, mutation appears to be more paternally-biased in mammals (∼4:1) than in birds and snakes (∼2:1).”

      To try to gain more insight into this question, we are now analyzing mutations in a set of three generation pedigrees from birds and reptiles, which should allow us to obtain a direct estimate of α and characterize sex differences in the mutation spectra, which we can then compare to what is seen in mammals. While this analysis is beyond the scope of this manuscript, we now note how this question might be pursued (lines 360-362):

      In that regard, it will be of interest to collect pedigree data from these taxa, with which to compare mutation signatures to those typically seen in mammals.

      Reviewer 2

      The primary goal of this paper is to re-assess the cause for the excess of male over female germline mutations seen in many animals. By re-analyzing X (Z) and autosomal substitution rates across 42 species of mammals, birds, and snakes, and fitting a model that allows for a constant and equal-sex embryonic mutation rate, along with a mutation rate that increases with age, the authors show that there is no need to invoke the model that assumes mutation rate depends strictly on numbers of cell divisions.

      Strengths

      1. The paper challenges a dogma in evolutionary genomics, which states that males have a higher germline mutation rate than females. It establishes convincingly that the count of premeiotic mitotic divisions is NOT the primary driver of the excess male mutations, but instead, it is the intrinsic mutation rate in males (balance of DNA damage vs DNA repair) that accumulates over time.

      2. The authors establish a simple model where the number of mutations that accumulate each generation depends on the embryonic mutation rate (which is shown empirically to not differ between the sexes) and a post-maturity mutation rate, which has elevated male mutation (driven presumably by a shift in the balance between DNA damage and DNA repair). The model is very clear and intuitively described.

      3. The paper is extremely carefully thought-out, planned, and executed. Criteria for inclusion and exclusion of species in the phylogenetic work are clearly laid out. Similarly, decisions about filtering genomic regions (avoiding repeats, etc.) are well done and exhaustively documented. The standard of scholarship is very high - for example, the analysis of de novo mutation rates in mammals pulled in data from no fewer than 15 published studies.

      Weaknesses

      1. The method of estimating alpha relies on the assumption that the mutation process (and rates) are the same in autosomes and sex chromosomes. There is an attempt to control for GC content and replication timing, but it is easy to imagine other factors at play, including the inactivation of one X in females and the extensive differences in chromatin modifications, especially of the X, that differ in males vs. females. The case of the cat X chromosome, with its 50 Mb of recombination cold spot and corresponding oddly slow substitution rate, might be just one example of features in other species that cause other perturbations in the substitution rate of the X. This does not seriously erode confidence in the results, but there is more potential for intrinsic mutation rates of sex chromosomes and autosomes to differ than is suggested by the authors.

      We agree with the reviewer that despite our attempts, we do not control for all factors that distinguish X and autosomes beyond exposure to sex. We had written that “while our pipeline may not account for all the differences between autosomes and X (Z) chromosomes unrelated to sex differences in mutation, the qualitative patterns are reliable.” and have now included a sentence to make this limitation clearer (lines 165-167):

      “Nonetheless, it is unlikely that our regression model perfectly accounts for all the genomic features that differ between sex chromosomes and autosomes other than exposure to sex.”

      In turn, the assumption that mutation rates in X (Z) and autosomes differ only with regard to their exposure to sex (after accounting for base composition and other genomic features) is unproven; we now state this assumption explicitly in the Methods (lines 678-681). Nonetheless, it seems warranted by the high concordance of evolutionary- and pedigree-based estimates of alpha in humans, mice and cattle. With regard to the specific factors mentioned by the reviewer, excluding CpG sites has little effect on our qualitative conclusions for mammals (see Fig S1E), suggesting that DNA methylation differences between X and autosomes are not having a major influence on our findings. Moreover, X-inactivation in the germline of mammals (as distinct from the soma) is likely quite short-lived, given that it lasts around three days in early development of mice (Chuva de Sousa Lopes et al. 2008) and at most four weeks in humans (Guo et al. 2015). Thus, it is unlikely to be an important mutation rate modifier. We have now reworked three paragraphs in the main text to make the limitations above clearer (lines 127-175).

      2. The authors point out that the human mutations in spermatogonia are due to mutation signatures SBS5/40 (which are known not to be correlated with cell division rates). The work on the nonhuman species could be greatly extended with this mutation spectrum approach. For each species, one could ask: Are the mutation spectra of the embryonic mutations consistent between males and females? What about the mutation spectra for the post-puberty individuals? Is alpha consistent across mutation signatures? Does the GC bias correction impact these inferences?

      Unfortunately, there is not enough de novo data to address this question outside of humans. In turn, the analysis of substitution data is unreliable, because of the differential impact of repeated substitutions at a site and the effects of GC-biased gene conversion.

      3. While the data do not suggest reasons WHY males display a higher mutation rate, it is fair to ask whether the evolutionary drive for a higher mutation rate might shape the mechanism whereby it happens. There is a certain amount of speculation in the paper as it is, and it is done in a way that is often well supported by data after the fact. Speculation about why males have an elevated mutation rate would not erode the overall quality of the paper, and I would expect that many readers would be eager to see what the authors have to say on the subject.

      As we envisage it, along the lines of Lynch’s models for the evolution of germline mutation (Lynch 2010), there is likely selection to keep the mutation rate as low as possible, subject to the constraints of the need to replicate DNA, repair damage, etc. efficiently. Why the attainable lower limit would be higher in males than in females is unclear to us, both mechanistically and in terms of evolutionary selection pressures. As we now note in lines 353-355, a potential proximal cause is a greater effect of reactive oxygen species, a major source of DNA damage, in male germ cells than in oocytes (Smith et al. 2013; Rodríguez-Nuevo et al. 2022). Potential evolutionary causes are even less clear to us, but could be related to the greater competition among sperm vs. oocytes (added in lines 354-357).

      Another way to think about these results is as shifting the question somewhat, broadening it from the long-standing puzzle of the selection pressures shaping sex differences to asking what determines the relative mutation rates of different cell types, including oocytes and spermatogonia but also somatic cell types/tissues. We had previously written that “our results recast long standing questions about the source of sex bias in germline mutations as part of a larger puzzle about why certain cell types (here, spermatogonia versus oocytes) accrue more mutations than others.” We have revised the final paragraph of the Discussion to try to emphasize this point.

      Overall the paper achieves its intended goal of toppling the dogma that the excess male mutation rate is driven by number of rounds of cell division in spermatogenesis (compared to oogenesis).

    1. Author Response

      Reviewer #2 (Public Review):

      While several studies have now been performed using time-varying exposures, i.e. PMIDs 35532785, 33749377, 35484151, 35679339, MR studies with time-varying outcomes have been lagging behind. Richardson et al. leveraged time-varying effects of both adiposity and vitamin D levels to investigate the dependency of childhood and adulthood body size on vitamin D levels during childhood and adulthood. Hence, in this way, MR analyses are conducted to the same outcome, measured at different timepoints in the life course.

      Strengths:

      Usage of both time-varying exposures and outcomes: Through exploiting individual-level data on vitamin D status, the authors demonstrate that childhood body size, but not adult body size, has an effect on childhood vitamin D.

      In addition, both childhood and adulthood body size do affect vitamin D levels during adulthood, however the effect of childhood body size on vitamin D seems to be indirect, as its effect is attenuated when taking adulthood body size into account.

      Hence, these effects are time-specific and illustrate the feasibility of using time-dependent genetic effects in MR to separate effects from early and later life on an outcome measured at different timepoints in the life course.

      Limitations:

      While multivariable MR is an elegant tool to assess direct and indirect effects of an exposure on outcome, a recent preprint highlights poor performance of multivariable MR to study time-varying causal effects (https://www.medrxiv.org/content/10.1101/2022.03.16.22272492v1). A description of strengths and limitations of MR, and especially of the multivariable MR design is lacking in the Discussion section.

      Many thanks for this suggestion. The preprint mentioned by the reviewer (which has yet to be peer reviewed at the time of this response) in fact mentions the approach applied in our work where the authors claim that ‘multivariable Mendelian randomization is a legitimate tool for obtaining meaningful inferences in this case.’ That said, we agree that adding some further discussion regarding the strengths and weaknesses of our approach is warranted (page 7):

      ‘Findings from these endeavours should facilitate studies conducting techniques such as lifecourse MR, which can provide insight into the direct and indirect effects of modifiable early life exposures on disease outcomes by harnessing genetic estimates obtained from unprecedented sample sizes when conducted in a two-sample setting. That said, lifecourse MR requires careful examination of genetic instruments to ensure that they are capable of robustly separating the effects of an exposure at different timepoints over the lifecourse (Sanderson et al (2022, in press)).’

      Abstract: background and conclusions focus on the influence of vitamin D deficiency on disease risk, hence using vitamin D as an exposure variable. The current study investigates the effects of childhood and adulthood body size on childhood and adulthood vitamin D levels, hence using vitamin D as an outcome variable. I agree that differential time points at which exposures and outcomes are measured may impact conclusions drawn from MR studies, and that childhood/adulthood adiposity may have differential effects on vitamin D levels, during childhood or adulthood. However, as vitamin D is used as an outcome variable in this study, it is less clear how the findings impact the causal influence of vitamin D deficiency on disease risk.

      We agree with the reviewer that future work is necessary to investigate how the findings of this work may impact disease risk, although this is a substantial undertaking which is outside the scope of this short report. We have therefore added the following to page 6 of our Discussion to make the point that future work is necessary to investigate this:

      ‘Future studies are therefore warranted to disentangle the causal factors which influence disease risk from non-causal confounding factors through the triangulation of multiple lines of evidence including those identified by robustly conducted MR studies.’

      Following up on the previous remark, the authors state in the Discussion that adiposity may have acted as a confounding factor on the observed association between vitamin D and T1D (Hypponen et al. 2001). I agree that obesity/adiposity may act as a confounder in observational study designs assessing the effect of vitamin D on disease risk. This is supported by the causal effect of body size on vitamin D levels. However, as other factors beyond adiposity might also influence the observed association between vitamin D and T1D, and hence might explain the discrepancy between observational and MR studies, I would like to suggest rephrasing it more cautiously before jumping to the clinical implications.

      We have updated this section of the Discussion on page 6 to tone down the clinical implications of our findings by replacing the previous sentence with:

      ‘Evidence from this study therefore highlights the importance of developing a deeper understanding into the role that confounding factors, such as adiposity, may potentially have in distorting observational associations between vitamin D levels and disease risk’

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes a new software tool: smartScope, for automated screening of cryo-EM grids. SmartScope can also perform automated data collection on suitable grids, including using beam-image shifts and tilted stage geometries. SmartScope uses deep-learning approaches for the selection of squares and holes of interest. The description of the software given in the paper is very promising, and as the code has not yet been made available, I cannot comment on its modularity, ease of installation, or general usability.

      The convolutional neural networks for square and hole detection were trained on relatively few examples, and supposedly all from the same microscope. How easy would it be for users to re-train these detectors for their own purposes? Could a description of that be added to the paper/documentation?

      Training was done on a mix of images coming from Ceta and K2 detectors. We added more details about the nature of the training data in the Materials and Methods section. Users will be able to re-train the model using the code provided.

      The introduction makes the same point over multiple pages, and could probably be easily cut in half length-wise. This will force the authors to formulate more succinctly, and thereby more clearly. Hopefully, this would then eliminate wooly or incorrect statements like: "the beginning of each new project is fraught with uncertainty", "[The number of combinations] grows exponentially with the inclusion of each parameter" (it doesn't!), "would be an invaluable tool".

      We carefully drafted the introduction to appeal to the broad audience of eLife while emphasizing the significance of our work. We have edited the text to make it more concise without missing the important points while reducing its length by over one third.

      Also, the first half of the Abstract needs some rewriting. It focuses first on grid optimisation, which is not what smartScope is about. SmartScope is about grid screening. Just say that and save some lines in the Abstract too.

      While SmartScope is not a tool for grid optimization, it provides direct feedback on grid quality which is a critical component of cryoEM specimen optimization. To clarify this point, we edited the abstract to highlight the screening aspect of SmartScope and we shortened it from 197 to 151 words.

      Lines 257-261 describe some setup in serialEM. Perhaps because I am not familiar with that software myself, but I had no clue what those lines meant. Perhaps some example setup files could be provided as supplementary information?

      Since setup files are tailored to specific hardware combinations, a settings file itself would not be beneficial. However, we added a new supplementary table with examples of 2 tested microscope configurations. As with any software, we expect SmartScope to evolve as new users report bugs and request new features. We also hope that a community of open-source developers will help us move it forward. For that reason, users are encouraged to refer to the “live” table on the documentation website of SmartScope where additional hardware combinations will be posted as the software is tested on new systems.

      For the DNA polymerase data set: mention in the Results section how long the entire data collection (or 4.3k images) took. Also, the sharpened map in the validation file has a very weird distribution of greyscale values. Its inclusion of volume with varying greyscale is basically a step function, indicating that this is more or less a binary density map. I suspect that this is a result of the DeepEMhancer procedure. But given that the scattering potential of proteins is not binary, I wonder how such a map can be justified. Also, the FSC curve shown in the paper does not mention any masks, but the reported resolution of 3.4A is higher than the unmasked resolution calculated by the PDB: 3.7A. Why is the DeepEMhancer software used here? Is it hiding a slightly suboptimal map? As map quality is not what this paper is about, perhaps it would suffice to show the original map alone?

      Thank you for pointing out the need for more clarity regarding this point. DeepEMhancer seems to apply a more conservative “sharpening” in the lower resolution areas of the map, leading to more pleasing images. Hence, we used the corrected map for display purposes only. The raw map and half-maps are provided via EMDB. The FSC curve and overall resolution values reported in the paper were obtained using a shape mask produced using standard procedures implemented in CryoSparc. FSC curves and the local-resolution map (Resmap) were calculated using the half-maps produced during refinement prior to sharpening. We have now added the requested details to the figure legend. We also added the unmasked resolution in Table 1 together with information about data collection throughput.

    1. Author Response

      Reviewer 1

      In the Drosophila germline, most piRNA loci use a non-canonical mechanism to transcribe piRNA precursors in the presence of H3K9me3, which depends on an HP1a paralog called Rhino/HP1d that specifically binds piRNA loci. How does Rhino find the right loci to bind? The current model in the field posits that maternally deposited piRNAs provide a specificity cue for Rhino. Now, Baumgartner et al. from the Brennecke group describe a novel factor, the ZAD zinc-finger protein CG2678/Kipferl, that appears to provide another key specificity input to a subset of Rhino's chromatin binding, specifically in the differentiated female germline (but not in males or stem/progenitor cell types in the female germline). Using genetics, genomics, genome editing, microscopy and biochemical approaches, Baumgartner et al. propose that Kipferl binds a G-rich DNA motif and, in the presence of local H3K9me3, recruits and/or stabilizes the binding of Rhino to these loci and then converts them from transcriptionally inert heterochromatin to piRNA-producing loci. Overall, the text is well written, the figures are clear, and the data are of high quality. With some additional experiments and text edits, this work represents a significant contribution to the field and should attract readers working on piRNA, transposon, satellite DNA, zinc-finger proteins, HP1 and heterochromatin.

      Specific concerns

      1. The genetic hierarchy between Kipferl and Rhino requires further clarification. The authors seem to propose a simple model where Kipferl acts genetically upstream of Rhino. This simple hierarchy is at odds with several observations. First, the center of Kipferl binding generally has less Kipferl binding without Rhino (Fig 5D). In some cases, Kipferl binding is completely gone without Rhino (Fig 7E middle, bottom). The text describes the loss of Kipferl spreading without Rhino but should also mention this reduction/loss in Kipferl binding. The effect of rhino-/- on Kipferl's chromatin binding should be shown along with the wildtype level of Kipferl enrichment in Fig 5C for proper comparison. How should readers understand the effect of Rhino on Kipferl? What is the prominent Kipferl domain in rhino-/- in Fig 5B? Second, the broad binding of Kipferl is gone in rhino-/-; does this mean Kipferl requires Rhino to spread? Or could Rhino (that is recruited by maternally deposited Piwi/piRNA) recruit Kipferl to neighboring sites, which would look like a spreading phenomenon? Most importantly, the argument of Kipferl recruiting Rhino should be directly demonstrated by a sufficiency test in addition to the presented evidence of necessity. Could the authors tether Kipferl in H3K9me3-decorated regions to see if Rhino is recruited, and vice versa? Observations like 42AB in Fig 5E make one wonder if Rhino also recruits Kipferl, so their relationship is not simply Kipferl recruiting or acting upstream of Rhino, as described throughout this manuscript. Clarifying the relationship between Kipferl and Rhino is essential, as it is a central claim of the manuscript.

      The relationship between Kipferl and Rhino is indeed complex and we agree that the linear hierarchy as stated in the first submission is too simplistic. We therefore added a clear statement that loss of Rhino impairs spreading as well as stability/strength of the Kipferl-chromatin interaction (text relating to Figure 5, second paragraph, Figure 5C,D). We furthermore edited relevant passages in the text to clarify the points raised by this reviewer. We added an analysis of Rhino/Kipferl domains and the binding of Rhino or Kipferl in kipferl mutants and rhino mutants, respectively (panel 5C). This strengthens the conclusion that Rhino and Kipferl are co-dependent at many sites. Together with the previous analysis that focuses on Kipferl peaks in rhino mutants (now panel 5E), we conclude that Kipferl does bind many Rhino domains by itself, albeit at considerably lower levels and in less broad peaks. We do not know what the prominent Kipferl accumulation observed by immunofluorescence in rhino mutants corresponds to. The suggested tethering experiment is interesting, and we consider it part of a follow-up study that also aims at understanding the exact molecular and structural basis of the Rhino-Kipferl interaction, ideally in complex with DNA.

      2. DNA binding of Kipferl remains putative. Since the 4th zinc-finger is shown to impact Kipferl localization via interaction with Rhino, it remains formally possible that the first three zinc-fingers control Kipferl localization via protein-protein interaction rather than direct DNA binding. Unless direct biochemical evidence of Kipferl binding DNA is available, the DNA binding of Kipferl should be toned down and described in the text as putative and requiring further investigation.

      We agree that definitive statements addressing this question require biochemical or structural insight into Kipferl-DNA interactions. We therefore made text changes to reflect this throughout the text relating to Figures 5 and 6.

      3. The relative contribution of maternally deposited piRNAs versus Kipferl in recruiting Rhino is unaddressed. Prior work from multiple groups, including Mohn et al. 2014 Cell from the same group as this manuscript, suggested a role of maternally deposited piRNAs in determining a subset of H3K9me3 domains as Rhino binding sites. Is Kipferl or maternally deposited piRNA a better predictor of Rhino binding? This manuscript proposes that Kipferl binds a simple G-rich motif and, in the presence of H3K9me3, recruits Rhino. The readers are left wondering where maternally deposited piRNAs fit in the model of Rhino recruitment, which should be tested or discussed in the text, as maternally deposited piRNA was seen as the key determinant of Rhino binding before this work.

      At this point, we cannot firmly separate the role of maternal piRNAs (which would act early in embryogenesis) from a guiding function of Kipferl, whose function during early embryogenesis is unclear (e.g. we see strongly reduced levels of Kipferl in germline stem cells). The current data in the field, together with the new Kipferl findings, indicate that Rhino requires H3K9me2/3 and an additional specificity factor/determinant for stable chromatin binding in the differentiating female germline. While maternal piRNAs might be essential to provide locus-specific H3K9-methylation, Kipferl has the capacity to alter the Rhino profile considerably at sites where H3K9me2/3 co-occurs with Kipferl recruitment sites to chromatin (presumably DNA motifs). Together the two pathways might act in parallel to explain why certain transposon insertions are bound by Rhino, while others are not. We aimed to clarify our view on this important topic in the revised Discussion section (second paragraph).

      Reviewer 3

      In this manuscript, Baumgartner et al. investigated how cells control Rhino-specific deposition on only a subset of the H3K9me3 chromatin domains to specify piRNA source loci. They identified a previously unknown protein, Kipferl, which by interacting with the chromodomain of Rhino guides and stabilizes its specific recruitment to selected piRNA source loci. Kipferl would be preferentially recruited to guanine-rich DNA motifs. They show that in Kipferl mutant flies, Rhino's nuclear subcellular localization and chromatin occupancy change dramatically. Then, they dissect all the domains of the Kipferl protein and show that the Rhino- and DNA-binding activities can be separated and that the 4th ZnF of Kipferl is required to interact with Rhino.

      It is a very elegant genetic work (CRISPR-edited, rescue, KD, overexpression fly lines). In addition, the authors used a combination of yeast two-hybrid screen, ChIP, small-RNA-seq and imaging to dissect the function of this new protein. The data in this paper are compelling. Some conclusions could be more moderate. Even if the effect of Kipferl on 80F (Rhino binding, piRNA production) is very obvious, this study also clearly demonstrates that other protagonists are required for the specific binding of Rhino to other piRNA source loci (including 42AB and 38C).

      • Is Kipferl expressed early during oogenesis development? If Kipferl starts to be expressed only after the GSCs and cystoblast stage, Kipferl is probably not required to determine the specification of piRNA source loci identity but probably more for the maintenance of the specification. Could the authors discuss or comment on that?

      According to our image in Fig. 2D, Kipferl is very weakly expressed in GSCs and early cystoblasts. This is also supported by our unpublished observations on a cultured germline stem cell line (see above), where Kipferl is not detectable on chromatin by ChIP-seq. In these cells, Rhino has a remarkably different chromatin occupancy. Also in testes, where Kipferl is not expressed, a different Rhino pattern was observed (Aravin laboratory) despite males inheriting the same complement of maternally deposited piRNAs. Together these data are consistent with a model where Kipferl acts as a specificity factor at its binding sites. We agree, however, that several Rhino domains exist where Kipferl does not show pronounced binding without Rhino. At these sites, Kipferl might act as a stabilizer or maintenance factor for Rhino, as it is nevertheless required for stable Rhino binding. In agreement with findings from testes (Aravin lab), we argue that Rhino’s chromatin occupancy in ovaries is not stable across developmental stages and that it is respecified upon cystoblast differentiation, at least in part, by Kipferl. We also addressed this central point above (general comments #4 and #5).

      • To perform most of their ChIP-seq analysis, the authors have divided the genome into pericentromeric heterochromatin and euchromatin based on H3K9me3 ChIP-seq data performed on ovaries. With this classification the 42AB (2R:6,256,844-6,499,214) and the 38C (2L:20148259-20227581) piRNA clusters known to be heterochromatic fall in the euchromatic part of the genome. Was there a problem with the annotation?

      As stated in the text, clusters 42AB, 38C, and 80F were analyzed separately (as reference loci) and therefore were not included in either euchromatin or heterochromatin. The reviewer is correct that in the heatmaps, these clusters fall into the euchromatic compartment, as the classification into heterochromatin was not performed based on the presence or absence of the H3K9me3 chromatin mark at any given locus, but defined as inclusion in the continuous body of pericentromeric heterochromatin, which ends 400 kb upstream of 42AB and 2,000 kb downstream of 38C. We added a corresponding comment to the methods section (lines 945-946).

      • Some regions exist in euchromatin that are strongly enriched in Rhino, in Kipferl and in H3K9me3 but are not producing piRNA. Does this type of region exist in heterochromatin?

      In euchromatin, roughly 80% of all Rhino-bound 1kb-tiles produce less than 10 piRNAs per kb per 1 million sequenced miRNAs. In heterochromatin, this is the case for only 10% of Rhino-bound tiles. This difference is likely caused by the high density of transposon fragments within heterochromatin, which allow the initiation of piRNA production from Rhino/Moonshiner-dependent transcripts through triggering.
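
      The normalization quoted above (piRNAs per kb per 1 million sequenced miRNAs, with a cutoff of 10) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names, the tuple layout of the input, and the example read counts are hypothetical and chosen only to show the arithmetic.

```python
from typing import Iterable, Tuple


def pirna_density(pirna_reads: float, tile_length_bp: float, mirna_reads: float) -> float:
    """Return piRNAs per kb per 1 million sequenced miRNAs for one tile.

    All three inputs are hypothetical raw values: piRNA reads mapped to the
    tile, tile length in base pairs, and total miRNA reads in the library.
    """
    if tile_length_bp <= 0 or mirna_reads <= 0:
        raise ValueError("tile length and miRNA read count must be positive")
    tile_kb = tile_length_bp / 1000.0
    mirna_millions = mirna_reads / 1_000_000.0
    return pirna_reads / tile_kb / mirna_millions


def fraction_low_output(
    tiles: Iterable[Tuple[float, float]],
    mirna_reads: float,
    threshold: float = 10.0,
) -> float:
    """Fraction of tiles whose normalized piRNA density is below `threshold`.

    `tiles` is an iterable of (pirna_reads, tile_length_bp) pairs.
    """
    densities = [pirna_density(reads, length, mirna_reads) for reads, length in tiles]
    if not densities:
        raise ValueError("at least one tile is required")
    below = sum(1 for d in densities if d < threshold)
    return below / len(densities)


if __name__ == "__main__":
    # Four made-up 1-kb tiles in a library with 5 million miRNA reads.
    example_tiles = [(20, 1000), (50, 1000), (200, 1000), (5, 1000)]
    print(fraction_low_output(example_tiles, mirna_reads=5_000_000))
```

      With these made-up numbers, two of the four tiles fall below the cutoff; a tile sitting exactly at 10 is not counted as "less than 10".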

      • Kipferl has been identified to interact with Rhino by a yeast two-hybrid screen (Figure 2). A co-IP, which is the classical method for confirming the occurrence of this intracellular Rhino-Kipferl interaction, should be provided.

      See our response to main comment #1.

      • Rhino is known to homodimerize and it has been reported that this homodimerization is important for its binding to H3K9me3 (Yu et al, Cell Res 2015). It is surprising not to find Rhino among the interactors that were picked up from the screen. Do the authors have any explanations or at least comments on these results?

      We can only speculate as to why Rhino was not identified in the Y2H screen. We are able to detect the homodimerization of Rhino in dedicated yeast two-hybrid experiments in the lab, although the interaction was weak. One potential explanation is that dimerization of bait and prey is in competition with dimerization of bait and bait or prey and prey, reducing the efficiency of bait recruitment. For our yeast two-hybrid experiments in the lab we use the Gal-4 system, while the screen was based on the more stringent LexA system, for which the homodimerization of Rhino might be too weak to be detected.

      • In Kip mutants, the delocalization of Rhino to a very large structure at the nuclear periphery is a very clear phenotype (Figure 3). All the very elegant genetic controls are provided. This particular localization of Rhino is correlated with an increase in 1.688 Satellite expression and a colocalization of Rhino and the 1.688 RNAs in the nucleus. The authors propose that this increase is consistent with an elevated Rhino occupancy at 1.688 satellites. The authors should moderate their statements in the light of the results of ChIP experiments. Rhino is maintained on these loci in Kip mutants but an increase is not very clearly observed. Couldn't it be the RNA and not the DNA of this 1.688 region that traps Rhino? The same in situ experiment should be performed after an RNase treatment. The delocalization of Rhino is lost in the Kipferl, nxf3 double mutant flies. What is the chromosomal Rhino distribution in this context? Is the increase in nascent transcripts of 1.688 satellites lost?

      The suggestion that RNA might trap Rhino at Satellite loci is a very interesting point. We performed the suggested RNase treatment experiment in ovaries. This did lead to the disappearance of Rhino foci; however, this is the case for both wildtype and kipferl-depleted ovaries. While this indeed might point towards a role of RNA in stabilizing Rhino at chromatin, more in-depth experiments are required to clarify this.

      Regarding the Nxf3 point: we clarified this in the revised text (lines 293-296). In nxf3/kipf double mutants we still observe strongly increased RNA FISH signal for the Satellite transcripts of the 1.688 and Rsp families, which colocalize with GFP-Rhino (Fig. S3H). We therefore assume that Rhino is still associated with the same chromosomal regions as in the kipf mutant. Only the localization at the nuclear envelope is lost.

      • The level of some Rhino-dependent germline TE piRNAs is affected in Kipferl GLKD. Is there a direct correlation between TEs which lost piRNAs and those for which the level of transcripts increases (Diver, 3S18, Chimpo, HMS Beagle, flea, hobo)?

      We added a dedicated statement on this in the revised text: piRNAs antisense to the TEs that are upregulated at the RNA level are strongly reduced, but they are not the only TEs where piRNAs are decreased (lines 340-343).

      • Figure 5E, it seems that Kipferl binding is also dependent on Rhino. All the presented loci have much less binding of Kip in Rhino -/- (The scale for the 42AB locus should be the same between the Rhino -/- and the control MTD w-sh). In addition, the distribution of Rhino in the Kipferl-sh on the 42AB is maintained but seems to be different. Could the authors discuss these points?

      This point has been addressed above (main revision requests).

      • It is not clear why the authors focus only on Kipferl binding sites in a Rhino mutant in Figure 5D. Even if the authors mention in the text that "Kipferl binding sites in Rhino mutants ... often coincided with regions bound by Kipferl and Rhino in wildtype ovaries", the same analysis as in Figure 5D, centered on Kipferl peaks detected in ChIP experiments in the WT condition, should be added for the different genotypes.

      We addressed this in the new revised Figure 5 and the corresponding text.

      • There is a discrepancy between the results found in Figure 3A and Supp. Figure 3B. In the Rhino mutant the level of Kipferl protein does not seem to be affected, whereas in the Rhino GLKD there is a strong decrease of Kipferl protein. The authors do not address this point at all.

      See our comment to reviewer 1 above.

      • Comparing Figure 5E and Figure 6G, which both present the 80F piRNA cluster, one can draw different conclusions depending on the scale and the control line chosen to illustrate the results. In Figure 5E we can conclude that the level of Kipferl decreases on the 80F locus in Rhino (-/-) compared to the control MTD w-sh, whereas in Figure 6G we can conclude that the level of Kipferl is similar in Rhino (-/-) compared to the control w1118.

      We made a mistake with the axis label for the Kipferl ChIP in w1118 in Fig. 6F (former panel G), which goes up to 800 like for the Rhino ChIP. This has been fixed.

      • gypsy8 or RT1b are enriched in GRGG motifs and are also the ones that, among Rhino-independent Kipferl enrichment, are the most Rhino enriched. Are these 2 elements present in the 80F cluster? Are these two elements derepressed upon Kipferl GLKD? Where are these two elements in the figure presenting the change in TE transcript level upon Kipferl GLKD?

      Both TEs are indeed present in cluster 80F. However, Kipferl loss does not result in their derepression despite piRNA loss. Rt1b/a are also not significantly upregulated in the rhino KD, suggesting that only evolutionarily old copies exist that are not able to reactivate. Gypsy8 is slightly upregulated in a rhino KD, but not in kipf KD. This discrepancy might be due to the difference in developmental timing when the effect of Rhino or Kipferl depletion sets in. Neither element is known to react strongly to any perturbation of the piRNA pathway and they are mostly considered old and inactive.

    1. Author Response

      Reviewer 1

      This manuscript reports the cryo-EM structure of HOPS, a heterohexameric tether that participates in the fusion of late endosomes, autophagosomes, and AP-3 vesicles with lysosomes. HOPS has been characterized extensively through biochemical studies, which indicate that HOPS cooperates with SNAREs to facilitate membrane fusion. The authors conclude that HOPS is not a highly flexible structure as has been proposed, but instead has a stiff backbone to which the SNARE-binding Vps33 subunit is tightly anchored. Because the ends of HOPS bind to opposing membranes, the implication is that HOPS acts as a lever and membrane stressor, thereby amplifying the effects of SNARE assembly and catalyzing fusion.

      The structural biology analysis was based on an improved purification protocol and appears to be well done. An atomic-level structure is always valuable, and this contribution will undoubtedly guide further research involving HOPS. Initial steps in this direction are presented in the form of functional studies of structure-guided mutants.

      Structures are most useful when they help to define mechanisms, and the authors argue that the HOPS structure explains how HOPS catalyzes membrane fusion. The key conclusion is that the antiparallel association of the Vps11 and Vps18 subunits creates a rigid core for the complex, leaving flexible ends that bind the Ypt7 GTPase to anchor the two membranes. This model is inconsistent with earlier suggestions that HOPS bends to bring the two membranes together. Instead, the inferred rigidity of the HOPS core, combined with the central location of the SNARE-binding module, suggests that HOPS acts as a lever that exerts a force on the membranes to promote SNARE-driven membrane fusion.

      This interpretation is interesting and potentially exciting, but I question why the authors are certain that the Vps11-Vps18 core is truly rigid. Proteins can undergo all sorts of rearrangements. Is there evidence that Vps11 and Vps18 interact strongly and in a unique configuration? Portions of a protein that have a consistent structure in vitro might nevertheless rearrange during functional interactions in vivo. If there is any flexibility of the Vps11-Vps18 core, this property combined with the evident flexibility of the Ypt7-binding portions and the low affinity of Vps41 for Ypt7 would make HOPS anything but a rigid membrane stressor. If the authors wish to make a strong point about the functional implications of the HOPS structure, these points need to be addressed.

      Based on our data we conclude that the Vps11-Vps18 core represents a rigid structure. Our extensive 2D and 3D classifications, as well as the 3D variability analysis of cryo-EM data indicate no flexibility in this region of the complex (in contrast to the Vps41- and Vps39-termini of the particle), as illustrated in Figure 1 and Figure 2 - Supplemental Figure 3. Additionally, the highest resolution achieved in this region within the whole structure suggests the least flexibility of this region in comparison to other parts of the complex.

      To get a better idea, we mapped the interface between Vps11 and Vps18. The interface area between Vps11 and Vps18 is 1972 Å² according to the PDBePISA tool, which is large enough to form a strong interaction and is comparable with protein interfaces in other complexes with similar structural elements as in HOPS (e.g. Yang et al. Nat. Comm. 2021, Kschonsak et al. Nature 2022). To demonstrate this, we added a supplementary figure to Fig. 2 (Fig. 2-S4) addressing the interaction area between Vps11 and Vps18 and revised the manuscript text in line 111 with the words “…large interface area of 1972 Å² which provides a…”.

      Reviewer 3

      This is an exciting new cryoEM structure of the HOPS tethering complex, which is necessary for membrane fusion at the vacuole/lysosome in eukaryotic cells. Finally, we can visualize, at moderate resolution, the positioning of HOPS subunits with respect to each other, and predict how HOPS and its various binding partners, such as Rab GTPases and SNAREs, can interact and control fusion. A conceptual advance put forward by this structure seems to be a rigid central core of HOPS that may contribute to helping drive the efficiency of the SNARE-mediated fusion mechanism.

      As exciting as this new structure is, however, the study seems to fall a bit short of its promise to explain "why tethering complexes are an essential part of the membrane fusion machinery," or how HOPS "catalyzes fusion." As such, the title is also misleading with regard to HOPS being the "lysosomal membrane fusion machinery."

      Overall, the manuscript could benefit greatly, especially for a non-HOPS specialist reader, in providing more introduction and context to the complex and tethering/fusion mechanisms in general. Additionally, the examination of the structure, in light of decades of biochemistry and cell biology studies of HOPS (and homologous proteins that regulate fusion), seems superficial and suggests that deeper analyses may reveal additional insights and lead to a more detailed and impactful model for HOPS function. Moreover, are the insights gained here applicable to other tethering complexes, why or why not?

      We thank the Reviewer for her/his kind and helpful comments and have addressed the concerns below and in the revised manuscript.

    1. Author Response

      Reviewer 1

      The authors used a combination of biochemical assays and cryoEM to investigate the role of PME-1 in regulating PP2A, which revealed that PME-1 uses its unstructured loops to associate with the B-domain of the PP2A holoenzyme to regulate the function of the C-domain. This is high-quality work. This reviewer finds the later work involving p53 to be a helpful step in explaining the role the PME-1:PP2A interaction can have on important phosphorylation pathways, but I consider this aspect of the work to be very preliminary, especially given its rather minor effects. That said, the authors do not make claims that extend beyond the scope of the evidence they provide and thus I find the connection and discussion of PME-1, PP2A and p53 to be suitable on the whole.

      Response: We greatly appreciate the positive comments and the recognition of our work.

      Reviewer 2

      The manuscript by Li et al is well-written and contributes an elegant cryo-EM structure of the PP2A-B56 holoenzyme, providing key structural rationale for holoenzyme demethylation and the inhibition of PP2A holoenzyme activity. A strength of the manuscript is the complementation of the structural data with a comprehensive biochemical/functional characterization demonstrating a mechanism for an oncogenic function of PME-1 in the regulation (inhibition) of p53 phosphorylation via PP2A-B56 holoenzymes under basal and DDR conditions.

      Response: We greatly appreciate the recognition and positive comments of our work.

      Reviewer 3

      PME-1 catalyzes the removal of carboxyl methylation of the PP2A catalytic subunit and negatively regulates PP2A activity. Like the PP2A methyltransferase LCMT-1, PME-1 was previously thought to act only on the PP2A core enzyme. However, in this study, the authors show that PME-1 can interact with and demethylate different families of PP2A holoenzymes in vitro. They also report the cryo-EM structure of the PP2A-B56 holoenzyme in complex with PME-1. Their structure reveals that the substrate-mimicking motif of PME-1 binds to the substrate-binding pocket of the B56 subunit, which tethers PME-1 to PP2A, blocks substrate binding to PP2A, and promotes PME-1 activation and demethylation of the PP2A holoenzyme. Their further mutagenesis and functional analyses indicate that cellular PME-1 function in p53 signaling is mediated by PME-1 activity towards the PP2A-B56 holoenzyme. In summary, this study has provided significant insights into our understanding of PP2A regulation by PME-1, demonstrating that PME-1 demethylates not only the PP2A core enzyme but also the holoenzyme to control cellular PP2A homeostasis.

      Response: We greatly appreciate the recognition and the positive comments on our work.

    1. Author Response

      Reviewer 3

      The number of identified anti-phage defense systems is increasing. However, the general understanding of how phages can overcome such bacterial defense mechanisms is a black box. Srikant et al. apply an experimental evolution approach to identify mechanisms of how phages can overcome anti-phage defense systems. As a model system, the bacteriophage T4 and its host Escherichia coli are applied to understand genome dynamics resulting in the deactivation of phage-defensive toxin-antitoxin systems.

      Strengths: The application of a coevolutionary experimental design resulted in the discovery of a gene operon: dmd-tifA. Using immunoprecipitation experiments, the interaction of TifA with ToxN was demonstrated. This interaction results in the inactivation of ToxN, which enables the phage to overcome the anti-phage defense system ToxIN. The characterization of the genomes of T4 phages that overcome the phage-defensive ToxIN revealed that the T4 genome can undergo large genomic changes. As a driving force to manipulate the T4 phage genome, the authors identified recombination events between short homologous sequences that flank the dmd-tifA operon. The discovery of TifA is well supported by data. The authors prepared several mutant strains to start the functional characterization of TifA and can show that TifA is present in several T4-like phages.

      In addition, they describe T4 head protein IPIII as another antagonist of a so far unknown defense system.

      In summary, the application of a coevolutionary approach to discover anti-phage defense systems is a promising technique that might be helpful to study a variety of virus-host interactions and to predict phage evolution techniques.

      Weaknesses: The authors apply Illumina sequencing to characterize genome dynamics. This NGS method has the advantage of identifying point mutations in the genome. However, the identification of repetitive elements, especially their absolute quantification in the T4 genome, cannot be achieved using this method. Thus, the authors should combine Illumina sequencing with a long-read sequencing technology to characterize the genome of T4 in more detail.

      We think the combination of Illumina-based sequencing and PCR analyses presented is more than sufficient to arrive at the conclusions drawn about the repeats that emerge in our evolved T4 clones.

      To characterize the influence of TifA during infection, T4 phage mutants are generated using a CRISPR-Cas-based technique. The preparation of these phages is unclearly described in the methods section. The authors should describe in detail whether a b-gt deficient strain was applied to prepare the mutants. Information about the used primers and cloning schemes of the Cas9 plasmid would allow the community to repeat such experiments successfully.

      We have added details to the Methods section to clarify and expand on our mutagenesis approach.

      The discovery of TifA would benefit from additional data, e.g. structure-based predictions, that describe the protein-protein interaction TifA/ToxN in more detail.

      We were unable to predict the ToxN-TifA interaction interface using AlphaFold, and we are currently conducting follow-up work to characterize how TifA neutralizes ToxN.

      Several publications have described that antitoxins can arise rapidly during a phage attack. The authors should address that this concept has been described before as well by citing appropriate publications.

      We believe that we have already addressed this point sufficiently in the Introduction of the manuscript, in which we discuss (1) the emergence of phage-encoded pseudo-toxI repeats to overcome P. atrosepticum toxIN and (2) the presence of the naturally-occurring antitoxins Dmd and AdfA in T4 and T-even phages, respectively. We also discuss the similarities between TifA, Dmd, and AdfA in the discussion of the manuscript. To our knowledge, these are the only known examples of antitoxins arising during phage attack outside of TifA, but we are happy to include additional citations of which the reviewers are aware.

      The authors propose that accessory genomes of viruses reflect the integrated evolutionary history of the hosts they infected. However, the experimental data do not support such a claim.

      We disagree with the reviewer’s comment, as our evolution experiment demonstrates the plasticity of the T4 genome during adaptation to different hosts, as well as showing that the T4 accessory genome includes genes necessary for infection of some, but not all hosts. The proposal also comes as the last sentence of the Abstract and is framed not as a conclusion, but as a proposal based on the work done here, with the clear intention of providing a sense of how future work may build off our work.

    1. Author Response

      Reviewer 2

      Hansen et al. investigate the catalytic behavior of phosphatidylinositol phosphate kinases (PIPKs), a family of enzymes that generate the regulatory lipid phosphatidylinositol 4,5-bisphosphate (PIP2) of eukaryotic cells. In their previous studies the Authors showed the positive feed-back regulation of these enzymes by their reaction product, PIP2, using a clever methodology, namely the real-life fluorescent monitoring of the enzymatic activity in supported lipid bilayers. This time the Authors noted a substantial difference between the strength of dimerization of the type II (PI5P 4-kinases) and the type I (PI4P 5-kinases) enzymes, the latter exhibiting very weak dimerization in solution in contrast to the stable dimer formation of the former. Using supported membrane bilayers, the Authors showed that at low protein density the type I enzyme (they used PIP5KB) followed the behavior described previously, namely membrane interaction determined by the presence of PIP2 in the bilayer, and this behavior was the same for a mutant protein unable to dimerize. However, at increased protein concentration, the PIP5KB enzyme started to form dimers, which increased its time of membrane residence, still dependent on PIP2. Furthermore, the Authors showed that dimerization had a major impact on catalytic activity, multiplying the positive feed-back effect described for the monomeric form. Lastly, they demonstrated the impact of the enhanced feed-back regulation under competitive reaction conditions (in the simultaneous presence of a PIP2 5-phosphatase), showing that the previously described bistable reaction product pattern is highly dependent on dimerization, which also increases the stochastic nature of product bistability in a competitive reaction setting. The Authors discuss the potential impact of these findings on the regulation of the enzyme in the real cellular setting.

      Strengths:

      This is an important study revealing a new layer of complexity in the interfacial kinetic behavior of an enzyme family that is central to the regulation of multiple cellular functions. The simplified in vitro set up allowed the Authors to examine in very exact terms the impact of protein dimerization on reaction kinetics and complex behavior under competitive reaction conditions. The in vitro methods are creative, the experiments are well done with appropriate controls and together with the data analysis convincingly support the Authors' conclusions.

      Weaknesses:

      1. The Discussion misses opportunities to relate the present findings to specific published observations: For example, it has been reported that type II PIP4KC knockout cells display increased PIP5K activity, presumably because of the heterodimerization of the proteins with the PIP5Ks, thereby reducing their activity (PMID: 31091439). An additional recent study described the regulation of the PIP5K by phosphatidylserine and cholesterol-rich domains (PMID: 31402097). Both of these studies raise questions that can be easily addressed by the reagents and methods described in the present study. Even if these studies are saved for the future, discussion of those published studies would emphasize the importance of the current findings in the context of the questions raised by them.

      We thank the reviewer for mentioning these important articles. We extended our discussion to connect the finding of these articles to the PIP5K regulatory mechanism described in our manuscript. The following statements have been added to the discussion section:

      “Although new molecular mechanisms concerning PIP5K activation have been revealed through single molecule characterization of PIP5K in vitro, it remains challenging to interpret how dimerization, PI(4,5)P2 binding, and interactions with peripheral membrane proteins regulate membrane localization of PIP5K in vivo. Complicating our interpretation of cellular localization, PIP5K can also reportedly interact with phosphatidylserine and sterol lipids, which modulate lipid kinase activity (Nishimura et al. 2019).”

      “Left unregulated, the PIP5K positive feedback loop has the potential to generate excessively high concentrations of PI(4,5)P2 in cells, which would be detrimental to numerous signaling pathways that rely on cellular PIP lipid homeostasis. New evidence suggests that, in vivo and in vitro, PIP4K can attenuate PIP5K activity through the formation of a membrane bound heterokinase complex (Wang et al. 2019; Wills et al. 2022). Deciphering the molecular basis of PIP4K-PIP5K complex formation using single molecule in vitro measurements will be critical for determining both the mechanism of kinase inhibition and for generating separation-of-function mutants that perturb this regulatory mechanism.”

      1. As written, the paper is not always easily accessible to readers who are not experts in biophysical methodology and terminology. Some explanation may help general readers to follow the manuscript.

      In our revised manuscript, we include additional description in the results and discussion sections to more clearly explain how our experiments were executed and the rationale for our interpretations.

    1. Author Response

      Reviewer 1

      They adopted a comprehensive experimental and analytic approach to understand molecular and cellular mechanisms underlying tissue-specific responses against 3-CePs. They used two cell lines - BxPC-3 and HCT-15 - as example models for responsive and non-responsive cell lines, respectively. Although mutation rates didn’t differ by the drug treatment, they observed changes in cell cycle and expression of genes involved in DNA damage, repair and so on. Furthermore, they combined RNA-seq and ATAC-seq data and applied two approaches, pairwise and crosswise, to identify a number of gene groups that are altered in each cell line upon the drug treatment. Finally, they calculated enrichment of up/down genes in different cell lines, tumor types and samples to estimate potential responsiveness to the drug. This study is unique in its in-depth analysis of RNA-seq and ATAC-seq data to identify the genetic signature underlying drug treatment. This study has the potential to be applied to different drugs and cell lines.

      We thank the reviewer for the precise and kind summary of our work.

      However, several major concerns need to be resolved. First of all, the biological and clinical performance of 3-CePs is not clearly described. They referenced several papers but they seem to have focused on the chemical properties of the drug. Without proven activity of 3-CePs against cancers in vitro and in vivo, the rationale of the study would be compromised.

      We apologize for not being clear enough when introducing previous findings on the differential sensitivity of HCT-15 and BxPC-3 cancer cell lines to 3-CePs. In the revised manuscript, we now cite references on the preferential activity of these agents against the pancreatic cancer cell line in 2D and 3D in vitro cancer models (see lines 71-74, 128-129). These compounds have been selected to exemplify the use of the pipeline in drug discovery and early-stage drug development: indeed, only cellular data are available for these molecules, which have not yet been characterized in vivo. The pipeline itself offered a final perspective on directions to take for their further development, i.e. the most sensitive tumor types to target (PAAD, KIRC).

      Their RNA-seq analysis was focused on discovering differentially expressed genes between cell lines, time points, etc. Interestingly, they found that DNA damage and repair signal was specifically increased in HCT-15. But is this approach capable of finding signals that are constitutively expressed in different cell lines? In other words, what if the differential responsiveness to 3-CePs was already there even before the drug was introduced?

      We thank the reviewer for pointing out such a key concept. The premise for the developed approach is that factors determining the overall cellular sensitivity to a treatment must be determined by intrinsic characteristics of the cell line. For this reason, we built the sensitivity signature on basal transcriptome profiles, where we prioritized a subset of genes based on perturbational evidence (perturbation-informed basal signature).

      Beyond signature genes, we show in figure R1 (see above) the results of a GSEA analysis on the whole overlap (300 genes) between DE genes from the baseline comparison (BxPC-3 ctrl vs HCT-15 ctrl) and those from the 6 h M treatment comparison, in the sensitive cell line (BxPC-3 M 6 h vs BxPC-3 ctrl). Pathways like ribosome biogenesis, ROS metabolism, UPR also arise, attesting that genes activated in response to the treatment also have a constitutively different expression in unperturbed cells.

      Are there any overlapping signals between pairwise vs crosswise approaches?

      We thank the reviewer for this question. To make it easier for the reader to compare the output from the two types of integration and to intuitively grasp their functional overlap, we changed the visualization of the results from the pairwise approach (Figure 4 D). Indeed, some functional pathways, both new and already emerging from previous analysis, arise from both integrations. This overlap has now been directly discussed from the functional point of view in the main text (from line 348 and in the following crosswise integration paragraph).

      Genes used as input in both types of integration are DE or DAR-associated, so this means that many of the hits that we find having the same double regulation (pairwise) also appear in CoCena modules. Among them, only a few hits show both 1) the same double regulation in a specified comparison (as suggested by crosswise) and also 2) end up having a similar pattern of regulation across all conditions (contributing to the same CoCena module, one of the strengths of the crosswise integration). Indeed, while the pairwise integration checks a single comparison at a time, CoCena checks the pattern throughout conditions, providing a more holistic view of the gene regulation (e.g., one gene can have a different pattern across conditions at the transcriptional and chromatin level). This is due to the biological fact that RNA and chromatin regulation is not 1:1 (also, for instance, from a timing perspective).

      The major added value of the two approaches consists in their intrinsically different output information. Within a specific comparison, the pairwise integration detects genes consistently activated at the transcriptome and chromatin level. At this level, gene set enrichment can summarize the coherent functional role of this set of genes; we now report this extra information in figure 4 to provide a more granular description of the pairwise integration. Instead, CoCena analyzes the pattern throughout conditions, and clusters together genes and peaks that behave similarly. Functional annotation of genes behaving similarly can put together promoters and/or transcripts that together may orchestrate a specific process (as highlighted by GSEA on each module).
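
      As an illustration of the pairwise idea described above, the following minimal sketch keeps genes whose direction of change agrees between the transcriptome and chromatin layers within one comparison. It is not the authors' implementation: the function name `concordant_genes`, the dictionary inputs, and the gene labels are assumptions made for this example, and the inputs are assumed to be already restricted to DE genes and DAR-associated genes.

```python
def concordant_genes(rna_lfc, atac_lfc):
    """Split genes found in both layers by a shared direction of change.

    rna_lfc:  {gene: log2 fold change from RNA-seq}  (hypothetical input)
    atac_lfc: {gene: log2 fold change of the associated ATAC-seq region}
    Returns (up, down): genes increased in both layers, decreased in both.
    """
    up, down = [], []
    # Only genes measured in both layers can be compared.
    for gene in sorted(set(rna_lfc) & set(atac_lfc)):
        r, a = rna_lfc[gene], atac_lfc[gene]
        if r > 0 and a > 0:
            up.append(gene)
        elif r < 0 and a < 0:
            down.append(gene)
        # Genes with opposite signs are discordant and dropped.
    return up, down


if __name__ == "__main__":
    # Placeholder gene labels and values, for illustration only.
    rna = {"gene_1": 1.0, "gene_2": -2.0, "gene_3": 0.5}
    atac = {"gene_1": 0.4, "gene_2": -1.0, "gene_3": -0.2, "gene_4": 3.0}
    print(concordant_genes(rna, atac))
```

      A crosswise analysis differs in that it would compare patterns across all conditions rather than one comparison at a time, which this sketch does not attempt.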

      Probably a similar question with the above: is this methodology applicable to other drugs in addition to 3-CePs?

      To address this extremely important point, which we agree with the reviewer is key to proving the versatility of our approach, we further applied the pipeline to the prediction of cancer cell lines’ sensitivity to cisplatin, a thoroughly reported broad-acting chemotherapeutic also acting as a DNA damaging agent. Results strongly supported the broad applicability of our approach, which was able to predict sensitivity to this reference drug with extremely high accuracy.

      Reviewer 2

      Carraro et al. describe a framework to understand MoA and susceptibility of drug candidates by integrating RNA-seq and ATAC-seq information. More specifically, by collecting drug responses from high-sensitive and low-sensitive cell lines, the authors identified a key set of pathways with co-expression analysis, and further predicted sensitivity of different cancer cell lines.

      The authors provided a new bioinformatics pipeline to integrate multi-omics data (RNA-seq and ATAC-seq) in a drug response study. This approach increased detection power and identified additional key pathways that are associated with drug 3-CePs. This framework has the potential to be applied to the general drug discovery process.

      We thank the reviewer for the precise summary of our study.

      However, the current manuscript failed to describe the integration methodology in a clear and concise way. Without a full understanding of the methodology, it’s tough to evaluate the downstream results in an unbiased manner.

      We apologize for not having included sufficient details in describing the difference between CoCena and the other two horizontal and vertical approaches. As already discussed in the response to Reviewer 1, we now included a more detailed description not only in the Methods section (from line 894) but also in the main text (lines 393-400).

      In addition, the authors didn’t mention how much additional value this multi-omics approach provided compared to the single-omic data set, as multi-omics approaches are more expensive and labor-intensive.

      We thank the reviewer for this valuable point. To better support the claim for multi-omics approaches, we have extended the Introduction (lines 96-98), as successful integration of information derived from multiple omic layers usually strengthens the determination of the major observed cellular responses. Here, this information helps dissect and predict how perturbations (here by drugs) can affect the overall cellular dynamics and mechanisms underlying a certain level of sensitivity. We agree with the reviewer that current costs are still prohibitive for large scale use of multi-layer omics in many settings, mainly when it comes to clinical use or drug development. However, significantly less expensive technologies (90% cost reductions, lines 53-55) have recently been announced, which assures us that approaches such as those outlined here will be applicable to many more clinical questions in the near future. Further, we show evidence that some cellular responses to the drug-induced perturbation were only revealed by applying multilayer analysis, but not by a single omics layer, e.g. TGF beta and EMT signaling (see lines 456-459).

      Reviewer 3

      Carraro et al. utilize systems biology approaches to decode the mechanism of action of 3-chloropiperidines (a novel class of cancer therapeutics) in cancer cell lines and build a drug-sensitivity model from the data that they evaluate using samples from The Cancer Genome Atlas and cancer cell lines. The approach provides a framework for integrating transcriptomic and open-chromatin data to better understand the mechanism of action of drugs on cancer cell types. The authors’ approach is of sound design, is clearly explained, and is bolstered by validation via holdout sets and analysis in new cell lines, which lends the findings and approach credibility.

      The major strength of this approach is the depth of information provided by performing RNA-seq and ATAC-seq on cells treated with 3-CePs at various time points, and the authors’ utilization of this data to perform pairwise and crosswise analyses. Their approach identified gene modules that were indicative of why one cell type was more sensitive to a particular drug compared to another. The data was then used to build a sensitivity model which could be applied to samples from The Cancer Genome Atlas, and the authors evaluated their sensitivity predictions on a set of cancer cell lines which validated the predictions.

      We thank the reviewer for the accurate recapitulation of our work.

      The major drawback to this type of approach is that it relies on next-generation sequencing (somewhat costly) and requires intricate bioinformatics analyses. While I agree with the authors’ perspective that this approach can be applied to additional classes of drugs and cancer samples, I disagree with their view that it is efficient and versatile. However, for research teams with the means to perform both transcriptomic and open-chromatin studies, I think this integrated approach has promise for evaluating novel classes of drugs, particularly in cancer cell lines that are easy to manipulate in vitro.

      We thank the reviewer for this insightful comment. As with almost every technology, the early years are more difficult and at times adventurous. However, we have seen enormous improvements in robustness of the technology and significant cost reduction with more to come. Only recently, sequencing technologies have been introduced into the market with a further 90% cost reduction (as stated in lines 53-55). We are convinced that due to their increasing affordability and robustness, RNA-seq and ATAC-seq will be implemented routinely in clinical contexts. As a group working at the intersection of drug discovery and bioinformatics, we hope that our current work, accompanied by a fair and detailed sharing of our scripts, will give a head start to others in the field who are not (yet) so close to bioinformatics and computational biology and wish to run this type of analysis.

      While there are examples of similar frameworks being applied to drug development, this work will add to the body of literature utilizing an integrated systems biology approach for pairing drugs with specific tumor or cancer types and understanding their mechanism of action on an epigenetic level.

      We thank the reviewer for this very positive statement and the support for our approach and her/his interest in the described pipeline.

    1. Author Response

      Reviewer 1

      The manuscript provides a dataset of single-cell transcriptomics of several adult mice ovaries and performs computational analysis to determine the molecular signatures of the cells isolated.

      Strengths: - Provide data from different estrous stages and lactating. - Many markers are validated. - Several estrous cycle-specific biomarkers are revealed.

      We thank the reviewer for the positive assessment of our efforts to comprehensively validate cell and estrous-cycle specific biomarkers.

      Weaknesses: - It does not stratify or provide trajectories of the data according to the different estrous stages and lactation periods.

      We have now added stratification of data sources in figures 1B, 5A, and 5B.

      • Only single markers are validated, making it difficult to see the population.

      While we show representative RNAish of single markers for the identification of the cellular populations, we provide heatmaps with complete signatures in Figures 1C, 2B, 3B, 4B, dot plots in Figures 4D, 5D, 6A and Figure 2 – supplement 1D,E, feature plots in Figure 3 – supplement 1A,B, as well as fully referenced tables of validated markers in Supplementary Files 2, 3, and 4.

      • The population of peri-ovulatory GC could be better characterized.

      We now provide Oxtr as a more specific marker of peri-ovulatory GC (see figure 3C and figure 3 – supplement 1D), in addition to the validated markers presented in Supplementary File 4.

      • There is no mention of specific populations or states in the lactation sample.

      We now provide cluster composition by sample type in Figure 1B, as well as represent cell state differences in lactating samples in S3E, which are discussed in lines 196-197 and 272-275.

      • Monocle analysis could be made more robust.

      Since the graphical representation of lineage pseudotime trajectories of granulosa cells was counterintuitive, we have removed this analysis in favor of a more concise explanation of differentiation and cell states amongst the mural and cumulus granulosa cells and their response to LH in lines 378-382.

      • Specific populations of theca cells (interna and externa) are not named.

      Specific populations of theca are now shown in figures 2 and figure 2 – supplement 1 and more extensively described in the results (lines 230-244) and discussion (lines 397-417).

      • Differences between stroma 1 and stroma 2 are not found.

      After reanalyzing the data, we concluded that these two interstitial stromal cell clusters could not be differentiated by specific dichotomous markers. Nevertheless, we noted that the expression of Ectonucleotide Pyrophosphatase/Phosphodiesterase 2 (Enpp2) was specific and limited to one of the clusters. Given that this same cluster expressed markers such as Col1a1, Lum, and Loxl1, shown in the literature to be characteristic of fibroblasts, we named it fibroblast-like stroma. The second subcluster, Enpp2 negative, was shown to be enriched in expression of steroidogenic markers such as Cyp11a1 and Cyp17a1 and was named the steroidogenic stroma cluster. These results are now presented in figure 2 - supplement 1E,F.

      • OSE is only mentioned in the Discussion.

      The findings in OSE are now presented in Figure 4 and discussed in the results (lines 287-294) and discussion (lines 418-429).

      Reviewer 2

      This manuscript by Morris et al., entitled "A single cell atlas of the cycling murine ovary" presented an interesting dataset for understanding cellular and transcriptional dynamics during the estrous cycle in mice. By using single-cell RNA sequencing, the authors reported new marker genes for different cell types and validated some markers using in situ hybridization. However, I believe that the main problem of this paper lies in the interpretation of data.

      The major points include: 1. The authors used tSNE for visualization of the generated scRNA seq dataset, which, according to my knowledge, is outdated for scRNA seq data visualization as its reproducibility has become an issue. Which version of the Seurat package did the authors use? Information on the other software should also be included. Therefore, I suggest the authors reanalyze their dataset using an updated Seurat pipeline, and also reanalyze all of their data using UMAP.

      The data have been reanalyzed with UMAP instead of tSNE. R version 4.1.3 was used, as is now indicated in the Materials and Methods section.

      1. The authors aimed to explore the unrecognized complexity in the cellular subtypes and their cyclic expression states. For cellular heterogeneity, the authors reported detailed cell percentages of different cell types and validated the new markers using in situ hybridization, however, how do these cells change during the estrous cycle? According to the current manuscript (Figures 3A and 4A), it's hard to interpret such changes. The authors should emphasize this aspect of the description, one possible solution is to add a stack bar chart to show the proportions of different cells at different stages.

      We added a plot showing the composition of each cluster by stage of the estrous cycle in figure 1B, 5A, and 4C. The percentage of cells in each cluster has been added to each feature plot.
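
      The composition of each cluster by stage, as described in this response, could be tabulated along the following lines before plotting a stacked bar chart. This is a minimal sketch and not the authors' code: the function name `cluster_composition`, the `(cluster, stage)` input format, and the example labels are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cluster_composition(cells):
    """Percentage of cells from each stage within each cluster.

    cells: iterable of (cluster, stage) pairs, one per cell (hypothetical
    input format; in practice this would come from the cell metadata).
    Returns {cluster: {stage: percent_of_cluster}}.
    """
    counts = defaultdict(Counter)
    for cluster, stage in cells:
        counts[cluster][stage] += 1

    composition = {}
    for cluster, stage_counts in counts.items():
        total = sum(stage_counts.values())
        composition[cluster] = {
            stage: 100.0 * n / total for stage, n in stage_counts.items()
        }
    return composition


if __name__ == "__main__":
    # Placeholder cluster and stage labels, for illustration only.
    example = [
        ("granulosa", "estrus"),
        ("granulosa", "estrus"),
        ("granulosa", "diestrus"),
        ("granulosa", "proestrus"),
        ("theca", "estrus"),
    ]
    for cluster, stages in cluster_composition(example).items():
        print(cluster, stages)
```

      Normalizing within each cluster, rather than within each sample, answers the question of which stages contribute to a cluster; the reverse normalization would be needed to compare cluster sizes between stages with different numbers of sequenced cells.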

      1. For trajectory analysis, which is the root state? According to line 233 and Fig. S3B, the root state should be state 3 in Fig. S3A; is that right? The authors should clarify this. Also, adding a pie chart for each branch would be more informative for the readers.

      Given that the monocle pseudotime trajectories of granulosa cells were difficult to interpret, we have removed this analysis from the manuscript.

      1. In lines 235-237, the authors demonstrated that "corpus luteum clusters (CL1-3) and the periovulatory cluster were ordered on another branch suggesting this latter branching fate represents a continuum of differentiation states corresponding to luteinization of granulosa cells.", which doesn't sound convincing enough to me because there is significant overlap for cells in CL1 and CL3, not ordering. To provide a more convincing interpretation of the scRNA dataset, I suggest the authors perform RNA velocity analysis and incorporate RNA velocity vectors in the trajectory plots, which will greatly help to understand the scRNA dataset.

      Unfortunately, the inDrop pipeline was not compatible with RNA velocity analysis. Other pseudotime analyses gave similar representations of trajectories. We have therefore elected to remove pseudotime analysis from the manuscript.

      1. For Fig. 4A, I suggest the authors add a barplot to show the number of different cells for the different stages as for scRNA dataset, it's sometimes common that some cell types were covered by the other in the tSNE plot.

      We thank the reviewer for this suggestion; we have now added the percentage of clusters by sample type in every UMAP dimplot.

      1. For Fig. 4D, what is the expression level of these genes according to the scRNA seq dataset? A comparison of such information will increase data reliability.

      We thank the reviewer for this suggestion; we have added a dot plot representing the level of expression of each gene in the scRNAseq dataset beside the qPCR graphs (now Fig 5D and S5E).

      1. How did the authors identify the secreted markers using their scRNA seq dataset? An explanation should be added.

      We screened the significantly differentially expressed genes by estrous stage for proteins predicted to be secreted according to UniProt annotation (Table S5), and prioritized genes with the highest fold change. The text has been modified to include this information (lines 320-324).
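
      The screening step described in this response (filter significant genes, keep those annotated as secreted, rank by fold change) can be sketched as follows. This is an illustrative assumption rather than the authors' script: the function name `screen_secreted`, the record fields, the significance threshold `alpha`, and the `top_n` cutoff are all hypothetical, and the set of secreted genes is assumed to have been exported beforehand from an annotation resource such as UniProt.

```python
def screen_secreted(de_genes, secreted, alpha=0.05, top_n=4):
    """Rank significant, secreted-annotated genes by absolute fold change.

    de_genes: list of dicts with keys "gene", "log2fc", "padj"
              (hypothetical record layout).
    secreted: set of gene symbols annotated as secreted.
    alpha, top_n: illustrative defaults, not values from the manuscript.
    """
    hits = [
        g for g in de_genes
        if g["padj"] < alpha and g["gene"] in secreted
    ]
    # Largest change first, regardless of direction.
    hits.sort(key=lambda g: abs(g["log2fc"]), reverse=True)
    return [g["gene"] for g in hits[:top_n]]


if __name__ == "__main__":
    # Placeholder records, for illustration only.
    records = [
        {"gene": "gene_a", "log2fc": 2.5, "padj": 0.001},
        {"gene": "gene_b", "log2fc": -3.1, "padj": 0.010},
        {"gene": "gene_c", "log2fc": 4.0, "padj": 0.200},  # not significant
        {"gene": "gene_d", "log2fc": 1.2, "padj": 0.030},  # not secreted
    ]
    print(screen_secreted(records, secreted={"gene_a", "gene_b", "gene_c"}))
```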

      Reviewer 3

      The authors sought to identify transcriptional changes that occur in the various somatic cell populations of the adult mouse ovary during different reproductive states using single-cell RNA sequencing. The ovaries for the analysis were harvested from mice during the four stages of the normal estrus cycle (proestrus, estrus, metestrus and diestrus), from lactating or non-lactating 10 days postpartum mice, and from randomly cycling mice. They identified the major cell subtypes of the adult ovary but focused their analysis on the mesenchyme (stromal and theca) and granulosa cells. They identified novel markers for stromal, theca and granulosa cell subpopulations and validated these by RNA in situ hybridization. They used trajectory analysis to infer differentiation lineages within the stromal and granulosa cell subtypes. Finally, from their data set they identify four secreted factors that could serve as biomarkers for staging estrus cycle progression.

      Strengths - This is the first study to profile ovarian somatic gonad cells at different stages of the reproductive cycle.

      We thank the reviewer for the positive assessment of our efforts to profile ovarian gonadal somatic cells comprehensively across the estrous cycle.

      Weaknesses - Enthusiasm for the current manuscript is lessened because it does not employ state-of-the-art scRNA-seq analysis. For example, once general cell populations have been determined by clustering with all cells, it is best to individually re-cluster these cell populations to identify more refined and accurate subpopulations. The PC used for the initial clustering is very useful for distinguishing different general cell populations (e.g. mesenchyme vs. granulosa vs. endothelial) but may not be as useful for distinguishing biologically relevant subpopulations (e.g. stromal subpopulations). Finally, certain cell subpopulations were excluded from the trajectory analysis without justification - specifically, the mitotic and atretic granulosa cells - calling into question what conclusions can be drawn from this analysis.

      We have re-analyzed our dataset using the most up to date version of Seurat and R, including reclustering, and changed our dimensional reduction to UMAP. We have also removed pseudotime analysis from the manuscript, given the difficulties in interpreting and representing trajectories.

    1. Author Response

      Reviewer #2 (Public Review):

      In the report by Brandao A. et al the authors used a zebrafish adult tail fin regeneration model to elucidate the role of metabolic adaptation in cell fate transition and cell proliferation during regeneration with a focus on bone regeneration. Firstly, the authors used transgenic reporter bglap:GFP to label mature osteoblasts and co-immunostaining with a pre-osteoblast marker runx2 to show that within 6 hours post amputation, osteoblasts show signs of dedifferentiation giving rise to pre-osteoblasts that re-enter the cell cycle between 12 - 24 hpa. The authors then use evidence from gene expression changes, metabolomic analysis, pharmacological perturbation experiments, cell proliferation analysis and histological detection of lineage markers to demonstrate that an immediate metabolic switch from OXPHOS to glycolysis precedes blastema formation in amputated tail fin stump. Importantly, blocking glycolysis with 2-DG suppressed mature osteoblast dedifferentiation and proliferation as well as blastema formation which resulted in failure of tail fin regeneration. In summary, this study has shown that a rapid metabolic switch from OXPHOS to glycolysis immediately after tissue damage is important for subsequent bone regeneration, and more specifically the authors provide evidence to show that glycolysis is required for both dedifferentiation and for cell proliferation, both processes are crucial for appropriate blastema formation.

      This study established that metabolic switch is an early response to tissue damage and metabolic adaption is key for cellular responses during bone regeneration. Conclusions of this study are well supported by data provided. There are some details of data mentioned in the text should be clarified.

      The authors used Microarray transcriptome analysis to demonstrate dynamic gene expression response in 6hpa OBs compared to 0hpa OBs and stated in page 6 line 154 - 159 that at 6hpa, OBs undergo dramatic gene expression changes with 2200 differentially expressed genes, and a set of genes related to energy metabolism was also dramatically altered. However, in supplement figure 1 there was no mention of which genes related to energy metabolism are altered, there is no real data as to what kind of gene expression changes are happening in 6hpa OBs, because DE gene list is not provided. Do the gene expression changes reflect partial dedifferentiation of OBs? The authors should provide more details of their microarray analysis, or at least provide the DE gene list.

      We are thankful for the reviewer's positive and constructive comments regarding our experimental work and for the feedback on improving the manuscript. We would like to point out that the data in Fig 2B relate to major glycolytic enzymes and OXPHOS components retrieved from the Osteoblast ArrayXS and that, as mentioned in the Methods section (lines 714-715), we submitted the transcriptome datasets analysed in this study to the NCBI Gene Expression Omnibus archive (accession number GSE194385).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by Kralt et al. provides insight into the enigmatic process of nuclear pore complex (NPC) biogenesis. By taking advantage of their recent development of an isotope labeling/affinity purification/quantitative mass spectrometry pipeline called KARMA, the authors convincingly demonstrate that Brl1, a double pass transmembrane protein, associates with early NPC assembly intermediates but not mature NPCs. A combination of auxin-inducible degradation of Brl1 coupled with an extensive analysis of nup localization (including the use of RITE technology) and cryo-electron tomography provides compelling evidence of the importance of Brl1 in NPC biogenesis at a step that correlates with inner and outer nuclear membrane fusion. Overall, the strengths of the work are that the experiments are innovative, the data are of the highest quality, and the conclusions are on a solid footing.

      A potential weakness is that there is already convincing published data that implicates Brl1 as an NPC biogenesis factor. Although there is no doubt that the current work extends these findings to implicate a lumenal amphipathic helix as a key element of Brl1 function, there remains considerable uncertainty over the ultimate mechanism by which the amphipathic helix contributes to inner and outer nuclear membrane fusion.

      We thank the reviewer for the overall positive evaluation.

      We agree that several recent papers have pointed towards a role for Brl1 in NPC assembly. However, the literature was not conclusive on whether Brl1 directly acts in this process and did not provide any mechanistic insight into the function of Brl1. For example, it was suggested that Brl1 might play a role in nuclear transport [1,2] or affect NPC assembly by regulating lipid homeostasis [3,4]. In particular, the effect of Brl1 on lipid homeostasis, and whether Brl1 indirectly affects NPC assembly by regulating the lipid composition, remained controversial [3-5].

      In this respect, we would like to highlight that figures 1-3 provide, to our knowledge, the first direct evidence that Brl1 acts as an NPC assembly factor, and our results clearly go beyond previously published data. First, we identify Brl1 as an NPC-interacting factor in an unbiased analysis of a previously published MS dataset [6]. Second, we show that Brl1 has a preference for binding to young, premature NPCs, which means it binds to NPC assembly intermediates but is not part of the mature complex. This is evident from our metabolic labeling assays and also seen in vivo using the RITE approach. Finally, by in-depth characterization of the associated NPC assembly defects and localization of Brl1 we show that (i) the activity of Brl1 is required for membrane fusion during new NPC assembly and (ii) the lipid binding of its luminal AH is essential for this activity. We agree that we do not have any direct evidence that Brl1 displays fusogenic activity, but such a direct proof would likely require biochemical reconstitutions, which are outside the scope of this study. However, our data provide a solid ground for further work in this direction.

      Reviewer #2 (Public Review):

      In this study, Kralt et al. investigate the mechanisms of nuclear pore complex (NPC) biogenesis in budding yeast, which only relies on interphase NPC assembly. By combining metabolic labeling and microscopy, they show that Brl1, a nuclear envelope (NE) transmembrane protein previously reported to partake in NPC biogenesis, associates with early NPC assembly intermediates. They further report that Brl1 depletion triggers NPC biogenesis defects, as revealed by (i) the characterization of NPC species lacking a subset of nucleoporins in fluorescence microscopy and metabolic labeling assays, and (ii) the detection of NE abnormalities (i.e. herniations) by cryo-electron tomography. In search of the underlying mechanisms, they identify an essential Brl1 motif predicted to fold as an amphipathic helix (AH), which exhibits liposome-binding activity in vitro and supports NE targeting in vivo. Finally, they demonstrate that overexpression of an AH-deficient Brl1 version blocks NPC assembly at a stage likely preceding the fusion of the inner and outer nuclear membranes. Based on these observations, they suggest that Brl1 AH is required for the membrane fusion step in de novo pore biogenesis.

      Overall, the conclusions of the authors are supported by the large panel of high-resolution, quantitative data provided. This study provides an unprecedented characterization of Brl1 recruitment and function during the early steps of NPC maturation, although it was already reported that Brl1 contributes to pore assembly (e.g. Zhang et al., 2018). In this view, the involvement of an AH-containing factor in the fusion step represents the main conceptual advance here. Yet, although the featured results support a role for Brl1 AH in membrane fusion, they do not actually prove that Brl1 acts as a fusogen during nuclear pore formation. Additional characterization of Brl1 AH properties, in particular through in vitro experiments, will be required to understand the underlying mechanisms and their relationships with the other NE proteins proposed to contribute to this process (i.e. Brr6/Apq12).

      Of note, this work also validates the utilization of the KARMA workflow (metabolic labeling coupled to affinity purification and mass spectrometry), previously published by the same authors, for the characterization of NPC assembly factors. While this methodological framework could thereby prove useful to assess the biogenesis of multiprotein complexes, beyond NPCs, some potential limitations also emerge, as highlighted here by the necessity to control for post-lysis intermixing.

      We appreciate the very positive evaluation of our manuscript. We would like to highlight that the main conceptual advance of this study is not only the identification and characterization of the ahBrl1 but also the direct evidence and characterization of Brl1 as an NPC assembly factor (see also the general response to reviewer #1). Investigating in detail the mechanism of membrane fusion and the role of Brl1, Brr6 and Apq12 in this process will unquestionably be very interesting, but is beyond the scope of this study.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Stephan et al. use experimental data to test whether positive interactions between different crop species strengthen over time (generations) when these species are cultivated in association. Even if this has already been investigated in grassland species, we currently lack experimental data on such questions in crops, which makes the study original and with potentially important agronomic applications. To address the question, the authors designed two types of communities: monocultures and mixtures made from seeds collected on plants that evolved in the same community type in the previous two generations (i.e. mixtures from mixture seeds and monocultures from monoculture seeds), and monocultures and mixtures made from seeds collected on plants that evolved in a different community type in the previous two generations (i.e. mixtures from monoculture seeds and monocultures from mixture seeds). They then used multiple sets of indexes to characterize the magnitude and direction of plant-plant interactions in order to compare communities with different evolutionary histories. Interestingly, the experiment is also replicated across two fertilization treatments. At the individual plant level, the results suggest that facilitation increases and competition decreases for plants grown in the same community type as their progenitors compared to plants grown in a different community type as their progenitors. Community-level analysis, however, shows a different picture: both the total yield of the communities and the relative yield of the mixtures are not affected by the coexistence history of their parents, and the results do not provide evidence for increased complementarity between species that have evolved in mixtures in the previous generations. 
      Finally, the authors show that several aboveground traits have lower variation in communities that evolved in the same community type in the previous generations compared to communities that evolved in a different community type, which shows phenotypic convergence rather than the expected phenotypic divergence in mixtures. They also report differences in trait means, with for example lower mean leaf dry matter content in communities composed of offspring of plants that were grown in the same community type compared to communities composed of offspring of plants that were grown in a different community type.

      The study is based on original and high-quality experimental data. The number of species, communities, and replicates is relevant regarding the research question. It is also very nice to have contrasted environments (fertilized vs unfertilized). All these combinations of factors have been replicated over three consecutive years, which also needs to be acknowledged as an impressive experimental effort.

      The statistical analysis is rigorous and successfully accounts for design features when testing the effects of interest. My main criticisms concern, by order of importance, the disconnection between the results and the claims of the paper, some weaknesses in the experimental design, and the clarity of the Materials and Methods.

      The main hypothesis of the study, which is that higher facilitation/complementarity should occur in mixtures made from plants that evolved in mixtures compared to mixtures made from plants that evolved in monoculture, is not supported by the results. However, the results are presented in such a way that it seems that this hypothesis is verified. For example, the first index which is used by the authors (Relative Interaction Index, RII) shows a significant effect of the treatment of interest (coexistence history). However, this index is not the most relevant to capture facilitation or complementarity effects in multi-species communities, notably because it is computed at the single plant level, and only with three plants per species. The Loreau & Hector partitioning (Net Biodiversity Effect, NBE, partitioned into a Selection Effect, SE, and a Complementarity Effect, CE), which is used second, is the gold standard in the field. This is acknowledged by the authors given that they check the validity of RII by measuring its correlation with CE (Fig. S13). Unfortunately, RII and NBE give very different results: the coexistence history of the species has no effect on NBE, and most notably no effect on the CE component. Yet, the authors claim that the coexistence history has an effect on NBE in the fertilized treatment, but we do not have any information supporting the statistical significance of this result (the p-value used to support this claim l. 103 is > 0.05). More generally, several non-significant results are discussed (e.g. l. 100 to 105, l. 136 to 144). The Figures in the main text are also misleading. They show the means (± standard error) in the different treatments but do not report statistical significance. Very often, the related boxplots in the Supplementary Information show much smaller differences between the treatments (e.g. Figure 2 vs Fig. S1, Figure 3 vs Figure S3), and the ANOVA tables confirm that these differences are not statistically significant. Overall, the fact that coexistence history has no effect on the total yield, on NBE, and most notably on the CE component, together with the fact that species' traits converge, notably towards taller plants, does not support reduced competition or higher facilitation in mixtures with a mixture history compared to mixtures with a monoculture history.

      The p-values are indicated in the main text alongside the figures in the Results section. Significance stars have now been added, which we hope makes this clearer. Furthermore, the discrepancy between RII results and NE/CE is now discussed more extensively in the discussion, and we refer to our responses to general recommendations (1) and (3) by the review editor above. In summary, in our interpretations we tried to stick to the results provided by the data and now provide explanations to reconcile the different results. We hope this helps to demonstrate that the different results we obtained are not contradictory.

      The amount of phenotypic and genetic variation within each species at the beginning of the experiment has not been controlled and reported in the study. It seems that inbred lines were chosen for some species (e.g., wheat, oat, or lentil) which means that there was no genetic variation for these species, whereas landraces or open-pollinated varieties were chosen for others (e.g., coriander or camelina). It thus means that the evolutionary potential of the different species was not the same. It would have been more rigorous to choose either only fixed genotypes for all species (inbred lines or hybrids), which would then have evolved under the sole effect of epigenetic changes, or only mixtures of genotypes for all species (either varietal mixtures or open-pollinated varieties), which would then have evolved under natural selection and changes in gene frequency.

      We recognize that the underlying mechanisms remain unclear, as indeed we did not measure the initial amount of standing variation and we could not find seeds or populations with the exact same genetic variation. This was not done because investigating the potential genetic mechanism was beyond the scope of the study. Therefore, we can only speculate regarding the potential mechanisms, and we have now added an extensive paragraph discussing the possibilities in the discussion (L305-323).

      Several aspects of the Materials and Methods could be clarified. It is not clear how the different plots and community types were re-allocated each year. This is important to interpret the results, as soil legacy effects could also affect the outcomes of plant-plant interaction. For example, were mixture plots with a "pure" mixture history grown in the same plot from one year to the other, or were plots reshuffled each year?

      Plots were reshuffled each year precisely to avoid soil legacy effects. This was clarified in the methods.

      Also, the sowing pattern of the 4-species mixtures is not explained. Was it also alternate rows, as the 2 species mixtures? Was the pattern the same across the different replicates and treatments for a given 4-species mixture?

      Since plots were sown with four lines of 50 cm length each, in 4-species mixtures each crop species occupied one line. We did not mix species within sowing lines. The pattern across the different replicates and treatments was randomized and is not necessarily the same for a given 4-species mixture. This has now been added to the methods.

      We do not have information on the sowing densities in mixtures plots (was it simply their monoculture densities divided by the number of species?).

      Yes, indeed. We kept the monoculture sowing densities in the mixture plots (i.e. if we planted 10 lentils per line in the monoculture, we also planted 10 lentils per line in the mixture). This was added in the method description.

      An important aspect of index computation is also not explained. For example, monoculture yield is used as a reference to compute Relative Yield (RY), and single plant yield is used as a reference to compute RII. There are several ways to compute these reference values, given that there are multiple replicates of monocultures and single plants for a given species. It can be either the value of the closest replicate in the experiment in the same treatment, the average value of all replicates in the same treatment, or a model-derived prediction which accounts for design effects (BLUE or BLUP). In this experiment, monocultures and single plants are also replicated across different evolutionary histories. So, we need to know which type of monoculture plots or single plant plots were used to compute RII and NBE.

      This was specified in the Methods and discussed in the response to the general comments. In short, we always used the average value over the replicates with the same treatment combination, e.g. a single plant with a monoculture history in unfertilised plots as a reference for monoculture history treatments in unfertilised plots.
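
      As an illustration only (not the code used in the study), the reference choice described above can be sketched as follows. The sketch assumes the commonly used form of the relative interaction index, RII = (B_w - B_o) / (B_w + B_o), where B_w is the performance of a plant growing with neighbours and B_o is the reference performance; the yield values and the names `single_plant_yield` and `reference_yield` are hypothetical.

```python
from statistics import mean


def rii(with_neighbours, reference):
    """Relative interaction index: (B_w - B_o) / (B_w + B_o), bounded in [-1, 1]."""
    total = with_neighbours + reference
    if total == 0:
        raise ValueError("RII is undefined when both values are zero")
    return (with_neighbours - reference) / total


# Hypothetical single-plant yields (g), keyed by (coexistence history, fertilisation).
single_plant_yield = {
    ("monoculture", "unfertilised"): [4.0, 5.0, 6.0],
    ("mixture", "unfertilised"): [3.0, 5.0],
}


def reference_yield(history, fertilisation):
    """Average over all single-plant replicates with the same treatment combination."""
    return mean(single_plant_yield[(history, fertilisation)])


# A plant with a monoculture history grown in a community in an unfertilised plot
# is compared with single plants that share the same history and fertilisation.
ref = reference_yield("monoculture", "unfertilised")
value = rii(7.5, ref)
print(round(value, 3))
```

      A positive value indicates that the plant performed better with neighbours than the matching single-plant reference, and a negative value indicates the opposite.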

      Reviewer #2 (Public Review):

      The paper offers a very novel experimental framework for assessing how coexistence history could influence intercropping success in agricultural systems. The authors do a very nice job combining the science from multiple fields into a coherent and useful framework. Based on this framework we should conclude that growing crops in polyculture fields for multiple generations will increase the benefits of intercropping for growing food.

      However, on the ecological side, there are some weaknesses that need to be addressed: 1. The introduction and discussion need more context for how co-occurrence can lead to more facilitation. I see how co-occurrence could lead to trait displacement and less niche overlap, so less competition. But what is the facilitation part of this? The introduction doesn’t introduce any potential mechanisms for this despite many indications that facilitation could also change as a result of coexistence history.

      We included a paragraph covering the evolution of facilitation in the introduction (L49-57).

      2. The authors should think carefully about their use of net effects, RII_facilitation, and RII_competition. It appears to me as though all three are measuring net effects but in some cases facilitation > competition and in other cases competition > facilitation. Even though that’s true, it doesn’t mean that the indices aren’t still measuring net effects. Given that, the authors should temper that language and consider reinterpreting some of their data.

      Indeed, as our RII measures net effects, for more clarity, we decided to skip this distinction between RII facilitation and RII competition – thereby also acknowledging concerns raised by reviewers 1 and 3. Furthermore, we rephrased our statements to make clear that what we measure is the outcome of plant-plant interactions and that we could not always distinguish between increased facilitation and/or reduced competition.

      3. The authors should also give careful consideration to the relative balance of inter vs. intraspecific competition. Many (if not all) of these trends could be indicative of stronger intraspecific competition than interspecific competition. This will need to be considered very carefully.

      We agree that increased complementarity with increasing diversity is per se due to stronger intraspecific competition than interspecific competition, and consequently to higher benefits from alleviating the most important source of competition. This is the underlying hypothesis of all BEF studies and actually the reason why we tackled BEF in this study by means of plant-plant interaction metrics. We clarified this in the introduction (L96-99).

      4. Have the authors considered separating their data into plots with and without legumes? The strong selection effects with co-occurrence history would also support this. Nitrogen enrichment is one of the most heavily studied facilitation mechanisms and thus this separation might help give insight into the mechanisms operating here.

      We tried to separate effects with and without legumes, but since all our 4-species mixtures necessarily included a legume, the plots without legumes were only monocultures and 2-species mixtures, and therefore it did not give a representative picture of the experimental design.

      5. Overall, a lack of clarity of underlying mechanisms is the greatest weakness of the paper.

      We recognize that the underlying mechanisms remain unclear, as indeed we did not measure the initial amount of standing variation and we could not find seed or populations with the exact same genetic variation. This was not done as investigating the potential genetic mechanism was not the goal of the study. Therefore, we can only speculate regarding the potential mechanisms, and we have now added an extensive paragraph discussing the possibilities in the discussion (L305-323).

      Reviewer #3 (Public Review):

      This work investigates the effects of growing annual crops for two generations in the same or a different social environment (coexistence history being single plant, monoculture, mixture of species) on measures of competition and yield. This is a very interesting and timely topic; diversification in agriculture is a promising means to help reduce the global decline of biodiversity. The experimental setup appears to be sound and the experiment is carefully executed (though this is not my area of expertise). The authors conclude that growing plants in the same community as their parents did reduces competition.

      However, I am not convinced by the interpretation of the results. Particularly the results for competition versus overall yield are in conflict. This discrepancy is not properly discussed and is largely ignored in the conclusions. Hence, I doubt whether the results support the conclusions.

      We would like to thank the reviewer for the constructive feedback and the appreciation of our work. Regarding the potentially conflicting findings mentioned by the reviewer here, we would like to refer to our previous statement in response to the review editor, in particular responses (1) and (3), where we show that these results are not necessarily conflicting.

      My most important comment relates to the discrepancy between results for total yield (Figure 3b) versus those for competition (Figure 2a) and for net biodiversity effect (Figure 3a). Results for all those measures are based on yield records. Figures 2a and 3a (panel fertilizer) show clearly that plants that have the same coexistence history as the tested plants outperform those having a different coexistence history. Figure 3b, however, shows no effect of coexistence history on yield; total yield for Same and Different do not differ. How to reconcile these results? Remarkably, this discrepancy is not discussed at all; the discussion largely ignores the absence of an effect on total yield.

      This discrepancy is now discussed in the discussion (L230-245), which was largely rephrased to take into account the comments. The discrepancy between the response of plant-plant interactions and the response of net biodiversity effects to coexistence history can have several causes. First, net biodiversity effects are driven both by complementarity and selection effects (6); therefore, a reduction in competition does not necessarily lead to an increase in net biodiversity effects, as this can be compensated by concurrent changes in selection effects. Changes in RII should however correlate with complementarity effects, which they do in our study (Fig. S11, p-value = 0.033), indicating that reduced competition and/or increased facilitation correlates with higher complementarity effects. As mentioned in the response to the general comments, the reference for RII (single plant) is not the same as for community-level measures such as NE (monoculture). This can also explain why coexistence history affects RII but not NE (see how we calculate this extra RII_monoculture in the main response). Finally, our RII calculations and net biodiversity effects also take into account the baseline effect of coexistence history on the reference plant or community (i.e. single plant for RII, monoculture for net effects). This allows us to explicitly distinguish the effects of coexistence history on the interactions, independently of the baseline effect on plant performance overall (35), and can explain why the effect of coexistence history on relative metrics (such as RII) does not appear in absolute metrics (such as total yield). We also suggest that the limited timeframe of this study – two generations – might be the reason for the lack of more significant changes in total yield.
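
      To make the first point concrete, the following minimal sketch (not the analysis code of the study) implements the standard additive partitioning, in which the net effect equals the sum over species of the deviation in relative yield times the monoculture yield, and splits into a complementarity term and a selection term. The yields are hypothetical, and equal expected relative yields (1/N) are an assumption of the sketch.

```python
def additive_partition(mono_yields, mix_yields):
    """Return (net, complementarity, selection) effects for one mixture.

    mono_yields: monoculture yield M_i of each species.
    mix_yields:  observed yield of each species in the mixture.
    Expected relative yield is assumed to be 1/N for every species.
    """
    n = len(mono_yields)
    delta_ry = [y / m - 1.0 / n for y, m in zip(mix_yields, mono_yields)]
    mean_dry = sum(delta_ry) / n
    mean_mono = sum(mono_yields) / n
    complementarity = n * mean_dry * mean_mono
    # N times the population covariance between delta RY and monoculture yield.
    selection = sum((d - mean_dry) * (m - mean_mono)
                    for d, m in zip(delta_ry, mono_yields))
    return complementarity + selection, complementarity, selection


# Hypothetical two-species mixture: the low-yielding species benefits most.
net, ce, se = additive_partition([100.0, 200.0], [60.0, 110.0])
print(net, ce, se)
```

      In this hypothetical case the complementarity effect is positive while the selection effect is negative, so the net effect is smaller than the complementarity effect alone.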

      Related to the previous comment, the title includes the phrase “reduces competition”. In the manuscript, competition is derived from effects on yield. Still, there is no benefit of the same coexistence history for total yield. This is somewhat misleading.

      See response to main comments.

      A second important comment relates to the absence of results from the second year. The Methods section explicitly states that the comparisons made in year 3 (as shown in Figure 2) were also made in the second year (2018; L350-358). However, no results are presented. Why are those results excluded?

      We collected only partial data after 1 year, as this was considered an intermediate stage, where adaptation was less likely to give significant results. Notably, we put less effort into collecting data at the individual level and reduced the number of traits measured, which prevented us from having a full picture of the response of plant-plant interactions as well as of the trait space. Therefore, we decided not to present these partial data in the study.

      A third comment relates to the distinction between competition and facilitation (Equations 3 and 4, and corresponding results), which is artificial and not very meaningful in my opinion. Since RII will never be precisely equal to zero (i.e., the RII=0 category is empty), an increase (decrease) in facilitation must go together with a decrease (increase) in competition, and vice versa. This must be the case since the total of both categories must add up to the number of comparisons made. In other words, if we have a total of N objects, being either apples or pears, then, if we have fewer apples, we must have more pears. (hence, L93-94 is a tautology). I suggest dropping this distinction from the manuscript.

      The decomposition between competition and facilitation was dropped (as already mentioned above).

      The Discussion seems to ignore some of the results that don’t seem to match the “desired” outcome. For example, L178 speaks about niche differentiation as if this was found, but it was not. Same for L200. Similarly, L181 speaks about “the yield benefit”, which was not there.

      We tried to reformulate the discussion and notably to emphasize that we find some clues for niche differentiation (light use, RII) but this did not consistently match other measures (traits, CE).

      While the manuscript is well written with respect to the language, it is not always easy to follow and absorb. This is partly because the number of traits is large. A table with the traits could help. Also, the writing could be improved to help the reader get the message. For example, when showing results in Figure 2, it could be mentioned from the start that these are relative to single plants, whereas those in Figure 3a are relative to monoculture. This can be found in the methods but should be clear from the Results as well.

      This was clarified and specified in both the main text of the results and in the figure legends.

    1. Author Response

      Reviewer 3

      This is work by an internationally recognized group with strong expertise in sophisticated single-molecule microscopy assays in cells. They present here a single-molecule fluorescence-based assay for proximity in the nanometer range.

      It has long been reported that cyanine dyes such as Cy3, Cy5 and derivatives such as AF555, AF647 can undergo a photoswitching mechanism by which the shorter wavelength dye when being excited can switch the longer wavelength dye which is in a dark state back into the bright state. And it has furthermore been reported that this switching mechanism is not based on FRET, as the distance requirement is more stringent (up to ~ 2 nm). However, this mechanism has not been fully explored for the investigation of molecular interactions yet.

      The authors in the present work present a similar mechanism for a different class of rhodamine-based fluorophores, specifically JF549 and JFX650. They describe the discovery of this mechanism in dual-color labeling of a pentameric protein and initial characterization to distinguish it from UV-light-mediated recovery from a pumped dark state as reported for (d)STORM-like measurements. They extend their observation to TMR, JF529 as lower wavelength "senders" and JF646 and JFX646 as longer wavelength "receivers" that can become reactivated into the ground state upon illumination of a nearby "sender". The authors then test activation pulse length and distance dependence and find that longer pulses lead to more recovery and that PAPA of JF549/JFX650 has, unlike what was previously observed for the Cy3/Cy5 pair, a smaller distance dependence than FRET of the same fluorophore pair. The authors then move on to use both the UV-light-mediated direct reactivation "DR" and proximity-assisted photoactivation "PAPA" to activate different molecules that are either double-labeled for PAPA or singly labeled with JFX650 for DR. They succeeded in four different cases to identify clear population shifts to distinguish molecules of different mobility.

      Overall, I think the authors made an interesting discovery and characterizing this previously poorly characterised interaction for cellular single-molecule experiments is certainly an important effort. The authors make an honest and quite complete effort to work out the practical details of this interaction and designed experiments that convincingly highlight the basic capabilities this technique offers to the detection of verified interactions and the mobility of interacting molecules in cells.

      The weakness is that these capabilities do not seem to be as clear-cut as the reviewer hoped for when starting to read this manuscript. It remains unclear to this reviewer to what extent PAPA molecules can be separated from DR molecules. In all but the last diffusion experiment(s) in Figure 4, PAPA molecules seem to be significantly perturbed by DR molecules, casting doubt on the usefulness in real experiments. Similarly, in Figure 5, a difference is seen but does not allow for quantification. This certainly is not the case for other methods of sensing as well, but maybe the authors could more specifically compare their efforts and the dynamic range to other sensors for example in Figure 5? This would make it easier for the reader to make up their mind if the assay is worthwhile adopting for their system.

      We agree that a problem with PAPA at present is that although PAPA trajectories are significantly enriched for double-labeled complexes, they are still “contaminated” with single-labeled molecules. As we described in the Discussion (and as pointed out by Reviewer 1), we think that one major contribution to this background arises from chance proximity of sender and receiver molecules independent of direct physical interaction. Additionally, some background is expected from continual spontaneous (a.k.a. “thermal”) reactivation of molecules from the dark state.

      In response to the reviewers’ comments, we have tried to quantify more precisely how much PAPA enriches for one population over another by fitting the diffusion spectra of 2-component mixtures to linear combinations of the corresponding individual components (Figure 4–figure supplement 4). We estimate that the fold enrichment of double-labeled molecules ranged from 3.7 to 37-fold between different 2-component mixtures.
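
      As a rough illustration of this kind of fit (not the code behind Figure 4–figure supplement 4), a two-component mixture spectrum can be modelled as w·A + (1 − w)·B and the weight w estimated by least squares. The spectra below are hypothetical, and the odds-ratio definition of enrichment at the end is an assumption of the sketch rather than the definition used in the paper.

```python
def fit_two_component_weight(mixture, comp_a, comp_b):
    """Least-squares weight w in mixture ~ w * comp_a + (1 - w) * comp_b, clipped to [0, 1]."""
    diff = [a - b for a, b in zip(comp_a, comp_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("components are identical; the weight is not identifiable")
    num = sum((m - b) * d for m, b, d in zip(mixture, comp_b, diff))
    return min(1.0, max(0.0, num / denom))


def odds(fraction):
    return fraction / (1.0 - fraction)


# Hypothetical diffusion spectra (probability per diffusion-coefficient bin).
slow_complex = [0.7, 0.2, 0.1]   # double-labeled component
fast_monomer = [0.1, 0.3, 0.6]   # single-labeled component
dr_spectrum = [0.25, 0.275, 0.475]
papa_spectrum = [0.55, 0.225, 0.225]

w_dr = fit_two_component_weight(dr_spectrum, slow_complex, fast_monomer)
w_papa = fit_two_component_weight(papa_spectrum, slow_complex, fast_monomer)

# One possible (assumed) definition of fold enrichment: ratio of odds.
fold_enrichment = odds(w_papa) / odds(w_dr)
print(w_dr, w_papa, fold_enrichment)
```

      With more than two components, the same idea generalizes to a constrained least-squares fit over all component weights.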

      We fully agree that it is critical that researchers who use PAPA be aware of its limitations, so that they do not fallaciously assume that all green-reactivated localizations are protein complexes. To avoid committing a bait-and-switch against our readers, we now state explicitly in the Introduction that PAPA in its current form enriches for complexes but does not provide perfect selectivity. In Appendix 2, we now discuss the problem of background reactivation in more detail and outline what we think will be required to correct quantitatively for this background. Though we believe that such corrections will ultimately be possible, at least in some cases, figuring out how to do this rigorously will require substantial additional development of experimental and computational methods, which we hope the editor and reviewers agree is beyond the scope of the current paper.

      At the end of Appendix 2, we briefly mention another technical problem that we have noticed with SNAP ligand background staining. While this background was negligible for the experiments described in this paper, which involved highly expressed SNAPf transgenes, it may pose a more significant problem for SNAPf-tagged proteins with lower expression levels. We think it is worth mentioning this problem to make readers aware of it and hopefully to motivate the development of better orthogonal pairs of self-labeling tags.

      While there are obviously limitations to PAPA, we think this should not overshadow the fact we have identified a novel photophysical property of commonly used fluorophores and harnessed it to detect molecular interactions in live cells. Our initial proof-of-concept study provides a foot in the door of this new biophysical approach, which we and others will continue to refine. Immediate applications of PAPA could include disambiguation of peak assignments in complex diffusion spectra, confirmation of proposed interactions between proteins (and subsequent investigations into the molecular mechanisms supporting such interactions), or integration into SPT-based high-throughput screening (https://www.eikontx.com/technology) to provide a useful additional readout for each experimental condition.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors show that the unmitigated generation interval of the original variant of SARS-CoV-2 is longer than originally thought. They argue that in the absence of interventions that limit transmission late in the course of infection, the fraction of transmission events that occur before symptom onset would be considerably lower, and the fraction of transmission events occurring 10 days or more after infection of the index case would be substantially higher.

      These findings improve our ability to accurately estimate the basic reproductive number (R0), to evaluate quarantine and isolation policies, and to model counterfactual intervention-free scenarios. Many applied analyses rely on accurate generation interval estimates. To head off confusion, it would be helpful if the authors could provide more comprehensive guidance about which applied analyses should be informed by the unmitigated generation interval, or the observed generation interval. (E.g. the unmitigated interval is useful for quarantine and isolation policies, but would we ever want to use the unmitigated interval to estimate R?).

      The unmitigated generation-interval should be used for estimation of the R0 of the initial epidemic phase, but not for the R(t) of the current epidemics. Estimation of R(t) must account for changes in generation-interval distributions caused by the invasion of new variants and changes in behavior. When analyzing policies of quarantine, isolation or contact tracing, the unmitigated interval should also be adopted to account for late transmissions.

      We added a few sentences at the end of our introduction to clarify this point:

      “The estimated unmitigated generation-interval distribution could be adopted for answering questions about quarantine and isolation policy, as well as for estimating the original R0 at the initial spread in China. However, estimation of instantaneous R(t) should account for changes in generation-interval distributions, reflecting mitigation effects and the current variant.”
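
      To illustrate why the choice of generation-interval distribution matters for R0, the sketch below uses the standard relationship between the exponential growth rate r and the reproduction number for a gamma-distributed generation interval, R0 = (1 + r·μ/k)^k, with mean μ and shape k. The gamma form and all numerical values are assumptions for illustration and are not estimates from our analysis.

```python
def r0_from_growth_rate(growth_rate, mean_gi, shape):
    """R0 implied by exponential growth rate r for a gamma generation interval.

    Uses R0 = 1 / M(-r), where M is the moment generating function of the
    generation-interval distribution; for a gamma distribution with mean
    `mean_gi` and shape `shape` this gives (1 + r * mean / shape) ** shape.
    """
    return (1.0 + growth_rate * mean_gi / shape) ** shape


r = 0.1  # hypothetical growth rate per day
r0_short = r0_from_growth_rate(r, mean_gi=5.0, shape=2.0)   # shorter interval
r0_long = r0_from_growth_rate(r, mean_gi=10.0, shape=2.0)   # longer interval
print(r0_short, r0_long)
```

      For the same observed growth rate, a longer mean generation interval implies a higher R0, which is why using a mitigated (shortened) distribution would underestimate the original R0.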

      The analysis estimates a longer generation interval after accounting for three main sources of bias or error that are common in other analyses: 1. Recently infected individuals are intrinsically overrepresented in data on a growing epidemic. Thus, shorter incubation periods and forward serial intervals are more likely to be observed, even in the absence of interventions. This analysis adjusts for these dynamical biases. 2. Interventions or behavioral changes can prevent transmission late in the course of infection. This can shorten the generation and serial intervals over the course of an epidemic. This analysis focuses specifically on transmission pairs observed very early, before the adoption of interventions. 3. The incubation period and generation interval should be correlated - infectors that progress relatively quickly to symptoms should also become infectious sooner (symptom onset occurs near the peak of viral titers). Most existing analyses assume these intervals are uncorrelated, but this analysis accounts for their correlation.

      Overall, the conclusions seem reasonable and well-supported. The observation that the generation interval decreases over the course of an epidemic is also consistent with existing studies that show the serial interval has similarly decreased over time. But given the implications of the findings, I hope the authors can address a few questions about potential additional sources of bias:

      1. It is somewhat reassuring that the generation interval decreases relatively smoothly as the cutoff date increases (Fig. S6), but it would be helpful if the authors address the potential impact of ascertainment biases. One of the main reasons that the authors estimate a shorter generation interval is that they define January 17th, early in the outbreak before interventions and behavioral changes had taken place, as the cutoff point for the infector's date of symptom onset. This cutoff eliminates biases from interventions, but it also severely limits the size of the transmission-pair dataset (Fig. S3), and focusing on this very early subset of cases may increase the influence of ascertainment bias. Prior to January 17th, should we expect observed transmission pairs to involve more severe cases on average? And is the unmitigated generation interval correlated with case severity?

      We thank the reviewer for identifying a source of possible bias that we overlooked. Following the comment we performed a new sensitivity analysis for the inclusion of the severe cases, summarized in Appendix 1—figure 11.

      Severity of the cases was reported only in Ali et al.’s data, for some of the individuals. In these cases, individuals are divided into one of three conditions: mild, severe (non-fatal) and death. As non-mild cases represent a small fraction of the dataset, we combine them into one category, which we denote as severe.

      Severe cases (including deaths) were overrepresented in the period prior to January 17, with 8 out of 77 cases, compared to 18 out of 745 in the period of January 18-31. The effect of inclusion of severe cases was analyzed by comparing the means of the estimated generation-interval distribution, separately for the two periods in question, using the inference framework with 30 bootstrapping runs. For the earlier period, the estimated means were compared between the dataset with or without the severe cases. For the later period, we also consider the “enriched” dataset, in which severe cases are oversampled for each bootstrap such that the proportion of severe cases matches that during the earlier period (10%). In both cases we see that the effect on the estimated mean generation interval is small.
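      The "enriched" resampling described above can be sketched as a stratified bootstrap. This is a minimal illustration under stated assumptions, not the authors' code: the function name `enriched_bootstrap_sample` and the boolean severity flags are hypothetical, and only the case counts (8 of 77 and 18 of 745) come from the text.

```python
import numpy as np

def enriched_bootstrap_sample(is_severe, target_fraction, rng):
    """Resample indices (same total size) so that a fixed share are severe cases."""
    is_severe = np.asarray(is_severe, dtype=bool)
    severe_idx = np.flatnonzero(is_severe)
    mild_idx = np.flatnonzero(~is_severe)
    n = is_severe.size
    n_severe = int(round(target_fraction * n))
    # Sample with replacement within each stratum, so severe cases are oversampled.
    chosen_severe = rng.choice(severe_idx, size=n_severe, replace=True)
    chosen_mild = rng.choice(mild_idx, size=n - n_severe, replace=True)
    return np.concatenate([chosen_severe, chosen_mild])

# Counts quoted in the response: 8 of 77 early cases and 18 of 745 later cases were severe.
early_fraction = 8 / 77
late_flags = np.array([True] * 18 + [False] * (745 - 18))

rng = np.random.default_rng(0)
sample = enriched_bootstrap_sample(late_flags, early_fraction, rng)
enriched_fraction = late_flags[sample].mean()  # close to the early-period share
```

      In an actual analysis, each such resample would be passed to the inference framework and the resulting mean generation intervals compared across runs.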

      2. The analysis assumes the incubation period follows a fixed distribution, whose parameterization comes from a meta-analysis of previously estimated incubation periods. But p.5 discusses the idea that observed incubation periods are affected by the same dynamical biases as forward serial intervals, "For example, when the incidence of infection is increasing exponentially, individuals are more likely to have been infected recently. Therefore, a cohort of infectors that developed symptoms at the same time will have shorter incubation periods than their infectees on average, which will, in turn, affect the shape of the forward serial-interval distribution." Has the incubation period been adjusted for these dynamical biases, or should it be?

      In our analysis we use the incubation period distribution from Xin et al. 2021 which already considers the backward bias caused by the expanding epidemic with the corrected growth rate of 0.1/d. Xin et al. showed in their meta-analysis that the mean incubation period reported by the various sources changed according to the dates used by the source. Incubation periods prior to the peak of the epidemic in China were lower than ones from after the peak, in a manner that coincided with the backward correction they performed (using a similar derivation to that suggested by Park et al. 2021). Accordingly, the distribution of incubation period they report is the intrinsic incubation period, after correction for the growth rate of the initial spread in China. We added two sentences in our methods section to clarify this point:

      “In their meta-analysis, Xin et al. found an increase of the incubation period following the introduction of interventions in China, matching the theoretical framework shown above. Their inferred incubation period distribution includes a correction for the growth rate of the early spread, accordingly.”
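      The growth-rate correction referred to here can be written compactly. The following is a sketch of the standard argument for an epidemic growing exponentially at rate r, not a reproduction of Xin et al.'s calculation; the gamma form of the intrinsic incubation density f, with shape k and scale θ, is an illustrative assumption.

```latex
% Incubation periods observed among individuals with the same symptom-onset date
% are weighted by incidence at the time of infection, which is proportional to e^{-r x}:
b(x) = \frac{f(x)\, e^{-r x}}{\int_0^\infty f(u)\, e^{-r u}\, \mathrm{d}u}

% Assumed gamma form, f(x) \propto x^{k-1} e^{-x/\theta}: the observed (backward)
% distribution is again gamma, with the same shape and a smaller scale, so
\mathbb{E}_b[X] = \frac{k\theta}{1 + r\theta} \;<\; k\theta = \mathbb{E}_f[X] \quad \text{for } r > 0 .
```

      Inverting this relation is what recovers the intrinsic incubation period from incubation periods observed during the growth phase.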

      Furthermore, we perform a sensitivity analysis for the shape of the incubation period distribution, and show that it has a minor effect on our conclusions (Appendix 1—figure 10).

      3. It appears that correlation parameter estimates co-vary with estimates of the mean generation interval (Fig. S6; S13b). Are the authors confident that the correlation parameter is identifiable? How much would the median generation interval estimate in the main analysis change if the correlation parameter had been fixed to 0 (which isn't realistic) or to 0.5 (which might be plausible)?

      As the reviewer pointed out, the correlation parameter estimates co-vary with estimates of the mean generation interval. We further analyzed this relation following the comment. The analysis is summarized in supplemental figures S19-20.

      We first examine the relation between the mean generation interval and the correlation parameter based on the uncertainty estimates, consisting of 1000 bootstrap runs. Appendix 1—figure 12 shows a joint bivariate scatter plot of the parameters, together with contours of equal probability. As can be seen there is a connection between the parameters. The estimates centered around the maximum likelihood estimate with correlation parameter of 0.75 and mean generation interval of 9.7 days. The confidence interval for the correlation parameter of 0.45-0.95 corresponds to mean generation intervals in the range of 8-11 days, supporting the conclusion of this study.

      Next, we reanalyzed the dataset while fixing the correlation parameter, as suggested by the reviewer. Appendix 1—figure 13 shows the estimated mean generation interval for fixed correlation parameters with values of 0, 0.25, 0.5, 0.75, and 0.9. For each fixed correlation parameter, we performed 100 bootstrapping runs. As can be seen, the results reflect the same connection that can be seen in Appendix 1—figure 12, with probable values in the range of 8-11 days for correlation parameters in the range of 0.5-0.9. Assuming no correlation would cause underestimation of the mean generation interval, in agreement with previous literature (Hart, Maini, and Thompson 2021; Park et al. 2022).

      Reviewer #2 (Public Review):

      There have been several estimates of the generation time and serial interval published for SARS-CoV-2, but as the authors note, estimates can be subject to biases including shifted event timing depending on the phase of the epidemic, correlation in characteristics between infector and infectee, and impact of control measures on truncating potential infectiousness. This study, therefore, has several strengths. It first collates data on transmission events from the earliest phase of the COVID-19 pandemic, then makes adjustments for these potential biases to estimate the generation time in absence of control measures, and finally discusses implications for transmission.

      Given many subsequent aspects of the COVID-19 pandemic have been defined relative to earlier phases (e.g. relative transmissibility of variants, relative duration of infectiousness), understanding the baseline characteristics of the infection is crucial. I thought this paper makes a useful contribution to this understanding, generating adjusted estimates for infectiousness (which is longer than previous estimates) and corresponding values for the reproduction number (which is remarkably similar to earlier estimates, presumably because of the different sources of bias in the growth rate and generation time distribution somehow end up canceling each other out).

      However, there are some weaknesses at present. The study correctly flags several potential sources of bias in estimates, but in making adjustments uses estimates from the literature that themselves could suffer from these biases, e.g. the distribution of incubation period from a 2021 meta-analysis. Although the authors conduct some sensitivity analysis it would be worth including some more explicit consideration of whether they would expect any underlying bias to propagate through their calculations. The authors also conduct some sensitivity analysis around the underlying data (e.g. ordering of transmission pairs), but again it would be useful to know whether there could be systematic biases in these early data. Specifically, the paper references Tsang et al (2020), which highlighted variability in early case definitions - is it possible that early generation times are estimated to be longer because intermediate cases in the transmission chain were more likely to go undetected than later in the epidemic?

      We recognize the potential biases in the transmission pairs data. We therefore developed an extensive framework of sensitivity analyses for identifying biases that could substantially affect the results. In the results section and figure 5, we show that the main study result, that the unmitigated generation-interval distribution is longer than previously estimated, is robust to reasonable amounts of ascertainment bias. We discuss this point at length and have added several supplemental figures to support this claim.

      As reviewer #3 mentioned, we conducted a sensitivity analysis for the inclusion of the longest serial intervals, to investigate possible effects of missing links in the longest transmission pairs. We also discuss why we think it’s not necessary to explicitly model the short intervals that may be unobserved due to missing links.

      “Second, we considered the possibility that long serial intervals may be caused by omission of intermediate infections in multiple chains of transmission, which in turn would lead to overestimation of the mean serial and generation intervals. Thus, we refit our model after removing long serial intervals from the data (by varying the maximum serial interval between 14 and 24 days). We also considered “splitting” these intervals into smaller intervals, but decided this was unnecessarily complex, since several choices would need to be made, and the effects would likely be small compared to the effect of the choice of maximum, since the distribution of the resulting split intervals would not differ sharply from that of the remaining observed intervals in most cases.”

      We added to the discussion text regarding the effect of possible bias in the dataset, explicitly specifying the ascertainment bias.

      “Our analysis relies on datasets of transmission pairs gathered from previously published studies and thus has several limitations that are difficult to correct for. Transmission pairs data can be prone to incorrect identification of transmission pairs, including the direction of transmission. In particular, presymptomatic transmission can cause infectors to report symptoms after their infectees, making it difficult to identify who infected whom. Data from the early outbreak might also be sensitive to ascertainment and reporting biases which could lead to missing links in transmission pairs, causing serial intervals to appear longer (For example, people who transmit asymptomatically might not be identified). Moreover, when multiple potential infectors are present, an individual who developed symptoms close to when the infectee became infected is more likely to be identified as the infector. These biases might increase the estimated correlation of the incubation period and the period of infectiousness. We have tried to account for these biases by using a bootstrapping approach, in which some data points are omitted in each bootstrap sample. The relatively narrow ranges of uncertainty suggest that the results are not very sensitive to specific transmission pairs data points being included in the analysis. We also performed a sensitivity analysis to address several potential biases such as the duration of the unmitigated transmission period, the inclusion of long serial intervals in the dataset, and the incorrect ordering of transmission pairs (see Methods). The sensitivity analysis shows that although these biases could decrease the inferred mean generation interval, our main conclusions about the long unmitigated generation intervals (high median length and substantial residual transmission after 14 days) remained robust (Figure 5).”

      It would also be helpful to have some clarifications about methodology, particularly in how the main results about generation time are subsequently analyzed. For example, estimates such as the conversion of generation time to R0 and VOC scalings are described very briefly, so it is currently unclear exactly how these calculations are being performed.

      Following the reviewer's comments, we edited the Methods section to make it clearer and more accessible. We added subheadings for the various sections, added a section explaining the derivation of the basic reproduction number, and clarified the section regarding the VOC extrapolations.
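      The derivation of the basic reproduction number is not reproduced in this response. One standard route, assumed here purely for illustration, is the Euler-Lotka relation 1/R0 = E[exp(-r T)], which has a closed form when the generation interval T is gamma-distributed. The function name and the shape value below are hypothetical.

```python
def r0_from_growth_rate(r, mean_gi, shape):
    """R0 implied by exponential growth rate r (per day, assumed >= 0) for a
    gamma-distributed generation interval with the given mean (days) and shape."""
    if shape <= 0 or mean_gi <= 0:
        raise ValueError("mean_gi and shape must be positive")
    scale = mean_gi / shape
    # Euler-Lotka: 1/R0 = E[exp(-r * T)] = (1 + r * scale) ** (-shape) for a gamma T.
    return (1.0 + r * scale) ** shape

# Illustrative inputs only: r = 0.1/day and a 9.7-day mean are quoted elsewhere in
# this response, but the shape value is a hypothetical choice, not a paper estimate.
example_r0 = r0_from_growth_rate(r=0.1, mean_gi=9.7, shape=2.0)
longer_gi_r0 = r0_from_growth_rate(r=0.1, mean_gi=12.0, shape=2.0)
```

      For a fixed growth rate, a longer mean generation interval implies a larger R0, which is why the choice between the unmitigated and the observed interval matters for this estimate.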

      Reviewer #3 (Public Review):

      Sender & Bar-On et al. perform robust analyses of early SARS-CoV-2 line list data from China to estimate the intrinsic generation interval in the absence of interventions. This is an important topic, as most SARS-CoV-2 data are from periods when transmission-reducing interventions are in place, which will lead to underestimation of the potential infectious period.

      The authors highlight two shortcomings in previous approaches. First, the distribution of 'observed' serial intervals (the time between symptom onset in the infector and symptom onset in the infectee) depends not only on the timeline of each infector's infection, but also the epidemic growth rate, which weights the proportion of observed short vs. long serial intervals. The authors argue that by accounting for this weighting, more accurate estimates of the intrinsic generation interval - the metric on which isolation policies are based - can be obtained. Second, the authors find that the original SARS-CoV-2 generation interval distribution has both a higher mean and longer tail than previous estimates when using only data prior to the introduction of interventions. Finally, the authors use publicly available data on viral load trajectories to extrapolate their estimates to other SARS-CoV-2 variants, finding that alpha, delta, and omicron may have shorter generation intervals than original SARS-CoV-2. These findings are important, as case isolation policies are based on assumptions for how long individuals remain infectious. More broadly, these methods will be important for future work to correctly estimate generation intervals in other outbreaks.

      The conclusions are well supported by the data, and a suite of sensitivity analyses give confidence that the findings are robust to deviations from many of the key assumptions. The code is well documented and publicly available, and thus the findings are easily reproducible. Key strengths of the paper include the clarity and rigor of the modeling methods, and the exhaustive consideration of potential biases and corresponding sensitivity analyses - it is very difficult to think of potential biases that the authors have not already considered! I think this is a well-written and well-executed study. The work is likely to be impactful for reconsidering SARS-CoV-2 isolation policies and revisiting generation interval estimates from other data sources. I also expect this to be a key reference and method for future studies estimating the generation interval.

      I have some minor comments on potential weaknesses and interpretation:

      1. Uncertainty in early generation interval estimates. One of the conclusions is that the estimated mean generation interval is longer than the observed mean serial interval. However, this conclusion does not seem justified given that the observed mean serial interval (9.1 days) is well within the 95% CI of 8.3-11.2 days for the mean generation interval. The confidence intervals for the serial interval in figure 2 are also wide for pre-Jan 17th (though presumably these would be reduced if all pre-Jan 17th serial intervals were combined). Further, only 77 of the ~1000 transmission pairs are actually from pre-January 17th. The actual sample size used for these estimates is much smaller than suggested by Figure S1 and thus this should be made clear. Therefore, although the intuition for why observed serial intervals may differ from the generation interval is correct, I do not think that the data alone demonstrate this. A related issue is on ascertainment bias - could the early serial interval data be biased longer because ascertainment is initially poor and thus more intermediate infectors are missed? The authors consider removing particularly long serial intervals to try and account for this, but that does not deal with e.g. chains of multiple short serial intervals being incorrectly recorded as a single long serial interval (but still within 16 days).

      We agree with the reviewer that, due to the large uncertainty, we cannot deduce from the data alone that the mean generation interval is longer than the mean serial interval. We changed the phrasing to emphasize that this statement is supported by mathematical theory.

      “We note that our estimated mean generation-interval is longer than the observed mean serial-interval (9.1 days) of the period in question. This is supported by the theory (Park et al. 2021) of the dynamical effects of the epidemic -- in contrast to the common assumption that the mean generation and serial intervals are identical. During the exponential growth phase, the mean incubation period of the infectors is expected to be shorter than the mean incubation period of the infectee - this effect causes the mean forward serial interval to become longer than the mean forward generation interval of the cohorts that developed symptoms during the study period. However, these cohorts of infectors with short incubation periods will also have short forward generation (and therefore serial) intervals due to their correlations. When the latter effect is stronger, the mean forward serial interval becomes shorter than the mean intrinsic generation interval, as these findings suggest.”

      Following the comment, we added to Figure S1 the filtering according to date, to reflect the true sample size we use for the main analysis (We renamed it: Appendix 1—figure 1).

      We recognize the potential biases in the transmission pairs data. We therefore developed an extensive framework of sensitivity analyses for identifying biases that could substantially affect the results. In the results section and figure 5, we show that the main study result, that the unmitigated generation-interval distribution is longer than previously estimated, is robust to reasonable amounts of ascertainment bias. We discuss this point at length and have added several supplemental figures to support this claim.

      As reviewer #3 mentioned, we conducted a sensitivity analysis for the inclusion of the longest serial intervals, to investigate possible effects of missing links in the longest transmission pairs. We also discuss why we think it’s not necessary to explicitly model the short intervals that may be unobserved due to missing links.

      “Second, we considered the possibility that long serial intervals may be caused by omission of intermediate infections in multiple chains of transmission, which in turn would lead to overestimation of the mean serial and generation intervals. Thus, we refit our model after removing long serial intervals from the data (by varying the maximum serial interval between 14 and 24 days). We also considered “splitting” these intervals into smaller intervals, but decided this was unnecessarily complex, since several choices would need to be made, and the effects would likely be small compared to the effect of the choice of maximum, since the distribution of the resulting split intervals would not differ sharply from that of the remaining observed intervals in most cases.”

      We added to the discussion text regarding the effect of possible bias in the dataset, explicitly specifying the ascertainment bias.

      “Our analysis relies on datasets of transmission pairs gathered from previously published studies and thus has several limitations that are difficult to correct for. Transmission pairs data can be prone to incorrect identification of transmission pairs, including the direction of transmission. In particular, presymptomatic transmission can cause infectors to report symptoms after their infectees, making it difficult to identify who infected whom. Data from the early outbreak might also be sensitive to ascertainment and reporting biases which could lead to missing links in transmission pairs, causing serial intervals to appear longer (For example, people who transmit asymptomatically might not be identified). Moreover, when multiple potential infectors are present, an individual who developed symptoms close to when the infectee became infected is more likely to be identified as the infector. These biases might increase the estimated correlation of the incubation period and the period of infectiousness. We have tried to account for these biases by using a bootstrapping approach, in which some data points are omitted in each bootstrap sample. The relatively narrow ranges of uncertainty suggest that the results are not very sensitive to specific transmission pairs data points being included in the analysis. We also performed a sensitivity analysis to address several potential biases such as the duration of the unmitigated transmission period, the inclusion of long serial intervals in the dataset, and the incorrect ordering of transmission pairs (see Methods). The sensitivity analysis shows that although these biases could decrease the inferred mean generation interval, our main conclusions about the long unmitigated generation intervals (high median length and substantial residual transmission after 14 days) remained robust (Figure 5).”

      2. Frailty of using viral loads to extrapolate generation intervals. The authors take the observation that variants of concern demonstrate faster viral clearance on average to estimate shorter generation intervals for alpha, delta, and omicron. The authors rightly point out in the discussion that using viral load as a proxy for infectiousness has many limitations. I would emphasize even further that it is very difficult to extrapolate from viral load data in this way, as infectiousness appears to vary far more between variants than can be explained by duration positive or peak viral load. Other factors are potentially at play, such as compartmentalization in the respiratory tract, aerosolization, receptor binding, immunity, etc. Further, there is considerable individual-level variation in viral trajectories and thus the use of a population-mean model overlooks a key component of SARS-CoV-2 infection dynamics. An important reference, which came out recently and thus makes sense to have been missed from the initial submission, is Puhach et al. Nature Medicine 2022 https://doi.org/10.1038/s41591-022-01816-0.

      We agree with the reviewer about the frailty of using viral loads to extrapolate generation intervals. We therefore expanded our discussion of the limitation of using viral load data for inferring infectiousness including many of the points mentioned by the reviewer. We use viral load data in the most minimal way to try to enable some discussion of new VOC, and try to emphasize the needed caution.

      Viral load trajectories data have potential for informing estimates of the infectiousness profile. However, the relationship between viral load, culture positivity, symptom onset, and infectivity is complex and not well characterized. Due to this limitation we tried to use viral loads in a more limited way, extrapolating our results to variants of concern (which lack unmitigated transmission data). Following the comment, we added a detailed discussion of the limitations of using viral loads as a proxy for infectiousness, including the variation of viral loads across individuals. We also added supplementary figures (Figure 6—figure supplements 1-2) to show the possible effect of an individual's viral loads in relation to the infectiousness and for comparison with new viral load and culture results (Chu et al. 2022; Killingley et al. 2022). As the viral load trajectories data for the different VOCs are given only as a function of time from the onset of symptoms, it is not possible to directly link them to the fraction of transmission post 14 days from infection. We made changes to Figure 6 to clarify the possible connection of viral load with the TOST (time from symptoms onset to transmission) distribution and the resulting extrapolation to the unmitigated generation-interval distributions.

      “SARS-CoV-2 viral load trajectories serve an important role in understanding the dynamics of the disease and modeling its infectiousness (Quilty et al. 2021; Cleary et al. 2021). Indeed, the general shapes of the mean viral load trajectories and culture positivity, based on longitudinal studies, are comparable with our estimated unmitigated infectiousness profile (Figure 6—figure supplements 1-2, comparison with (Chu et al. 2022; Killingley et al. 2022; Kissler et al. 2021)). However, the nature of the relationship between viral load, culture positivity, symptom onset, and real-world infectivity is complex and not well characterized. Therefore, the ability to infer infectiousness from viral load data is very limited, especially near the tail of infectiousness, several days following symptom onset and peak viral loads. Viral load models are usually made to fit the measurements during an initial exponential clearance phase and in many cases miss a later slow decay (Kissler et al. 2021). Furthermore, there is considerable individual-level variation in viral trajectories that isn’t accounted for in population-mean models (Kissler et al. 2021; Singanayagam et al. 2021). Other factors limiting the ability to compare generation-interval estimates with viral loads models are the variability of the incubation periods and its relation to the timing of the peak of the viral loads, and the great uncertainty and apparent non-linearity of the relation between viral loads and culture positivity (Jaafar et al. 2021; Jones et al. 2021). Due to these caveats and in order to avoid over interpretation of viral load data, we restrict our extrapolation of new VOCs’ infectiousness to a single parameter characterizing the viral duration of clearance.”

      We also edited another paragraph in the discussion:

      “Our extrapolations are necessarily crude given the complex relationship between viral load, symptomaticity, and infectiousness discussed above. Moreover, compartmentalization in the respiratory tract, aerosolization, receptor binding affinity, and immune history can also play important roles in determining the infectiousness profiles of SARS-CoV-2 variants (Puhach et al. 2022).”

      3. Lack of validation with other datasets. This study hinges on data from a single setting in a short window of time. Although the data are from multiple publications, the fact that so many reported the same transmission pair data demonstrates that these are overlapping datasets. As the authors note, there are potential biases e.g., ascertainment rates and behavioral changes which will impact the generation interval estimates. Thus, generalizability to other settings is limited.

      We agree with the reviewer that the dataset used in our study is limited, and consists of overlapping transmission pairs. We perform some analysis of the possible bias caused by inclusion of each dataset, as can be seen in Appendix 1—figure 4.

      The best validation would have been a comparison with another independent dataset from the early spread of the epidemic, but no such dataset exists. We added a sentence to the discussion to emphasize this point.

      “Due to the nature of early spread of a new unknown disease it is nearly impossible to find two completely unrelated datasets from the period prior to mitigation, limiting the ability of further validation of the current results.”

      4. The impact of epidemic dynamics on infector vs. infectee serial intervals. It took me a long time to get my head around the assertion that the forward serial interval distribution will be longer during epidemic growth due to the overrepresentation of short incubation periods among infectors relative to infectees. A supplementary figure, similar to the way Figure 1 is laid out, to illustrate this concept may go a long way to aid the reader's understanding.

      We added an explanation to the paragraph in order to make it clearer:

      “A cohort of individuals that develop symptoms on a given day is a sample of all individuals who have been previously infected. When the incidence of infection is increasing, recently infected individuals represent a bigger fraction of this population and thus are over-represented in this cohort. Therefore, we are more likely to encounter infected individuals with a short incubation period in this cohort compared to an unbiased sample. The forward serial-interval is calculated for a cohort of infectors who developed symptoms at the same time and therefore is sensitive to this bias. These dynamical biases are demonstrated using epidemic simulations by Park et al.”

      5. Simulations to illustrate concepts and power. Given the assertion that observed serial intervals will depend on epidemic growth rates, reporting, and timing of interventions, I think a simple simulation to illustrate some of these ideas would be very helpful. For example, a simple agent-based model with simulated infectivity profiles and incubation periods using the estimated bivariate distribution would be extremely helpful in illustrating how serial intervals and estimates of the generation interval can differ from the true intrinsic generation interval (I coded such a simulation to help me understand this paper in a couple of hours with <100 lines of R code, so I do not think this would be much work). This would also be very helpful for illustrating statistical power re. comment 1.

      The current paper is based on a strong theoretical foundation provided by previous works, specifically Park et al. 2021, which used simulations similar to those the reviewer suggests to demonstrate the dynamical biases. We now mention these simulations in the introduction:

      “These dynamical biases are demonstrated using epidemic simulations by Park et al.”
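      For readers who want a quick feel for the bias, the cohort effect can be reproduced in a few lines. This is a minimal sketch, not the simulations of Park et al.: it assumes pure exponential growth and a hypothetical gamma incubation period (shape 4, scale 1.5 days), and uses the 0.1/day growth rate quoted earlier in this response. Under these assumptions the mean incubation period in a same-onset-day cohort should be close to shape × scale / (1 + r × scale).

```python
import numpy as np

rng = np.random.default_rng(1)
r = 0.1                    # growth rate per day (value quoted in the response)
shape, scale = 4.0, 1.5    # hypothetical gamma incubation period, mean 6 days
n = 500_000
t_max = 100.0

# Infection times with density proportional to exp(r * t) on [0, t_max] (inverse CDF).
u = rng.random(n)
t_inf = np.log1p(u * np.expm1(r * t_max)) / r
incubation = rng.gamma(shape, scale, size=n)
t_onset = t_inf + incubation

# Cohort: everyone whose symptoms begin on the same day, well inside the window.
cohort = (t_onset >= 80.0) & (t_onset < 81.0)

intrinsic_mean = shape * scale                      # mean of the intrinsic distribution
cohort_mean = incubation[cohort].mean()             # mean seen in the onset cohort
predicted_mean = shape * scale / (1.0 + r * scale)  # exponentially tilted gamma mean
```

      Recently infected individuals dominate the cohort, so `cohort_mean` falls below `intrinsic_mean`; setting `r` close to zero removes the gap.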

    1. Author Response

      Reviewer #1 (Public Review):

      Bohère, Eldridge-Thomas and Kolahgar have studied the effect of mechanical signalling in tissue homeostasis in vivo, genetically manipulating the well known mechano-transductor vinculin in the adult Drosophila intestine. They find that loss of vinculin leads to accelerated, impaired differentiation of the enteroblast, the committed precursor of mature enterocytes, and stimulates the proliferation of the intestinal stem cell. This leads to an enlarged intestinal epithelium. They discriminate that this effect is mediated through its interaction with alpha-catenin and the reinforcement of the adherens junctions, rather than with talin and integrin-mediated interaction with the basal membrane. This result aligns well, as the authors note, with previous observations from Choi, Lucchetta and Ohlstein (2011) doi:10.1073/pnas.1109348108. Bohère et al then explore the impact that disrupting mechano-transduction has on the overall fitness of the adult fly, and find that vinculin mutant adult flies recover faster after starvation than wild types.

      The main conclusions of the paper are convincing and informative. Some important results would benefit from a more detailed description of the phenotypes, and others could have alternative explanations that would warrant some additional clarification.

      1) - Interpretation of phenotypes in vinc[102.1] mutants

      The paper presents several adult phenotypes of the homozygous viable, zygotic null mutant vinculin[102.1], where the fly gut is enlarged (at least in the R4/5 region). In many cases, they correlate this phenotype with that of RNAi knockdown of vinculin in the gut induced in adult stages. This is a perfectly valid approach, but it presents the difficulty of interpretation that the zygotic mutant has lacked vinculin throughout development and in every fly tissue, including the visceral mesoderm that wraps the intestinal epithelium and that also seems enlarged in the vinc[102.1] mutant. So this phenotype, and others reported, could arise from tissue interactions. To me, the quickest way to eliminate this problem would be to express vinculin in ISCs and/or EBs in the vinc[102.1] background, either throughout development or after pupariation or emergence, and observe a rescue.

      We agree with the reviewer that we cannot exclude additional vinculin role(s) in other tissues during or after development that might have an impact on the intestinal epithelium. Our attempts to express a full-length Vinculin construct (Maartens et al, 2016) in the vinc102.1 flies, either in adulthood or throughout development, were not very conclusive: although we observed some degree of rescue, it was not fully penetrant. This was in contrast to the complete rescue observed with the genomic rescue of vinculin. Thus, it is possible that some form of tissue interaction contributes to the phenotype observed, for example if vinculin loss affects muscle structure. Alternatively, just like it was shown that too much active vinculin is detrimental to the fly (Maartens et al, 2016), our experiment suggests that too much vinculin may be deleterious to the intestine.

      In any case, because of cell-specific knockdowns in the adult gut, we are confident that EB reduction of vinculin levels or activity is sufficient to accelerate tissue turnover, at least in a specific portion of the posterior midgut. We have amended the text to acknowledge a role for tissue interactions (see page 6, end of first paragraph; page 7, start of last paragraph; page 12, starvation experiments).

      An experiment where this is particularly difficult is with the starvation/refeeding experiment. The authors explored whether the disruption of tissue homeostasis, as a result of vinculin loss, matters to the fly. So they tested whether flies would be sensitive to starvation/re-feeding, where cellular density changes and vinculin mechano-sensing properties may be necessary. They correctly conclude that mutant flies are more resistant to starvation, and suggest that this may be due to the fact that intestines are larger and therefore more resilient. However, in these animals vinculin is absent in all tissues. It is equally likely that the resistance to starvation was due to the effect of Vinculin in the fat body, ovary, brain, or other adult tissues singly or in combination. The fact that the intestine recovers transiently to a size slightly larger than that of the fed flies seems anecdotal, considering the noise within the timeline of fed controls. I am not sure this experiment is needed in the paper at all, but to me, the healthy conclusion from this effort is that more work is needed to determine the impact of vinculin-mediated intestinal homeostasis in stress resistance, and that this is out of the scope of this paper.

      Please see the new data presented in Figure 8A-B (text page 12).

      2) - Cell autonomy of the requirement of Vinculin and alpha-Catenin

      Authors interpret that Vinculin is needed in the EB to maintain mechanical contact with the ISC, restrict ISC proliferation through contact inhibition, and maintain the EB quiescent. This interpretation explains seemingly well the lack of an obvious phenotype when knocking down vinculin in ISCs only, while knockdown in ISCs and EBs, or EBs only, does lead to differentiation problems. It also sits well with the additional observation that vinculin knockdown in mature ECs does not have an obvious phenotype. However, a close examination makes the results difficult to explain with this interpretation only. If the authors were correct, one would expect that in mutant clones, eventually, vinculin-deficient EBs will be produced, which should mis-differentiate and induce additional ISC proliferation. However, the clones only show a reduction in ISC proportions; the most straight forward interpretation of this is that vinculin is cell-autonomously necessary for ISC maintenance (which is at odds with the phenotype of vinculin knockdown using the ISC and ISC/EB drivers).

      We apologise that we were unclear in the text. With hindsight, the confusion may have been caused by our describing the phenotype of MARCM clones before reporting the accumulation of EBs in the vinc102.1 guts. Therefore, we swapped these two sections and improved the description of these experiments in the manuscript (see section: “The pool of enterocyte progenitors expands upon vinculin depletion” pages 6-8).

      In brief, we do not think that our results are at odds with the phenotype of vinculin knockdown using the ISC and ISC/EB drivers - we realise the text was misleading and hope to have clarified our observations in the revised manuscript (pages 7 and 8). From cell conditional RNAi experiments, like the reviewer, we would predict that vinculin knockdown or loss of function in mitotic clones (MARCM experiments, Figure 4E-G) will induce accelerated differentiation of vinculin-deficient enteroblasts, which in turn will increase proliferation. We observed that vinc102.1 or vinc RNAi mitotic clones contained a similar number of cells compared to control clones, but a reduced proportion of stem cells (Figure 4G). We interpret this as indicating that to maintain an equivalent clone size, stem cells must have divided more frequently, with some divisions producing two differentiated daughter cells. This type of symmetric division would increase the EB pool (as seen in Figure 4-figure supplement 2B), at the expense of the ISC population, in turn decreasing long-term clonal growth potential. Altogether, the results obtained with MARCM clones highlight changes in tissue dynamics compatible with those observed with cell-specific vinculin knockdowns.

      Also, from the authors' interpretation, it would follow that the phenotype of vinculin knockdown in ISCs+EBs and in EBs only should be the same. However, in ISCs+EBs vinculin knockdown, differentiation accelerates, which is likely accompanied by increased proliferation (judging by the increase in GFP area, PH3 staining would be more definitive).

      Indeed, the accelerated differentiation observed with esgGal4>UAS VincRNAi is accompanied by increased proliferation with the two independent RNAi lines used. We have added this result in Figure 1-figure supplement 1G (and in text, page 5).

      This contrasts with the knockdown only in EBs, which leads to accumulation of EBs due to misdifferentiation, and increased proliferation, mostly of ISCs, as measured directly with PH3 staining, but not additional late EBs or mature ECs. The authors call this "incomplete maturation due to accelerated differentiation". I think that one should expect to find incomplete differentiation/maturation when the rate of the process is very slow, not the other way around. To me, these are different phenotypes, which could perhaps be explained if vinculin was also needed in the ISC to transmit tension to the EB and prevent its differentiation, and removing it only in the EB may be revealing an additional, cell-autonomous requirement in maturation.

      When vinculin is knocked down in EBs, cells appear bigger than controls (as judged by the RFP+ nuclei in Figure 5E). This, compared to yw and vinc102.1 guts shown in Figure 4D suggests that these cells are more advanced in their differentiation. We have removed the sentence, to not confuse the reader, and clarified the text (see page 8). The discrepancy in the differentiation index between the esgGal4 and KluGal4 experiments might result from differences in the drivers, or an additional role of vinculin in EC differentiation, which we now mention in the text (page 8).

      So far, we have no evidence to support the idea that vinculin is also needed in the ISC to transmit tension to the EB and prevent its differentiation; for example, we saw no phenotype when we knocked down vinculin specifically in ISCs (Figure 3): notably, no increase in ISC ratio and no increase in cell density (unlike the reduction seen in Figure 1F with ISC+EB knockdown).

      Another unexpected result, considering the authors' interpretation, is that the overexpression of activated Vinculin (vinc[CO]) does not seem to have much of an effect. It does not change the phenotype of the wild type (where there is very little basal turnover to begin with) and it only partially rescues the phenotype of the vinc[102.1] mutants, when the rescue transgene vinc:RFP does. This again suggests that there may be tissue interactions, in development or adulthood, that may explain the vinc[102.1] phenotypes. It could also be that this incomplete rescue is due to the deleterious effect of Vinc[CO] (this is another reason for doing the vinc[102.1]; esg-Gal4; UAS-vincFL experiments suggested above). An alternative experiment to perform this rescue would be to knock down the vinculin gene while overexpressing the Vinc[CO] transgene - this may be possible with the RNAi HSM02356, which targets the vinculin 3'UTR and is unlikely to affect UAS-vinc[CO].

      Please refer to essential point 2c; as VincCO is not a simple overactive protein, like a constitutively active kinase, additional effects in the tissue can be expected.

      The claims of the authors would be more solid if the reporting of the phenotypes was more homogeneous, so one could establish comparisons. Sometimes conditions are analysed by differentiation index, others by extension of the GFP domains, others with phospho-histone-3 (PH3), others through nuclear size or density, and combinations. I do not think the authors should evaluate all these phenotypes in all conditions, but evaluating mitotic index and abundance of EBs and "activated EBs/early ECs" to monitor proliferation and differentiation rates should be done across the board (ISC, ISC+EB, EB drivers).

      To improve consistency, in all conditions we have compared cell types ratios and cellular density upon vinculin knockdown: see Figure 1E-F for ISC+EB, Figure 3B-C for ISC, and Figure 5 – figure supplement 1C-E for EB (with panel E newly added). As we did not observe any effect on ratio or density, we did not monitor cell proliferation for ISC knockdown.

      Nonetheless, we added the mitotic index for the ISC+EB driver (new Figure 1- figure supplement 1G) to be consistent with the results from the EB driver (Figure 5- figure supplement 1C).

      If the primary role of Vinculin is to induce contact inhibition in the ISC from the EB and prevent the EB differentiation and proliferation, one would expect that over expression of Vinc[CO] (or perhaps VincFL or sqhDD) in EBs should prevent or delay the differentiation and proliferation induced by a presumably orthogonal factor, like infection with Pseudomonas entomophila or Erwinia carotovora.

      This is indeed an exciting prediction, but outside the scope of this manuscript.

      3) - Relationship between Vinculin and alpha-Catenin

      The authors establish a very clear difference in the phenotypes between focal adhesion components and Vinculin, whereas the similarity of alpha-catenin and vinculin knockdowns is very compelling. Therefore I am sure the authors are on the right path with their interpretation of this part of the paper. However, some of the alpha-Catenin experiments are not very clear. The result from the rescue experiment of alpha-Cat knockdown with alpha-Cat-deltaM1b does not seem to show what the authors claim, and differentiation does not seem affected, only the amount of extant older ECs (which may be due to other reasons as this is a non-autonomous effect).

      Like the reviewer, we were surprised about the milder rescue with M1b compared to M1a and are unsure of the reasons for this. Nevertheless, quantifications of the differentiation and retention indices show significant differences for M1a and M1b compared to the FL control (Figure 6F-G), with phenotypes resembling the vinc knockdown. In Figure 6E, we have added a row of zoomed views to better highlight the similarity of phenotype between M1a and M1b and have acknowledged the mild differences in the text (bottom of page 9). For the sake of rigour, we think it is important to include results from both M1 deletions, even if there is not yet a logical reason to explain why they have different effects.

      Ulrich Tepass produced a UAS-alpha-catenin construct with the full deletion of the M1 region, perhaps that would show a clearer phenotype.

      This is a good suggestion; however, for technical reasons this is not possible. The strategy devised by Ken Irvine and his group relies on rescuing the RNAi with an RNAi-resistant construct, which is not the case for the constructs generated in the Tepass lab. Furthermore, we cannot adopt a MARCM strategy as α-cat is too close to the centromere (80F).

      Also, the autonomy of the phenotype is difficult to address with these experiments alone. It would be expected that the phenotype of alpha-catenin knockdown should be similar to that of vinculin knockdown in the ISCs only or EBs only.

      This is not what our understanding of cadherin-mediated adhesion would predict. Forming cadherin adhesions requires cadherins and catenins in both cells, so we would expect similar phenotypes in ISCs only and EBs only. What is exciting about our findings is that the mechanosensitive machinery is not equally important in the two adherent cells, i.e. the EB is using vinculin to measure force on the contact and regulate differentiation, whereas the ISC needs to resist that force, but does not use vinculin to sense that force and regulate its behaviour.

      We have added new data showing the role of the vinculin/α-catenin interaction in ISCs or EBs by co-expressing α-Cat RNAi and α-Cat ΔM1a. We observed that absence of VBS in α-catenin has no effect in ISCs but promotes EB differentiation and increase in numbers (new Figure 6 – figure supplement 2), similar to our observations with vincRNAi (see text page 10).

      Reviewer #2 (Public Review):

      Vinculin functions as an important structural bridge that connects cadherin and integrin-mediated adhesions to the F-actin cytoskeleton. This manuscript carefully examined the mutant phenotype of vinc in the Drosophila intestine and found that vinc mutant in EBs causes significant increases of EB to EC differentiation, stem cell proliferation, and tissue growth. By analyzing the mutant phenotype of the cadherin adaptor alpha-catenin, the authors suggest that the vinc functions through the cell-cell junctions instead of cell-ECM adhesions in EBs. Finally, manipulation of myosin activity in EBs phenocopies the vinc mutant, suggesting that vinculin is regulated by the mechanical tension transduced through the cytoskeleton.

      The authors claim that the vinculin mutant phenotype is opposite compared to the loss of the major integrin components, suggesting a function independent of the cell-ECM adhesions. However, the phenotype of vinc and integrin may not be completely opposite. Besides loss of ISCs, both mys and talin knockdown in ISCs clearly cause ISC differentiation into EC cells (Fig.3A), suggesting a possible involvement of integrin in EB to EC differentiation. Therefore, it will be important to test the phenotype of integrin KD in EBs using EB-specific Gal4.

      The reviewer raised an important point. To test this we had to overcome the ISC defect of mys or talin RNAi, and specifically tested their function in enteroblasts using the KluGal4 driver. This revealed a similar phenotype of accelerated differentiation, assayed with the ReDDM system (see new Figure 6 -figure supplement 4). Thus, as the reviewer suggested, both integrins and cadherins function in this process, and we have amended the text to indicate this (see page 10, and sentence in the discussion page 12). It appears however that, unlike vinculin, they also have a key role in ISCs.

      The authors proposed a model that the cell-cell adhesion between ISC and EBs is required for vinculin mediated differentiation suppression. However, this model is not directly supported by the data as the EB-ISC adhesion and EB-EC adhesion have not been tested separately.

      This is an important point and we have amended the text to address this.

      We have focussed our model on EB-ISC adhesion as the adherens junctions are stronger between progenitor cells than between EBs and ECs, and because of previous data from the Ohlstein lab (Choi et al, 2011) demonstrating the relationship between adherens junction stability and EB differentiation/ISC proliferation. Nonetheless we agree it is possible that EB-EC adhesion might contribute to this mechanism and have modified the last sentence of the result section (page 12) and the legend associated with the model (Figure 8) to take this into account.

      In addition, previous short-term manipulation of E-cadherin in ISCs and EBs shows no change in cell proliferation (Liang J. et al. 2017), which seems to contradict the authors' model. To support the authors' conclusion, long-term manipulation of E-cadherin in ISCs and EBs must be tested.

      A main feature of the vinculin phenotype is the regional accelerated differentiation observed in R4/5, potentially reflecting areas more subject to mechanical forces. Strikingly, this accelerated differentiation is rarely observed more anteriorly (such as region R4a/b studied in Liang et al, 2017). In fact, these regional differences were previously reported with E-cadherin knockdown by the Adachi-Yamada group (see Figure S1, Maeda et al, 2008). This highlights the importance of considering regional control of cell fate for the field.

      To test our hypothesis further, we have knocked down E-cadherin and α-catenin in EBs only (with Klu-Gal4). As shown in new Figure 6-figure supplement 3, we observed an accumulation of EBs as early as 3 days after induction, reminiscent of vinculin loss of function phenotype. Longer E-cadherin EB knock-down with KluGal4 appears particularly detrimental for survival as all flies died after 4 days of continuous RNAi expression preventing any further observations (see new text page 10). These observations support our model that junctional stability slows down EB differentiation. Our results are also in agreement with the work described in Choi et al (2011), whereby after 6 days of E-Cadherin RNAi expression in progenitors or EBs (using a different driver from us, Su(H)Gal4), the mitotic index increases, showing a feedback regulation on ISC proliferation. Therefore, our work and the Liang et al 2017 study are not in fact contradictory: the differences in the contribution of junctions to tissue dynamics might reflect the variety of molecular mechanisms involved along the small intestine.

      The result of MARCM analysis seems inconsistent with the rest of the data. In MARCM, no significant change of clone sizes is observed between WT and vinc mutant (Fig. 3E). However, vinc mutant in EBs clearly promotes ISC proliferation in other experiments such as esg>vinc-RNAi and the EB>vinc-RNAi (Fig. 1A, Fig. 4).

      Please refer to point 2a, essential revisions. We do not think that our results are at odds with the phenotype of vinculin knockdown using the ISC and ISC/EB drivers - we realise the text was misleading and hope to have clarified our observations in the revised manuscript (pages 7-8).

      In Fig. 4H, the authors suggest that the vinculin mutant prevents terminal EC formation. However, this may be simply caused by longer retention of Klu expression in the newborn ECs. To test if EB differentiation is indeed affected, the EC marker pdm1 staining will provide more convincing evidence. Another experiment to strengthen the conclusion will be the tracking of clone sizes generated from a single EB cell using the UAS-Flp system (such as G-trace).

      These are good suggestions to strengthen our findings. Unfortunately, we have not managed to obtain a working Pdm1 antibody (or other commercially available EC marker), which is why we assayed nuclear size and the tracking of KluReDDM cells. Therefore, we have not been able to test if Klu is retained in newborn ECs.

      As we agree this section of the text was misleading, we have rephrased and highlighted that the phenotype seen with KluGal4ReDDM resembles the accumulation of activated EBs and newborn ECs observed in vinc102.1 guts. (page 8).

      In Fig. 6D, the survival rates of WT and vinc mutant flies were compared. However, as there is no additional assay about the feeding behavior or metabolic rate, the systematic mutant of vinc does not provide a direct link between animal survival and intestinal EBs. Therefore, an experiment with vinc level specifically manipulated in fly intestine using esg>vinc-RNAi or EB>vinc-RNAi will be more relevant.

      This experiment has now been added in Figure 8B and the text modified to acknowledge the limitations of the survival experiments with whole mutant flies (see point 3, essential revisions above).

      Reviewer #3 (Public Review):

      Prior work had identified essential roles for Integrin signaling in regulating intestinal stem cell (ISC) proliferation, and the authors' studies were motivated by trying to understand whether Vinculin (Vinc) might participate in this. However, Vinc is involved in mechanotransduction at both focal adhesions (FA) and adherens junctions (AJ), and their results revealed that Vinc phenotypes do not match those of FA proteins like Integrin. Conversely, they do match a-catenin (a-cat) RNAi phenotypes, and together with the localization of Vinc and the phenotypes associated with a-cat mutants that can't bind Vinc, this led to the conclusion that Vinc is acting at AJ rather than FA in this tissue. The results here are convincing, with clear presentation, nice images, and appropriate quantitation. It's also worth emphasizing that initial characterization of Vinc mutant flies failed to reveal any essential roles for this protein in Drosophila, so finding a mutant phenotype of any sort is significant.

      While the manuscript is strong as a descriptive report on the requirement for Vinc in the Drosophila intestine, it doesn't provide us with much understanding of the mechanism by which Vinc exerts its effects, nor how its requirement is linked to intestinal physiology.

      There is always more to learn, and the importance of our work so far is that it demonstrates a very specific role for vinculin as a mechanoeffector in regulating cell fate decisions in specific regions of the midgut, and provides the foundation for future work addressing the detailed mechanism of this function and its physiological role.

      Prior work has shown that mechanical stretching of intestines stimulates ISC proliferation (presumably through Integrin signaling), which is opposite to what Vinc does here.

      We would like to stress that very little mechanistic knowledge is available regarding how mechanical stretching stimulates ISC proliferation, in Drosophila or mammalian systems. To our knowledge, the only work linking gut mechanical stretching to cell fate decisions in Drosophila identified Msn/Hippo pathway (Li et al., 2018) and the ion channel Piezo requirement (He et al., 2018). We agree with the reviewer that integrin signaling would most likely contribute, especially given the composition of gels for organoid cultures (Gjorevski et al, 2016), yet the actual molecular mechanisms remain to be elucidated.

      There is a suggestion that Vinc is involved in maintaining homeostasis, but how it's regulated remains a bit murky. The authors report that reductions in myosin activity result in phenotypes reminiscent of Vinc phenotypes, which they interpret as supporting a model where Vinc's role is to help maintain tension at AJ. Of course it could also be reversed - maybe they are similar because tension is needed to maintain Vinc recruitment to AJ? The lack of epistasis tests and lack of analysis of whether Vinc localization to AJ in EBs is affected by tension or the M2 deletion of a-cat leaves us uncertain as to the actual basis for the relationship between Vinc and myosin phenotypes.

      Thank you for all these suggestions. New experiments have been done to test the relationship between cellular tension and vinculin at junctions (see essential point 1).

    1. Author Response:

      Reviewer #3 (Public Review):

      Murphy et al. further develop the linked selection model of Elyashiv et al. (2016) and apply it to human genetic variation data. This model is itself an extension of the McVicker et al. (2009) paper, which developed a statistical inference method around classic background selection (BGS) theory (Hudson and Kaplan, 1995, Nordborg et al., 1996). These methods fit a composite likelihood model to diversity data along the chromosome, where the level of diversity is reduced by a local factor from some initial "neutral" level π0 down to observed levels. The level of reduction is determined by a combination of both BGS and the expected reduction around substitutions due to a sweep (though the authors state that these models are robust to partial and soft sweeps). The expected reduction factor is a function of local recombination rates and genomic annotation (such as exonic and phylogenetically conserved sequences), as well as the selection parameters (i.e. mutation rates and selection coefficients for different annotation classes). Overall, this work is a nice addition to an important line of work using models of linked selection to differentiate selection processes. The authors find that positive selection around substitutions explains little of the variation in diversity levels across the genome, whereas a background selection model can explain up to 80% of the variance in diversity. Additionally, their model seems to have solved a mystery of the McVicker et al. (2009) paper: why the estimated deleterious mutation rate was unreasonably high. Throughout the paper, the authors are careful not only in their methodology but also in their interpretation of the results. For example, when interpreting the good fit of the BGS model, the authors correctly point out that stabilizing selection on a polygenic trait can also lead to BGS-like reductions.

      Furthermore, the authors have carefully chosen their model's exogenous parameters to avoid circularity. The concern here is that if the input data into the model - in particular the recombination maps and segments likely to be conserved - are estimated or identified using signals in genetic variation, the model's good fit to diversity may be spurious. For example, often recombination maps are estimated from linkage disequilibrium (LD) data which is itself obtained from variation along the chromosome. Murphy et al. use a recombination map based on ancestry switches in African Americans which should prevent "information leakage" between the recombination map and the BGS model from leading to spuriously good fits. Likewise, the authors use phylogenetic conservation maps rather than those estimated from diversity reductions (such as McVicker et al.'s B maps) to avoid circularity between the conserved annotation track and diversity levels being modeled. Additionally, the authors have carefully assessed and modified the original McVicker et al. algorithm, reducing relative error (Figure A2).

      One could raise the concern that non-equilibrium demography confounds their results, but the authors have a very nice analysis in Section 7 of the supplementary material showing that their estimates are remarkably stable when the model is fit separately in different human populations (Figure A35). Supporting previous work that emphasizes the dependence between BGS and demography, the authors find evidence of such an interaction with a clever decomposition of variance approach (Figure A37). The consistency of BGS estimates across populations (e.g. Figures A35 and A36) is an additional strong bit of evidence that BGS is indeed shaping patterns of diversity; readers would benefit if some of these results were discussed in the main text.

      We appreciate the reviewer’s kind remarks. With regards to the results included in the main text vs the supplement, we attempted to strike a balance between having the main text remain communicative to a larger readership and providing experts with details they may find useful. We have, however, done our best for the supplementary analyses to be written clearly.

      I have three major concerns about this work. First, it's unclear how accurate the selection coefficient estimates are given the non-equilibrium demography of humans (pre-Out of Africa split, and thus not addressed by the separate population analyses). The authors do not make a big point about the selection coefficient estimates in the main section of the paper, so I don't find this to be a big problem. Still, some mention of this issue might be helpful to readers trying to interpret the results presented in the supplementary text.

      As the reviewer notes, we chose not to emphasize the inferred distributions of selection coefficients. Our main reason for this choice is the technical issue addressed in Appendix Section 1.5 (L561-564): “Second, thresholding potentially biases our estimates of the distribution of selection effects. While this bias is probably smaller than the bias without thresholding, its form and magnitude are not obvious. This is why we decided not to report the inferred distributions of selection effects in the Main Text.” We agree that if we were to focus on our estimates of the distribution of selection effects, the effects of demographic history would also need to be considered. This is, however, not the focus here.

      Second, I'm curious whether the composite likelihood BGS model could overfit any variance along the chromosome - even neutral variance. At some level, the composite likelihood approach may behave like a sort of smoothing algorithm, albeit with a functional form and parameters of a BGS model. The fact that there is information sharing across different regions with the same annotation class should in principle prevent overfitting to local noise. Still, there are two ways I think to address this overfitting concern. First, a negative neutral control could help - how much variation in diversity along the chromosome can this model explain in a purely neutral simulation? I imagine very little, likely less than 5%, but I think this paper would be much stronger with the addition of a negative control like this. Second, I think the main text should include the R2 values from out-sample predictions, rather than just the R2 estimates from the model fit on the entire data. For example, one could fit the model on 20 chromosomes, use the estimated θΒ parameters to predict variation on the remaining two. The authors do a sort of leave-one-out validation at the window level (Figure A31); however, this may not be robust to linkage disequilibrium between adjacent windows in the way leaving out an entire chromosome would be.

      The two requested analyses were done and their results are described above, in response to essential revisions (p. 2-3 here). In brief, there is no overfitting of neutral patterns or otherwise. We elaborate on why this finding is expected below.
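
      The chromosome-level out-of-sample check requested above can be sketched as follows. This is a minimal illustration only: the window dictionaries, the `fit`/`predict` callables and the toy through-origin regression are hypothetical stand-ins, not the composite likelihood BGS model used in the manuscript.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_chromosomes_out(windows, fit, predict, held_out):
    """Fit on every chromosome not in `held_out`, then score on the held-out ones.

    `windows` is a list of dicts with keys 'chrom', 'feature' and 'diversity'
    (hypothetical layout chosen for this sketch).
    """
    train = [w for w in windows if w["chrom"] not in held_out]
    test = [w for w in windows if w["chrom"] in held_out]
    params = fit(train)
    predicted = [predict(params, w["feature"]) for w in test]
    observed = [w["diversity"] for w in test]
    return r_squared(observed, predicted)


# Toy stand-in model (assumption): diversity = slope * feature.
def fit_slope(train):
    num = sum(w["feature"] * w["diversity"] for w in train)
    den = sum(w["feature"] ** 2 for w in train)
    return num / den


def predict_slope(slope, feature):
    return slope * feature


toy_windows = [
    {"chrom": 1, "feature": 1.0, "diversity": 2.0},
    {"chrom": 1, "feature": 2.0, "diversity": 4.0},
    {"chrom": 1, "feature": 3.0, "diversity": 6.0},
    {"chrom": 2, "feature": 4.0, "diversity": 8.0},
    {"chrom": 2, "feature": 5.0, "diversity": 10.0},
]
held_out_r2 = leave_chromosomes_out(toy_windows, fit_slope, predict_slope, {2})
```

      Holding out whole chromosomes, rather than single windows, keeps linked neighbouring windows from appearing in both the training and the test set.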

      Finally, I feel like this paper would be stronger with realistic forward simulations. The deterministic simulations described in the supplementary materials show the implementation of the model is correct, but it's an exact simulation under the model - and thus not testing the accuracy of the model itself against realistic forward simulations. However, this is a sizable task and efforts to add selection to projects like Standard PopSim are ongoing.

      We agree that forward simulations would be a nice addition, but believe that it is a project in itself. Indeed, a major complication is that when, for computational tractability, purifying selection is simulated in small populations with realistic population-scaled parameters, the reduction in diversity due to selection at unlinked sites has a major effect on neutral diversity levels (see, e.g., Robertson 1961). We hope to address this issue in future work. Meanwhile, we note that the theory that we rely on has been tested against simulations in the past (e.g., Charlesworth et al., 1993; Hudson and Kaplan, 1995; Nordborg et al., 1996).

    1. Author Response

      Reviewer #1 (Public Review):

      The largest point of improvement that I expect will unfold over this project's development lifecycle will be its documentation.

      This is indeed one of the biggest source code issues. In the next version, we plan to improve the source code documentation, add more examples and also some small HOWTOs for the RasPi setup.

      Reviewer #2 (Public Review):

      The Design section then introduces the actor model, the C++ library SObjectizer used to implement it, and the binary message protocol used for transmission of data across nodes. As currently written, however, this section seems overly technical and hard to grasp for readers who might be interested in experimental neuroscience, but who lack the expertise to understand all mentioned functional constructs and required expertise in the C++ language. Several concepts are mentioned only in passing and without introductory references for the non-expert reader. The level of detail also seems to distract from conveying a more meaningful understanding of the remaining trade-offs involved between network communication, latency, synchronization, and bandwidth.

      We wanted to briefly describe why the actor model was used and how it addresses the problems of multithreaded programming. We think most of the concepts should be understandable even without prior C++ knowledge. This is also why these concepts are only described briefly. For a more in-depth look, the SObjectizer project, for example, provides detailed documentation.
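
      For readers without a C++ background, the core idea of the actor model can be illustrated with a minimal Python sketch. It is an illustration only and does not use the SObjectizer API or any LabNet code; the `Actor` and `PinCounter` classes and the message layout are hypothetical. Each actor owns its state and processes messages one at a time from a mailbox, so no locks are needed around that state.

```python
import queue
import threading

_STOP = object()  # sentinel that tells an actor to shut down


class Actor:
    """Minimal actor: private state, one mailbox, one worker thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Thread-safe: any thread may post a message to this actor."""
        self._mailbox.put(message)

    def stop(self):
        """Process everything already queued, then terminate the worker."""
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class PinCounter(Actor):
    """Hypothetical actor that counts events per digital input pin."""

    def __init__(self):
        self.counts = {}  # touched only by the actor's own thread
        super().__init__()

    def receive(self, message):
        pin = message["pin"]
        self.counts[pin] = self.counts.get(pin, 0) + 1
```

      Several threads can call `send` concurrently; because only the actor's own thread mutates `counts`, the usual data races of shared-state multithreading do not arise.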

      The essence of the actor-model could probably be captured more succinctly, and more time spent discussing some of these critical decisions underlying LabNet's design principles. For example, although each Raspberry Pi device runs a LabNet server, the current implementation allows only one client connection per node. This might be surprising for some readers as it excludes a large number of possible network topologies, and the reason presented for the design decision as currently detailed is hard to understand without further clarification.

      We have removed some unnecessary details about the actor model. At the beginning of the Design section, we now describe in more depth why LabNet was designed as a distributed system and why this results in only one connection per node. The low cost of the hardware also made us prefer simplicity over more complex hardware topologies.

      The main method for evaluating the performance of LabNet is a series of performance tests in the Raspberry Pi comparing clients written in C++, C# and Python, followed by a series of benchmarks comparing LabNet against other established hardware control platforms. While these are undoubtedly useful, especially the latter, the use of benchmarking methods as described in the paper should be carefully revisited, as there are a number of possible confounding factors.

      For example, in the performance tests comparing clients written in C++, C# and Python, the Python implementation is running synchronously and directly on top of the low-level interface with system sockets, while the C++ and C# versions use complex, concurrent frameworks designed for resilience and scalability. This difference alone could easily skew the Python results in the simplistic benchmarks presented in the paper, which can leave the reader skeptical about all the comparisons with Python in Figure 3. Similarly, the complex nature of available frameworks also raises questions about the comparison between C# and C++. I don't think it is fair to say that Figure 3 is really comparing languages, as much as specific frameworks. In general, comparing the performance of languages themselves for any task, especially compiled languages, is a very difficult topic that I would generally avoid, especially when targeting a more general, non-technical audience.

      This is true; comparisons between different languages are always difficult. This is now explicitly addressed in the text. However, since the implementations in C++, C# and Python were so close in all tests, this is more a demonstration than a comparison: the language and framework on the client side are not really important, at least for the simple cases considered here.

      The second set of benchmarks comparing LabNet to other established hardware control platforms is much more interesting, but it doesn't currently seem to allow a fair assessment of the different systems. Specifically, from the authors' description of the benchmarking procedure, it doesn't seem like the same task was used to generate the different latency numbers presented, and the values seem to have been mostly extracted from each of the platform's published results. This unfortunately reduces the value of the benchmarks in the sense that it is unclear what conditions are really being compared. For example, while the numbers for pyControl and Bpod seem to be reporting the activation of simple digital input and output lines, the latency presented for Autopilot uses as reference the start of a sound waveform on a stereo headphone jack. Audio DSP requires specialized hardware in the Pi which is likely to intrinsically introduce higher latency versus simply toggling a digital line, so it is not clear whether these scenarios are really comparable. Similarly, the numbers for Whisker and Bpod being presented without any variance make it hard to interpret the results.

      We also saw this as a problem. Therefore, all tests were redesigned and repeated. All platforms were now subjected to the same test (with the exception of Whisker, for which we did not have suitable hardware available). In this way, we now have comparable measurements for all platforms.

      One of the stated aims of LabNet was to provide a system where implementing new functionality extensions would be as simple as possible. This is another aspect of experimental neuroscience that is under active discussion and where more contributions are very much needed. Surprisingly, this topic receives very little attention in the paper itself. It is not clear whether the actor model is by itself supposed to make the implementation of new functionality easier, but if this is the case, this is not obvious from the way the design and evaluation sections are currently written, especially given the choice of language being C++.

      One of the reasons behind the choice of Python for other hardware platforms such as pyControl and Autopilot is the growing familiarity and prevalence of Python within the neuroscience research community, which might assist researchers in implementing new functionality. Other open-hardware projects in neuroscience allowing for community extensions in C++ such as Open-Ephys have informally expressed the difficulty of the C++ language as a point of friction. I feel that the aim of "ease of extensibility" should merit much more discussion in any future revision of the paper.

      Indeed, the only mention of user extensibility is in the conclusion, where it is stated that it is not currently possible to modify LabNet without directly modifying and recompiling the entire code base. A software plug-in system is suggested, and indeed this would be extremely beneficial in achieving the second stated aim.

      By the simplicity of implementing new functionality, we meant the ease of adapting the LabNet source code to new requirements, which results from the modularization and the use of the actor model. This is now explained more explicitly in the text. And yes, a plug-in system is on our roadmap but is not yet part of LabNet.

      Finally, a set of example experimental applications would have been extremely useful to ground the design of LabNet in practical terms, in addition to the example listings. Even in diagrammatic form, describing how specific experiments have been powered by LabNet would give readers a better sense of the kind of designs that might be currently more appropriate for this platform. For example, video is being increasingly used in behavioral experiments, and Raspberry Pi drivers are available for several camera models, but this important aspect is not mentioned at all in the discussion, so readers interested in video would not know from reading this paper whether LabNet would be appropriate for their goals.

      The section "Example" actually shows how a simple experiment can be realized with LabNet. Listings 1-3 serve the same purpose.

      LabNet does not support video acquisition in the current version. Video transmission would be quite easy to implement, but we have not needed it in our experiments so far.

      As the manuscript currently stands, I don't feel the authors have achieved their second stated aim, and I am unfortunately not fully convinced that the experimental results are adequate to demonstrate the achievement of the first aim. I fully agree, however, that a robust, high-performance and flexible hardware layer for combining neuroscience instruments is desperately needed, and so I do expect that a more thorough treatment of the methods developed in LabNet will in the future have a very positive impact on the field.

      Latency measurements are indeed very important, not least because they allow a comparison with other tools. With the redesigned test and our own implementation for each tool, the measurements are now comparable. Of course, LabNet cannot beat Bpod: Bpod runs on a microcontroller, whereas LabNet has to send all commands via Ethernet. But the results are still very good. The stress test also demonstrates the scalability of LabNet. Above all, LabNet offers the possibility to control many systems at the same time, which other tools cannot do, or can do only in a complicated way.

    1. Author Response

      Reviewer #1 (Public Review):

      In their paper, Kroell and Rolfs use a set of sophisticated psychophysical experiments in visually-intact observers, to show that visual processing at the fovea within the 250ms or so before saccading to a peripheral target containing orientation information, is influenced by orientation signals at the target. Their approach straddles the boundary between enforcing fixation throughout stimulus presentation (a standard in the field) and leaving it totally unconstrained. As such, they move the field of saccade pre-processing towards active vision in order to answer key questions about whether the fovea predicts features at the gaze target, over what time frame, with what precision, and over what spatial extent around the foveal center. The results support the notion that there is feature-selective enhancement centered on the center of gaze, rather than on the predictively remapped location of the target. The results further show that this enhancement extends about 3 deg radially from the foveal center and that it starts ~ 200ms or so before saccade onset. They also show that this enhancement is reinforced if the target remains present throughout the saccade. The hypothesized implications of these findings are that they could enable continuity of perception trans-saccadically and potentially, improve post-saccadic gaze correction.

      Strengths:

      The findings appear solid and backed up by converging evidence from several experimental manipulations. These included several approaches to overcome current methodological constraints to the critical examination of foveal processing while being careful not to interfere with saccade planning and performance. The authors examined the spatial frequency characteristics of the foveal enhancement relative, hit rates and false alarm rates for detecting a foveal probe that was congruent or incongruent in terms of orientation to the peripheral saccade target embedded in flickering, dynamic noise (1/f) images. While hit rates are relatively easy to interpret, the authors also reconstructed key features of the background noise to interpret false alarms as reflecting foveal enhancement that could be correlated with target orientation signals. The study also - in an extensive Supplementary Materials section - uses appropriate statistical analyses and controls for multiple factors impacting experimental/stimulus design and analysis. The approach, as well as the level of care towards experimental details provided in this manuscript, should prove welcome and useful for any other investigators interested in the questions posed.

      Weaknesses:

      I find no major weaknesses in the experiments, analyses or interpretations. The conclusions of the paper appear well supported by the data. My main suggestion would be to see a clearer discussion of the implications of the present findings for truly naturalistic, visually-guided performance and action. Please consider the implication of the phenomena and behaviors reported here when what is located at the gaze center (while peripheral targets are present), is not a noisy, relatively feature-poor, low-saliency background, but another high-saliency target, likely crowded by other nearby targets. As such, a key question that emerges and should be addressed in the Discussion at least is whether the fovea's role described in the present experiments is restricted to visual scenarios used here, or whether they generalize to the rather different visual environments of everyday life.

      This is a very interesting question. While we cannot provide a definite answer, we have added a paragraph discussing the role of foveal prediction in more naturalistic visual contexts to the Discussion section (‘Does foveal prediction transfer to other visual features and complex natural environments?’). We pasted this paragraph in response to another comment in the ‘Recommendations for the authors’ section below. We suggest that “the pre-saccadic decrease in foveal sensitivity demonstrated previously[9] as well as in our own data (Figure 2B) may boost the relative strength of fed-back signals by reducing the conspicuity of foveal feedforward input”, presumably allowing the foveal prediction mechanism to generalize to more naturalistic environments with salient foveal stimulation.

      Reviewer #2 (Public Review):

      Humans and other primates move their eyes with rapid saccades to reposition the high-resolution region of the retina, the fovea, over objects of interest. Thus, each saccade involves moving the fovea from a pre-saccadic location to a saccade target. Although it has been long known that saccades profoundly alter visual processing at the time of saccade, scientists simply do not know how the brain combines information across saccades to support our normal perceptual experience. This paper addresses a piece of that puzzle by examining how eye movements affect processing at the fovea before it moves. Using a dynamic noise background and a dual psychophysical task, the authors probe both the performance and selectivity of visual processing for orientation at the fovea in the few hundred milliseconds preceding a saccade. They find that hit rates and false alarm rates are dynamically and automatically modulated by the saccade planning. By taking advantage of the specific sequence of noise shown on each trial, they demonstrate that the tuning of foveal processing is affected by the orientation of the saccade target suggesting foveal specific feedback.

      A major strength of the paper is the experimental design. The use of dynamic filtered noise to probe perceptual processing is a clever way of measuring the dynamics of selectivity at the fovea during saccade preparation. The use of a dual-task allows the authors to evaluate the tuning of foveal processing as well and how it depends on the peripheral target orientation. They show compellingly that the orientation of the saccade target (the future location of the fovea) affects processing at the fovea before it moves.

      There are two weaknesses with the paper in its current form. The first is that the key claim of foveal "enhancement" relies on the tuning of the false alarms. A more standard measure of enhancement would be to look at the sensitivity, or d-prime, of the performance on the task. In this study, hits and false alarms increase together, which is traditionally interpreted as a criterion shift and not an enhancement. However, because of the external noise, false alarms are driven by real signals. The authors are aware of this and argue that the fact that the false alarms are tuned indicates enhancement. But it is unclear to me that a criterion shift wouldn't also explain this tuning and the change in the noise images. For example, in a task with 4 alternative choices (Present/Congruent, Present/Incongruent, Absent/Congruent, Absent/Incongruent), shifting the criterion towards the congruent target would increase hits and false alarms for that target and still result in a tuned template (because that template is presumably what drove the decision variable that the adjusted criterion operates on). I believe this weakness could be addressed with a computational model that shows that a criterion shift on the output of a tuned template cannot produce the pattern of hits and false alarms.
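
      The distinction drawn above between a sensitivity change and a criterion shift can be made concrete with a short sketch. It assumes the standard equal-variance Gaussian signal detection model, which is an assumption of this illustration and not an analysis from the manuscript; the function names and example values are hypothetical. Under this model, moving the criterion alone raises hits and false alarms together while d-prime stays constant.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def rates_from(d_prime, criterion):
    """Hit and false alarm rates implied by the equal-variance Gaussian model."""
    hit_rate = _STD_NORMAL.cdf(d_prime / 2.0 - criterion)
    fa_rate = _STD_NORMAL.cdf(-d_prime / 2.0 - criterion)
    return hit_rate, fa_rate


def dprime_and_criterion(hit_rate, fa_rate):
    """Recover sensitivity (d') and criterion (c); rates must lie strictly in (0, 1)."""
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Same sensitivity, conservative versus neutral criterion (illustrative values).
conservative = rates_from(d_prime=1.0, criterion=0.5)
neutral = rates_from(d_prime=1.0, criterion=0.0)
```

      Here `neutral` has both a higher hit rate and a higher false alarm rate than `conservative`, yet `dprime_and_criterion` returns the same d-prime for both, which is why a joint rise in hits and false alarms is traditionally read as a criterion shift.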

      We thank the reviewer for this comment. We will present three arguments, each of which suggests that our effects are perceptual in nature and cannot be explained by a shift in decision criterion: (1) the temporal specificity of the difference in Hit Rates (HRs), (2) the spatial specificity of the difference in HRs and (3) the phenomenological quality of the foveally predicted signal. In general, a criterion shift would indeed affect hits and false alarms alike. Nonetheless, the difference in HRs only manifested under specific and meaningful conditions:

      First, the increase in congruent as compared to incongruent HRs, i.e., enhancement, was temporally specific: congruent and incongruent HRs were virtually identical when the probe appeared in a baseline time bin or one (Figure 2B) or even two (Figure 4A) early pre-saccadic time bins. Based on another reviewer’s comment, we collected additional data to measure the time course and extent of foveal enhancement during fixation. While pre-saccadic enhancement developed rapidly, enhancement started to emerge 200 ms after target onset during fixation. Crucially, these time courses mirror the typical temporal development of visual sensitivity during pre-saccadic attention shifts and covert attentional allocation, respectively[8,33]. We are unaware of data demonstrating similar temporal specificity for a shift in decision criterion. One could argue that a template of the target orientation needs to build up before it can influence criterion. Nonetheless, this template would be expected to remain effective after this initial temporal threshold has been crossed. In contrast, we observe pronounced enhancement in medium but not late stages of saccade preparation in the PRE-only condition (Figure 4A).

      Second, it has been argued that a defining difference between innately perceptual effects and post-perceptual criterion shifts is their spatial specificity[53]: in opposition to perceptual effects, criterion shifts should manifest in a spatially global fashion. Due to a parafoveal control condition detailed in our reply to the next comment, we maintain the claim that enhancement is spatially specific: congruent HRs exceeded incongruent ones within a confined spatial region around the center of gaze. We did not observe enhancement for probes presented at 3 dva eccentricity even when we raised parafoveal performance to a foveal level by adaptively increasing probe contrast. The accuracy of saccade landing or, more specifically, the mean remapped target location (Figure 3B) influenced the spatial extent of the enhanced region in a fashion that is reconcilable with previous findings[30]. A criterion shift that is both spatially and temporally selective, follows the time course of pre-saccadic or covert attention depending on observers’ oculomotor behavior, does not remain effective throughout the entire trial after its onset, is sensitive to the mean remapped target location across trials, and does not apply to parafoveal probes even after their contrast has been increased to match foveal performance, would be unprecedented in the literature and, even if existent, appear just as functionally meaningful as sensitivity changes occurring under the same conditions.

      Lastly and on a more informal note, we would like to describe a phenomenological percept that was spontaneously reported by 6 out of 7 observers in Experiment 1 and experienced by the author L.M.K. many times. On a small subset of trials, participants in our paradigms have the strong phenomenological impression of perceiving the target in the pre-saccadic center of gaze. This percept is rare but so pronounced that some observers interrupt the experiment to ask which probe orientation they should report if they had perceived two on the same trial (“The orientation of the normal probe or of the one that looked exactly like the target”). Interestingly, the actual saccade target and its foveal equivalent are perceived simultaneously in two spatiotopically separate locations, suggesting that this percept cannot be ascribed to a temporal misjudgment of saccade execution (after which the target would have actually been foveated). We have no data to prove this observation but nonetheless wanted to share it. Experiencing it ourselves has left us with no doubt that the fed-back signal is truly – and almost eerily – perceptual in nature.

      The analysis suggested by the reviewer is very interesting. Yet for several reasons stated in the ‘Suggestions to the authors’ section, our dataset is not cut out for an analysis of noise properties at this level of complexity. We had always planned to resolve these concerns experimentally, i.e., by demonstrating specificity in HRs. We believe that our arguments above provide a strong case for a perceptual phenomenon and have incorporated them into the Discussion of our revised manuscript.

      The second weakness is that the authors' claim that feedback is spatially selective to the fovea is confounded by the fact that acuity and contrast sensitivity are higher in the fovea. Therefore, the subject's performance would already be spatially tuned. Even the very central degree, the foveola, is inhomogeneous. Thus, finding spatially-tuned sensitivity to the probes may simply indicate global feature gain on top of already spatially tuned processing in the fovea. Another possible explanation that is consistent with the "no enhancement" interpretation is that the fovea has increased. This is consistent with the observation that the congruency effects were aligned to the center of gaze and not the saccade endpoint. It looks from the Gaussian fits that a single gain parameter would explain the difference in the shape of the congruent and incongruent hit rates, but I could not figure out if this was explicitly tested from the existing methods. Additional experiments without prepared saccades would be an easy way to address this issue. Is the hit rate tuned when there is no saccade preparation? If so, it seems likely that the spatial selectivity is not tuned feedback, but inhomogeneous feedforward processing.

      We fully agree. We do not consider a fixation condition diagnostic to resolve this question since, as of now, correlates of foveal feedback have exclusively been observed during fixation. In those studies, it was suggested that the effect, i.e., a foveal representation of peripheral stimuli, reflects the automatic preparation of an eye movement that was simply not executed[11,12,14]. To address another reviewer’s comment, we collected additional data in a fixation experiment. The probe stimulus could exclusively appear in the screen center (as in Experiment 1) and observers maintained fixation throughout the trial. While pre-saccadic congruency effects were significantly more pronounced and developed faster, congruency effects did emerge during fixation when the probe appeared 200 ms after the target. If pre-saccadic processes indeed spill over to fixation tasks to some extent and trigger relevant neural mechanisms even when no saccade is executed, we could expect a similar feedback-induced spatial profile during fixation. Since this matches the reviewer’s prediction if the pre-saccadic profiles resulted from inhomogeneous feedforward processing, we do not consider a fixation condition suitable to distinguish between both hypotheses.

      To test whether the tuning of enhancement is effectively a consequence of declining visual performance in the parafovea/periphery, we instead raised parafoveal performance to a foveal level by adaptively increasing the opacity of the probe: while leaving all remaining experimental parameters unchanged, we presented the probe in one of two parafoveal locations, i.e., 3 dva to the left or right of the screen center. Observers were explicitly informed about the placement of the probe. We administered a staircase procedure to determine the probe opacity at which performance for parafoveal target-incongruent probes would be just as high as foveal performance had been in the preceding sessions. While the foveal probe was presented at a median opacity of 28.3±7.6%, a parafoveal opacity of 39.0±11.1% was required to achieve the same performance level. As a result, the gray dot at 0 dva in the figure below represents the incongruent HR in the center of gaze and ranges at 80% on the y-axis. The gray dots at ±3 dva represent incongruent parafoveal HRs and also range at ~80% on the y-axis. Using the reviewer’s terminology, we effectively removed the influence of acuity- (or contrast-sensitivity-) dependent spatial tuning. If the spatial profiles had indeed been the result of “global feature gain on top of already spatially tuned processing”, this manipulation should render parafoveal feature gain just as detectable as foveal feature gain. Instead, congruent and incongruent parafoveal HRs were statistically indistinguishable (away from the saccade target: p = .127, BF10 = 0.531; towards the saccade target: p = .336, BF10 = 0.352), inconsistent with the idea of a spatially global feature gain.

      We had included these data in our initial submission. They were collected in the same observers that contributed the spatial profiles (Experiment 2). The data points at 0 dva in the reduced figure above correspond to the foveal probe location in Figure 2D. The data points at ±3 dva had been plotted and discussed in our initial submission, yet only very briefly. Based on this and another reviewer’s comment, we realize that we should have explained this condition more extensively in the main text rather than in the Methods and have added a dedicated paragraph to the Results section.

      This paper is important because it compellingly demonstrates that visual processing in the fovea anticipates what is coming once the eyes move. The exact form of the modulation remains unclear and the authors could do more to support their interpretations. However, understanding this type of active and predictive processing is a part of the puzzle of how sensory systems work in concert with motor behavior to serve the goals of the organism.

      Reviewer #3 (Public Review):

      This manuscript examines one important and at the same time little investigated question in vision science: what happens to the processing of the foveal input right before the onset of a saccade. This is clearly something of relevance as humans perform saccades about 3 times every second. Whereas what happens to visual perception in the visual periphery at the saccade goal is well characterized, little is known about what happens at the very center of gaze, which represents the future retinal location where the saccade target will be viewed at high resolution upon landing. To address this problem the authors implemented an elegant experiment in which they probed foveal vision at different times before the onset of the saccade by using a target, with the same or different orientation with respect to the stimulus at the saccade goal, embedded in dynamic noise. The authors show that foveal processing of the saccade target is initiated before saccade execution, resulting in the visual system being more sensitive to foveal stimuli whose features match those of the stimulus at the saccade goal. According to the authors, this process enables a smooth transition of visual perception before and after the saccade. The experiment is well designed and the results are solid. Overall, I think this work represents a valuable contribution to the field and its results have important implications. My comments below:

      1. The change in the overall performance between the baseline condition and when the probe is presented after the saccade target is large, but I wonder if there are other unrelated factors that contribute to this difference, for example, simply presenting the probe after vs before the onset of a peripheral stimulus, or the fact that in the baseline the probe is presented right after a fixation marker, but in the other condition there was a longer time interval between the presentation of the marker and the probe transient. The authors should discuss how these confounding factors have been accounted for.

      We thank the reviewer for this helpful comment. We would like to clarify that the probe was never presented right after the fixation dot. In the baseline condition, fixation dot and target were separated by 50 ms, i.e., the duration of one noise image. Since the fixation dot was an order of magnitude smaller than the probe (0.3 vs 3 dva in diameter) and since two large-field visual transients caused by the onset of a new background noise image occurred between fixation dot disappearance and probe appearance, we consider it unlikely that the performance difference was caused by any kind of stimulus interaction such as masking. Nonetheless, we had been puzzled by this difference already when inspecting preliminary results and wondered if it may reflect observers’ temporal expectations about the trial sequence. We therefore explicitly instructed and repeatedly reminded observers that the probe could appear before the peripheral target. Since the difference persisted, we ascribed it to a predictive remapping of attention to the fovea during saccade preparation, as we had stated in the Discussion.

      Another contributing factor may be that observers approached the oculomotor and perceptual detection tasks sequentially. In early trial phases, they may have prioritized localizing the target and programming the eye movement. After motor planning had been initiated, resources may have been freed up for the foveal detection task. Since on the majority of probe-present trials, the probe appeared after the saccade target, this strategy would have been mostly adaptive. Crucially, however, observers yielded similar incongruent Hit Rates in the baseline and last pre-saccadic time bin (70% vs 74%). While we observed pronounced enhancement in the last pre-saccadic bin, congruent and incongruent Hit Rates in the baseline bin were virtually identical. We therefore conclude that lower overall performance in the baseline bin did not prevent congruency effects from occurring. Instead, congruency effects started developing only after target appearance. We have added this potential explanation to the Results.

      1. Somewhat related to point 3, the authors conclude that the effects reported here are the result of saccade preparation/execution, however, a control condition in which the saccade is not performed is missing. This leaves me wondering whether the effect is only present during saccade preparation or if it may also be present to some extent or to its full extent when covert attention is engaged, i.e when subjects perform the same task without making a saccade.

      Foveal feedback has, as of now, exclusively been demonstrated during fixation (see references in Introduction and Discussion). In most of these studies, it was suggested that these effects (i.e., the foveal representation of a peripheral stimulus) may reflect the automatic preparation of an eye movement that was simply not executed[11,12,14]. Since foveal feedback has been demonstrated during fixation, and since eye movement preparation may influence foveal processing even when the eyes remain stationary, we considered it likely that congruency effects would emerge during fixation. Nonetheless, we agree with the reviewer that an explicit comparison between saccade preparation and fixation would enrich our data set and allow for stronger conclusions. We therefore collected additional data from seven observers. While all remaining experimental parameters were identical to Experiment 1, observers maintained fixation throughout each trial. We found that pre-saccadic foveal enhancement was more pronounced and emerged earlier than foveal enhancement during fixation. We present these data in the Results section (Figure 5) and have updated the Methods section to incorporate this additional experiment. We have furthermore added a paragraph to the Discussion which addresses potential mechanisms of foveal enhancement during fixation and saccade preparation.

      Furthermore, the reviewer’s comment helped us realize that we never stated a crucial part of our motivation explicitly. We now do so in the Introduction:

      “Despite the theoretical usefulness of such a mechanism, there are reasons to assume that foveal feedback may break down while an eye movement is prepared to a different visual field location. First and foremost, saccade preparation is accompanied with an obligatory shift of attention to the saccade target[6-8] which in turn has been shown to decrease foveal sensitivity[9]. Moreover, the execution of a rapid eye movement induces brief motion signals on the retina[20] which may mask or in other ways interfere with the pre-saccadic prediction signal. On a more conceptual level, the recruitment of foveal processing as an ‘active blackboard’[21] may become obsolete in the face of an imminent foveation of relevant peripheral stimuli – unless, of course, foveal processing serves the establishment of trans-saccadic visual continuity.”

      We believe that the additional data and the revisions to the Introduction and Discussion have strengthened our manuscript and thank the reviewer for this comment.

      1. Differently from other tasks addressing pre-saccadic perception in the literature here subjects do not have to discriminate the peripheral stimulus at the saccade goal, and most processing resources are presumably focused at the foveal location. Could this have influenced the results reported here?

      This is true. We intentionally made the features of the peripheral target as task-irrelevant as possible, contrary to previous investigations. We wanted to ensure that the enhancement we find would be automatic and not induced by a peripheral discrimination task, as we state in the Discussion and the Methods. We agree that the foveal detection task likely focused processing resources on the center of gaze in Experiment 1. In Experiment 2, however, we measured the spatial profile of enhancement which involved two different conditions:

      1. In each observer’s first six sessions, the probe could be presented anywhere on a horizontal axis of 9 dva length. On a given trial, an observer could not predict where it would appear, and therefore could not strategically allocate their attention. Nonetheless, enhancement of target-congruent orientation information was tuned to the fovea.
      2. In the final, seventh session, the probe appeared exclusively in one of two possible peripheral locations: 3 dva to the left or 3 dva to the right of the screen center. Observers were explicitly informed that the probe would never appear foveally, and processing resources should therefore have been allocated to the peripheral probe locations. The general performance level in this condition was comparable to performance in the fovea (see reply to the next comment). Nonetheless, we did not find peripheral enhancement of target-congruent information.

      Importantly, the magnitude of the foveal congruency effect in the PRE-only condition of Experiment 1 (i.e., when the target disappeared before the eyes landed on it) was comparable to the foveal congruency effect in Experiment 2 (PRE-only throughout), suggesting that the format of the task – i.e., purely foveal detection or foveal and peripheral detection – did not alter our findings.

      1. The spatial profile of the enhancement is very interesting and it clearly shows that the enhancement is limited to a central region. To which extent this profile is influenced by the fact that the probe was presented at larger eccentricities and therefore was less visible at 4.5 deg than it was at 0 deg? According to the caption, when the probe was presented more eccentrically the performance was raised to a foveal level by adaptively increasing probe transparency. This is not clear, was this done separately based on performance at baseline? Does this mean that the contrast of the stimulus was different for the points at +- 3 dva but the performance was comparable at baseline? Please explain.

      Based on the previous comment and comments of Reviewer #2, we realize that we should have explained this condition more extensively in the main text rather than in the Methods and have adapted the manuscript accordingly. As stated in our reply to the previous comment, Experiment 2 involved one session in which we addressed whether the lack of parafoveal/peripheral enhancement could be due to a simple decrease in acuity as mentioned by the reviewer. Observers were explicitly informed that the to-be-detected stimulus (the probe) would appear either 3 dva to the left or right but never in the screen center and were shown slowed-down example trials for illustration. Observers then performed a staircase procedure which was targeted at determining the probe contrast at which performance for parafoveal target-incongruent probes would be just as high as foveal performance for target-incongruent probes had been in the previous six sessions. While the foveal probe was presented at a median opacity of 28.3±7.6%, an opacity of 39.0±11.1% was required to achieve the same performance level at a 3 dva eccentricity. Therefore, the gray curve in Figure 2D that represents incongruent Hits reaches its peak just under 80% on the y-axis. The gray dots at ±3 dva also range at ~80% on the y-axis. The performance level for target-incongruent probes (‘baseline’ here) in the parafovea is thus equal to foveal performance for target-incongruent probes. Target-congruent parafoveal feature information had the same “chance” to be enhanced as foveal information in the preceding sessions. Although performance was equated, we found no parafoveal enhancement. This suggests that enhancement is a true consequence of visual field location and not simply mediated by visual acuity at that location.

      1. The enhancement is significant within a region of 6.4 dva around the center of gaze. This is a rather large region, especially considering that it extends also in the direction opposite to the saccade. I was expecting the enhancement to be more confined to the central foveal region. Was the effect shown in Figure 2D influenced by the fact that saccades in this task were characterized by a large undershoot (Fig 1 D)? Did the effect change if only saccades landing closer to the target were included in the analysis? There may not be enough data for resolving the time course, but maybe there are differences in the size of the main effect.

      Width of the profile: In general, the width of the enhancement profile is likely to be influenced by two experimental/analysis choices: the size of the probe stimulus presented during the experiment and the width of the moving window combining adjacent probe locations for analysis.

      Probe size: Since the probe itself had a comparably large diameter of 3 dva, even the leftmost significant point at -2.6 dva could be explained by an enhancement of the foveal portion of the probe. We had mentioned this briefly in the Discussion but realize that this point is crucial and should be made more explicit.

      Moving window width: We designed the experiment with the intention to densely sample a range of spatial locations during data collection and combine a certain number of adjacent locations using a moving window during analysis (see preregistration: https://osf.io/6s24m). To ensure the reliability of every data point, the width of this window was chosen based on how many trials were lost during preprocessing. We chose a window width of 7 locations as this ensured that each data point contained at least 30 trials on an individual-observer level. Nonetheless, the width of the resulting enhancement profile depends on the width of the moving window:

      We added these caveats to the Results section and incorporated the figure above into the Supplements. We now state explicitly that…

      “the main conclusions that can be drawn are that enhancement i) peaks in the center of gaze, ii) is not uniform throughout the tested spatial range as, for instance, global feature-based attention would predict, and iii) is asymmetrical, extending further towards the saccade target than away from it.”

      For the above reasons, the absolute width of the profile should be interpreted with caution.
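
The dependence of profile width on window width can be illustrated with a minimal sketch. It is not the analysis code used for the manuscript: the function name `pooled_hit_rate`, the equal trial counts, and the toy numbers are assumptions made purely for illustration. The sketch pools hits and trials over `width` adjacent probe locations and shows that an effect confined to a single location is spread over as many windows as the window is wide.

```python
import numpy as np

def pooled_hit_rate(hits_per_loc, trials_per_loc, width):
    """Pool hits and trials over `width` adjacent probe locations.

    Returns the pooled hit rate and the pooled trial count for every
    window position that fits entirely inside the sampled range.
    """
    kernel = np.ones(width)
    hits = np.convolve(hits_per_loc, kernel, mode="valid")
    trials = np.convolve(trials_per_loc, kernel, mode="valid")
    return hits / trials, trials

# Toy data (hypothetical): 9 locations, 10 trials each,
# 50% hits everywhere except a single location with 100% hits.
toy_hits = np.array([5, 5, 5, 5, 10, 5, 5, 5, 5])
toy_trials = np.full(9, 10)

for width in (1, 3, 5):
    rate, n = pooled_hit_rate(toy_hits, toy_trials, width)
    elevated = int((rate > 0.5 + 1e-9).sum())
    print(f"width={width}: {elevated} elevated windows, {int(n[0])} trials each")
```

Wider windows buy more trials per data point at the cost of spatial resolution, which is why the absolute width of a pooled profile says little about the width of the underlying effect.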

      Saccadic landing accuracy: To address the reviewer’s question, we inspected the spatial enhancement profile separately for trials in which the saccade landed on the target (i.e., within a radius of 1.5 dva from its center) or off-target but still within the accepted landing area. This trial separation criterion, besides appearing meaningful, ensured that all observers contributed trials to every data point. We had never resolved the time course in this experiment and could therefore not collapse across time points as suggested by the reviewer. To increase the number of trials per data point, we instead increased the width of the moving window sliding across locations from 6 to 9 neighboring locations (but see caveat above).

      Considering only saccades that landed on the target (‘accurate’; A) yielded significant enhancement from -2.6 to 2.1 dva and from 3.2 dva throughout the measured range towards the saccade target. Saccades that landed off-target (‘inaccurate’; B) showed a more pronounced asymmetry. When only considering inaccurate saccades, enhancement reached significance between -1.1 and 4.4 dva.

      The increased asymmetry for inaccurate saccades may be related to predictive remapping: since inaccurate saccades were hypometric on average, the predictively remapped location of the target was shifted towards the target by the magnitude of the undershoot. Asymmetric enhancement would therefore have boosted congruency at the remapped target location across all trials. In consequence, we inspected if aligning probe locations to the remapped target location on an individual-trial level would lead to a narrower profile for inaccurate saccades. This was not the case. Instead, we observed two parafoveal maxima (C). Their position on the x-axis equals the mean remapping-dependent leftwards (2.0 dva) and rightwards (1.9 dva) displacement across trials. In other words, they correspond to the pre-saccadic center of gaze. Note that these profiles could not be fitted with a mixture of Gaussians and were fitted using polynomials instead.  

      In sum, while we do not observe a clear narrowing of the enhancement profile for accurate saccades, the profile’s asymmetry is more pronounced for inaccurate eye movements. An increase in asymmetry could bear functional advantages since it would boost congruency at the remapped target location across all trials. Importantly though, this adjustment seems to rely on an estimate of average rather than single-trial saccade characteristics: aligning probe locations to the remapped attentional locus on an individual trial level provides further evidence that, irrespective of individual saccade endpoints, enhancement was aligned to the fovea. We have added these analyses to the Results section (Figure 3). We have also added the remapped profiles for all saccades and accurate saccades only to the Supplements.

      1. Is the size of the enhanced region around the center of gaze related to the precision of saccades? Presumably, if saccades are less precise a larger enhanced area may be more beneficial.

      This is a very interesting point. To address this question, we estimated each observer’s saccadic precision by computing bivariate kernel densities from their saccade landing coordinates. As we measured the horizontal extent of enhancement in our experiment, we defined the horizontal bandwidth as an estimate of saccadic imprecision. To estimate the size of the enhanced region for each observer, we created 10,000 bootstrapping samples for each observer’s congruent and incongruent HRs (4 locations combined at each step). We then determined the difference between the bootstrapped congruent and incongruent HRs and defined significantly enhanced locations as all locations for which <= 5% of these differences fell below zero. We then defined the width of the enhancement profile as the maximum number of consecutive significant locations.
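
The bootstrap criterion described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the original analysis: it assumes that single trials are resampled with replacement, that every location has the same number of trials, and it omits the pooling of 4 adjacent locations. The names `enhanced_width` and `longest_run` are hypothetical.

```python
import numpy as np

def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in mask:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def enhanced_width(cong, incong, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap test of congruent vs. incongruent hit rates per location.

    cong, incong: arrays of shape (n_locations, n_trials) holding 0/1 hits.
    A location counts as enhanced if at most `alpha` of the bootstrapped
    differences (congruent minus incongruent) fall below zero.
    """
    rng = np.random.default_rng(seed)
    cong = np.asarray(cong, dtype=float)
    incong = np.asarray(incong, dtype=float)
    n_loc = cong.shape[0]
    significant = np.zeros(n_loc, dtype=bool)
    for loc in range(n_loc):
        # Resample trials with replacement and recompute both hit rates.
        c = rng.choice(cong[loc], size=(n_boot, cong.shape[1])).mean(axis=1)
        i = rng.choice(incong[loc], size=(n_boot, incong.shape[1])).mean(axis=1)
        significant[loc] = np.mean((c - i) < 0) <= alpha
    return significant, longest_run(significant)
```

Calling `enhanced_width(cong, incong)` on one observer's trial matrices returns the per-location significance mask together with the width of the enhanced region, measured in consecutive locations.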

      Instead of a positive correlation, we observed a negative correlation between the bandwidth of landing coordinates (i.e., saccadic imprecision) and the size of the enhanced window (r = -.56, p = .117). In other words, there was a non-significant tendency that the less precise an observer’s saccades, the narrower their estimated region of enhancement. We furthermore inspected the magnitude of enhancement per position within the enhanced region. To do so, we computed the mean difference between congruent and incongruent HR across all positions in the enhanced region. The sizes of the orange circles in the figure above represent the resulting values (ranging from 2.9% to 13.3%). As saccadic precision decreases, the magnitude of enhancement per data point in the enhanced region tends to decrease as well. We therefore suggest that high saccadic precision is a sign of efficient oculomotor programming, which in turn allows peri-saccadic perceptual processes to operate more effectively. We added this analysis to the Supplements and refer to it in the Results section of the revised manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      Han et al. present important insights into the effect of interventions on the regional importation and within-country spread of SARS-CoV-2 variants. The authors combine phylogenetic and epidemiological approaches to study the introductions and spread of SARS-CoV-2 variants in The Netherlands. The manuscript is clear, concise, and well-written.

      We thank the reviewer for considering our manuscript.

      1. The main focus of the study is on the effect of international travel restrictions, but these restrictions are not well defined. Moreover, the effect of travel restrictions cannot be distinguished from other restrictions and interventions that were enforced. It seems more appropriate to focus the paper on the effect of collective interventions on SARS-CoV-2 introductions and spread, rather than focusing on international travel restrictions.

      To be clear, we are investigating specifically how travel restrictions targeted at countries where VOCs first emerged did not deter the introduction of VOCs into the Netherlands. The restrictions are now more clearly defined in the text:

      Line 227: “To deter the introductions of novel VOCs into the Netherlands, travel restrictions were imposed on countries where the VOCs first emerged, including the United Kingdom between December 2020 and March 2021 due to the emergence of Alpha; South Africa and Brazil between January and June 2021 due to Beta and Gamma respectively; and India from April to June 2021 due to Delta. These travel restrictions include a ban on all incoming passenger flights except for those carrying cargo and medical personnel, on top of an entry ban for all non-European Union residents (Government of the Netherlands, 2021). On the other hand, travel within the Schengen Area of the European Union, which includes the Netherlands, remained possible during this period.”

      We had tried to infer effects of other public health interventions on VOC introduction and spread in the Netherlands based on phylogenetic analyses. However, we did not include these analyses in this work as it was difficult to derive precise conclusions because: (1) we lacked travel history data of sampled Dutch individuals, including if they were imported cases or not. As such, we were unable to directly infer the impacts of interventions and reliably estimate frequency of overseas introductions; (2) there have been changes to individual protective behaviours over time (Sharma et al., Nat Comms, 2021) which made it difficult to disentangle these effects from those due to non-pharmaceutical interventions; and (3) like many other countries, a variety of interventions and relaxation of rules were imposed in the Netherlands that overlapped during the study period, further complicating efforts to disentangle the effects of different events.

      1. Most introductions originated from other European countries and it would be valuable to perform a more in-depth analysis at the country level to understand patterns of introductions within Europe.

      It is difficult to perform more in-depth analyses to elucidate country-level contributions of VOC introductions into the Netherlands due to non-uniform levels of genomic surveillance efforts between different countries, sampling biases, and a lack of data on travel history or on whether cases among the sampled Dutch individuals were likely imported. Our inability to perform more in-depth reconstruction of importation events and estimate country-level importation risks into the Netherlands is now discussed as a limitation to our analyses – see response to comment 11 below.

      1. The authors conclude that robust surveillance in regions of early spread is important for variant detection and outbreak control. Given the retrospective nature of this study (the studied variants have mostly disappeared after the emergence of omicron), it would be good to further discuss how future outbreak control can be achieved in a timely manner.

      We have now briefly discussed this in the last Discussion paragraph:

      Line 400: “As such, a robust level of surveillance efforts should still be maintained in these dominant source locations to provide timely actionable information on novel variant detection as well as infection control. These surveillance efforts should include maintaining a minimal level of clinical diagnostic testing capacity to ensure that clinical genomic surveillance remains sensitive enough for early detection of novel variants (Han et al., 2022). Wastewater surveillance could also be included to facilitate early variant detection and identify cryptic transmissions amid falling testing rates (Karthikeyan et al., 2022).”

    1. Author Response

      Reviewer #1 (Public Review):

      A discussion of how the current results fit into the larger framework of GRNs (both for known and novel genes) would provide a more complete context for the work. A schematic figure that maps these genes onto a GRN could be quite informative and clarifying.

      We thank the reviewer for this insightful comment. To address this, we have added discussion of how some of the genes identified here fit into the broader neural crest gene regulatory network using previously published data (Williams et al., 2019) (lines 384-386 and 392-394). While the idea of putting some of the novel genes described here into a GRN is very attractive, it is premature in the absence of functional data since it is not possible to construct a GRN on expression data alone; this would require epistatic/regulatory data which is outside the scope of the current work.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the reviewer for carefully reading the manuscript and for the insightful criticisms and comments. In the following we address them point by point.

      The community assembly process is modelled in a very specific way, and the manuscript would benefit from an expanded ecological motivation of the processes that are being mimicked, and thereby explain more clearly what taxonomic level of organization is being considered.

      We follow the more recent trait-based approach that shifts the focus from species (and the many traits by which they differ from one another) to groups of species that share the same values of selected functional traits. Since the general context is ecosystem response to drier climates, we choose the functional traits to include a response trait associated with stress tolerance and an effect trait associated with biomass production. We further assume a tradeoff between the two traits which is well supported by earlier studies (see e.g. Angert et al. 2009, https://doi.org/10.1073/pnas.0904512106). So, indeed, the choice we make in characterizing the community is quite specific, but it is highly relevant to the ecological context considered of dryland plant communities where plants compete primarily for water and light. The taxonomic level we consider is species except that we group them in a manner that is more transparent to questions of ecosystem function, ignoring differences between species that are not significant to these questions.

      We expanded considerably the text in the section “Modeling spatial assembly of dryland plant communities” to clarify the ecological motivation of the processes we model.

      In addition, it would be useful if the authors could provide further clarification as to what extent the community diversity dynamics can be separated from total biomass dynamics of patterned water-limited ecosystems given the current approach. These points are explained in further detail below.

      The model describes the dynamics of all functional groups, which provides the biomass distribution 𝐵 = 𝐵(𝜒) in trait space (in the case of patterned states we first integrate over space). That distribution contains information about various community-level properties, including functional diversity (richness, evenness) as figure 3 in the revised manuscript illustrates, and total biomass, which is the area below the distribution curve. The two types of dynamics are tightly connected and cannot be separated, but in principle the approach can be used to study the relationships between diversity and total biomass by calculating biomass distributions along the rainfall gradient and extracting the two properties from the distributions.

      We added in the section “Modeling spatial assembly of dryland plant communities” the information that the biomass distribution also contains information about the total biomass.

      First, it was not entirely clear to this reviewer how the reaction parts of the model equations determine the optimal trait value χ, and how this value varies as a function of precipitation.

      The ‘optimal’ trait value 𝜒𝑚𝑎𝑥 is determined by the interspecific interactions that the model captures, which divide into ‘direct’ and ‘indirect’ interactions. The direct interactions are captured by the dependence of the growth rate Λ𝑖 of the ith functional group (see Eq. (1a)) on the aboveground biomass values of all functional groups, Λ𝑖 = Λ𝑖(𝐵1,𝐵2,… , 𝐵𝑁) (see Eq. (2)). This dependence represents competition for light (taller plants are better competitors) and includes the effect of self-shading. The indirect interactions are through the water uptake term in the soil-water equation (1b) (2nd term from right) and the water dependence of the biomass growth term in Eq. (1a). These terms represent competition for water. For a given precipitation value 𝑃, the net effect of these interspecific interactions results in a particular functional group 𝜒𝑚𝑎𝑥 being most abundant. For spatially uniform vegetation, as 𝑃 is increased 𝜒𝑚𝑎𝑥 moves to lower values. The precipitation increases surface water (Eq. (1c)) and consequently the amount of water 𝐼𝐻 infiltrating into the soil. The increased soil water gives competitive advantage to species investing in growth, mainly because they better compete for light as they grow taller, and therefore 𝜒𝑚𝑎𝑥 decreases.

      … it is then not immediately clear why the most successful trait class is not outcompeting the other classes.

      With the current model and parameter set, the most successful trait does eventually outcompete all other traits when trait diffusion is set to zero, 𝐷𝜒 = 0. This is, however, a very long process because the most successful trait suffers from self-shading at late growth stages, which slows down its growth and allows nearby traits to survive for a long time. Choosing finite but very small 𝐷𝜒 values that represent mutations occurring on evolutionarily long times counteracts the exclusion process and results in a stationary asymptotic community, as Fig. 3 in the revised manuscript shows (this behavior is reminiscent of optical solitons, where self-focusing instability is balanced by dispersion). We note that modeling stronger growth-inhibiting factors, such as pathogens, by including a factor of the form (1 − 𝐵𝑖/𝐾) in the growth rate, results in an asymptotic stationary community also for 𝐷𝜒 = 0 (see also earlier studies Nathan et al. 2016, Yizhaq et al. 2020).

      We revised original Fig. 4 (now Fig. 3) by adding a new part (Fig. 3a) that shows the exclusion process for 𝐷𝜒 = 0, and the effect of the counter-acting process of trait diffusion, which results in an asymptotic distribution of finite width (Fig. 3b) from which community level properties such as functional diversity can be derived. We also extended the text in section “Modeling spatial assembly of dryland plant communities” (last paragraph) to clarify the two counter-acting processes of exclusion because of interspecific competition for water and light, and trait diffusion driven by mutations, which together culminate in an asymptotic biomass distribution along the 𝜒 axis of finite width.
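
The trait-diffusion term on its own can be pictured as a small flow of biomass between neighbouring trait classes. The following is a toy sketch only, not the model equations of the manuscript: it assumes an explicit Euler step, equally spaced trait classes, and no-flux boundaries at both ends of the trait axis, and it leaves out growth, competition, and water dynamics entirely. The function name `trait_diffusion_step` and all parameter names are hypothetical.

```python
import numpy as np

def trait_diffusion_step(biomass, d_chi, d_trait, dt):
    """One explicit Euler step of diffusion along the trait axis.

    biomass : biomass per trait class (1-D array)
    d_chi   : trait diffusion coefficient
    d_trait : spacing between neighbouring trait classes
    dt      : time step (stable if dt * d_chi / d_trait**2 <= 0.5)
    """
    b = np.asarray(biomass, dtype=float)
    # Difference between each class and its right-hand neighbour.
    flux = np.diff(b)
    # Discrete second derivative with no-flux boundaries: whatever
    # leaves one class enters its neighbour, so the total is conserved.
    lap = np.zeros_like(b)
    lap[:-1] += flux
    lap[1:] -= flux
    return b + dt * d_chi / d_trait**2 * lap

# Toy example: all biomass starts in the middle trait class.
start = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
after = trait_diffusion_step(start, d_chi=1e-6, d_trait=0.01, dt=1.0)
```

In this toy setting diffusion only broadens the distribution while conserving total biomass; the finite-width asymptotic distribution discussed in the reply arises from the balance between this broadening and competitive exclusion, which the sketch does not model.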

      The authors model trait adaptation through a diffusion approximation between trait classes. That is, every timestep, a small amount of biomass flows from the class with higher biomass to the neighboring trait class with lower biomass. From an ecological point of view, it seems that this process is describing adaptation of vegetation that is already present, so this process seems to be limited to intraspecific phenotypic plasticity. From the text, however, it seems that the trait classes correspond to higher taxonomic levels of organization, when describing shifts from fast growing to stress-tolerant species, for example. It is not entirely clear, however, how biomass flows as assumed in the model could occur at these higher levels of organization.

      In this work we do not study adaptation through diffusion in trait space. That kind of adaptive dynamics can indeed be studied with the current model, but with different initial conditions, namely, initial conditions corresponding to a single resident trait where the biomass of all other traits is zero. The resulting dynamics of mutations and succession are then very slow, occurring on evolutionarily long time scales set by the small value of 𝐷𝜒 (e.g. 10⁻⁶). In this study the initial conditions represent the presence of all traits, even if at very low biomass values that may represent a pool of seeds that germinate once environmental conditions allow. For a given precipitation value 𝑃, the functional traits we consider determine which functional groups (of species) overcome environmental filtering and grow, and which of the growing traits survive the competition for water and light. These are relatively fast processes, occurring on ecological time scales, which determine the emerging community. At longer times this community is further shaped by slow processes of interspecific competition among species of similar traits and by trait diffusion (mutations). A final remark about phenotypic changes: although in general 𝜒 can be interpreted as representing different phenotypes, the choice of very small values for 𝐷𝜒 cannot represent relatively fast phenotypic changes and restricts the context to mutations at the taxonomic level of species.

      We added an explanation in the 3rd paragraph of the section “Modeling spatial assembly of dryland plant communities” of the need to consider mutations and the role they play in our study.

      Combining the observations from the previous two points, there is a concern that for a given level of precipitation, there is a single trait class with optimal biomass/lowest soil water level that is dominant, with the neighboring trait classes being sustained by the diffusion of biomass from the optimal class to neighboring inferior classes. This would seem a bit problematic, as it would mean that most classes are not a true fit for the environment, and only persist due to the continuous inflow of biomass. Taking a clue from the previous papers of the authors, it seems this may not be the case, though. Specifically, in the paper by Nathan et al. (2016) it seems that all trait classes are started at low initial biomass density, and the resulting steady state (in the absence of biomass flows between classes) seems to show similar biomass profiles as shown in Figs. 4, 5 and 7 of the current paper. While the current model formulation seems slightly different, similar results may apply here. Indeed, keeping all trait classes at non-zero (but low) density, and when the (abiotic and biotic) environment permits, let each class increase in biomass seems like the most straightforward approach to model community assembly dynamics. Given the above discussion about these trait classes competing for a single resource (soil water), and one trait class being able to drive this resource availability to the lowest level, it would then be useful to readers to explain why multiple trait classes can coexist here, and how (for spatially uniform solutions) the equilibrium soil water level with multiple trait classes present compares to the equilibrium soil water level when only the optimal trait class is present. Furthermore, if results as presented in Nathan et al. (2016) indeed hold in the current case, perhaps it means that the biomass profile responses as shown in e.g. Fig. 5 would also occur if there was no biomass flow between trait classes included, but that the time needed to adjust the profile would take much longer as compared to when the drift term/second trait derivative is included. In summary, further clarification of what the biomass flows between classes represent, and the role it plays in driving the presented results would be useful for readers.

      As explained in the reply to previous comments, the asymptotic community is tuned by a balance between two slow counteracting processes: interspecific competition among similar traits and mutations over evolutionarily long time scales. However, the community structure is largely determined by the much faster processes of environmental filtering and interspecific competition among widely distinct traits, as all traits are initially present. Indeed, comparing the biomass distributions in new Fig. 3 with and without trait diffusion indicates that the community composition, as measured by 𝜒𝑚𝑎𝑥, is the same. Trait diffusion, however, does affect functional diversity, along with environmental factors. In that sense the emerging community is a true fit for the environment.

      We thank the reviewer for these thoughtful comments, which helped us realize that our presentation of these issues was too concise and unclear. We believe that the new extended section on modeling spatial assembly of dryland plant communities, and the new figure 3a clarify these issues.

      In addition, it would be useful for readers to understand to what extent the shifts in average trait values and functional diversity can be decoupled from the biomass and soil water responses to changes in precipitation that would occur in a model with only a single biomass variable. For example, early studies on self-organization in semi-arid ecosystems already showed that the shift toward a patterned state involved the formation of patches with higher biomass, and higher soil water availability, as compared to the preceding spatially uniform state, and that the biomass in these patches remains relatively stable under decreasing rainfall, while their geometry changes (e.g. Rietkerk et al. 2002). It has also been observed that for a given environmental condition, biomass in vegetation patches tends to increase with pattern wavelength (e.g. Bastiaansen and Doelman 2018; Bastiaansen et al. 2018). Given the model formulation, one wonders whether higher biomass in the single variable model is not automatically corresponding to higher abundance of faster growing species and a higher functional diversity (as the diffusion of biomass can cover a broader range when starting from higher mass in the optimal trait class). There are some indications in the current work that the linkage is more complicated, for example, the biomass peak in Fig. 7c is lower, but also broader as compared to the distribution of Fig. 7b, but it is currently not entirely clear how this result can be explained (for example, it might be the case that in the spatially patterned states, the biomass profiles also vary in space).

      We are not sure we understand what the reviewer means by “decoupled”, but much insight indeed can be gained from a study of a model for a single functional group (trait) and observing the behaviors described by the reviewer. In fact, these behaviors, which some of us are familiar with from numerical studies, motivated parts of the current study. Higher biomass in vegetation patches (compared to uniform vegetation) in the single trait model does not automatically imply a shift to faster growing species; in principle the stress-tolerant species that already reside in the system when uniform vegetation destabilizes to a periodic pattern can simply grow denser. To answer this and additional questions we need to take into account interspecific interactions by studying the full community model. As to Fig. 7b,c, the behavior appears to be opposite to that described by the reviewer: the biomass peak in Fig. 7c is higher and narrower than that in Fig. 7b, not lower and broader. This is because of the much larger domain of the patterned state as compared with that of the uniform state, which increases the abundance of low-𝜒 species, i.e. species investing in growth.

      The increase of biomass in vegetation patches with pattern wavelength for given environmental conditions, as observed by Bastiaansen et al. 2018, is actually another mechanism for increasing functional diversity. This is because the water stress at the patch center is higher than that in the outer patch areas and thus forms favorable conditions for stress tolerant species while the outer areas form favorable conditions for fast growing species.

      We added a new paragraph in the Discussion and Conclusion section (last paragraph in the subsection Insight III) where we discuss the effect of coexisting periodic patterns of different wavelengths on functional diversity and ecosystem management. We also added citations to the references the reviewer mentioned.

      The possibility of hybrid states, where part of the landscape is in a spatially uniform state, while the other part of the landscape is in a patterned state, is quite interesting. To better understand how such states could be leveraged in management strategies, it would be useful if a bit more information could be provided on how these hybrid states emerge, and whether one can anticipate whether a perturbation will grow until a fully patterned state, or whether the expansion will halt at some point, yielding the hybrid state. It seems that being able to distinguish this case would be necessary in the design of planning and management strategies.

      The hybrid states appear in the bistability range of the uniform and patterned vegetation states, and typically occupy most of this range. Their appearance is related to the behavior of ‘front pinning’ in bistability ranges of uniform and patterned states in general. Front pinning refers to fronts that separate a uniform domain and a periodic-pattern domain, which remain stationary in a range of a control parameter (precipitation in our case). This is unlike fronts that separate two uniform states, which always propagate in one direction or another and can be stationary only at a single parameter value – the Maxwell point. Thus, an indication that a given landscape may have the whole multitude of hybrid states is the presence of a front (ecotone) that separates uniform and patterned vegetation. If that front appears stationary over long periods of time (on average), this is a strong indication.

      We added a new paragraph in the subsection Insight III of the Discussion and conclusion section to clarify this point.

      Also, in Fig. 3a, the region of parameter space in which hybrid states occur is not very large; it is not entirely clear whether the full range of hybrid states is left out here for visual considerations, or whether these states only occur within this narrow range in the vicinity of the Turing instability point.

      As pointed out in the reply to the previous comment, the hybrid states are limited to the bistability range of uniform and patterned vegetation, which is not wide. However, this should not necessarily restrict management of ecosystem services by nonuniform biomass removal, as such management will have similar effects on community structure also outside the bistability range, where fronts propagate slowly.

      The new paragraph we added also addresses this point.

      Reviewer #2 (Public Review):

      We thank the reviewer for carefully reading the manuscript and for the constructive criticisms and comments. In the following we address them point by point.

      1) Model presentation.

      It would be better to explain the model in ecological terms first, clarifying parameter biological meaning and justifying their choice. In doing so, creating a specific 'Methods' section, which now is lacking, would be of help too. Authors should clarify whether and how the model follows the conservation of mass principle involving precipitation and evapotranspiration. Are root growth and seed dispersal included for this purpose? Why are they not referred to any further in the analysis and discussion? Why is a specific term for plant transpiration not included, or is it somehow phenomenologically incorporated into the growth-tolerance tradeoff? In doing so, authors should also pay attention to water balance as above (H) and below (W) ground water are not independent from each other.

      We added a Methods section, which in eLife is placed at the end of the manuscript. The section includes the model equations and more detailed explanations in ecological terms of various parts of the model. We also added Table 1 with a list of all model parameters, their descriptions, units and numerical values used in the simulations. Presenting the model at the end of the manuscript suits more technical information about the model, but not essential information that is needed for understanding the results. We therefore kept the subsection “A model for spatial assembly of dryland plant communities” in the Results section, where we present that information.

      There is no conservation of mass in the model (and all other models of this kind) simply because the system that we consider is open. In particular, it does not include the atmosphere, which constitutes part of the system’s environment. Including the atmosphere as additional state variables in the model, capturing the feedback of evapotranspiration on the atmosphere, would make the model too complicated for the kind of analysis we perform. So, although the model contains parts that represent mass conservation such as the terms describing below- and above-ground water transport, water mass is not conserved. The biomass variables represent aboveground biomass of living plants or plant parts and are not conserved either, as biomass production involves biochemical reactions that convert inorganic substances coming from the system’s environment (atmosphere and the soil) into organic ones, while plant mortality involves organic matter that leaves the system.

      Roots in the model platform we consider are modeled indirectly through their relation to aboveground biomass. That relation constitutes one of the scale-dependent feedbacks that produce a Turing instability to vegetation patterns, the so-called root-augmentation feedback (see Meron 2019, Physics Today), but in this particular study we eliminate this feedback for simplicity. The scale-dependent feedback that we do consider is the so-called infiltration feedback, associated with biomass-dependent infiltration rate that produces overland water flow towards vegetation patches, as explained in the subsection “A model for spatial assembly of dryland plant communities”. It will be interesting indeed to extend the study in the future to include also the root-augmentation feedback.

      We assume short-range seed dispersal and take it into account through biomass “diffusion” terms (obtained as approximations of dispersal kernels assuming narrow kernels). These terms play important roles in the scale-dependent feedback that induces the Turing instability, as is explained in earlier papers which we cite. Plant transpiration is modeled through the water uptake term in the equation for the soil water 𝑊. Indeed, above-ground water 𝐻 and below-ground water 𝑊 are not independent; the infiltration term IH in the equations for both state variables accounts for this dependence in a unidirectional manner (loss of 𝐻 and gain of 𝑊). As we do not include the atmosphere in the model, the other direction, namely evapotranspiration that increases air humidity and affects rainfall, is not accounted for. The neglect of this effect can be justified for sparse dryland vegetation.
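
      The step from a narrow dispersal kernel to a diffusion term can be sketched as follows. This is the standard textbook expansion in one spatial dimension, not a derivation taken from the manuscript; the kernel 𝑘, its variance σ² and the coefficient D_B are notation assumed here for illustration, and 𝑘 is assumed to be normalized and symmetric.

```latex
% Assumptions (illustrative notation): k is a normalized, symmetric kernel
% with zero mean and small variance \sigma^2; B is smooth on the scale of \sigma.
\begin{align*}
\int k(z)\, B(x - z)\, \mathrm{d}z
  &\approx \int k(z) \left[ B(x) - z\, \frac{\partial B}{\partial x}
     + \frac{z^{2}}{2}\, \frac{\partial^{2} B}{\partial x^{2}} \right] \mathrm{d}z \\
  &= B(x) + \frac{\sigma^{2}}{2}\, \frac{\partial^{2} B}{\partial x^{2}} ,
\end{align*}
% so the net redistribution (arrivals minus the local biomass B(x)) is
% proportional to D_B \, \partial^2 B / \partial x^2 with D_B \propto \sigma^2 / 2.
```

      The first-order term vanishes because a symmetric kernel has zero mean, which is why only a second-derivative ("diffusion") term survives at leading order.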

      These good points have already been discussed in many earlier papers as well as in the book Nonlinear Physics of Ecosystems (Meron 2015), and we cannot address them all in this paper. We did however add several clarifications in the section Modeling spatial assembly of dryland plant communities and in the new Methods section, including the consideration of the atmosphere as the system’s environment quantified by the precipitation parameter 𝑃.

      Another unclear point is that growth rates for the same plant functional groups are assumed to be constant among different species within the same group and are confounded by biomass production. Why is that the case? Furthermore, how many different species are characterizing each functional group? How are interspecific interactions accounted for (more specifically, see comment below)?

      In the trait-based approach we focus on just two functional traits, related to growth rate and tolerance to water stress, ignoring differences in other traits that distinguish species. That is, a given functional group consists of species that share the same values of the two selected functional traits (to a given precision determined by 𝑁), taking all other traits represented in the model to be equal. In this approach we do not care about how many species belong to each functional group, only their total biomass. We wish to add that simplifying assumptions of this kind are necessary if we want the model to be mathematically tractable and capable of providing deep insights by mathematical analysis.

      We expanded the discussion of the trait-based approach in the section Modeling spatial assembly of dryland plant communities and added relevant references (second paragraph).

      Finally, stress tolerance is purely phenomenological. There is no actual mechanism/parameter describing it. Rather, it "simply" appears as low/high mortality, which in turn is said to be due to high/low tolerance. This leads to a sort of circularity between mortality and tolerance. Yet, mortality can occur due to other biophysical factors (e.g. disturbance, fire, herbivory, pathogens). A drawback of this assumption is that a mechanism of drought tolerance is often to invest in belowground organs, including roots. However, according to the proposed model, it turns out that fast growing species with low investment in tolerance also have high investment in roots; vice versa, tolerant species have low investment in roots. This is a bit counterintuitive and not well biologically supported.

      First, we agree with the reviewer that our approach is purely phenomenological, as we model tolerance to water stress by a single parameter that lumps together the effects of various physiological mechanisms. That parameter can be distinguished from other factors affecting mortality by regarding the constant 𝑀𝑚𝑎𝑥 in Eq. (3) as representing several contributions. Since we do not study the effects of these other factors we can absorb them in 𝑀𝑚𝑎𝑥 for mathematical simplicity. Tolerance to water stress is not necessarily associated with roots. Plants can better tolerate water stress by reducing transpiration through stomatal closure, regulating leaf water potential, or developing hydraulically independent multiple stems that lead to a redundancy of independent conduits and higher resistance to drought (see Schenk et al. 2008 - https://doi.org/10.1073/pnas.0804294105).

      We added a discussion in the Methods section (5th paragraph, “Tolerance to water stress …”) of the simple form by which we model tolerance to water stress through the mortality parameter.

      2) Parameter choice.

      N = 128 is an extremely high number for plant functional groups. It is even quite unrealistic to have 128 species per square meter, so this value is not very reasonable. Please run the model and report results with more realistic N (e.g from 4-64) as well as with different sets of N values keeping all other parameters constant.

      We wish to clarify two points: 1) N=128 does not imply 128 functional groups per square meter; the emerging community has much lower functional richness (FR) as the average FR is around 0.25, meaning only 128 × 0.25 = 32 functional groups. 2) The model results, as reflected by the key metrics 𝜒𝑚𝑎𝑥, 𝐹𝑅, and 𝐹𝐸, are independent of the particular value of N (for N values sufficiently large), as Figures IA and IB below show. The biomass 𝐵𝑖 of each functional group, however, does change (Figure IA) because by changing N we change the range of traits Δ𝜒 = 1/𝑁 that belongs to a given functional group. But if we look at the biomass density in trait space 𝑏𝑖, related to 𝐵𝑖 through the relation 𝐵𝑖 = 𝑏𝑖Δ𝜒, then the biomass density is also independent of 𝑁, as Figure IB shows. So, even if in practice there are fewer functional groups, and thus species, than considered in the model studies, the results are not affected by that. On the other hand, choosing higher 𝑁 values provides smoother curves and a nicer presentation of our results.

      Figure IA

      Figure IB

      We added a discussion of this issue in the Methods section after Eq. (2).
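
      The distinction between the per-group biomass 𝐵𝑖 and the density 𝑏𝑖 can be illustrated with a small numerical sketch. It does not run the community model: `example_density` is a hypothetical smooth profile chosen only for illustration, and the bin layout (equal-width groups on 0 < 𝜒 < 1, evaluated at bin centers) is an assumption. It only shows how 𝐵𝑖 = 𝑏𝑖Δ𝜒 scales with N while the density and the total biomass do not.

```python
import math


def example_density(chi):
    # Hypothetical biomass density in trait space (NOT model output):
    # a Gaussian bump centered at chi = 0.4 with width 0.05.
    return math.exp(-((chi - 0.4) ** 2) / (2 * 0.05 ** 2))


def discretize(density, n_groups):
    """Split 0 < chi < 1 into n_groups equal-width functional groups.

    Returns the bin centers, the per-group biomass B_i = b_i * d_chi,
    and the density b_i evaluated at the bin centers.
    """
    d_chi = 1.0 / n_groups
    centers = [(i + 0.5) * d_chi for i in range(n_groups)]
    b = [density(c) for c in centers]
    B = [b_i * d_chi for b_i in b]
    return centers, B, b


summary = {}
for n in (32, 64, 128):
    centers, B, b = discretize(example_density, n)
    summary[n] = {"peak_B": max(B), "peak_b": max(b), "total": sum(B)}
    print(n, summary[n])

# Number of groups actually present for N = 128 and FR of about 0.25,
# using the figures quoted in the reply above.
groups_present = int(128 * 0.25)
```

      In this sketch the peak of 𝐵𝑖 roughly halves each time N doubles, whereas the peak of 𝑏𝑖 and the summed biomass stay essentially the same, which is the behavior described for Figures IA and IB.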

      Gamma (rate of water uptake by plants' roots): why is it in that unit of m^2/kg * y? Why are you now considering the area (and not the volume) per biomass unit?

      The vegetation pattern formation model we study, like most other models of this kind, does not explicitly capture the soil depth dimension. Accordingly, W is interpreted as the soil-water content in the soil volume below a unit ground area within the reach of the plant roots. In practice W has units kg/m², like B, and since Γ𝑊𝐵 should have the same units as 𝜕𝑊/𝜕𝑡 (see Eq. 1b), Γ must have the units of (𝐵𝑡)⁻¹.
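
      Written out, the dimensional argument reads as follows. This is only a bookkeeping sketch; it assumes time is measured in years, as in the unit quoted by the reviewer.

```latex
% Units: [W] = [B] = kg/m^2, [t] = y (years, assumed from the reviewer's comment).
\[
[\Gamma W B] = \left[ \frac{\partial W}{\partial t} \right]
\;\Longrightarrow\;
[\Gamma] \cdot \frac{\mathrm{kg}}{\mathrm{m}^{2}} \cdot \frac{\mathrm{kg}}{\mathrm{m}^{2}}
  = \frac{\mathrm{kg}}{\mathrm{m}^{2}\,\mathrm{y}}
\;\Longrightarrow\;
[\Gamma] = \frac{\mathrm{m}^{2}}{\mathrm{kg}\,\mathrm{y}} = \frac{1}{[B]\,[t]} .
\]
```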

      A is not defined in the text.

      We now define it in Table 1 (see Methods section).

      M min: why 0.5 mortality? Having M max set to 0.9, please consider a lower mortality value set to 0.1, and please report evidence (hopefully) demonstrating the robustness of results to such change.

      The results are robust to the particular values of 𝑀𝑚𝑖𝑛 and 𝑀𝑚𝑎𝑥, except that there are combinations of these two parameters for which the biomass distributions are pushed towards the edge of the 𝜒 domain, which makes the presentation of the results less clear. Figure II shows results of recalculations of the distribution 𝐵 = 𝐵(𝜒) for 𝑀𝑚𝑖𝑛 = 0.1, as requested (using 𝑀𝑚𝑎𝑥 = 0.15) for 3 different precipitation values. As the reviewer can see, there is no qualitative change in the results: lower precipitation pushes a uniform community to stress-tolerant species (higher 𝜒), while the formation of patterns at yet lower precipitation pushes the community back to fast growing species (low 𝜒).

      Figure II

      K_min and K_max are in two different units, and should both be kg/m^2.

      Thanks, we fixed this typo in Table 1.

      Values of precipitation (P, mean annual precipitation) are not reported.

      The precipitation parameter is variable, as is now stated in Table 1, and therefore was not included in the list of parameter values used. Whenever a particular precipitation value has been used our intention was to state it in the caption of the corresponding figure. This was done in Figs. 5,6,7, but indeed not in Fig. 4 (Fig. 3 in revised ms.). The insets on the right side of Fig. 3 (Fig. 4 in revised ms.) were also calculated for particular precipitation values, but that information is not essential as the intention is to show typical forms of the various solution branches, which do not qualitatively change along the branches (i.e. at different P values).

      We added the precipitation value (P=180mm/y) at which all the biomass distributions shown in new Fig. 3 (Fig. 4 in original ms) were calculated.

      3) Results presentation and interpretation.

      Parameter range of precipitation in figure 3 is odd. Why in one case precipitation ranges from 0 to 160 while in another it is only 60-120? Furthermore, in paragraph 198-213 and associated results in fig. 5, the choice of precipitation values is somehow discordant from the previous model. Please provide motivation for this choice, clarify and uniformize it.

      In Fig. 3b (Fig. 4b in revised ms) we restricted the precipitation range to 60-120 as the curves, which are limited to 0 < 𝜒 < 1 (by the definition of 𝜒), do not extend to 𝑃 < 60 and to 𝑃 > 120. Extending the range to 0 < 𝑃 < 160 would make the figure less compact and nice as it will contain blank parts with no information.

      We are not sure we understand what the reviewer means by “is somehow discordant from the previous model”. The motivation of the choices we made for the precipitation values P=150, 100 and 80 was to show the shift of a spatially uniform community to a higher 𝜒 value as the precipitation is decreased to a lower value (from 150 to 100), and the shift back to a lower 𝜒 value at yet lower precipitation (80) past the Turing instability.

      Finally, authors seem to create confusion around community composition, which is defined as the (taxonomic) identity of all different species inhabiting a community. Notably, it is remarkably different from the x_max parameter used in the model, which as a matter of fact is just the value of the most productive (notably, not necessarily the most abundant) functional group.

      We thank the reviewer for this comment. Since all the emerging communities in the model studies are pretty localized around the value of 𝜒𝑚𝑎𝑥, that value does contain information about the identity of other functional groups in the community when complemented by FR (functional richness) and FE (functional evenness). More significantly to our study, shifts in 𝜒𝑚𝑎𝑥 represent the shifts in community composition we focus on in this study, i.e. shifts towards fast growing species or towards stress-tolerant species.

      We modified the description of the community-level properties that can be derived from the biomass distribution in trait space (see modified text towards the end of the section “Modeling spatial assembly …” and also the caption of Fig. 3b), explaining that both functional diversity and community composition can be described by several metrics, and clarifying the significance of 𝜒𝑚𝑎𝑥 in describing community-composition shifts.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors explore mechanisms involved in the predation of other bacteria by Myxococcus xanthus. The major findings are (1) M. xanthus cells depend on gliding motility to efficiently invade an E. coli prey colony. (2) E. coli prey cells are lysed in a contact-dependent manner. (3) When M. xanthus cells make prey contact, they sometimes pause and then kill the prey cell. (4) Using a genetic screen, two gene clusters (referred to as the kil gene clusters) are identified that encode proteins, some of which have homology to those of Tad pili. Some of the Kil proteins are important for pausing of cells and killing of prey. (5) One of the suggested Kil proteins assemble to form clusters upon prey contact; however, assembly of these clusters is independent of other Kil proteins. On the basis of these findings the authors suggest that the Kil proteins assemble to form a Tad pilus system and are important for pausing and prey killing. Overall, this is an interesting manuscript; however, it remains unclear what the actual function of the identified Kil proteins are.

      The reviewer raised an important point because it is correct that the previous data did not formally establish that a Tad-like machinery is recruited at the prey contact site. Addressing this point was challenging because it required either demonstrating direct interactions between KilD and Tad structural components or showing that predicted Tad core components also localize dynamically upon contact with the prey. This latter possibility nevertheless required obtaining functional fluorescent protein fusions, which are typically difficult to obtain for membrane proteins. Below we describe the strategies we chose to address the reviewer’s comments.

      Weaknesses include

      (1) The lack of genetic complementation experiments. Thus, it is unclear precisely which of the Kil proteins are important for predation.

      This question is especially relevant for the cluster 2 genes, given that their functional association with the cluster 1 genes was only supported genetically in the previous version. In this cluster, we have chosen to only delete the genes annotated as Tad-like proteins, namely the two IM platform proteins CpaA and CpaG, the outer membrane accessory protein CpaB and three predicted pilin homologs. We did not attempt complementing the pilin deletions given that they all show at best intermediate phenotypes when individually deleted and that a triple deletion is needed to obtain a kil-null phenotype. This strongly argues against polar effects in the pilin deletion mutants. However, we agree that it was important to show that the mutations in the Cpa homologs were not polar, demonstrating the critical function of these genes. In this version, for coherence, we chose to complement core Tad components encoded in clusters 1 and 2: the secretin (cluster 1), the ATPase (cluster 1) and the two IM platform proteins (cluster 2). These complementations are now provided as proof that these genes are all essential for predation in a new Figure 5a and S4b.

      (2) The Kil proteins are encoded in two gene clusters. The evidence that these proteins make up a Tad pilus system is based on homology and that mutations in both clusters result in reduced predation. No evidence is presented that proteins encoded by these two clusters interact to form a Tad pilus machine.

      (3) The authors localize the Kil system using an NG-KilD fusion; however, there is no evidence that KilD, which is a FHA domain-containing protein, associates with the Tad pilus machinery. In fact, KilD makes clusters independently of all other Kil proteins tested suggesting that these clusters may not report on Kil assembly and activity. An equally plausible scenario is that Myxococcus/E. coli contacts result in activation of KilD leading to the formation of foci. These foci then signal assembly of the Kil system somewhere in a cell (or maybe not). Therefore, it is not clear where and if this machinery localizes during prey contact.

      These two points are related so we answer them jointly.

      Showing direct interactions between Tad proteins is challenging and in fact, there is currently very little interaction data for these machineries, in contrast to Type-IV pili and Type-2 secretion systems.

      For this reason, we chose a localization approach, reasoning that localization of Tad core components in contact with E. coli would show that the system is assembled at the prey contact site. We now present data showing that KilF (the ATPase encoded by cluster 1) and KilG (the CpaG homolog) both form clusters at the prey contact site, similar to KilD. Since these proteins are predicted to form complementary parts of the Tad machinery and are encoded by clusters 1 and 2, respectively, we believe that these results demonstrate dynamic assembly of the machinery at the prey contact site.

      (4) I did not find a description of how the mutagenesis was done. Please include a description of how the mutagenesis was done, how many mutants were screened, and in which loci the mutations (transposon insertions?) occurred. Was the screen saturated?

      We did not perform an extended genetic screen to find the kil genes. We tested a number of selected mutants in predicted membrane complexes, including all A-motility genes, possible orthologs of the Caulobacter Cdcz system, a possible CDI system, T6SS genes and decarboxylase genes and the kil cluster 1 and 2 genes. Mutations in the kil cluster 1 and 2 genes were the only ones to show a killing defect so we followed up. We do not mention all the tested genes, given that they were not investigated in depth and rapidly discarded as negative candidates.

      We nevertheless clarified the text to avoid any confusion.

      (5) Throughout the manuscript, the authors need to tone down their conclusions and stick to what they actually show. It is also important that the authors present their results in the context of what is already known about contact-dependent killing in M. xanthus.

      We believe that this comment was mostly an objection to our inference that our data previously showed assembly of the Tad pilus at the contact site. The new data strongly reinforces this view. We nevertheless carefully rewrote the manuscript making sure that the conclusions are indeed in line with the data.

    1. Author Response

      Reviewer #1 (Public Review):

      The general idea of comparing response patterns to stress in the offspring generation is new and very interesting.

      We thank Reviewer 1 for their time and thoughtful comments. We agree that these comparisons are new and very interesting and have added multiple revised analyses to the manuscript based on the reviewer comments that we think will further enhance the impact of and conclusions made in this study.

      However, the data that are presented are in several ways preliminary. The phenotype comparisons are mostly convincing, although statistical treatments are partly unclear, given that each "replicate" includes itself many individuals.

      The statistical treatments for groups of individuals are the same as in Burton et al., 2017, Burton et al., 2020, and Willis et al., 2021, which include the original reports of the intergenerational responses studied here. Replicates that include many individuals are relatively common when working with C. elegans and are usually compared using ANOVA or Student’s t-tests (depending on the number of comparisons) to analyze the variation in batch effects as well as differences between populations of animals.

      We believe this ability to assay hundreds or even thousands of animals, in total, for each comparison in this study makes our data substantially stronger and more reliable. However, we are happy to perform any additional statistical tests the reviewer might want to see.

      The transcriptomic data are minimal (only three replicates)

      To address this comment, we compared our original three replicates of RNA-seq from F1 animals from C. elegans parents exposed to P. vranovensis BIGb0446 to a second, independent set of three replicates of F1 animals from C. elegans parents exposed to a second P. vranovensis isolate (BIGb0427 – the data for this second P. vranovensis isolate was already part of Fig. 4 of this manuscript).

      By comparing these three new replicates to our previous findings from three original replicates we found that 515 of the 562 genes that exhibited a >2-fold change and were significant at padj <0.01 in the original three replicates were also changed at >2-fold and padj <0.01 in the new three replicates. We believe our findings that 91.6% of genes change >2-fold and remain significant at padj<0.01 even when the number of replicates is doubled (and a different isolate of P. vranovensis is used!) suggests that adding additional replicates would not substantially change the conclusions of this manuscript.
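
      The overlap calculation described above can be sketched in a few lines. This is only an illustration of the cutoffs (>2-fold change, i.e. |log2 fold change| > 1, and padj < 0.01) applied to two result tables; the gene names and values in the toy tables are hypothetical, and the sketch does not check whether the direction of change agrees, which the reply does not specify. The final line simply recomputes the reported percentage from the two counts given in the text.

```python
def significant(genes, min_abs_log2fc=1.0, max_padj=0.01):
    """Return the names of genes passing both cutoffs.

    `genes` maps gene name -> (log2 fold change, adjusted p-value).
    A >2-fold change corresponds to |log2FC| > 1.
    """
    return {
        name
        for name, (log2fc, padj) in genes.items()
        if abs(log2fc) > min_abs_log2fc and padj < max_padj
    }


def replication_rate(original, replicate):
    """Fraction of the original significant set also significant in the replicate."""
    return len(original & replicate) / len(original)


# Hypothetical toy tables (NOT data from the study).
set_a = {"gene1": (2.3, 1e-5), "gene2": (-1.8, 0.002),
         "gene3": (0.4, 0.0001), "gene4": (1.5, 0.03)}
set_b = {"gene1": (2.0, 1e-4), "gene2": (-0.9, 0.001),
         "gene3": (1.2, 0.001), "gene4": (1.6, 0.001)}

sig_a = significant(set_a)  # gene1 and gene2 pass both cutoffs
sig_b = significant(set_b)  # gene1, gene3 and gene4 pass both cutoffs
toy_rate = replication_rate(sig_a, sig_b)

# Counts quoted in the reply: 515 of 562 genes replicated.
reported_rate = 515 / 562
print(f"toy overlap: {toy_rate:.2f}, reported overlap: {100 * reported_rate:.1f}%")
```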

      We would also like to highlight, as above, that because this analysis was done on populations of thousands of similarly staged animals, as opposed to individuals, the variability between replicates is further reduced. In addition, much of our transcriptomic data from each species was then compared across species, and genes were only analyzed if they changed in multiple different species, each of which represents a separate set of three additional replicates [i.e., genes that change in all 4 species analyzed have to exhibit significant (>2-fold, padj <0.01) changes across 12 total replicates].

      Our new findings comparing six replicates did not substantially change the number of genes identified when compared to using three replicates, and the fact that for all of the main conclusions of this manuscript each set of triplicates from one species was then compared across 9 additional replicates from three other species from pools of thousands of animals makes us very confident that our results are robust and highly reproducible.

      and lack comparison to the stress responses in the parental animals.

      We agree with Reviewer 1 that comparisons to parental animals are interesting and important. Comparisons of F1 progeny gene expression patterns to parental animals were not included here because such comparisons were previously published in some of our original reports of these intergenerational effects (For example, see Burton et al., 2020). In summary, we found that most, but not all, of the effects on gene expression in F1 animals were also detected in parental animals. However, the transcriptional responses only turn on in F1 animals post gastrulation and do not appear to be due to the simple deposition of parental mRNAs into embryos (Burton et al., 2020).

      We have updated the text to highlight these findings.

      The analysis of the transcriptome data is limited to counting overlaps between significantly changed genes, without deeper discussion of the genes and pathways that are affected.

      In the revised manuscript we have completely redone all of the transcriptomic analysis to use a stricter set of cutoffs for significance – both padj <0.01 and requiring a >2-fold change in expression based on the helpful comments of Reviewer 1 – which we agree with – see below.
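
      As a rough illustration of the stricter cutoffs, the sketch below keeps a gene only if it passes both thresholds. It assumes effect sizes are stored as log2 fold changes, so that a >2-fold change in either direction corresponds to an absolute log2 fold change above 1; the `GeneResult` record and its field names are hypothetical.

```python
import math
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float
    padj: float


def passes_cutoffs(result: GeneResult,
                   min_fold: float = 2.0,
                   max_padj: float = 0.01) -> bool:
    """True if the gene changes by more than min_fold (up or down)
    and its adjusted p-value is below max_padj."""
    return (abs(result.log2_fold_change) > math.log2(min_fold)
            and result.padj < max_padj)
```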

      As part of this new analysis we have now also included a deeper discussion of the genes that exhibited similar changes across species, including using g:Profiler to examine the genes that exhibited changes across all four species.

      In addition, we have now paired our phenotypic and transcriptomic data across species to identify 19 new genes that we predict are highly likely to be involved in intergenerational responses to stress based on their expression patterns across species. These 19 genes come out of highly filtered analyses across species that identified a total of 23 genes that change only in species that adapt to P. vranovensis or osmotic stress and not in species that do not adapt.

      Interestingly, this analysis identified nearly all of the previously known genes involved in intergenerational adaptations to these stresses including rhy-1, cysl-1, cysl-2 and gpdh-1. Thus, we predict the remaining 19 genes that came out of this analysis are highly likely to be involved in the responses to these stresses. Furthermore, in the revised text we highlight that our new list of 19 genes includes multiple conserved factors that are required for animal viability including genes involved in nuclear transport (imb-1 and xpo-2), the CDC25 phosphatase ortholog cdc-25.1, and the PTEN tumor suppressor ortholog daf-18. This new analysis will likely form the basis for future investigations into the mechanisms underlying these exciting intergenerational effects.

      We believe this additional analysis greatly improves this manuscript. We are also happy to include any specific additional analysis the reviewer would like to see.

      The top response genes that are directly tested have been discovered before. Hence, while interesting patterns are evident from the data, this work largely confirms prior work, including that described in Burton et al. 2020.

      We have revised the text to highlight that the aims of this particular study were to determine if multigenerational responses to stress were evolutionarily conserved at any level, as well as to determine the potential costs of such effects and the specificity of the responses. These questions were not addressed in any previous study of multigenerational effects, including Burton et al., 2020. Because of the aims of this study we believe it was critical to focus on genes that had an established role in these intergenerational responses in C. elegans and to compare and contrast the behavior and requirement of these genes in intergenerational responses in other species. (Although we note that in this newly revised manuscript we have now also reported 19 new top response genes – see above).

      In addition to our original goals, in this study we were able to determine the extent to which intergenerational transcriptional responses are conserved and the extent to which intergenerational transcriptional changes persist transgenerationally (which we find to be effectively not at all using our revised stricter analysis). We believe these findings are not only novel, but perhaps will be surprising to much of the intergenerational and transgenerational field and have a major impact on both how multigenerational studies are interpreted and how they are conducted in the future. This is especially the case for studies in C. elegans which is one of the leading model organisms to study the mechanisms underlying both intergenerational and transgenerational responses to stress.

      For example, we note that several landmark studies of transgenerational effects (persisting into F3 or later generations) in C. elegans performed RNA-seq on F1 progeny (For example, Moore et al., Cell 2019 or Ma et al., Nature Cell Biology 2019). Our new findings reported here suggest that it is possible that none of the transcriptional effects detected in F1 animals will persist in F3 progeny. Furthermore, our studies demonstrate the importance of comparing C. elegans transcriptional effects to related Caenorhabditis species as we found that only a subset of the effects detected in C. elegans are conserved in any other Caenorhabditis species. (Such comparisons are important for determining if and to what extent observations of intergenerational and/or transgenerational effects observed in C. elegans represent conserved phenomena).

      For all of these reasons we believe our data are highly exciting, will be of broad interest to the field, and represent novel and potentially unexpected findings that were not previously reported in any prior work including Burton et al., 2020.

      Reviewer #2 (Public Review):

      Transgenerational effects (TE) (usually defined as multigenerational effects lasting for at least three generations) generated a lot of interest in recent years but the adaptive value of such effects is unclear. In order to understand the scope for adaptive TE we need to understand i) whether such effects are common; ii) whether they are stress-specific; and iii) if there are trade-offs with respect to performance in different environments. The last point is particularly important because F1, F2 and F3 descendants may encounter very different environments. On the other hand, intergenerational effects (lasting for one or two generations) are relatively common and can play an important role in evolutionary processes. However, we do not know whether intergenerational and transgenerational effects have the same underlying mechanisms.

      This study makes a big step towards resolving these questions and strongly advances our understanding of both phenomena. Much of the previous work on mechanisms of multigenerational effects has been conducted in C. elegans and this work uses the same approach. They focus on bacterial infection, Microsporidia infection, larval starvation and osmotic stress. I did not quite understand why the authors chose to focus on P. vranovensis rather than P. aeruginosa PA14 that has been used in previous studies of transgenerational effects in C. elegans. However, this is a minor point because I guess they were interested in broad transgenerational responses to bacterial infection rather than in strain-specific ones. The authors used different Caenorhabditis species, which is another strength of this study in addition to using multiple stresses.

      We thank the reviewer for these comments. We’d like to briefly highlight that P. vranovensis was also shown to elicit the same transgenerational effects as P. aeruginosa in the bioRxiv version of the same papers that reported transgenerational effects for P. aeruginosa (Kaletsky et al., 2020 – GRb0427 is an isolate of P. vranovensis).

      It is not clear to us why this result was not included in the final published version of this manuscript, but we in fact used P. vranovensis for these studies in part because of this bioRxiv paper and because we failed to detect any robust intergenerational effects using P. aeruginosa PA14 in any of our assays – including at the RNA-seq level (unpublished).

      Nonetheless, we have since confirmed with Coleen Murphy’s lab that they do find P. vranovensis elicits the same transgenerational effect on behaviour as P. aeruginosa. We expect that future investigations into the conditions under which P. vranovensis elicits effects that are lost/erased after 1 generation and the conditions under which effects might be maintained for more than 3 generations will be highly interesting.

      They found 279 genes that exhibited intergenerational changes in all C species tested, but most interestingly, they show that a reversal in gene expression corresponds to a reversal in response to bacterial infection (beneficial in two species and deleterious in one). This is very intriguing! This was further supported by similar observations of osmotic stress response.

      We thank Reviewer 2 for their excitement and we agree that these findings were highly exciting.

      They also report that intergenerational effects are stress-specific and that they have deleterious effects in mismatched environments, and, importantly, when worms were subject to multiple stresses. It is quite likely that offspring will experience a range of environments and that several environmental stresses will be present simultaneously in nature. I really liked this aspect of this work as I think that tests in different environments, especially environments with multiple stresses, are often lacking, which limits the generality of the conclusions.

      Another interesting piece of the puzzle is that beneficial and deleterious effects could be mediated by the same mechanisms. It would be interesting to explore this further. However, this is not a real criticism of this work. I think that the authors collected an impressive dataset already and every good study generates new research questions.

      Given these findings, I was particularly keen to see what comes of transgenerational effects. The general answer was that there aren't many, and the authors conclude that all intergenerational effects that they studied are largely reversible and that intergenerational and transgenerational effects represent distinct phenomena. While I think that this is a very important finding, I am not sure whether we can conclude that intergenerational and transgenerational effects are not related.

      In my view, an alternative interpretation is that intergenerational effects are common while transgenerational effects are rare. Because intergenerational effects are stress-specific, transgenerational effects could be stress-specific as well.

      We agree with reviewer 2 that our findings suggest that intergenerational effects are common and transgenerational effects are either rare in comparison or only occur under specific conditions. We have updated the text to include this interpretation.

      Perhaps different mechanisms regulate intergenerational responses to, say, different forms of starvation (e.g. compare opposing transgenerational responses to prolonged larval starvation (Rechavi et al. doi:10.1016/j.cell.2014.06.020) and rather short adulthood starvation (Ivimey-Cook et al. 2021 https://doi.org/10.1098/rspb.2021.0701)). Perhaps some (most?) forms of starvation generate only intergenerational responses and do not generate transgenerational responses. But some do. Those forms of starvation that generate both intergenerational and transgenerational effects could do so via the same mechanisms and represent the same phenomenon. I am by no means saying this is the case, but I am not sure that the absence of evidence of transgenerational effects in this study necessarily suggests that inter- and trans-generational effects are different phenomena.

      We agree and, similar to above, have updated the text accordingly to state that it is also very possible that transgenerational effects only occur under certain conditions.

      The only real concern was the lack of phenotypic data on F3 beyond gene expression. Ideally, I would like to see tests of pathogen avoidance and starvation resistance in F3. However, given the amount of work that went into this study, the lack of strong signature of potential transgenerational effects in gene expression, and the fact that most of these effects were shown previously to last only one generation, I do not think this is crucial.

      We thank reviewer 2 for these comments and agree that phenotypic investigations of F3 effects are also very interesting.

      We have previously investigated the phenotypic effects of all of the stresses used in this paper on F3 animals using the assays described here and consistent with our new gene expression findings we previously found that most of these stresses do not exert phenotypic effects in F3 animals (Burton et al. 2020, Willis et al 2021, Hibshman et al., 2016).

      Separately, we have also attempted to investigate the effects of pathogen exposure on pathogen avoidance, as these effects have previously been reported to occur transgenerationally, but to date have been unable to consistently replicate these findings. We expect that this is likely due to what might be subtle differences in conditions between labs (differences in water used for the media prep, air humidity, potential differences in N2 wild-type strains, etc.) because assays such as behavioral avoidance are known to be very sensitive to many different environmental inputs.

      We currently believe that our experiences as they relate to intergenerational and transgenerational effects support the general conclusion of this manuscript that while intergenerational effects are common and easy to initiate across multiple labs (the intergenerational effects studied here have now been successfully reproduced in labs in the US, UK, and Canada), transgenerational effects might be more specific and/or only occur/be initiated under more stringent conditions – perhaps with the aim of avoiding the costs of such multigenerational effects.

      Future studies of exactly when/under what conditions C. elegans initiates intergenerational vs transgenerational effects are likely to be very interesting.

      It would be very interesting to compare gene expression and other phenotypic responses in F1 and F3 between P. vranovensis and PA14. Also, it would be interesting to test the adaptive value of intergenerational and transgenerational effects after exposure to both strains in different environments. This would be very informative and help with understanding the evolutionary significance of transgenerational epigenetic inheritance of pathogen avoidance as reported previously. Why is the response to P. vranovensis erased while the response to PA14 is maintained for four generations? Are nematodes more likely to encounter one species than the other? Again, however, this is not something necessary for this study.

      We completely agree with Reviewer 2 and have indeed attempted these experiments both in Burton et al., 2020 and in unpublished results.

      With regards to the transgenerational F3 effects, as mentioned above, P. vranovensis has been reported to elicit the same transgenerational effect as P. aeruginosa PA14 – at least as reported in the Kaletsky et al., 2020 bioRxiv version of the manuscript from the same studies. (GRb0427 is an isolate of P. vranovensis).

      To date, however, in our laboratory we have been unable to detect any transgenerational effects for either P. vranovensis or P. aeruginosa infection on gene expression data from RNA-seq experiments (data from this manuscript and unpublished data).

      It is not yet clear why this is the case, but we note that the RNA-seq analysis from the transgenerational PA14 studies (published in Moore et al., Cell 2019) was performed on F1 animals and thus was looking at intergenerational effects – to our knowledge no RNA-seq on F3 progeny from animals exposed to PA14 has ever been published. Thus, as it stands there are no existing F3 gene expression studies done using PA14 for us to compare our results to, but it remains possible that PA14 does not elicit specific effects on F3 gene expression when analyzed by RNA-seq.

      For F1 effects we have published a gene expression comparison for P. vranovensis and P. aeruginosa F1 effects in a previous manuscript (Burton et al 2020) and will add a mention of this to the text. Briefly, we detected very few F1 effects on gene expression when exposing adults to P. aeruginosa for 24 hours and parental infection by P. aeruginosa did not result in protection for offspring from P. vranovensis infection (Burton et al., 2020). We concluded that the intergenerational adaptation to P. vranovensis was not initiated by P. aeruginosa and was at least somewhat specific to P. vranovensis as well as the new species of Pseudomonas described in this manuscript which does cross protect.

      The main strengths of this paper are i) use of multiple stresses; ii) use of multiple species; iii) tests in different environments; and iv) simultaneous evaluation of intergenerational and transgenerational responses. This study is first of a kind, and it provides several important answers, while highlighting clear paths for future work.

      Excellent work and I think it will generate a lot of interest in the community, definitely want to see it published in eLife.

      We agree with Reviewer 2 and thank them for their kind comments.

      Reviewer #3 (Public Review):

      In this manuscript, the authors address whether the mechanisms mediating intergenerational effects are conserved in evolution. This question is important not only to frame this phenomenon in an evolutionary context, but to address several interlinked questions: is there a mechanism in common between adaptive versus deleterious effects? What makes some effects last one instead of several generations? What is the ecological relevance for those mechanisms? Using Caenorhabditis elegans as a model of reference, they compare four types of intergenerational effects on additional three Caenorhabditis species.

      The authors used previously characterized models of intergenerational inheritance, focusing on those that are likely to have adaptive significance. This is relevant, because the adaptive relevance of other published examples of inter- and transgenerational inheritance is not clear. They used functional studies to probe for conservation of mechanisms for bacterial infection and resistance to osmolarity stress, which is a major strength of this study. The data supports the claim of conservation in some types of intergenerational inheritance and divergence in others. One major question addressed in this manuscript is whether there is a potential overarching mechanism that confers stress-resistance across generations. Their experiments convincingly show that this is not the case, but that instead, there are stress-specific mechanisms responsible for intergenerational inheritance.

      We agree and thank Reviewer 3 for their kind comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The relationship between genetic disease and adaptation is important for biomedical research as well as understanding human evolution. This topic has received considerable attention over the past several decades in human genetics research. The present manuscript provides a much more comprehensive and rigorous analysis of this topic. Specifically, the authors select a set of ~4000 human Mendelian disease genes and examine patterns of recent positive selection in these genes using the iHS and nSL tests (both haplotype tests) for selection. They then compare the signals of sweeps to control genes. Importantly, they match the control set to the disease genes based upon many different genomic variables, such as recombination rate, amount of background selection, expression level, etc. The authors find that there is a deficit of selective sweeps in disease genes. They test several hypotheses for this deficit. They find that the deficit of sweeps is stronger in disease genes at low recombination rate and those that have more disease mutations. From this, the authors conclude that strongly deleterious mutations could be impeding selective sweeps.

      Strengths

      The manuscript includes a number of important strengths:

      1) It tackles an important question in the field. The question of selection in disease genes has been very well-studied in the past, with conflicting viewpoints. The present study examines this topic in a rigorous way and finds a deficit of sweeps in disease genes.

      2) The statistical analyses are rigorously done. The genome is a confusing place and there can often be many reasons why a certain set of genes could differ from another set of genes, unrelated to the variable of interest. Di et al. carefully match on these genomic confounders. Thus, they rigorously demonstrate that sweeps are depleted in disease genes relative to control genes. Further, the pipeline for ranking the genes and testing for significance is solid.

      3) The Introduction of the manuscript nicely relates different evolutionary models and explanations to patterns that could be seen in the data. As such, the present manuscript isn't just merely an exploratory analysis of patterns of sweeps in disease genes. Rather, it tests specific evolutionary scenarios.

      Weaknesses

      1) The authors did not discuss or test a basic explanation for the deficit of sweeps in disease genes. Namely, certain types of genes, when mutated, give rise to strong Mendelian phenotypes. However, mutations in these genes do not result in variation that gives rise to a phenotype on which positive selection could occur. In other words, there are just different types of genes underlying disease and positive selection. I could think that such a pattern would be possible if humans are close to the fitness optimum and strong effect mutations (like those in Mendelian disease genes) result in moving further away from the fitness optimum. On the other hand, more weak effect mutations could be either weakly deleterious or beneficial and subject to positive selection. I'm not sure whether these patterns would necessarily be captured by the overall measures of constraint which the disease and non-disease genes were matched on.

      We thank the reviewer for suggesting that alternative explanation. It is indeed important that we compare it with our own explanation. To rephrase the reviewer’s suggestion, it is possible that disease genes may just have a different distribution of fitness effects of new mutations. Specifically, mutations in disease genes might have such large effects that they will consistently overshoot the fitness optimum, and thus not get closer to this optimum. This would prevent them from being positively selected. Two predictions can be derived from this potential scenario. First, we can predict a sweep deficit at disease genes, which is what we report. Second, we can also predict that disease genes should exhibit a deficit of older adaptation, not just recent adaptation detected by sweep signals. Indeed, the decrease in adaptation due to (too) large effect mutations would be a generic, intrinsic feature of disease genes regardless of evolutionary time. This means that under this explanation, we expect a test of long-term adaptation such as the McDonald-Kreitman test to also show a deficit at disease genes.

      This latter prediction differs from the prediction made by our favored explanation of interference between deleterious and advantageous variants. In this scenario, the sweep deficit at disease genes is caused by the presence of deleterious, and most importantly currently segregating disease variants. Because the presence of the segregating variants is transient during evolution, our explanation does not predict a deficit of long-term adaptation. We can therefore distinguish which explanation (the reviewer’s or ours) is the most likely based on the presence or absence of a long-term adaptation deficit at disease genes.

      To test this, we now compare protein adaptation in disease and control genes with two versions of the MK test called ABC-MK and GRAPES (refs). ABC-MK estimates the overall rate of adaptation, and also the rates of weak and strong adaptation, and is based on Approximate Bayesian Computation. GRAPES is based on maximum likelihood. Both ABC-MK and GRAPES have been shown to provide robust estimates of the rate of protein adaptation thanks to evaluations with forward population simulations (refs). We find no difference in long-term adaptation between disease and control non-disease genes, as shown in new figure 4. This shows that the explanation put forward by the reviewer of an intrinsically different distribution of mutation effects at disease genes is less likely than an interference between currently segregating deleterious variants with recent, but not with older long-term adaptation. We even show in the new figure 4 that disease genes and their controls have more, not less strong long-term adaptation compared to the whole human genome baseline (new figure 4C). Also, disease genes in low recombination regions and with many disease variants have experienced more, not less strong long-term adaptation than their controls. Therefore, far from overshooting the fitness optimum due to stronger fitness effects of mutations, it looks like these stronger fitness effects might in fact be more frequently positively selected in these disease genes.

      We now provide these new results P15L418:
      “Disease genes do not experience constitutively less long-term adaptive mutations
      A deficit of strong recent adaptation (strong enough to affect iHS or nSL) raises the question of what creates the sweep deficit at disease genes. As already discussed, purifying selection and other confounding factors are matched between disease genes and their controls, which excludes that these factors alone could possibly explain the sweep deficit. Purifying selection alone in particular cannot explain this result, since we find evidence that it is well matched between disease and control genes (Figures 2 and Figure 4-figure supplement 1). Furthermore, we find that the 1,000 genes in the genome with the highest density of conserved elements do not exhibit any sweep deficit (bootstrap test + block-randomized genomes FPR=0.18; Methods). Association with mendelian diseases, rather than a generally elevated level of selective constraint, is therefore what matters to observe a sweep deficit. What then might explain the sweep deficit at disease genes?

      As mentioned in the introduction, it could be that mendelian disease genes experience constitutively less adaptive mutations. This could be the case for example because mendelian disease genes tend to be more pleiotropic (Otto, 2004), and/or because new mutations in mendelian disease genes are large effect mutations (Quintana-Murci, 2016) that tend to often overshoot the fitness optimum, and cannot be positively selected as a result. Regardless of the underlying processes, a constitutive tendency to experience less adaptive mutations predicts not only a deficit of recent adaptation, but also a deficit of more long-term adaptation during evolution. The iHS and nSL signals of recent adaptation we use to detect sweeps correspond to a time window of at most 50,000 years, since these statistics have very little statistical power to detect older adaptation (Sabeti et al., 2006). In contrast, approaches such as the McDonald-Kreitman test (MK test) (McDonald and Kreitman, 1991) capture the cumulative signals of adaptive events since humans and chimpanzees had a common ancestor, likely more than six million years ago. To test whether mendelian disease genes have also experienced less long-term adaptation, in addition to less recent adaptation, we use the MK tests ABC-MK (Uricchio et al., 2019) and GRAPES (Galtier, 2016) to compare the rate of protein adaptation (advantageous amino acid changes) in mendelian disease gene coding sequences, compared to confounding factors-matched non-disease controls (Methods). We find that overall, disease and control non-disease genes have experienced similar rates of protein adaptation during millions of years of human evolution, as shown by very similar estimated proportions of amino acid changes that were adaptive (Figure 5A,B,C,D,E). This result suggests that disease genes do not have constitutively less adaptive mutations.
This implies that processes that are stable over evolutionary time such as pleiotropy, or a tendency to overshoot the fitness optimum, are unlikely to explain the sweep deficit at disease genes. If disease genes have not experienced less adaptive mutations during long-term evolution, then the process at work during more recent human evolution has to be transient, and has to have limited only recent adaptation. It is also noteworthy that both disease genes and their controls have experienced more coding adaptation than genes in the human genome overall (Figure 5A), especially more strong adaptation according to ABC-MK (Figure 5C). The fact that the baseline long-term coding adaptation is lower genome-wide, but similarly higher in disease and their control genes, also shows that the matched controls do play their intended role of accounting for confounding factors likely to affect adaptation. The fact that long-term protein adaptation is not lower at disease genes also excludes that purifying selection alone can explain the sweep deficit at disease genes, because purifying selection would then also have decreased long-term adaptation. A more transient evolutionary process is thus more likely to explain our results.”
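
      For readers unfamiliar with the quantity compared in the passage above, the "proportion of amino acid changes that were adaptive" is usually written as α. The sketch below computes the classical point estimate of α from the four counts of a McDonald-Kreitman table. It is a simplification for illustration only: ABC-MK and GRAPES, the methods used in the response, fit more elaborate models, and the function name and the counts in the usage note are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classical McDonald-Kreitman estimate of the proportion of adaptive
    amino acid substitutions: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous and synonymous fixed differences (divergence)
    pn, ps: nonsynonymous and synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for the estimate to be defined")
    return 1.0 - (ds * pn) / (dn * ps)
```

      For example, made-up counts of dn=20, ds=40, pn=10, ps=40 give α = 1 − 400/800 = 0.5, meaning half of the amino acid substitutions would be attributed to positive selection under this simple estimator.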

      Then P22L613: “More importantly, the fact that constitutively less adaptation at disease genes combined with more power to detect sweeps in low recombination regions does not explain our results is made even clearer by the fact that disease genes in low recombination regions and with many disease variants have in fact experienced more, not less long-term adaptation according to an MK analysis using both ABC-MK and GRAPES (Figure 5F,G,H,I,J). ABC-MK in particular finds that there is a significant excess of long-term strong adaptation (Figure 4H, P<0.01) in disease genes with low recombination and with many disease variants, compared to controls, but similar amounts of weak adaptation (Figure 5G, P=0.16). It might be that disease genes with many disease variants are genes with more mutations with stronger effects that can generate stronger positive selection. The potentially higher supply of strongly advantageous variants at these disease genes makes it all the more notable that they have a very strong sweep deficit in recent evolutionary times. This further strengthens the evidence in favor of interference during recent human adaptation: the limiting factor does not seem to be the supply of strongly advantageous variants, but instead the ability of these variants to have generated sweeps recently by rising fast enough in frequency.”

      2) While I think the authors did a superb job of controlling for genome differences between disease and non-disease genes, the analysis of separating regions by recombination rate and number of disease mutations does not seem as rigorous. Specifically, the authors tested for enrichment of sweeps in disease genes vs control and then stratified that comparison by recombination rate and/or number of disease mutations. While this nicely matches the disease genes to the control genes, it is not clear whether the high recombination rate genes differ in other important attributes from the low recombination rate genes. Thus, I worry whether there could be a confounder that makes it easier/harder to detect an enrichment/deficit of sweeps in regions of low/high recombination.

      We thank the reviewer for emphasizing the need for more controls when comparing our results in low or high recombination regions. We have now compared the confounding factors between low recombination disease genes and high recombination disease genes, as classified in the manuscript. As shown in the new Figure 6-figure supplement 1, confounding factors do not differ substantially between low and high recombination disease genes and, with one exception, are within a range of +/- 25% of each other. It would take a larger difference for any confounding factor to explain the sharp sweep deficit difference observed between the low and high recombination disease genes. The exception, with a 35% difference between low and high recombination mendelian disease genes, is McVicker’s B, but this is completely expected; B is expected to be lower in low recombination regions.

      We now write P20L569: “Further note that only moderate differences in confounding factors between low and high recombination mendelian disease genes are unlikely to explain the sweep deficit difference (Figure 6-figure supplement 1).”

      Regarding the potential confounding effect of statistical power to detect sweeps differing in low and high recombination regions, please see our earlier response to main point 2.

      Reviewer #2 (Public Review):

      This paper seeks to test the extent to which adaptation via selective sweeps has occurred at disease-associated genes vs genes that have not (yet) been associated with disease. While there is a debate regarding the rate at which selective sweeps have occurred in recent human history, it is clear that some genes have experienced very strong recent selective sweeps. Recent papers from this group have very nicely shown how important virus interacting proteins have been in recent human evolution, and other papers have demonstrated the few instances in which strong selection has occurred in recent human history to adapt to novel environments (e.g. migration to high altitude, skin pigmentation, and a few other hypothesized traits).

      One challenge in reading the paper was that I did not realize the analysis was exclusively focused on Mendelian disease genes until much later (the first reference is not until the end of the introduction on pages 7-8 and then not at all again until the discussion, despite referring to "disease" many times in the abstract and throughout the paper). It would be preferred if the authors indicated that this study focused on Mendelian diseases (rather than a broader analysis that included complex or infectious diseases). This is important because there are many different types of diseases and disease genes. Infectious disease genes and complex disease genes may have quite different patterns (as the authors indicate at the end of the introduction).

      We want to apologize profusely for this avoidable mistake. We have now made it clearer from the very start of the manuscript that we focus on mendelian non-infectious disease genes. We have modified the title and the abstract accordingly, specifying mendelian and non-infectious as required.

      The abstract states "Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution." This seems to diminish a large body of work that has been done in this area. The authors acknowledge some of this literature in the introduction, but it would be worth toning down the abstract, which suggests there has been no work in this area. A review of this topic by Lluis Quintana-Murci1 was cited, but diminished many of the developments that have been made in the intersection of population genetics and human disease biology. Quintana-Murci says "Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation-selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection", and argues that positive selection signals should be rare among Mendelian disease genes. Several other examples come to mind. For example, comparing Mendelian disease genes, complex disease genes, and mouse essential genes was the major focus of a 2008 paper2, which pointed out that Mendelian disease genes exhibited much higher rates of purifying selection while complex disease genes exhibited a mixture of purifying and positive selection. This paper was cited, but only in regard to their findings of complex diseases. 
A similar analysis of McDonald-Kreitman tables3 was performed around Mendelian disease genes vs non-disease genes, and found "that disease genes have a higher mean probability of negative selection within candidate cis-regulatory regions as compared to non-disease genes, however this trend is only suggestive in EAs, the population where the majority of diseases have likely been characterized". Both of these studies focused on polymorphism and divergence data, which target older instances of selection than iHS and nSL statistics used in the present study (but should have substantial overlap since iHS is not sensitive to very recent selection like the SDS statistic). Regardless, the findings are largely consistent, and I believe warrant a more modest tone.

      We thank the reviewer for their recommendation. We should have written more about what is currently well known or unknown about recent adaptation in disease genes, and in more nuanced terms. Instead of writing “Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution”, we now write in the new abstract:

      “Despite our expanding knowledge of gene-disease associations, and despite the medical importance of disease genes, their recent evolution has not been thoroughly studied across diverse human populations. In particular, recent genomic adaptation at disease genes has not been characterized as well as long-term purifying selection and long-term adaptation. Understanding the relationship between disease and adaptation at the gene level in the human genome is hampered by the fact that we don’t know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during the last ~50,000 years of recent human evolution.”

      We also toned down the start of the introduction. We now write P3L74:

      “Despite our expanding knowledge of mendelian disease gene associations, and despite the fact that multiple evolutionary processes might connect disease and genomic adaptation at the gene level, these connections are yet to be studied more thoroughly, especially in the case of recent genomic adaptation.”

      Although we agree that others have made extensive efforts to characterize older adaptation or purifying selection at disease genes compared to non-disease genes, we still believe that our results are novel and more conclusive about recent positive selection. Our initial statement was however poorly phrased. To our knowledge, our study is the first to look at the issue using specifically sweep statistics that have been shown to be robust to background selection, while also controlling for confounding factors. These sweep statistics have sensitivity for selection events that occurred in the past 30,000 or at most 50,000 years of human evolution (Sabeti et al. 2006). This is a very different time scale compared to the millions of years of adaptation (since divergence between humans and chimpanzees) captured by MK approaches.

      We also want to note that we did cite the Blekhman et al. paper for their result of stronger purifying selection in our initial manuscript. It is true however that we did not specify mendelian disease genes, which was confusing. We want to apologize again for it:

      From the earlier manuscript: “Multiple recent studies comparing evolutionary patterns between human disease and non-disease genes have found that disease genes are more constrained and evolve more slowly (lower ratio of nonsynonymous to synonymous substitution rate, dN/dS, in disease genes) (Blekhman et al., 2008; Park et al., 2012; Spataro et al., 2017)”

      “Among other confounding factors, it is particularly important to take into account evolutionary constraint, i.e the level of purifying selection experienced by different genes. A common intuition is that disease genes may exhibit less adaptation because they are more constrained (Blekhman et al., 2008)”

      It is important to remember that, as we mention in the introduction, previous comparisons did not take potential confounding factors at all into account. It is therefore unclear whether their conclusions were specific to disease genes, or due to confounding factors. We have now made this point clearer in the introduction, as we believe that we have made a substantial effort to control for confounding factors, and that it is a substantial departure from previous efforts:

      P7L201: “In contrast with previous studies, we systematically control for a large number of confounding factors when comparing recent adaptation in human mendelian disease and non-disease genes, including evolutionary constraint, mutation rate, recombination rate, the proportion of immune or virus-interacting genes, etc. (please refer to Methods for a full list of the confounding factors included).”.

      P9L253: “These differences between disease and non-disease genes highlight the need to compare disease genes with control non-disease genes with similar levels of selective constraint. To do this and compare sweeps in mendelian disease genes and non-disease genes that are similar in ways other than being associated with mendelian disease (as described in the Results below, Less sweeps at mendelian disease genes), we use sets of control non-disease genes that are built by a bootstrap test to match the disease genes in terms of confounding factors (Methods)”.
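
      To illustrate the general idea of matching controls to disease genes on confounding factors, here is a minimal sketch. It is not the bootstrap test described in the Methods: it swaps in a simpler per-gene tolerance matching, and the function name, the factor names (`recomb`, `coding_density`), and the 25% tolerance are all hypothetical choices made for illustration.

```python
import random

def matched_control_set(disease, candidates, factors, tol=0.25, seed=0):
    """Draw one control gene per disease gene (hypothetical helper).

    A candidate is acceptable when every confounding factor lies within a
    relative tolerance `tol` of the disease gene's value for that factor.
    """
    rng = random.Random(seed)
    controls = []
    for gene in disease:
        pool = [
            c for c in candidates
            if all(abs(c[f] - gene[f]) <= tol * abs(gene[f]) for f in factors)
        ]
        if pool:  # disease genes with no acceptable match are skipped
            controls.append(rng.choice(pool))  # sampling with replacement
    return controls

# Toy data: values are made up for the example.
disease = [{"name": "D1", "recomb": 1.0, "coding_density": 0.10}]
candidates = [
    {"name": "C1", "recomb": 1.1, "coding_density": 0.11},  # close on both factors
    {"name": "C2", "recomb": 3.0, "coding_density": 0.10},  # recombination too different
]
controls = matched_control_set(disease, candidates, ["recomb", "coding_density"])
print([c["name"] for c in controls])
```

      Repeating the draw with different seeds would give many control sets, whose sweep counts can then be compared with the count observed at disease genes.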

      Furthermore, we have now added a comparison of older adaptation in disease and non-disease genes using a recent version of the MK test called ABC-MK, that can take background selection and other biases such as segregating weakly advantageous variants into account. Also controlling for confounding factors, we find no difference in older adaptation between disease and non-disease genes (please see our response to main point 2).

      Therefore, contrary to the reviewer’s claim that the sweep statistics and MK approaches should have substantial overlap, we now show that it is clearly not the case. We further show that the lack of overlap is expected under our explanation of our results based on interference between recessive deleterious and advantageous variants (see our responses to main point 1 and to reviewer 1 weakness 1).

      Previous analyses were using much smaller mendelian disease gene datasets, less recent polymorphism datasets and, critically, did not control for confounding factors. We also note that reference 3 (Torgerson et al. Plos Genetics 2009) does not make any claim about recent positive selection in mendelian disease genes compared to other genes. Their dataset at the time also only included 666 mendelian disease genes, versus the ~4,000 currently known.

      In short, we do think that we have a claim for novelty, but the reviewer is entirely right that we did a poor job of giving due credit to previous important work. These previous studies deserved much better credit than no credit at all. We want to thank the reviewer for sparing us the embarrassment of not citing important work.

      We now cite the papers referenced by the reviewer as appropriate in the introduction, based on the scope of their results:

      P3L93: “Multiple recent studies comparing evolutionary patterns between human mendelian disease and non-disease genes have found that mendelian disease genes are more constrained and evolve more slowly (Blekhman et al., 2008; Quintana-Murci, 2016; Spataro et al., 2017; Torgerson et al., 2009). An older comparison by Smith and Eyre-Walker (Smith and Eyre-Walker, 2003) found that disease genes evolve faster than non-disease genes, but we note that the sample of disease genes used at the time was very limited.”

      P5L134 “Among possible confounding factors, it is particularly important to take into account evolutionary constraint, i.e the level of purifying selection experienced by different genes. A common intuition is that mendelian disease genes may exhibit less adaptation because they are more constrained (Blekhman et al., 2008; Spataro et al., 2017; Torgerson et al., 2009),”

      There are some aspects of the current study that I think are highly valuable. For example, the authors study most of the 1000 Genomes Project populations (though the text should be edited since the admixed and South Asian populations are not analyzed, so all 26 populations are not included, only the populations from Africa, East Asia, and Europe are analyzed; a total of 15 populations are included in Figures 2-3). Comparing populations allows the authors to understand how signatures of selection might be shared vs population-specific. Unfortunately, the signal that the authors find regarding the depletion of positive selection at Mendelian disease genes is almost entirely restricted to African populations. The signal is not significant in East Asia or Europe (Figure 2 clearly shows this). It seems that the mean curve of the fold-enrichment as a function of rank threshold (Figure 3) trends downward in East Asian and European populations, but the sampling variance is so large that the bootstrap confidence intervals overlap 1. The paper should therefore revise the sentence "we find a strong depletion in sweep signals at disease genes, especially in Africa" to "only in Africa". This opens the question of why the authors find the particular pattern they find. The authors do point out that a majority of Mendelian disease genes are likely discovered in European populations, so is it that the genes' functions predate the Out-of-Africa split? They most certainly do. It is possible that the larger long-term effective population size of African populations resulted in stronger purifying selection at Mendelian disease genes compared to European and East Asian populations, where smaller effective population sizes due to the Out-of-Africa Bottleneck diminished the signal of most selective sweeps and hence there is little differentiation between categories of genes ("drift noise"). It is also surprising to note that the authors find selection signatures at all using iHS in African populations while a previous study using the same statistic could not differentiate signals of selection from neutral demographic simulations4.

      We want to thank the reviewer profusely for putting us on the right track thanks to their insightful suggestion. As described in our response to reviewer 1 weakness 1, we have now shown with simulations that the interference of deleterious variants on advantageous variants is strongly decreased during a bottleneck of a magnitude similar to the Out of Africa bottlenecks experienced by East Asian and European populations. This decrease of interference is likely strong enough to not require any other explanation, even if other processes may also be at work, such as a weakening of the sweep signals as suggested by the reviewer.

      Regarding the Granka et al. paper, the last author of the current manuscript has already shown in a previous paper (ref) that the type of approach used there to quantify recent adaptation is likely to be severely underpowered due to a number of confounding factors, notably the comparison of genic and non-genic windows that are too close to each other to avoid overlapping the same sweep signals. Our results are also based on much more recent and less biased sets of SNPs used to measure the sweep statistics.

      The authors find that there is a remarkably (in my view) similar depletion across all but one MeSH disease classes. This suggests that "disease" is likely not the driving factor, but that Mendelian disease genes are a way of identifying where there are strongly selected deleterious variants recurrently arising and preventing positively selected variants. This is a fascinating hypothesis, and is corroborated by the finding that the depletion gets stronger in genes with more Mendelian disease variants. In this sense, the authors are using Mendelian disease genes as a proxy for identifying targets of strong purifying selection, and are therefore not actually studying Mendelian disease genes. The signal could be clearer if the test set is based on the factor that is actually driving the signal.

      Based on the reviewer’s comment, we have now better explained why our results are unlikely to be a generic property of purifying selection alone. As we explain in our response to main point 3, our results cannot be explained by purifying selection alone, because we match purifying selection between disease genes and the controls. Indeed, we now show with additional MK analyses and GERP-based analyses that our controls for confounding factors already account for purifying selection. This is shown by the fact that disease genes and their controls have similar distributions of deleterious fitness effects.

      In addition, we added a comparison that shows that purifying selection alone does not explain our results. Instead of comparing sweeps at disease and non-disease genes, we compared sweeps (in Africa) between the 1,000 genes with the highest density of conserved, constrained elements and other genes in the genome. If purifying selection is the factor that drives the sweep deficit at disease genes, then we should see a sweep deficit among the genes with the most conserved, constrained elements compared to other genes in the genome. However, we see no such sweep deficit at genes with a high density of conserved, selectively constrained elements (bootstrap test + block randomization of genomes, FPR=0.18). See P15L424. Note that for this comparison we had to remove the matching of confounding factors corresponding to functional and purifying selection densities (new Methods P40L1131).

      Again, our results are better explained not just by purifying selection alone, but more specifically by the presence of interfering, segregating deleterious variants. It is perfectly possible to have highly constrained parts of the genome without having many deleterious segregating variants at a given time in evolution.

      The similarity across MeSH classes can be readily explained if what matters is interference with deleterious segregating variants. Because all types of diseases have deleterious segregating variants, it is not surprising that different MeSH disease categories have a similar sweep deficit. We make that point clearer in the revised manuscript:

      P26L707: “The sweep deficit is comparable across MeSH disease classes (Figure 8), suggesting that the evolutionary process at the origin of the sweep deficit is not disease-specific. This is compatible with a non-disease specific explanation such as recessive deleterious variants interfering with adaptive variants, irrespective of the specific disease type.”.

      One of the most important steps that the authors undertake is to control for possible confounding factors. The authors identify 22 possible confounding factors, and find that several confounding factors have different effects in Mendelian disease genes vs non-disease genes. The authors do a great job of implementing a block-bootstrap approach to control for each of these factors. The authors talk specifically about some of these (e.g. PPI), but not others that are just as strong (e.g. gene length). I am left wondering how interactions among other confounding factors could impact the findings of this paper. I was surprised to see a focus on disease variant number, but not a control for CDS length. As I understand it, gene length is defined as the entire genomic distance between the TSS and TES. Presumably genes with larger coding sequence have more potential for disease variants (though number of disease variants discovered is highly biased toward genes with high interest). CDS length would be helpful to correct for things that pS does not correct for, since pS is a rate (controlling for CDS length) and does not account for the coding footprint (hence pS is similar across gene categories).

      Based on our response to the previous point, it is clear that a high density of coding sequences, or conserved constrained sequence in general are not enough to explain our results. Furthermore, we want to remind the reviewer that we already control for coding sequence length through controlling for coding density, since we use windows of constant sizes.

      The authors point out that it is crucial to get the control set right. This group has spent a lot of time thinking about how to define a control set of genes in several previous papers. But it is not clear if complex disease genes and infectious disease genes are specifically excluded or not. Number of virus interactions was included as a confounding factor, so VIPs were presumably not excluded. It is clear that the control set includes genes not yet associated with Mendelian disease, but the focus is primarily on the distance away from known Mendelian disease genes.

      We are sorry that we were not more explicit from the start of the manuscript. We now make it clearer throughout the entire manuscript what the set of disease genes does and does not include, by repeating that we focus specifically on mendelian, non-infectious disease genes. By non-infectious, we mean that we excluded genes with known infectious disease-associated variants. This does not exclude most virus-interacting genes since most of them are not associated at the genetic variant level with infectious diseases. It is also important to note that the effect of virus interactions is accounted for by matching the number of interacting viruses between mendelian disease genes and controls.

      We write P29L818: “By non-infectious, we mean that we excluded genes with known infectious disease-associated variants. This does not exclude most VIPs since most of them are not associated at the genetic variant level with infectious diseases. It is important to note that the effect of virus interactions is accounted for by matching the number of interacting viruses between mendelian disease genes and controls.”

      Minor comments:

      On page 13, the authors say "This artifact is also very unlikely due to the fact that recombination rates are similar between disease and non-disease genes (Figure 1)." However, Figure 1 shows that "deCode recombination 50kb" is clearly higher in disease genes and comparable at 500kb. The increased recombination rate locally around disease genes seems to contradict the argument formulated in this paragraph.

      We apologize for the lack of precision in this sentence. What we meant is that the recombination rates are not different enough for the hypothetical artifact mentioned to explain our results. We also neglected to remind the reader at this point in the manuscript that we match recombination between disease genes and controls. We now use more precise language:

      P28L772 “The recombination rate at disease genes is also only slightly different from the recombination rate at non-disease genes (Figure 1), and we match the recombination rate between disease genes and controls.”.

      Reviewer #3 (Public Review):

      In this paper, the authors ask whether selective sweeps (as measured by the iHS and nSL statistics) are more or less likely to occur in or near genes associated with Mendelian diseases ("disease genes") than those that are not ("non-disease genes"). The main result put forward by the authors is that genes associated with Mendelian diseases are depleted for sweep signatures, as measured by the iHS and nSL statistics, relative to those which are not.

      The evidence for this comes from an empirical randomization scheme to assess whether genes with signatures of a selective sweep are more likely to be Mendelian disease genes than not. The analysis relies on a somewhat complicated sliding threshold scheme that effectively acts to incorporate evidence from both genes with very large iHS/nSL values, as well as those with weaker signals, while upweighting the signal from those genes with the strongest iHS/nSL values. Although I think the analysis could be presented more clearly, it does seem like a better analysis than a simple outlier test, if for no other reason than that the sliding threshold scheme can be seen as a way of averaging over uncertainty in where one should set the threshold in an outlier test (along with some further averaging across the two different sweep statistics, and the size of the window around disease associated genes that the sweep statistics are averaged over). That said, the particular approach to doing so is somewhat arbitrary, but it's not clear that there's a good way to avoid that.
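
      The sliding threshold idea described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes a single control set of the same size as the disease set (the manuscript uses many bootstrapped control sets), and the function name and toy gene names are hypothetical.

```python
def fold_enrichment_curve(ranked_genes, disease, controls, thresholds):
    """Fold enrichment of disease vs control genes among top-ranked genes.

    ranked_genes: gene names sorted from strongest to weakest sweep statistic.
    disease, controls: sets of gene names, assumed here to be of equal size.
    Returns {threshold: ratio}, with None when no control gene is in the top.
    """
    curve = {}
    for t in thresholds:
        top = set(ranked_genes[:t])          # genes above this rank threshold
        n_disease = len(top & disease)
        n_control = len(top & controls)
        curve[t] = n_disease / n_control if n_control else None
    return curve

# Toy ranking: g1 has the strongest sweep signal, g6 the weakest.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
curve = fold_enrichment_curve(ranked, {"g3", "g6"}, {"g1", "g4"}, [2, 4, 6])
print(curve)
```

      Because the thresholds are nested, genes with the strongest signals contribute at every threshold, which is one way to read the upweighting mentioned above. A ratio below 1 at a given threshold corresponds to a sweep deficit at disease genes.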

      In addition to reporting that extreme values of iHS/nSL are generally less likely at Mendelian disease genes, the authors also report that this depletion is strongest in genes from low recombination regions, or which have >5 specific variants associated with disease.

      Drawing on this result, the authors read this evidence to imply that sweeps are generally impeded or slowed in the vicinity of genes associated with Mendelian diseases due to linkage to recessive deleterious variants, which hitchhike to high enough frequencies that the selection against homozygotes becomes an important form of interference. This phenomenon was theoretically characterized by Assaf et al 2015, who the authors point to for support. That such a phenomenon may be acting systematically to shape the process of adaptation is an interesting suggestion. It's a bit unclear to me why the authors specifically invoke recessive deleterious mutations as an explanation though. Presumably any form of interference could create the patterns they observe? This part of the paper is, as the authors acknowledge, speculative at this point.

      We thank the reviewer for their comments. We are sorry that we did not provide a clear explanation of why recessive deleterious mutations specifically are expected to interfere more than other types of deleterious variants. This was shown by Assaf et al. (2015), and we should have stated it explicitly. The reason why recessive deleterious variants interfere more than additive or dominant ones is that they can hitchhike together with an adaptive variant to substantial frequencies before negative selection takes effect, which happens only once a significant number of individuals homozygous for the deleterious mutation appear in the population. In contrast, dominant mutations do not reach the same high frequencies in linkage with an adaptive variant, because they are selected against as soon as they appear in the population.

      We now write P18L496: “In diploid species including humans, recessive deleterious mutations specifically have been shown to have the ability to slow down, or even stop the frequency increase of advantageous mutations that they are linked with (Assaf et al., 2015). Dominant variants do not have the same interfering ability, because they do not increase in frequency in linkage with advantageous variants as much as recessive deleterious do, before the latter can be “seen” by purifying selection when enough homozygous individuals emerge in a population (Assaf et al., 2015).”
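
      The asymmetry between recessive and dominant variants can be made explicit with the standard one-locus diploid model. This is a textbook sketch, not taken from the manuscript: it assumes random mating, ignores the linked advantageous variant, and introduces for illustration the symbols $q$ (frequency of the deleterious allele $a$), $s$ (selection coefficient against $aa$ homozygotes) and $h$ (dominance coefficient, so that heterozygotes have fitness $1-hs$). The marginal fitnesses of the two alleles and their difference are:

```latex
\begin{aligned}
w_{A} &= 1 - h s q, \\
w_{a} &= 1 - s\,\bigl[h + (1-h)\,q\bigr], \\
w_{A} - w_{a} &= s\,\bigl[h\,(1-2q) + q\bigr].
\end{aligned}
```

      For a fully recessive variant ($h=0$) the difference reduces to $sq$, which is negligible while the variant is rare and grows only as it hitchhikes to higher frequency. For a fully dominant variant ($h=1$) it is $s(1-q)$, so selection against the variant is close to its maximum from the moment it appears.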

      We have also confirmed with SLiM forward simulations that recessive deleterious variants interfere with adaptive variants much more than dominant ones (Table 1).

      I'm also a bit concerned by the fact that the signal is only present in the African samples studied. The authors suggest that this is simply due to stronger drift in the history of European and Asian samples. This could be, but as a reader it's a bit frustrating to have to take this on faith.

      We thank the reviewer for pointing out this issue with our manuscript. We have now shown, as detailed above in our response to main point 1, reviewer 1 weakness 1, that a weaker sweep deficit at disease genes in Europe and East Asia is an expected feature under the interference explanation, due to the weakened interference of recessive deleterious variants during bottlenecks of the magnitude observed in Europe and East Asia. We therefore believe that these new results strengthen our previous claim regarding the role of interference between deleterious and advantageous variants. We want to thank the reviewer for forcing us to examine the difference between results in Africa and out of Africa, as the manuscript is now more consistent and our results substantially better explained.

      There are other analyses that I don't find terribly convincing. For example, the analysis showing that iHS signals are no less depleted at genes associated with >5 diseases than at genes associated with 1 does little to convince me of anything. It's not particularly clear that the number of diseases associated with a given gene should predict the degree of pleiotropy experienced by a variant emerging in that gene with some kind of adaptive function. Failure to find any association here might just mean that this is not a particularly good measure of the relevant pleiotropy.

      We agree with the reviewer that the number of associated diseases may not be a good measure of pleiotropy. Unfortunately, to our knowledge, there is currently no good measure of gene pleiotropy in human genomes. Given that the evidence in favor of interference of deleterious variants is now strengthened, we have chosen to remove this analysis from the manuscript. As we now explain throughout the manuscript, pleiotropy is an unlikely explanation in the first place because disease genes have not experienced less long-term adaptation (see the details on our new MK test results in the response to main point 2).

      P16L447: “We find that overall, disease and control non-disease genes have experienced similar rates of protein adaptation during millions of years of human evolution, as shown by very similar estimated proportions of amino acid changes that were adaptive (Figure 5A,B,C,D,E). This result suggests that disease genes do not have constitutively less adaptive mutations. This implies that processes stable over evolutionary time such as pleiotropy, or a tendency to overshoot the fitness optimum, are unlikely to explain the sweep deficit at disease genes.”.

      A last parting thought is that it's not clear to me that the authors have excluded the hypothesis that adaptive variants simply arise less often near genes associated with disease. The fact that the signal is strongest in regions of low recombination is meant to be evidence in favor of selective interference as the explanation, but it is also the regime in which sweeps should be easiest to detect, so it may be just that the analysis is best powered to detect a difference in sweep initiation, independent of possible interference dynamics, in that regime.

      We thank the reviewer for stating these important alternative explanations that needed more attention in our manuscript. In our response to main point 2 above, we explain that higher statistical power in low recombination regions alone is unlikely to explain our results, because we also show that a substantial sweep deficit requires not only low recombination, but also the presence of a higher number of disease variants. We also describe in our response to main point 2 how our new MK-test results on long-term adaptation make it very unlikely that mendelian disease genes experience constitutively less adaptation. We want to thank the reviewer again for pointing out this issue with our manuscript, since it was indeed an important missing piece.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses of this paper include:

      1) The authors conclude that reducing NHE6 clears plaques by activating resident microglia, shifting them from a dormant state to a damage-associated activated state that phagocytoses Abeta plaques. However, there is no data presented to demonstrate this. In a supplemental figure, the authors show there are more Iba1-expressing microglia and GFAP-expressing astrocytes in APP mice and in APP/ApoE4KI mice in which NHE6 has been ablated, but this does not prove that this is the mechanism by which plaques are cleared.

      We apologize for the overstatement. We agree that we have not evaluated whether NHE6 depletion causes a signature of damage-associated microglia. Thus, we have removed this comment from the manuscript (abstract).

      2) The mechanisms underlying the increase in Iba1 and GFAP are not clear. The authors cite a previous paper from another group that demonstrated in their own NHE6 KO mice, there was an increase in GFAP and in activated microglia expressing CD68, which may relate to the cell loss in hippocampus and other brain regions documented in those mice. However, in the current study, the authors indicate that in their NHE6 KO lines, there is no overt cell loss. It is therefore unclear how reductions in NHE6 expression led to microglial/astrocyte activation. This is an important point to work out, since the authors conclude that it is microglial activation that is responsible for the reduction in Abeta plaques.

      We agree that identifying the mechanism by which NHE6 depletion causes glial activation is crucial. We and others show that germline NHE6 depletion causes glial activation. Moreover, our current data suggest that genetic deletion of NHE6, both in the germline and from adulthood on, causes glial activation. Neuronal cell loss is a potential explanation for glial activation. We found cerebellar Purkinje cell loss in both of our NHE6 mutant lines. As stated in the previous version of our manuscript, we found “Normal Gross Anatomical Brain Structure in Both NHE6-KO and NHE6cKO Mice” (Supplementary Figure S3). To address whether neuronal loss occurs in the hippocampus or cortex, as described for NHE6-KO mice in Xu et al., 2017, we measured neuronal loss in both NHE6 mutant lines. Comparable to Xu et al., in the NHE6-KO line we detect a reduction in total brain area, HC area, cortical thickness and CA1 thickness. By contrast, in our NHE6-cKO;APP-KI;ApoE4-KI mice we do not observe any neuronal loss when compared to NHE6-floxed;APPNL-F;ApoE4-KI littermate controls; however, we detect similar glial activation in the NHE6-cKO;APPNL-F;ApoE4-KI mice as in the germline NHE6-KO;APPNL-F mice. This suggests that the neuronal loss in the germline NHE6-KO model does not mediate glial activation. Lastly, we have removed the statement that microglial activation is the reason why we detect Aβ reduction and included a discussion of our new findings.

      3) What might be some of the underlying explanations for the differences between the published NHE6-KO mice, which has fairly widespread cell loss, and the current KO mice generated in this paper, which did not exhibit noticeable cell loss in brain regions other than the cerebellum?

      Our previous manuscript stated that there are no gross anatomical abnormalities in the NHE6-KO mice. However, we appreciate the reviewer’s concerns, as they prompted us to analyze neuronal loss in NHE6-KO versus NHE6-cKO mice. Besides Purkinje cell loss in both lines, and as stated above, we do detect cell loss in the hippocampus and cortex in our germline NHE6-KO mouse model, but not in the tamoxifen-induced NHE6-cKO mice.

      4) There are a number of mechanistic links that have not been worked out, as indicated above. Until these links are identified and characterized, a number of the conclusions drawn by the authors are not yet supported.

      We thank the reviewer for the constructive feedback. We have removed these conclusions.

      Reviewer #3 (Public Review):

      1) The leading hypothesis of this work is that APOE4 impairs synapse function through prolonged association with endosomes, thereby making brain cells vulnerable to AD-related pathological changes. However, the positive effects of NHE6 in a mouse model of Aβ accumulation occurs regardless of APOE4. This suggests that NHE6 may contribute to pathology by mechanisms other than APOE4-mediated retention of endosomal trafficking.

      We agree with the reviewer that NHE6 depletion plays a protective role in AD both by protecting against synaptic impairments in ApoE4-KI mice and Aβ toxicity in an Aβ overproducing mouse model. This may reflect a beneficial effect of endosomal compartment acidification through NHE6 depletion. Our current work and studies by others (Fagan, A.M., et al., Neurobio. of Dis. 2002) show that human Aβ-overproducing ApoE4-KI mice generate plaques at a much later age than mice with wild-type mouse ApoE, but the mechanism is unknown. Since both of our mouse models, NHE6-KO and NHE6-cKO;ApoE4-KI, show a comparable reduction in plaque load, this might be the result of a maximally accelerated early endosomal maturation and cargo transport in the absence of NHE6. We elaborated on this topic in the discussion of our manuscript.

      2) With the current data, it is not possible to exclude possible nonspecific effects resulting from NHE6 genetic deletion. Additional experiments to measure the endosomal pH would add support to the hypothesis.

      We agree with the reviewer’s concern and have addressed it in the discussion accordingly.

      3) The authors attribute reduced amyloid plaque load in NHE6-deficient APP KI mice to increased glial responses, which would promote plaque clearance. This is a very interesting hypothesis, but it is not supported by the experimental data reported in Supplemental Figure 6. Additional experimentation is needed to more thoroughly characterize astrocytic and microglial phenotypes caused by NHE6 genetic depletion in APP KI mice. Functional assays, including cytokine release, nitric oxide production (Griess reaction), and Aβ uptake experiments would be desired to strengthen these conclusions.

      We thank the reviewer for this valuable feedback. In our revised manuscript, we evaluated whether there is a change of microglial Aβ content in the NHE6 depletion mouse model. We also quantified the immunoreactivity of Iba1 and GFAP in plaque areas. We found no change between NHE6-KO and littermate APPNL-F controls when we co-stained for Aβ and Iba1 (microglia) or GFAP (astrocytes). However, given that the Aβ signal is massively reduced in NHE6-KO brains overall while the proportion of microglia containing Aβ is comparable to control, this indirectly indicates that NHE6-deficient microglia are more efficient in Aβ uptake and degradation. We agree with the reviewers that future studies will be required to evaluate Aβ uptake in primary microglia derived from NHE6-KO mice to properly conclude that the reduction of Aβ is mediated by enhanced glial activation. Thus, we have adjusted our conclusions in the manuscript accordingly.

      4) The authors demonstrate that global or conditional NHE6 deletion causes severe Purkinje cell loss in the mouse cerebellum (Figure 2). Although the authors included representative images of H&E staining indicating no gross histological abnormalities (Supplemental Figure 3), a more detailed investigation is required to assess neuronal survival in the hippocampus and cortex upon NHE6 suppression, given the relevance of these regions to AD pathology. Indeed, previous evidence (Xu et al., eNeuro, 2017) showed that NHE6 depletion leads to significant cortical and hippocampal atrophy, in addition to the cerebellum. Could the reductions in plaque load in NHE6 depleted mice (Figure 5, 6; Supplemental Figure 5) be somehow a reflection of neuronal loss? It is important that the authors discuss this issue in the manuscript.

      We thank the reviewer for this suggestion. We have now measured brain area, hippocampal area, cortical and CA1 thickness. Comparable to Xu et al., in the NHE6-KO line we detect a reduction in total brain area, HC area, cortical thickness and CA1 thickness. By contrast, in our NHE6-cKO;APPNL-F;ApoE4-KI mice we do not see any neuronal loss; however, we detect plaque reduction and glial activation similar to those in the NHE6-KO;APPNL-F mice. These findings suggest that the neuronal loss does not mediate the reduction in plaque load or glial activation. We discussed our findings in our manuscript accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Bai et al. investigate in experiments and simulations how cohesion is maintained in chemotactic travelling waves of bacteria. These waves emerge from the bacterial population consuming an attractant, thus carving a gradient which they follow chemotactically. This paper builds up on previous work of some of the authors (Fu et al, Nat Commun 2018), which found that in these waves bacteria with varying degree of chemotactic sensitivity organize spatially in the band, which allows for its cohesiveness despite varying phenotypes. The authors investigate here an additional element for the cohesiveness of the wave: because the sharpness of the gradient increases from the front to the back of the wave, 'late' cells catch up via a stronger chemotactic response, and front cells slow down via a weaker one. This had been already postulated in earlier work on the phenomenon (Saragosti et al. PNAS 2011), but here the authors investigate how this applies to cells with varying chemotactic sensitivity. They also performed agent-based simulations of the cells behavior in the gradient and developed a model of the motion in the gradient. The latter maps the spatial dependence of the gradient steepness onto an effective travelling potential which keeps the cells together in a group as the gradient and the wave propagate. Importantly, the effective potential is predicted to be tighter for cells with higher chemotactic sensitivity, in agreement with the cell behavior they observe in experiments where the chemotactic sensitivity is artificially modulated. This suggests that weakly chemotactic cells are more weakly bound to the group and have a higher chance of being left behind. This last part is interesting in the context of range extension in semi-solid agar, where bacteria are known to be spatially organized and selected according to their chemotactic motility (Ni et al, Cell reports 2017, Liu et al Nature 2019)

      This paper builds its strengths on the extensive experimental characterization of the system and a variety of modeling approaches and makes a fairly convincing case for the way of understanding the mechanism of cohesion maintenance they propose.

      In fact, we have addressed both the mechanism that maintains a coherent group and the mechanism that forms an ordered pattern of diverse phenotypes. Thanks to the reviewer, we noticed that the second point was not clearly presented in our previous version. We have therefore largely rewritten the text and reorganized the results to highlight both mechanisms.

      From a methodological perspective, only a few points need to be addressed:

      Control experiments need to quantify the cell-to-cell variability of the induction level of Tar by tetracycline.

      The distributions for the titrated cells were measured using a ptet-Tar-GFP strain, in which GFP serves as a reporter of the expressed Tar protein. The results are shown below:

      Chemical attraction to cues released by other cells is a well-documented way to create cohesive large scale structures in E. coli (Budrene & Berg Nature 1995, Park et al PNAS 2003, Jani et al Microbiology 2017, Laganenka et al Nat commun 2016). The cohesion of the wave have never been analyzed in this optic, despite being a possible alternative explanation to the gradient shape. Since the authors main claim is about the wave cohesion, they should provide evidence that such an explanation can be ruled out or considered secondary.

      We thank the reviewer for pointing out self-attractant secretion as a possible mechanism for maintaining a coherent group. We argue that this mechanism is not necessary for the chemotactic group to remain coherent, because the migrating group stays together in our agent-based simulations, which do not include this effect.

      Moreover, as suggested by the reviewer, we used a Tar-only strain, which does not sense any chemoattractant other than aspartate, to show that the migrating group remained coherent (see Fig S9). This experiment showed that the secretion of self-attractant is not essential for coherent group migration.

      Possible effects of physical interactions between cells on the chemotactic response are not accounted for. The consequences should be better discussed, because they are known to influence chemotactic motility at the densities encountered in the present experiments (Colin et al Nat commun 2019).

      As reported by Colin et al., the effective drift velocity and the chemotactic ability decrease when cells are dense (volume fraction >0.01). However, the cell density in our experiments is below this critical value (volume fraction <0.01).

      Additionally, the paper could better emphasize the new results and separate them from the confirmations of previous results.

      In the revised version, we emphasize two new findings:

      1) The individual drift velocity decreases from back to front of the bacterial migration group, which makes the chemotactic migration wave a pushed wave.

      2) Cells of diverse phenotypes follow the same reversion behavior, i.e., they drift faster at the back and slower at the front, but with ordered mean positions, which produces the ordered pattern in the migrating group.

      Reviewer #2 (Public Review):

      The manuscript by Bai et al. explores the single-cell motility dynamics within a chemotactic soliton wave in E. coli. They tracked individual cells and measured their trajectory speed and orientation distributions behind and ahead of the wave. They showed cells behind the wave were moving in a more directed fashion towards the center of the wave compared to cells ahead of the wave. This behavior explains the stability of group migration, as confirmed by numerical simulations.

      I do not recommend this manuscript for publication in eLife since it basically reproduces and deepens previous published works. In particular, Saragosti et al (2011) already provided exactly what the authors claim to do here : "How individuals with phenotypic and behavioral variations manage to maintain the consistent group performance and determine their relative positions in the group is still a mystery." (Line 75-77) (See the last sentences from Saragosti et al : "This modulation of the reorientations significantly improves the efficiency of the collective migration. Moreover, these two quantities are spatially modulated along the concentration profile. We recover quantitatively these microscopic and macroscopic observations with a dedicated kinetic model.")

      Saragosti et al. discuss the modulation of the reorientation angle of bacteria according to swimming direction, which is not equivalent to a spatial modulation of drift velocity. They claim that cells moving along the gradient direction reorient less during a tumble than cells moving against the gradient, and that this increases the migration efficiency of the group. In our paper, we instead claim that the drift velocity of bacteria is spatially modulated: cells at the back drift faster, while cells at the front drift more slowly. This phenomenon is important because it makes the chemotactic migration front a pushed wave, which helps the group retain diverse phenotypes.

      Saragosti et al. also suggested a spatial modulation of the run-length bias to explain the coherence of the migrating group. However, they did not quantify this bias, nor did they explain the causes and consequences of the spatial modulation. Moreover, their model, which is built on their proposed mechanism of directional persistence, cannot explain the decreasing run-length bias they observed (see their Figure 4A and C). We therefore cannot agree that they have already demonstrated how cells with diverse phenotypes maintain a coherent group.

      Moreover, they did not address phenotypic diversity within the group.

      What is novel here is the titration of the behavior with chemo-receptor abundance, but I believe the scope is not wide enough for publication in eLife. I suggest the authors to submit in a more specialized journal.

      The titration of chemoreceptor abundance serves as a tool to explain how diverse individuals manage to form ordered patterns in a group. This question deserves discussion because diversity is known to be an important feature for the survival of a group. The ordered pattern was found to be key for a migrating group to maintain its diversity while migrating at a consistent speed. In this paper, we explain how individuals performing biased random walks are able to form an ordered structure.

      Reviewer #3 (Public Review):

      The authors present a study on the collective behaviour of E.coli during migration in a self-generated gradient. Taking into account phenotypic variation within a biological population, they performed experiments and complemented the study with a predictive model used for simulation to understand how bacteria can move as a group and how the individual bacterium defines its own position within the group.

      They observed experimentally that phenotype variation within the bacterial population causes a spatial distribution within the chemotactic band that is not continuous but formed by subpopulations with specific properties such as run length, run duration, angular distribution of trajectories, drift velocity. They attribute this behaviour to the chemotaxis ability, which varies between phenotypes and defines a potential well that anchors each bacterium in its own group. This was proven by the subdiffusive dynamics of the bacteria in each subgroup. Many cases were studied in the experiments and the authors present many controls to clearly demonstrate their hypothesis.

      These are interesting results that prove how a discretised distribution can produce continuous collective behaviour. It presents also an interesting example in the field of active matter about collective behaviour on a large scale that is generated by a different behaviour of individuals on a much smaller scale. However, it is not clear how the subpopulations can be held together in the group.

      The decreasing chemoattractant gradient makes the migration wavefront a pushed wavefront. As a result, the balanced position of a subpopulation with larger chemotactic ability is located toward the front, where the gradient is small. Diverse phenotypes thus form an ordered pattern, each achieving the same migration speed at its own balanced position. This discussion was added in the revised text (see lines 268-277).

      Moreover, a link between bacterial dynamics and the biological necessary mechanism is not clear.

      The dynamics of individual bacteria are controlled by the bacterial chemotaxis pathway, which is well understood from previous studies. Basically, the biased random motion is controlled by altering the expected run length through a temporal comparison of the chemoattractant concentrations the cell experiences (Jiang et al. 2010 Plos Comp. Biol.).

      They formulate a theoretical description based on the classical Keller-Segel model. Langevin dynamics was used to describe bacterial activity in terms of drift velocity for simulation, which agrees very well with experimental observations.

      One can appreciate the interesting results of the study describing Ecoli chemotaxis as a mean-reversion process with an associated potential, but it is not clear to what extent the results can be generalised to all bacteria or rather relate to the strain the authors investigated.

      The mean-reversion process is a result of the decreasing drift velocity (i.e., a pushed wave). Although our study focuses on bacterial chemotactic migration, the ordering mechanism of diverse phenotypes follows an Ornstein-Uhlenbeck (OU)-type model, which is not limited to bacterial chemotaxis. We therefore argue that the ordering mechanism we propose is universal to all active particles that generate signals as a global cue for collective motion.
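
      The OU-type picture described above can be illustrated with a minimal simulation. The sketch below is not the authors' model: it assumes a one-dimensional position x in the frame co-moving with the wave, a linear restoring drift -k (x - x0) toward a balanced position x0, and constant diffusion. The stiffness k stands in for the tightness of the effective potential (described in the reviews as tighter for more chemotactic cells), and all parameter values are hypothetical.

```python
import math
import random


def simulate_ou(k, x0, diffusion, dt=0.01, n_steps=50_000, seed=0):
    """Euler-Maruyama integration of dx = -k (x - x0) dt + sqrt(2 D) dW.

    k, x0 and diffusion are illustrative parameters, not fitted values.
    Returns the list of positions in the wave's co-moving frame.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    x = x0
    positions = []
    for _ in range(n_steps):
        # Restoring drift pulls the cell back toward its balanced position.
        x += -k * (x - x0) * dt + noise_scale * rng.gauss(0.0, 1.0)
        positions.append(x)
    return positions


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Two hypothetical phenotypes: a weakly and a strongly bound subpopulation.
spread_weak = variance(simulate_ou(k=1.0, x0=0.0, diffusion=1.0, seed=1))
spread_strong = variance(simulate_ou(k=4.0, x0=0.0, diffusion=1.0, seed=2))
```

      In this sketch the stationary spread scales roughly as D/k, so the phenotype with the stiffer effective potential stays closer to its balanced position, while the weakly bound one wanders further and would be more likely to fall behind the group.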

    1. Author Response

      Reviewer #2 (Public Review):

      (1) Much of the cited literature that is used to make the case for their hypothesis is very old and actually refers to active HIV infection and patient studies prior to ART. Also, the literature they cite regarding the role of H2S as an antimicrobial agent seem to be limited to tuberculosis infection.

      We have revised the list of literature and included more relevant references from the post-ART era. The antimicrobial role of H2S has recently been examined comprehensively in the context of tuberculosis. Given the close association of TB with HIV, we consider our study timely and essential. However, we would like to point out that references showing the effect of H2S on infections caused by respiratory viruses are included in the manuscript (7-9). Further, recent findings showing the influence of H2S in the context of SARS-CoV-2 infection are also included in the revised manuscript.

      (2) The choice of the latently infected model cell lines is rather unfortunate. There are much better defined models out there these days than J1.1 or U1 cells, such as the J-LAT cells from the Verdin lab or the various reporter cell lines generated by Levy and co-workers. In particularly, U1 cells should not be considered as latently infected, as the virus has a defect in the Tat/TAR axis and is mostly just transcriptionally attenuated. It is unclear why the authors only use J-LAT cells for one of the last experiments

      As suggested by the reviewer, we have generated new data using J-LAT cells in the revised manuscript. First, we confirmed that PMA-mediated HIV-1 reactivation in J-LAT cells is associated with the down-regulation of cbs, cth, and mpst transcripts (Figure 1-figure supplement 1C-D in the revised manuscript). Additionally, we have performed several other mechanistic experiments in J-LAT cells to validate the data generated in U1 (see the response to point 3 below).

      (3) It is further unclear why the authors perform most of the experiments using U1 cells, which are considered promonocytic, but in the end seek to demonstrate the influence of H2S on latent HIV-1 infection in CD4 T cells. Performing all experiments in J1.1 or better J-LAT cells would have seemed more intuitive.

      The choice of U1 was based on our earlier studies showing that U1 cells uniformly recapitulate the association of redox-based mechanisms and mitochondrial bioenergetics with HIV-latency and reactivation (10-12). We have validated key findings of U1 cells in J1.1 and J-Lat cell lines. We genetically and chemically silenced the expression of CTH in J-Lat cells and examined the effect on HIV-1 reactivation. Consistent with U1 and J1.1, genetic silencing of CTH using CTH-specific shRNA (shCTH) reactivated HIV-1 in J-Lat (Figure 2-figure supplement 1F-G in the revised manuscript). Supporting this, pre-treatment of J-Lat with non-toxic concentrations of a well-established CTH inhibitor, propargylglycine (PAG) further stimulated PMA-induced HIV-1 reactivation (Figure 2-figure supplement 1H-I in the revised manuscript). Altogether, using various cell line models of HIV-1 latency, we confirmed that endogenous H2S biogenesis counteracts HIV-1 reactivation.

      (4) The authors suggest that H2S production would control latent HIV-1 infection and reactivation. Regarding the idea that CBS, CTH or possibly MPST would control latent infection as a function of their ability to produce H2S from different sources, there are several questions. First, if H2S is the primary factor, why would the presence of e.g. MPST not compensate for the reduction of CTH? Second, why would J1.1 and U1 cells both host latent HIV-1 infection events, however, their CBS/CTH/MPST composition is completely different? Third, natural variations in CTH expression caused by culture over time are larger than variations caused by PMA activation.

      These questions are important and complex. CBS, CTH, and MPST produce H2S in the sulfur network. CBS and CTH reside in the cytoplasm, whereas MPST is mainly involved in cysteine catabolism and is localized to mitochondria. The lack of compensation for CTH by MPST could be due to the compartmentalization of their activities. Furthermore, CTH and CBS activities are regulated by diverse metabolites, including heme, S-adenosyl methionine (SAM), and nitric oxide/carbon monoxide (NO/CO). In contrast, MPST activity responds to cysteine availability. How substrate/cofactor availability and enzyme choices are regulated in the cellular milieu of J1.1 and U1 is an interesting question for future experimentation.

      Moreover, the tissue-specific expression/activity of CBS and CTH dictates their relative contributions in H2S biogenesis and cellular physiology (13). Some of these factors are likely responsible for differential expression of CBS, CTH, and MPST in J1.1 and U1 cells. Regardless of these concerns, viral reactivation uniformly reduces the expression of CTH in U1, J1.1, and J-Lat. While we cannot completely rule out natural variations in CTH expression over prolonged culturing, in our experimental setup CTH remained stably expressed and consistently showed down-regulation upon PMA treatment as compared to untreated conditions.

      (5) Also, the statement that H2S production as exerted per loss of CTH would control reactivation is not supported by the kinetic data. In latently HIV-1 infected T cell lines or monocytic cell lines, PMA-mediated HIV-1 reactivation at the protein level is usually almost complete after 24 hours, but at this time point the difference between e.g. CTH levels only begins to appear in U1 cells. The data for J1.1. are even less convincing.

      We have measured the kinetics of p24 production and CTH levels in U1 cells. We showed that the levels of p24 gradually increased from 6 h and continued to increase until the last time point, i.e., 36 h post-PMA-treatment (Fig. 2D in the revised manuscript). The p24 ELISA detected similar kinetics of p24 increase in the cell supernatant (Fig. 2E in the revised manuscript). The CTH levels show a reduction at 24 h and 36 h. Based on these data, we report that HIV-1 reactivation is associated with diminished biogenesis of endogenous H2S. We have not made any claims that depletion of CTH precedes HIV reactivation. However, our CTH knockdown data clearly showed that diminished expression of CTH reactivates HIV-1 in the absence of PMA, which is consistent with our hypothesis that H2S production is likely to be a critical host component for maintaining viral latency.

      (6) Figure 2F. PMA is known to induce an oxidative stress response, however, in the experiments the data suggest that PMA results in a downregulated oxidative stress response. Maybe the authors could explain this discrepancy with the literature. In fact, both shRNA transductions, scr and CTH-specific seem to result in a lower PMA response.

      In our experiment, PMA treatment for 24 h results in down-regulation of oxidative stress genes. However, the effect of PMA on the oxidative stress responsive genes is time-dependent. In our earlier publication, we showed that 12 h PMA treatment induces oxidative stress responsive genes in U1 cells (12), whereas at 24 h, the expression of genes is down-regulated (10). Genetic silencing of CTH resulted in elevated mitochondrial ROS and GSH imbalance, which is in line with a further decrease in the expression of oxidative stress responsive genes as compared to PMA alone. As a consequence, PMA-treatment of U1-shCTH induced HIV-1 reactivation, which supersedes that stimulated by PMA or shCTH alone.

      (7) Given that the others in subsequent experiments use GYY4137, which is supposed to mimic the increased release of H2S, the authors should have definitely included experiments in which they would overexpress CTH, e.g. by retroviral transduction. Specifically in U1 cells, which seemingly do not express CBS, overexpression of CBS should also result in a suppressed phenotype

      We have explored the role of elevated H2S levels using GYY4137. Treatment with GYY4137 suppressed HIV reactivation in multiple cell lines and primary CD4+ T cells. As suggested by the reviewer, overexpression of CTH could be another strategy to validate these findings. However, since the transsulfuration pathway and active methyl cycle are interconnected and share metabolic intermediates (e.g., homocysteine), overexpression of CTH could disturb this balance and may lead to metabolic paralysis. Owing to these potential limitations, we used a slow-releasing H2S donor (GYY4137) to chemically complement CTH deficiency during HIV reactivation. We thank the reviewer for this comment.

      (8) Figure 4F: The authors need to explain how they can measure a 4-fold gag RNA expression change in untreated cells. Also, according to Figure 4A, 300 µM GYY produces much less H2S than 5mM, yet the suppressive effect of 300 µM GYY is much higher?

      The four-fold expression in untreated cells is likely due to leaky control of viral transcription in J1.1 cells (14-16). However, to avoid confusion, we have replotted the results by normalizing the data generated upon PMA-mediated HIV reactivation to the PMA-untreated cells (Figure 4F in the revised manuscript). The suppressive effect of GYY4137 at the lower concentration is intriguing but consistent with the findings that high and low concentrations of H2S have profound and distinct effects on cellular physiology (3,17). One possibility is that the high concentration of H2S induces the mitochondrial sulfide oxidation pathway to avert toxicity. This might modulate mitochondrial activity and ROS, resulting in the suppression of the GYY4137 effect. Consistent with this, higher concentrations of H2S have been shown to cause pro-oxidant effects, DNA damage and genotoxicity (3,18). We have discussed these possibilities in the revised manuscript.

      (9) Initially, the authors argue "that the depletion of CTH could contribute to redox imbalance and mitochondrial dysfunction to promote HIV-1 reactivation"(p. 9). Less CTH would suggest less produced H2S. However, later on in the manuscript they demonstrate that addition of a H2S source (GYY4137) results in the suppression of HIV-1 replication and supposedly HIV-1 reactivation. This is somewhat confusing.

      We show that depletion of endogenous H2S by diminished expression of CTH (U1-shCTH) resulted in higher mitochondrial ROS and GSH/GSSG imbalance. Both of these alterations are known to reactivate HIV-1 and promote replication (10,11,19). The addition of GYY4137 chemically compensated for the diminished expression of CTH, and prevented HIV-1 reactivation in U1-shCTH. These events are expected to suppress HIV-1 replication and reactivation. We have made this distinction clear in the revised manuscript.

      (10) CTH, or for that matter CBS or MPST do not only produce H2S, however, they also are part of other metabolic pathways. It would have been interesting and important to study how these metabolic pathways were affected by the genetic manipulations and also how the increased presence of H2S (GYY4137) would affect the metabolic activity of these enzymes or their expression.

      We fully agree with the reviewer. In fact, our NanoString data show that upon CTH knockdown (U1-shCTH), MPST levels were down-regulated and CBS remained undetectable (Fig. 2F in the revised manuscript). Additionally, GYY4137 treatment induced the expression of CTH but not MPST upon PMA addition (Fig. 5A in the revised manuscript). We have incorporated these findings in the revised manuscript. Given that CBS and CTH catalyze at least eight H2S-generating steps and two cysteine-producing reactions, the modulation of CTH by HIV is likely to have a widespread influence on transsulfuration pathway and active methyl cycle intermediates. Our future strategy is to generate a comprehensive understanding of the sulfur metabolism underlying HIV latency and reactivation. These experiments require multiple biochemical and genetic technologies with appropriate controls. We hope that the reviewer will agree with our view that these experiments should be part of a future investigation. We thank the reviewer for this comment.

      (11) H2S has been reported to cause NFkB inhibition by sulfhydration of p65; as such, the findings here are not particularly novel or surprising. Also, H2S induced sulfhydration is rather not targeted to a specific protein, let alone a HIV protein, making this approach a very unlikely alternative to current ART forms.

      We believe that NF-kB inhibition is not the only mechanism by which H2S exerts its influence on HIV latency. Recent studies point towards the importance of the Nrf2-Keap1 axis in sustaining HIV latency (20). Our data suggest an important role for Nrf2-Keap1 signaling in mediating the influence of H2S on HIV latency. Additionally, recruitment of an epigenetic silencer, YY1, is also affected by H2S. Interestingly, YY1 activity is modulated by redox signaling (21), suggesting H2S could be an important regulator of YY1 activity in HIV-infected cells. We have, so far, no evidence for viral proteins targeted by H2S. However, experiments to examine global S-persulfidation of host and HIV proteins are ongoing in the laboratory to fill this knowledge gap. Lastly, our findings raise the possibility of exploring H2S donors with the current ART (not as an alternative to ART) for reducing virus reactivation. We have toned down the clinical relevance of our findings.

      (12) The description of the primary T cell model used to generate the data in Figure 6 is slightly misleading. Also, the idea of this model was originally to demonstrate that "block and lock" by didehydro-cortistatin is possible. In this application, the authors did not investigate whether GYY4137 would actually induce a HIV "block and lock" over an extended period of time.

      As suggested by the reviewer, we have cited the didehydro-cortistatin studies as the basis of our strategy. Our idea was to adapt the primary T cell model to begin understanding the role of H2S in blocking HIV rebound. Our results indicate the future possibility of investigating GYY4137 to lock HIV in deep latency for an extended period of time. However, comprehensive investigation would require long-term experiments and samples from multiple HIV subjects. In the current pandemic times with overburdened Indian clinical settings, we cannot plan these experiments. However, we hope our data form a solid foundation for HIV researchers to perform extended “block and lock” studies using H2S donors.

      (13) However, the authors never provide evidence that endogenous H2S is altered in latently HIV-1 infected cells (which may actually be an impossible task). By the end of the manuscript, the authors have not provided clear evidence that the effects of e.g. CTH deletion would be mediated by the production of H2S, and not by another function of the enzyme. Similarly, the inability of stimuli to trigger efficient HIV-1 reactivation following the provision of unnaturally high levels of H2S is not surprising given reports on the effect of GYY4137 as anti-inflammatory agent and suppressor NF-kB activation. Unless the authors were to demonstrate a true "block and lock" effect by GYY4137 the data will likely have limited impact on the HIV cure field.

      It's difficult to measure H2S levels in the latently infected primary cells due to the assay's sensitivity and the insufficient number of cells latently infected with HIV-1. However, in the revised manuscript we have clearly shown that cysteine levels are not affected by CTH depletion and cysteine deprivation does not reactivate HIV-1. These results indicate that the effects of CTH depletion are likely mediated by H2S. This is consistent with our data showing that GYY4137 specifically complements CTH deficiency and blocks HIV-1 reactivation in U1-shCTH. Further, we carried out an in-depth investigation to show that the effect of GYY4137 is not due to impaired activation of CD4+ T cells.

      Lastly, since CTH catalyzes multiple reactions during H2S production, we cannot rule out the effect of other metabolites in this process. However, we think that this is outside the scope of the present study. Our study focuses on understanding how H2S modulates redox, mitochondrial bioenergetics, and gene expression in the context of HIV latency. These insights are likely to positively impact future studies exploring the role of H2S in HIV cure.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to establish a standardized quantitative approach to categorize the activity patterns in a central pattern generator (specifically, the well-studied pyloric circuit in C. borealis). While it is easy to describe these patterns under "normal" conditions, this circuit displays a wide range of irregular behaviors under experimental perturbations. Characterizing and cataloguing these irregular behaviors is of interest to understand how the network avoids these dysfunctional patterns under "normal" circumstances.

      The authors draw upon established machine learning tools to approach this problem. To do so, they must define a set of features that describe circuit activity at a moment in time. They use the distribution of inter-spike-intervals ISIs and spike phases of the LP and PD neuron as these features. As the authors mention in their Discussion section, these features are highly specialized and adapted to this particular circuit. This limits the applicability of their approach to other circuits with neurons that are unidentifiable or very large in number (the number of spike phase statistics grows quadratically with the number of neurons).

      We agree with the reviewer that the size of the feature vectors as described grows quadratically with the number of neurons. The feature sets we describe are most suited for “identified” neurons – neurons whose identity and connectivity are known and can be reliably recorded from multiple animals. The method described here is best suited for systems with small numbers of identified neurons. For other systems, other feature vectors may be chosen, as we have suggested in the Discussion: Applicability to other systems.
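
      The quadratic growth mentioned above can be illustrated with a short sketch. It assumes, purely for illustration, that each neuron contributes a fixed number of ISI percentiles and that each ordered pair of neurons contributes the same number of spike-phase percentiles; the function name and the default of 10 percentiles are hypothetical and are not taken from the manuscript.

```python
def feature_vector_length(n_neurons: int, n_percentiles: int = 10) -> int:
    """Count features under an assumed layout (not the manuscript's exact one).

    Each neuron contributes `n_percentiles` ISI percentiles, and each ordered
    pair of distinct neurons contributes `n_percentiles` spike-phase percentiles.
    """
    isi_features = n_neurons * n_percentiles
    # Phase of neuron A's spikes relative to neuron B, for every ordered pair.
    phase_features = n_neurons * (n_neurons - 1) * n_percentiles
    return isi_features + phase_features


for n in (2, 5, 10):
    print(n, feature_vector_length(n))
```

      Under these assumptions, two neurons give 40 features while ten neurons give 1000, which is why the approach is best suited to small sets of identified neurons.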

      The main results of the paper provide evidence that ISIs and spike phase statistics provide a reasonable descriptive starting point for understanding the diversity of pyloric circuit patterns. The authors rely heavily on t-distributed stochastic neighbor embedding (tSNE), a well-known nonlinear dimensionality reduction method, to visualize activity patterns in a low-dimensional, 2D space. While effective, the outputs of tSNE have to be interpreted with great care (Wattenberg, et al., "How to Use t-SNE Effectively", Distill, 2016. http://doi.org/10.23915/distill.00002). I think the conclusions of this paper would be strengthened if additional machine learning models were applied to the ISI and spike phase features, and if those additional models validated the qualitative results shown by tSNE. For example, tSNE itself is not a clustering method, so applying clustering methods directly to the high-dimensional data features would be a useful validation of the apparent low-dimensional clusters shown in the figures.

      We thank the reviewer for these suggestions, and agree with the reviewer that t-SNE is not a clustering method, and directly clustering on t-SNE embeddings is rife with complexities. Instead we have used t-SNE to generate a visualization that allows domain experts to quickly label and cluster large quantities of data. This makes a previously intractable task feasible, and offers some basic guarantees on quality (e.g., no one data point can have two labels, because labels derive from position of data points in two dimensional space). In addition:

      • We used uMAP, another dimensionality reduction algorithm, to perform the embedding step, and colored points by the original t-SNE embedding. (Figure 3—figure supplement 3). Large sections of the map are still strikingly colored in single colors, suggesting that the manual clustering did not depend on the details of the t-SNE algorithm, but is rather informed by the statistics of the data.

      • We validated our method using synthetic data. We generated synthetic spike trains from different “classes” and embedded the resultant feature vectors using t-SNE. Data from different classes are not intermingled, and form tight “clusters” (Figure 2 -- figure supplement 4).

      • Finally, we attempted to use hierarchical clustering to cluster the raw feature vectors, and were not able to find a reasonable partitioning of the linkage tree that separated qualitatively different spike patterns (Figure at the top of this document). We speculate that this is because feature vectors may contain outliers that bias clustering algorithms that attempt to preserve global distance to lump the majority of the data into a single cluster, in order to differentiate outliers from the bulk of the data.
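
      The synthetic-data check described in the list above can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical spike-pattern classes ("regular" and "bursting") and ISI percentiles as the only features; the class definitions, parameters, and function names are invented for the example, and the t-SNE embedding step itself is not reproduced here.

```python
import random
from statistics import quantiles


def regular_train(rate_hz, duration_s, jitter_s, rng):
    """Roughly periodic spikes with small uniform timing jitter (hypothetical class)."""
    period = 1.0 / rate_hz
    n_spikes = int(duration_s * rate_hz)
    return sorted(i * period + rng.uniform(-jitter_s, jitter_s) for i in range(n_spikes))


def bursting_train(burst_period_s, spikes_per_burst, intra_isi_s, duration_s):
    """Bursts of closely spaced spikes repeated at a fixed period (hypothetical class)."""
    spikes = []
    burst_start = 0.0
    while burst_start < duration_s:
        for k in range(spikes_per_burst):
            t = burst_start + k * intra_isi_s
            if t < duration_s:
                spikes.append(t)
        burst_start += burst_period_s
    return spikes


def isi_percentiles(spikes, n_features=10):
    """Summarise a spike train by evenly spaced quantiles of its inter-spike intervals."""
    isis = [b - a for a, b in zip(spikes, spikes[1:])]
    # quantiles(..., n=k) returns k - 1 cut points, so ask for n_features + 1 groups.
    return quantiles(isis, n=n_features + 1)


rng = random.Random(0)
regular_features = isi_percentiles(regular_train(1.0, 20.0, 0.01, rng))
bursting_features = isi_percentiles(bursting_train(1.0, 5, 0.05, 20.0))
print("regular :", [round(x, 2) for x in regular_features])
print("bursting:", [round(x, 2) for x in bursting_features])
```

      The regular train yields nearly identical percentiles, whereas the bursting train yields a bimodal set (short intra-burst and long inter-burst intervals), so the two classes occupy different regions of feature space before any embedding is applied.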

      The authors do show that the algorithmically defined clusters agree with expert-defined clusters. (Or, at least, they show that one can come up with reasonable post-hoc explanations and interpretations of each cluster). The very large cluster of "regular" patterns -- shown typically in a shade of blue -- actually looks like an archipelago of smaller clusters that the authors have reasoned should be lumped together. Thus, while the approach is still a useful data-driven tool, a non-trivial amount of expert knowledge is baked into the results. A central challenge in this line of research is to understand how sensitive the outcomes are to these modeling choices, and there is unlikely to be a definitive answer.

      We agree with the reviewer entirely.

      Nonetheless, the authors show results which suggest that this analysis framework may be useful for the community of researchers studying central pattern generators. They use their method to qualitatively characterize a variety of network perturbations -- temperature changes, pH changes, decentralization, etc.

      In some cases it is difficult to understand the level of certainty in these qualitative observations. A first look at Figure 5a suggests that three different kinds of perturbations push the circuit activity into different dysfunctional cluster regions. However, the apparent spatial differences between these three groups of perturbations might be due to animal-level differences (i.e. each preparation produces multiple points in the low-D plot, so the number of effective statistical replicates is smaller than it appears at first glance). Similarly, in Figure 9, it is somewhat hard to understand how much the state occupancy plots would change if more animals were collected -- with the exception of proctolin, there are ~25 animals and 12 circuit activity clusters which may not be a favorable ratio. It would be useful if a principled method for computing "error bars" on these occupancy diagrams could be developed. Similar "error bars" on the state transition diagrams (e.g. Fig 6a) would also be useful.

      We agree with the reviewer. Despite this paper containing data from hundreds of animals, the dataset may not be sufficiently large to perform some necessary statistical checks. We agree with the reviewer that a more rigorous error analysis would be useful, but is not trivially done.

      Finally, one nagging concern that I have is that the ISIs and spike phase statistics aren't the ideal features one would use to classify pyloric circuit behaviors. Sub-threshold dynamics are incredibly important for this circuit (e.g. due to electrical coupling of many neurons). A deeper discussion about what is potentially lost by only having access to the spikes would be useful.

      We agree with the reviewer that spike times aren’t the ideal feature to use to describe circuit dynamics. This is especially true in the STG, where synapses are graded, and coupling between cells can persist without spiking. However, the data required simply do not exist, as it requires intracellular recordings, which are substantially harder to perform (and maintain over challenging perturbations) than extracellular recordings.

      Finally, the signal to the muscles – arguably the physiologically and functionally relevant signal – is the spike signal, suggesting that spike patterns from the pyloric circuit are a useful feature to measure. Nevertheless, this is an important point, and we thank the reviewer for raising it, and we have included it in the section titled Discussion: Technical considerations.

      Overall, I think this work provides a useful starting point for large-scale quantitative analysis of CPG circuit behaviors, but there are many additional hurdles to be overcome.

      Reviewer #2 (Public Review):

      This manuscript uses the t-SNE dimensionality reduction technique to capture the rich dynamics of the pyloric circuit of the crab.

      Strengths:

      • The integration of a rich data-set of spiking data from the pyloric circuit

      • Use of nonlinear dimension reduction (t-SNE) to visualise that data

      • Use of clusters from that t-SNE visualisation to create subsets of data that are amenable to consistent analyses (such as using the "regular" cluster as a basis for surveying the types of dynamics possible in baseline conditions)

      • Innovative use of the cluster types to describe transitions between dynamics within the baseline state and within perturbed states (whether by changes to exogenous variables, cutting nerves, or applying neuromodulators)

      • Some interesting main results:

      o Baseline variability in the spiking patterns of the pyloric circuit is greater within than between animals

      o Transitions to silent states often (always?) pass through the same intermediate state of the LP neuron skipping spikes

      Weaknesses:

      • t-SNE is not, in isolation, a clustering algorithm, yet here it is treated as such. How the clusters were identified is unclear: the manuscript mentions manual curation of randomly sampled points, implying that the clusters were extrapolations from these. This would seem to rather defeat the point of using unsupervised techniques to obtain an unbiased survey of the spiking dynamics, and raises the issue of how robust the clusters are

      We have used t-SNE to visualize the circuit dynamics in a two-dimensional map. We have exploited t-SNE’s ability to preserve local structure to generate an embedding where a domain expert can efficiently manually identify and label stereotyped clusters of activity. As the reviewer points out, this is a manual step, and we have emphasized this in the manuscript. The strength of our approach is to combine the power of a nonlinear dimensionality reduction technique such as t-SNE with human curation to make a task that was previously impossible (identifying and labelling very large datasets of neural activity) feasible.

      To address the question of how robust the manually identified clusters are, we have:

      1) We used another dimensionality reduction technique, uMAP, to generate an embedding and colored points by the original t-SNE map (Figure 3 – figure supplement 3). To a rough approximation, the coloring reveals that a similar clustering exists in this uMAP embedding.

      2) We generated synthetic spike trains from pre-determined spike pattern classes and used the feature vector extraction and t-SNE embedding procedure as described in the paper. We found that this generated a map (Figure 2—figure supplement 4) where classes of spike patterns were well separated in the t-SNE space.

      • the main purpose and contribution of the paper is unclear, as the results are descriptive, and mostly state that dynamics in some vary between different states of the circuit; while the collated dataset is a wonderful resource, and the map is no doubt useful for the lab to place in context what they are looking at, it is not clear what we learn about the pyloric circuit, or more widely about the dynamical repertoire of neural circuits

      • in some places the contribution is noted as being the pipeline of analysis: unfortunately as the pipeline used here seems to rely on manual curation, it is of limited general use; moreover, there are already a number of previous works that use unsupervised machine-learning pipelines to characterise the complexity of spiking activity across a large data-set of neurons, using the same general approach here (quantify properties of spiking as a vector; map/cluster using dimension reduction), including Baden et al (2016, Nature), Bruno et al (2015, Neuron), Frady et al (2016, Neural Computation).

      • Some key limitations are not considered:

      o the omission of the PY neuron activity means that the map as given is incomplete: potentially there are many more states, and hence transitions, within or beyond those already found that correspond to changes in PY neuron activity

      We agree with the reviewer that the omission of the PY neurons’ activity means that the map is incomplete. There are likely many more states, and hence many more transitions, than the ones we have identified. In addition, we note that there are other pyloric neurons whose activity is also missing (AB, IC, LPG, VD). However, measuring just LP and PD allows us to monitor the activity of the most important functional antagonists in the system (because they are effectively in a half-center oscillator because PD is electrically coupled to AB). In general, the more neurons one measures, the richer the description of the circuit dynamics will be. Collecting datasets at this scale (~500 animals) from all pyloric neurons is challenging, and we have revised the manuscript to make this important point (see Discussion: Technical considerations).

      o The use of long, non-overlapping time segments (20s) - this means, for example, that the transitions are slow and discrete, whereas in reality they may be abrupt, or continuous.

      We agree with the reviewer. There are tradeoffs in choosing a bin size when analyzing time series: choosing longer bins can increase the number of “states” and choosing shorter bins can increase the number of transitions. We chose 20 s bins because they are long enough to include several cycles of the pyloric rhythm, even when decentralized, yet short enough to resolve slow changes in spiking. We have included a statement clarifying this (see Discussion: Technical considerations).
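
      The effect of fixed, non-overlapping bins can be seen in a small sketch. It assumes a hypothetical recording in which a neuron fires regularly at 1 Hz and then stops abruptly at 30 s; the function name and the toy spike train are illustrative only and are not drawn from the dataset.

```python
import math


def segment_spikes(spike_times, duration_s, bin_size_s=20.0):
    """Assign spike times (in seconds) to non-overlapping bins of fixed length."""
    n_bins = math.ceil(duration_s / bin_size_s)
    bins = [[] for _ in range(n_bins)]
    for t in spike_times:
        idx = int(t // bin_size_s)
        if 0 <= idx < n_bins:  # ignore spikes outside the recording window
            bins[idx].append(t)
    return bins


# Hypothetical train: one spike per second from 0 s to 29 s, then silence until 60 s.
spikes = [float(t) for t in range(30)]
bins = segment_spikes(spikes, duration_s=60.0, bin_size_s=20.0)
counts = [len(b) for b in bins]
print(counts)  # → [20, 10, 0]
```

      The abrupt stop at 30 s falls inside the second bin, so that bin looks like an intermediate state with half the spike count. This is the sense in which binning can make an abrupt transition appear gradual and discrete.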

      o tSNE cannot capture hierarchical structure, nor has a null model to demonstrate that the underlying data contains some clustering structure. So, for example, distances measured on the map may not be strictly meaningful if the data is hierarchical.

      We agree with the reviewer. t-SNE can manifest clusters when none exist (Section 4 of https://distill.pub/2016/misread-tsne/) and can obscure or merge true clusters. We have restricted analyses that rely on distances measured in the map to cases where there are qualitative differences in behavior (e.g., with decentralization, Fig 7) or have compared distances within subsets of data where a single parameter is changed (e.g., pH or temperature, Fig 5). The only conclusion we draw from these distance measures is that data are more (or less) spread out in the map, which we use as a proxy for variability. We have included a statement discussing the limitations of using t-SNE (Discussion: Comparison with other methods).

      • the Discussion does not include enough insight and contextualisation of the results.

      We have completely rewritten the discussion to address this.

      Reviewer #3 (Public Review):

      Gorur-Shandilya et al. apply an unsupervised dimensionality reduction (t-SNE) to characterize neural spiking dynamics in the pyloric circuit in the stomatogastric ganglion of the crab. The application of unsupervised methods to characterize qualitatively distinct regimes of spiking neural circuits is very interesting and novel, and the manuscript provides a comprehensive demonstration of its utility by analyzing dynamical variability in function and dysfunction in an important rhythm-generating circuit. The system is highly tractable with small numbers of neurons, and the study here provides an important new characterization of the system that can be used to further understand the mapping between gene expression, circuit activity, and functional regimes. The explicit note about the importance of visualization and manual labeling was also nice, since this is often brushed under the rug in other studies.

      Major concern:

      While the specific analysis pipeline clearly identifies qualitatively distinct regimes of spike patterns in the LP/PD neurons, it is not clear how much of this is due to t-SNE itself vs the initial pre-processing and feature definition (ISI and spike phase percentiles). Analyses that would help clarify this would be to check whether the same clusters emerge after (1) applying ordinary PCA to the feature vectors and plotting the projections of the data along the first two PCs, or (2) defining input features as the concatenated binned spike rates over time of the LP & PD neurons (which would also yield a fixed-length vector per 20 s trial), and then passing these inputs to PCA or tSNE. As the significance of this work is largely motivated by using unsupervised vs ad hoc descriptors of circuit dynamics, it will be important to clarify how much of the results derive from the use of ISI and phase representation percentiles, etc. as input features, vs how much emerge from the dimensionality reduction.

      We agree with the reviewer that it is important to clarify how much of our results come from the data itself and how we parameterize them using ISIs and phases, and how much comes from the choice of t-SNE as a dimensionality reduction algorithm. We have addressed this concern in the following ways:

      1. We used principal components analysis on the feature vectors and measured triadic differences in features such as the period and duty cycle of the PD neuron. We found that triadic differences were lower in the t-SNE embedding than in the first two PCA features, or in shuffled t-SNE embeddings (Figure 2– Figure supplement 2), suggesting that the embedding is creating a useful representation that captures key features of the data.

      2. We have used uMAP to reduce the dimensionality of the feature matrix to two dimensions and found that it too preserved the coarse features of the embedding that we observe with t-SNE. Coloring the uMAP embedding by the t-SNE labels revealed that the overall classification scheme was intact (Fig 3 – figure supplement 3).

      3. We generated a synthetic dataset and applied the unsupervised part of our algorithm to it (conversion to ISIs, phases, etc., then t-SNE). We colored the points in the t-SNE embedding by the category in the synthetic dataset. We found that categories were well separated in the t-SNE plot, and each cluster tended to have a single color. This validates the overall power of our approach and shows that it can recover clustering information in large spike sets (Figure 2—figure supplement 4).

      4. We have run k-means and hierarchical clustering on the feature vectors directly and shown that our method is superior to these naïve clustering algorithms running on the feature vectors. We speculate that this is because these clustering methods attempt to partition the full space using global distances, at the expense of distance along the manifold on which the data is located. Algorithms like t-SNE are biased towards local distances, and discount global distances between points outside a neighborhood, and are thus better suited here.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper Wiles et al. show that mutations in the iswi and acf genes, which encode components of a nucleosome remodeling complex, lead to expression of a subset of H3K27me-repressed genes. The strengths of the paper include the detailed genomic analysis supporting the statements that Iswi and Acf regulate a subset of H3K27me3-repressed genes. Data showing that the +1 nucleosome shifts 50bp in H3K27me-genes upregulated in the iswi mutant is also very strong. There is strong data documenting the proteins that Iswi interacts with in N. crassa. The data showing the nucleosome shift in the acf mutant is not as strong. The summary figure is highly speculative because there is no data for discrete localization of Acf. Another piece of data that is lacking is what happens to H3K36me in iswi and acf mutants. Knowing this is important because a similar set of genes seem to be derepressed in an ash1 mutant as in the acf and iswi mutants, although the level of derepression in ash1 is not as great as in iswi mutants. The summary diagram shows loss of H3K36me as a separate mechanism from loss of the ACF complex. We don't know that since there was no analysis of H3K36me in iswi or acf mutants. Still, the major findings of the paper are important.

      We feel that the data showing the nucleosome shift in ∆acf1 are quite strong (Figure 5I). We agree that the model is somewhat speculative but we feel it is useful and have addressed concerns e.g. by performing H3K36me ChIP-seq (more below). Although we were unable to ChIP ACF, we find the combination of the ACF1-DamID and nucleosome shifts specifically at the K27me upregulated genes in ∆acf1 provides good evidence that ACF is acting at and localizing to these genomic locations. We have added H3K36me ChIP data in ∆acf1 and ∆iswi.

    1. Author Response

      Reviewer #1 (Public Review):

      Adefuin and colleagues examined the interaction between components of binary odor mixtures in odor responses in mice. The authors used two-photon calcium imaging from the soma and apical dendrites of mitral/tufted cells in the olfactory bulb. Odor responses were measured in various conditions: under anesthesia (ketamine/xylazine), while well-trained mice were engaged in an odor discrimination task, or disengaged. The authors first show that mixture components interacted sublinearly in a large fraction of mitral/tufted cells (46%; Fig. 6D) consistent with previous studies. However, when odor responses were measured in awake animals, very few mitral/tufted cells showed sublinear responses at soma (8-9%; Fig. 6D). Interestingly, sublinear interaction was evident in apical dendrites of mitral/tufted cells (45%). Whether mixture components are represented linearly or not in the olfactory system is an important question, related to the animal's ability to identify or segment mixture components. Somewhat contrary to previous studies, this study demonstrates largely linear interactions. Furthermore, this study compares various behavioral conditions. These results are important and of interest to those who study sensory systems. I have a few concerns regarding data analysis.

      Thank you for your helpful review, and for recognising the relevance of our work. We hope that the reviewer finds our point-by-point responses satisfactory.

      1) Non-linear interactions are detected by the activity showing a deviation from linearity greater than 2 standard deviations. Using this criterion, non-linear interactions might decrease if the trial-by-trial activity becomes more variable. This is concerning because the activity might be less variable in the anesthetized condition, and the reduction in sublinear interactions in awake conditions may be due to a general increase in response variability in the awake state. Can the authors exclude the possibility that the decrease in sublinear interactions is merely due to an increase in response variability in the awake conditions? This issue also applies to the comparison between apical dendrites versus soma; are the signals in apical dendrite less variable (maybe due to some averaging across dendrites from multiple cells; see the following point 5)?

      Thank you for raising this valid point and for suggesting alternative analyses. We agree that the index we used previously is susceptible to noise, and not appropriate for comparing two datasets with different trial-by-trial variability. To quantify the deviation from linear sum more robustly, we now use the “Median fractional deviation”, which expresses a deviation from the linear sum as a fraction of the predicted linear sum (not normalised by the standard deviation), and take the median of the distribution from each field of view. As we describe in the revised Figure 4, this measure is more robust to noise. Notably, our finding that mixture summation is generally less sublinear in awake mice still stands for the early phase.

      In the revised manuscript, we use the median fractional deviation whenever we compare linearity of summation across different conditions, which includes the comparison of anaesthetised vs. awake, behaving conditions (revised Fig. 4), comparison of dendrites vs. somata (revised Fig. 4-figure supplement 1), and comparisons of awake states (revised Fig. 6). This has given us, too, more confidence about our interpretation, so we are grateful for the reviewer’s suggestions.
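
      As a minimal sketch of the measure described above, the fractional deviation for one cell can be taken as the difference between the observed mixture response and the sum of the component responses, divided by that sum, with the median then taken over cells in a field of view. The sign convention (negative meaning sublinear), the function names, and the toy numbers are assumptions made for illustration rather than details confirmed by the manuscript.

```python
from statistics import median


def fractional_deviation(mixture, component_a, component_b):
    """Deviation of the mixture response from the linear sum, as a fraction of that sum.

    Assumed convention: negative values indicate sublinear summation.
    """
    predicted = component_a + component_b
    return (mixture - predicted) / predicted


def median_fractional_deviation(responses):
    """Median over cells in one field of view.

    `responses` is a list of (mixture, component_a, component_b) tuples.
    Cells whose predicted sum is zero are skipped to avoid division by zero.
    """
    deviations = [
        fractional_deviation(m, a, b) for m, a, b in responses if (a + b) != 0
    ]
    return median(deviations)


# Hypothetical responses (arbitrary dF/F units) for three cells in one field of view.
field_of_view = [(1.5, 1.0, 1.0), (2.0, 1.0, 1.0), (3.0, 2.0, 2.0)]
print(median_fractional_deviation(field_of_view))  # → -0.25
```

      Because the measure is a ratio, noise is amplified when the predicted sum is small, which is consistent with the authors' stated reason for summarising each field of view by its median rather than its mean.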

      2) Related to the above issue, it would be useful to analyze the difference between conditions using different metrics to fully understand what really are different between conditions. The scatter plots shown in various figures do not show drastic differences between awake and anesthetized conditions, as might be indicated by the percent of sublinear responses. It would be useful to characterize the magnitude of sublinear/supralinear effects. For example, one can calculate a fractional change in the mean response. Does this measure show consistent difference between awake and anesthetized conditions?

      Thank you for suggesting this analysis. As described above, we now use the fractional deviation to quantify how mixture summations differ from linear sums, which turned out to be a very useful way to express the property of summation (N.B.: noise is amplified for small responses when fractional deviation is used, which is another reason we use the median now). We thank the reviewer for suggesting this analysis.

      Reviewer #2 (Public Review):

      This study addresses how complex stimuli are represented in neural responses. This is particularly relevant to olfaction because the vast majority of stimuli are complex mixtures that, perceptually, are not easy to decompose into parts. Nonetheless, the ability to discern a relevant odor from background odors is essential. This process is easier when neural responses to mixtures reflect the linear sum of the responses to the individual components. The main conclusion of this study is that the linearity of olfactory bulb responses to two-component mixtures increases in awake versus anesthetized states. The authors provide some evidence to support this claim. However, this could be better quantified and there is a temporal aspect of linearization that is not addressed. Perhaps the most interesting aspect of the study is the difference in linearity between the dendrites and the somata of the mitral/tufted cells. But a statistical analysis of this finding was not evident. Overall a mechanistic or functional approach to understanding these findings is lacking. The differences in linearity between the anesthetized and awake states are simply explained by response saturation in anesthetized animals. There are hints at a mechanism by which linearity is supported in the OB with comparisons between soma and dendrite but these are not well developed. There is a model that addresses the functional significance of linearity but this is only supplemental and not well described.

      Thank you for appreciating the significance of our work, and for your constructive comments.

      Reviewer #3 (Public Review):

      Adefuin et al use multiphoton imaging of M/T cell responses to investigate whether neuronal representations of binary mixtures can be explained as a sum of the components. The current view in the field (built largely from studies in anesthetized animals), is that mixture summation is non-linear and increases with the degree in glomerular response overlap elicited by the components. The authors reproduce these results and ask whether the same phenomenon is observed in the awake state, in particular when the animals are engaged in an odor discrimination task. Unlike in the anesthetized state, the authors find that mixture representations are linear in the awake brain. They use a series of systematic behavioral paradigms to show that the observed linearity in the awake state (compared to anesthetized) is not dependent on task engagement (reward is given randomly, post-odor) or stimulus relevance (reward is given before odor). While the experiments are well done and the data is presented clearly, I have several major concerns about the interpretation of their results.

      1) Given the data the authors present, it is unclear if one can conclude that the olfactory system is more or less linear in the awake state compared to the anaesthetised one. What seems to change most across the awake vs. anesthetized state is the response amplitude. Responses appear to be ~3x smaller in the awake mice. In the anesthetized state, non-linearity seems most apparent for large response amplitudes (>5 dF/F) with mixture responses being sub-linear, most likely due to saturation effects. The authors themselves do an analysis in Figure 6 - supplement 1 to show that most of the observed non-linearity in the anesthetized animals can be explained away after accounting for amplitude normalisation. The authors use this analysis to comment that the level of linearity is the same across all the three awake states, but the same figure shows that it is in fact the same even for the anaesthetized state.

      To put it differently, it is indeed true from the authors data that the OB response gain is significantly lower in the awake state, but it is unclear if the summation is more linear if measured at similar response amplitude regimes in both awake and anaesthetised mice.

      Thank you for the valuable comments. We agree that many differences between the anaesthetised vs. awake states should have been taken into account when comparing the linearity of summation. We address the reviewer’s concern now by expressing the deviation as a fraction of the predicted, linear sum of component responses. Further, we also considered another factor that could influence the anaesthetised vs. awake comparison, namely, the trial-by-trial variability. This is reproduced below.

      Figure R1: comparison of mixture summation for the early phase of responses, expressed as the fractional deviation.
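
      As an illustration of the measure described above, a minimal sketch of a fractional deviation is given below. The function name, the sign convention (negative for sub-linear summation) and the example dF/F values are assumptions made for illustration; they are not taken from the manuscript.

```python
def fractional_deviation(mixture, comp_a, comp_b):
    """Deviation of the observed mixture response from the linear sum of the
    component responses, expressed as a fraction of that linear sum.

    Assumed convention: negative = sub-linear, positive = supra-linear.
    The absolute value in the denominator keeps this convention when the
    predicted sum is negative (e.g. inhibitory responses).
    """
    predicted = comp_a + comp_b
    if predicted == 0:
        raise ValueError("linear prediction is zero; fraction is undefined")
    return (mixture - predicted) / abs(predicted)


# Hypothetical dF/F values: a large-amplitude and a small-amplitude response.
large = fractional_deviation(mixture=7.5, comp_a=5.0, comp_b=5.0)
small = fractional_deviation(mixture=1.5, comp_a=1.0, comp_b=1.0)
```

      In this toy case the absolute deviations differ fivefold (2.5 vs. 0.5 dF/F), while the fractional deviations are identical, which is why a normalised measure is less sensitive to differences in response amplitude between conditions.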

      2) The authors argue that keeping response amplitudes small in the awake brain prevents sub-linear summation and therefore may lead to better mixture decomposition. They do a decoding analysis in anaesthetised mice to show that linear mixture representations (instead of using observed sub-linear representations) make odor classification easier. However, I find this analysis uninformative and misleading. It is no surprise that the decoders trained on single odor representations should perform better (or equivalent) when using linear sums as input instead of observed sub-linear representations. The authors use this observation to suggest that this mechanism aids discrimination ability in the awake state. However, given that even the single odor responses are much weaker and noisier in the awake state, it is likely that even the single odor discrimination ability is poorer in the awake state. By the same logic, mixture decomposition might be also much poorer in the awake brain than the anesthetized brain, even though summation is more linear, just because responses are weaker and noisier. In my opinion, the authors should compare decoding accuracy across awake vs. anesthetized responses if they want to assert that linearisation of responses in the awake brain leads to easier decomposition. Because otherwise, while linearisation in principle can aid decomposition, at least in the form that the authors observe here, it may come at a high cost on signal-to-noise ratio which would undo the gain that linearity provides, in principle, for discrimination.

      Thank you very much for the insight and for the excellent suggestion to consider the discriminability of stimuli. In particular, we now include an analysis where a decoder trained on single responses is tested on observed mixture responses. Surprisingly, despite the substantial differences in the amplitudes of response and trial-by-trial variability, decoders using data from awake mice performed well, even better than anaesthetised data for the late phase of responses. This is now described in the revised figures (revised Fig. 5). We thank the reviewer for the excellent suggestion.

      Interestingly, though, the time course of the decoder performance does not correlate well with the linearity of summation. This observation is now described in the abstract (lines 19-21): “…decoding analyses indicated that the data from behaving mice was able to encode mixture responses well, though the time course of decoding accuracy did not correlate with the linearity of summation”.
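
      To make the train-on-single-odours, test-on-mixtures idea concrete, a minimal sketch follows. It uses a template-matching decoder with cosine similarity; this decoder type, the function names and the toy response vectors are assumptions chosen for illustration, and the decoder used in the revised manuscript may differ.

```python
import math


def mean_template(trials):
    """Average a list of trial vectors (one value per ROI) into one template."""
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train(single_odour_trials):
    """Build one template per odour from single-odour trials only."""
    return {odour: mean_template(trials)
            for odour, trials in single_odour_trials.items()}


def decode_mixture(templates, response, n_components=2):
    """Return the n_components odours whose templates best match the response."""
    ranked = sorted(templates,
                    key=lambda odour: cosine(templates[odour], response),
                    reverse=True)
    return set(ranked[:n_components])


# Hypothetical single-odour training data: 3 odours, 2 trials, 4 ROIs each.
training = {
    "A": [[1.0, 0.1, 0.0, 0.2], [0.9, 0.0, 0.1, 0.2]],
    "B": [[0.0, 1.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]],
    "C": [[0.1, 0.0, 1.0, 0.9], [0.0, 0.1, 1.1, 1.0]],
}
templates = train(training)

# Hypothetical observed response to the A+B mixture (never seen in training).
mixture_ab = [0.8, 0.9, 0.1, 0.2]
decoded = decode_mixture(templates, mixture_ab)
```

      Because the decoder never sees mixture trials during training, its accuracy on mixtures reflects both how linearly the components combine and how noisy the single-odour templates are.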

      3) At a more philosophical level, to this Reviewer, it is unclear if anesthesia vs. awake state difference in response should constitute the main focus of the manuscript. The authors explore summation properties under four different brain states, one of which is anaesthesia (also least behaviorally relevant). In three out of four states, they observe that summation is linear. In the fourth (anaesthesia), they observe that summation is sub-linear, but this happens at much larger response amplitude regimes compared to the three awake states sampled, presumably due to saturation. To me, it seems that the Authors here show that mixture summation in the OB, is largely independent of brain state since it is unaffected by whether the animal is task engaged or motivated etc.

      Thank you for this thoughtful comment. This has made us reflect on the essence of our study. We believe we make three main observations. First, the property of summation differs between the anaesthetised and awake states, and this should be reported because of the large volume of prior work reporting sublinear summation. However, as the reviewer recommends and as mentioned next, this is no longer the sole focus of our study. Our second observation is that the linearity of summation does not necessarily correlate with the ability to analyse mixtures, based on the decoder performance. We believe it is important to share this observation, since a number of previous studies speculated that nonlinear summation contributes to perceptual difficulty (Bell et al., 1987; Laing, 1994). Third, the decoder performance - especially one that is trained on single odour responses and tested on mixtures - shows differences depending on the awake states, where data from disengaged mice performed particularly poorly. This result is shown in the revised Figure 6. Further, we have edited the abstract and results to ensure that these are clearly communicated. We hope that this is more balanced and reflects the data better.

      4) It is unclear how to interpret the dendritic imaging comparison. First, the dendritic signal is pooled across many cells. If any of the cells that are being pooled shows sub-linearity, the pooled population response will look sub-linear, albeit less so than at the single cell level. Second, again like for the anesthetized vs. awake comparison, there is a discrepancy in response amplitudes - dendritic responses are ~2x stronger than the somatic responses and sub-linear summation would be more apparent as one approaches the saturation regime. Third, dendritic responses pool both mitral and tufted, while the somatic data the authors present is predominantly from tufted cells.

      Thank you for commenting on ways to further understand the dendritic signal. Indeed, the early prevalence of sublinearity in the apical dendrites does seem to relate to the time course of responses. This is treated more directly in the revised Fig.4 – supplement 1.

      To address the averaging effect, we tested how pooled signals may look in terms of linearity of summation. To roughly approximate pooled responses, we reasoned that neighbouring TC/MC somata have higher chances of belonging to the same glomerulus. Thus, we averaged signals from somatic ROIs (TCs and MCs) from each field of view and calculated the fractional deviation from the linear sum (Fig. R2). While a simplistic averaging of neighbouring somata may not be perfectly accurate, this analysis indicates that the difference between the apical dendrites vs. somata may not be simply explained by the averaging effect.

      Figure R2: Analysis of pooled somatic signals

      To approximate how dendritic signals might look if they were simple averages of somatic responses, we pooled together signals from all TC/MC somata from each field of view, and treated it as “an approximate glomerular signal”. The plot above shows the fractional deviation from the linear sum. (MC somata data come from an additional set of experiments conducted for this rebuttal.)

      In terms of the unmatched amplitude distributions and trial-by-trial variability across conditions, as the reviewer points out, the issue is similar to the comparison of anaesthetised vs. awake data. To address this, all comparisons are now presented in terms of the median fractional deviations. Further, to examine whether mitral cells contributed to the discrepancy in linearity between the dendritic and somatic signals, we now provide additional data from 137 MCs (5 fields of view, 3 trained mice performing the mixture task). These changes are described in the revised manuscript (Figure 4 - supplement 1).
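
      The pooling and median steps described above can be sketched as follows. The data layout (one dictionary per field of view, with per-ROI responses to each component and to the mixture), the function names and the numbers are hypothetical and serve only to show the order of operations: average the ROIs within a field first, then compute the fractional deviation, then take the median across fields.

```python
from statistics import mean, median


def pooled_fractional_deviation(field):
    """Average somatic ROIs within one field of view, then compute the
    fractional deviation of the pooled mixture response from the pooled
    linear sum (assumed convention: negative = sub-linear)."""
    pooled_a = mean(field["a"])
    pooled_b = mean(field["b"])
    pooled_mix = mean(field["mix"])
    predicted = pooled_a + pooled_b
    if predicted == 0:
        raise ValueError("pooled linear prediction is zero")
    return (pooled_mix - predicted) / abs(predicted)


# Hypothetical per-ROI responses (dF/F) for three fields of view.
fields = [
    {"a": [1.0, 3.0], "b": [2.0, 2.0], "mix": [3.0, 3.0]},
    {"a": [1.0, 1.0], "b": [1.0, 1.0], "mix": [2.0, 2.0]},
    {"a": [2.0, 2.0], "b": [2.0, 2.0], "mix": [2.0, 4.0]},
]

deviations = [pooled_fractional_deviation(f) for f in fields]
median_deviation = median(deviations)
```

      Reporting the median rather than the mean limits the influence of a few fields with very small predicted sums, where the fraction becomes unstable.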

    1. Author Response

      Reviewer #1 (Public Review):

      In their manuscript “Plant Trans-Golgi Network/Early Endosome pH regulation requires Cation Chloride Cotransporter (CCC1)” the authors set out to understand the importance of the cation chloride co-transporter CCC1 for plant function and intracellular ion homeostasis. The authors provide new data showing that CCC1 functions at the TGN/EE where it regulates ion homeostasis. Plants lacking CCC1 show a disruption to normal endomembrane trafficking, leading to defects in root hair cell elongation and patterning. Interestingly the authors show that the cell elongation defects can be rescued by supplementing the plants with an external osmolyte such as mannitol. Through the characterisation of CCC1 in A. thaliana, this paper shows that cation/anion transporters are essential in maintaining fine control over endosomal pH, in addition to previously characterised endosomal proton/cation transporters such as NHX5, NHX6, and CLCd.

      The paper is well written, and the experimental design is generally well thought out. The data mostly support the authors' conclusions; however, there are some areas where changes are necessary to improve the clarity and completeness of the experimental work.

      1) The co-localisation experiment of CCC1 with VHA-a1 (TGN/EE marker) shows that they highly overlap, however, there are clear regions where the CCC1 and VHA-a1 marker do not co-localise, suggesting CCC1 has a broader localisation pattern which is also alluded to in the text.

      It is important to clearly determine which endomembrane compartments CCC1 localises to as this has large implications in interpretation of data regarding where the endomembrane trafficking defects originate from (eg: TGN/EE dysfunction, or other organelles, such as the Golgi and MVB/LE), and for comparisons with other intracellular transporters such as NHX5 and NHX6 (which have broader localisation at the Golgi, TGN/EE, and MVB/LE). A more detailed localisation approach by also assessing the co-localisation of CCC1 with Golgi and MVB/LE markers is necessary.

      Our data show that the co-localisation of VHA-a1-RFP and GFP-CCC1 is extraordinarily high, with a correlation coefficient of 0.86 for CCC1/VHAa1, compared, for instance, with SYP43/VHAa1 from Shimizu et al. 2021 Nature Plants, which has a correlation coefficient of ~0.7.

      Proteomic studies (e.g. Groen et al 2014 J Proteomic Research), have shown that CCC1 is a high-confidence TGN/EE resident protein, co-localised with VHA-a1 and SYP61. We have included this information in the introduction and results (L86-90 and L182-186). We agree that further co-localisation studies will be useful in the future; at this time point, we focused on the role of CCC1 in the TGN/EE.
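
      For readers unfamiliar with how such a co-localisation coefficient is obtained, a minimal sketch is shown below. It assumes the coefficient is a Pearson correlation between pixel intensities in the two channels, which the response above does not state explicitly; the function name and intensity values are hypothetical.

```python
import math


def pearson_r(channel_1, channel_2):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(channel_1) != len(channel_2) or len(channel_1) < 2:
        raise ValueError("channels must have equal length of at least 2")
    mean_1 = sum(channel_1) / len(channel_1)
    mean_2 = sum(channel_2) / len(channel_2)
    cov = sum((x - mean_1) * (y - mean_2) for x, y in zip(channel_1, channel_2))
    var_1 = sum((x - mean_1) ** 2 for x in channel_1)
    var_2 = sum((y - mean_2) ** 2 for y in channel_2)
    if var_1 == 0 or var_2 == 0:
        raise ValueError("a channel with constant intensity has no correlation")
    return cov / math.sqrt(var_1 * var_2)


# Hypothetical intensities for the same five pixels in the GFP and RFP channels.
gfp = [10, 20, 30, 40, 50]
rfp = [12, 18, 33, 41, 47]
r = pearson_r(gfp, rfp)
```

      A value near 1 means the two signals rise and fall together across pixels; it does not by itself exclude a minor pool of one protein in compartments lacking the other.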

      2) The authors identify defects in cell elongation in ccc1-1 root epidermal cells, as well as defects in the formation of collet hairs. It is not clear whether the defects in collet hair formation are due to defects in cell elongation, or in root hair cell identity, as root hair cell identity is disrupted in ccc mutants. Since under control conditions some ccc mutants do not form collet hair cells at all, this would suggest that the hair cell identity is also disrupted, rather than just elongation. However, the root hair length quantification experiment does show very clear cell elongation defects in ccc1 mutants. The two phenotypes should be differentiated more clearly in the text.

      Thank you, we have amended the manuscript and now better differentiate between the collet hair phenotype and the root hair phenotype.

      We have now included evidence that collet hair elongation, and not cell identity, is disrupted in ccc1 collet hairs (Figure 4 – supplementary figure 1D). In ccc1 plants, collet hairs are initiated to some degree but do not elongate, while under increased external osmolarity, collet hairs elongate similar to what was observed in the wildtype.

      3) Figure 6 describes experiments designed to assess whether ccc1 mutants have defects in endo- and/or exocytosis. The authors assess endocytosis using an FM4-64 uptake experiment where they conclude that ccc1 mutants have defects in endocytosis. However, the data from the 10-minute time point (which is usually used to measure endocytosis) shows no difference between wild-type and mutant lines. There are clear differences in FM4-64 uptake to the BFA bodies after 60 minutes (Golgi+TGN) which instead suggests ccc1 mutants primarily have defects in post-Golgi trafficking, rather than endocytosis.

      We agree with the reviewer’s comment here and we think there was a misunderstanding due to the wording we used. Yes, a 10-minute time point would measure immediate endocytosis, and what we quantify at the 60 min time point is endocytic trafficking. We had used the terminology “endocytosis” throughout the manuscript; however, in the wake of these comments we realise that this terminology was not sufficiently precise. As the reviewer correctly points out, what we measured and what we are interested in is “endocytic trafficking”, a process previously shown to be disrupted in mutants with altered TGN/EE pH. We have improved the wording of the manuscript to better reflect this and now adhere more strictly to the exact use of endocytic trafficking and endocytosis.

      The authors should also assess whether secretion/recycling of PIP2;1 and PIN2-GFP is altered by quantifying the signal at the plasma membrane, and potentially by performing FRAP assays of PIP2;1 or PIN2-GFP at the plasma membrane.

      We appreciate the reviewer’s interest in this subject; however, the trafficking results are included to support the assertion that CCC1 has a role in TGN/EE pH and ion regulation. Detailed trafficking assays are therefore not a central theme of the manuscript and, as such, we think that further focusing on the trafficking aspect would distract readers from the primary take-home message of the work. Nevertheless, quantification of PIN2-GFP signal at the PM is now included in Figure 6 – supplementary figure 1A, as requested.

      The authors could also assess whether ccc1 mutants have general defects in secretion by visualisation of sec-RFP in ccc1 mutants. These experiments (in addition to the co-localisation experiments suggested above) would provide much stronger evidence to determine the exact source of trafficking defects.

      We agree that sec-RFP would be another means to assess general secretion defects on top of what we have already provided. However, we believe that further characterisation of trafficking defects in ccc1 with sec-RFP will not help to determine the exact source of trafficking defects beyond what is already provided in the manuscript. We suggest that the trafficking defects are caused by changes to TGN/EE pH regulation, and the mechanism by which pH impacts trafficking is not yet fully understood. To that end, we used FM4-64 and PIN2 to assay trafficking as these are markers used previously to assay trafficking of det3 and nhx5/nhx6 mutants. The assessment of CCC1’s role in TGN/EE pH and ion regulation is the central goal of this work.

      4) The calibration curves from Figure 7 are missing

      Calibration curves have now been included (Figure 7 – supplementary figure 2)

      5) The control image of PRP3::H2B in wild type seedlings is missing

      The control wildtype image has been added (Figure 2 – figure supplement 1C)

      Reviewer #2 (Public Review):

      In the submitted paper, the authors first show that activity of the CCC1 promoter is ubiquitous. They further analyze the phenotype of the mutant in the root and show a root cell elongation defect in epidermal cells as well as in root hairs. The ccc1 mutants also lack the collet root hairs and show trichoblast-atrichoblast cell fate identity defects in the primary root. The authors perform a set of elegant experiments where they show that, surprisingly, the ccc1 plants are resistant to hyperosmotic environment. The ccc1 cells show delayed plasmolysis, ccc1 seeds show better germination, and ccc1 root hair elongation is recovered in hyper-osmotic media. Interestingly, the absence of collet root hairs was also recovered in hyper-osmotic environment, even though it is not clear whether this was caused by 'reparation' of collet hair elongation or collet hair cell fate specification. The phenotypic analysis is carefully performed and the results are unexpected and intriguing.

      The authors further show that in root trichoblasts, GFP-CCC1 localizes to the TGN/EE compartment, and that in this tissue, the fusion protein recovers the root hair elongation of the mutant. Further, the authors focus on the subcellular phenotypes of the endomembrane system performance in the ccc1 mutant background. It is shown that PIP2 aquaporin is internalized less in ccc1 than in the control, which hints that endocytosis is reduced in the ccc1 cells. An alternative explanation, however, could be that the mutant is more osmotically tolerant also on the subcellular level. To test the endomembrane trafficking rate, PIN2 aggregation and recovery from BFA bodies is performed, as well as quantifications of FM4-64 uptake, and the authors conclude that the mutant has generic endomembrane trafficking defects.

      The authors hypothesize that the endomembrane defects might stem from a disturbance in TGN/EE luminal pH caused by an ion imbalance in the ccc1 cells. Therefore, they measured the luminal pH of TGN/EE using a genetically encoded phluorin and demonstrate more alkaline pH values in the ccc1 mutants. Finally, the authors show that during hyperosmotic stress, the TGN/EE pH rises in the control plants, suggesting that this pH rise is functionally connected to the stress response. The second part of the manuscript, which focuses on subcellular phenotypes, uses advanced live-cell imaging tools and successfully measures pH in minute volumes of TGN/EE compartments. In addition, the specificity of the phenotype is demonstrated by careful analysis of vacuolar and cytoplasmic pH. These well performed experiments indeed point to the function of CCC1 in ion control in TGN/EE.

      Many thanks for your positive comments on our work. We are glad it is appreciated.

      Weaknesses of the manuscript:

      The functionality of the GFP-CCC1 fusion is questionable as it was impossible to obtain transgenic lines that would express GFP-CCC1 under the control of the native promoter, not allowing full complementation of the ccc1 phenotype. This hints at a possible dominant-negative effect of this particular protein fusion. The authors therefore express GFP-CCC1 using a trichoblast-specific promoter and show that the root hair elongation phenotype is complemented, demonstrating some functionality of this construct. Moreover, the root hair length data in the ccc1-1 mutants shown in figure 2D and 3C differ, which to some extent weakens the important conclusion that the GFP-CCC1 is functional at least in this cell type. Functionality of this construct is a crucial aspect for the manuscript. The possible dominant-negative effect of the construct weakens the conclusion about the subcellular protein localization, which in turn weakens the main conclusion of the paper - that CCC1 by regulating ion fluxes in the TGN/EE allows proper endomembrane functionality.

      The reviewer notes the observed difference in wildtype root hair length between experimental data shown in Figures 2D and 3C. Yes, this is correct. The experiments were done several months apart (different to the biological replicate experiments pooled for one graph, which were always conducted around the same time). Root hairs are highly reactive to environmental conditions and as such, plants grown at different times of the year commonly have differences in root hair length. As such, comparisons can only be made to a control, grown at the same time, when looking for differences in treatments or genotypes. Therefore, the data from 2D and 3C cannot be compared to each other quantitatively but rather, qualitatively.

      With regard to the localisation and constructs used, neither N- nor C-terminal tagging produces transformants. Our experimental approaches suggest that all terminal tagging of CCC1 is dominant negative if the construct is expressed from the embryo stage (native promoter or ubiquitous promoters such as 35S). Expression of untagged CCC1 by either the native or 35S promoter rescues the phenotype of ccc1 KO plants. We have now provided a table (Supplementary File 1a) that summarises all attempts to localise CCC1 and the outcome, including the generation of an antibody.

      We have additionally added details on previous proteomics studies, which identified CCC1 as a high-confidence TGN/EE resident protein (L186 and discussion).

      The subcellular localization of CCC1 should be demonstrated without any doubts, as it was previously localized to the PM and endomembranes in pollen tubes (Domingos 2019). If CCC1 localized at the PM, alternative explanations of the mutant phenotype that involve regulation of ion fluxes across the PM would appear more probable than the TGN/EE hypothesis.

      The reviewer highlights that CCC1 has been proposed to localise to the PM in pollen tubes in Domingos et al. (2019). The localisation of CCC1 shown in Domingos et al. (2019) lacks colocalisation with a marker and, importantly, no complementation of the knockout phenotype is shown with the tagged protein. We have added a paragraph in the discussion on this topic (L488-507).

      Quantification of endomembrane trafficking represents another important argument in the proposed hypothesis. The section that demonstrates the reduction in exo- and endocytosis is, however, not entirely convincing. It has been shown that a major contribution to the BFA body PIN2 pool originates from de-novo synthesis of the protein (Jasik et al, PMID:27506239). In figure 6A, it is apparent that the BFA washout leads to disappearance of BFA bodies in the ccc1 mutant, but the level of PM fluorescence was decreased, leading to an apparent 'minimal recovery' of the cytoplasm:PM ratio. In the case of endocytosis, the experiment combining FM4-64 uptake with BFA is hard to interpret as endocytosis visualization, because TGN/EE aggregation might be disturbed in ccc1, as the authors suggest. A more detailed endomembrane trafficking analysis than simple cytoplasm/PM ratios of signal could be performed to address what is happening with trafficking in this interesting mutant.

      In the FM4-64 experiment, there is a poor formation of BFA bodies and a lower ratio in ccc1, however, despite having the same poor formation of BFA bodies in the PIN2-GFP experiment, the ratio is higher, indicating that the visibility of BFA bodies is not crucial to the accumulation or measurement of fluorescence (discussed in L381). The reviewer does rightly state that de-novo protein synthesis is a major contributor to intracellular protein accumulation in these assays and that is why the assay focuses on the rate of recovery. That is, we measured PIN2-GFP recovery to the PM regardless of the origin of the PIN2-GFP protein in BFA bodies (de novo or from endocytosis).

      Further characterisation of the impacts TGN/EE luminal pH changes have on endomembrane trafficking is undoubtedly an interesting topic of study, as highlighted by the reviewer’s comments. The work detailed in this manuscript is focused on assessing the role of CCC1 in TGN/EE pH and ion regulation. As such, the results presented will enhance the potential to investigate the impact of TGN/EE pH on endomembrane trafficking by providing details of another tool, ccc1, which can be used to investigate this link and by further detailing the impact of environmental conditions on TGN/EE pH. It is not the aim of this study here to investigate the link between TGN/EE pH and endomembrane trafficking and as such, we believe that further results detailing endomembrane trafficking will detract from the central results of this work. However, this aspect highlights the general interest and importance of our work for other fields of plant science, and to other fields as CCC proteins are present in all organisms.

    1. Author Response

      Reviewer #1 (Public Review):

      I'm not sure why the authors are not seeing Evans Blue dye entry into the osteocytes of loaded bone from D130-136 transgenic mice. The Augusta, GA group showed very nicely (and it has since been repeated) that osteocyte membrane disruptions occur with much milder loading (e.g., treadmill running) and allow in EB dye. These membrane tears have nothing to do with channel or hemichannel activity. So it is very hard to understand why the D130-136 mice would be spared from membrane tears that should allow copious amounts of EB into the cells. Do certain mutations in connexin prevent membrane tears? If the R76W mutation enhances hemichannel function, and the conclusions of the paper are correct that the hemichannels are controlling the response to loading, then why were the R76W mutants not more responsive than WT to mechanical loading?

      There are uncertainties regarding how commonly membrane tears occurred in the cells in the referred study, since no specific inhibitor or underlying mechanism is currently available. Hemichannels have been investigated for years; they are selective channels that allow molecules with a molecular weight of less than 1 kDa to pass through. Specific hemichannel-blocking antibodies and other hemichannel blockers, such as the chemical carbenoxolone and connexin extracellular mimetic peptides, inhibit the uptake of smaller dyes (e.g. EB, EtBr, Lucifer yellow) but not of bigger dyes (e.g. rhodamine dextran, ~10 kDa), both in vitro and in vivo.

      It is true that the R76W mutation has enhanced hemichannel function, and some anabolic bone responses are indeed enhanced in R76W mice compared to WT, including bone volume fraction, trabecular thickness and BMD, although not as dramatically as expected. It is possible that a certain threshold of hemichannel activity is required for the anabolic function in response to mechanical loading, and that excess hemichannel activity can be attenuated by a feedback inhibition mechanism. Our earlier study showed that prolonged activity of osteocytic Cx43 hemichannels increases the extracellular PGE2 level, and that excess extracellular PGE2 acting in an autocrine manner activates EP2/4 receptors, leading to MAPK activation. MAPK directly phosphorylates Cx43 and closes hemichannels (Riquelme et al., 2015). This mechanism could similarly regulate the activity of R76W, resulting in anabolic responses to mechanical loading comparable to those of WT. We have included the above in the Discussion.

      Fig 3: How is it justified to say that the D130-D136 mice had increased bone formation response to loading on the periosteum when the relative change between loaded and nonloaded look to be about the same in all three genotypes? Are the authors not adjusting for the higher or lower control leg bone formation measurements?

      In WT and R76W mice, bone formation on both the periosteal and endosteal surfaces was increased by tibial loading, and this increase correlated with the increases in bone area fraction and cortical thickness. In D130-136 mice, bone formation increased only on the periosteal surface, not on the endosteal surface. In addition, we observed an increased osteoclast number on the endosteal surface. The net effect in D130-136 mice is a decreased bone area fraction and cortical thickness. The data presented in this study include the higher or lower control leg formation measurements. We have revised the text in the Discussion to make this clear.

      Reviewer #2 (Public Review):

      This study examines the effects of mechanical loading on the bones of two transgenic mouse models of connexin 43 overexpression, one mutant which impairs both gap junction intercellular communication (GJIC) and hemichannel activity (Δ130-136) and another that supports only enhanced hemichannel activity but not GJIC (R76W). The authors conclude that hemichannels but not GJIC facilitate the effects of mechanical loading on bone via the secretion of PGE2 through the hemichannels.

      While provocative, the data fall short of being convincing of the interpretation.

      A major concern is the statistical approaches used to evaluate data. The conclusions obligate that each group of animals (WT, R76W and Δ130-136 mice with or without loading) be compared to each other to determine differences in their ability to mount a response of bone to a mechanical load. The correct statistical test is a two-way ANOVA when there are multiple variables (genotype and load). However, multiple t-tests are used to support major conclusions. Since primary data was supplied by the authors in the supplement, we checked this using statistical software. Many of the statistical analyses do not hold up when run through the appropriate statistical test. Thus, the primary findings reported are not supported.

      Working closely with an expert biostatistician, we have thoroughly reanalyzed the data in the revision. To determine the mechanical responses, the major analysis should be the paired comparison within each genotype group (WT, R76W and D130-136). Therefore, the paired Student's t-test is an appropriate statistical approach. We agree that one-way ANOVA is improper for comparing multiple variables, and we consider that a comparison across multiple variables (genotype and load) would provide irrelevant information regarding the treatment responses. In this study, we focus on the comparison between loaded and contralateral, unloaded tibias within each genotype using the paired Student's t-test.
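
      For concreteness, a minimal sketch of the within-genotype paired comparison described above is given below. It computes only the paired t statistic and its degrees of freedom; the function name, the data layout and the measurement values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t(loaded, unloaded):
    """Paired t statistic for loaded vs. contralateral unloaded limbs.

    Each position in the two lists must refer to the same animal.
    Returns (t, degrees_of_freedom).
    """
    if len(loaded) != len(unloaded) or len(loaded) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [l - u for l, u in zip(loaded, unloaded)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    if standard_error == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical cortical thickness values: (loaded tibias, unloaded tibias).
data = {
    "WT": ([1.2, 1.5, 1.1, 1.4], [1.0, 1.2, 1.0, 1.1]),
    "R76W": ([1.3, 1.4, 1.2, 1.5], [1.1, 1.3, 1.0, 1.2]),
}

# One paired test per genotype, as in the approach described above.
results = {genotype: paired_t(loaded, unloaded)
           for genotype, (loaded, unloaded) in data.items()}
```

      Note that separate paired tests of this kind answer whether loading changed a measurement within each genotype; they do not test whether the size of the loading response differs between genotypes, which is the genotype-by-load interaction that a two-way ANOVA would address.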

      Two additional significant weaknesses affect the potential quality and impact of this study.

      1) No convincing evidence is presented that the phenotype was rescued by PGE2. In Figure 8 and the corresponding supplement, vehicle treated and PGE2 treated unloaded controls are not shown and are critical to the appropriate interpretation of the experiment. Meaningful bone parameters including bone area and cortical thickness are not affected by the PGE2. Trabecular bone was completely unaffected by PGE2 or even the M1 antibody. Also, a one-way ANOVA is the incorrect measure with which to assess these changes. There are many variables in these mice: treatment with or without M1 antibody, loading or unloading (although not included) and treatment with or without PGE2. These are not accounted for with the statistical models used to assess the data.

      The significant reduction of bone area fraction, a key parameter, by M1 was ablated with PGE2 treatment, as were the effects on bone marrow area and cortical thickness. As the reviewer pointed out, the rescue by PGE2 seen in cortical bone was not seen in trabecular bone. We are not certain of the reason for the difference between cortical and trabecular bone. A recent paper has also shown more beneficial osteogenic responses of combined treatment of PTH(1-34) and mechanical loading in cortical bone than in trabecular bone (Roberts et al., 2020). As discussed in that paper, one of the possibilities could be related to the higher strain levels experienced by cortical bone compared to trabecular bone. We have included the above in the Discussion. We have reanalyzed the data and comparisons. The major comparison should be a paired Student's t-test comparing vehicle- and M1-treated mice within each group (Control and PGE2).

      2) No convincing evidence that PGE2 secretion through connexin 43 hemichannels is shown. Instead, Figure 4C shows that a protein (COX2) responsible for producing PGE2 is reduced in the cells that produce PGE2 in the D130-136 mice. Several papers have shown that connexin 43 regulates ptgs2 and could affect PGE2 abundance independent of the ability to pass through connexin 43 hemichannels and others show that PGE2 also regulates connexin 43 abundance and gap junctional communication.

      Our earlier study showed that Cx43 hemichannels in osteocytes serve as a direct portal for the release of PGE2 (Cherian et al., 2005). In this study, we showed that the increase in PGE2 in tibial bone induced by mechanical loading was totally attenuated in D130-136 mice (Fig. 4A) and M1 treated mice (Fig. 7A). Moreover, we have previously reported that the inhibition of Cx43 hemichannels does not affect the intracellular PGE2 level (Siller-Jackson et al., 2008), suggesting reduced PGE2 biosynthesis by COX2, since COX2 is the enzyme subtype that responds to mechanical loading. Indeed, in this study, we showed attenuated upregulation of COX2 expression and a reduction of PGE2 level in D130-136 mice as well as in M1 treated mice. We did not find any previous papers of the kind mentioned by the reviewer, showing that “connexin 43 regulates ptgs2 and could affect PGE2 abundance independent of the ability to pass through connexin 43 hemichannels”. We and others have shown that PGE2 can increase Cx43 expression and gap junction communication in cultured osteoblasts (Civitelli et al., 1998) and osteocytes (Cheng et al., 2001), but has no effect in oral-derived human osteoblasts (Adamo et al., 2001). Additionally, increasing Cx43 expression enhances PGE2-dependent β-catenin signaling activation in osteoblast cells (Gupta et al., 2019). Cx43 overexpression in rabbit and human synovial fibroblast cell lines increased PTGS2 gene expression (Gupta et al., 2014). Moreover, increased extracellular PGE2 could serve as a feedback inhibitor that activates MAPK, phosphorylates Cx43 and closes Cx43 hemichannels (Riquelme et al., 2015). The outcomes of this study will help establish hemichannels as a potential de novo drug target for treating bone loss and osteoporosis. We have included the above in the Discussion.

    1. Author Response

      Reviewer #4 (Public Review):

      Not every single cell is the same in terms of its metabolism. To study the causes of such cell-to-cell differences, we need microscopic tools to assess metabolic properties, such as metabolite levels and metabolic fluxes, on the single cell level or even beyond. While sensors exist to visualize certain metabolite levels, we still largely lack methods to assess metabolic fluxes in single cells. The work of Yang and Needleman presents a method that can assess -under certain assumptions- the flux through electron transport chain (ETC) in mitochondria of single mouse oocytes at quasi steady-states with subcellular resolution.

      For their method, the authors use FLIM (fluorescence lifetime imaging microscopy) to determine the concentration of free and unbound NADH in mitochondria, and these measurements are then used in a simple coarse-grained model to infer the flux through the ETC. This coarse-grained steady-state model describes the oxidation of NADH with one oxidase (resembling the ETC) and one NADH reductase (resembling all the 3 TCA cycle NADH dehydrogenases plus pyruvate dehydrogenase, but neglecting the FADH2-dependent succinate dehydrogenase) with only two free model parameters.

      Strikingly, when fed with the FLIM data, this coarse-grained model could describe the outcomes of a number of perturbations, where the oxygen uptake rate (i.e. a proxy for the flux through the ETC) was independently measured with a different method. Applying the method, the authors also suggest that the ETC flux is higher in mitochondria that are rather located at the outside of the oocyte.

      While FLIM measurements of bound and unbound NADH have been done before, the main strength of the paper is that it presents a method to infer metabolic activity in an oocyte, where the novelty resides on the development of the simple coarse-grained model and on showing that the model-based analysis of the FLIM data can allow to obtain quasi-steady-state ETC fluxes. The main weakness of the paper is the following: Unfortunately, the work falls short on the application side. One would have wished that for a novel method like this, if it is indeed relevant, it should have been easy for the authors to add exciting application cases that would indeed generate novel biological insight.

      While the main strength of the paper is the method (i.e. inference of ETC flux of model-based analysis of FLIM data), I feel that the description of the method, its assumptions etc falls short, which made assessment of the method and its potential limitations challenging. I feel that this is due to the fact that the writing of the manuscript is suboptimal. While the biochemistry is described/introduced on a very detailed textbook level, the methods, the measurements, the analyses of the measurement data in the result section and in the method section are described in a very short, condensed, and sometimes convoluted, manner. As this is primarily meant to be a method paper, the authors need to do a better job in describing what they have done (i.e. model development, model assumptions, inference procedure, etc) in a clearer manner.

      I felt that a strong point was that the two different versions of how the experimental data is used in the model, i.e. lifetime (tau) and bound ratio (beta), leads to similarly inferred r_ox. However, due to the above criticized too short explanations, I could not tell whether this would be trivial or not. Also, the whole method boils down to this equation J_ox = alpha * (beta - beta_eq) * [NADH_f], describing the full complexity of mitochondrial metabolism (TCA cycle, the electron transport chain, metabolite exchange between mitochondria and cytoplasm) with a single equation with only two free parameters (alpha, beta_eq). For this reviewer, also this part still remains somewhat elusive.

      We thank the reviewer for the detailed and in-depth review of our manuscript. We appreciate the reviewer’s suggestion to add application cases to demonstrate the usefulness of our method. We have now added two applications of our flux inference procedure to the revised manuscript. The first case is the discovery of homeostasis of ETC flux in mouse oocytes: perturbations of nutrient supply and energy demand do not change ETC flux despite significantly impacting the NADH metabolic state (Figure 8). The second case is the discovery of an intracellular spatial gradient of ETC flux in mouse oocytes. As suggested by the reviewer, we have used metabolic inhibitors to help reveal the cause of this gradient and found that it is primarily a result of a spatially heterogeneous mitochondrial proton leak (Figure 9). We concluded from these observations that ETC flux in mouse oocytes is not controlled by energy demand or supply, but by the intrinsic rates of mitochondrial respiration.

      We thank the reviewer for their suggestions to improve the presentation of this work. We have significantly rewritten the paper to clearly describe the model development, model assumptions, data analysis procedures and results. Regarding the comparison of the two inference methods, we now present details of the assumptions and derivations in the Results section and demonstrate that the agreement between these two methods is not trivial and is a robust self-consistency check of the method. We also now explain the coarse-graining procedure in detail in the main text and in Appendices 2 and 3 to demonstrate how all the model complexities are coarse-grained into only two free parameters, 𝛼 and 𝛽_eq. In a nutshell, 𝛼 and 𝛽_eq are functions of the kinetic rates of the model, with the kinetic rates depending on the details of mitochondrial metabolism. However, using the model to infer ETC flux does not require knowing the functional forms of 𝛼 and 𝛽_eq, because 𝛼 and 𝛽_eq can be experimentally measured with FLIM. The only assumptions required are that 𝛼 remains constant under perturbations and that 𝛽_eq can be determined from ETC inhibition. These assumptions are validated experimentally in mouse oocytes and human tissue culture cells by the agreement between predicted ETC flux and direct measurements of OCR.
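
      To make the role of the two parameters concrete, the relation quoted by the reviewer, J_ox = alpha * (beta - beta_eq) * [NADH_f], can be sketched as follows. This is a minimal illustration rather than the analysis code of the manuscript: the function name `infer_etc_flux`, the argument names and the numerical values are hypothetical, and units are arbitrary.

```python
def infer_etc_flux(alpha, beta, beta_eq, nadh_free):
    """Evaluate J_ox = alpha * (beta - beta_eq) * [NADH_f].

    alpha     -- proportionality constant, assumed unchanged by perturbations
    beta      -- bound ratio of NADH obtained from the FLIM measurement
    beta_eq   -- bound ratio at which the inferred flux is zero
    nadh_free -- concentration of free NADH, [NADH_f]
    """
    return alpha * (beta - beta_eq) * nadh_free


# Hypothetical values for illustration only (arbitrary units).
alpha = 2.0
beta_eq = 0.25
nadh_free = 4.0

# A measured bound ratio above beta_eq gives a positive inferred flux.
print(infer_etc_flux(alpha, 0.5, beta_eq, nadh_free))

# When beta equals beta_eq, the relation returns zero flux by construction.
print(infer_etc_flux(alpha, beta_eq, beta_eq, nadh_free))
```

      In this form, the flux is linear in the offset of the measured bound ratio from beta_eq, so the two parameters act as a reference point and a scale factor.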

      We also validated our model in an additional cell type of human tissue culture cells, demonstrating the generality of our method (Figure 7).

    1. Author Response

      Reviewer #1 (Public Review):

      Wang and Dudko derive analytical equations for one special case of a model of Ca-dependent vesicle fusion, in the attempt to find a "general theory" of synaptic transmission. They use a model with 2 kinetically distinct fast and slow pools (equation 1).

      Critique

      1) Overall, the analytical approach applied here remains limited to the quite arbitrarily chosen 2-pool model. Thus, while the authors are able to re-capitulate the kinetics of transmitter release under a series of defined intracellular Ca-concentration steps, [Ca]i (see Fig. 2B; data from Woelfel et al. 2007 J. Neuroscience), this is nevertheless not surprising because the data by Woelfel et al. was originally also fit with a 2-pool model. More importantly, the 2-pool model is valid for describing release kinetics at high [Ca]i, but it cannot account for other important phenomena of synaptic transmission like e.g. spontaneous and asynchronous release which happen at lower [Ca]i, with different Ca cooperativity (Lou et al., 2005). Along the same lines, the derivations of the equations by Wang and Dudko are not valid in the range of low [Ca]i below about 1 micromolar (see "private recommendations" for details). This, however, limits the applicability of the model to AP-driven transmitter release, and it shows that based on one specific arbitrarily chosen model (here: the 2-pool model), one cannot claim to build a realistic and full "theory" for synaptic transmission.

      Our two-pool description is far from being “arbitrarily chosen”. It is based on experimental facts that have been established by multiple independent laboratories: namely, the observed two distinct vesicle fusion kinetics due to the presence of the readily releasable and reserve pools in vivo and due to the presence of two dominant vesicle morphologies in vitro. The two-pool picture has been confirmed and successfully used in numerous experimental papers previously. That being said, our two-pool description refers to a more general notion of separation of timescales and is thus more flexible than a literal interpretation might suggest.

      The data from Woelfel et al. 2007 J. Neuroscience, while of excellent quality, are not the only measured kinetics of action-potential-triggered vesicle fusion that our theory has been able to recapitulate (see other experimental data in Fig. 2 and Fig. 3 of the manuscript). The theory also recapitulates the kinetic measurements from fifteen other independent experimental studies, on ten different types of synapses. The dynamic range (peak release rate) of these synapses varies by 10 orders of magnitude, and the range of Ca2+ concentrations spans more than 3 orders of magnitude. Our work recapitulates these 16 datasets not through 16 different ad-hoc models but through a single, fully analytically solved, theoretical framework. Importantly, beyond recapitulating the existing data, our analytically tractable theory enables one to extract the unique sets of microscopic parameters for particular synapses, such as the activation energies and kinetic rates of their synaptic machinery, the sizes of the vesicle pools and the critical number of SNAREs. We verify that these predictions from our theory have reasonable values for each of the data sets; this is an additional, non-trivial check of our theory. The fact that our theory reproduces observations on such strikingly diverse systems, and has such a degree of predictive power, cannot be dismissed as an artifact or coincidence. We are not aware of any other theory, or fitting model, of comparable generality and ability to generate concrete predictions.

      Reviewer #1 is mistaken in stating that the derivations of our equations are not valid below 1 micromolar Ca2+ concentrations. It is evident already from Figure R1 below (Fig.2 in the revised manuscript) that the theory performs flawlessly at concentrations as low as 0.1µM. There are indeed non-linear effects at ultra-low Ca2+ concentrations that are not displayed by the experimental data in Fig. R1. Our theory is also applicable in that regime: one simply needs to include a second coordinate (in addition to the number of Ca2+ ions bound, 𝑄‡ ) to account for the multidimensionality of the free energy landscape, analogous to the calculations of the rate constants for multidimensional activated rate processes in chemical physics. This illustrates just one of the many ways in which our theory will enable detailed studies of mechanistic aspects of synaptic transmission.

      With further regard to generality, as stated in our Abstract, this paper is concerned with providing a physical theory to describe “rapid and precise neuronal communication” enabled by “a highly synchronous release” of neurotransmitters. Typically, more than 90% of the neurotransmitters are released through synchronous release during the action potential. By applying our theory to each of the multiple Ca2+ sensors one will be able to cover the remaining <10% of the neurotransmitters and thus simultaneously describe spontaneous, asynchronous and synchronous release. While detailed studies of these effects are clearly beyond the scope of this work, our theory opens a door for such studies by providing a foundation in the form of a conceptual, analytically tractable framework.

      2) In their derivations, Wang and Dudko collapse the intracellular Ca-concentration [Ca]i, a parameter directly quantified in the several original experiments that went into Fig. 2A, into a dimensionless relative [Ca]i "c" (see equation 7). Similarly, the release rates are collapsed into a dimensionless quantity. With these normalizations, Ca-dependent transmitter release measured in several preparations seems to fall onto a single theoretical prediction (Fig. 2A). The deeper meaning behind the equalization of the data was unclear, except a demonstration that the data from these different experiments can in general be described with a two-pool model, which is at the core of the dimensionless equations. One issue might be that many of the original data sets used here derive from the same preparation (the calyx of Held), and therefore the previous data might not scatter strongly between studies. This could be clarified by the authors by also plotting the data from all studies on the non-normalized [Ca]i axis for comparison. Furthermore, it would be useful to include data from other preparations, like the inner hair cells (Beutner et al. 2001 Neuron; their Fig. 3) which likely have a lower Ca-sensitivity, i.e. are right-shifted as compared to the calyx (see discussion in Woelfel & Schneggenburger 2003 J. Neuroscience). Thus, it is unclear why normalization of [Ca]i to "c" should be an advantage, because differences in the intracellular Ca sensitivity of vesicle fusion exist between synapses (see above), and likely represent important physiological differences between secretory systems.

      We thank the Reviewer for challenging our work with the hypothesis that the demonstrated universal scaling of the experimental data could in fact be an artefact caused by pre-selecting data from the same preparation. Addressing this hypothesis is indeed a compelling test to probe the true limits of generality of our theory. Below we carry out this test. We implemented the two suggestions of the Reviewer: (i) we added datasets on markedly different synaptic preparations, including the inner hair cells as suggested by the Reviewer, as well as retinal bipolar cell, hippocampal mossy fiber, cerebellar basket cell, chromaffin cell, insulin-secreting cell, and additional data on the Calyx of Held from multiple laboratories, and (ii) we plotted the data on the non-normalized axis of [Ca2+] to reveal the full extent of scatter among the data sets. The resulting plot (Fig. R1 below) speaks for itself: in vivo data for the release rate span 4 orders of magnitude at low [Ca2+] and 6 orders of magnitude at high [Ca2+], and there is a 10 orders of magnitude difference between the release rates from in vivo and in vitro data. The scatter across 4-10 orders of magnitude allows one to appreciate the vastly different sensitivities to [Ca2+] between synaptic preparations (Fig. R1, left). Yet, all these data collapse beautifully on the master curve established by our theory (Fig. R1, right).

      Fig. R1. Despite 10 orders of magnitude variation in the release rate of different synaptic preparations and more than 3 orders of magnitude range of calcium concentration (left), the data collapse onto a universal curve predicted by the theory (right). The collapse indicates that the established scaling (Eq. 7) is universal across different synapses. The distinct sets of parameters for individual synapses (Appendix 3 Table 2) demonstrate the predictive power of the theory as a tool for extracting the unique properties of each synapse from experimental data.

      What the Reviewer refers to as “the equalization of the data” is known in statistical physics as universality. The deeper meaning of a universal scaling is its indication that the observed phenomena realized in seemingly unrelated systems are in fact governed by common physical principles. The collapse of the data onto the universal curve in Fig. R1 is a demonstration that the present theory has uncovered, quantitatively, unifying physical principles underneath the striking diversity and bewildering complexity of chemical synapses. The Referee is of course correct that the differences in [Ca2+] sensitivities among synapses likely represent important physiological differences between distinct synapses and distinct secretory systems. The present theory does not negate these differences, but it in fact allows one to quantify these differences through the unique sets of extracted parameters for individual synapses (see Appendix 3 Table 2). We are not aware of any other theory that has demonstrated universality in synaptic transmission through a simple, single scaling relation across 10 orders of magnitude in dynamic range and at the same time allowed the extraction of the microscopic parameters that are unique for the individual synapses and thus reflect the diversity of their synaptic machinery. We included Fig. R1 shown here in the revised manuscript (Figure 2).

      3) Finally, the authors use their model to derive the number of SNARE proteins necessary for vesicle fusion, and they arrive at the quite strong conclusion that N = 2 SNAREs are required. Nevertheless, this estimate doesn't fit with the number of n = 4-5 Ca2+ ions which the original studies of Fig. 2A consistently found. The Ca-sensitivity at the calyx of Held, and the steepness of the release rate versus [Ca]i relation is determined by Ca-binding to Synaptotagmin-2 (the specific Ca sensor isoform found at the calyx synapse), as has been determined in molecular studies at the calyx synapse (see Sun et al. 2007 Nature; Kochubey & Schneggenburger 2011 Neuron). Furthermore, in other secretory cells, the number of SNARE proteins has been estimated to be ≥ 3 (Mohrmann et al., Science 2010).

      The Reviewer is incorrect in their claim that there is any discrepancy here. The number of SNAREs N and the number of Ca2+ ions 𝑄‡, extracted from the fit to our theory, are actually in good agreement with the findings from the studies mentioned by the Reviewer. To clarify, the parameter 𝑄‡ is the number of Ca2+ ions bound to a SNARE at the transition state (not the final state) of the free energy landscape of a SNARE complex. Appendix 3 Table 2 shows that, for all synaptic preparations, the extracted values at the transition state are 𝑄‡ < 4 − 5, which is indeed consistent with n = 4 − 5 at the final state. We note that, in addition, our theory enables one to extract the key energetic parameter that governs synaptic vesicle fusion: the activation free energy barrier ∆𝐺‡ of SNARE conformational transition (in the range 8-34 kBT for different synaptic preparations, see Appendix 3 Table 2), which, to our knowledge, has not been possible to extract from these experiments before.

      The specific value N=2 was extracted from a particular data set for Calyx of Held (Woelfel et al 2007), for which the temporal curves of cumulative release at different Ca2+ concentrations were available. It is quite possible that the value of N will be different for some other synapses. As we emphasize in the manuscript (see Discussion), the present theory does not declare the same value of N for all types of synapses; the power of the theory lies in providing a fitting tool for extracting this value for a system of interest.

      Taken together, the derivation of the analytical equations for the kinetic scheme of a 2-pool model is mathematically interesting, and the scholarly derived equations are trustworthy. Nevertheless, the derived analytical model in fact captures only a specific stage of synaptic transmission focusing on Ca-dependent fusion of vesicles from two pools at [Ca]i >1 microM. Other important processes and mechanistic components (e.g. spontaneous, asynchronous release, Ca-dependent pool replenishment, postsynaptic factors) are either over-simplified or remained out of the scope of the theory. Therefore, the paper is far from providing a general "theory for synaptic transmission", as the title promises.

      We appreciate that the Reviewer sees our analytical derivations as being mathematically interesting, scholarly derived, and trustworthy. We believe that we have convincingly refuted the Reviewer’s criticisms regarding perceived limitations. We have shown that our universal scaling and collapse is not limited to high calcium concentrations, and have presented checks using data from vastly different synaptic preparations. As noted above, the generality of a theory is determined not by the amount of details packed in it but by the ability of the theory to reproduce observations and generate predictions regarding the phenomenon of interest (here: rapid and precise neuronal communication) while containing as few details as possible. Our theory accomplishes just that; it delivers precisely what our title promises.

      Reviewer #2 (Public Review):

      The present MS describes an effort to create a general mathematical model of synaptic neurotransmission. The authors invested great efforts to create a complex model of the presynaptic mechanisms, but their approach of the postsynaptic mechanisms is way oversimplified. The authors claim that their model is consistent with lots of in vivo and in vitro experimental data, but this might be true for a small subselection of experimental papers (they cite 7 experimental papers regularly in the MS!). The authors also indicate that their modeling has a realistic foundation, namely they can relate some parameters in their equations to molecules/molecular mechanisms. One example is the parameter N, which they claim indicate the number of SNARE complexes required for fusion. The reviewer finds it rather misleading because it alludes that there is a parameter for complexin, Rim1, Rim-BP, Munc13-1 etc... The equations clearly cannot formulate and reflect diversity due to different isoforms of even the above mentioned key presynaptic molecules.

      We appreciate that the Reviewer found 7 different experimental papers – covering different synapses and different experimental setups – to be “a small subselection”. We believe that Fig. R1 above (response to Reviewer #1 point 2), which uses 16 different experimental papers, leaves no further doubts that the claims about the consistency between the theory and data are fully justified. Despite up to 10 orders of magnitude variation in the release rate of different synaptic preparations and more than 3 orders of magnitude range of calcium concentrations (Fig. R1, left), all the data collapse onto a universal curve predicted by our theory (Fig. R1, right). These data represent different systems – from the central nervous system to the secretory system – and come from in vivo and in vitro experiments. The data we have used cover the measurements on all synaptic systems that we could find in the literature on the action potential-driven neurotransmitter release. If the Reviewer is aware of any existing data on other synaptic systems that we might have missed, we will gratefully appreciate the opportunity to apply the theory to those data as well.

      The diversity of the molecular components in different synapses is captured in our theory through different values of the microscopic parameters Δ𝐺‡, 𝑄‡ and 𝑘_0. These parameters describe, respectively, the activation energy barrier, the number of bound Ca2+ ions, and the intrinsic rate of the conformational transition of the SNARE complexes that drive synaptic vesicle fusion in a given synapse. Different isoforms of the individual components of SNARE complexes and scaffold proteins, including the proteins mentioned by the Reviewer, will be reflected in different values of Δ𝐺‡, 𝑄‡ and 𝑘_0 for specific synaptic preparations, as can be seen in Appendix 3 Table 2 in the manuscript. These parameters capture the energetic and kinetic properties of the synaptic fusion machinery as a complex rather than as a collection of isolated molecules. Because the molecular components within a SNARE complex act collectively (hence the name “complex”) to drive vesicle fusion, it is natural (and indeed fortunate) that the predictive power of the theory can be preserved with only a few key parameters of the molecular machinery as opposed to requiring a long list of parameters for every specific isoform of each of the many individual molecular components.

    1. Author Response

      Reviewer 1

      Panda and co-workers analyzed RS fMRI recordings from healthy patients and from two types of coma: UWS and MCS. They characterized the time-resolved functional connectivity in terms of metastability (time-variance of the Kuramoto order parameter), spatiotemporal patterns via non-negative tensor factorization, and its relationship to the eigenmodes of structural connectivity. Finding greater metastability and non-stationarity of the DMN network in healthy MCS patients, than in UWS patients, they found that the best discriminators to classify the different DoCs are the number of excursions (nonstability) from the DMN, salience and FPN networks extracted by the NNTF analysis. Interestingly, the data-driven NNTF yielded a novel sub-network comprising the FPN and some subcortical structures. The excursions and dwell times from this FPN subnetwork showed to be significantly lower in the UWS patients than in MCS. Surrogate data testing assures that the different methods and fits are effectively expressing the functional connectivity matrices measured.

      Overall, I think that the results are correct and they advance in the characterization and understanding of the brain under DoC. However, some improvements can be made in the way the results, and the rationale behind them, are presented.

      We thank Prof. Patricio Orio for his assessment.

      While reading the Results section, it is easy to have the impression of a disconnected set of analyses that just happened to be together. In particular, the section about the structural eigenmodes and their relationship with the time-resolved FC seems to have little connection with the rest of the work, except for confirming (yet again) that DoC patients have a less dynamic FC. More elaboration about the relevance of these results, and what they say about DoC (that other dynamical FC analyses don't), is needed both in the introduction and discussion. Although a clear explanation is given in the introduction, the bottom line seems to be yet another measure of metastability. Perhaps, a better explanation of what underlies the 'modulation strength of eigenmodes expression' will be helpful for distinguishing this analysis from others. How novel is the connection that is being done with the structural connectivity and why is this important? Moreover, the eigenmodes analysis has little-to-none importance in the discrimination of patients done at the end; thus, its place within the big picture is hard to evaluate.

      We understand the reviewer’s position. Part one of our work covers time-resolved FC and spatiotemporal networks in DoC. Part two covers the relationship between time-resolved FC and eigenmodes of the structural network. The rationale for including part two is the following: a large body of literature shows that eigenmodes of the structural network can be considered as ‘building blocks’ or basis functions/vectors for spatiotemporal networks at the functional level (Aqil et al., 2021; Atasoy et al., 2016, 2018; Deslauriers-Gauthier et al., 2020; Gabay et al., 2018; Gabay and Robinson, 2017; Robinson et al., 2016; Robinson, 2021; Tewarie et al., 2019, 2020; Wang et al., 2017). Ideally, to link parts one and two, one would take this notion further by analysing whether the magnitude of the eigenmode coefficients differs between UWS, MCS and healthy controls, and how this relates to dwell times or expression of spatiotemporal networks. However, this would lead to an immense multiple testing issue, which would be impossible to overcome with our sample size. An important link between parts one and two of our work is the relationship between change in eigenmode expression and metastability. Our measure of metastability is only a proxy. The lack of change in eigenmode expression seems to confirm the metastability result.

      To allow for better integration of part one and two of our work, we have added to the introduction:

      “These eigenmodes can be considered as patterns of ‘hidden connectivity’ that come to expression at the level of functional networks. It has been postulated that eigenmodes form elementary building blocks for spatiotemporal dynamics (Aqil et al., 2021). There is evidence that the well-known resting state networks can be explained by activation of a small set of eigenmodes (Atasoy et al., 2018).”

      We have also clarified in the result section:

      “As resting-state network activity can be explained by activation of structural eigenmodes, we next analyse the role of fluctuations in eigenmode expression over time.”

      Something that I find counter-intuitive and that may confuse some readers, is the (apparent) contradiction between the diminished metastability in the DoC conditions and the reduced dwell times (Figure S1; also "the inability to sequentially dwell for prolonged times in a different set of eigenmodes", as stated in the Discussion). Fewer excursions and shorter dwell times can only mean that some networks are just less visited and maybe this would be enough to distinguish between conditions. Further explaining this will help to understand better the implications of the work.

      We understand the reviewer’s point; however, we disagree that diminished metastability is in contradiction with the findings on dwell times. We show that dwell times are reduced in the posterior DMN, FPN and sub-FPTN networks, whereas there is very long dwelling in the residual network in DoC. Hence, the brain resides in fewer network states in DoC, which is in agreement with reduced metastability. Our proxy for metastability is the standard deviation of the Kuramoto order parameter. Whenever there are more visits to network states, or more switching between network states, as is the case for healthy controls in our data, this leads to phase uncoupling followed by phase synchronization, which in turn increases the standard deviation of the Kuramoto order parameter.
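
      The proxy described above can be sketched in a few lines. This is an illustrative sketch, not the pipeline used in the manuscript: it assumes that instantaneous phases (in radians) have already been extracted for each region and time point, the function names are hypothetical, and the choice of the population standard deviation is an assumption.

```python
import cmath
import math
import statistics


def kuramoto_order_parameter(phases_at_t):
    """R(t) = |mean over regions of exp(i * theta)| for one time point."""
    n = len(phases_at_t)
    return abs(sum(cmath.exp(1j * theta) for theta in phases_at_t) / n)


def metastability(phases):
    """Standard deviation over time of R(t).

    phases[t][j] is the phase of region j at time point t.
    """
    r = [kuramoto_order_parameter(row) for row in phases]
    return statistics.pstdev(r)


# Toy example 1: regions stay phase-locked at every time point, so R(t)
# is constant and the proxy is (numerically) zero.
locked = [[0.3, 0.3, 0.3], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
print(metastability(locked))

# Toy example 2: two regions alternate between in-phase and anti-phase,
# so R(t) switches between about 1 and about 0 and the proxy is larger.
switching = [[0.0, 0.0], [0.0, math.pi], [0.0, 0.0], [0.0, math.pi]]
print(metastability(switching))
```

      The toy examples mirror the argument in the text: a system that stays in one synchronization regime yields a small standard deviation of R(t), whereas repeated uncoupling and resynchronization yields a larger one.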

      We agree with the reviewer that the sentence starting “the inability to sequentially dwell for prolonged…” is confusing. We have now removed this statement.

      We have now added to the result section:

      “These findings of very short dwell times in the posterior DMN, FPN and sub-FPTN and long dwell time in the residual network can be considered as a contraction of the functional network repertoire in DoC, which is in agreement with a loss in metastability in these patients.”

      Finally, some comments about the connection(s) of these analyses with the commonly used FCD analysis (based on sliding windows of pair-wise correlations) will be useful, to put better this work into the big picture of time evolution of the functional connectivity.

      We have now discussed sliding window-based analysis in the context of our work in the methodology section.

      “Lastly, we have used a high temporal resolution method to estimate time-resolved connectivity at every time point instead of a sliding window-based method. Previous studies using sliding window approaches have provided novel insights into brain dynamics of loss of consciousness, such as the brain co-occurrence of functional connectivity patterns, which is known as brain states and its temporal (i.e., rate of pattern occurrence (probability) and between pattern transition probabilities) alteration in loss of consciousness in DoC patients (Demertzi et al., 2019) and anaesthesia induced loss of consciousness (Barttfeld et al., 2014a; Uhrig et al., 2018). However, sliding window approaches have limited sensitivity to non-stationarity in the fMRI BOLD signals (Hindriks et al., 2016) and lack to provide spatial alteration of classical brain functional network. The exploration of the spatiotemporal aspects of well-known resting state networks is an important step forwards for better understanding the relation between brain function and consciousness, in a way that is impossible to achieve at the whole brain level. In addition, recent work on time-resolved connectivity shows that brief periods of co-modulation in BOLD signals are an important driving factor for functional connectivity (Esfahlani et al., 2020; Hindriks et al., 2016).”

      Reviewer 2

      The study is of high significance, rigor, and novelty. Despite the many studies of repertoire, dynamic connectivity, etc., in the study of consciousness, there is (surprisingly, as I confirmed with a literature search) a dearth of application of these approaches to disorders of consciousness. The manuscript is well-written and transparent about its limitations. The author should consider the following recommendations:

      We thank the reviewer for his/her assessment of our work.

      1) There is frequent reference to "subcortical" and related networks, but I see no description in the text of which subcortical structures are involved. Panel N of figure 2 is helpful but I think that more explicit detail is important, especially given the specific predictions of mesocircuit theory.

      We have provided details for the subcortical networks presented in the Panel N of Figure 2. In the manuscript we provide a textual description of the brain areas that are part of the network. To improve the clarity of the description of the network, we also now refer to it as “subcortical fronto-temporoparietal (Sub-FTPN)”.

      In the result section, it reads: “This modulated subcortical fronto-temporoparietal network consists of the following brain regions: bilateral thalamus, caudate, right putamen, bilateral anterior and middle cingulate, inferior and middle frontal areas, supplementary motor cortex, middle and inferior temporal gyrus, right superior temporal, bilateral inferior parietal and supramarginal gyrus.”

      2) Similarly, although the global neuronal workspace does posit a critical role for recurrent frontal-parietal networks, can the authors be more specific about the nodes of the proposed workspace and what they found empirically?

      As mentioned above, we have provided more details about the regions that are part of the “subcortical fronto-temporoparietal” network. As the reviewer rightly noted, this network also shows some overlap with the Global Neuronal Workspace. We discuss this in more detail in the discussion, highlighting how our functional networks overlap with and differ from the two networks (i.e., one feedforward only, one with recurrent activity), and from the predictions of the mesocircuit model. For more detail, please refer to the reply to point 1 of “Recommendations for the authors”.

      3) The classification sensitivity/specificity did not, in my opinion, add much to the manuscript, especially since the number of patients is not remotely close to what would be required for a population-based diagnostic approach. If the authors chose to include this with any reference to diagnosis (highlighted in the introduction and elsewhere), I would encourage a comparison with similar data from other clinical or neuroimaging-based diagnostic approaches. However, I think the value of the study resides more with mechanistic understanding than diagnosis.

      We agree with the reviewer’s suggestion that the primary aim of our work is to provide a mechanistic understanding of loss of consciousness. Therefore, we have removed the classification part from the paper and now explain our findings with a focus on the mechanism of pathological unconsciousness rather than on its potential as a clinical diagnostic tool. This change has required several textual edits throughout the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript investigates a role for YAP in replication. Previous work from this group has shown that Yap knock-down leads to accelerated S-phase and an abnormal progression of DNA replication in the frog eye. Here they extend this to show that YAP depletion accelerates S-phase and DNA replication in the frog embryo, and that YAP binds a DNA replication regulator called Rif1. Combing assays suggest that YAP acts on origin firing. This is an interesting new aspect of YAP function. I am not an expert on DNA replication, however, I feel that the manuscript would have been improved if more mechanistic insight was gained into how Rif1 and YAP interact, and how that interaction influences replication timing.

      In the revised version of the manuscript, we have strengthened our conclusion that Yap regulates the dynamics of DNA replication. We now provide additional experiments in addition to DNA combing and nascent strand analysis by agarose gel electrophoresis: Rhodamine-dUTP incorporation/nucleus, 32P-dCTP incorporation, western blotting for replication fork proteins. All show that DNA synthesis and origin activation is increased after Yap depletion.

      Moreover, in the revised manuscript we also directly compared the effects of Yap depletion to those of Rif1 depletion alone (page 7, New Figure 4). As for Yap depletion, we first quantified rhodamine-dUTP incorporation after Rif1 depletion by direct fluorescence microscopy, which demonstrated a clear increase of DNA synthesis, consistent with Alver et al. 2017. Second, we performed DNA combing experiments after Rif1 depletion in egg extracts, which show a marked increase in DNA replication and fork density like that seen after Yap depletion, spanning from very early to mid S-phase. We therefore found that Rif1 depletion and Yap depletion qualitatively show the same main effects: an increase of DNA synthesis and fork density that is more pronounced in early S-phase. We also noticed quantitative differences in the direct fluorescence after rhodamine incorporation of whole nuclei and in fork density, with stronger effects after Rif1 depletion compared to Yap depletion. This suggests that there might be an additional mechanism for Rif1 in regulating origin activation.

      The title of the manuscript is "A non-transcriptional function of YAP orchestrates the DNA replication program". It is not clear that YAP "orchestrates" DNA replication - for this to be true, it would have to be signal responsive. Since the authors did not reveal any links to YAP activity (such as YAP phosphorylation or nuclear/cytoplasmic distribution) it is not "orchestrating" DNA replication.

      We have replaced “orchestrates” by “regulates”.

      Figure 1 shows that YAP is recruited onto chromatin after MCM2 and MCM7 and at the same time as PCNA and the start of DNA synthesis. Addition of geminin, an inhibitor of Cdt and MCM loading inhibits YAP loading onto chromatin. YAP immuno depletion leads to premature DNA synthesis or replication. Fig 1 B is quite confusing- the labeling in Figure 1B is likely incorrect.

      We apologize for this confusion. This has been corrected and the Figure 1B is now properly labelled.

      Figure 2 investigates if YAP depletion affects origin firing or fork speed, using DNA combing. Fig 2A shows that there is increased activated replication origins and decreased distance between origins. The authors say that the increase of fork density is more pronounced than the decreased distance, suggesting YAP is regulating the activation of origins. The number of replicates is low. This is especially true for the conclusion that eye length is unaltered -it appears that there is a subset of eye length that is increased in 2F, which might reach significance if triplicates were performed.

      As the referee points out, both the observed increase of fork density and the decrease of origin distances argue that origin activation is increased after Yap depletion. The fact that the increase of the fork density seems more pronounced than the local decrease of distances between neighbouring origins allows a more detailed interpretation, namely that whole clusters of origins are activated on top of origins inside already active clusters. This can be observed in the two independent experiments probing many fibers for eye distances and eye numbers.

      Concerning Figure 2F, the scatter plot gives the impression that there are more eyes with larger sizes after Yap depletion, but please note that more eye lengths (EL) were also measured, as stated in the legend (Mock n=182 versus Yap n=311). To highlight this parameter, we added these numbers below the scatter plot in the revised Figure 2F, as we have done consistently for all of the experiments presented in the revised Figures. The means of the two EL distributions are numerically different, but since neither distribution is Gaussian (tested by the D'Agostino-Pearson test), only non-parametric tests can be applied (Mann-Whitney or Kolmogorov-Smirnov test). The results of the two non-parametric tests show that the distributions are not significantly different, as mentioned in the legend. We cannot rule out that after Yap depletion some larger eyes may arise from fusions of forks or from a higher fork speed, but again, the tests, applied to a high number of measurements, show no statistically significant differences.
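
      The testing sequence described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually run for Figure 2F: the eye-length values are simulated placeholders (only the sample sizes, 182 and 311, come from the legend), and SciPy is assumed as the statistics library.

```python
import numpy as np
from scipy import stats

# Simulated placeholder eye lengths (arbitrary units); NOT the measured data.
# Both samples are drawn from the same skewed distribution.
rng = np.random.default_rng(1)
el_mock = rng.lognormal(mean=1.0, sigma=0.8, size=182)
el_yap = rng.lognormal(mean=1.0, sigma=0.8, size=311)

# Step 1: D'Agostino-Pearson omnibus test for departure from normality.
p_norm_mock = stats.normaltest(el_mock).pvalue
p_norm_yap = stats.normaltest(el_yap).pvalue

# Step 2: if either sample is non-Gaussian, compare the two distributions
# with non-parametric tests instead of a t-test.
p_mw = stats.mannwhitneyu(el_mock, el_yap, alternative="two-sided").pvalue
p_ks = stats.ks_2samp(el_mock, el_yap).pvalue

print(f"normality p-values: mock={p_norm_mock:.3g}, yap={p_norm_yap:.3g}")
print(f"Mann-Whitney p={p_mw:.3g}, Kolmogorov-Smirnov p={p_ks:.3g}")
```

      The Mann-Whitney test is mainly sensitive to a shift between the two samples, whereas the Kolmogorov-Smirnov test responds to any difference in the shape of the cumulative distributions, which is why reporting both is informative.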

      The authors conducted AP-MS on egg extracts to identify proteins that co-IP with YAP. One of many proteins identified was RIF1 Figure 3 shows a co-IP with RIF1 and YAP. It is a very weak co-IP.

      We agree that the Rif1/Yap co-IP is weak, but it is reproducible in several independent experiments with different extracts. There could be many reasons for this. Co-IPs with a high molecular weight partner like Rif1 (250 kDa) are generally tedious (poor gel migration and WB transfer). Further, Rif1 has been described as having a subnuclear localisation and as associating with the nuclear lamina and heterochromatin. These characteristics are known to make proteins highly insoluble. These technical limitations have been reported for mouse Rif1, for instance (Sukackaite R. et al. Sci Rep 2017 May 18;7(1):2119). In fact, similar “weak co-IPs” were also obtained between Rif1 and Nanog (Wang J. et al. Nature 2006 (444), 364–368) as well as with PP1 (Hiraga S. et al. EMBO Rep. 2017 Mar;18(3):403-419). Finally, it could also be that this interaction is not permanent but dynamic, making it difficult to capture in a co-IP. Taken together, these parameters mean that the identification of the interaction is in itself challenging. What we did manage to provide is a reciprocal co-IP using the endogenous proteins, which we believe best reflects native conditions.

      Figure 4 shows that YAP levels increase during development and that depletion of YAP or RIF1 leads to increased cell division. The authors use Trim-away to deplete YAP and RIF1 and find that depletion of either leads to an increased number of small cells. The YAP depletion shown in Fig 4B is clear, as is the increased number of small cells in YAP depletion or RIF1 depletion.

      Figure 4 supplement 1 is arguing that trim away and morpholino combined are more effective. Quantitation of the western blots in panel A is needed for this to be convincing.

      The quantification is now presented in new Figure 5-figure supplement 1A. At the 2-cell stage, we observe some fluctuations in the amounts of Yap between samples, the origin of which we do not fully understand. At the 4-cell stage, a reduction in Yap is observed regardless of the depletion strategy used. It is from the 8-cell stage onwards that differential effects between the depletion methods can be appreciated. From this stage onwards, the quantifications confirm that the TRIM-Away and morpholino combined are more effective than taken separately.

      Figure 5 shows that RIF1 is expressed in the eye in RSC and that loss of RIF1 leads to a small eye. Panel B shows that by western blot analysis RIF1 antibody is specific. However, antibodies can have very different abilities in western vs staining. The RIF1 and YAP antibodies should be validated in staining. Also, the staining in Fig5C is at low resolution for both YAP and RIF1 and the identification of foci is unclear.

      This is indeed an important issue. To address this point, we performed immunostaining on retinal sections from embryos depleted of the target protein and compared the fluorescent signal obtained in control versus depleted samples. We show that upon depletion of Yap or Rif1, the immunostaining signal is severely reduced for Yap or Rif1, respectively, which attests to the specificity of the antibodies used in this study. We have added a supplementary Figure to show this control (Figure 6-figure supplement 1).

      We agree with the reviewers that the quality of the images could be improved. We now provide confocal images with a better resolution (Figure 6C).

      For Rif1, we observe a clear nuclear staining, rather non-homogenous which is consistent with data reported in the literature. Indeed, Rif1 localisation has been shown to be highly dynamic during the cell cycle and also during S-phase (Cornacchia D. et al. EMBO J. 2012). Some brighter foci could be observed at specific phases (such as G1-phase) but overall, the general pattern appears rather “granular” and restricted to the nucleus. This is what we are also observing. Interestingly, Rif1 does not appear to colocalize with the replication fork or with the replicative helicase MCM3 (Cornacchia D. et al. EMBO J. 2012). The replication foci observed in this study are therefore to be understood independently of the Rif1 localisation pattern.

      For Yap, we do not detect any granular expression but observe rather homogeneous nuclear and cytoplasmic staining, which is also consistent with reported data showing YAP nucleo-cytoplasmic shuttling (see for instance Manning S.A. et al. Curr Biol. 2018). STED microscopy might be necessary for higher resolution.

      It is difficult to see the points the authors wish to communicate in Figure 6. There is almost no Edu in the YAP-MO, which questions the ability to recognize the different patterns in this region of the eye.

      Our observations show that there are fewer EdU positive cells in the Yap-MO but not “no EdU”. The fluorescence intensity in the green-labelled nuclei in Figure 7C after Yap MO does not appear different from that in the control-MO. Under these conditions, there is no reason to think that one pattern is more difficult to recognise than the other one.

      Reviewer #2 (Public Review):

      This paper is of potential interest within the field of DNA replication, as it identifies a novel role for YAP protein in DNA replication dynamics. However, the conclusions are not supported by properly controlled data. Several aspects of data analysis and representation need to be revised.

      In this manuscript, the authors characterized YAP function in the control of DNA replication dynamics, taking advantage of the Xenopus laevis system.

      They found that YAP is recruited to replicating-chromatin and showed that its chromatin enrichment depends on the assembly of pre-RC proteins. In addition, they show that the immuno-depletion of YAP leads to increased DNA synthesis and origin activation, revealing YAP's possible role in the regulation of replication dynamics.

      The authors were also interested in finding YAP potential partners that could mediate its function. They identified Rif1, a major regulator of replication timing, as a novel YAP interactor during DNA replication.

      As RIF1 expression in vivo is restricted to the stem cell compartment of the Xenopus retina, similar to YAP, the authors assessed whether Rif1 could regulate the spatial-temporal program of DNA replication in stem cells. They showed that depletion of Rif1 at early stages of Xenopus embryos development leads to alterations in replication foci of retinal stem cells, resembling the effect observed following YAP down-regulation.

      Finally, they studied the impact of YAP and RIF1 down-regulation at early stages of development, showing that their absence results in the acceleration of cell division rate of Xenopus embryos, where RNA transcription is absent. Based on these results they concluded that YAP has a role in S-phase independent from transcription.

      The higher rate of DNA synthesis observed in the absence of Yap in Figure 1D is not very evident from the gels in Figure 1, supplement 3B. The timing of the experiments is continuously changing throughout the figures. It is therefore difficult to compare them. Also, comparisons across different gels are difficult to interpret. Most importantly, relative quantification on gel images cannot support the claim of increased DNA synthesis in the absence of YAP. To accurately quantify the replication of DNA added to the extract, the total amount of DNA synthesized must be quantified.

      Although we do not agree that relative quantification on gel images cannot support the claim of increased DNA synthesis in the absence of Yap, we thank the reviewer for his suggestion since we now provide additional data clearly strengthening our conclusion.

      Many studies, published in high-standard journals and coming from different Xenopus replication laboratories, have quantified DNA synthesised after 32P-dCTP incorporation and separation by agarose gel electrophoresis (Shechter et al, 2004; Trenz et al, 2008; Guo et al, 2015; Walter & Newport, 1997; Suski et al, 2022, Nature). Nevertheless, as the referee suggested, we quantified the total amount of DNA synthesized in three new independent experiments. These new results, presented on page 5, lines 34-39 and shown in Figure 1G, support our conclusion, as they also show that Yap depletion increases total DNA synthesis. Please note that the DNA combing results presented in Figure 2 also show that replication is increased after Yap depletion. Finally, we also added another set of experiments to Figure 1 to further confirm these findings. We used the incorporation of Rhodamine-dUTP followed by the quantification of the fluorescence intensity within nuclei. This nuclear fluorescence-based method is frequently used in proliferation assays to assess nucleotide incorporation resulting from the DNA replication process in other organisms. Our new results demonstrate that DNA synthesis is increased 1.5-fold in six biological replicates and represent a third independent method, in addition to DNA combing and 32P-dCTP incorporation, showing that DNA synthesis is increased upon Yap depletion. These new results are now presented on page 5, lines 27-24 and shown in Figure 1D-F.

      As explained in the Materials and Methods section (page 14 of the original manuscript), the replication extent (percent of replication) at a specific time point differs from one extract to another, because each egg extract prepared from one batch of eggs replicates nuclei with its own replication kinetics. To overcome this problem and to compare independent experiments performed using different egg extracts, the data points of each sample were normalized to the maximum incorporation value.
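
      The normalization step can be illustrated with a short sketch. The incorporation values and the helper name below are hypothetical, and we assume here that "maximum incorporation value" means the highest value within each sample's own time course; the point is only to show how time courses from extracts with different absolute signals become comparable on a common 0-1 scale.

```python
import numpy as np

def normalize_to_max(incorporation):
    """Scale one sample's time course so that its maximum equals 1."""
    values = np.asarray(incorporation, dtype=float)
    return values / values.max()

# Hypothetical 32P-dCTP incorporation time courses (arbitrary units) from
# two independent egg extracts with different absolute signal levels.
extract_a_mock = [200, 1000, 3000, 4000]
extract_b_mock = [50, 400, 900, 1000]

norm_a = normalize_to_max(extract_a_mock)
norm_b = normalize_to_max(extract_b_mock)

print(norm_a)  # fractions of maximal incorporation in extract A
print(norm_b)  # fractions of maximal incorporation in extract B
```

      After this rescaling, each curve ends at 1 regardless of the extract, so differences between conditions reflect the kinetics of incorporation rather than batch-specific signal intensity.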

      It is also necessary to analyze the dynamics and the abundance of chromatin-bound replication proteins associated with the active replication fork after Yap depletion using chromatin binding assays. This would further confirm the increase in the fork density observed by DNA combing experiments.

      We thank the referee for this suggestion and have added a western blot of chromatin-bound proteins after Yap depletion. This shows that two replication proteins associated with the active replication fork, namely Cdc45 and PCNA, are enriched after Yap depletion compared to the control at the beginning of S-phase. This observation further supports the DNA combing results showing that more forks are active after Yap depletion. These new data are now presented on page 6, lines 25-32 and displayed in Figure 2H.

      We would like to stress here that, with the methods added to the revised version, five different methods in total (Rhodamine-dUTP incorporation/nucleus, 32P-dCTP incorporation - total synthesis, 32P-dCTP incorporation - nascent strand analysis, DNA combing, western blotting for replication fork proteins) show that DNA synthesis and origin activation are increased after Yap depletion.

      The quantification of the amount of YAP in Figure 1B is confusing. The legend of the chart states "Control in light grey and presence of geminin in black", but the bar colors are of different shades of grey. It is not clear how to evaluate them.

      We apologize for this confusion. This has been corrected and the Figure 1B is now properly labelled.

      The efficiency of depletion for both Rif1 and YAP is different in Figure 4B and Figure 4A, supplement 1.

      We agree with the referee that the efficiency of depletion is different in both figures. This is explained by the fact that the extent of the depletion varies from experiment to experiment. We work with different batches of in vitro fertilized embryos and extracts, so these differences simply reflect the technical/biological variability.

      Moreover, the combined use of the TRIM-away approach with injections of MO led to a stronger and prolonged YAP depletion but also triggered toxicity in the tadpoles, which display severe abnormalities.

      It is important to point out that abnormal development is not always attributable to a toxic effect. Many losses of gene function result in malformations without being ascribed to toxicity or unspecific effects. However, we agree with the reviewers on the need to present a rescue experiment, which is now shown in new Figure 5C and new Figure 5-figure supplement 1B. In addition, we also provide gain-of-function (GOF) data for YAP in early embryos. In brief, we find that the Yap GOF leads to opposite outcomes than those of its depletion with embryos at the same stage of development, having fewer and larger cells than the control. Furthermore, we show that the effects of Yap depletion, i.e. embryos with more and smaller cells than the control at the same developmental stage, are rescued by the injection of MO-resistant Yap mRNA to restore the protein level. This is true for both embryonic divisions (new Figure 5C) and development, as we obtained normal-looking neurula after Yap rescue (new Figure 5-figure supplement 1B). Overall, these data now clearly show that Yap is both sufficient and necessary to maintain the rate of embryonic divisions and that this phenotype is specific since it can be rescued by expressing Yap alone. These new data are presented page 8, lines 2-10.

      Reviewer #3 (Public Review):

      The article by Garcia et al clearly describes a set of experiments establishing Yap as a novel regulator of DNA replication dynamics. Its characterization as both a RIF1 interaction partner as well as playing its own role in replication initiation will likely have a significant impact on the field, as currently little is known about how DNA replication during early embryonic cell divisions is regulated.

      The authors aim to identify a non-transcriptional function of YAP through the use of the Xenopus in vitro replication system and Yap depletion. Strengths of the paper include the particularly appropriate use of the Xenopus in vitro replication system, as well as the combined use of Trim-Away and morpholino oligonucleotides to deplete Yap and Rif1. Moreover, these experiments were elegantly complemented by single-molecule molecular combing and in vivo studies. Identifying Yap as a novel regulator of DNA replication dynamics, the authors achieved their aim. Through characterization of Yap as both playing a role in replication initiation and as a Rif1 interaction partner will likely have a significant impact on the field, as currently little is known about how DNA replication during early embryonic cell divisions is regulated. A weakness of the paper is that some of the representative data does not appear to be very representative of the entire data set.

      We replaced the representative data in Figure 2A with data that we think better reflect the main conclusions of the entire data set.

    1. Author Response

      Reviewer #3 (Public Review):

      Q1) The manuscript reports that in vitro fertilization (including in vitro culture) of mouse embryos seemingly originates metabolic alterations probably caused by enhanced oxidative stress compared to in vivo development. Such alterations apparently increase anaerobic glycolysis, as evidenced by altered pH and lactate levels, and remain after birth, as evidenced by altered protein abundance of MCT1 and LDHB.

      The manuscript concludes that IVF alters embryo metabolism, increasing oxidative damage and glycolytic activity. The topic is interesting but I consider that the conclusions are not well supported by the experiments:

      1) In vivo generated blastocysts are analyzed at a more advanced developmental stage than their in vitro counterparts as evidenced by their increased cell number (70 vs. 50 cells). In this regard, the developmental timing when in vitro generated blastocysts are collected is undisclosed in the Materials and methods. This has an obvious effect on all experiments as the differences observed may be stage-specific rather than IVF vs. in vivo.

      A1) Thank you for the comment. The reviewer is correct: it is well known that in vitro fertilization and embryo culture result in profound changes to the embryo. Overall, embryos generated in vitro are delayed compared to embryos generated in vivo. To control for this, as done in our past publications (Belli 2019; Bloise 2014; Delle Piane 2010; Giritharan 2012; Giritharan 2010; Giritharan 2006; Giritharan 2007; Rinaudo 2006; Rinaudo 2004) and by others (Doherty 2000; Ecker 2004; Weinerman 2016), we limited the analysis to expanded blastocysts of similar morphology (under microscopic examination) in all of the groups. Therefore, the embryos appeared morphologically similar in all of the groups. As an alternative, we could have cultured the embryos for longer in vitro, but this would have resulted in embryo hatching, so they would no longer have been morphologically similar to in vivo embryos. In addition, the 2 IVF groups provide an internal control: embryos were at the same developmental stage, but showed significant changes in metabolism and cell numbers. (Time post-hCG administration = 96 hours of culture + 13-14 hours for egg collection + 4 hours of fertilization.)

      We have added this information as follows: Line 377-382: To control for the known delay in development after culture in vitro, for all experiments, only expanded blastocysts of similar morphology were used, as done before (Doherty 2000; Rinaudo 2006; Rinaudo 2004). The in vivo-generated blastocysts were isolated by flushing 96-98 hours after hCG administration. IVF- 5% O2 and 20% O2 generated embryos reached the blastocyst stage after 96-98 hours following in vitro culture and 113-114 hours after hCG administration, respectively.

      Q2) Several methods are not reliable to quantify the parameters analyzed. For instance, determining protein content by immunofluorescence has been largely shown to be misleading as immunofluorescence can be affected by multiple parameters. Intracellular pH was also analyzed by an assay also based on immunofluorescence, which can also be affected by embryo size (the blastocoel is a cell-devoid cavity). These analyses are not reliable.

      A2) Thank you for the comments.

      We appreciate the comments and concerns. Any single method can result in error and possible bias. Immunofluorescence analysis is a robust method that has been used to analyze the distribution of proteins in cells or tissues. For instance, oxidative stress (Liu et al., 2022, Reprod Domest Anim), several signaling molecules (Spirkova et al., 2022, Biol Reprod) and DNA methylation levels (Diaz et al., 2021, Front Genet) have been measured by immunofluorescence in preimplantation embryos and oocytes. In our study, to minimize errors, we followed exactly the same protocol and we found immunofluorescence to be reliable. In addition, global proteomics analysis of blastocysts provides partial independent confirmation of our results. While LDH-A and MCT1 were not detected, LDH-B was detected and found to be lower in IVF blastocysts, exactly as shown by IF studies. Finally, western blot analysis of adult tissues confirmed the reduction in LDH-B and MCT-1 levels.

      These comments have been added to the discussion as follows:

      Line 299-302: Unsupervised global proteomics analysis revealed that LDH-B was downregulated in IVF embryos. We confirmed these results by performing immunofluorescence studies. In addition we found that IVF embryos showed downregulation of both LDHA and B and of the monocarboxylate transporter, MCT 1, providing an explanation for the increase in their lactate levels

      Regarding pH measurement: to control for the possible variation in blastocoel size in different embryos, we compared immunofluorescence level of only the inner cell mass and trophoblast region of blastocysts and excluded the blastocoel region.

      This clarification has been added to the method section as follows:

      Line 488-491: To control for the possible variation in blastocoel size in different embryos, we compared immunofluorescence level of only the inner cell mass and trophoblast region of blastocysts and excluded the blastocoel region.

      Q3) Identifying proteins and metabolites in such small samples is technically difficult and error-prone, requiring validation by alternative techniques.

      We appreciate the comments and concerns. Any single method can result in error and possible bias. Immunofluorescence analysis is a robust method that has been used to analyze the distribution of proteins in cells or tissues. For instance, oxidative stress (Liu 2022), several signaling molecules (Spirkova 2022) and DNA methylation levels (Diaz 2021) have been measured by immunofluorescence in preimplantation embryos and oocytes. In our study, to minimize errors, we followed exactly the same protocol and we found immunofluorescence to be reliable. In addition, global proteomics analysis of blastocysts (triplicate for each group; n=100 blastocysts for each replicate; total 900 embryos) provides partial independent confirmation of our results. While LDH-A and MCT1 were not detected, LDH-B was detected and found to be lower in IVF blastocysts, exactly as shown by IF studies. Finally, western blot analysis of adult tissues confirmed the reduction in LDH-B and MCT-1 levels.

      These comments have been added to the discussion as follows:

      Line 299-302: Unsupervised global proteomics analysis revealed that LDH-B was downregulated in IVF embryos. We confirmed these results by performing immunofluorescence studies. In addition, we found that IVF embryos showed downregulation of both LDH-A and LDH-B and of the monocarboxylate transporter MCT1, providing an explanation for the increase in their lactate levels.

      Q4) Given the small size of these embryos (~80 µm diameter), it is unclear how they can significantly alter the composition of 500 µl of medium (~10^6 times their own volume).

      To collect 300 blastocysts, we performed multiple IVF procedures, each resulting in 10-20 blastocysts cultured in 30 microliters of media. While intracellular lactate and pyruvate were measured in the embryos collected, the media from different experiments were pooled to a final 500 microliter volume. Lactate and pyruvate levels were measured in this final volume for each group of embryos (FB, IVF5% and IVF20%).

      This has been clarified in the method section as follows:

      Line 516-519: To collect 300 blastocysts, we performed multiple IVF procedures, each resulting in 10-20 blastocysts cultured in 30 microliters of media. While intracellular lactate and pyruvate were measured in the embryos collected, the media from different experiments were pooled to a final 500 microliter volume.
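
      The volumes in this exchange can be checked with a rough calculation. The sketch below is not part of the manuscript; it assumes each blastocyst is a solid sphere with the ~80 µm diameter quoted by the reviewer, and the function name sphere_volume_ul is hypothetical. It compares one embryo against the pooled 500 µl, and 10 or 20 embryos against a single 30 µl culture drop.

```python
import math


def sphere_volume_ul(diameter_um: float) -> float:
    """Volume of a sphere in microliters, given its diameter in micrometers."""
    radius_um = diameter_um / 2.0
    volume_um3 = (4.0 / 3.0) * math.pi * radius_um ** 3
    return volume_um3 * 1e-9  # 1 microliter = 1 mm^3 = 1e9 cubic micrometers


# Assumption: an ~80 um spherical embryo (figure taken from the reviewer's comment).
embryo_ul = sphere_volume_ul(80.0)

# One embryo versus the pooled 500 ul sample (the reviewer's comparison).
single_ratio = 500.0 / embryo_ul

# 10 or 20 embryos versus the 30 ul drop in which they were actually cultured.
drop_ratio_10 = 30.0 / (10 * embryo_ul)
drop_ratio_20 = 30.0 / (20 * embryo_ul)

print(f"embryo volume: {embryo_ul:.2e} ul")
print(f"500 ul / one embryo: {single_ratio:.2e}")
print(f"30 ul / 10 embryos: {drop_ratio_10:.0f}")
print(f"30 ul / 20 embryos: {drop_ratio_20:.0f}")
```

      Under these assumptions the reviewer's order of magnitude holds for a single embryo in 500 µl, while the medium-to-embryo volume ratio within each 30 µl culture drop is a few thousand to roughly ten thousand. Pooling drops does not change that ratio, since the medium and the embryos it held scale together.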

      Q5) The metabolic changes observed in the offspring lack a mechanistic explanation.

      Thank you for the comment. We can formulate a hypothesis in which (Figure 8) oxidative stress from in vitro conditions increases ROS and induces oxidative damage, resulting in a shift toward Warburg metabolism, given that lactate is a critical energy source (Brooks, 2018). The higher intracellular lactate levels will likely induce epigenetic changes to favor Warburg metabolism during development, as an embryonic attempt to optimize growth based on the environment predicted to be experienced in the future. When the environment does not match the prediction, disease risk increases (Godfrey 2007). Low lactate would be beneficial in a setting of low food resources because it could favor lipolysis (Brooks, 2020). In fact, lactate activates the hydroxycarboxylic acid receptor 1 (HCAR1), a G protein-coupled receptor, which in turn inhibits lipolysis in fat cells via cAMP and CREB (Liu 2009). However, since there is an abundance of food in our society, this mismatch could predispose IVF concepti to develop chronic diseases such as glucose intolerance.

      This hypothesis has been added to line 333-344:

      In summary, we can formulate a hypothesis in which (Figure 8) oxidative stress from in vitro conditions increases ROS and induces oxidative damage, resulting in a shift toward Warburg metabolism, given that lactate is a critical energy source (Brooks, 2018). The higher intracellular lactate levels will likely induce epigenetic changes to favor Warburg metabolism during development, as an embryonic attempt to optimize growth based on the environment predicted to be experienced in the future. When the environment does not match the prediction, disease risk increases (Godfrey 2007). Low lactate would be beneficial in a setting of low food resources because it could favor lipolysis (Brooks, 2020). In fact, lactate activates the hydroxycarboxylic acid receptor 1 (HCAR1), a G protein-coupled receptor, which in turn inhibits lipolysis in fat cells via cAMP and CREB (Liu 2009). However, since there is an abundance of food in our society, this mismatch could predispose IVF concepti to develop chronic diseases such as glucose intolerance.

    1. Author Response

      Reviewer 1

      Strengths:

      This manuscript combines experimental, exploratory, and observational methods to investigate the big question in innovation literature--why do some animals innovate over others, and how information about innovations spread. By combining a variety of methods, the manuscript tackles this question in a number of ways, and finds support for previous work showing that animals can learn about foods via social olfactory inspection (i.e., muzzle to muzzle contact), and also presents data intended to investigate the role of dispersing animals in innovation and information spread.

      Using data from a previously published experiment, the manuscript illustrates how investigators can address numerous interesting questions while limiting the disturbances to wild animals. The manuscript's attempt at using exploratory analysis is also exciting, as exploratory analyses provide a useful tool for behavior research; indeed, Tinbergen insisted that behavior must first be described.

      Weaknesses:

      The manuscript's introduction is a bit unclear as to how the fact that dispersing males may be an important source of information ties to innovations in response to disruptions due to climate change, humans, or new predators, if at all. An introduction regarding the role of dispersed animals in introducing novel behaviors and social transmission would better prepare readers for the questions presented in the manuscript. As it stands now, the manuscript only provides one sentence discussing the theoretical relevance of investigating the role of dispersing animals in innovations.

      We have added some information about this to the introduction (lines 66 – 69 and 121-123) and maintain our discussion of it in the discussion.

      Additionally, while the manuscript attempts to use exploratory analysis, it does not provide enough theoretical background as to why certain questions were asked while the data were explored. While the discussion provides some background as to the role of dispersing males in innovation, the introduction provides little background, and thus does not properly frame the issue. It is unclear how dispersing males became of interest and why readers should be interested in them. As the manuscript reads now, it may be that dispersing males became interesting only as a result of the exploratory analysis, except that the predictions explicitly mention dispersing males. Thus, the manuscript at present makes it difficult to know if the questions surrounding immigrant males resulted from the exploratory analysis, or were questions the analyses were intended to answer from the beginning. If this question only came out after first reviewing the results, then this needs to be made clear in the introduction. I see no issue with reporting observations that were the result of investigations into earlier results, but it needs to be reported in a way that can be replicated in future research: I need to know the decision process that took place during the data exploration.

      We hope this is clearer from our new research aims (lines 125-173)

      The manuscript never clearly defines what counts as an immigrant male; presumably, in this species, all adult males in the group should be immigrants, as females are the philopatric sex. Sometimes, the manuscript uses "recently" to modify immigrant males, but doesn't define exactly what counts as recent, except to say that the males that innovated were in their respective groups for fewer than 3 months, but never explains why three months should be an important distinction in adult male tenure.

      We realise that how we wrote about this previously was not clear and perhaps misleading. We noticed that the males that innovated had been in the group for less than three months. We do not know if this is necessary for them to innovate or not. We also added to the discussion a description of the male in AK19 who had been in the group for four months and did not innovate – as he had many other traits which we would expect to exclude him from criteria for innovation (e.g. very old, post-prime, and inactive – died within months of the experiment).

      Due to the above weaknesses, the provided predictions are a bit murky. It is not clear how variation between groups in accordance with who innovated, or initiated eating a novel food, or demographics is related to the central issue. The manuscript does contribute to the literature by looking at changing rates of muzzle contact over exposure to a novel food source, and provides a good extension of previous findings; that, if muzzle contacts help animals learn about new foods, then rates of muzzle contacts involving novel foods should decrease as animals become familiar with the food. However, this point isn't explicit in the manuscript.

      This is now addressed in the new aims paragraph (lines 125-173)

      Finally, it is also unclear as to why changing rates of muzzle contact AND whether certain individual level variables like knowledge, sex, age, and/or rank might influence muzzle contacts during opportunities to innovate.

      We are not sure exactly what the reviewer means here, but hope that the substantial revisions we have made now address their concern.

      As for the methods, the manuscript doesn't provide enough details as to why certain decisions were made. For example, no reason is given as to why only the first four sessions after an animal ate were considered, why the first three months of tenure (but not four, as seen in one group that didn't innovate) was considered to be a critical time in which immigrant males may innovate, why (including the theoretical reasons) the structure of models for one analysis was changed (dropping one variable, adding interactions), or even how the beginning and ending of a trial was decided, despite reporting that durations varied widely, from 5 minutes to two hours.

      Please see above regarding the male with four-month tenure, and the top of this document for a description of our updated models.

      The discussion contains results that are never presented elsewhere in the manuscript (2a: Individual variation in uptake of a novel food according to who ate first).

      It was just an error in the sub-title in the discussion – this is now amended. But all the other corresponding details were already there, in the list of research aims in the introduction and in the results as well.

      Finally, the largest issue with the manuscript is that its results are not as convincing as the conclusions made. An issue with all the analyses is that some grouping variables appear in some analyses but not others, despite the fact that all of the analyses contain multiple groups (necessitating group as a grouping variable) and multiple observations of the same individuals (i.e., immigrant males tested in multiple groups, necessitating animal identity as a random effect), and that they do not account for individual exposure to the experiment when considering whether animals ate the food in the allotted period (an important consideration given the massive differences in trial times), making these results difficult to interpret in their current forms. As for the results regarding muzzle contact, the analyses have a number of issues that make it difficult to determine if the claims are supported. These issues include not explaining why rank calculated a year before the experiments took place was valid or if rank was calculated among all group members or within age and sex classes, not explaining how rank was normalized, and not conducting any kind of formal model comparisons before deciding the best model.

      Mostly addressed at top of this document. Regarding rank calculations: rank was not calculated a year before the experiments, it was calculated using a year’s worth of data up to the beginning of the experiments – and ranks were calculated among all group members – we have made this clearer in the methods. We also explained our method of normalisation, and noted that it was an error to include non-normalised rank in one of the models – this has now been rectified.

      As for the results regarding immigrant males and innovation, little is done to help the fact that these results are from very few observations and no direct analyses. It is possible that something that occurs relatively often but in small sample sizes, like dispersing animals, could have immense power in influencing foraging traditions, and observation is a necessary step in understanding behavior. However, the manuscript doesn't consider any alternative hypotheses as to why it found what it found. No other possible difference between the groups was considered (for example, the groups that rapidly innovated appear to be considerably smaller than the groups that did not), making the claim that immigrant males were what allowed groups to innovate unconvincing. This is particularly true given that some groups in this study population have experimental histories (though this goes unmentioned in the current manuscript), which likely influenced neophobia, especially given work by the same research group showing that these animals are more curious compared to their unhabituated counterparts.

      We have added more discussion of alternative hypotheses to the discussion (line numbers mentioned above).

      Regarding the comment about rapid innovation in smaller groups – we are not sure what the reviewer means here – all groups except BD were of similar size. The second largest group, NH, had one of the quickest innovations and a smaller group (KB) innovated only at the third exposure. Unless the reviewer instead refers to the spread of the innovation here? This is also not quite what we see in the data – BD is the largest group and one of the fastest to spread, and KB is the smallest group and the slowest to spread. Regarding the groups' experimental histories, all five studied groups have already been used in field experiments. The group (LT) with the least experimental history was the one with the greatest proportion of individuals eating the novel food at the first and over the four exposures (see Fig. 2), while one of the groups with the most experimental history (NH) was one with a smaller proportion of individuals eating the food across the experiment. This is discussed in the discussion (lines 370-380).

      Reviewer 2

      I have separated my issues with the manuscript into three sub-headings (Conceptual Clarity, Observational Detail and Analysis) below.

      1) Conceptual clarity

      There are a number of areas where it would greatly benefit the manuscript if the authors were to revisit the text and be more specific in their intentions. At present, the research questions are not always well-defined, making it difficult to determine what the data is intended to communicate. I am confident all of these issues could be fixed with relatively minor changes to the manuscript.

      For example, Line 104: Question 1 is not really a question, the authors only state that they will "investigate innovation and extraction of eating the food", which could mean almost anything.

      We re-wrote the research questions paragraph and results with this advice in mind – hope it is clearer now. We keep the innovation part just descriptive and hope this is less problematic now.

      Question 2a (line 98) is also very vague in its wording, and I'm left unclear as to what the authors were really interested in or why. This is not helped by Line 104 which refuses to make predictions about this research question because it is "exploratory". Empirical predictions are not simply placing a bet on what we think the results of the study will be, but rather laying out how the results could be for the benefit of the reader. For instance, if testing the effects of 10 different teaching methods on language acquisition-rate: Even if we have no a priori idea of which method will be most effective, we can nevertheless generate competing hypotheses and describe their corresponding predictions. This is a helpful way to justify and set expectations for the specific parameters that will be examined by the methods of the study. In fact, in the current paper, the authors had some very clear a priori expectations going into this study that immigrant males would be vectors of behavioural transmission (clear, that is, from the rest of the introduction, and the parameters used in their analysis, which were not chosen at random).

      We have now updated the whole research aims (lines 125-173).

      The multiple references to 'long-lived' species in the abstract (line 16) and introduction (39, 56) are a bit confusing given the focus of this study. Although such categorisations are arbitrary by nature (a vervet is certainly long-lived compared to a dragonfly), I would not typically put vervet monkeys (or marmosets, line 62) in the same category as apes (references 8 and 9) or humans (line 62) in this regard.

      When we use “long-lived” in the introduction, we explain that we mean animals with slow generational turnover for whom genetic adaptation is relatively slow – too slow to adapt to very rapid environmental change. Within the distinctions the reviewer makes here, we feel that vervets and marmosets are much more similar to apes than to dragonflies etc. in this respect… and we think making the comparisons that we do is valid in this context (though we do agree that for other reasons we would not find it appropriate). We have modified the sentence in the introduction (lines 40-42) and hope this is clearer now. The study in reference 9 is about crop-raiding, which is something vervets can learn to do within one generation too. In addition, reference 8 is used as it was one of the earlier and long-standing definitions of innovation which we are using here – we are not comparing vervets to apes directly, but we do not think a different definition of innovation is required.

      This contributes a little towards the lack of overall conceptual focus for the manuscript: beginning in this fashion suggests the authors are building a "comparative evolutionary origins" story, hinting perhaps at the phylogenetic relevance of the work to understanding human behaviour, but the final paragraph of the study contextualises the findings only in terms of their relevance to feeding ecology and conservation efforts. I would recommend that the authors think carefully about their intended audience and tailor the text accordingly. This is not to say that readers interested in human evolution will not be interested in conservation efforts, but rather that each of these aspects should be represented in each stage of the manuscript (otherwise - conservationists may not read far into the Introduction, and cultural evolution fans will be left adrift in the Conclusion).

      We agree that the line running through the whole paper needed to be clearer and have tried to improve this.

      2) Observational detail

      There are a number of areas of the manuscript which I found to be lacking in sufficient detail to accurately determine what occurred in these experimental sessions, making the data difficult to interpret overall. All of this additional information ought to be readily available from the methods used (the experiments were observed by 3-5 researchers with video cameras (line 341)) and is all of direct relevance to the research questions set out by the authors.

      We added more details about the experiment in the method section.

      While I appreciate that it will take quite a bit of work to extract this information, I am certain that it would greatly improve the robustness and explanatory power of this study to do so.

      The data on who was first to innovate/demonstrate successful extraction of the food in each group (Question 1) and subsequent uptake (Question 2), as well as the actual mechanism by which that uptake occurred (the authors strongly imply social learning in their Discussion, but this is never directly examined) is difficult to interpret based on the information presented. Some key gaps in the story were:

      We did not intend to claim that muzzle contact was the specific mechanism by which individuals learned to extract and eat peanuts – we rather use this experiment to evaluate the function of muzzle contact in the presence of a novel food.

      We did not record observation networks in all groups during experiments and cannot obtain accurate ones from all our videos – we hope it is clearer in our text now. Our group’s previous study (Canteloup et al., 2021) already shows social transmission of the opening techniques using data of two of our groups (NH and KB).

      • Which/how many individuals encountered the food and in what order? I.e., were migrants/innovators simply the first to notice the food?

      No, and we have now added some information about other individuals approaching the box and inspecting the peanuts before innovation took place.

      • Did any individuals try and fail to extract the food before an "innovator" successfully demonstrated?
      • How many tried and failed to extract the nuts before and after observing effective demonstrators?

      We have added the number of individuals that inspected the peanuts (visually and with contact)

      • Were individuals who observed others interact with the food more likely to approach and/or extract it themselves?
      • Did group-members use the same methods of extraction as their 'innovators'?

      Yes – this is the topic of Canteloup et al. 2021 – and these data are not presented again here. That study was on two of the groups presented here (KB and NH), with up to 10 exposures in each of those groups, and presents a fine-grained analysis of the peanut-opening techniques used by the monkeys. We hope this is clearer now in the text where we refer to this paper.

      • How many tried and succeeded without having directly observed another individual do so (i.e. 'reinvention' as per Tennie et al.)?

      For this, and the above points: We did not record an observation network for the groups added in this study and are not able to answer this – it is not the focus of this study. For this reason, we do not make claims in this line in the present study, and are cautious with our social learning related language. Whilst we examine the role of muzzle contact in acquiring information about a novel food, we do not expect this behaviour to be a necessary prerequisite in being able to extract and eat this food – indeed many individuals who learned to eat did not perform muzzle contacts. This aspect of the study is about using this novel food situation to explore whether muzzle contact serves information acquisition – which our evidence suggests it does.

      Moreover, the processing of this food is not complex and is similar to natural foods in their environment, and we do expect individuals to be capable of reinventing it easily (and this point with Tennie’s hypothesis is actually discussed in Canteloup et al. 2021 paper) – but the point here is that their natural tendency is to be neophobic to unknown food, and therefore they do not readily eat it until they see a conspecific doing so, after which they do. And we also used this opportunity, though in a very small sample size, to investigate which individuals would overcome that neophobia and be the first to eat successfully.

      The connective tissue between the research questions set out by the authors is clearly social learning. In short: the thesis is that Migrants/Innovators bring a novel behaviour to the group, then there is 'uptake' (social learning), which may be influenced by demographic factors and muzzle-contact (biases + mechanisms). Given this focus (e.g. lines 224-264 of the Discussion), I would expect at least some of the details above to be addressed in order to provide robust support for these claims.

      See above – the reason we talk about ‘uptake’ rather than social learning is that we really see this as a case of social disinhibition of neophobia, rather than more detailed social learning such as copying or imitation, as it would be in a tool-use setting, for example (though in Canteloup et al. 2021 paper, evidence is found that the specific methods to open peanuts are socially transmitted).

      Question 2a (Lines 136-146): This data is hard to interpret without knowing how much of the group was present and visible during these exposures.

      Please see response to reviewer 1 on this.

      For example: 9% uptake in NH group does not sound impressive, but if only 10% of the total group were present while the rest were elsewhere, then this is 90% of all present individuals. Meanwhile if 100% of BD group were present and only experienced 31% uptake, then this is quite a striking difference between groups.

      Experiments were done at sunrise at monkeys’ sleeping site in AK, LT, NH and KB where most of the group was present in the area; we added more precision on this point in the Method section (lines 615-619).
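
      The reviewer's point about denominators can be made concrete with a small calculation. The sketch below only restates the reviewer's hypothetical figures (9% uptake with 10% of the group present, 31% uptake with 100% present); they are not attendance data from the study, and the function name uptake_among_present is illustrative.

```python
def uptake_among_present(uptake_of_total: float, fraction_present: float) -> float:
    """Rescale uptake (as a fraction of the whole group) to the individuals present.

    Both arguments are fractions between 0 and 1. Uptake cannot exceed
    the fraction of the group that was present.
    """
    if not 0.0 < fraction_present <= 1.0:
        raise ValueError("fraction_present must be in (0, 1]")
    if not 0.0 <= uptake_of_total <= fraction_present:
        raise ValueError("uptake_of_total must be between 0 and fraction_present")
    return uptake_of_total / fraction_present


# Hypothetical scenarios taken from the reviewer's comment, not measured values.
nh_like = uptake_among_present(0.09, 0.10)
bd_like = uptake_among_present(0.31, 1.00)

print(f"NH-like scenario: {nh_like:.0%} of present individuals")
print(f"BD-like scenario: {bd_like:.0%} of present individuals")
```

      The same raw uptake can therefore imply very different uptake among the individuals that had the opportunity to eat, which is why the fraction of each group present at the exposures matters for interpretation.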

      Of course, there is also an issue of how many individuals can physically engage with the novel food even if they want to - the presence of dominant individuals, steepness of hierarchy within that group, etc, will significantly influence this (and is all of interest with regards to the authors' research questions).

      We discuss this with respect to the result showing that higher-ranking individuals were more likely to extract and eat the food at the first exposure and over all four exposures.

      Muzzle-contact behaviour: The authors use their data to implicate muzzle-contact in social learning, but this seems a leap from the data presented (some more on this in the Analysis section).

      We hope our distinction between information acquisition and information use is clearer now.

      For example:

      • What is the role of kinship in these events?

      We did not analyse kinship here, but we see a lot of targeting towards adult males, and we do not have reliable kinship data for them. We also checked (see response to reviewer 3) the muzzle contacts initiated by knowledgeable adult females, and they are mostly towards adult males, not towards related juveniles (see new figure 4D and lines 497-500).

      • Did they occur when the juvenile had free access to the food (i.e. not likely to be chased off by a feeding adult)?

      We recorded muzzle contacts visible within 2m of the box, so individuals were not necessarily eating at the box at the time of engaging in muzzle contacts. However, the majority of muzzle contacts that we could record took place directly at the edge of the box – at the location where the food is accessed – so an individual would not likely be there if they were not able to have access to the food. It is possible they could be there and not eating, but they would not have been chased off, otherwise they would not be able to engage in muzzle contacts there. But it is not entirely clear what the reviewer’s point is here.

      • Did they primarily occur when adults had a mouthful of food? (i.e. could it simply be attempted pilfering/begging)

      This is not typical of this species. Very few specific individuals remove food from others’ mouths, and they do it with their hands, usually beginning with grooming their face and cheekpouches, before prising their mouth open and removing food from the victim’s cheekpouches.

      • What proportion of PRESENT (not total) individuals were naïve and knowledgeable in each group for each trial (if 90% present were knowledgeable, then it is not surprising that they would be targeted more often)?

      We agree somewhat with this statement, but given the multiple ways we show the effect of knowledge – both at the individual level and the group level (effect of exposure number i.e. overall group familiarity) – we feel we present enough evidence to establish the link between knowledge of the food and muzzle contacts. We find that the model showing the interaction between exposure number and number of monkeys eating on the overall rate of muzzle contacts actually addresses this issue, because we see that when many monkeys are eating during later exposures, when many were indeed knowledgeable, the rate of muzzle contacts is massively decreased. Moreover, if 90% of the individuals present are knowledgeable, then only 10% of the individuals present are naïve, and we show both that knowledgeable individuals are targeted, but also that naïve individuals are initiators.

      • Did these events ever lead to food-sharing (In other words, how likely are they to simply be begging events)?

      We do not observe food-sharing in vervets.

      • Did muzzle-contact quantifiably LEAD to successful extraction of the food? If the authors wish to implicate muzzle-contact in social learning, it is not sufficient to show that naïve individuals were more likely to make muzzle-contact, they must also show that naïve individuals who made more muzzle-contact were more likely to learn the target behaviour.

      We disagree here, because there is a distinction between information acquisition and information use - obtaining olfactory information about a novel resource that conspecifics are eating is not the same as learning a complex tool use behaviour for which detailed observation of a model is required. We are not claiming that muzzle contact is THE mechanism by which the monkeys learn how to eat the food – but we do believe that the clear separation between naïve individuals initiating and knowledgeable individuals being targeted, and the decrease in the rate of this behaviour as groups’ familiarity with the food increases – is good evidence that this behaviour functions to acquire information about a novel food.

      3) Analysis

      There are a number of issues with the current analysis which I strongly recommend be addressed before publication. Some of these are likely to simply require additional details inserted to the manuscript, whereas others would require more substantial changes. I begin with two general points (A & B), before addressing specific sections of the manuscript.

      A) My primary issue with each of the analyses in this manuscript is that the authors have fit complex statistical models for each of their analyses with no steps to ascertain whether these models are a good fit for the data. With a relatively small dataset and a very large number of fixed effects and interactions, there is a considerable risk of overfitting. This is likely to be especially problematic when predictor variables are likely to be intercorrelated (age, sex and rank in the case of this analysis).

      We have now checked for overfitting in our models.

      The most straightforward way to resolve this issue is to take a model-comparison approach. This means fitting either a) a full suite of models (including a 'null' model) with each possible permutation of fixed effects and interactions (since the authors argue their analysis is exploratory) or b) a smaller set of models which the authors find plausible based on their a priori understanding of the study system. These models could then be compared using information criteria to determine which structure provides the best out-of-sample predictive fit for the data, and the outputs of this model interpreted. Alternatively, a model-averaging approach can be taken, where the effects of each individual predictor are averaged and weighted across all models in the set. Both of these approaches can be performed easily using the R package 'MuMIn'. There are also a number of tutorials that can be found online for understanding and carrying out these approaches.

      Please see our answer at the beginning of the document, detailing how we have updated our models.
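
      To illustrate the information-criterion comparison and model averaging the reviewer describes, here is a minimal sketch in plain Python. It is not the authors' analysis and does not reproduce the MuMIn package: the AIC values and coefficient estimates are invented for illustration, and the function names are hypothetical. It converts AIC values into Akaike weights and uses them to average one coefficient across a candidate set, entering a zero for any model that omits the predictor.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best model.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


def model_averaged_estimate(estimates, weights):
    """Weighted average of one coefficient across the candidate models."""
    return sum(est * w for est, w in zip(estimates, weights))


# Hypothetical candidate set: null model, rank only, rank + age.
aics = [110.0, 100.0, 102.0]
# Hypothetical estimates of the rank effect (0.0 where rank is not in the model).
rank_estimates = [0.0, 0.80, 0.70]

weights = akaike_weights(aics)
averaged_rank = model_averaged_estimate(rank_estimates, weights)

for aic, weight in zip(aics, weights):
    print(f"AIC = {aic:.1f}, weight = {weight:.3f}")
print(f"model-averaged rank effect = {averaged_rank:.3f}")
```

      The model with the lowest AIC receives the largest weight, and a model within about two AIC units of it still contributes appreciably to the averaged estimate.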

      B) It does not seem that interobserver reliability testing was carried out on any of the data used in these analyses. This is a major oversight which should be addressed before publication (or indeed any re-analysis of the data).

      We have added this now and mention it above already.
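
      For readers unfamiliar with interobserver reliability testing, one common statistic for categorical codes is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The response does not state which statistic the authors used, so the sketch below is an assumption for illustration only; the codes and the function name cohens_kappa are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same events with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of events")
    n = len(rater_a)
    # Proportion of events on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes from two observers scoring six video clips:
# "mc" = muzzle contact scored, "none" = no muzzle contact scored.
observer_1 = ["mc", "mc", "none", "none", "mc", "none"]
observer_2 = ["mc", "mc", "none", "mc", "mc", "none"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

      In this toy example the observers agree on five of six clips, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.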

      Line 444: Much more detail is needed here. What, precisely, was the outcome measure? Was collinearity of predictors assessed? (I would expect Age + Rank to be correlated, as well as Sex + Rank).

      This is now addressed (please see details above) – we use VIFs to assess multicollinearity of predictors in our models and find they are all satisfactory (see R code).
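
      For readers unfamiliar with variance inflation factors, the check can be sketched as below. This is a minimal illustration in Python on synthetic data, not the authors' R code (which is not reproduced here). The variable names and the simulated correlation between age and rank are assumptions made for the example. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix.

```python
import random
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    standardised = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardised.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in standardised]
            for zi in standardised]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of predictor j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Synthetic predictors (assumption): rank is built to correlate with age; sex is independent.
random.seed(0)
n = 200
age = [random.gauss(0, 1) for _ in range(n)]
rank = [0.8 * a + 0.6 * random.gauss(0, 1) for a in age]
sex = [float(random.randint(0, 1)) for _ in range(n)]

vifs = variance_inflation_factors([age, rank, sex])
for name, vif in zip(["age", "rank", "sex"], vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

      In this synthetic example the correlated predictors (age and rank) show inflated VIFs, whereas the independent predictor (sex) stays close to 1. What counts as a satisfactory VIF depends on the threshold adopted.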

      Line 452. A few comments on this muzzle-contact analysis:

      The comments below are a little confusing as some seem to refer to the muzzle-contact rate model (previously line number 452), and some seem to refer to the initiator/receiver model. We have tried to figure out which comments refer to which, and answer accordingly.

      "We investigated muzzle contact behaviour in groups where large proportions of the groups started to extract and eat peanuts over the first four exposures"

      What were the criteria for "a large proportion"?

      All groups are now included in this analysis.

      The text for this muzzle-contact analysis would indicate that this model was not fit with any random effects, which would be extremely concerning. However, having checked the R code which the authors provided, I see that Individual has been fit as a random effect. This should be mentioned in the manuscript. I would also strongly recommend fitting Group (it was an RE in the previous models, oddly) and potentially exposure number as well.

      The model of muzzle contact rate never contained individual as a random effect because individuals are not relevant in this model: the outcome is the number of muzzle contacts occurring during each exposure. However, the reviewer may be referring here to the model for which we forgot to provide the script. Nonetheless, we have substantially revised this model: it (now Model 3) includes all groups and has group as a random effect.

      Following on from this, if the model was fit with individual as a random effect it becomes confusing that Figure 3 which represents this data seemingly does not control for repeated measures (it contains many more datapoints than the study's actual sample size of 164 individuals). This needs to be corrected for this figure to be meaningfully interpretable.

      Figure 3 is not related to the model described in (original) line 452.

      The numbers referred to the number of muzzle contacts, and this was stated in the figure caption. However, we no longer present these details in the new figure (see Fig 4).

      Finally, would it make sense to somehow incorporate the number of individuals present for this analysis? Much like any other social or communicative behaviour, I would predict the frequency of occurrence to depend on how many opportunities (i.e. social partners) there are to engage in it.

      We have now included the number of monkeys eating in our muzzle contact rate model (Model 3). Upon further thought, we found that this was the issue that had led us to exclude exposures and to include only the groups where many monkeys were eating. We have resolved this by including all groups and not dropping exposures; instead, we include an interaction between the number of monkeys eating and exposure number. We feel this addresses our hypothesis much more satisfactorily. We hope these updates also adequately address the reviewer's concerns.

      Line 460: "For BD and LT we excluded exposures 4 and 3, respectively, due to circumstances resulting in very small proportions of these groups present at these exposures"

      What was the criterion for a satisfactory proportion? Why was this chosen?

      See above – this is now addressed.

      Line 461: "We ran the same model including these outlier exposures and present these results in the supplementary material (SM3)."

      The results of this supplemental analysis should be briefly stated. Do they support the original analysis or not?

      We no longer present the analysis in this way. We substantially revised the model examining muzzle contact rate and included the number of individuals eating in the model rather than excluding groups where this number was low. The results of the new model show good support for our hypothesis.

      Line 465: "Due to very low numbers of infants ever being targets of muzzle contacts, we merged the infant and juvenile age categories for this analysis."

      This strikes me as a rather large mistake. The research question being asked by the authors here is "How does age influence muzzle-contact behaviour?"

      Then, when one age group (infants) is very unlikely to be a target of muzzle-contact, the authors have erased this finding by merging them with another age category (juveniles). This really does not make sense, and seriously confounds any interpretation of either age category.

      Yes, we agree that this was an issue, and we no longer do this. Instead, we remove the infant data from this model, which is now Model 6, because of the large amount of error they introduced into the model due to the small sample size. We show the process in the R code, and we describe our reasons in the text (lines 713-719). Since we are now only comparing within age- and sex-categories (see below), we do not find that this decision introduces any bias.

      Lines 466-474: Why was rank removed for the second and third models? Why is Group no longer a random effect (as in the previous analysis)? The authors need to justify such steps to give the reader confidence in their approach.

      This is now addressed and discussed in descriptions of our new models.

      Furthermore - because of the way this model is designed, I do not think it can actually be used to infer that these groups are preferentially targeted, merely that adult female and adult males are LESS likely to target others than to be targeted themselves, which is a very different assertion.

      Because the specific outcome measure was not described here, this only became apparent to me after inspecting Figure 3, where outcome measure is described as "Probability of (an individual) being a target rather than initiator" - so, it can tell us that adults are more often targeted rather than initiating, but does not tell us if they are targeted more frequently than juveniles (who may get targeted very often, but initiate so often that this ratio is offset).

      We thank the reviewer for noticing this as we had indeed chosen an inappropriate model for what we were intending to measure – this has been addressed now with two additional models (Models 4 and 5; see details at the top of document). We nonetheless found the aspects of this model to still be highly interesting, so have re-framed it to focus on them.

      Lines 467-473: "Our first simple model included individuals' knowledge of the novel food at the time of each muzzle contact (knowledgeable = previously succeeded to extract and eat peanuts; naïve = never previously succeeded to extract and eat peanuts) and age, sex and rank as fixed effects. Individual was included as a random effect. The second model was the same, but we removed rank and added interactions between: knowledge and age; and knowledge and sex. The third model was the same as the second, but we also added a three-way interaction between knowledge, age and sex."

      This is a good example of some of the issues I describe above. What is the justification for each of these model-structures? The addition and subtraction of variables and interactions seems arbitrary to the reader.

      For Model 6, we no longer include rank at all, because we had no hypothesis-driven reason to do so (see lines 723-725). We now begin with the three-way interaction and remove only this term, because it is not significant and because the model had problems converging due to its complexity. We show this in the R script. We retain only the two separate interactions, and we do not include group as a random effect in this model, both because of the complexity and because we do not think there is a theoretical requirement for it to be included here (this is explained in lines 730-735 of the manuscript). We report the results of the three-way interaction in the supplementary material (SM3, Table S2).

      Reviewer 3

      In this study, the authors introduce a novel food that requires handling time to five vervet monkey groups, some of which had previous experience with the food. Through the natural dispersal of males in the population, they show that dispersing individuals transmit behavioral innovations between groups and are often also innovators. They also examine muzzle contact initiations and targets within the groups as a way to determine who is seeking social information on the new food source and who is the target of information seeking. The authors show that knowledgeable adults are more often the target of muzzle contacts compared to young individuals and those that are not knowledgeable.

      This is a very interesting study that provides some novel insights. The methods employed will be useful to others that are considering an experimental approach to their field research. The data set is good and analyzed appropriately and the conclusions are justified. However, there are several areas where the paper could be improved for readers in terms of its clarity.

      1) It wasn't until the Discussion that it became clear to me that the actual physiological and personality traits of dispersers were being linked with innovation. From the Title, Abstract, and Introduction, it seemed as though the focus was on dispersing males bringing their experience with a novel food to a new group to pass it on. I think it needs to be made clear much earlier in the manuscript that the authors are investigating not only the transmission of behavioural adaptation but also how the traits of dispersers might make them more likely to innovate.

      We have now addressed this above.

      2) Early in the paper on line 28, the authors state that continued initiation of muzzle contacts by adult females could have been an effort to seek social information. This is true but another interpretation is that females were imparting or giving social information. It seems important here and elsewhere (lines 322-323) to consider and report the target of these initiations. If these were directed at more knowledgeable individuals, it supports the idea that this was social information seeking. If muzzle contacts were directed to younger or unknowledgeable individuals, it would imply a form of teaching, which is possible but perhaps unlikely, so I think the authors need to be totally clear here.

      We thank the reviewer for pointing this out. We looked into our data and now present Figure 4D, which shows that almost all knowledgeable adult females' muzzle contacts were targeted towards knowledgeable adult males, and we discuss this in the Discussion (lines 499-500).

      3) The argument made on lines 344-350 needs more fleshing out to be convincing or it should be deleted. The link between number of dispersers, social organization, and large geographic range seems a little muddled. There are many dispersing individuals in species that are not typically in large multi-male, multi-female social organizations. Indeed, in many species both sexes disperse. Think of pair living birds where both sexes disperse and geographic range can be enormous. There are also no data or references presented here to show that species in multi-male, multi-female social organizations do have larger geographic ranges than those that are not in these social organizations. It seems to me that, even if this is the case, niche is more important than social organization, for instance not being dependent on forests to constrain much of your range.

      We have removed this section.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors clearly show that the regulation of Ia afferent input is altered during voluntary movements in individuals with chronic, incomplete spinal cord injury. This is informative as afferent stimulation (epidural or transcutaneous) is a principal research strategy to enable voluntary movement in individuals with chronic spinal cord injury. A subset of enrolled individuals was tested with adjusted stimulus intensities to match intact individuals' responses at rest; however, the criteria for the selection of this group of subjects were not described.

      We clarified that the subgroup of subjects tested with adjusted stimulus intensities was the group of individuals that was able to return for this additional testing. This information was added to the manuscript.

      The correlation graphs clearly show two disparate population responses, so it is not clear that there is a strong correlation between inhibition or facilitation of the H-reflex independent of spinal cord injury. As the adjusted stimulus responses showed the function of the circuit in injured individuals, why were those measures not used in the correlation analysis?

      The reviewer raises a good point. Please see our response to point #5 from reviewer #1. We did not observe a correlation between changes in H-reflex size and the D1 inhibition and FN facilitation in the adjusted responses. This could be related, at least in part, to the smaller number of participants tested in the adjusted condition. This information was added to the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an extremely well-done study, revealing a fascinating phenotype of the mes-4 mutant, which they show upregulates X-linked genes, leading to PGC death. These X-linked genes are mostly oogenesis genes, upregulation of which likely impedes normal proliferation of PGCs. The results are very concrete and support their conclusion, and contribute significantly to the field. I do not have any major concerns except for a couple of conceptual issues. First, the title 'germline immortality' does not seem to be well aligned with the results. It is not wrong that PGCs die in the mes-4 mutant, and thus the germline is 'mortal': however, the term 'germline immortality' implies multi-generational passages of germline, and the data in the present study, where mutant PGCs just die in the offspring, do not necessarily point to 'germline immortality' per se. So, I suggest changing the title to reflect the contents of the paper better.

      Good point. We changed germline immortality to germline survival and/or development throughout the paper.

      Second, although the authors speculate (in the discussion) why X activation is toxic to germ cells (discussing that upregulated X-linked genes are oogenesis genes, whose precocious activation is toxic to PGCs), there is not sufficient discussion as to why the effect is mostly limited to X chromosome, and why mes-4 is specifically involved in this. Is it because all oogenesis genes are concentrated on X chromosome? (likely not). Are autosomal genes that are upregulated in mes-4 mutant also oogenesis genes? Is this related to dosage compensation? I would like to see fuller discussions as to why X chromosome requires special regulation, also discussing the role of mes-4 in this context. I understand that the authors might have refrained from expanding discussions on matters that do not have any data, but without this discussion, I feel that many readers will be left wondering 'why?'.

      As noted in Point #5 above, we added to Discussion whether up-regulation of X genes in mes-4 mutant PGCs and EGCs reflects a defect in dosage compensation or a defect in keeping the oogenesis program (which is enriched for X-linked genes) quiet in the nascent germline (see lines 604-630). Based on new analyses showing up-regulation of oogenesis genes (on the X and autosomes) in mes-4 and PRC2 nascent germlines and the points in Discussion, we favor the view that the essential function of MES-4 and PRC2 is to repress X-linked oogenesis genes in PGCs and EGCs (see Figures 6 and 7, associated figure supplements, and lines 389-417).

      Reviewer #2 (Public Review):

      This manuscript makes substantial progress in resolving a long-standing mystery regarding the precise role of the histone methyltransferase MES-4 in promoting germline development. MES-4 maintains the histone modification H3K36me3 and germ cell survival, but prior evidence was unable to distinguish among several possibilities for target pathways. This paper utilizes a transcriptional profiling approach at the critical time of germline development to definitively demonstrate that the essential function of MES-4 is to repress X gene expression in germ cells. This result is surprising because X repression is an indirect effect of MES-4 activity (MES-4 does not localize to the X), while the direct effect of maintaining germline gene expression is not essential. To buttress this finding, the authors also utilize a series of elegant genetic experiments to independently test whether expression from the X is sufficient to cause germ cell degeneration. They then go further to identify a single X-linked target, lin-15b, as a primary contributor to the inappropriate X-linked gene expression in mes-4 mutants, by showing that loss of lin-15b activity rescues both the germline degeneration and X mis-expression of mes-4 mutants. Finally, the authors demonstrate that PRC2, the H3K27me3 histone methyltransferase and MRG-1, a candidate H3K36me3 effector protein, are also involved in promoting X silencing through lin-15b.

      The manuscript's strengths lie in the development or application of novel techniques, including the profiling of individual pairs of PGCs (a non-trivial advancement), as well as some very well-designed and conceptually innovative genetic assays. These were used to address specific and important gaps in knowledge regarding the phenotype of mes-4, which had been elusive despite having been studied for almost 30 years. Although specific to C. elegans in some ways, the findings are clearly relevant to conserved regulatory events, such as epigenetic memory mechanisms and establishment of opposing chromatin states. Thus, this work provides a substantial advance in the field overall.

      One limitation of this study is the lack of clarity about the conclusions regarding the relationship between the two H3K36me3 histone methyltransferases mes-4 and met-1, and between X vs autosomal gene expression. The authors do not precisely state what genes (X or A) are affected in the met-1 and mes-4 mutants. Ultimately, this confusion muddles the final message of X chromosome upregulation being the critical contributor to the mes-4 germline degeneration phenotype. The experiment presented in figure 3B indicates that loss of mes-4 or met-1 is sufficient to prevent germline development even when the Xs are repressed, indicating that failure to activate autosomal gene expression is also an underlying cause of the degeneration. Perhaps this cannot be definitively concluded without directly assessing met-1 and met-1;mes-4 mutant PGCs (or EGCs) for gene expression changes. If technically possible, this would be a very valuable experiment to directly examine autosomal gene expression changes in the double mutant.

      We profiled met-1 PGCs and observed very few mis-regulated genes (Figure 7 – figure supplement 1). We tried to profile met-1; mes-4 double mutant PGCs, which completely lack both MET-1 and MES-4 and inherit chromosomes that lack H3K36me3. That was not feasible, due to the high level of embryonic lethality and rapid deterioration of PGCs dissected from met-1; mes-4 double mutant larvae. Notably, this demonstrates that germlines that lack both maternal H3K36me3 HMTs are sicker than those that lack just one of the HMTs. The high degree of embryo lethality suggests an essential function for MET-1 and MES-4 in the soma. As requested, we generated and included a list of X and autosomal genes mis-regulated in met-1, mes-4, and other mutant PGCs (see Figure 7 – figure supplement 1).

      The sterility of hermaphrodites with a met-1; mes-4 mutant XspXsp germline and lacking either maternal MES-4 or maternal MET-1 may be due to mis-regulation of autosomal genes, or it may reflect that the X chromosomes are not repressed in met-1; mes-4 XspXsp germlines that lack H3K36me3. To test that, we would need to profile those XspXsp PGCs. It is not feasible to identify mutant F1 larvae with XspXsp PGCs immediately after hatching, which is required for transcript profiling. We think that the main message from analyzing met-1; mes-4 mutant XspXsp germlines (that inherited H3K36me3 marking is not critical for germline development but re-establishment of marking is important and requires both enzymes) does not require our delving into the cause of sterility of mutant XspXsp germlines lacking MET-1.

    1. Author Response

      Reviewer 1

      This is an interesting manuscript that has the potential to answer questions about a controversial topic in evolutionary biology - the evolutionary patterns and drivers of hand preferences in humans and nonhuman primates. To accomplish this, the authors generate new data and gather an impressive amount of published data across many anthropoid species, and test for the effects of ecology (terrestrial vs. arboreal), brain size, and tool use on handedness using phylogenetically informed statistical analyses. They find that humans represent an extreme among the species sampled, that direction of handedness was not correlated with any of the predictors tested, and that strength of handedness was higher among arboreal species.

      Although phylogenetic modeling (which accounts for relatedness between species) is implemented in the primary analyses reported in this paper (e.g., testing the effects of ecology, brain size, and tool use on handedness), this is not the case for some other analyses (e.g., testing the effects of sex, age, and subgroup on handedness). This represents one potential area of improvement.

      Overall, the manuscript is very well-written and the new data gathered is impressive. This work is critical for our understanding of the role of handedness in primate evolution.

      Thank you for your positive feedback on our manuscript. We appreciate your in-depth review and answer the queries point by point below.

      Reviewer 2

      The present paper presents an impressive meta-analysis on handedness for a bimanual coordinated tube task (which has been considered in the literature as a robust and reliable task to assess hand preference in nonhuman primates) in about 38 primate species including new sets of data collected by the authors themselves. The work that has been done to compile exhaustively all the available data is considerable and very valuable. The authors also presented, in the introduction, a very nice and useful review and clarification of the different existing evolutionary theories that have previously been proposed in the literature to try to interpret the discrepancy of findings reported across primate species. For instance, different hypotheses are contrasted such as (1) the one highlighting the role of the tool use emergence, (2) the one highlighting the role of brain size, and (3) the postural origin hypothesis (i.e., the predominance of right-handedness evolved with the emergence of anthropoid primates, regardless of ecology), or (4) the "novel (corrected) postural origin hypothesis" that I have proposed with coauthors (i.e., the predominance of right-handedness for bimanual actions is related to terrestrial ecology whereas predominance of left-handedness is related to arboreal ecology, regardless of phylogeny). Such an exhaustive review of handedness data in bimanual tasks across the largest comparative approach ever done allows the authors to evaluate those several hypotheses by testing the effect of their related factors (phylogeny, ecology, emergence of tool use, brain size) on the pattern of handedness. Using quantitative phylogenetic methods, the authors found that none of those factors are actually predictive of the direction of population-level handedness in non-human primates, seriously questioning each of those existing hypotheses.

      Thank you very much for the positive evaluation of our manuscript and the additional statistical work behind it!

      I believe this large review and study is very important and relevant for investigating the evolution of handedness, although I questioned the strong claim (supported by the lack of findings resulting from the quantitative phylogenetic methods) that the dichotomy of arboreal versus terrestrial lifestyle has nothing to do with the direction of population-level handedness in non-human primates. A significant difference in direction of handedness between these two lifestyles still seems robust when considering clade-level (not species-level), an effect driven by overrepresented species for which high sample sizes have been included. The question of sample size and statistical power for evaluating and inferring population-level handedness is thus a potentially critical factor that should be discussed for evaluating different evolutionary theories. It cannot be excluded that the lack of results at the species level is equivocal given the lack of statistical power in most species (related to small sample sizes of subjects).

      Nevertheless, I congratulate the authors for this amazing and considerable work. I had such a pleasure reading it and hope my comments and questions were useful.

      Thank you very much.

    1. Author Response

      Reviewer 2

      Mouse olfactory neurons express one single type of odorant receptor (OR) out of ~1000 possible choices, and the neurons expressing the same type of OR project their axons to two or a few glomeruli in the olfactory bulb (OB). The goal of this work was to identify glomeruli that are activated by the lowest concentration of one given odorant (this would be the primary odorant for the glomerulus). A panel of 185 odorants that cover a wide range of chemical structures was designed for this purpose. The authors imaged the dorsal regions of the 8 OBs from 4 transgenic mice that express the Ca2+ reporter GCaMP6 in the mature olfactory neurons while they were exposed to the odorants delivered in vapor phase. In this way, the authors were able to identify glomeruli that were responsive to odorants at very low concentrations (estimated to be in the picomolar to nanomolar range). They also show that while the spatial representation of odorant chemicals in the bulb is sparse, rather than clearly delimited (except for that of amines and carboxylic acids), glomeruli recognizing structurally related odorants are co-tuned.

      The experiments are well executed and the images of the activated glomeruli in the OBs are impressive. These results show that olfactory neurons (and their cognate ORs) can be high affinity and selective receptors. These qualities cannot be easily detected when using conventional heterologous expression experiments or ex vivo assays, where responses are usually observed in the range of micromolar concentration of the odorants. The results reveal important aspects of odorant decoding in living mice and suggest that odorant concentrations that are effectively processed by the olfactory system are much lower than the ones usually considered. This high-resolution approach also facilitates the analysis of how odorant chemical structure is spatially represented in the OB.

      There are a few points that the authors might want to consider:

      Although it is assumed that each one of the glomeruli represents one OR type, the exact identity of the ORs that correlate with each of the 26 glomeruli remains unknown. Could the authors identify which ORs correspond to the 26 glomeruli based on the glomerular OB map determined by spatial transcriptomics (for example in https://www.nature.com/articles/s41593-022-01030-8) and on the position coordinates of the 26 glomeruli shown in Table S2? It would be nice to see whether the OR sequences cluster in a way that correlates with co-tuning of the responses to structurally related odorants.

      We agree that it would be nice to relate the functionally-identified glomeruli to known ORs. Unfortunately, this is difficult with current resources. The two recent glomerular maps of OR identity that are derived from spatial transcriptomics (the Wang et al. paper to which the reviewer refers, as well as Zhu et al. (https://www.biorxiv.org/content/10.1101/2021.09.13.460128v1), on which we are contributing authors) do not provide sufficient precision in glomerular location to match to functionally-identified glomeruli solely on the basis of position. More fundamentally, the spatial ‘jitter’ in glomerular position from animal to animal, coupled with the interspersed nature of glomeruli with different odorant tuning properties, likely makes it impossible to align functional maps to spatial transcriptomic maps derived in separate animals at the level of single ORs/glomeruli, although this approach could narrow the field of candidate ORs to a relatively small number (i.e., 10 – 20 ORs). We have added text to this effect in the Discussion (lines 574 - 582).

      We do think that the use of functionally-identified glomeruli could be paired with tagging of candidate ORs as a means of further deorphanization and in vivo characterization of OR response properties. To emphasize this point, we have added text pointing out that several of our identified glomeruli match well with position and ‘best’ odorants for a few ORs that have been previously mapped to dorsal glomeruli – namely, M72 (Olfr160), MOR204-34 (Olfr510; see Oka et al., 2006), and Olfr1377 (from the recent Zhu et al. paper). This text is in lines 253 - 259.

      Despite the fact that each OR has two glomeruli per bulb (one lateral and one medial), for most of the odorants, only one activated glomerulus per bulb was observed (ex. Figures 1 and 2). Is the other one always out of the field of vision (dorsal surface of OB), or is it not activated? This should be explained in the text.

      We thank the reviewer for raising this point. In general, only one glomerulus of the pair of OR-cognate glomeruli is visible on the dorsal surface – with the exception of the TAAR glomeruli, in which both the medial and lateral glomeruli are often dorsal, as shown by earlier studies. Consistent with this, we did observe paired glomeruli selectively activated by certain amines (other amines appeared to activate multiple TAARs and so evoked multi-glomerular maps). We agree that it is helpful to report this, and have added a supplementary Figure (Figure 2 – figure supplement 2) showing putative paired TAAR glomeruli; the Figure also shows that the ‘medial’ and ‘lateral’ glomerulus of each pair have near-identical response spectra, consistent with their being linked to the same TAAR. We have also added text addressing these points in the Results (lines 260-267).

      A global analysis summarizing how the results could be extrapolated to the whole OB would be helpful and informative. For example, what is the total number of glomeruli in the mouse OB? What percentage of these were accessible for imaging in the experiments (1004 per bulb)? Primary odorants were identified for 26 glomeruli in the accessible region (dorsal OB), but according to Figure S3C, 288 glomeruli responded to only one odorant at low concentrations.

      We agree that it would be helpful to report such an estimated extrapolation of results to the remainder of the OB. As for an estimation of the fraction of glomeruli/ORs/TAARs visible in our imaging experiments, using the Zhu et al. paper as a reference, in which ORs/TAARs were directly measured from explants of the approximate imaging surface, we arrive at a low-end estimate of ~150 glomeruli (the Zhu et al. study detected 121 ORs and 9 TAARs from the functional-imaging area; assuming that all 15 TAARs project dorsally and that each of the paired TAAR glomeruli are visible on the dorsal surface, we arrive at ~120 OR-glomeruli + (15 x 2 TAAR glomeruli) = 150 glomeruli). One might reasonably increase this estimate by 10% to account for a failure to detect certain low-abundance ORs by Zhu et al. The number of odorant-responsive glomeruli across our 8 OBs ranged from 103 – 142 per OB (median, 126), suggesting that our odorant panel is able to probe a large majority of the dorsal-projecting OR/TAAR repertoire (75 – 90%), and to functionally identify approximately 15% of visible glomeruli. We have added these estimates to the Text (lines 117 - 122).
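
      The arithmetic behind these estimates can be retraced in a few lines. This is a minimal sketch using only the figures quoted in the response above. The response does not state which denominators were used for the reported percentages, so the pairing of numerators and denominators here is an assumption.

```python
# Figures quoted in the response above.
or_glomeruli = 120        # "~120 OR-glomeruli" (121 ORs detected by Zhu et al.)
taars = 15                # all TAARs assumed to project dorsally
glomeruli_per_taar = 2    # medial and lateral glomeruli both assumed visible

low_estimate = or_glomeruli + taars * glomeruli_per_taar   # low-end glomerulus count
high_estimate = low_estimate * 1.10                        # with the suggested 10% allowance

median_responsive = 126   # median odorant-responsive glomeruli per OB
identified = 26           # glomeruli with identified primary odorants

# Assumption: coverage is the median responsive count divided by each estimate.
coverage_range = (median_responsive / high_estimate, median_responsive / low_estimate)
# Assumption: the identified fraction is taken against the higher estimate.
identified_fraction = identified / high_estimate

print(f"visible glomeruli: {low_estimate} to {high_estimate:.0f}")
print(f"coverage at the median: {coverage_range[0]:.0%} to {coverage_range[1]:.0%}")
print(f"functionally identified: {identified_fraction:.0%}")
```

      Under these assumptions, the coverage at the median falls within the reported 75 – 90% range, and the identified fraction is close to the reported figure of approximately 15%.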

      To clarify about the other numbers mentioned by the reviewer, 1004 refers to the total number of imaged glomeruli across 8 OBs, as specified in the main Text and the Figure S3 legend. Likewise, 288 refers to the total number of glomeruli responding to only a single odorant across all 8 OBs.

      Would be good to summarize briefly in the text (Page 7 line 192), which were the stringent criteria used to select the glomeruli/diagnostic odorant pairs, even though it is in the methods. It would make it easier for the reader, and would also make clear why only 26 glomeruli out of the 288 were selected as good glomerulus/diagnostic odorant pairs. How many of the 185 odorants are diagnostic odorants for the imaged glomeruli? How many odorants are not diagnostic odorants for the imaged region, and could therefore be likely to act so for the glomeruli in the regions that were not accessible? And so on.

      We have clarified these criteria for choosing the 26 identified glomeruli, explicitly describing them in the main section of the Results (lines 225 - 232), and also clarified the reporting of the numbers of odorants that serve as diagnostic odorants using these criteria (41 odorants) (lines 238-239).

      The authors find that the glomerular sensitivities to different odorant structure classes are not clearly spatially discrete, but are overlapping and interdigitated. Are they temporally discrete instead? Could this question be addressed?

      Unfortunately, the relatively slow kinetics of the GCaMP6s reporter are poorly suited to discerning temporal differences in responses across glomeruli. However, we agree that attempting to do so with faster reporters would be very interesting, especially since much earlier work from this laboratory has noted marked differences in response dynamics as a function of glomerular location and odorant identity; we have mentioned this as a possibility in new text in the Discussion (lines 566-567).

    1. Author Response

      Reviewer #1 (Public Review):

      1) The quality of many data and some experimental should be improved. Specifically, most experiments used the overexpression approach. Genetic approaches would need to be employed, particularly in embryos.

      We have improved the experimental data as suggested. Specifically, we have added more overview pictures and have selected more representative images. In addition, we have added a new set of experiments using the paracrine colony formation assays to support the requirement for Wnt3 cytonemes in proliferation and growth as well as Flot2 function in Fig.2 and 4.

      As suggested, we have added various loss-of-function approaches. For example, we show the effect of a dominant-negative Flot2 construct and an siRNA-mediated knockdown of Flot2. We further added an F0 Crispant approach in the zebrafish embryo, explained as follows. The Flotillin genes have undergone teleost-specific genome duplication in the zebrafish. Therefore, there are four Flotillin genes present in zebrafish, namely Flot1a, 1b and 2a, 2b (von Philipsborn et al., 2005). The Flotillins prominently expressed during zebrafish blastula and gastrula stages are Flot1a and Flot2b. Therefore, we have designed the appropriate gRNAs to knock out Flot1b and Flot2a. In detail, we used CRISPR-mediated knock-out of Flot1b and 2a by microinjections of a combination of 3 different gRNAs per gene and Cas9 to block Flot1b/2a function. We have used this approach regularly to generate F0 Crispants for individual genes or in combination (Winter et al., 2022). Our experiments showed a significant reduction of Wnt8a cytonemes in zebrafish gastrulation after KO of Flot1b/2a. We have included this new data set in Fig. 6.

      2) The dominant-negative Flot2 is the key tool utilized in the paper, but it is unclear whether this construct has been characterized in the system used and how it affects endogenous protein function. Has its impact on the endogenous Flot2 been examined?

      The construct has been characterised by Neumann-Giesen et al., 2004. We have further characterised this construct and added a confocal analysis (Supp. Fig. 1E) showing the effect of DN-Flot2-GFP expression on endogenous Flot2 localisation, assessed by IF. When comparing to surrounding untransfected cells, we show that the Flot2 mutant construct causes mislocalisation and accumulation of WT Flot2 (and loss of punctate staining).

      3) Similarly, the effectiveness and specificity of siRNA for example, the expression level of Flot2 would need to be assessed in all experiments.

      Western Blot showed successful knockdown of Flot2 expression in Supp. Fig. 2D. Furthermore, we have added an image (Supp. Fig. 2E) of Lrp6 and Flot2 antibody staining after Flot2 siRNA treatment. Minimal Flot2 staining further indicates successful and specific knockdown of Flot2.

      4) Furthermore, it is unclear whether the tagged constructs (e.g., Flot2-GFP, Wnt8a-mcherry) have been characterized and whether the tags affect the protein function.

      All fluorescently tagged proteins used have been characterised in previous papers, and any constructs cloned by us have been assessed by their ability to induce reporter activation. See cited sources in the methods section.

      5) Most images show one single cell. Could more cells be presented? The nature of the images should be disclosed. For example, are those confocal images (single plane or Z-stack)?

      Based on this comment (and comments from the other reviewers), we have used more zoomed-out images to show more than one cell, specifically in Fig. 1, Fig. 3 and Suppl. Figures. All images shown are Z-stack images unless stated, and a statement has been added to materials and methods to reflect this.

      6) The P values for which group are not clear in many panels. It is not clear which groups were analyzed. For example, Fig. 4C, D, H and many other panels.

      We have added p values for all conditions in all graphs. Usually, the experimental values are compared to the control in that graph unless indicated otherwise with bars.

      7) Statistical analyses are lacking for some panels. For example, χ2 test is needed for many panels, including Fig. 3E, and many others.

      As the reviewer suggested, we added a Pearson’s χ2 test for Fig. 3E and the new Fig. 6K.
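      For readers unfamiliar with the test, a minimal sketch of Pearson's χ2 statistic on a contingency table follows. The counts are hypothetical and are not taken from Fig. 3E or Fig. 6K; the function name is illustrative, and the authors' actual software and settings are not stated in this response.

```python
def pearson_chi2(table):
    """Pearson's chi-squared statistic and degrees of freedom for an
    r x c contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = condition (control, treated),
# columns = phenotype class (e.g. with / without a given feature).
counts = [[30, 10],
          [15, 25]]
chi2, dof = pearson_chi2(counts)
print(round(chi2, 3), dof)  # 11.429 1
```

      For one degree of freedom, a statistic above 3.841 corresponds to p < 0.05, so this hypothetical table would count as a significant difference in distribution between the two conditions.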

      8) Figure 1: 1) AGS is supposed to compare with control (HFE-145). These data are missing in the chart. The cell number in AGS is significantly higher than that in other cells (25 vs, 7 and 8, line 666), which can compromise the statistical analysis.

      The quantification of HFE-145 filopodia length has been added to Fig. 1B.

      9) 2) Qualification data are needed to support the statement in Line 71-72.

      Quantification of the effects of IRSp534K expression on cell number and proliferation (BrdU assay) is shown in Fig. 4. The appropriate reference to this figure has been added to the text.

      10) 3) Fig1D: Wnt3-positive filopodia in AGS is double compared to that in HFE-145, which is not consistent with the image shown in Fig1C. 4) Fig1H-I: The red channel is overexposed. The authors should explain why a-myox and a-evi signals were detected outside the cell (or just the background)? The more appealing evidence should be the co-localization of Myox or Evi with wnt3a on the filopodia.

      We are grateful for this comment. To address these points, we selected images more representative of the quantitative data in Fig. 1C, D. We further improved the staining of Myosin-X and Evi to reduce the background. Zoomed panels are overexposed to highlight the localisation on cytonemes. Furthermore, co-staining of Evi and Wnt3, showing co-localisation on cytonemes, has also been added (Fig. 1J).

      11) 2. Figure 2 nicely showed the impact of paracrine Wnt signaling induced by producing cells. However, there are many issues with this experiment. 1) The reporter plasmids are transiently transfected, which inevitably leads to the expression at different expression levels . How could the authors compare the expression levels as a readout in different conditions if this is the case? Better and reliable methods should use stable cell lines. Thus, the authors should make a stable 7xTCF-NLS-mCherry stable line or co-transfect the cell with GFP to show the relative transfection level. This concern also applied to other figures using 7xTCF-NLS-mCherry reporter assay.

      The HFE-145 (TCF-mCherry) cells were transfected in separate dishes but were pooled together and counted before re-plating for co-cultivation. Therefore, the transfection efficiency should be consistent. For the AGS cells, we tried generating a stable cell line, but this was unsuccessful. Nevertheless, transfection efficiencies were consistently above 70%. To strengthen our argument, we have added a new set of data to monitor the consequences of Wnt3 dissemination by cytonemes in a paracrine colony-forming assay. These data support our conclusion that Wnt3 is transported on cytonemes to neighbouring cells to regulate the survival and proliferation of gastric cancer stem cells.

      12) Thus, the mCherry positive cells in Fig2B, D and F cannot present all receiving cells, as the transfection rate should not be 100%. Also, did all experiments start with a similar cell number ? Thus, the Chart in D is not accurate, and the reason that assesses the number of receiving cells is not clear. It is not clear what "per image" means in D? Is the number correct (1 cell vs 1.5 cells) in D?

      The reviewer is correct that the mCherry-positive cells cannot represent 100% of cells due to transfection efficiency. But efficiency was consistently high (>70%), and all quantifications were shown in relative values. Furthermore, all experiments started with the same number of cells (counted for transfection and again for plating during co-cultivation). Therefore, we believe the comparison of the quantifications is valid.

      13) Additionally, is it possible to image Wnt3 is being transported to the receiving cells ?

      We have added an antibody staining against Wnt3 of STF-mCh-expressing HFE-145 cells after co-cultivation with AGS cells to show strong Wnt3 staining in the recipient cells, indicating transport (Supp. Fig. 1C).

      14) Fig2 C: Western blot could be added to show the mCherry expression level in each group.

      All experiments started with the same number of cells (counted for transfection and again for plating during co-cultivation). Furthermore, we believe measuring the fluorescence in the nuclei in the receiving cells gives an accurate measure of the reporter activity.

      15) It is better to include the red channel only in E. It is difficult to see the red signal in the current images.

      The BrdU stains are brightfield images and do not use any fluorescence. Therefore, there are no channels to split.

      16) How was the qualification conducted? Could the whole population be analyzed more quantitatively?

      Quantification was conducted by counting BrdU+ cells as a percentage of total cells, counterstained with haematoxylin, as outlined in the materials and methods section. We quantified 10 separate locations with roughly 50 cells per repeat. Based on these measurements, a representation of the population was displayed.
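      The counting scheme described above amounts to a simple proportion. The sketch below shows one way to compute it; the per-location counts are hypothetical, and pooling counts across locations (rather than averaging per-location percentages) is an assumption, since the response does not say which was used.

```python
def brdu_percentage(fields):
    """Percentage of BrdU+ cells pooled over imaged locations.

    fields: list of (brdu_positive, total_cells) tuples, one per location,
    where total_cells is the haematoxylin-counterstained cell count.
    """
    positive = sum(p for p, _ in fields)
    total = sum(t for _, t in fields)
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * positive / total


# Hypothetical repeat: 10 locations with roughly 50 cells each.
fields = [(20, 50)] * 5 + [(25, 50)] * 5
print(brdu_percentage(fields))  # 45.0
```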

      17) Figure 4: 1) The critical data should be that the formation of wnt3a cytoneme (length, number) is impaired in Flot2-deficient cells, which are missing in the figure and the manuscript

      We agree that the impact of Flot2 siRNA on Wnt3 should be assessed; however, because Flot2 KD reduces filopodia number and length, it would not be reliable or accurate to compare the percentage of Wnt3-positive cytonemes to controls, as this would naturally record a reduction regardless of direct effects on Wnt3. Therefore, we performed antibody staining against Wnt3 after Flot2 KD (and in DN-Flot2-expressing cells) to assess its localisation (Supp. Fig. 2). However, quantifying these observations is difficult due to the abovementioned points.

      18) 2) A-D: The expression of Flot2 should be presented in separated images. The membrane localization is not clear. Fig4D shows flot2 occasionally localized with Wnt3. Time-lapse experiments will provide additional evidence of the constant localization of Flot and Wnt3

      For clarification, we have added time-lapse images of Flot2-GFP/Wnt3-mCh expressing cells (Supp. Fig. 2G), where Flot2 and Wnt3 can be seen travelling together intracellularly.

      19) E: This panel has similar issues described in Figure 2. How was the transfection rate in E? Did all cells express Flot2 or dnFlot2? Their expression should be examined at the same time.

      As highlighted in the previous comment, whilst we cannot guarantee all cells are expressing constructs, transfection efficiency was consistently high, >70% throughout experiments. Additionally, imaging of BrdU+ cells requires a brightfield camera that can capture colour. Microscopes which can capture fluorescent images only show brightfield in black and white, whereby BrdU staining cannot be distinguished from counterstains. Therefore, showing fluorescence of cells in these BrdU stains is not possible, but cells were checked for transfection efficiency before the staining process.

      20) 4. Figure 5 is one of the key figures. However, the quality of the images is not high enough to support the conclusion. A-D: The membrane co-localization is not convincing. Better images with a membrane marker are needed. Also, it is better to present images in separate channels. The red color A should be magenta

      We are sorry that the resolution of the images seemed to be reduced in the version uploaded. We have therefore improved the resolution of the images and added an analysis of AGS cells expressing Ror2-BFP, Flot2-GFP and membrane-mCherry, which shows the membrane localisation of Flot2 and Ror2 (Supp. Fig. 3a). As suggested by the reviewer, Fig. 5A has also been changed to magenta.

      21) Did the dominated-negative Flot2 affect the expression of endogenous Flot2? Similarly, the expression of endogenous Flot2 in siRNA expressing cells should be shown

      Yes, endogenous Flot2 is mislocalised upon expression of the dominant-negative construct, and its expression is reduced after siRNA treatment, as described in our responses to previous comments. In addition, we provide further evidence showing the consequences of Wnt3 dissemination by Flot2-dependent cytonemes in the paracrine colony-forming assay. These data support our conclusion that Flot2 is required for Wnt3 transport on cytonemes to neighbouring cells to regulate the survival and proliferation of gastric cancer stem cells.

      22) Instead of showing the image of single-cell, additional experiments, for example, the western blot should provide additional evidence to show Ror2 expression on the membrane is lost

      We respectfully disagree with the reviewer. Our data suggest that Ror2 expression is not reduced upon reduction of Flot2 function. Instead, we find that Ror2 localisation is altered. Therefore, a Western Blot analysis would not be able to show a shift in the localisation of the protein.

      23) 5E. The current images are too small to appreciate the co-localization. Similarly, separated channels should be presented

      As suggested, the Golgi/Ror2-mCh images have been enlarged to allow appreciation of the co-localisation. In addition, images for Rab5, Rab7 and LAMP1 have been moved to Supp. Fig. 3, with separate channels shown (including for Golgi).

      24) How many experiments have been conducted? It seems that the cell number is not high

      As advised by the reviewer, we repeated this experiment to increase the n number for each group to 10.

      25) It is necessary to describe how E-Co-efficient (PCC) was determined in more detail

      A description of how the PCC was measured has been added to the material and methods section.
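      As general background, the Pearson correlation coefficient between two image channels is the covariance of their pixel intensities divided by the product of their standard deviations. The sketch below implements that textbook definition on flattened pixel lists; it is not the authors' pipeline (which is described in their methods section), and the function name and intensity values are illustrative.

```python
import math


def pearson_cc(channel_a, channel_b):
    """Pearson correlation coefficient between two equally sized lists of
    pixel intensities (e.g. the same region of interest in two channels)."""
    if len(channel_a) != len(channel_b) or not channel_a:
        raise ValueError("channels must be non-empty and of equal length")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    denom = math.sqrt(var_a * var_b)
    if denom == 0:
        raise ValueError("a channel with constant intensity has no defined PCC")
    return cov / denom


# Illustrative intensities: perfectly co-varying and perfectly anti-correlated signals.
print(round(pearson_cc([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0
print(round(pearson_cc([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -1.0
```

      Values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.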

      26) F: The label for the X-axis is missing

      We have added the label for the X-axis.

      27) G: The nuclei and cell boundaries are not clear ; the markers for these should be included to give confidence where and how the quantification was conducted

      We have marked the nuclei as well as the adjacent cytoplasm with asterisks - to show the localisation in which the fluorescence of KTR-mCherry was measured.

      28) Similarly, the expression of Flot2 should be examined in these experiments as it is likely not all cells express those constructs at similar levels

      Flot2-GFP (and other constructs) expression was analysed, and only expressing cells were selected for quantifying the JNK reporter. We decided against showing these channels in the images as it made the visualisation of the JNK reporter difficult.

      29) Additional experiments, for example, Western blot to show pJNK levels, are necessary to support the conclusion further

      The KTR-mCherry reporter is a valid reporter to quantify JNK activity. We are convinced that this reporter is a better tool for this purpose than a Western blot of pJNK levels. The reporter allows JNK activity to be visualised in specific cells within minutes. This advantage has been demonstrated in numerous publications, including our own (Brunt et al., 2021). Therefore, we believe the KTR-mCherry reporter is a fast and reliable tool to measure JNK activity in individual cells.

      30) . H: The P values for which group are not clear.

      All p values above the bars are compared to the untransfected control unless indicated otherwise with bars. p values for all groups have been added.

      31) I-J: The mem-mCherry shows the protrusions but not the cytoneme because these did not show wnt3 labeling.

      Cytonemes are defined as cellular protrusions transporting signalling components (Kornberg et al., 2014; Zhang and Scholpp, 2019). Therefore, in addition to the presence of Wnt3, cytonemes can be defined by the presence of other Wnt signalling components, such as Ror2 (Mattes et al., 2018). We therefore term the protrusions in this figure cytonemes due to the presence of Ror2.

      32) Figure 6: The experimental designs are problematic. 1) Is Flot2 expressed in zebrafish embryos at the stage analyzed? The results in panels A-B using the overexpression approach do not reflect the endogenous expression of Flot2

      A previous paper (von Philipsborn et al., 2005) looking at flotillin expression in zebrafish embryos shows that Flotillins are expressed early in development (observed as early as 1 hpf) and that this expression continues throughout gastrulation. We have added the following sentence, with its reference, to the text to highlight this: "we addressed Flot2 function during zebrafish development, where Flotillins are highly expressed in early developmental stages and can be visualised on the tips of cellular protrusions (von Philipsborn et al., 2005)".

      33) Overexpression of Flot2-GFP could cause unintentional consequences.

      We agree with the reviewer that Flot2 OE could have consequences. To test for these consequences, the microinjection of membrane-mCherry was used as a control to confirm that the observed phenotypes are specific to Flot2 function rather than a side effect of injection. Furthermore, we have added a new data set in which we generated F0 Crispants for Flot1b/2a. We could show that these cells display shorter and fewer Wnt8a cytonemes.

      34) Also, where were those cells that were imaged ?

      We have added a schematic drawing to indicate where the cells were imaged within the embryo (Fig. 6a).

      35) Could the authors show more cells? Images of separated channels should be shown. The cell in B seems to be round. Was the cell at the mitosis stage?

      We show more images from cells with the typical morphology of a zebrafish epiblast cell.

      36) The authors injected various DNAs to show the consequence of the expression. This method is very unreliable, as injection of DNA likely leads to mosaic expression of the proteins at different expression levels thus, the expression levels are very hard to be controlled. Has the expression of various constructs been compared in different conditions? RNA injection experiments are recommended, as these usually lead to uniform and reliable protein expression

      We have used mRNA expression as well as DNA expression in our previous works. We found that DNA expression has several advantages for analysing small cell clones. After DNA injection, only a few cells express the construct at sufficient levels allowing good imaging. Therefore, we used a Flot2-GFP pCS2+ construct to generate a sparse, mosaic expression. In these experiments, we find a reliable, stable expression of Flot2-GFP in a subset of cells, which is very important for imaging.

      37) 4) Did overexpression of Flot2 or Wnt8 cause severe developmental defects? Were those embryos healthy? Could the authors show live images of group embryos? The authors need to explain the "0" values in some columns (+wnt8a, flot2/wnt8s) in G. Did these results indicate those embryos did not express pax6a at all?

      Indeed, overexpression of Flot2 and Wnt8a led to an arrest in development with obvious malformation of the body axis and frequent oedema formation. Therefore, we focussed our analysis on gastrulation and stopped at 24hpf to visualize a potential effect on AP patterning. "0" values represent embryos where the forebrain is entirely missing (Pax6a expression could only be seen in the hindbrain). This has previously been observed (Brunt et al., 2022). We have added the following sentence to clarify this: "In some cases, the FB was completely lacking (no Pax6a expression), which was recorded as "0"."

      Reviewer #2 (Public Review):

      1) The case for Flot2 being a modulator of Wnt cytonemes is made, but its characterization as a specific regulator of Wnt cytonemes is over-stated given the data provided. To make the case for Wnt specificity, the authors need to show that Flot2 modulation does not impact signaling filopodia housing other signaling molecules.

      We agree with this reviewer and have changed the term "regulator" to "modulator". Furthermore, we have added a short paragraph to the discussion highlighting that Flot2 can also promote Hh cytonemes, and thus Flot2 could modulate cytonemes in general. We further changed the text accordingly to describe "specificity" as having a specific role related to morphogen receptors.

      2) For some experiments, cell viability appears to be a complicating factor. The cells in which IRSp53 function is targeted look very unhealthy , so it is not clear reliable results can be obtained using the experimental parameters described.

      We agree with the reviewer. Therefore, we have revisited this data set. We find that IRSp534K-GFP expressing cells were healthy in general. Consequently, we have replaced the previous image with a cell with a morphology seen in most cells (Suppl. Fig. 1).

      Reviewer #3 (Public Review):

      1) Whether Flot2 manipulation specifically affects Wnts on cytonemes, or it could have a more general effect should also be considered

      We agree with this reviewer and have addressed this point in our response to reviewer 2. For example, we have changed the term "regulator" to "modulator" in the manuscript. Furthermore, we have added a short paragraph to the discussion highlighting that Flot2 can also promote Hh cytonemes, and thus Flot2 could modulate cytonemes in general. We further changed the text to describe "specificity" as having a specific role related to morphogen receptors.

      2) Statistical analysis should be done more consistently. It is either missing for some samples or the comparison between samples is not given

      p values have been added for all groups, and some additional comparisons have been included.

      3) The part of the manuscript related to the analysis of Ror2 and Flot2 in cytoneme formation and PCP pathway could be better connected with the central theme of the work/title, which is mainly on canonical signalling by Wnt3. Perhaps, directly analyzing the effect of Ror2 manipulation on Wnt3 levels on the cytoneme could be useful.

      Our experimental data suggest that the Ror2 function is required for Wnt3 cytoneme formation. We further show that Flot2 is needed to transport Ror2 to the plasma membrane to promote cytoneme formation. Therefore, we believe that we have addressed the sequence of events in this manuscript by showing the requirement for Flot2 in cytoneme formation and providing a possible molecular mechanism.

    1. Author Response

      Reviewer #1 (Public Review):

      Viola et al. compared the electron transfer efficiency of two types of oxygenic far-red photosystem II (PSII) with the "conventional" PSII and analyzed how these far-red PSII use the limited energy from infrared photons to carry out photosynthesis. Oxygenic photosynthesis is an energy-intensive process, and a large headroom is also needed for preventing harmful back-reactions from occurring, which can produce singlet oxygen. This research investigated how the far-red PSII managed to do their work with limited energy.

      The authors measured and compared the forward reactions of different kinds of PSII (Chl-a-PSII, Chl-d-PSII and Chl-f-PSII), including the flash-induced chlorophyll fluorescence decay and S-states turnover. These results led to a conclusion that the forward reaction quantum efficiency was not changed between "conventional" PSII and far-red PSII. However, the back-reactions of three types of PSII are different based on the measurements of the prompt fluorescence decay, delayed luminescence decay, and thermoluminescence band locations. The authors concluded that the two far-red PSII (Chl-d-PSII and Chl-f-PSII) have a different strategy for utilizing infrared light. Indeed, the authors showed that Chl-d-PSII containing cyanobacteria produced more singlet oxygen than other types, and this result was explained by the energy profile in the electron transfer chain.

      The major strength of this research is the authors made a direct comparison of different far-red PSII under the same conditions. It's exciting to have a side-by-side comparison between two types of far-red PSII. In addition, the authors also measured the singlet oxygen produced from all types of PSII which clearly showed the differences in the routes of recombination.

      We thank the reviewer for the interest shown in our work and for the thoughtful comments, which we have addressed below.

      However, there are some concerns:

      1) The flash-induced fluorescence decay, thermoluminescence, delayed luminescence and S-states turnovers of the Chl-d-PSII and Chl-f-PSII have been characterized before (ref 5, 26, 39), but from intact cells compared to isolated membranes in this study, and similar conclusions have been achieved. The authors mentioned four reasons (lines 115-120, see the manuscript for the authors' arguments "i." to "iv.") why it's important to use isolated membranes. However, in my opinion, these reasons are not sufficiently strengthened:

      i. The transmembrane potentials from cells can be collapsed by adding uncouplers;

      ii. The authors mentioned the quinone pool in the cells is uncontrollable, but the authors didn't actually measure or manipulate the quinone pool in the membrane (e.g., the ratio of QB/QB-/empty-pocket in the samples);

      iii. The phycobilisomes can be controlled by different conditions through state transitions;

      iv. The isolation of membranes may not remove membrane-related quenching mechanisms (e.g., PSII quenching in State II, spillover, etc.).

      We do not agree with the reviewer on this point. We consider the use of membranes (or isolated PSII) as being the best solution to limit the effects listed at the end of the Introduction and to provide consistency between the different measurements, some of which cannot be performed in intact cells (i.e., the UV absorption measurements). More specifically:

      i) The effectiveness of uncouplers in dissipating the membrane potential is likely to vary between species (e.g., Chroococcidiopsis cells form aggregates encapsulated by a protective layer of excreted polymers) and should be assessed by directly measuring the membrane potential. ElectroChromic Shift-based measurements of the membrane potential in cyanobacteria have only been demonstrated in Synechocystis sp. PCC6803 and Synechococcus elongatus sp. PCC7942 (Viola et al. 2019, https://doi.org/10.1073/pnas.1913099116) and still need to be adapted to the far-red species used here. Additionally, commonly used uncouplers such as CCCP and FCCP are ADRY reagents, which interfere with PSII water splitting by directly reducing TyrZ (Ghanotakis et al. 1982, https://doi.org/10.1016/0005-2728(82)90115-3), and would affect all the measurements presented in this work.

      ii) In the dark, the redox state of the PQ pool in cyanobacterial cells has been observed to be kept in a highly reduced state by respiration, with potential consequences on the QB/QB- ratio. This could well vary between species, based on their different physiologies and growth conditions. In isolated cyanobacterial membranes and PSII, the QB/QB- ratio is expected to be around 50% after a short dark adaptation. This seems to be the case in our samples, based on the flash-dependent oscillations of the S2QB- and S3QB- thermoluminescence shown in Appendix 2 compared to the literature (Rutherford et al. 1982, https://doi.org/10.1016/0005-2728(82)90061-5), assuming an initial ~75% S1 population, as confirmed by the flash-dependent oxygen evolution and UV absorption. This is now mentioned in Appendix 2.

      iii) The control of state transitions requires specific illumination regimes incompatible with the conditions required for our experiments. Moreover, state transitions remain largely uncharacterised in the far-red species used in the present work. In some of these species, the situation is further complicated by the presence of both visible and far-red light-absorbing phycobilisomes that have a different spatial distribution in the cell (MacGregor-Chatwin et al. 2022, https://doi.org/10.1126/sciadv.abj4437).

      iv) Non-photochemical energy quenching in cyanobacteria seems to occur in phycobilisomes, due to the action of the Orange Carotenoid Protein (OCP). Both OCP and the phycobilisomes, if present in cyanobacterial cells (and that depends on the strains), are removed when membranes are isolated. It’s been proposed that direct quenching of the PSII core occurs in Synechococcus elongatus 7942 cells in state II (Choubeh et al. 2018, https://doi.org/10.1016/j.bbabio.2018.06.008), but since the mechanism has not been elucidated, no conclusion can be made on whether this could occur in membranes. The same is true for spill-over. Additionally, neither of the two mechanisms could be better controlled in cells than in membranes, so there would be no advantage here from working in vivo.

      In addition, the authors reached a conclusion that the Chl-f-PSII containing species should suffer from fluctuation light-induced membrane potential spikes, but don't actually measure this in physiologically relevant preparations. It will be more beneficial to use intact cells instead of an isolated membrane. I suggest the authors either restrict their conclusions to what the isolated membranes clearly show or make measurements in intact cells.

      The proposal that the far-red forms of PSII (both Chl-d-PSII and Chl-f-PSII) should suffer from increased charge recombination induced by spikes of membrane potential in fluctuating light is not new (see for example Nürnberg et al. 2018, https://doi.org/10.1126/science.aar8313), and is based on the observations made in plant PSII (Davis et al. 2016, https://doi.org/10.7554/eLife.16921) and assumed to be universal in oxygenic photosynthesis. In PSII, the transfer of electrons from the primary donor chlorophyll to QA occurs vectorially in the membrane, against the trans-membrane electric field, thanks to these electron transfer steps being exergonic. Spikes in the electric field due to sudden intensity fluctuations increase the probability of backward electron transfer. If the overall drop in the energy of the electron from the primary donor to QA is smaller (in a long wavelength PSII), it should result in a higher probability of backward transfer for a given trans-membrane electric field, and therefore a greater susceptibility to spikes in the electric field. We did not measure these effects and we do not claim to have done so. As already mentioned in the answer to point i) above, doing so would require the development of ElectroChromic Shift-based measurements of the membrane potential in the cyanobacterial species containing far-red photosystems. This is a separate research project beyond the scope of the present work.

      In conclusion, we believe that our statement justifying the use of isolated membranes at the end of the Introduction is valid.

      1. The authors measured the fluorescence decays as part of the evidence to show the stability of S2QA-. I have several concerns about these measurements:

      i. In figure 2B, the WL C. thermalis (blue) trace has a unique decay phase with a lifetime of about 0.2s, which the authors denoted as S2QA- recombination. Could the author elaborate on how this phase was assigned to this state?

      All decay kinetics in the presence of DCMU are bi-phasic (with an additional faster phase in the WL and FR C. thermalis samples, attributed to a small fraction of centres where DCMU did not bind). In the manuscript we did originally assign both phases as arising from S2QA- recombination, but it is true that the middle phase, which is slightly faster in WL C. thermalis, is too fast to originate from that. This phase can rather be ascribed to TyrZ•(H+)QA- recombination occurring in a fraction of intact PSII centres before the full stabilization of charge separation, as shown in Debus et al. 2000 (https://doi.org/10.1021/bi992749w), or in centres lacking a Mn-cluster. We have now modified the paragraph regarding the fluorescence decay in the presence of DCMU accordingly (L. 142-145): “The shorter lifetime (~0.22-1 s) of the middle decay phase (amplitude 15-20%) was compatible with it originating from TyrZ•(H+)QA- recombination occurring either in centres lacking an intact Mn-cluster (24) or in intact centres before charge separation is fully stabilised, as proposed in (23).”

      A luminescence decay phase with a similar lifetime was initially ascribed, incorrectly, only to TyrZ•(H+)QA- recombination occurring in centres devoid of an intact Mn-cluster, in Appendix 5. This has now been rectified.

      ii. In figure S1 (the full version of 2B), all the fluorescence traces seem to rise at the end of the measurements. Could the authors check whether the measuring light intensity was actinic?

      This rise is significant only in the A. marina dataset (now Figure 2-figure supplement 1), and given the low signal-to-noise ratio in the last points of the fluorescence curve, we consider this small anomaly to be a measuring artefact. The rise is absent in the other traces in Figure 2- figure supplement 1 and in Figure 2B, except for the last point of the A. marina dataset in Fig. 2B. The corresponding Source data show that a rise in the last point of the measurements is present in only one of the three A. marina replicates (#2), while the non-decaying fluorescence is present in all A. marina samples and is discussed in the text. Except for this last anomalous point, the decay curves of A. marina replicate #2 do not differ significantly from those of the other two replicates. This clearly suggests an artefact, and is not consistent with the measuring light being actinic. A clarifying sentence has been added in the legend of Figure 2- figure supplement 1.

      iii. In figure S2, it seems to me that the fluorescence decay of Synechocystis + DCMU (Green open squares) was slower than the WL C. thermalis and is similar to the FRL C. thermalis in figure 2B. If the Synechocystis + DCMU is indeed similar to FR C. thermalis, would that be consistent with the authors' conclusions?

      When fitting the Synechocystis+DCMU fluorescence decay kinetics (in what is now Appendix 1-figure 1), we obtain two decay phases with, respectively: an amplitude of ~12% and lifetime of ~0.22 s, and an amplitude of ~81% and lifetime of ~7.9 s. These values are similar to those reported for WL C. thermalis in Table 1, with an overall fluorescence decay faster than in FR C. thermalis. Nonetheless, because of the limited number of Synechocystis biological replicates, we limit ourselves to a qualitative comparison. The luminescence decay kinetics are also faster in Synechocystis (as in WL C. thermalis) than in FR C. thermalis (now Figure 5- figure supplement 2).

      These data are consistent with our conclusions: the energy gap between QA- and Phe in Chl-f-PSII is at least as large as in Chl-a-PSII, or could even be larger, as suggested by the slower S2QA- recombination measured by fluorescence (Figure 2) and luminescence (Figure 3) decay.
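      For illustration only, the two-phase description of the Synechocystis+DCMU decay quoted above (~12% with ~0.22 s, ~81% with ~7.9 s) can be written as a sum of exponentials. The sketch below is not the fitting procedure used in the manuscript: the function name is hypothetical, and treating the remaining ~7% as a non-decaying offset is an assumption made here so that the curve starts at 1.

```python
import math

def biexp_decay(t, a1=0.12, tau1=0.22, a2=0.81, tau2=7.9):
    """Fraction of the initial variable fluorescence remaining at time t (s).

    a1, tau1: amplitude and lifetime of the faster phase (values quoted above).
    a2, tau2: amplitude and lifetime of the slower phase (values quoted above).
    The remainder (1 - a1 - a2) is ASSUMED to be a non-decaying offset.
    """
    offset = 1.0 - a1 - a2
    return a1 * math.exp(-t / tau1) + a2 * math.exp(-t / tau2) + offset

if __name__ == "__main__":
    # Evaluate the model at a few illustrative time points.
    for t in (0.0, 0.22, 1.0, 7.9, 30.0):
        print(f"t = {t:5.2f} s  remaining = {biexp_decay(t):.3f}")
```

      Evaluating the same function with the parameters of another sample would allow the kind of qualitative comparison of overall decay speed described above, without implying any statistical treatment.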

      iv. It's known that DCMU will alter the redox potential of QA/QA- in plants. Would it have similar effects to the PSII studied in this research? If so, it will be meaningful to include these effects in the energy diagram in fig 7.

      Yes, we do expect DCMU to change the QA/QA- redox potential in our samples, as it does in plants and other cyanobacteria, although the actual effect in different PSII types would need to be measured. The energy gap values in now Figure 8 are only estimates based on literature values and on the relative changes reported here, they are not calculated from any of our data and do not specifically refer to the experimental conditions we used, including the use of DCMU. For this reason, we think that adding the effects of DCMU in the diagram would not be particularly useful and could be confusing.

      1. The authors didn't use WL C. thermalis for measuring oxygen evolution and the authors claimed that the PSII content in WL C. thermalis is too low. Is that a technical issue (e.g., cannot purify PSII enriched membranes) or a biological issue (i.e., white light condition produced less PSII)? In Fig S9C, the oxygen generated from WL C. thermalis is comparable to FR C. thermalis. Could the author explain how they reached the conclusion that PSII in WL C. thermalis was low? In addition, the author should also provide evidence showing that the samples of WL C. thermalis do not have significant PSII activity under far-red light.

      We did measure the flash dependence of oxygen evolution in WL C. thermalis membranes, and we did observe oscillations with visible flashes (but not with far-red flashes, as expected). However, the data were not good enough to be able to perform any significant analysis. Unfortunately, in the case of WL C. thermalis, we have not been able to isolate O2-evolving cores, as stated in L. 194-195. The WL C. thermalis data have now been added in Figure 3- figure supplement 1, together with the non-normalised traces of all other samples (following the suggestion by reviewer #3), and the text has been modified accordingly. The data in Figure 3- figure supplement 1 also provide evidence that the samples of WL C. thermalis do not have significant PSII activity under far-red light (although this was already clearly demonstrated in Nürnberg et al. 2018).

      We do have evidence that the PSII content per chlorophyll is lower in WL C. thermalis than in FR C. thermalis, based on fluorescence emission spectra, yield of isolated PSII and PSI from purification procedures, and O2 evolution per chlorophyll, as can be seen for example in Figure 3- figure supplement 1. The levels of PSII accumulation depend on the growth stage (among other factors) in model species such as Synechocystis. Since C. thermalis cells grow more slowly than other cyanobacteria species and their physiology has not been studied in detail yet, it is difficult to control the levels of PSII accumulation. This explains the inter-sample variability in the rates of O2 evolution per chlorophyll measured with the Clark electrode, that have now been added in Appendix 6-table 1.

      1. The authors used an indirect method, which used chemical trap histidine and oxygen consumption, for measuring the production of singlet oxygen from different types of PSII. I have several concerns about this approach.

      i. Why not use a probe that reacts directly with singlet oxygen probes like SOSG or EPR probes to unambiguously confirm the production of singlet oxygen? The difficulties of not using SOSG mentioned in Rehman et al (SI Ref#22) should be no longer problems when isolated membranes were used. The advantage would be a validation of the results and perhaps increased sensitivity.

      Although SOSG or EPR probes could also be used to detect singlet oxygen production, these other methods seem to be significantly less sensitive than histidine trapping. For example, Fufezan et al. 2007 (https://doi.org/10.1074/jbc.M610951200) used the EPR spin trap TEMPO and needed 30 minutes of illumination. Extended illumination (up to 1 hour) has also been used to detect singlet oxygen using SOSG (Flors et al 2006, https://doi.org/10.1093/jxb/erj181). With the histidine trapping method used here, less than 2 minutes of illumination were required to measure the singlet oxygen production rates. This allowed potential problems of prolonged illumination (e.g. a loss of intact PSII centres due to photodamage) to be minimised, and allowed us to confirm the results obtained in isolated membranes with those obtained in intact cells.

      As shown in now Figure 6- figure supplement 1E, the histidine-dependent oxygen consumption was suppressed by the singlet oxygen quencher sodium azide, as also shown in Rehman et al. 2013 (https://doi.org/10.1016/j.bbabio.2013.02.016). We also independently confirmed that the singlet oxygen generated by illumination of the dye Rose Bengal can be efficiently detected with the histidine trapping method and suppressed by the addition of sodium azide (Figure 6- figure supplement 1F). For these reasons, we are confident that what we measure with the histidine trapping method is singlet oxygen production.

      ii. In Rehman et al (SI Ref#22), wild-type Synechocystis cells showed significant production of singlet oxygen in the presence of DCMU and His (Figure 3A in SI Ref#22), however, the amount of singlet oxygen measured from the membranes in this study seemed to be less (Fig S10E). Could the authors provide some explanations?

      Fig. 3A in Rehman et al. showed that the production of singlet oxygen was about 10% with respect to the oxygen evolution activity in absence of additions (open squares). The light saturation curves in Fig. 4B of the same paper also show that at saturating light intensity the singlet oxygen production rate is about 10% compared to the O2 evolution rate. The traces we show in Figure 6-figure supplement 1 are only representative. The comparison should be made between the results in Rehman et al. and the averages of biological replicates that we show in Fig. 6 (membranes) and Appendix 6-figure 4A (cells). For WL and FR C. thermalis, we measure singlet oxygen production rates that are about 20% of the O2 evolution rates, slightly higher than those measured in Synechocystis in Rehman et al. Considering the variability between biological replicates, we consider our values in line with those in Rehman et al.

      iii. Can the presented results distinguish the production of singlet oxygen from recombination or other sources (e.g., antenna, free chlorophyll)? Some key controls are needed to strengthen the authors' claims.

      This is difficult to demonstrate unequivocally, but we have different lines of evidence that support the conclusion that the increase in singlet oxygen production in A. marina originates from differences in PSII charge recombination with respect to the other samples:

      i) The high levels of singlet oxygen production are observed in intact cells as well as in membranes. In neither of these samples do we expect to have significant amounts of damaged PSII or free chlorophyll, so these seem highly unlikely as the main sources of the singlet oxygen in our measurements. This is now stated more explicitly in L. 305 and Appendix 6.

      ii) According to the data in Appendix 6-figure 1B, singlet oxygen production in A. marina membranes shows a similar light saturation to that of maximal O2 evolution. This suggests that the singlet oxygen production we measure is related to PSII photochemistry. We have now stated this explicitly in L. 288-290.

      iii) Our thermoluminescence and delayed luminescence results indicate that in Chl-d-PSII the energy gap between Phe and QA is smaller than in Chl-a-PSII (as already suggested in the literature) and in Chl-f-PSII. This indicates that more charge recombination goes via repopulation of Phe- in Chl-d-PSII, with a consequent increase in singlet oxygen production.

      The antenna chlorophylls could form triplets under high light, by inter-system crossing, but in intact antennas the chlorophyll triplets are expected to be mostly quenched by nearby carotenoids (see https://www.jstor.org/stable/24030848 for a review on the subject). The generation of antenna triplet states in non-photoinhibitory conditions has been demonstrated in plant and algal thylakoids (Santabarbara et al 2002, 2007 doi: 10.1021/bi0201163, doi: 10.1016/j.bbabio.2006.10.007). Yet, these signals, which are attributed to a small population of damaged antennas, are small compared to those of triplets generated by charge recombination. Due to its apparently stochastic nature, the generation of antenna triplets by inter-system crossing is not expected to be significantly different between the different PSII complexes investigated in this study.

      On the other hand, it is generally recognised that in the PSII reaction centre, the carotenoid on the D1 side is not close enough to ChlD1 to directly quench its triplet state, when formed (see Telfer et al. 1994, https://doi.org/10.1016/S0021-9258(17)36825-4). The singlet oxygen produced in the reaction centre could disrupt the coupling between chlorophylls and carotenoids in the antenna, resulting in singlet oxygen production also from the antenna, in a cascade effect. This can happen with prolonged strong illumination (Fufezan et al. 2002, https://doi.org/10.1016/S0014-5793(02)03724-9).

      iv. I could not fully understand the singlet oxygen production experiments with tris-washed samples. In my opinion, the Mn-cluster depleted PSII should have accelerated charge recombination (100 ms between the YZ/QA, vs ~ 5 sec between the S2/QA), which should lead to an increase in singlet oxygen production. Correct me if I'm wrong about this, but if my reasoning is correct then how do the authors explain the discrepancy?

      Our rationale for performing the tris-washing experiment was indeed to see if this would lead to an increase in singlet oxygen production, thus implying that the high production in the A. marina samples could arise from a higher fraction of PSII centres without the Mn-cluster, as explained both in the main text and in Appendix 6. The fact that the treatment did not increase the singlet oxygen production suggests that this does not specifically arise from PSII lacking the Mn-cluster.

      The lack of singlet oxygen increase following tris-washing is not necessarily controversial, as the fact that TyrZ•QA- recombination is faster than S2QA- recombination does not necessarily imply that more of it occurs via backward electron transfer from QA- to Phe. The removal of the Mn-cluster could decrease the production of singlet oxygen by charge recombination, since it causes an increase in the redox potential of QA and, therefore, of the energy gap between Phe and QA, thus decreasing the probability of charge recombination going via the repopulation of Phe-. This is proposed to be a mechanism to protect PSII during photoactivation of the Mn-cluster (see Johnson et al 1995, https://doi.org/10.1016/0005-2728(95)00003-2).

      Our data show that the singlet oxygen production in A. marina is not specifically related to PSII lacking the Mn-cluster and are not in conflict with what is expected based on our knowledge of PSII energetics.

      v. The y-axes in Figure S10 should either contain "delta" (Δµmol O2 ml-1) or use the measured absolute oxygen concentration. I'd suggest the latter, since the reaction is oxygen consuming, it's good to show that all the samples started with similar amounts of dissolved oxygen. Low O2 levels could decrease 1O2 production, though this would be more of an issue with cells than membranes.

      The y-axis labels in the figures (now Figure 6-figure supplement 1 and Appendix 6-figures 1D and E, 2, 3 and 4A) have been changed to Δµmol O2 ml-1. We prefer to show the traces after subtraction of the baseline recorded in the dark (now explicitly indicated in the corresponding figure legends) for a better visual comparison. All samples were left to equilibrate with air (stirred) before starting the measurements, so all started with similar levels of dissolved oxygen. This is especially important when measuring PSI-dependent oxygen consumption (Appendix 6-figure 3), because the addition of ascorbate and TMPD leads to a transient drop in oxygen concentration in the sample, which leads to artefacts in the absence of the equilibration step. This information has been added to the corresponding Materials and Methods section (4.5). Additionally, when using Rose Bengal to generate singlet oxygen, the histidine-dependent oxygen consumption was about 10 times higher than in any of the measurements done with biological samples, and still we did not observe saturation of the signal in the illumination time used (added panel F in Figure 6- figure supplement 1). Therefore, we are confident that the singlet oxygen measurements in membranes and cells were not skewed by limiting oxygen concentrations in the measuring chamber.

      The y-axis labels of what is now Appendix 6-figure 1B and C have also been corrected (as ml-1 was used instead of h-1).

      Reviewer #3 (Public Review):

      In this manuscript, Viola and co-authors address the question of how far-red-light-adapted (FRL) Photosystem II (PSII) is able to bypass the "red limit", or the minimum photon energy/frequency for charge separation to proceed effectively. They attempt to do so primarily by measuring the consequence of failure to overcome the red limit: charge recombination. From this work they have concluded that FRL PSIIs are able to achieve similar efficiency of flash-induced water-oxidizing complex turnover to those adapted to standard visible light. However, they conclude that FRL PSII which uses chlorophyll-d is significantly more susceptible to charge recombination and singlet oxygen formation, leading to increased sensitivity to high-light conditions. FRL PSII which uses chlorophyll-f, however, is adapted to be more resistant to photodamage. These strategies are differentiated by the number and type of far-red chlorophyll used and tuning of redox potentials of cofactors in PSII.

      The methods employed are well-chosen to present complementary evidence to address the questions posed. The authors have supported themselves using polarography, fluorescence decay, absorption, luminescence and thermoluminescence, and spectrometry, all of which are employed in a manner well-established in the quantification of processes in standard PSII preparations. The results, however, have some loss of data such as total yields which would be useful in interpretation as the authors have chosen to extensively normalize data for ease of visual comparison of certain features.

      Overall, the authors have adequately achieved their aims and their conclusions are well-supported. The authors also clearly state their own expectations of the impact of their work at the end of the Discussion; thanks to these results, we can better understand the ecological niche of each type of FRL-PSII and how these significantly disparate systems may be used in future agricultural research and development.

      We thank the reviewer for the positive evaluation of our work.

      Following the reviewer’s suggestions, the total yields (on a chlorophyll basis) of the flash-dependent oxygen evolution have been provided in Figure 3- figure supplement 1. These include the flash-dependent oxygen evolution data measured in WL C. thermalis membranes, that were previously omitted because of the unsatisfactory quality, and are still omitted from Figure 3 (normalised data and fits) for the same reason. The S-state distributions calculated from the fits of the flash-dependent oxygen evolution have been added in Table 2.

      Additionally, the non-normalised oxygen evolution and consumption rates used for Figure 6A and Appendix 6-figure 4 are now provided in Appendix 6-table 1.

    1. Author Response

      Reviewer 2

      Weaknesses: While I applaud the use of a "simplified" task in rodents to disambiguate controversial questions traditionally addressed in human studies, I found that the behavioral data were underanalyzed and thus not strongly supporting the central claim of the manuscript. Below are my main comments:

      1. One of the goals of the authors was to study the neural mechanisms underlying "voluntary" movements. While they acknowledge (in the discussion) that they do not have evidence that actions are "intentional", they make the assumption that mice do "form the intent to act near the lever pull time". To back up this assumption, the authors should at least present some evidence that the action of interest (i.e., the rewarded lever-pull) is not just a random jerky movement that happens to be rewarded once in a while. In fact, mice seemed to pull the lever very frequently and impulsively (the majority of inter-pull intervals were way below 3 s in Supplementary Fig. 1.2) even for the last sessions of the training. Therefore, it is not readily apparent that mice apply any control to their lever-pull actions. Providing evidence that the action is goal-directed is important if the goal of the paper is to study neural signatures of the intention to act. A somewhat compelling analysis could be to compare rewarded lever-pulls with "spontaneous" movements, provided that these two types of movement can be convincingly characterized as goal-directed vs. incidental. In contrast, throughout the manuscript, the neural activity aligned to rewarded lever-pull events (which are assumed to be "voluntary" actions) is compared to the neural activity aligned to random times during the task (whether or not it involved movements), which may not be the most convincing control.

      a. We agree with the reviewer and have provided additional explanation and evidence for the learning component of our study.

      1. The learning trajectory of mice is also not well characterized (e.g. changes in inter-pull intervals are not quantified, nor the relative increase in rewarded actions across training sessions, etc.). Yet, several claims in the paper are directly based on the fact that mice have learned to pull the lever after 3 s interval to receive water rewards (which relates to point 1). In particular, one important assumption in the paper is that as mice learn, the lever-pull movements become more stereotyped, but this has not been shown explicitly. It would be helpful, for example, to see how analog traces of lever-pulling change throughout the learning stages and how the variance of the movement across trials decreases in late sessions.

      a. We agree and have provided additional analysis and figures.

      1. The central claim of the paper is that rewarded lever-pulls can be predicted from pre-movement neural activity several seconds (even up to 10 s) prior to the action. However, obvious motor confounds and other alternative explanations have not been convincingly ruled out. In fact, the action of lever pulling may require a series of complex movements (like changing posture, extending the forelimb, reaching the lever, grabbing the lever, etc.). The authors themselves mentioned that they found strong correlations between lever pulls and body movements in all mice, but the data is not used nor shown in the paper. The motor commands preceding but related to lever-pull could unfold at least a few hundreds of milliseconds prior to the detection of lever-pull in the task, and thus be reflected in the neural activity that is predictive of the lever pull. Moreover, if this series of movements is highly stereotyped, and in turn leads to stereotyped neural activity (like the slow oscillations observed before the lever-pulls), it could explain why the detection of lever pulling actions always occurs at a given phase of the neural oscillation. Such observations that stereotyped movements occur way before the lever-pull detection could partially rule out the fully "cognitive" explanation proposed in the paper, but would concur with recent findings that showed that ramping neural activity can be, for the most part, explained by movement-related activity (Musall et al., 2019).

      a. We agree with the reviewer and have added analysis panels showing cross correlations for behavior as well as additional panels showing there are no behavior initiation sequences in the data.

      1. Toward the end of the result section (Fig. 6), the authors briefly begin to address the issue about whether pre-movement activity can really be considered movement free. Here, "lockouts", i.e. periods where other movements (like licking, or previous lever-pulls) did not occur, were introduced in the analysis. The lockouts altered the earliest-decoding-time (EDT) of the lever-pull (in some mice EDT was even divided by half: from -4 s to -2 s). However, the effects of "micro-movements" like facial movements or changes in body posture may not be taken into account with the lockout approach. Such micro-movements have been shown to explain a large variance of the neural activity (see Stringer et al. 2019 and Musall et al. 2019). Therefore, to fully control for movement confounds, the effect of high dimensional/micro-movements extracted from video recordings should be removed from the neural activity. These analyses could yield a much shorter EDT (e.g., -0.15 s), more consistent with previous reports.

      a. We agree and have added additional discussion about sequences of behaviors or micromovements.

    1. Author Response

      Reviewer #3 (Public Review):

      This manuscript describes host genetic data of several cohorts of Kenyan children with culture proven bacteremia, severe malaria, and controls, and the association with bacteraemia. We know that many children with severe malaria actually have a bacterial co-infection. Because it is difficult to get the numbers needed for such GWAS studies, the authors plus up their numbers by lumping together bacteremia and severe malaria cases - the latter in a weighted manner for the continuum of malaria and bacteraemia. In the next step they validate their findings in a new cohort of 434 bacteraemia cases and present functional studies in monocytes. The methods used are interesting and the data are valid. Findings are important. I am not an expert in statistics, so I cannot judge the statistical methods in detail, but they seemed to be valid.

      Very many thanks.

      I have a few major points.

      1) Overview of cohorts - overview. A graphical overview of cohort could be helpful for the reader- including groups, comparisons, and time periods of collection.

      We have added a new Figure 2 setting out recruitment to the study over time.

      2) Overview of cohorts - phenotypes. The datasets used have been published previously with clinical phenotypes in more detail. Would it be possible to include a supplementary table providing these clinical phenotypes per group? In how many patients in the severe malaria group cultures were performed?

      We now provide additional clinical information in an extended Table 2. All children included in the study with severe malaria had a blood culture taken at admission (this is now stated in the footnotes to Table 2).

      3) The potential impact of the prevalence of Pf HRP2 gene deletions on the analysis is probably limited because the cohort was collected in the period 1995-2008; this should be mentioned.

      As you suggest, the relevance to our data is likely to be limited. We now discuss how this may limit translatability of PfHRP2-based models for similar studies in other settings (lines 345-350).

      4) BIRC6 is identified as risk factor for invasive bacterial infection. BIRC6 (or BRUCE) is rightfully discussed by the authors in detail. BIRC6/BRUCE indeed is a ubiquitin conjugating E2 enzyme and a well-established anti-apoptosis regulator. Interestingly, we identified UBE2U to be associated with outcome in invasive pneumococcal disease (Lees et al Nature Comm 2019). The author may well find a link here.

      Many thanks for highlighting this. This association is also interesting as the association is seen in the context of meningitis caused by pathogens not just limited to the pneumococcus. We have added this to our discussion (lines 327-330).

      5) The discussion could be presented in a somewhat more balanced way. Two thirds of it is now used to discuss the potential role of BIRC6; this could be condensed, while limitations of the study should also be discussed.

      We have added a section summarising our study’s limitations in the discussion (lines 331-350).

    1. Author Response

      Reviewer #3 (Public Review):

      The most interesting and novel part of the manuscript is the process for removing PMMA from the graphene after the transfer of the PMMA/graphene pad to electron microscopy grids. The authors use incubation in acetone followed by baking overnight at 200 Celsius. If this proves to be reproducible with easily obtained sources of commercial graphene, it will be a major aid in allowing more labs to generate electron microscopy grids with graphene. To further clarify the efficacy of this process and aid the reproducibility of the method, we ask the authors to improve the characterisation of the suspended graphene on the grids and add more detail in the description of the transfer and cleaning procedures.

      In particular, in order to unambiguously demonstrate removal of the PMMA, Fig. 2 should include selected area electron diffraction (SAED) data where only the graphene layer suspended over the holes (no supporting foil) contributes to the diffraction pattern. This is easily achieved with a selected area aperture of the correct size. Patterns should be shown at each step of the process after the transfer. The authors should also clearly indicate in the TEM images the area of the sample that is illuminated to generate the SAED pattern. The SAED pattern as a function of tilt could also be examined to confirm a single graphene layer is present over the hole.

      As Reviewer #3 requested, we obtained new SAED data and included them in the revised Supplementary Materials (Figures S5 and S6).

    1. Author Response

      Reviewer #1 (Public Review):

      I find the question relevant, the quantitative analysis carefully reasoned, and the results compelling and of broad interest. The authors should address the following comments, which mostly center around clarifying the assumptions made regarding the agents' prior knowledge, and the need for better placing this study within the context of previous research, especially regarding memory requirements of the strategy and comparison with more reactive (memory-less) strategies. Finally, a broader discussion of the limitations of the current study (e.g. what happens if x_thr and y_thr change over time?) and of the next steps would strengthen the paper.

      An assumption behind the entire study is that agents can hold in memory their belief, which in this case is their location relative to the expected location of the source. Over time this memory enables agents that start with a wide prior to refine their belief. This strong assumption makes the strategy discussed here quite different from other more reactive strategies proposed in the literature that do not require agents to build an internal map of the expected location of the source. While it is easy for a robot to maintain such a memory, how and to what extent animals do so using known mechanisms such as path integration and/or systems such as grid and place cells is less clear. A more explicit description of the key memory requirements of the strategy discussed here (once learned) and a discussion of how it might be implemented by animals, as well as a discussion of the differences in that aspect with other strategies proposed in the literature, including reactive strategies, would strengthen the paper and significantly broaden its significance.

      Along the same lines, the study assumes that the agent stores an internal model of the statistics of the plume, e.g. x_thr and y_thr, L_y etc. The predictions made in 6e/f, for example, are likely only valid if the agent already knows the constraints of the plume it is searching for (i.e. x_thr and y_thr), which seems unlikely in most natural scenarios. Perhaps the authors could discuss some ways in which these might be inferred. The authors nicely show that an agent trained with the Poisson model navigates well even in the full time-dependent simulation. But what is missing is a discussion of how animals would get trained in the first place and what information they would need access to in order to do so. Perhaps examine how an agent trained in environment A performs in environment B as a function of how strong the statistical difference between environments A and B is. One could for example change the Poisson statistics between A and B.

      Following the reviewer’s suggestion, we have added an entire new paragraph in the final discussion section (lines 430-455). There, we comment on reactive vs cognitive strategies of search and we provide details on memory requirements of the algorithm presented here and its robustness to misrepresentations of the environmental flow. In particular, Supplementary Figure 1 includes a violin plot showing that when the number of training episodes is low, the performance is bimodal (B) and how the number of alphavectors increases when the number of training episodes increases.

      Supplementary Figure 4 shows the performance of the algorithm varying the model of the environment as suggested by the reviewer, i.e. the agent is trained in environment A and performs its search in environment B. The plot reports performance as a function of the difference between environments A and B.

      Related to the previous point: the simulated plume is straight, i.e. there is no variation in the mean flow and therefore no random meandering of the plume. This means that once the walker hits the center of the plume, if it orients upwind, it is likely to reach the source because there is a continuous stream of odor on the ground it can follow, with just a few castings whenever it drifts slightly off the centerline. Is there a way for the authors to explore what would happen in the case of meandering plumes without having to run another massive simulation? Perhaps a simplified model of odor plume could be used or one could even just use the same simulated plume Poisson statistics but translate this solution perpendicular to the main flow at a slow oscillatory rate. Will the navigator now stop and sniff in the air more often? Will these sniffing events coincide with moments when the navigator loses the plume? Will agents be able to still use a constant x_thr and y_thr or would they have to learn their statistics? Or will agents revert to a more memoryless or hybrid strategy?

      Following the reviewer’s suggestion, we have added a plot in Supplementary Figure 4 that shows the performance of our algorithm when trained in a fixed mean flow and searching in a meandering flow where the direction of the mean flow changes with time. The results confirm the robustness of the algorithm with respect to incorrect modeling of the environmental flow. The behavior of the agent in these more challenging conditions is largely consistent with what was observed for the static plume. Performance degrades as meandering becomes more pronounced. We expect that training under unsteady conditions will become necessary in even more extreme oscillatory conditions.

      How does the benefit of sniffing the ground vs the air change if odor molecules adsorb and de-adsorb on the surface, thus increasing the distance from the source where ground odor can be detected?

      The issue of adsorption was discussed in the bulk of the paper and we have now added a comment in the final discussion, so as to increase its visibility. Comments are found at lines 395-401.

      There is a difference in clarity between the first part of the paper and the second part that starts at line 232 with the section "Searching for airborne cues". I recommend the authors work on that second section to improve clarity. For example, the goal of that section is not immediately clear. The first paragraph talks about expanding on the intuition gained from the first part and "to address the search dynamics" but does not spell out what key question about search dynamics is to be addressed. This only becomes clear at line 260. Knowing where this is going would help readers understand the motivation behind the simplified model. Maybe lines 258-263 or something similar could be moved into the first paragraph of that section. Also, related to the previous comments, it would be helpful to clearly state what is assumed known by the agent and what is not. Is the agent assumed to have learned the values of x_thr, v and N in equation (2) before starting the search? As we progress through that section, important details start to be omitted, making it more difficult to follow. For example, what is the definition of t_sniff (I am guessing it is given in line 313?)? What is meant by optimization depth (line 316)? What is meant by episode index, is this referring to N (line 322)? Can the authors provide intuition about why the optimized casting strategy expands over time rather than starting wide right away (line 315)?

      The section "Searching for airborne cues" has been significantly revised for clarity throughout. Specific points raised by the reviewer are addressed:

      • we spelled out the key questions in the first paragraph as suggested (lines 245-249)

      • assumptions about the agent's model and reward structure (lines 250-264)

      • the definition of t_sniff (line 327), optimization depth (lines 327-330, 339) and episode index (line 345).

      • the intuition behind the casting strategy (lines 331-337)

    1. Author Response

      Reviewer #1 (Public Review):

      Most work on antibiotic resistance focuses on particular resistance genes often located on plasmids, but rarely how these genes interact with others located on the chromosome of the host organism. Considering variation in the host genome and its interaction with resistance plasmids can help predict which hosts are more likely to become resistant to a given antibiotic and explain why the same plasmid may not confer the same level of resistance to different strains.

      The authors take a clever approach to finding such genetic interactions by designing an evolution experiment using E. coli carrying an MCR-1 plasmid containing resistance genes to colistin. They then select for increased resistance to colistin and sequence the genomes of the most resistant isolates. This allowed them to identify a particular gene lpxC that confers increased resistance to E. coli when combined with the MCR-1 plasmid (more than the sum of each mutation alone) and find that this is because of decreased membrane surface charge. They then investigate whether this mutation is relevant in wild E. coli isolates by analysing environmental samples from patients and other sources and find that indeed, this mutation is often found in carriers of the MCR-1 plasmid.

      The study is very well-designed and presented in a concise and logical manner. The use of evolution experiments to identify the mutations and then engineer them to quantify the epistatic effects and understand the mechanism behind them is very elegant. The real-world relevance is then supported by looking for these mutations in environmental samples. Despite this simplicity and clarity, in some places, the writing could be improved. I particularly found that the second half of the paper was not as easy to follow as the first part and could benefit from some clarifications. The figures could also contain a bit more information to help the reader.

      Thank you!

      1.1 For example, the abstract starts by talking about standing genetic variation but it's not immediately clear what is meant by that. Standing genetic variation seems to suggest that the resistance gene itself is present in the initial population, rather than variation in other loci that might affect the selection of the resistance gene. This could be better formulated.

      We have revised the abstract to be clearer about the source of genetic variation.

      1.2 The figures could be improved by being more specific about the datasets: are mutations in Figure 2 in the WT or the MCR-1 positive lines? Are the SNPs in Fig. 4A in lpxC? Do all isolates in Fig. 4 have the MCR-1 plasmid?

      Thank you for the comment. We have edited the figure legend (line 128, page 5). Yes, Fig. 4A shows SNPs in lpxC, and all the isolates in Fig 4 have the MCR-1 plasmid. We have now clarified this in the figure legend (line 230, page 9).

      1.3 Finally, the arguments being made about diversity in the different phylogroups were not very clear. This could be made more explicit at first mention, rather than later in the discussion section.

      We have revised this section to clarify these points (lines 242-245, page 10).

      Reviewer #3 (Public Review):

      Jangir et al. used an 'evolutionary ramp' experiment to evolve E. coli strains under the selection pressure of increasing colistin concentrations wherein the surviving fractions were collected for genomic analysis. They report that the mcr-1 carrying strain evolved higher colistin resistance much faster only in presence of lpxC mutations in the genome. They identify the mcr-1 and lpxC interactions to be positively epistatic and mutations only in lpxC do not lead to resistance to colistin. Taking a cue from their evolution experiments, they looked for the variations in lpxC sequences in the genomic datasets of clinical E. coli strains. They found many such variations in the genomes of clinical isolates. Importantly, they found those variations to be present even in non-resistant strains which might predispose those strains to gain untreatable levels of colistin resistance.

      Strengths:

      The study focuses on two key aspects of antibiotic resistance in clinical settings. First, is the antibiotic colistin itself which is part of the last line of defense. Second, is the importance of genomic variations in clinical isolates that have not been linked to any antibiotic resistance mechanisms. The data were presented in a logical sequence and maintained brevity. The link of lpxC to mcr-1 resistance is convincing.

      Thank you!

      Weaknesses:

      The basic premise of the paper is solid but the following should be addressed.

      3.1 In Figure 1, the authors applied the 'evolutionary ramp' method to isolate evolved strains with higher MIC to colistin; but, the conditions for the evolution of WT and strain carrying mcr-1 are different. Maintaining mcr-1 requires antibiotic selection which WT cannot withstand. Hence, if I am not mistaken, WT was not grown in the presence of any antibiotic.

      The referee’s assertion that the selective pressures experienced by the WT and MCR+ populations were different is incorrect. We increased relative antibiotic dose (i.e., as a fraction of the MIC of the parental strains) at the same rate for both the WT and MCR+ populations. This is clearly explained in the text (lines 98-100, page 3), and the absolute colistin doses are shown in Figure 1. Please also see response 2.4 above.
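      The relative-dose design described in this response can be illustrated with a minimal sketch. The parental MIC values and the ramp steps below are hypothetical placeholders chosen for illustration and are not taken from the study; the only point shown is that populations with different parental MICs receive different absolute doses while experiencing the same dose relative to their own MIC.

```python
# Hypothetical parental MICs in mg/L (placeholders, NOT values from the study).
PARENTAL_MIC = {"WT": 0.5, "MCR+": 4.0}

# Hypothetical ramp: the dose, expressed as a multiple of the parental MIC,
# doubles at every step and is identical for both populations.
RELATIVE_RAMP = [0.5 * 2 ** step for step in range(5)]  # 0.5x, 1x, 2x, 4x, 8x MIC


def absolute_doses(parental_mic, relative_steps):
    """Convert doses given as multiples of the parental MIC into absolute doses."""
    return [parental_mic * rel for rel in relative_steps]


# Absolute dose schedule per population, derived from the shared relative ramp.
schedule = {
    strain: absolute_doses(mic, RELATIVE_RAMP)
    for strain, mic in PARENTAL_MIC.items()
}

for strain, doses in schedule.items():
    print(strain, doses)
```

      Under these assumed numbers the absolute doses differ eightfold between the two populations at every step, yet each population is exposed to the same multiple of its own parental MIC.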

      In our study, we used a naturally occurring MCR-1 carrying plasmid from the IncX4 family. This plasmid is actually very stable (in the short term at least) in the absence of colistin, in spite of the costs imposed by MCR-1. We speculate that this stability in part reflects the high conjugation rate of the plasmid and the presence of a toxin-antitoxin module.

      3.2 Not only that, maintaining a ~32 Kb plasmid itself can have different selective landscapes. The authors may replicate the experiment with their low-copy clone of mcr-1 which would make it easier for the authors to have an empty vector in WT as a proper control. Since now they know the expected mutations to be in lpxC, they might sequence a PCR amplicon of that region for validation of their hypothesis.

      This is an interesting idea for a future study. We agree with the referee that the presence of the MCR-1 plasmid may impose additional selective pressures that could potentially lead to bacteria-plasmid co-evolution. However, our data suggest that bacteria-plasmid interactions were not an important selective force over the course of our experiment: we detected no mutations in the plasmid and almost all of the chromosomal mutations that we detected could be easily associated with selective pressures imposed by colistin.

      3.3 In Figure 2, what are the effects of these mutations in lpxC? The authors state that many mutations map on to the metal binding domain; but are those significant changes? LpxC is relatively well characterized and authors may want to comment on these mutations a little more.

      Yes, most of the evolved lines had mutations in the metal-binding domain site, and it is known that this site is very important for lpxC activity. For example, mutations at positions 79, 238, 242 and 246 lead to a hundred to thousand-fold decrease in lpxC activity (PMID: 24117400, 24108127, and 11148046), and many of our mutations map close to these sites (lines 140-143, page 6, and Figure 2b).

      3.4 Also, lpxC mutations showed enrichment but lpxA did not. Is this suggestive of the type of Lipid A that is more preferred for the epistatic interactions? The authors may want to comment on that.

      Interestingly, it could be the case that the epistatic interactions depend on the type of lipid A modification and the associated pleiotropic effects, because mutations in LPS biosynthesis genes alter membrane properties and can therefore have diverse adverse effects. However, in-depth future work is required to understand how the different types of changes in lipid A influence interactions with MCR. We chose not to explore this further in the paper because lpxA was rarely mutated (2/17 clones) compared to lpxC (11/17 clones).

      3.5 In Figure 3, the lpxC mutant shows a reduction in fitness in a competition assay. What is the growth pattern of individual strains?

      The standard growth curve assay shows no significant difference in growth rate between the lpxC mutant and the wild-type strain (figure below). This likely reflects the fact that standard growth curves are not ideal for capturing small differences in growth/fitness. Therefore, we emphasize the results of the competition experiment, as this is the gold standard method for measuring fitness effects (Figure 3c).

      3.6 There is a possibility that slow growth of lpxC mutant provides benefit under antibiotic stress.

      This is an interesting idea, but in this case, the slow growth of the lpxC mutant is clearly associated with a small decrease in colistin resistance (Figure 3A).

      3.7 Minor comment: the three individual replicates shown in Figure 3a are all identical within a sample and do not add to the figure where n=3. The authors can simply show SD or report correct values of replicates.

      We chose to show the raw data points, as this is the style of presentation that is being increasingly used by journals (i.e., many journals now say show all raw data points when n<6 or 10). It would not make sense to show a standard error as this was equal to 0.

      3.8 In Figure 4, as the authors themselves have stated, the difference in heterogeneity could be simply due to variation within phylogroups and subsequent compositional differences within the populations. The authors must check if mutations were found in the same location of lpxA as found in their own evolved strains. Without this information, the heterogeneity data would be speculative. Adding the lpxC variants reported in figure 2 to the trees of figure 4 (right) will make it clear if their conclusion is justified.

      This is an interesting point. We found no overlap between our experimentally evolved mutations and naturally occurring lpxC mutations, either at the level of nucleotides or codons. However, it is unclear if we should expect to see an overlap for two reasons: 1. The mutations present in natural isolates likely reflect a combination of beneficial mutations, neutral mutations, and weakly deleterious mutations. The mutations found in our evolved isolates, on the other hand, are all mutations that were beneficial under colistin selection. As such, it is probably not reasonable to expect a strong overlap between the two sets of mutations. 2. The lpxC mutations that we observed in our 11 lpxC mutated isolates are highly diverse – we found no cases of parallel evolution at the nucleotide level, and only a single example of parallel evolution at the codon level. Given this, our data suggest that a very wide diversity of sites of lpxC can interact epistatically with MCR-1 to increase colistin resistance. Again, this high diversity of potential lpxC mutations should give a weak association between lab evolved and clinical isolates.

      We have added these points in the text (lines 278-304, pages 11-12).

      3.9 The authors can perform a confirmatory experiment for the pre-existing part of their hypothesis. If they perform the evolutionary ramp experiment with a strain carrying lpxC mutant strain, will they see faster evolution of high MIC mutants?

      This is an interesting idea. Our results suggest that more rapid evolution of high-level colistin resistance would occur in the lpxC mutant compared to a wild-type strain (assuming that both had an equivalent opportunity to acquire MCR-1 by horizontal gene transfer).

      4.0 The rationale of how the presence of lpxC mutations can cause a strain without any colistin resistance to acquire mcr-1 is not addressed. The authors may want to comment on that.

      MCR-1 is carried on conjugative plasmids, and the main plasmid families that carry MCR-1 (IncI2 and IncX4) have high conjugation rates. We have changed the text of the introduction to emphasize that MCR-1 is carried on conjugative plasmids, and we have linked MCR-1 acquisition to plasmid conjugation (lines 327-328, page 13).

    1. Author Response

      Reviewer #2 (Public Review):

      1) “…it was important that the output response was intimately linked to the bound state of the receptor, in this case the TCR, with ligand unbinding rapidly reversing all proofreading steps. This means that dissociation of a single TCR should disrupt signaling, and implicitly assumes a direct physical connection between the bound receptor and the KP modifications. However, this mechanism becomes much harder to argue when the KP steps are physically uncoupled from bound TCR, such as in LAT microclusters or DAG production.”

      We agree that signaling events in the kinetic proofreading chain must be linked to ligand unbinding. We have added discussion to the paragraphs beginning on page 20 line 440 of recent work from Yi et al. 2019 and Lo et al. 2018 suggesting a physical link between bound TCRs and LAT clusters. The full paragraphs are reproduced below.

      “The kinetic proofreading model requires all intermediate steps to reset upon unbinding of the ligand (Fig. 1A). This means that information about the receptor’s binding state must be communicated to all proofreading steps. If kinetic proofreading steps exist beyond the T cell receptor, how is unbinding information conveyed to these effectors? Importantly, there is evidence of physical proximity of LAT with the receptor. While TCR/Zap-70 and LAT/PLCγ microclusters form spatially segregated domains, these domains remain adjacent to one another (Yi et al., 2019). Lo et al. demonstrated that the protein Lck binds Zap-70 with its SH2 domain and LAT with its SH3 domain, potentially bridging the two signaling domains together and propagating binding information (Lo et al., 2018).

      An attractive reset mechanism is the segregation of CD45 away from bound receptors, creating spatial regions in which TCR and LAT associated activating events can occur (S. J. Davis & van der Merwe, 2006). Super-resolution microscopy by Razvag et al. measured TCR/CD45 segregated regions within seconds of antigen contact at the tips of T cell microvilli (Razvag et al., 2018). Upon unbinding, these regions of phosphatase exclusion collapse, allowing CD45 to dephosphorylate receptor ITAMs and LAT clusters. However, the rate of dephosphorylation for LAT and receptor ITAMs could differ. LAT clusters exclude CD45 in reconstituted bilayer systems, potentially limiting the dephosphorylation to LAT molecules at the edges of the cluster thus slowing reset (Su et al., 2016). The kinetics of multivalent protein-protein interactions within TCR and LAT clusters can also influence dephosphorylation and dissociation rates (Goyette et al., 2022).

      A CD45-mediated reset mechanism would restrict proofreading to membrane-bound signaling events occurring within a CD45-depleted region. Downstream events that dissociate away from the membrane or diffuse out of the segregated region could not directly participate in the proofreading chain, as the collapse of a CD45 segregated region could not reset signaling entities released into the cytosol (e.g. release of IP3 in the cleavage of PIP2 to DAG).”

      2) …The data clearly demonstrate a time delay between receptor binding and the measured outputs, but it is not so surprising that this lag would exist in propagating the signal through the intracellular network.

      We apologize for this point of confusion in our methodology. We are unable to measure the time lag between receptor binding and signal propagation through the network because our system is terminated by blue light. Binding is stochastically initiated much like native ligand/receptor interactions. The time values reported in our dataset are the average ligand binding half-lives of the LOV2 ligand under various intensities of constant blue-light illumination, as measured by separate in vitro kinetic washout experiments. Our model is fit to the steady-state signaling output achieved after a 3-minute exposure of cells to LOV2 ligands of an average ligand binding half-life enforced by constant blue-light illumination. We clarify this point by including the following paragraphs beginning on page 8 line 170.

      “We are unable to control when binding events start since our optogenetic system is inhibited by blue-light, as opposed to being activated by blue-light. The initiation of binding after blue-light inhibition is a function of both the stochastic relaxation of inhibited LOV2 back into the binding-state as well as the diffusion of binding-state LOV2 from outside the previously illuminated area. Without temporal control over the start of binding, it is difficult to measure the time delay between ligand binding and a downstream signaling event (Yi et al., 2019). Such studies typically require careful single-molecule imaging of numerous stochastic binding events (Lin et al., 2019).

      To overcome this technical limitation of our system, we chose instead to measure the steady-state output of the antigen signaling cascade achieved several minutes after ligand binding. Kinetic proofreading systems behave differently than non-proofreading systems at steady-state. A non-proofreading system’s steady-state output is set by the number of ligand-bound receptors and not the binding half-lives of those ligands (Fig. 3D, left). In contrast, a kinetic proofreading system can produce different steady-state outputs in response to ligands of different binding half-lives, even when ligand densities are adjusted to achieve equivalent occupancy (Daniels et al., 2006) (Fig. 3D, right). Signaling events take varying amounts of time to occur after ligand binding (Lin et al., 2019; Yi et al., 2019). However, the temporal delays between steps are on the order of tens of seconds. By imaging the cells after minutes of constant exposure to a set ligand binding half-life, we measure the steady state output achieved at a signaling event in the cascade on a longer timescale than these delays (Tischer & Weiner, 2019).”
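      The steady-state distinction drawn in the quoted passage can be made concrete with a minimal sketch. It assumes the classical textbook form of kinetic proofreading, in which a bound receptor completes each of n irreversible steps at a forward rate kp before the ligand unbinds at rate koff, so that the signaling-competent fraction is (kp / (kp + koff))^n. This is not necessarily the equation fitted in the manuscript, and the values of kp, n, occupancy and half-life below are hypothetical.

```python
import math


def off_rate(half_life_s):
    """Unbinding rate (1/s) for a ligand with the given binding half-life (s)."""
    return math.log(2) / half_life_s


def nonproofreading_output(occupancy):
    """Without proofreading, steady-state output tracks receptor occupancy only."""
    return occupancy


def proofreading_output(occupancy, half_life_s, n_steps, kp):
    """Classical kinetic proofreading: each of n_steps must complete at rate kp
    before the ligand unbinds, so only a fraction (kp / (kp + koff))**n_steps
    of bound receptors is signaling-competent at steady state."""
    koff = off_rate(half_life_s)
    return occupancy * (kp / (kp + koff)) ** n_steps


# Hypothetical illustration values (assumptions, not fitted parameters).
KP = 1.0           # forward rate per proofreading step, 1/s
N_STEPS = 3        # number of proofreading steps
OCCUPANCY = 100.0  # bound receptors, matched between the two ligands
SHORT_HALF_LIFE, LONG_HALF_LIFE = 2.0, 10.0  # seconds

np_short = nonproofreading_output(OCCUPANCY)
np_long = nonproofreading_output(OCCUPANCY)
kp_short = proofreading_output(OCCUPANCY, SHORT_HALF_LIFE, N_STEPS, KP)
kp_long = proofreading_output(OCCUPANCY, LONG_HALF_LIFE, N_STEPS, KP)

print("no proofreading:", np_short, np_long)
print("proofreading:   ", round(kp_short, 2), round(kp_long, 2))
```

      At matched occupancy the non-proofreading outputs coincide, whereas the proofreading output is larger for the longer half-life; this is the qualitative signature that a steady-state measurement can detect without timing individual binding events.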

      3) The authors use a simple equation for KP to fit their datasets in Figure 4, equivalently to their previous work. However, no goodness-of-fit metric is provided for these fits, and by manual inspection it is hard to see the defining curves of their KP model in the datasets, especially not for LAT and DAG, where the datasets look much more like vertical bars. The estimated values of steps (n) may well be the best fit to the data, but they are not necessarily a 'good' fit.

      To assist readers in assessing how well our models fit our datasets, we have included heatmaps of the residuals from each model fit (Fig 4S3) on page 52, along with discussion (reproduced below) of the residual plots of regions where our models imperfectly capture our dataset on page 13 line 283.

      “To assess our model fits, we evaluated the residuals of each model subtracted from their respective dataset. For Zap70 recruitment, our model underestimates the degree of activation at moderate binding half-lives and receptor occupancies, as indicated by the positive region in the center of the heatmap. It is possible that Zap70 recruitment reaches saturation at shorter ligand binding half-lives than our model predicts (Fig. 4S3 A). For both LAT clustering and DAG generation, our models performed poorest in the region of lowest occupancy and shortest half-life (Fig. 4S3 B&C). In this region of our dataset, the fluorescent signal from bound LOV2 above the background fluorescence of unbound LOV2 is smallest. To compensate for fluorescence of unbound LOV2, we subtract off the local background fluorescence of unbound LOV2 around each cell. In doing so we may be underestimating the amount of LOV2 bound to each cell, leading to an underestimation of signaling output by the models. Future studies at LOV2 densities approaching single molecule would better capture this regime of receptor occupancy, but cell-to-cell variation in activation would be too high to be compatible with our current steady-state analysis (Lin et al., 2019).”

      4) The values of n are also very high, which would imply that the kp rate constant might be very fast to compensate; no estimates of this value are presented. Recent data from the Dushek lab (Pettmann et al, eLife 2021) measured n to be ~3, which seems much more physically realistic. Furthermore, in their previous published work, Tischer & Weiner measured n to be 2.7 for DAG production but in the present study it is now n=11.3, using the same equation

      We are unable to estimate the kp rate constant, as our datasets are at steady state and do not provide temporal information. To assess the plausibility of our higher n value fits, we explored the steady-state model presented in Ganti et al. PNAS 2020, which defines a kp rate of 0.1 s⁻¹. This model predicts the minimum number of signaling steps required to achieve a defined Hopfield error rate at defined cognate-ligand/self-ligand concentration and half-life ratios. Our exploration of this model is shown in Fig. 4S4 on page 53 and detailed in the discussion on page 14 line 299:

      “In our previous work our model fit fewer (N=2.7) steps to DAG generation. We now fit a higher number of steps (N=11.3) to DAG generation. This change could be due to the incorporation of ICAM into our current study, which has been shown to potentiate ligand discrimination (Pettmann et al., 2021). Furthermore, our previous antibody-based adhesion may have short-circuited some proofreading steps by irreversibly holding the cell membrane close to the supported lipid bilayer. To evaluate if our higher value fits are indeed the best fit values for our datasets, we fit our model to each dataset while holding the value of N constant in the range of zero to fourteen steps, and evaluated the average residual value for each model fit (Fig 4S3 D). For all signaling steps, the fit value of N was near the minima of average residual and had a lower average residual value than a model with 3 proofreading steps.

      To assess the plausibility of a larger number of proofreading steps, we implemented the steady state kinetic proofreading model from Ganti et al. (Ganti et al., 2020). The model estimates the minimum number of proofreading steps required to discriminate between cognate-ligands and self-ligands with different binding half-lives present at a given concentration ratio at a given Hopfield error-rate (Hopfield, 1974). First, we evaluated what combinations of ligand half-lives and concentration ratios an 11-step kinetic proofreading network could discriminate at an error rate less than 10⁻³ (Fig 4S4 A). We chose the error rate of 10⁻³, as it is an order of magnitude less than the theorized 10⁻⁴ upper limit error rate of the native TCR (Ganti et al., 2020). At moderate half-life ratios, an 11-step network can discriminate cognate peptides present in small concentrations (e.g. 1 cognate-ligand per 1000 self-ligands at a half-life ratio of 6).

      In our optogenetic system, the ratio of the average ligand binding half-life between the longest suppressive half-life and the shortest fully activated half-life is about 2. However, an 11-step network is insufficient to discriminate between ligands with a half-life ratio of 2, even at the high ligand ratio of 1 (equal concentrations of cognate- and self-ligand). This suggests our cells are unlikely to be detecting the average ligand binding half-life of each blue-light condition, but are more likely detecting longer-lived binding events from the underlying distribution of half-lives. Another possibility is that our in vitro washout measurements, which measure average ligand binding half-lives of soluble ligands diffusing in three dimensions, differ from the half-lives of ligand-receptor interactions between the cell’s plasma membrane and the supported lipid bilayer diffusing in two dimensions (J. Huang et al., 2010).

      To better explore the kinetic proofreading model space, we generated heatmaps reporting the required number of steps to discriminate combinations of ligand and half-life ratios at an error rate of 10⁻³ (Fig 4S4 B). To discriminate between ligands with a half-life ratio of two, at least 14 steps are needed when the ligands are at equal concentrations, and more than 25 steps are needed if cognate-ligands are 1 per 1000 self-ligands. The required number of proofreading steps decreases rapidly as the half-life ratio increases, reaching a minimum of 8 steps needed for a concentration ratio of 1/1000 and a half-life ratio of 10, which is more in line with physiological half-life ratios between agonist and non-agonist peptides (M. M. Davis et al., 1998).

      After comparing our results with the Ganti model, this analysis suggests that our number of fit proofreading steps may be somewhat inflated as a function of our use of the average ligand binding half-lives from three-dimensional washout experiments in place of the two-dimensional single-molecule information T cells use to make activation decisions. However, the higher fit N values are more consistent with the required number of steps to discriminate ligands under more physiological conditions than our previous measurements of ~3 steps, which would not be expected to discriminate ligands with a half-life ratio of 10 even at a ligand ratio of 1 (Fig 4S4 B, right).”
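      The fixed-N scan described in the quoted passage (refitting with N held constant from zero to fourteen and comparing average residuals) can be sketched as follows. This is an illustration under stated assumptions, not the authors' fitting code: it uses the classical proofreading factor (kp / (kp + koff))^N, leaves a single amplitude as the only free parameter, fixes kp at a hypothetical value, and runs on noise-free synthetic data generated with a known N.

```python
import math

KP = 1.0  # hypothetical forward rate per step (1/s); held fixed in this sketch


def kp_factor(half_life_s, n_steps):
    """Fraction of bound receptors completing n_steps before the ligand unbinds."""
    koff = math.log(2) / half_life_s
    return (KP / (KP + koff)) ** n_steps


def fit_amplitude(xs, ys):
    """Closed-form least-squares amplitude for the one-parameter model y = A * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mean_abs_residual(data, n_steps):
    """Fit the amplitude with N held fixed and return the mean absolute residual."""
    xs = [occ * kp_factor(half_life, n_steps) for occ, half_life, _ in data]
    ys = [output for _, _, output in data]
    amplitude = fit_amplitude(xs, ys)
    return sum(abs(y - amplitude * x) for x, y in zip(xs, ys)) / len(ys)


# Synthetic, noise-free data: (occupancy, half-life in s, output), generated
# with a known number of steps so the scan has a known correct answer.
TRUE_N = 6
data = [
    (occ, half_life, 2.0 * occ * kp_factor(half_life, TRUE_N))
    for occ in (10.0, 50.0, 100.0)
    for half_life in (1.0, 3.0, 10.0)
]

# Scan N from 0 to 14, refitting the amplitude each time.
scan = {n: mean_abs_residual(data, n) for n in range(0, 15)}
best_n = min(scan, key=scan.get)
print("N with the lowest mean residual:", best_n)
```

      On real data the residual curve would not reach zero, and its flatness around the minimum indicates how tightly the data constrain N; comparing the residual at the best N with that at N = 3 mirrors the comparison made in the response.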

      5) If the fitted value of n provides no realistic insight into the KP mechanism, it should not be discussed as though it does.

      The many assumptions of our simplistic model likely result in error in determining the absolute number of fit proofreading steps. We feel the strength of our model lies in capturing the relative increase in the strength of proofreading as signal propagates through the cascade, and not in determining the absolute number of proofreading steps, though it is reassuring that our values are broadly consistent with the expectations of Ganti et al. To highlight the point that relative values are the most important feature of our experiments, we are open to normalizing our n fit values by the fit n of Zap70 for all discussion of our results and the proofreading strength increase shown in Fig 4D if the reviewers think this will better highlight the relative increase in proofreading strength.

      6) While it is good to confirm it, the result that downstream signaling complexes reset more slowly than distal ones is surely to be expected, given the increased number of steps over which ligand unbinding must traverse, as in their Erlang distribution. You would not expect ERK phosphorylation to decrease at the same rate as LAT cluster dissociation for this same reason. However, the fact that the lifetime of LAT clustering (14.2s) or ZAP70 (9.6s) is so different to LOV2 (3.3s) provides good evidence that it is not proofreading, as by definition the measured outputs should rapidly return to the 'unbound' state in line with ligand unbinding. At least for LAT, there must be a 'memory' from previous signalling lasting several seconds, which means the system has not reset, as required for true KP.

      Slower resetting of downstream signaling events in a kinetic proofreading cascade is not a given, as it could be the case that all events reset at the same rate. One requirement for kinetic proofreading is that events in the chain be irreversible on the timescale of the ligand binding half-life. The steps are reset through an orthogonal pathway, as opposed to traversing back down a chain of reversible reactions. Both the TCR and LAT are dephosphorylated by the phosphatase CD45, and it would be possible for CD45 to dephosphorylate both proteins at the same rate (or even dephosphorylate LAT faster than the TCR). To clarify this point, we have expanded the discussion of possible reset mechanisms on page 21, line 451, as reproduced below.

      “An attractive reset mechanism is the segregation of CD45 away from bound receptors, creating spatial regions in which TCR and LAT associated activating events can occur (S. J. Davis & van der Merwe, 2006). Super-resolution microscopy by Razvag et al. measured TCR/CD45 segregated regions within seconds of antigen contact at the tips of T cell microvilli (Razvag et al., 2018). Upon unbinding these regions of phosphatase exclusion collapse, allowing CD45 to dephosphorylate receptor ITAMs and LAT clusters. However, the rate of dephosphorylation for LAT and receptor ITAMs could differ. LAT clusters exclude CD45 in reconstituted bilayer systems, potentially limiting the dephosphorylation to LAT molecules at the edges of the cluster thus slowing reset (Su et al., 2016). The kinetics of multivalent protein-protein interactions within TCR and LAT clusters can also influence dephosphorylation and dissociation rates (Goyette et al., 2022).

      A CD45-mediated reset mechanism would restrict proofreading to membrane-bound signaling events occurring within a CD45-depleted region. Downstream events that dissociate away from the membrane or diffuse out of the segregated region could not directly participate in the proofreading chain, as the collapse of a CD45 segregated region could not reset signaling entities released into the cytosol (e.g. release of IP3 in the cleavage of PIP2 to DAG).”

      We also added discussion of recent work from Harris et al. quantifying the slower timescale of Ca++ and ERK reset upon TCR signal termination on Page 23 line 498 as reproduced below.

      “Recently Harris et al. quantified the reset rate of the downstream signaling events Ca++ release and ERK phosphorylation upon signal inhibition to be 29 seconds and 3 minutes respectively (Harris et al., 2021). They showed both Ca++ and ERK levels can persist across short inhibitions of signaling. What makes LAT clusters different from these persistent downstream events? The dissolution of LAT clusters is directly triggered by the unbinding of ligand from the TCR, and both the TCR and LAT are de-phosphorylated by CD45. To our knowledge, the rates of ERK dephosphorylation and cytosolic Ca++ depletion are not accelerated by TCR unbinding, and both are turned over through constant rather than agonist-gated degradation. A useful future line of inquiry would be to quantify the reset rate for signaling steps throughout the cascade upon ligand unbinding versus orthogonal signal inhibition (e.g. kinase inhibition).”

    1. Author Response

      Reviewer #2 (Public Review):

      In general, the study has several novel comments, the experimental design is appropriate and the manuscript is well written. While the manuscript contains a lot of data, still it is a bit descriptive. There are also some issues, which should be addressed.

      1) In Figure 1E, the authors demonstrate a small but significant decrease in body weight of mutant mice. The difference is not so drastic. They also mentioned that some mice showed kyphosis. Please provide data on what percentage of mutant mice showed kyphosis. Please also provide individual hind limb muscle weight normalized with body weight.

      Thank you for your suggestions. Kyphosis was observed in some (more than one third) of the Dst-b mutant mice, as shown in author response image 1. MRI or CT imaging of the skeleton is necessary to diagnose kyphosis accurately; however, such imaging was not performed in this paper. Therefore, we would prefer not to report the percentage of mutant mice that showed kyphosis.

      We weighed the soleus muscle of the hind limb and present the data (lines 132-135).

      2) There is a lot of variability in the age of the mice employed for this study. For example, in Figure 3, the authors mentioned 23 months old mice (Fig. 3a) and over 20 months old and over 18 months old mice. What was the exact age of the mice? Why three different age mice were used for the same set of experiments? The authors should also comment on whether the onset of myopathy in skeletal and cardiac muscle occurs at the same or different age in mutant mice.

      In response to the comments, we now give the exact ages in each figure legend. The ages vary because we performed many different kinds of experiments at different time points. We now state that the myopathy phenotypes occur at around 16 months of age and older (lines 128-129). As for cardiomyopathy, fibrosis was observed at around 16 months of age and older (Figure 3D,E).

      3) Authors have studied protein aggregation only in the soleus muscle of mutant mice. Do the same types of aggregates also form in cardiomyocytes? They write that desmin aggregates were observed in cardiomyocytes of mutant mice. Please show those results in a supplemental figure.

      Following the suggestion, we now present the data on desmin aggregates in the cardiomyocytes of Dst-bE2610Ter/E2610Ter mice (Figure 4-figure supplement 1).

      4) In Figure 5, the authors suggest that mutant mice have mitochondrial abnormalities. However, this analysis is quite abstract and inconclusive. Immunohistochemical images show higher levels of CytoC and Tom20 whereas QRT-PCR demonstrates a significant decrease in mRNA levels of some of the mitochondria-related molecules. Authors should perform additional experiments to determine whether there is any difference in mitochondrial content between WT and mutant mice. In addition, they should perform some functional assays (i.e. OCR, seahorse experiment etc.) to determine whether mitochondrial oxidative phosphorylation capacity is affected in mutant mice.

      Thank you very much for the comment. Mitochondrial accumulation is a characteristic phenotype of Dst-bE2610Ter/E2610Ter muscle and of other types of MFM. We performed quantitative analyses and added the data (Figure 5B). Mitochondrial accumulation was observed even at a young stage, when protein aggregates were not yet observed (Figure 3-figure supplement 1A). As the reviewer pointed out, it is important to demonstrate changes in mitochondrial function, but at this moment we do not have that assay system and would like to present such data in a future paper, including an analysis of mitophagy.

      5) The morphology of the mitochondria in TEM images shows features that are commonly observed during oxidative damage. Is there any evidence of oxidative stress in skeletal and cardiac muscle of mutant mice?

      Thank you very much for the insightful comment. Gene ontology and KEGG pathway analyses of the RNA-seq data did not show alterations of oxidative stress in the heart. We performed q-PCR for genes associated with oxidative stress in the soleus (Figure 1-figure supplement 3), which did not show alterations in oxidative stress. In the future, we would like to investigate this point further.

      Reviewer #3 (Public Review):

      This manuscript by Yoshioka et al. provides an extensive analysis of cardiac and skeletal muscle in a mouse model of Dst-b mutation. The authors have generated the mutant mouse model to selectively mutate Dst-b isoform of Dystonin and show that such a mutation leads to cardiomyopathy and late-onset myofibrillar myopathy. This is a novel discovery which adds valuable information to the genetic basis and molecular mechanism of MFM mediated by Dst-b. However, the manuscript needs substantial revision and additional feasible experiments.

      In Figure3A, the authors suggest that there are smaller myofibers in the mutated mice however they do not provide enough data to support that. Cross-sectional areas between the mutant and WT have to be counted and represented as bins. This can better show the presence of smaller myofibers and muscle degeneration in the mutant mice.

      Thank you for the helpful comment. We quantified the distribution of cross-sectional area (CSA) in the soleus and present the data in Figure 3C. They indicate that there are smaller myofibers in the mutant mice.
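
      The binned representation the reviewer asks for can be sketched as follows. This is a minimal illustration only: the function name and the 500 µm² bin width are assumptions made for the example, and it is not the analysis used for Figure 3C.

```python
from collections import Counter


def csa_distribution(areas_um2, bin_width_um2=500):
    """Return {bin_start_um2: percent_of_fibers} for a list of fiber areas.

    bin_width_um2 is a hypothetical choice; each fiber is assigned to the bin
    [bin_start, bin_start + bin_width_um2).
    """
    if not areas_um2:
        raise ValueError("need at least one cross-sectional area")
    counts = Counter(int(a // bin_width_um2) * bin_width_um2 for a in areas_um2)
    total = len(areas_um2)
    return {start: 100.0 * counts[start] / total for start in sorted(counts)}
```

      Computing one such distribution per genotype and plotting the two side by side makes a shift toward small-caliber fibers visible as extra weight in the lowest bins.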

      In Figure 3A-B, the authors show that mutant mice have significantly more myofibers with centrally located myonuclei indicating the constant degeneration and regeneration in the mutant mice. Another indicator of this is the number of activated muscle stem cells. Under homeostasis, authors can compare the number of quiescent muscle stem cells and activated muscle stem cells. If there is constant degeneration and regeneration in the mutant muscle, there will be more cycling muscle stem cells and that will further prove such phenotype in question. Alternatively, they can use EdU water and quantify the number of EdU+/Pax7+ cells between the mutant and WT.

      Thank you very much for the interesting comment. We agree that muscle regeneration in Dst-b mutant mice is an interesting subject. We tried to address this issue by making ISH probes for Pax7 and Emerin, which label muscle stem cells (image below). However, we were unable to reach a conclusion at this time. We intend to address this issue in the future.

      In figure 2F, the authors show behavioral tests on the mutant mice of age 1 year. They do not show any significant difference in muscle strength. However, most of the myopathic phenotypes they observe are at 23 months of age, these behavioral tests can be repeated at that age to see if there is more muscle weakness in the mutant mice compared to the WT. Also, are these behavioral test readouts affected by the cardiomyopathy independent of skeletal muscle strength?

      We have used rotarod test and wire hang test to evaluate motor coordination and have reported impairment of motor performance in dt mice (Horie et al., 2020). The purpose of these behavior tests in the present study was to evaluate motor coordination of Dst-b mutant mice compared to dt mice, not to address the skeletal muscle function. The text has been changed to clarify this point (lines 121-123).

      Generally speaking, these behavioral tests, especially the rotarod test, may be affected by cardiac abnormalities. However, it is difficult to draw conclusions from the results of this study, since there were no significant differences in the behavioral experiments.

      They show in Figure 3B that the number of CNFs is affected to a different extent in different muscles. These muscles have a different composition of myofibers, one consisting mostly of slow-type fibers while the other is mostly of fast-type. Whether the Dst-b mutation affects muscle fiber types is not clear. Is there a difference?

      Thank you very much for the insightful comment. We performed qPCR to evaluate whether the Dst-b mutation affects the myofiber type of the soleus muscle (Figure 1-figure supplement 3B). Expression levels of the genes did not differ between WT and Dst-b mutant mice.

      The cardiac myopathy phenotype that is clearly shown in figure 3 is shown in mice of 16 months of age whereas the skeletal muscle myopathy phenotype is shown in 23-month-old mice. The reason for the choice of the age of the mice should be discussed. Does the cardiac phenotype precede the skeletal muscle phenotype? Have they looked at the skeletal muscle phenotype at earlier ages? If so, that data should be provided as well and discussed.

      Thank you for the comment. We analyzed myopathy and cardiomyopathy phenotypes in mice aged 16-23 months and chose high-quality histological photographs. As shown in Figure 3B, CNFs increased in the soleus of all Dst-b mutant mice aged 16-23 months. We added a statement that skeletal myopathy phenotypes occur in mice as early as 16 months of age.

      The authors clearly show the formation of protein aggregates in the myofibers in the mutant mice. They further characterize the composition of these desmin aggregates by determining their co-aggregates such as plectin and αB-crystallin. Another component of the z-disk that has been shown to be involved in the aggregates in MFM is myotilin. The authors should also show the presence/absence and co-aggregation of this protein with the desmin aggregates present in the mutant mice.

      Following the suggestion, we performed immunohistochemistry for myotilin. Myotilin was abnormally accumulated in myofibers of the soleus from Dst-b mutant mice. We thank the reviewer for the helpful comment and have added the data in Figure 4-figure supplement 2.

      The authors show abnormal accumulation of mitochondria through cyt c and Tom20 staining. The increased Tom20 levels in the mutant are shown in figure 5A which is from mice that are 23-month-old. However, in figure 3-figure supplement 1a they also show elevated Tom20 staining in the mutant mice that are only 1-2 months old. However, no other phenotype is observed at this age except for the disrupted mitochondria according to the data provided. This needs to be discussed and addressed.

      We would like to clarify that the data in figure 3-figure supplement 1a are from 3-4-month-old mutant mice. These data show that mitochondrial accumulation precedes CNF and desmin aggregation. We have described this point in the text (lines 206-209).

      In Figure 5, the authors show changes in gene expression levels of genes involved in oxidative phosphorylation which supports the disrupted mitochondrial function. Additionally, ROS levels could be compared between the WT and mutant mice.

      To address the involvement of oxidative stress, we performed q-PCR for genes associated with the oxidative stress response in the soleus (Figure 1-figure supplement 3C). The qPCR data did not show alterations in such genes. In the future, we would like to investigate this point further.

      In Figure 5 authors show disrupted oxidative phosphorylation in the mutant soleus muscle. Is this also associated with the fiber-type switch? Since mouse soleus muscle is a mix of fast and slow fiber types, they can look at differences in gene expression of key marker genes for slow and fast myofibers.

      Thank you very much for the suggestion. We quantified expression levels of muscle fiber-type marker genes (Figure 1-figure supplement 3B). There is no data to suggest the fiber-type switch.

      In figure 2, the authors show that mutant mice increase their body weight at a normal pace until 13 weeks of age after which the mutant mice become lighter than their WT counterparts. Is this suggestive of loss of muscle mass? If so, the authors show the muscle atrophy phenotype in 23-month-old mice with cross-sections. Does this mean muscle atrophy starts at an earlier age at 16 months in these mice? Please provide details on the age of the mice for each experiment. In addition, in the text (line 121) authors phrase that the mutant mice become leaner. Lean usually means a decrease in fat mass and an increase in muscle mass. Is this the case? If so, there is no data to support that and the phenotype in the mutant mice suggests there is muscle atrophy in these mice. Therefore, it would not be appropriate to suggest that these mice get lean. However, it is interesting that the bodyweight of the mutant mice gets significantly lighter after 13 weeks. EchoMRI analysis can be performed between these mice to see the total body composition to determine if there is a change in the different type of fat, lean or water composition.

      Thank you for your comments. We now provide exact ages in each figure legend. We state that skeletal myopathy phenotypes occur in mice as early as 16 months of age, and CSA analysis showed an increase in small-caliber myofibers in the soleus of Dst-b mutant mice. However, the muscle mass of the soleus normalized by body weight was not significantly different between control and Dst-bE2610Ter/E2610Ter mice. Therefore, muscle atrophy may not be pronounced enough to affect muscle weight.

      Because we have not quantified the fat mass in Dst-b mutant mice, we changed the phrase from “the mutant mice become leaner” to “they become lower body weight compare with WT mice” (line 120).

      Authors have performed RNA-Seq for the left ventricle from the mutant and the WT mice. Separate clustering of the WT and the mutant has to be shown at least through a PCA plot. Some IGV tracks to show the expression level changes in key genes between the mutant and WT should be shown as well. In addition, they could show how some of the genes involved in autophagy and protein degradation are affected since these are mainly the mechanism by which there is protein aggregation in MFM's.

      Thank you for your helpful comment. We performed principal component analysis (PCA) and hierarchical clustering. The data showed that the transcriptomic features of WT and Dst-b mutant hearts are separated (Figure 8-figure supplement 1A, B). To evaluate the changes in gene expression levels, we also performed real-time PCR (Figure 8-figure supplement 1C). Our Gene ontology and KEGG pathway analyses of the RNA-seq data in the heart did not suggest alterations in autophagy and protein degradation, while many genes responsible for the unfolded protein response were affected (Figure 8C, Figure 8-figure supplement 1C). Previous studies have reported that the unfolded protein response is abnormal in several animal models of myofibrillar myopathy (Winter et al., 2014; Fang et al., J Clin Invest, 2017). We would like to investigate the mechanisms underlying protein aggregation in Dst-b mutant myofibers in the future.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents analysis of an impressive dataset acquired from sibling pairs, where one child had a specific gene mutation (22q11.2DS), whereas other child served as a blood-related, healthy control. The authors gathered rich, multi-faced data, including genetic profile, behavioral testing, neuropsychiatric questionnaires, and sleep PSG.

      The analyses explore group differences (gene mutation vs. healthy controls) in terms of sleep architecture, sleep-specific brain oscillations and performance on a memory task.

      The authors utilized a solid mix-model statistical approach, which not only controlled for the multi-comparison problem, but also accounted for between-subject and within-family variance. This was supplemented by mediation analysis, exploring the exact interaction between the variables. Remarkably, the two subject groups were gender balanced, and were matched in terms of age and sex.

      Thank you for this endorsement of our approach.

      There are some aspects requiring clarification. In the discussion section, some claims come across as too general, or too speculative, and lack proper evidence in the current analysis or in the references.

      We have extensively revised our discussion, including adding more references and subheadings, which we hope makes our conclusions both more structured and better evidenced (Discussion, pages 27 – 31).

      Furthermore, the authors seem to treat their (child) participants with the gene mutation as forerunners of (adult) schizophrenic patients, to whom they repeatedly compare the findings. However, less than half of these children with 22q11.2DS are expected to develop psychotic disorders. In fact, they are at risk of many other neuropsychiatric conditions (incl. intellectual disability, ASD, ADHD, epilepsy) (cf. introduction section).

      We have revised our introduction (pages 4 – 5) and discussion to clarify the significant comorbidity in 22q11.2DS. We discuss the limitations and future directions of our work in the discussion (page 30).

      Furthermore, the liberal criteria for detecting slow-waves, along with odd topography of the detections, limit the credibility of the slow-wave-related results.

      As there is no single common method for SW detection, as noted on page 37, we prioritised rate of detection in order to provide a robust dataset for spindle-SW coupling analysis. We considered the use of an absolute detection threshold (e.g. −75 µV); however, because our participants spanned a wide range of ages (6 to 20 years), and it is established that the absolute amplitude of the EEG decreases across childhood (e.g. Hahn et al 2020), our view is that an absolute detection threshold would potentially bias the detection of slow waves by age. We have added comments on this matter to the methods section (page 37).
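
      The age-bias argument can be illustrated with a toy comparison of an absolute and a relative amplitude criterion. The relative rule used here (a multiple of the recording's own median trough) and all amplitude values are hypothetical and chosen only for the example; they are not the detection criterion or data from the manuscript.

```python
from statistics import median


def count_absolute(troughs_uv, threshold_uv=-75.0):
    """Count troughs at or below a fixed amplitude threshold (in microvolts)."""
    return sum(1 for t in troughs_uv if t <= threshold_uv)


def count_relative(troughs_uv, factor=1.5):
    """Count troughs at or below `factor` times the recording's own median trough.

    The factor-times-median rule is an assumption made for this sketch.
    """
    threshold_uv = factor * median(troughs_uv)  # troughs are negative, so is this
    return sum(1 for t in troughs_uv if t <= threshold_uv)


# Made-up trough amplitudes; the second recording is the first scaled by 0.5,
# mimicking the lower overall EEG amplitude of an older participant.
younger = [-40.0, -60.0, -80.0, -120.0, -160.0]
older = [0.5 * t for t in younger]

print(count_absolute(younger), count_absolute(older))  # counts differ
print(count_relative(younger), count_relative(older))  # counts match
```

      Because the relative threshold scales with each recording, a uniform change in amplitude leaves the detection count unchanged, whereas the fixed threshold detects fewer events in the lower-amplitude recording.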

      Lastly, we cannot be sure whether the presented memory effects reflect between-group difference in general cognitive capacities, or, as claimed, in overnight memory consolidation.

      We have added statistical analysis of the overnight change in performance (results, page 6) to explore this point. We clarify that although 22q11.2DS is associated with slower learning and worse accuracy in the test session, there is not a difference in overnight change in 22q11.2DS.

      Generally, the current study introduces dataset connecting various aspects of 22q11.2DS. It has a great potential for complementing the current state of knowledge not only in the clinical, but also in sleep-science field.

      Thank you

      Reviewer #2 (Public Review):

      This study examines 22q11.2 microdeletion syndrome in 28 individuals and their unaffected siblings. Though the sample size is small, it is on par with many neuroimaging studies of the syndrome. Part of the interest in this disorder arises from the risk this syndrome confers for neuropsychiatric disorders in general and psychosis specifically. The authors examine sleep neurophysiology in 22q11.2DS and their siblings. Principal findings include increased slow wave and spindle amplitudes in deletion carriers as compared to controls.

      Strengths of this manuscript include:

      • The inclusion of siblings as a control group, which minimizes environmental and (other) genetic confounds

      • The data analyses of the sleep EEG are appropriate and in-depth

      • High-density sleep EEG allows for topographic mapping

      We thank the reviewer for this positive endorsement of our work

      Weaknesses of this manuscript include:

      • The manuscript is framed as an investigation of psychosis and schizophrenia; however, psychotic experiences did not differ between 22q11.2DS and healthy controls. Therefore, the emphasis on schizophrenia and psychosis does not pertain to this sample and the manuscript introduction and discussion should be carefully reframed. The final sentence of the abstract is also not supported by the data: "... our findings may therefore reflect delayed or compromised neurodevelopmental processes which precede, and may be biomarkers for, psychotic disorders".

      We have expanded our abstract, introduction and discussion to reflect the complex neurodevelopment phenotype observed in 22q11.2DS, and discuss the links between our findings, and elements of this phenotype

      • What is the rationale for using a mediation model to test for the association between genotype and psychiatric symptoms? Given the modest sample size would a regression to test the association between genotype and psychiatric symptoms be more appropriate?

      Our rationale for mediation analysis was to expand on making simple group comparisons for various measures by asking if genotype effects on particular psychiatric/behavioural measures were potentially mediated by EEG measures. This is of considerable interest because, as noted above, the behavioural and psychiatric phenotype in 22q11.2DS is complex, and therefore dissection of links between particular EEG features and phenotypes, and asking if EEG measures can be biomarkers for these phenotypes, may give insight into this complexity.

      • From Table 1, which presents means, standard deviations and statistics, it is hard to tell if there is a range of symptoms or if there are some participants with 22q11.2DS who met diagnostic criteria for a listed disorder while others have no or sub-threshold symptoms. This is important and informs the statistical analysis. Given the broad range of psychiatric symptoms, I also wonder if a composite score of psychopathology may be more appropriate. What about other psychiatric symptoms such as depression?

      We have added a supplementary figure to figure 1 to provide individual participants scores on psychiatric measures and FSIQ to fully inform the reader about individual data.

      We have taken the approach of using symptom scores, rather than using binary cut offs for diagnosis, to maximise the use of our dataset, and given many/all psychiatric phenotypes exist on a spectrum, to reflect the difference between clinical and research diagnoses.

      Regarding depression, it has previously been demonstrated in 22q11.2DS that mood disorders are rare at young ages (Chawner et al 2019); given this low frequency, we have not included depression in this dataset.

      We have considered the utility of a composite psychopathology score; however, it is already established that 22q11.2DS can be associated with a broad range of psychiatric/behavioural difficulties, and in this study we were primarily interested in exploring the links (if any) between specific groups of symptoms and specific features of the sleep phenotype. Therefore, we feel a composite psychopathology score would not add to the overall clarity of the manuscript.

      • The age range is very broad spanning 6 to 20 years. As there are marked changes in the sleep EEG with age, it is important to understand the influence of age. The small sample size precludes investigating age by group interactions meaningfully, but the presentation of the ages of 22q11.2DS and controls, rather than means, standard deviations and ranges, would be helpful for the reader to understand the sample.

      We have added scatter plots of EEG measures and age to each figure supplement to allow the reader to see changes with age

      Also, a figure showing individual data (e.g., spindle power) as a function of age and group would be informative. The authors should also discuss the possibility that the difference between the groups may vary as a function of age as has been shown for cortical grey matter volume (Bagaiutdinova et al., Molecular Psychiatry, 2021).

      We have provided plots of individual data with age for our main figures, in the figure supplements. We also note we have included age as a covariate in all main statistical models (methods, page 39). We thank the reviewer for the additional reference, this has been added to the discussion (page 29)

      • There is a large group difference with regards to full scale IQ. IQ is associated with sleep spindles (e.g., Gruber et al., Int J of Psychphsy, 2013; Geiger et al., SLEEP, 2011). For this reason, the authors should control for IQ in all analyses.

      We note that the relationship between spindle characteristics and IQ has been questioned (e.g. Reynolds et al 2018 performed a meta-analysis which suggests no correlation with FSIQ, which argues against the suggested approach). We also note that genotype effects on FSIQ were not mediated by spindle properties. Furthermore, the phenotype in 22q11.2DS is complex; while lower IQ is a well-evidenced part, it is only one component. We are unsure whether it would be justified to regress out only one component of a phenotype.

      • The authors find greater power in the delta and sigma bands in 22q11.2DS compared to their siblings. Looking at the Figure 2, it appears power is elevated across frequencies. If this were the case, this would likely change the interpretation of the findings, and suggest that the sleep EEG likely reflects changes in cortical thickness between controls and 22q11.2DS participants.

      We thank the reviewer for this interesting comment. We have now altered our analysis of spectral data in order to probe differences in overall power, using the IRASA approach described by Hahn et al 2020. We present these results on page 13, use measures derived from this analysis in the mediation and behavioural analyses, and discuss these findings in the discussion (page 29).

      • Along the same lines as the above comment, it would be interesting to examine REM sleep and test how specific to sleep spindles and slow waves these findings are.

      We have now added analysis of REM-derived spectral measures, which we believe complement our finding of altered proportions of REM sleep in 22q11.2DS compared to controls (page 13)

      Reviewer #3 (Public Review):

      In this study, Donnelly and colleagues quantified sleep oscillations and their coordination in young people with 22q11.2 Deletion Syndrome and their siblings. They demonstrate that 22q11.2DS was associated with enhanced power in the slow wave and sleep spindle range, elevated slow-wave and spindle amplitudes and altered coupling between spindles and slow-waves. In addition, spindle and slow-wave amplitudes in 22q11.2DS correlated negatively with the outcomes of a memory test. Overall, the topic and the results of the present study are interesting and timely. The authors employed many thoughtful analyses, making sense out of complicated data. However, some features of the manuscript need further clarification.

      1.) Several interesting results of the manuscript are related to altered sleep spindle characteristics in 22q11.2DS (increased power, increased amplitudes and altered coupling with slow waves). On top of that the authors report, that the spindle frequency was correlated with age. I was wondering whether the authors might want to take these individual (age-related) differences into account in their analyses. The authors could detect the peak spindle frequency per participant and inform their spindle detection procedure accordingly. This procedure might lead to an even more clear cut picture concerning altered spindle activity in 22q11.2DS.

      We thank the reviewer for this informative suggestion. We have now implemented this method, detecting spindles for each individual at a frequency defined through IRASA analysis of the EEG (results, page 13; methods, page 35), and then using the properties of spindles detected through this method in further analysis.
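
      The idea of an individualized detection frequency can be sketched as below, assuming the oscillatory (aperiodic-removed) power spectrum has already been computed by some other means. The function names, the 9 – 16 Hz search band, and the ±1.5 Hz half-width are assumptions for illustration, not the parameters used in the manuscript.

```python
def peak_sigma_frequency(freqs_hz, power, band=(9.0, 16.0)):
    """Return the frequency with the highest power inside `band`.

    `power` is assumed to be the oscillatory component of the spectrum;
    the band limits are hypothetical.
    """
    in_band = [(p, f) for f, p in zip(freqs_hz, power) if band[0] <= f <= band[1]]
    if not in_band:
        raise ValueError("no frequency bins fall inside the search band")
    return max(in_band)[1]  # tuple comparison: highest power wins


def detection_band(peak_hz, half_width_hz=1.5):
    """Individual spindle detection band centred on the participant's peak."""
    return (peak_hz - half_width_hz, peak_hz + half_width_hz)
```

      Each participant's spindle detector would then be run with a band-pass filter set to the band returned for that participant, rather than one fixed band for the whole sample.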

      We have included age as a covariate in all main models (methods, page 39), and present individual data scattered with age in our figure supplements.

      2.) The authors state in the methods section that EEG data was re-referenced to a common average during pre-processing. Did the authors take into account that this reference scheme will lead to a polarity inversion of the signal, potentially over parietal/occipital areas? This inversion will not affect spindle related analyses, but might misguide the detection of slow waves and hence confound related analyses and results.

      We have reviewed our data preprocessing pipeline, and updated it based on the latest methods suggested from the EEGlab authors (methods, page 33). As a supplementary analysis we applied a heuristic signal polarity measure described by the authors of the luna software package https://zzz.bwh.harvard.edu/luna/vignettes/nsrr-polarity/ and did not observe any inversion of polarity in our sample.

      In the included figure (below) we calculated the Hjorth measure of signal polarity as described in luna, at every electrode and plotted a topoplot of the measure. In the figure numbers > 0 represent signals with a positive polarity, values < 0 a negative polarity. As demonstrated in the figure, there were no electrodes with a positive polarity, although we note that the most peripheral electrodes had an approximately neutral polarity, whereas more central electrodes had a slight negative bias.

      We also note that we only detected negative half waves with our slow wave detection algorithm, using a threshold set for each channel based on its own characteristics, so we would not necessarily expect alterations in slow wave detection. Further, other authors have suggested that average referencing does not impact SW detection (e.g. Wennberg 2010).

      3.) I have some issues understanding the reported slow wave - spindle coupling results. Figure 5A indicates that ~100 degrees correspond to the down-state of the slow wave. Figure 5E shows that spindles preferentially clustered at fronto-central electrodes between 0 and 90 degrees, hence they seem to peak towards the slow wave downstate. This finding is rather puzzling given the prototypical grouping of sleep spindles by slow wave up-states (Staresina, 2015; Helfrich, 2018; Hahn, 2020). Could it be that the majority of detected spindles represent slow spindles (9-12 Hz; Mölle, 2011)?

      We observed peaks of spindle activity in the range of 9 – 24 degrees (so on the descending slope from the positive peak of the slow wave), but average spindle frequencies in the 12 – 13 Hz range. Given that we allowed each individual to have an individual spindle detection frequency, as above, and did not observe bimodal distributions of power in the sigma frequency band (Figure 2 Supplement 1), we do not believe our spindles specifically represent slow spindles.

      Slow spindles are known to peak rather at the up- to down-state transition (which would fit the reported results) and show a frontal distribution (which again would fit to the spindle amplitude topographies in Fig 3E). If that was the case, it would make sense to specifically look at fast spindles (12-16 Hz) as well, given their presumed role in memory consolidation (Klinzing, 2019).

      We agree with the reviewer’s assessment of the distribution of the putative spindles we have detected. However, as we and other authors (Hahn et al 2020) have noted, we did not observe discrete fast and slow spindle frequency peaks in our analysis of the PSD (as has been observed by other authors e.g. Cox et al 2017). For this reason, and to reduce the complexity of the manuscript, we believe the best approach with our dataset is to focus on spindles at large, rather than detecting spindles in arbitrary frequency bands.

      In addition, is it possible that the rather strong phase shift from fronto-central to occipital sites is driven by a polarity inversion due to using a common reference (see comment 2)?

      As noted above, we do not observe significant polarity inversion in our signals using the luna heuristic measure. We were not able to identify published literature to inform our investigation of this suggestion, but would be happy to consider any specific suggestions from the reviewer.

      Apart from that I would suggest to statistically evaluate non-uniformity using e.g. the Rayleigh test (both within and across participants).

      We have added an analysis of non-uniformity to the results section (results, page 20).
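
      As an illustration of the kind of non-uniformity analysis the reviewer suggests, the following is a minimal sketch of a Rayleigh test on spindle phases (in radians). The function name `rayleigh_test` and the closed-form p-value approximation are assumptions chosen for illustration; they are not taken from the manuscript, whose exact implementation is not described here.

```python
import cmath
import math


def rayleigh_test(phases_rad):
    """Rayleigh test for non-uniformity of circular data.

    Returns (mean resultant length r, Rayleigh z, approximate p-value).
    """
    n = len(phases_rad)
    # Mean resultant vector of the unit phasors exp(i * phase).
    resultant = sum(cmath.exp(1j * p) for p in phases_rad) / n
    r = abs(resultant)
    z = n * r ** 2
    big_r = n * r
    # Widely used closed-form approximation of the Rayleigh p-value.
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - big_r ** 2)) - (1 + 2 * n))
    return r, z, min(p, 1.0)


# Hypothetical example data: evenly spread phases vs. tightly clustered phases.
uniform_phases = [2 * math.pi * k / 100 for k in range(100)]
clustered_phases = [0.2 + 0.01 * k for k in range(50)]
r_uniform, _, p_uniform = rayleigh_test(uniform_phases)
r_clustered, _, p_clustered = rayleigh_test(clustered_phases)
```

      Applied within a participant, the input would be the phases of all detected spindle peaks; applied across participants, it would be one preferred phase per participant.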

      4.) Somewhat related to the point raised above. The authors state in the methods that slow wave spindle events were defined as time-windows where the peaks of spindles overlapped with slow waves. How was the duration of slow waves defined in this scenario? If it was up- to up-state the authors might miss spindles which lock briefly after the post down-state upstate, thereby overrepresenting spindles that lock to early phases of slow waves. Why not just define a clear slow wave related time-window, such as slow wave down-state {plus minus} 1.5 seconds?

      We have implemented this suggestion (methods, page 38).
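
      The fixed-window definition proposed by the reviewer can be sketched as follows. This is a hypothetical illustration: the function name, the use of down-state (trough) times in seconds, and the inclusive window boundaries are assumptions, not details from the manuscript.

```python
from bisect import bisect_left, bisect_right


def spindles_near_troughs(trough_times, spindle_peak_times, half_window=1.5):
    """For each slow-wave down-state time, list the lags (s) of spindle peaks
    that fall within +/- half_window seconds of it."""
    peaks = sorted(spindle_peak_times)
    coupled = {}
    for t in trough_times:
        lo = bisect_left(peaks, t - half_window)
        hi = bisect_right(peaks, t + half_window)
        coupled[t] = [p - t for p in peaks[lo:hi]]
    return coupled


# Hypothetical event times in seconds.
events = spindles_near_troughs([10.0, 20.0], [9.0, 11.6, 19.5, 25.0])
```

      Because the window is anchored on the down-state rather than on the slow wave's own boundaries, spindles that follow the post-down-state up-state are not excluded by construction.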

      5.) The authors correlated the NREM sleep features with the outcomes of a post-sleep memory test (both encoding and an initial memory test took place pre-sleep). If the authors want to show a clear association between sleep-related oscillations and the behavioural expressions of memory consolidation, taking just the post sleep memory task is probably not the best choice. The post-sleep test will, as the pre-sleep test, in isolation rather reflect general memory related abilities. To uncover the distinct behavioural effects of consolidation the authors should assess the relative difference between the pre- and post-sleep memory performance and correlate this metric with their EEG outcomes.

      We have added evening-morning performance difference as a measure to the results (page 6); however, as there was no difference between groups in overnight change in performance, we focus on morning performance in relating behaviour to EEG outcomes (explored in results, page 6).

    1. Author Response

      Reviewer #1 (Public Review):

      Wang et al. adapt a new statistical framework on a multi-site multi-year database to investigate the effects of environmental variables on the temporal stability of plant communities and biomass productivity in Chinese grassland. The authors show with several lines of evidence that 1. the temporal stability of the region is due to spatial asynchrony of community dynamics, 2. this stability relies on dominant species, but less so on other community metrics, and 3. reductions, but also increasing variability in water availability reduces the stability of the system, with rather important future consequences to humans living in the region.

      A significant strength of the ms lies in solid statistics. Wang et al. apply to a real dataset a new framework (and two pathways, i.e., community-level vs. population-level metrics) with formulas the authors develop (in particular for dominant species). Additionally, they provide a summary/test of the effect of environmental variables in shaping regional stability with SEM analyses. This new framework may be one that the larger ecological and ecosystem academic communities, interested in temporal changes of ecological processes across large spatial scales, are looking for.

      Thank you for your positive assessment of our study. We have tried to incorporate all your suggestions in the revised manuscript.

      Reviewer #2 (Public Review):

      The authors analyse an impressive dataset of field data collected across Inner Mongolian Grasslands to test theory concerning the mechanisms promoting temporal stability of plant biomass.

      Overall, the analyses seem solid, and the paper is based on strong theory, but the overall message is diluted by a large number of different analyses, making the analysis, results, and interpretation confusing in several places.

      The unfocused nature of the analysis and presentation of the results makes it difficult to evaluate whether the authors achieve their aims, and whether their results support the conclusions. My general impression is that they do, but the number of different analyses, supplementary results, etc., really complicates the narrative and interpretation.

      The paper is an interesting test of theory, and a practical test of the theory outlined in a previous paper (Wang et al.) could be a real asset to anyone aiming to explore the mechanisms promoting temporal stability across scales. The dataset too is a large and potentially useful one.

      That said, without a clearer narrative and streamlined set of analyses, it is difficult to interpret the potential impact of this work - which is a shame, because clearly the work put in was considerable. By focusing on only a few key analyses and results, interpretability and potential impact could be much improved.

      Thank you very much for your constructive suggestions. We have revised the paper throughout to increase the focus and readability. To help readers to easily understand stability theory and analysis as used in this study, we added Box 1 which combines the theoretical framework with hypotheses, especially about proposed effects of species diversity on stability, and provides a glossary of terms. In addition, we summarize our approach at the end of the Introduction section and in the Results section first present the analysis using all species and then the analysis using only dominant species. Furthermore, to focus on our main findings, we removed detailed analyses (but deliver them as summary files together with dataset and R script to a third-party data deposition). Finally, we added calculations of CV and synchrony across spatial scales to the Methods section.
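
      For readers unfamiliar with the quantities mentioned, the sketch below computes a temporal coefficient of variation and a commonly used variance-ratio synchrony index from biomass time series. These definitions are assumptions for illustration and are not necessarily the exact formulas used in the manuscript; the function names and example numbers are hypothetical.

```python
import statistics


def temporal_cv(series):
    """Temporal variability: standard deviation over time divided by the mean."""
    return statistics.pstdev(series) / statistics.mean(series)


def synchrony(series_list):
    """Variance-ratio synchrony: var(sum of series) / (sum of sds)^2.

    Equals 1 for perfectly synchronous series and 0 when fluctuations cancel.
    """
    total = [sum(values) for values in zip(*series_list)]
    sd_sum = sum(statistics.pstdev(s) for s in series_list)
    return statistics.pvariance(total) / sd_sum ** 2


# Hypothetical biomass series for two sites (or two species) over three years.
in_phase = synchrony([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
anti_phase = synchrony([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

      The same two functions can be applied at different levels of aggregation (species within a site, sites within a region), which is the sense in which such metrics are computed across spatial scales.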

    1. Author Response

      Reviewer #1 (Public Review):

      The paper presents a Bayesian model framework for estimating individual perceptual uncertainty from continuous tracking data, taking into account motor variability, action cost, and possible misestimation of the generative dynamics. While the contribution is mostly technical, the analyses are well done and clearly explained. The paper provides therefore a didactic resource for students wishing to implement similar models on continuous action data.

      First off, the paper is lucidly written - which made it a very pleasant read, especially compared to many other modeling papers, and the authors are to be congratulated for this. As such, the paper provides a valuable resource for didactic purposes alone. While the employed methods are not necessarily individually novel, the assembly of various parts into a coherent framework appears nonetheless valuable.

      Thank you for the positive evaluation!

      I have two major concerns, though:

      1). My main comment regards the model comparison using WAIC (Figure 4E) or cross-validation (Figure S4a): If we translate these numbers into Bayes factors, they are extraordinarily high. I assume that the p(x_i|\theta_s) in equation 7 are calculated assuming that the motor noise on u_{i,t} is independent? This would assume that motor processes act i.i.d with a timeframe of 60ms, which is probably not a very realistic assumption- given that much of the motor variability (as stated by the authors) comes likely from a central (i.e. planning) origin. Would the delta-WAIC not be much smaller if motor noise was assumed to be correlated across time points? Would this assumption change the \sigma estimates?

      Thank you for posing this question. First, sequential models tend to have much larger differences in the likelihood of parameters given data because of the large number of individual data points within a single sequence. Thus, it is not uncommon for model comparison to show much more extreme differences between models for sequential data, as is the case in the present manuscript.

      Second, since our computational framework is based on LQG control, the model indeed assumes that motor noise is independent across time steps. We agree that this assumption might not be realistic for time steps of 16ms duration. While this assumption is certainly a simplification, the assumption of independent noise across time steps is very common both in perceptual models as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework. It thus applies to all of the models considered in this paper, as they all assume temporally uncorrelated noise, both in perception and action. Therefore, the ranking between the models in the model comparison should hopefully not be affected in a systematic way favoring individual models disproportionately more than others, although the magnitudes of differences in WAIC might be smaller. Since the differences in WAIC are currently in the range of 1e4, we think that they will still be significant, even when accounting for correlated noise.

      Third, we think that the simplifying assumption of independent noise does not invalidate the calculation of the WAIC, which assumes independence across trials. The p(x_i | theta_s) in equation (8) are the likelihoods of whole trials. To compute them, we assume independence of the motor noise across time steps.
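
      The standard WAIC computation on trial-level log-likelihoods can be sketched as below. This is a generic illustration, not the authors' code: `log_lik[s][i]` is assumed to hold log p(x_i | theta_s) for posterior sample s and trial i, where each trial-level value would itself be the sum of per-time-step log-likelihoods under the independence assumption discussed above.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC on the deviance scale from a samples-by-trials log-likelihood table."""
    n_samples = len(log_lik)
    n_trials = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_trials):
        column = [log_lik[s][i] for s in range(n_samples)]
        lppd += log_mean_exp(column)
        mean = sum(column) / n_samples
        p_waic += sum((v - mean) ** 2 for v in column) / (n_samples - 1)
    return -2.0 * (lppd - p_waic)


# Hypothetical table: two posterior samples, two trials.
example_waic = waic([[-1.0, -2.0], [-1.0, -2.0]])
```

      Since each trial contributes a sum over many time steps, even small per-step differences between models accumulate into large WAIC differences, which is consistent with the first point made above.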

      We have added a short passage in the subsection ‘model comparison’:

      “Note that the assumption of independent noise across time steps might lead to WAIC values that are larger than those obtained under a more realistic noise model involving correlations across time. However, this should not necessarily affect the ranking between models in a systematic way, i.e. favoring individual models disproportionately more than others.”

      and a passage in the discussion that points out that modeling the noise as being independent across time points is a simplifying assumption:

      “Finally, assuming independent noise across time steps at the experimental sampling rate (60 Hz) is certainly a simplifying assumption. Nevertheless, the assumption of independent noise across time steps is very common both in models of perceptual inference as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework.”

      2). While the results in Figure 4a are interesting, the deviation of the \sigma estimates from the standard psychophysical estimates for the most difficult condition remains unexplained. What are the limits of this method in estimating perceptual acuity near the perceptual threshold? Is there a problem that subjects just "give up" and the motor cost becomes overwhelming? Would this not invalidate the method for threshold detection?

      We fully agree that for the most difficult conditions at the lowest contrasts all sequential models we considered are biased with respect to the uncertainties obtained with the 2AFC experiment, which is supposed to be equivalent. Interestingly, when considering synthetic data, we did not see such a discrepancy. Thus, the observed bias points towards an additional mechanism such as a computational cost or computational uncertainty, that is not captured by the current models at very low contrast.

      For the results in Fig. 4, we assumed a constant behavioral cost across all conditions. The assumption that the cost is independent of perceptual uncertainty might not hold in reality, exactly in line with your hypothesis that subjects might just "give up". There are other possible explanations, though, that could potentially be relevant here. For example, the visual system is known to integrate visual signals over longer times, when contrast is lower. This may introduce additional non-linearities in the integration, which could affect the sensitivity, as already pointed out in the study by Bonnen et al. (2015).

      We have added the following passage in the discussion section:

      “In the lowest contrast conditions, all models we considered show a large and systematic deviation in the estimated perceptual uncertainty compared to the equivalent 2AFC task. Note that when considering synthetic data, we did not see such a discrepancy. Thus, the observed bias points towards additional mechanisms such as a computational cost or computational uncertainty, that are not captured by the current models at very low contrast. One reason for this could be that the assumption of constant behavioral costs across different contrast conditions might not hold at very low contrasts, because subjects might simply give up tracking the target although they can still perceive its location. Another possible explanation is that the visual system is known to integrate visual signals over longer times at lower contrasts [Dean & Tolhurst, 1986; Bair & Movshon, 2004], which could affect not only sensitivity in a nonlinear fashion but could also lead to nonlinear control actions extending across a longer time horizon. Further research will be required to isolate the specific reasons.”

      Reviewer #2 (Public Review):

      This manuscript develops and describes a framework for the analysis of data from so-called continuous psychophysics experiments, a relatively recent approach that leverages continuous behavioral tracking in response to dynamic stimuli (e.g. targets following a position random walk). Continuous psychophysics has the potential to dramatically improve the pace of data collection without sacrificing the ability to accurately estimate parameters of psychophysical interest. The manuscript applies ideas from optimal control theory to enrich the analysis of such data. They develop a nested set of data-analytic models: Model 1: the Kalman filter (KF), Model 2: the optimal actor (which is a special case of a linear quadratic regulator appropriate for linear dynamics and Gaussian variability), Model 3: the bounded actor w. behavioral costs, and Model 4: the bounded actor w. behavioral costs and subjective beliefs. Each successive model incorporates parameters that the previous model did not. Each parameter is of potential importance in any serious attempt to model human visuomotor behavior. They advertise that their methods improve the accuracy of the inferred values of certain parameters relative to previous methods. And they advertise that their methods enable the estimation of certain parameters that previous analyses did not.

      What were the parameters? In this context, the Kalman filter model has one free parameter: perceptual uncertainty of target position (\sigma). The optimal actor (Model 2) incorporates perceptual uncertainty of cursor position (\sigma_p) and motor variability (\sigma_m), in addition to perceptual uncertainty of target position (\sigma) that is included in the Kalman filter (Model 1). The bounded actor with behavioral costs (Model 3) incorporates a control cost parameter (c) that penalizes effort ('movement energy'). And the bounded actor with behavioral costs and subjective beliefs (Model 4) further incorporates the human observer's possibly mistaken 'beliefs' about target dynamics (i.e. how the human's internal model of target motion differs from the true generative model). The model allows for the true target dynamics (position-random-walk with drift = \sigma_rw) to be mistakenly believed to be governed by a position-random-walk with drift = \sigma_s plus a velocity-random-walk with drift = \sigma_v.
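
      To make the simplest of these models concrete, the following is a minimal sketch of a scalar Kalman filter tracking a position random walk, with the perceptual uncertainty \sigma as the only free parameter. It is a textbook illustration under stated assumptions (scalar state, known random-walk step size \sigma_rw, steady-state gain); the function names are hypothetical and this is not the authors' implementation.

```python
def steady_state_gain(sigma_rw, sigma, n_iter=200):
    """Iterate the scalar Riccati recursion until the Kalman gain settles.

    sigma_rw: standard deviation of the target's random-walk step.
    sigma:    standard deviation of the observation (perceptual) noise.
    """
    q, r = sigma_rw ** 2, sigma ** 2
    p = r  # initial posterior variance; the recursion converges regardless
    gain = 0.0
    for _ in range(n_iter):
        p_pred = p + q                # predict: variance grows by one step
        gain = p_pred / (p_pred + r)  # weight given to the new observation
        p = (1.0 - gain) * p_pred     # update: posterior variance
    return gain


def kalman_track(observations, gain, x0=0.0):
    """Estimate target position by blending each observation with the prior."""
    estimate = x0
    track = []
    for y in observations:
        estimate += gain * (y - estimate)
        track.append(estimate)
    return track


gain_equal_noise = steady_state_gain(1.0, 1.0)
gain_noisy_percept = steady_state_gain(1.0, 5.0)
```

      A larger \sigma lowers the gain, so the estimate follows the target more sluggishly; this is the signature that allows \sigma to be inferred from tracking data in the Kalman filter model.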

      The authors develop each of these models, show on simulated data that true model parameters can be accurately inferred, and then analyze previously collected data from three papers that helped to introduce the continuous psychophysics approach (Bonnen et al. 2015, 2017 & Knoll et al. 2018). They report that, of the considered models, the most sophisticated model (Model 4) provides the best accounting of previously collected data. This model more faithfully approximates the cross-correlograms relating target and human tracking velocities than the Kalman filter model, and is favored by the widely applicable information criterion (WAIC).

      The manuscript makes clear and timely contributions. Methods that are capable of accurately estimating the parameters described above from continuous psychophysics experiments have obvious value to the community. The manuscript tackles a difficult problem and seems to have made important progress. Some topics of central importance were not discussed with sufficient detail to satisfy an interested reader, so I believe that additional discussion and/or analyses are required. But the work appears to be well-executed and poised to make a nice contribution to the field.

      The manuscript, however, was an uneven read. Parts of it were very nicely written, and clearly explained the issues of interest. Other parts seemed organized around debatable logic, making inappropriate comparisons to--and misleading characterizations of--previous work. Other parts still were weakened by poor editing, typos, and grammatical mistakes.

      Overall, it is a nice piece of work. But the authors should provide substantially more discussion so that readers will develop a better intuition of how and why the inference routines enable accurate estimation, and how the values of certain parameters trade off with one another. Most especially, the authors should be very careful to accurately describe and appropriately use the previous literature.

      Thanks for the generous overall assessment and the thorough review! We hope that we can address the points you raised in our revised manuscript with the answers to your specific comments below.

      To summarize, we have substantially revised the discussion section to clarify our reasoning and avoid potential misinterpretations of parts of our manuscript as a misrepresentation of previous work. We have also extended the introduction and the exposition of our models in the results section to help readers develop an intuition about the models and inference routines.

    1. Author Response

      Reviewer #3 (Public Review):

      Gavanetto et al. propose an interesting method to identify membrane proteins based on the analysis of single-molecule AFM (smAFM) force-extension traces obtained from native plasma membranes. In the proposed pipeline, the authors use smAFM to non-specifically probe isolated plasma membranes by recording a large number (millions) of force-extension traces. While, as expected, most of them lack any binding or represent spurious events, the authors use an unsupervised clustering algorithm to identify groups of force-extension curves with a similar mechanical pattern, suggesting that each cluster corresponds to a unique protein species that can be fingerprinted by its specific force-extension pattern. By implementing a Bayesian framework, the authors contrast the identified groups with proteomics databases, which provide the most likely proteins that correspond to the identified force-extension clusters. A set of control experiments complements the manuscript to validate the proposed methodology, such as the application of their pipeline using purified samples or overexpressing a specific protein species to enrich its population.

      The primary strength of the manuscript is its originality, as it proposes a novel application of smAFM as a protein-detection method that can be applied in native samples. This methodology combines ingredients from conventional mass spectrometry and cryoEM; the contour length released upon extending a protein is a direct measure of its sequence extension (related to its mass), but the force pattern contains insightful information about the protein's structure. In this sense, the authors' proposal is very smart. However, the relationship between protein structure and mechanics is far from straightforward, and here perhaps lies one of the main limitations of the proposed method. This is particularly true for the case of membrane proteins, where we cannot talk about protein unfolding in its classical sense but rather about pullout events which is likely what each peak corresponds to (indeed, the authors speak throughout the paper about unfolding events, which I believe is not the correct term).

      We fully agree with the semantic concern of reviewer #3 about the term unfolding. A membrane protein, when pulled with the tip of the AFM, is pulled out of the membrane (see 2 in the image below) and, simultaneously, the segment that is pulled out unfolds (see 3). To our knowledge, force peaks corresponding to a contour length equal to 2 were not consistently observed or reported (when e.g. a transmembrane alpha helix is out of the membrane but folded).

      Since the field evolved with the practice of using the term ‘unfolding’ even for membrane proteins (see for instance (Kessler and Gaub, 2006; Oesterhelt et al., 2000; Yu et al., 2017) and many others), we would prefer to stick with this term.

      In the context of membrane proteins the term unfolding therefore refers to at least the tertiary structure of the protein, because it is not clear when and at which timescale the secondary structures really unfold.

      We pointed this out in Line 131 (and following Lines).

    1. Author Response

      Reviewer #1 (Public Review):

      Cheng et al. address one of the fundamental questions of gene expression regulation - what are the relative contributions of RNA-level and protein-level regulation to the final gene expression levels. In order to do that they take advantage of mainly published datasets, especially tumor datasets where matching somatic copy number alterations (SCNAs), RNA expression and protein expression data is available. Performing proteogenomic analysis (taking DNA, RNA and protein into account) they address several open questions, such as: Is gene compensation happening mainly at the RNA level, protein level or both for each gene? Is this the same across different tissue types and also cellular pathways? Taking advantage of the SCNAs in the DNA, the authors use correlation analysis of DNA to RNA and RNA to protein to determine if the expression of a gene is regulated mainly at the level of RNA or protein in the respective samples.

      Although it is mainly a very descriptive study, the meta-analysis of existing datasets (and one smaller dataset that was newly generated) yields very interesting observations, which will be of interest to the cancer and gene expression community. However, there is limited mechanistic insight into how the observations can be explained. This is not a problem in my view as the observations are interesting enough in themselves.

      The main findings of the study are:

      • In general genes are either regulated at the RNA-level or at the protein level, but rarely at both.

      • This is the first study (at least as far as I know) to look at tissue-specific RNA-level and protein-level compensation across several different tumor types. Interestingly, the authors show tissue specificity of RNA and protein-level compensation - for example lung adenocarcinoma does not show nearly any compensation.

      • Protein complex genes show stronger protein-level regulation than non-complex genes and the opposite trend in regards to RNA level regulation.

      • There seems to be an agreement for genes within the same pathway that they show a similar regulatory mode (either RNA level or protein level).

      • Genes involved in RNA processing, mRNA translation and mitochondrial regulation are generally upregulated at the protein-level in highly aneuploid primary tumor samples.

      However, I do think that two points need to be addressed by additional analyses to strengthen the findings.

      • The authors show that SCNAs are often significantly compensated at the protein-level in most tumor types. This compensation is also normally stronger than RNA level compensation. A technical issue about this finding that needs to be addressed is that this is mainly based on proteomics data that used TMT for quantification. TMT-based quantifications, although quite precise, are not always the most accurate measurements in the sense of capturing the true amplitude of changes. This is due to the so-called ratio compression of TMT mass spec data. The authors need to account for that in order to exclude that this technical limitation of TMT-based proteomics measurements is a main contributor to the protein-level compensation seen. Do the authors also have some proteomics data where label-free quantification or SILAC quantification was used? Do the same conclusions hold true when such data sets are used?

      We thank the reviewer for this point, which we have now addressed through the following literature search and analyses:

      • First, we found several previous studies that observed similar protein-level compensation in yeast and human cells using different detection methods. Dephoure et al. compared two different methods, stable isotope labeling by amino acids in cell culture (SILAC) and tandem mass tag (TMT) based proteomics. The protein-level compensation of gained genes in yeast was discovered by both methods (Figure 2 and Figure 2 – figure supplement 1 of Dephoure et al., 2014). Similarly, Stingele et al. identified protein-level compensation in pairs of isogenic diploid and aneuploid human cell lines by SILAC (Figure 2B of Stingele et al., 2012). Another group also found protein-level compensation in primary human fibroblasts from individuals with Patau (trisomy 13), Edwards (trisomy 18) or Down (trisomy 21) syndromes by an MS3-based approach (Hwang et al., 2021), which should eliminate the interference of ratio distortion (Ting et al., 2011). Taken together, those previous studies suggest that the protein-level compensation is not merely an artifact induced by the technical limitation of TMT-based proteomics.

      • To further validate the protein-level compensation, we performed the same analysis on TCGA (The Cancer Genome Atlas Program) (Research Network et al., 2013) COAD samples for which label-free proteomics data is available (Zhang et al., Nature, 2014). Consistent with TMT-based proteomics, significant compensation at the protein level was found, which is higher for complex genes than non-complex genes (Figure 1 – figure supplement 1C, Supplementary File 1G). As we observed before for COAD (Figure 1C), RNA-level compensation was shown in all groups of DNA change, and was stronger for non-complex genes (deep loss and high gain, FDR<0.005, Figure 1 – figure supplement 1C, Supplementary File 1G). These results suggest that the limitations imposed by the TMT quantification do not alter the conclusions of our analysis on gene compensation. We have now added this data in Figure 1 – figure supplement 1C and Supplementary File 1G and corresponding text at page 5.

      • Many of the statistically significant differences seen - e.g complexed proteins versus non-complexed proteins, highly conserved proteins versus less conserved proteins - have actually a relatively small effect size. It is not 100% clear to me that the authors apply always the most stringent and appropriate statistical evaluation. For example, when two density plots are compared and it is evaluated if the distributions differ significantly from each other (e.g. the median), the authors constantly use a bootstrapping strategy (most plots in Fig 2 and Fig S2). Due to the high number of iterations, bootstrapping is very sensitive to picking up statistical differences, even if there are very small effect size differences (as is the case for many of the comparisons). Would not a KS test be more appropriate to compare two density distributions? If a KS test is applied - do the authors still recapitulate the same statistical significance tendencies as seen with their bootstrapping strategy?

      We thank the reviewer for this comment, and we have addressed it in detail. We have performed the analyses using the Mann-Whitney U test and the Kolmogorov-Smirnov (KS) test (Supplementary File 2K). Compared with bootstrapping, the p-values calculated by the Mann-Whitney U test or KS test were much smaller, close to zero. Therefore, the same statistical significance tendencies were observed no matter which statistical method was used (bootstrapping, Mann-Whitney U test or Kolmogorov-Smirnov test). While the Mann-Whitney U test or KS test carries the risk of p-value inflation due to the high sample number, the bootstrapping method can solve the problem as it is independent from the sample number. Initially we had used the Mann-Whitney U test for all our analyses and were advised to include the bootstrapping method after consultation with the NYU Biostatistics Resource.
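
      The reviewer's concern about small effect sizes can be illustrated with a percentile bootstrap that reports the difference in medians together with a confidence interval, rather than a p-value alone. The sketch below is a generic example; the resampling scheme, the function name and the example data are assumptions and do not reproduce the bootstrapping strategy used in the manuscript, which is not specified here.

```python
import random
import statistics


def bootstrap_median_diff(a, b, n_boot=2000, seed=0):
    """Observed difference in medians (a - b) and a 95% percentile bootstrap CI."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))  # resample with replacement
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resampled_a) - statistics.median(resampled_b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    observed = statistics.median(a) - statistics.median(b)
    return observed, (lower, upper)


# Hypothetical data: group_a is group_b shifted upward by 10 units.
group_b = [float(x) for x in range(50)]
group_a = [x + 10.0 for x in group_b]
observed_diff, (ci_low, ci_high) = bootstrap_median_diff(group_a, group_b)
```

      Reporting the interval makes the magnitude of a difference visible, which is useful when very large samples render almost any difference statistically significant.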

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes a systematic biochemical analysis of UBX proteins in facilitating protein unfolding by the p97-UFD1-NPL4 (referred to here as the p97 complex). The p97 complex binds Ub and unfolds it to allow the ubiquitylated protein to be translocated into the p97 ATPase pore for unfolding. This paper demonstrates that UBX proteins are able to reduce the necessary ubiquitin chain length in order to support unfolding by p97. They explore this using ubiquitylated CMG helicase as a substrate. Removal of CMG helicase from replicated DNA is required for completion of DNA synthesis.

      First the authors demonstrate that the p97 complex only unfolds CMG with very long Ub chains. They then show that the high threshold for Ub is reduced when UBXN7, FAF1 or FAF2 are added. These proteins bind to both the p97 complex and Ub in substrates. This is then followed up in cells by demonstrating that removal of UBXN7 and FAF1 reduces CMG disassembly and is synthetic with reduced CMG ubiquitin ligase activity.

      The conclusion that human p97 requires UBX proteins to support unfolding/segregase activity when Ub chains are short would be strengthened by more precise characterization of the length of ubiquitin chains being studied, as the methods do not precisely determine the chain lengths and how this is overlapping with the number and location of primary ubiquitylation sites on Mcm7.

      Please see our reply above to essential revision point 2 (data in Figure 1-figure supplement 1 and Figure 2-figure supplement 3).

      The in cellulo results, while consistent with a contributing role for FAF1 and UBXN7 in disassembly of the CMG by p97, indicate that either other factors are required in cells or that p97 can disassemble CMG with relatively short chains in cells without the need for the UBX proteins. This needs to be reconciled with the proposed model.

      We now discuss on lines 444-450 that CMG disassembly in the absence of UBXN7 and FAF1 might be promoted by additional UBX proteins not characterised in this study, or else be due to extensive CMG-MCM7 ubiquitylation that bypasses the requirement for UBX proteins (as predicted by our data in Figure 1). Note that short ubiquitin chains on CMG-MCM7 in cells treated with p97 inhibitor need to be interpreted with caution, as it is likely that p97 inhibition lowers the pool of free ubiquitin in cells. This point is discussed on lines 444-445 of the revised manuscript.

      Reviewer #3 (Public Review):

      The ATPase p97 (Cdc48 in yeast) unfolds ubiquitinated substrates with the help of its heterodimeric cofactor UFD1-NPL4 (U-N). Using the previously established CMG helicase complex as model substrate in a fully reconstituted biochemical assay, Fujisawa and Labib show that p97-U-N can efficiently disassemble the helicase complex only when it is modified with multiple, long ubiquitin chains. This is in contrast to the yeast Cdc48-U-N complex, which disassembles helicase complexes carrying long or short (6-10 ubiquitin moieties) chains with similar efficiency. The authors demonstrate that the requirement of p97-U-N for long chains can be overcome by the presence of p97 cofactors of the UBA-UBX type, including UBXN7, FAF1, FAF2 and (much less so) UBXN1. They show that this reduction in the 'ubiquitin threshold' of p97-U-N by UBXN7, FAF1 and FAF2 requires their UBX domain mediating p97 binding. They further show that the UBA and UIM domains of UBXN7 contribute to its activity in the assay, whereas the UBA domain of FAF1 and FAF2 is dispensable. Instead, a coiled-coil domain preceding the UBX domain of FAF1 and FAF2 is required for their activity, and both the coiled-coil-UBX domain organization and its activity are conserved in the worm homologue UBXN-3. Using UBXN7 and FAF1 knockout cells, Fujisawa and Labib then demonstrate that UBXN7 is required for efficient CMG helicase disassembly during S phase, with a minor contribution of FAF1, whereas both cofactors possess redundant roles in mitotic CMG helicase disassembly. Finally, the authors show that UBXN7 and FAF1 double knockout cells are hypersensitive to the NEDDylation inhibitor MLN4924 and suggest that this reflects their importance for p97-U-N unfoldase activity under conditions of restricted ubiquitination activity.

      This manuscript describes the intriguing observation that the yeast and mammalian Cdc48/p97-U-N complexes have distinct requirements, at least in the in vitro assay used, with respect to the substrate's ubiquitination state and to the presence of additional cofactors. While the concept of UBA-UBX cofactors assisting/stimulating Cdc48/p97-U-N activity is well-established, their link to ubiquitin chain length is novel and unexpected. The experiments are performed to a high technical standard, and the conclusions are mostly supported by the data. However, a shortcoming of the paper is that it remains entirely descriptive regarding the effect of the UBX proteins on the ubiquitin threshold, without providing mechanistic insights into their function or the molecular basis underlying the distinct thresholds.

      1) It remains unclear if the failure of p97-U-N to disassemble the helicase complex carrying short ubiquitin chains reflects impaired binding, priming or translocation of the substrate. It should be straightforward to test if the UBA-UBX cofactors simply stabilize the p97-U-N-substrate complex.

      As shown in previous studies, human UFD1-NPL4 bind stably to p97 in the absence of UBX proteins (our new data in Figure 3-figure supplement 2D illustrate this).

      The distinct domain requirements for UBXN7 (UBA, UIM, UBX) and FAF1/FAF2 (coiled-coil-UBX) suggest different mechanisms of stimulation, which should be discussed in more detail.

      We discuss further the roles of UBXN7 and FAF1/FAF2 on lines 533-548.

      The additive defects of the UBXN7 and FAF1 double knockout cells could indicate either redundant functions (as the authors propose) or synergistic function of both cofactors. To that end, the authors could test if UBXN7 and FAF1 can bind simultaneously to the same p97-U-N-substrate complex and if they act synergistically in helicase disassembly, e.g. at limiting cofactor concentrations.

      Previous studies have found that UBXN7 binds to p97 and UFD1-NPL4 with a 1:6:1 ratio and the same is true for FAF1, without any evidence of both UBXN7 and FAF1 binding to the same p97-UFD1-NPL4 complexes (Hanzelmann et al., 2011). Correspondingly, we did not observe any synergistic effect of FAF1 with UBXN7 upon the disassembly of ubiquitylated CMG by p97-UFD1-NPL4, when comparing reactions with a single UBX protein or reactions with both (our unpublished data).

      2) Having all purified proteins at hand, the authors should test which component of the system causes the elevated ubiquitin threshold of mammalian p97-U-N, by combining yeast Cdc48 with mammalian U-N and vice versa, etc.

      We thank the reviewer for this very interesting suggestion. The data are presented in Figure 3, showing that human UFD1-NPL4 and yeast Ufd1-Npl4 set the ubiquitin threshold for their cognate unfoldase enzymes.

      Can yeast Ubx5, which is a clear homologue of UBXN7, substitute for the mammalian UBA-UBX cofactors?

      This was also an interesting suggestion – we tested Ubx5 and didn’t see any stimulation. We didn’t include the data as we lack a positive control for Ubx5 activity.

      3) The authors emphasize that mammalian p97-U-N in the absence of UBA-UBX cofactors requires long ubiquitin chains for activity. However, they should consider the possibility that the critical property is chain topology, rather than chain length. There is evidence that p97-U-N prefers substrates with branched chains (see PMIDs 28512218, 29033132), and multiple ubiquitin chains on the helicase substrate may mimic those.

      We thank the reviewer for raising this important point and we now cite the two papers mentioned above, on lines 171 and 177.

      In the revised version of the manuscript, we characterise carefully the ubiquitin chains that are formed under the various conditions used (Figure 1-figure supplement 1). Importantly, we also show that human p97-UFD1-NPL4 can disassemble highly ubiquitylated CMG, regardless of whether several ubiquitin chains or just one are attached to CMG-Mcm7 (Figure 1-figure supplement A+C; Figure 2-figure supplement 3A).

      Moreover, we also show that human p97-UFD1-NPL4 is comparable to yeast Cdc48-Ufd1-Npl4 in being able to disassemble CMG that is highly ubiquitylated with ‘K48-only’ ubiquitin that cannot form mixed chain linkages (Figure 2-figure supplement 3B).

      These data indicate that p97-UFD1-NPL4 can disassemble heavily ubiquitylated CMG complexes with long K48-linked ubiquitin chains on CMG-Mcm7, regardless of the number of chains and regardless of the presence of other chain linkages (in addition to K48-linked chains).

      It appears that worm CDC48-U-N in the absence of UBXN-3 cannot efficiently disassemble substrate carrying even long chains (Fig. 3 - supplement 2). The authors should discuss this finding in the context of their ubiquitin threshold model.

      This is an interesting point, suggesting that the threshold of C. elegans CDC-48_UFD-1_NPL-4 is even higher than human p97-UFD1-NPL4, in the absence of UBX proteins. However, we think that this issue is beyond the scope of our manuscript and likely requires structural biology to provide a definitive explanation. Our manuscript just uses the C. elegans enzymes to make one simple and clear point – namely that the essential role of the coiled coil domain of human FAF1 is conserved in its worm orthologue UBXN-3.

    1. Author Response

      Reviewer #3 (Public Review):

      In this work, the authors describe a novel method, based on deep learning, to analyze large numbers of yeast cells dividing in a controlled environment. The method builds on existing yeast cell trapping microfluidic devices that have been used for replicative lifespan assay. The authors demonstrate how an optimized microfluidic device can be coupled with deep learning methods to perform automatic cell division tracking and single cell trajectories quantification. The overall performance of the method is impressive: it makes it possible to process large image datasets generated by timelapse microscopy several orders of magnitude faster than manual annotation would require. The method has been carefully tested on several microscopy settings and datasets and compared with known results from the literature in a convincing manner. In addition, the authors show how the analysis pipeline can be enriched with semantic segmentation to quantify cellular physiology and gene expression during their lifespan, creating high quality, high throughput measurements of single cell trajectories. The software, its documentation and related datasets are available through a public repository. Taken together, the authors succeeded in setting up a method that can be a game changer for high throughput longitudinal analysis of yeast cells.

      Overall, the method seems robust and powerful but some aspects need to be clarified and/or extended.

      • The authors chose MATLAB to develop DetecDiv. This is a valid choice but as Python is becoming the standard for deep learning developments it is important to 1/ better justify the use of MATLAB and 2/ discuss how this can be "translated into" and/or linked with Python. This would facilitate adoption by other research teams.

      Using MATLAB as a prototyping language was instrumental for us in establishing the proof of principle of the method reported here since we have a long-standing experience in MATLAB programming. Yet, we fully agree with the reviewer that Python may appear as a more legitimate choice, especially in the field of deep learning. For future work, we are considering moving our code to Python to make it more widely accessible and to more quickly benefit from the latest development in the field. We also envision that Python developers could transpose our methods for their own research interests.

      Last, we note that MATLAB has bidirectional communication with a number of programming languages, including Python. Therefore, it is currently possible to use Python scripts to fully control the DetecDiv pipelines by calling its low-level functions at the command line. Obviously, this may be more cumbersome than native Python code, and it restricts the possibility of using the graphical user interface that we have developed.

      • A critical aspect of deep learning methods is their potential ability to be used on a different datasets and/or experimental setup (transfer learning). The authors explained that a "generalist" model, trained using several datasets perform comparably (or even better) than "specialist" models that are independently trained on a specific dataset. Yet, they do not discuss how accurate would an already trained generalist model perform on a novel dataset made with a different imaging setup and/or a different yeast strain?

      We thank the reviewer for this comment, which is somewhat related to point #2 raised by reviewer #1, regarding the ability of a model to generalize its prediction to various contexts. In the revised version, we now provide clear evidence that the model designed for division counting and RLS analysis, which is trained on WT data only, can successfully predict the lifespan of mutants such as fob1delta and sir2delta (new Figure 2 - Figure supplement 5) and the onset of cell death during stress response assays (Figure 6 - Figure supplement 1).

      However, changing the imaging conditions (e.g. magnification, illumination, etc) would quickly deteriorate the performance of the model, unless it has been exposed to these new conditions upon training. Hence, the purpose of the ‘generalist’ approach we use is to demonstrate that the models we use have the capacity to deal with various imaging conditions when appropriately trained.

      We have added a sentence in the discussion to explain what determines the potential of a model to be successfully employed in different contexts.

    1. Author Response

      Reviewer #1 (Public Review):

      This study investigates low-frequency (LF) local field potentials and high-frequency (HF, >30 Hz) broadband activity in response to the visual presentation of faces. To this end, rhythmic visual stimuli were presented to 121 human participants undergoing depth electrode recordings for epilepsy. Recordings were obtained from the ventral occipito-temporal cortex and brain activity was analyzed using a frequency-tagging approach. The results show that the spatial, functional, and timing properties of LF and HF responses are largely similar, which in part contradicts previous investigations in smaller groups of participants. Together, these findings provide novel and convincing insights into the properties and functional significance of LF and HF brain responses to sensory stimuli.

      Strengths

      • The properties and functional significance of LF and HF brain responses is a timely and relevant basic science topic.

      • The study includes intracranial recordings in a uniquely high number of human participants.

      • Using a frequency tagging paradigm for recording and comparing LF and HF responses is innovative and straightforward.

      • The manuscript is well-written and well-illustrated, and the interpretation of the findings is mostly appropriate.

      Weaknesses

      • The writing style of the manuscript sometimes reflects a "race" between the functional significance of LF and HF brain responses and researchers focusing on one or the other. A more neutral and balanced writing style might be more appropriate.

      We would like first to thank the reviewer for his/her positive evaluation as well as constructive and helpful comments for revising our manuscript.

      Regarding the writing style: we had one major goal in this study, which is to investigate the relationship between low and high frequencies. However, it is fair to say – as we indicate in our introduction section – that low frequency responses are increasingly cast aside in the intracranial recording literature. That is, an increasing proportion of publications simply disregard the evoked electrophysiological responses that occur at the low end of the frequency spectrum, to focus exclusively on the high-frequency response (e.g., Crone et al., 2001; Flinker et al., 2011; Mesgarani and Chang, 2012; Bastin et al., 2013; Davidesco et al., 2013; Kadipasoaglu et al., 2016; 2017; Shum et al., 2013; Golan et al., 2016; 2017; Grossman et al., 2019; Wang et al., 2021, see list of references at the end of the reply).

      Thus, on top of the direct objective comparison between the two types of signals that our study originally provides, we think that it is fair to somehow reestablish the functional significance of low frequency activity in intracranial recording studies.

      The writing style reflects that perspective rather than a race between the functional significance of LF and HF brain responses.

      • It remains unclear whether and how the current findings generalize to the processing of other sensory stimuli and paradigms. Rhythmic presentation of visual stimuli at 6 Hz with face stimuli every five stimuli (1.2 Hz) represents a very particular type of sensory stimulation. Stimulation with other stimuli, or at other frequencies likely induce different responses. This important limitation should be appropriately acknowledged in the manuscript.

      We agree with Reviewer 1 (see also Reviewer 2) that it is indeed important to discuss whether the current findings generalize to other brain functions and to previous findings obtained with different methodologies. We argue that our original methodological approach maximizes the generalizability of our findings.

      First, the frequency-tagging approach is a longstanding stimulation method, dating from the 1930s (i.e., well before standard evoked potential recording methods; Adrian & Matthews, 1934; intracranially: Kamp et al., 1960) and widely used in vision science (Regan, 1989; Norcia et al., 2015) but also in other domains (e.g., auditory, somato-sensory stimulation). More importantly, this approach not only significantly increases the signal-to-noise ratio of neural responses, but also the objectivity and the reliability of the LF-HF signal comparison (objective identification and quantification of the responses, very similar analysis pipelines).

      Second, regarding the frequency of stimulation, our scalp EEG studies with high-level stimuli (generally faces) have shown that the frequency selection has little effect on the amplitude and the shape of the responses, as long as the frequency is chosen within a suitable range for the studied function (Alonso-Prieto et al., 2013). Regarding the paradigm used specifically in the present study (originally reported in Rossion et al., 2015 and discussed in detail for iEEG studies in Rossion et al., 2018), it has been validated with a wide range of approaches (EEG, MEG, iEEG, fMRI) and populations (healthy adults, patients, children and infants), identifying typically lateralized occipito-temporal face-selective neural activity with a peak in the middle section of the lateral fusiform gyrus (Jonas et al., 2016; Hagen et al., 2020 in iEEG; Gao et al., 2018 in fMRI).

      Importantly, specifically for the paradigm used in the present study, our experiments have shown that the neural face-selective responses are strictly identical whether the faces are inserted at periodic or non-periodic intervals within the train of nonface objects (Quek & Rossion, 2017), that the ratio of periodicity for faces vs. objects (e.g., 1/5, 1/7 … 1/11) does not matter as long as the face-selective responses do not overlap in time (Retter & Rossion, 2016; Retter et al., 2020) and that the responses are identical across a suitable range of base frequency rates (Retter et al., 2020).

      Finally, we fully acknowledge that the category-selective responses would be different in amplitude and localization for other types of stimuli, as also shown in our previous EEG (Jacques et al., 2016) and iEEG (Hagen et al., 2020) studies. Yet, as indicated in our introduction and discussion section, there are many advantages of using such a highly familiar and salient stimulus as faces, and in the visual domain at least we are confident that our conclusions regarding the relationship between low and high frequencies would generalize to other categories of stimuli.

      We added a new section on the generalizability of our findings at the end of the Discussion, p.32-33 (line 880) (see also Reviewer 2’s comments). Please see above in the “essential revisions” for the full added section.

      Reviewer #2 (Public Review):

      The study by Jacques and colleagues examines two types of signals obtained from human intracortical electroencephalography (iEEG) measures, the steady-state visual evoked potential and a broadband response extending to higher frequencies (>100 Hz). The study is much larger than typical for iEEG, with 121 subjects and ~8,000 recording sites. The main purpose of the study is to compare the two signals in terms of spatial specificity and stimulus tuning (here, to images of faces vs other kinds of images).

      The experiments consisted of subjects viewing images presented 6 times per second, with every 5th image depicting a face. Thus the stimulus frequency is 6 Hz and the face image frequency is 1.2 Hz. The main measures of interest are the responses at 1.2 Hz and harmonics, which indicate face selectivity (a different response to the face images than the other images). To compare the two types of signals (evoked potential and broadband), the authors measure either the voltage fluctuations at 1.2 Hz and harmonics (steady-state visually evoked potential) or the fluctuations of broadband power at these same frequencies.
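      The logic of this frequency-tagging design can be sketched with a small simulation. The example below is only illustrative: the sampling rate, duration and response amplitudes are arbitrary assumptions, the helper names are hypothetical, and the amplitude is read out with a plain discrete Fourier sum rather than the analysis pipeline used in the study.

```python
import cmath
import math


def simulate_tagged_signal(fs=100, duration_s=10, base_hz=6.0, face_every=5,
                           base_amp=1.0, face_amp=0.5):
    """Sum of a base-rate response and a face-selective response (toy model)."""
    face_hz = base_hz / face_every  # every 5th image at 6 Hz gives 1.2 Hz
    n = int(fs * duration_s)
    return [
        base_amp * math.sin(2 * math.pi * base_hz * k / fs)
        + face_amp * math.sin(2 * math.pi * face_hz * k / fs)
        for k in range(n)
    ]


def amplitude_at(signal, fs, freq_hz):
    """Amplitude of the signal at one frequency via a direct Fourier sum."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / n


if __name__ == "__main__":
    fs = 100
    sig = simulate_tagged_signal(fs=fs)
    for f in (1.2, 1.7, 6.0):
        print(f, "Hz amplitude:", round(amplitude_at(sig, fs, f), 6))
```

      Because the tagged frequencies are known in advance, the response can be quantified at exactly 1.2 Hz (and its harmonics) and compared with neighbouring frequencies where no response is expected, which is the sense in which the identification of the response is objective.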

      Much prior work has led to the interpretation of the broadband signal as the best iEEG correlate of spatially local neuronal activity, with some studies even linking the high-frequency broadband signal to the local firing rate of neurons near the electrode. In contrast, the evoked potential is often thought to arise from synchronous neural activity spread over a relatively large spatial extent. As such, the broadband signal, particularly in higher frequencies (here, 30-160 Hz) is often believed to carry more specific information about brain responses, both in terms of spatial fidelity to the cortical sources (the cortical point spread function) and in terms of functional tuning (e.g., preference for one stimulus class over another). This study challenges these claims, particularly, the first one, and concludes that (1) the point spread functions of the two signals are nearly identical, (2) the cortical locations giving rise to the two signals are nearly identical, and (3) the evoked potential has a considerably higher signal-to-noise ratio.

      These conclusions are surprising, particularly the first one (same point spread functions) given the literature which seems to have mostly concluded that the broadband signal is more local. As such, the findings pose a challenge to the field in interpreting the neuronal basis of the various iEEG signals. The study is large and well done, and the analysis and visualizations are generally clear and convincing. The similarity in cortical localization (which brain areas give rise to face-selective signals) and in point-spread functions are especially clear and convincing.

      We thank the reviewer for his/her fair and positive evaluation of our work and helpful comments.

      Although the reviewer does not disagree or criticize our methodology, we would like to reply to their comment about the surprising nature of our findings (particularly the similar spatial extent of LF and HF). In fact, we think that there is little evidence for a difference in ‘point-spread’ function in the literature, and thus that these results are not really that surprising. As we indicate in the original submission (discussion), in human studies, to our knowledge, the only direct comparisons of the spatial extent of LF and HF responses are performed by counting and reporting the number of electrodes showing a significant response in the two signals (Miller et al., 2007; Crone et al., 1998; Pfurtscheller et al., 2003; see list of references at the end of the reply). Overall, these studies find a smaller number of significant electrodes with HF compared to LF. Intracranial EEG studies pointing to a more focal origin of HF activity generally cite one or several of these publications (e.g. Shum et al., 2013). In the current study, we replicate this finding and provide additional analyses showing that it is confounded with SNR differences across signals and created artificially by the statistical threshold. When no threshold is used and a more appropriate measure of spatial extent is computed (here, spatial extent at half maximum), we find no difference between the two signals, except for a small difference in the left anterior temporal lobe. Moreover, in the intracranial EEG literature, the localness of the HF response is often backed by the hypothesis that HF is a proxy for firing rate. Indeed, since spikes are supposed to be local, it is implied that HF has to be local as well. However, while clear correlations have been found between HF measured with micro-electrodes and firing rate (e.g., Nir et al., 2007; Manning et al., 2009), there is no information on how local the activity measured at these electrodes is, and no evidence that the HF signal is more local than the LF signal in these recordings. Last, the link between (local?) firing rate and HF/broadband signal has been shown using micro-electrodes, which differ vastly in size from macro-electrodes. The nature of the relationship and its spatial properties may differ between micro-electrodes and the macro-electrodes used in ECOG/SEEG recordings.

      We feel these points were all already discussed thoroughly in the original submission of the manuscript (see p. 28-30 in the revised manuscript) and did not modify the revised manuscript.

      The lack of difference between the two signals (other than SNR), might ordinarily raise suspicion that there is some kind of confound, meaning that the two measures are not independent. Yet there are no obvious confounds: in principle, the broadband measure could reflect the high-frequency portion of the evoked response, rather than a separate, non-phase locked response to the signal. However, this is unlikely, given the rapid fall-off in the SSVEP at frequencies much lower than the 30 Hz low-frequency end of the broadband measure. And the lack of difference between the two signals should not be confused for a null result: both signals are robust and reliable, and both are largely found in the expected parts of the brain for face selectivity (meaning the authors did not fail to measure the signals - it just turns out that the two measures have highly similar characteristics).

      The current reviewer and reviewer #3 both commented on or raised concerns about the possibility that the HF signal as measured in our study might be contaminated by the LF evoked response, thereby explaining our finding of a strong similarity between the two signals.

      This was actually a potential (minor) concern given the time-frequency (wavelet) parameters used in the original manuscript. Indeed, the frequency bandwidth (measured as half width at half maximum, HWHM) of the wavelet used at the lower bound (30 Hz) of the HF signal extended down to 11 Hz (i.e., HWHM = 19 Hz). At 40 Hz, the bandwidth extended down to 24 Hz (i.e., HWHM = 16 Hz). While low-frequency face-selective responses in that range (above 16 Hz) are negligible (see e.g., Retter & Rossion, 2016; and data below for the present study), they could potentially have slightly contaminated the high-frequency activity.

      To fully ensure that our findings could not be explained by such a contamination, we recomputed the HF signal using wavelets with a smaller frequency bandwidth and changed the high frequency range to 40-160 Hz. This ensures that the lowest frequency included in the HF signal (defined as the bottom of the frequency range minus half of the frequency bandwidth, i.e., half width at half maximum) is 30 Hz, which is well above the highest significant harmonic of face-selective response in our frequency-tagging experiment (i.e., 22.8 Hz; defined as the harmonic of face frequency where, at group level, the number of recording contacts with a significant response was not higher than the number of significant contacts detected for noise in bins surrounding harmonics of the face frequency, see figure below). Thus, the signal measured in the 40-160 Hz range is not contaminated by lower frequency evoked responses.
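      The arithmetic behind this argument can be written out as a short check. The numbers for the original wavelets (30 Hz lower bound, HWHM of 19 Hz) and the 22.8 Hz harmonic come from the reply above; the 10 Hz HWHM for the revised wavelets is only inferred from the stated 40 Hz lower bound and 30 Hz lowest included frequency, and the function names are hypothetical.

```python
from fractions import Fraction

FACE_FREQ_HZ = Fraction(6, 5)            # 1.2 Hz face frequency (6 Hz / 5)
HIGHEST_HARMONIC_HZ = 19 * FACE_FREQ_HZ  # 19 x 1.2 Hz = 22.8 Hz


def lowest_included_frequency(band_lower_hz, hwhm_hz):
    """Lower edge of the HF measure: band lower bound minus wavelet HWHM."""
    return band_lower_hz - hwhm_hz


def overlaps_evoked_response(band_lower_hz, hwhm_hz,
                             highest_harmonic_hz=HIGHEST_HARMONIC_HZ):
    """True if the HF measure reaches down to the highest significant harmonic."""
    return lowest_included_frequency(band_lower_hz, hwhm_hz) <= highest_harmonic_hz


if __name__ == "__main__":
    # Original parameters: 30 Hz lower bound with an HWHM of 19 Hz.
    print("original lower edge:", lowest_included_frequency(30, 19), "Hz,",
          "overlap:", overlaps_evoked_response(30, 19))
    # Revised parameters: 40 Hz lower bound; HWHM of 10 Hz is inferred, not stated.
    print("revised lower edge:", lowest_included_frequency(40, 10), "Hz,",
          "overlap:", overlaps_evoked_response(40, 10))
```

      Under these assumptions the original lower edge (11 Hz) falls below the 22.8 Hz harmonic, whereas the revised lower edge (30 Hz) lies above it, which is the separation the revised analysis relies on.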

      We recomputed all analyses and statistics as reported in the original manuscript with the new HF definition. Overall, this change had very little impact on the findings, except for slightly lower correlation between HF and LF (in Occipital and Anterior temporal lobe) when using single recording contacts as unit data points (Note that we slightly modified the way we compute the maximal expected correlation. Originally we used the test-retest reliability averaged over LF and HF; in the revised version we use the lower reliability value of the 2 signals, which is more correct since the lower reliability is the true upper limit of the correlation). This indicates that the HF activity was mostly independent from phase-locked LF signal already in the original submission. However, since the analyses with the revised time-frequency analyses parameters enforce this independence, the revised analyses are reported as the main analyses in the manuscript.

      The manuscript was completely revised accordingly and all figures (main and supplementary) were modified to reflect these new analyses. We also extended the methods section on HF analyses (p. 37) to indicate that HF parameters were selected to ensure independence of the HF signal from the LF evoked response, and provide additional information on wavelet frequency bandwidth.

      There are some limitations to the possible generalizability of the conclusions drawn here. First, all of the experiments are of the same type (steady-state paradigm). It could be that with a different experimental design (e.g., slower and/or jittered presentation) the results would differ. In particular, the regularity of the stimulation (6 Hz images, 1.2 Hz faces) might cause the cortex to enter a rhythmic and non-typical state, with more correlated responses across signal types. Nonetheless, the steady-state paradigm is widely used in research, and even if the conclusions turn out to hold only for this paradigm, they would be important. (And of course, they might generalize beyond it.)

      We understand the concern of the reviewer and appreciate the last statement about the wide use of the steady-state paradigm and the importance of our conclusions. Above that, we are very confident that our results can be generalized to slower and jittered presentations. Indeed, with this paradigm in particular, we have compared different frequency rates and periodic and nonperiodic stimulations in previous studies (Retter & Rossion, 2016; Quek et al., 2017; Retter et al., 2020). Importantly, specifically for the paradigm used in the present study, the neural face-selective responses are strictly identical whether the faces are inserted at periodic or non-periodic intervals within the train of nonface objects (Quek & Rossion, 2017), showing that the regularity of stimulation does not cause a non-typical state.

      Please see our reply above to essential revisions and reviewer 1, in which we fully address this issue, as well as the revised discussion section (p. 32-33).

      A second limitation is the type of stimulus and neural responses - images of faces, face-selectivity of neural responses. If the differences from previous work on these types of signals are due to the type of experiment - e.g., finger movements and motor cortex, spatial summation and visual cortex - rather than to the difference in sample size or type of analysis, then the conclusions about the similarity of the two types of signals would be more constrained. Again, this is not a flaw in the study, but rather a possible limitation in the generality of the conclusions.

      This is a good point, which has been discussed above also. Please note that this was already partly discussed in the original manuscript when discussing the potential factors explaining the spatial differences between our study and motor cortex studies:

      “Second, the hypothesis for a more focal HF compared to LF signals is mostly supported by recordings performed in a single region, the sensorimotor cortex (Miller et al., 2007; Crone et al., 1998; Pfurtscheller et al., 2003; Hermes et al., 2012), which largely consist of primary cortices. In contrast, here we recorded across a very large cortical region, the VOTC, composed of many different areas with various cortical geometries and cytoarchitectonic properties. Moreover, by recording higher-order category-selective activity, we measured activity confined to associative areas. Both neuronal density (Collins et al., 2010; Turner et al., 2016) and myelination (Bryant and Preuss, 2018) are substantially lower in associative cortices than in primary cortices in primates, and these factors may thus contribute to the lack of spatial extent difference between HF and LF observed here as compared to previous reports.” (p. 29-30).

      Also in the same section (p. 30) we refer to the type of signals compared in previous motor cortex studies:

      “Third, previous studies compared the spatial properties of an increase (relative to baseline) in HF amplitude to the spatial properties of a decrease (i.e. event-related desynchronization) of LF amplitude in the alpha and beta frequency ranges (Crone et al.,1998; 2001; Pfurtscheller et al., 2003; Miller et al., 2007; Hermes et al., 2012). This comparison may be unwarranted due to likely different mechanisms, brain networks and cortical layers involved in generating neuronal increases and decreases (e.g., input vs. modulatory signal, Pfurtscheller and Lopes da Silva, 1999; Schroeder and Lakatos, 2009). In the current study, our frequency-domain analysis makes no assumption about the increase and decrease of signals by face relative to non-face stimuli.”

      In the original submission, we also acknowledged that the functional correspondence between LF and HF signals is not at ceiling (p. 31) :

      “We acknowledge that the correlations found here are not at ceiling and that there were also slight offsets in the location of maximum amplitude across signals along electrode arrays (Figures 5 and 6). This lack of a complete functional overlap between LF and HF is also in line with previous reports of slightly different selectivity and functional properties across these signals, such as a different sensitivity to spatial summation (Winawer et al., 2013), to selective attention (Davidesco et al., 2013) or to stimulus repetition (Privman et al., 2011). While part of these differences may be due to methodological differences in signal quantification, they also underline that these signals are not always strongly related, due to several factors. For instance, although both signals involve post-synaptic (i.e., dendritic) neural events, they nevertheless have distinct neurophysiological origins (that are not yet fully understood; see Buzsáki, 2012; Leszczyński et al., 2020; Miller et al., 2009). In addition, these differing neurophysiological origins may interact with the precise setting of the recording sites capturing these signals (e.g., geometry/orientation of the neural sources relative to the recording site, cortical depth in which the signals are measured).”

      Additional arguments regarding the generalizability can be found in the added section of the discussion as mentioned above.

      Finally, the study relies on depth electrodes, which differs from some prior work on broadband signals using surface electrodes. Depth electrodes (stereotactic EEG) are in quite wide use so this too is not a criticism of the methods. Nonetheless, an important question is the degree to which the conclusions generalize, and surface electrodes, which tend to have higher SNR for broadband measures, might, in principle, show a different pattern than that observed here.

      This is an interesting point, which obviously cannot be addressed in our study. We agree with the reviewer’s point. However, in contrast to ECoG, which is restricted to superficial cortical layers and gyri, SEEG has the advantage of sampling all cortical layers and a wide range of anatomical structures (gyri, sulci, and deep structures such as medial temporal structures). Therefore, we believe that using SEEG ensures maximal generalizability of our findings. Overall, the relatively low spatial resolution of these 2 recording methods (i.e., several millimeters) compared to the average cortical thickness (~2-3 mm) makes it very unlikely that SEEG and ECoG would reveal different patterns of LF-HF functional correspondence.

      We added this point in a new section on the generalizability of our findings at the end of the Discussion (p.33, line 896).

      Overall, the large study and elegant approach have led to some provocative conclusions that will likely challenge near-consensus views in the field. It is an important step forward in the quantitative analysis of human neuroscience measurements.

      We sincerely thank the reviewer for his/her appreciation of our work.

      Reviewer #3 (Public Review):

      Jacques et al. aim to assess properties of low and high-frequency signal content in intracranial stereo encephalography data in the human associative cortex using a frequency tagging paradigm using face stimuli. In the results, a high correspondence between high- and low-frequency content in terms of concordant dynamics is highlighted. The major critique is that the assessment in the way it was performed is not valid to disambiguate neural dynamics of responses in low- and high-frequency bands and to make general claims about their selectivity and interplay.

      The periodic visual stimulation induces a sharp non-sinusoidal transient impulse response with power across all frequencies (see Fig. 1D time-frequency representation). The calculated mean high-frequency amplitude envelope will therefore be dependent on properties of the used time-frequency calculation as well as noise level (e.g. 1/f contributions) in the chosen frequency band, but it will not reflect intrinsic high-frequency physiology or dynamics as it reflects spectral leakage of the transient response amplitude envelope. For instance, one can generate a synthetic non-sinusoidal signal (e.g., as a sum of sine + a number of harmonics) and apply the processing pipeline to generate the LF and HF components as illustrated in Fig. 1. This will yield two signals which will be highly similar regardless of how the LF component manifests. The fact that the two low and high-frequency measures closely track each other in spatial specificity and amplitudes/onset times and selectivity is due to the fact that they reflect exactly the same signal content. It is not possible with the measures as they have been calculated here to disambiguate physiological low- and high-frequency responses in a general way, e.g., in the absence of such a strong input drive.

      The reviewer expresses strong concerns that our measure of HF activity is merely a reflection of spectral leakage from lower-frequency evoked responses. In other words, physiological HF activity would not exist in our dataset and would be artificially created by our analyses. We should start by mentioning that this comment is in no way specific to our study, but could in fact be directed at all electrophysiological studies measuring stimulus-driven responses in higher frequency bands.

      Reviewer 2 also commented on the possible contamination of evoked response in HF signal.

      This was actually a potential (minor) concern given the time-frequency (wavelet) parameters used in the original manuscript. Indeed, the frequency bandwidth (measured as half width at half maximum, HWHM) of the wavelet used at the lower bound (30 Hz) of the HF signal extended down to 11 Hz (i.e., HWHM = 19 Hz). At 40 Hz, the bandwidth extended down to 24 Hz (i.e., HWHM = 16 Hz). While low-frequency face-selective responses in that range (above 16 Hz) are negligible (see e.g., Retter & Rossion, 2016; and data below for the present study), they could indeed have slightly contaminated the high-frequency activity.

      To ensure that our findings cannot be explained by such a contamination, we recomputed the HF signal using wavelets with a smaller frequency bandwidth and changed the frequency range to 40-160 Hz. This ensures that the lowest frequency included in the HF signal (defined as the bottom of the frequency range minus half of the frequency bandwidth, i.e., half width at half maximum) was 30 Hz. This was well above the highest significant harmonic of the face-selective response in our FPVS experiment, which was 22.8 Hz (defined as the harmonic of the face frequency where, at group level, the number of recording contacts with a significant response was not higher than the number of significant contacts detected for noise in bins surrounding harmonics of the face frequency, see figure below). This ensures that the signal measured in the 40-160 Hz range is not contaminated by lower-frequency evoked responses.
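
      The bandwidth argument above comes down to simple arithmetic, which can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the 10 Hz half width for the revised wavelet is an assumption inferred from the stated 40 Hz lower bound and 30 Hz lowest included frequency, not a value reported in the response.

```python
FACE_FREQ_HZ = 1.2               # face presentation rate (from the text)
HIGHEST_SIG_HARMONIC_HZ = 22.8   # highest significant face-selective harmonic (from the text)


def lowest_included_frequency(center_hz, hwhm_hz):
    """Lower edge of a wavelet pass-band: centre frequency minus HWHM."""
    return center_hz - hwhm_hz


def overlaps_evoked_range(lower_edge_hz):
    """True if the pass-band reaches down to the significant harmonics."""
    return lower_edge_hz <= HIGHEST_SIG_HARMONIC_HZ


# Original parameters quoted in the response.
orig_30 = lowest_included_frequency(30, 19)     # 11 Hz
orig_40 = lowest_included_frequency(40, 16)     # 24 Hz

# Revised lower bound; HWHM = 10 Hz is inferred (40 - 30), not reported.
revised_40 = lowest_included_frequency(40, 10)  # 30 Hz

# 22.8 Hz corresponds to the 19th harmonic of the 1.2 Hz face frequency.
harmonic_index = round(HIGHEST_SIG_HARMONIC_HZ / FACE_FREQ_HZ)

print(orig_30, overlaps_evoked_range(orig_30))        # original 30 Hz wavelet
print(revised_40, overlaps_evoked_range(revised_40))  # revised 40 Hz wavelet
```

      Under these numbers, only the original 30 Hz wavelet reaches below 22.8 Hz, which is consistent with the authors' description of the concern as minor.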

      We recomputed all analyses and statistics from the manuscript with the new HF definition. Overall, this change had very little impact on the findings, except for slightly lower correlation between HF and LF (in Occipital and Anterior temporal lobe) when using single recording contacts as unit data points (Note that we slightly modified the way we compute the maximal expected correlation. Originally we used the test-retest reliability averaged over LF and HF; now we use the lower reliability value of the 2 signals, which is more correct since the lower reliability is the true upper limit of the correlation). This indicates that the HF activity was mostly independent from phase-locked LF signal already in the original submission. However, since the analyses with the revised time-frequency parameters enforce this independence, we chose to keep the revised analyses as the main analyses in the manuscript.

      The manuscript was completely revised accordingly and all figures (main and supplementary) were modified to reflect the new analyses. We also extended the method section on HF analyses (p. 37) to indicate that HF parameters were selected to ensure independence of the HF signal from the LF evoked response, and provide additional information on wavelet frequency bandwidth.

      We believe our change in the time-frequency parameters and frequency range (40-160 Hz), the supplementary analyses using 80-160 Hz signal (per request of reviewer #2; see Figure 5 – figure supplement 4 and 5) and the fact that harmonics of the face frequency signal are not observed beyond ~23Hz, provide sufficient assurances that our findings are not driven by a contamination of HF signal by evoked/LF responses (i.e., spectral leakage).

      With respect to the comment of the reviewer on the 1/f contributions to the frequency band computation, as indicated in the original manuscript, the HF amplitude envelope is converted to percent signal change, separately for each frequency bin over the HF frequency range, BEFORE averaging across frequency bins. This step works as a normalization to remove the 1/f bias and ensures that each frequency in the HF range contributes equally to the computed HF signal. This was added to the methods section (HF analysis, p. 38, line 1038): “This normalization step ensures that each frequency in the HF range contributes equally to the computed HF signal, despite the overall 1/f relationship between amplitude and frequency in EEG.”
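
      A minimal sketch of this normalization order, in plain Python, may help make the point concrete. The function names and toy amplitudes are hypothetical, and how the baseline is defined is an assumption here (the response does not specify it); the sketch only shows why normalizing each bin before averaging removes the 1/f weighting.

```python
def percent_signal_change(envelope, baseline):
    """Express each amplitude sample as % change relative to a baseline."""
    return [100.0 * (x - baseline) / baseline for x in envelope]


def hf_signal(envelopes_by_freq, baselines_by_freq):
    """Normalise every frequency bin separately, THEN average across bins."""
    normalised = [
        percent_signal_change(env, base)
        for env, base in zip(envelopes_by_freq, baselines_by_freq)
    ]
    n_bins = len(normalised)
    return [sum(samples) / n_bins for samples in zip(*normalised)]


def raw_average(envelopes_by_freq):
    """Averaging raw amplitudes lets high-amplitude (low) frequencies dominate."""
    n_bins = len(envelopes_by_freq)
    return [sum(samples) / n_bins for samples in zip(*envelopes_by_freq)]


# Toy data: two bins with 1/f-like baseline amplitudes (10.0 and 2.5),
# each increasing by 20% at the second sample.
envelopes = [[10.0, 12.0], [2.5, 3.0]]
baselines = [10.0, 2.5]

normalised_hf = hf_signal(envelopes, baselines)  # both bins contribute equally
unnormalised_hf = raw_average(envelopes)         # dominated by the first bin
```

      With these toy values both bins show the same relative change after normalization, whereas the raw average is driven mostly by the higher-amplitude bin.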

      The connection of the calculated measures to ERPs for the low-frequency and population activity for the high-frequency measures for their frequency tagging paradigm is not clear and not validated, but throughout the text they are equated, starting from the introduction.

      The frequency-tagging approach is widely used in the electrophysiology literature (Norcia et al., 2015) and as such requires no further validation. In the case of our particular design, the connection between frequency-domain and time-domain representations for low frequencies has been shown in many of our publications with scalp EEG (Rossion et al., 2015; Jacques et al., 2016; Retter and Rossion, 2016; Retter et al., 2020). FPVS sequences can be segmented around the presentation of the face image (just like in a traditional ERP experiment) and averaged in the time domain to reveal ERPs (e.g., Jacques et al., 2016; Retter and Rossion, 2016; Retter et al., 2020). Face-selectivity of these ERPs can be isolated by selectively removing the base rate frequencies through notch-filtering (e.g., Retter and Rossion, 2016; Retter et al., 2020). Further, we have shown that the face-selective ERPs generated in such sequences are independent of the periodicity, or temporal predictability, of the face appearance (Quek et al., 2017) and, to a large extent, of the frequency of face presentation (i.e., unless faces are presented too close to each other, i.e., below 400 ms interval; Retter and Rossion, 2016). The high-frequency signal in our study is measured in the same manner as in other studies and we simply quantify the periodic amplitude modulation of the HF signal. HF responses in frequency-tagging paradigms have been measured before (e.g., Winawer et al., 2013). In the current manuscript, Figure 1 provides a rationale and explanation of the methodology. We also think that our manuscript in itself provides a form of validation for the quantification of HF signal in our particular frequency-tagging setup.
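
      The segmentation-and-averaging step described above can be sketched as follows. This is a toy illustration only: the function name, the one-sample-per-image signal and the window lengths are hypothetical, and the notch-filtering of base-rate frequencies mentioned in the reply is deliberately left out.

```python
def average_epochs(signal, onsets, pre, post):
    """Cut [onset - pre, onset + post) windows and average them sample by sample."""
    epochs = [
        signal[o - pre:o + post]
        for o in onsets
        if o - pre >= 0 and o + post <= len(signal)
    ]
    if not epochs:
        raise ValueError("no complete epochs in the signal")
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


# Toy sequence with one sample per image. With images at 6 Hz and faces at
# 1.2 Hz, a face appears at every 5th image.
n_images = 30
face_onsets = list(range(5, n_images, 5))

# Hypothetical response: a unit deflection at each face onset.
signal = [0.0] * n_images
for onset in face_onsets:
    signal[onset] = 1.0

# Time-domain average around face onsets, as in a traditional ERP analysis.
erp = average_epochs(signal, face_onsets, pre=1, post=3)
```

      The same function applies unchanged if the onsets are jittered rather than periodic, since each epoch is defined relative to its own onset.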

    1. Author Response

      Reviewer #1 (Public Review):

      This interesting work tries to predict and analyze the overlap of BCR and TCR repertoires (mainly in COVID-19 conditions), which is one of the most important aspects of adaptive immunity and is directly related to antigen specificity. However, the primary claims were not fully supported by the current data and analysis the authors presented.

      1) Since the authors showed that the TCR/BCR changed with age, did they correct their CMV- and CMV+ analysis for age differences?

      Aging and infections both have a similar outcome on the immune repertoire in terms of diversity reduction. We took the three initial age groups (0-25, 26-50, 51-75) and shuffled them, resulting in three new groups with no age structure but the same CMV-positive/negative ratio. For each of these new control groups, we fitted the q parameter. This process gave statistically indistinguishable values of q (q = 0.472 ± 0.006) with no variation among the groups, as was the case for true age separation, meaning that age and not CMV status is the main driver of convergent selection.
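
      One way to build such control groups is a shuffle stratified by CMV status, sketched below. This is an assumption about the procedure rather than the authors' actual code: the function name and donor tuples are hypothetical, and fitting the q parameter on each shuffled group is not shown.

```python
import random


def shuffle_preserving_cmv(groups, seed=0):
    """Reassign donors to groups at random while keeping, for every group,
    the original number of CMV-positive and CMV-negative donors.

    groups: list of lists of (donor_id, cmv_positive) tuples.
    """
    rng = random.Random(seed)
    new_groups = [[] for _ in groups]
    for status in (True, False):
        # Pool all donors with this CMV status across the age groups.
        pool = [donor for group in groups for donor in group if donor[1] == status]
        rng.shuffle(pool)
        start = 0
        for i, group in enumerate(groups):
            n = sum(1 for donor in group if donor[1] == status)
            new_groups[i].extend(pool[start:start + n])
            start += n
    return new_groups


# Hypothetical toy cohort: three age groups of three donors each.
age_groups = [
    [("a", True), ("b", False), ("c", False)],
    [("d", True), ("e", True), ("f", False)],
    [("g", False), ("h", False), ("i", True)],
]
control_groups = shuffle_preserving_cmv(age_groups, seed=1)
```

      Each control group keeps its size and CMV-positive count, so any remaining difference between the fitted values cannot be attributed to CMV status alone.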

      In addition, we added a new figure from an independent cohort (with no CMV information) showing the same effects of age (Figure 2 - figure supplement 3) and a comment about the control for CMV effects in section Results - ageing in TCR repertoires, 2nd paragraph.

      2) TCR repertoire (probably BCR also) changed along with the time during SARS-CoV-2 infection (especially the first several weeks after clearing the virus). The authors should consider the time points they used in all the COVID-19 studies to validate the method.

      Inferring the dynamics of public repertoires is a very interesting task. However, the longitudinal data (TCR and BCR repertoires from the same people before and after infection) needed for the sharing analysis in large numbers of people remain very limited or nonexistent, which prevents a time-dependent analysis.

      We have added a discussion about this point (see Results - Convergent BCR sharing in COVID-19 donors, first paragraph).

      3) What's the difference between different infections (e.g. CMV vs SARS-CoV-2)? Or does infection lead to the same TCR/BCR changes in the study? A detailed discussion with an analysis of TCR/BCR repertoire regarding different infections CMV vs SARS-CoV-2 needs to be provided.

      For the BCR case, we looked at SARS-CoV-2 infection. Due to the limited number of repertoires sequenced from cohorts of people suffering from a given disease, the extension of the analysis, while interesting and needed, remains very limited. However, for TCR repertoires we compared data coming from individuals affected by two very different diseases, an acute respiratory infection (SARS-CoV-2) and a chronic infection (CMV).

      While it is hard to draw conclusions about the behavior of these two diseases just from the sharing analysis, we can still make some observations (we included them in the main text, see Discussion, 6th paragraph). The comparison of the q factors fitted on each cohort (q_CMV = 0.453 ± 0.006, q_SARS-CoV-2 = 0.452 ± 0.002) seems to show compatible values, suggesting in both cases a much less dramatic change than for B cells.

      4) Are there any features along with different infections compared with tumor/autoimmune conditions (I think there are many publications about TCR/BCR dynamics in various diseases)? Analysis of these data is not only important to control for validating their method but also can generate the most interesting data/conclusions on dissecting the specificity of TCR/BCR repertoire.

      TCR response in autoimmune diseases (like ankylosing spondylitis) has previously been discussed in other papers (e.g. Pogorelyy M. et al. PLoS Biology, 2019). However, the type of analysis presented here requires deep repertoires from several individuals, a requirement that is not currently met by most published datasets for autoimmune disease conditions. Moreover, the sharing analysis presupposes an overall generalized response to the given antigen. But it is now widely known that many types of cancer likely have a unique genomic molecular signature in each patient, making sharing not the most suitable approach to understand the immune response in this case. We discuss these current limitations in Discussion, 6th paragraph.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors reveal dual regulatory activity of the complex nuclear receptor element (cNRE; contains hexads A+B+C) in cardiac chambers and its evolutionary origin using computational and molecular approaches. Building upon a previous observation that hexads A and B act as ventricular repressor sequences, in this study the authors identify a novel hexad C sequence with preferential atrial expression. The authors also reveal that the cNRE emerged from an endogenous viral element using comparative genomic approaches. The strength of this study is in a combination of in silico evolutionary analyses with in vivo transgenic assays in both zebrafish and mouse models. Rapid, transient expression assays in zebrafish together with assays using stable, transgenic mice demonstrate dual functionality of cNRE depending on the chamber context. This is especially intriguing given that the cNRE is present only in Galliformes and has originated likely through viral infection. Interestingly, there seem to be some species-specific differences between zebrafish and mouse models in expression response to mutations within the cNRE. Taken together, these findings bear significant implications for our understanding of dual regulatory elements in the evolutionary context of organ formation.

      We thank reviewer 1 for the thorough review and are very satisfied with his favorable view of our manuscript. We also thank reviewer 1 for suggestions and opportunities to further clarify some relevant issues.

      Reviewer #2 (Public Review):

      Nunes Santos et al. investigated the gene regulatory activity of the promoter of the quail myosin gene, SMyHC III, that is expressed specifically in the atria of the heart in quails. To do so, they computationally identified a novel 6-bp sequence within the promoter that is putatively bound by a nuclear receptor transcription factor, and hence is a putative regulatory sequence. They tested this sequence for regulatory activity using transgenic assays in zebrafish and mice, and subjected this sequence to mutagenesis to investigate whether gene regulatory effects are abrogated. They define this sequence, together with two additional known 6-bp regulatory sequences, as a novel regulatory sequence (denoted cNRE) necessary and sufficient for driving atrial-specific expression of SMyHC III. This cNRE sequence is shared across several galliform species but appears to be absent in other avian species. The authors find that there is sequence homology between the cNRE and several virus genomes, and they conclude that this regulatory sequence arose in the quail genome by viral integration.

      Strengths: The evolutionary origins of gene regulatory sequences and their impact on directing tissue-specific expression are of great interest to geneticists and evolutionary biologists. The authors of this paper attempt to bring this evolutionary perspective to the developmental biology question of how genes are differentially expressed in different chambers of the heart. The authors test for regulatory activity of the putative regulatory sequence they identified computationally in both zebrafish and mouse transgenic assays. The authors disrupt this sequence using deletions and mutagenesis, and introduce a tandem repeat of the sequence to a reporter gene to determine its consequences on chamber activity. These experiments demonstrate that the identified sequence has regulatory activity.

      We appreciate the thorough review of our manuscript and are very encouraged by the reviewer’s understanding of the contents we presented. We will take the liberty of commenting after each of the reviewer’s considerations, in the hope of better answering the relevant points.

      Weaknesses: There are several decisions and assumptions that have been made by the authors, the reasons for which have not been articulated. Firstly, the rationale for the approach is not clear. The study is a follow-up to work previously performed by the authors which identified two 6-bp sequences important for controlling atrial-specific expression of the quail SMyHC III gene. This study appears to be motivated by the fact that these two sequences, bound by nuclear receptors, do not fully direct chamber-specific expression, and therefore this study aims to find additional regulatory sequences. It is assumed that any additional regulatory sequences should also be bound by nuclear receptors, and be 6-bp in length, and therefore the authors search for 6-bp sequences bound by nuclear receptors. It is not clear what the input sequence for this analysis was.

      Thank you for giving us the opportunity to clarify our rationale. Our approach is justified by the natural progression in the understanding of the mechanisms involved in preferential atrial expression by the SMyHC III promoter. The groundwork was solidly laid down by Wang and colleagues (see references below). They mapped potential atrial stimulators and ventricular repressors throughout the SMyHC III promoter using atrial and ventricular cultures, respectively. Wang and colleagues progressively pinned down the relevant regulators: first between -840 and -680 bp upstream of the transcription start site, then within the 72-bp fragment contained in this stretch, and finally they identified the ventricular repressor in Hexads A and B inside the 72-bp sequence (see references below). In this manuscript, we contributed the identification of Hexad C (immediately downstream of Hexads A and B) as a potential nuclear receptor binding site and as a bona fide atrial activator. In summary, our work represents a logical conclusion of previous work by Wang and colleagues. We continued the process of narrowing down sequences previously proven to contain atrial activators (that were unknown before our present work) and ventricular repressors (that were already described).

      Why did we use nuclear receptors as models for the putative cardiac chamber regulators binding to the cNRE? This is because previous work by Wang et al., 1996, 1998, 2001 and by Bruneau et al., 2001 showed that the 5’ portion of the cNRE (Hexads A and B) is indeed a hub for the integration of signals conveyed by nuclear receptors. Originally, Wang et al., 1996 showed that the VDR response element is a ventricular repressor acting via the 5’ portion of the cNRE. In a subsequent manuscript, Wang et al., 1998 showed that both RAR and VDR bind the 5’ portion of the cNRE. Bruneau et al., 2001 showed, by crossing IRX4 knockout mice with SMyHC III-HAP mice (Xavier-Neto et al., 1999), that IRX4 plays the role of a repressor of SMyHC III-HAP expression. Finally, Wang et al., 2001 showed that IRX4 interacts with RXR bound to the 5’ portion of the cNRE to inhibit ventricular expression.

      Why was the 3’ Hexad included as a research subject? Very early on in our work it was noted that 3’ of the original VDR response element (Hexads A and B), described by Wang et al., 1996 and 1998 as a ventricular repressor, there was a sequence (Hexad C) with almost equal binding potential to nuclear receptors as Hexads A and B (as initially judged on the basis of comparisons with canonical nuclear receptor binding sequences, but later on confirmed by in silico profiling of nuclear receptor binding, see below). This discovery prompted us to design point mutants in the 3’ portion of the cNRE to investigate whether Hexad C contained relevant regulators of heart chamber expression. These analyses revealed a strong atrial activator in the mouse (the missing atrial activator from Wang et al., 1996, 1998, 2001).

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Xavier-Neto, J., Neville, C. M., Shapiro, M. D., Houghton, L., Wang, G. F., Nikovits, W. Jr, Stockdale, F. E., and Rosenthal, N. (1999). A retinoic acid-inducible transgenic marker of sino-atrial development in the mouse heart. Development 126, 2677-2687.

      Bruneau, B. G., Bao, Z. Z., Fatkin, D., Xavier-Neto, J., Georgakopoulos, D., Maguire, C. T., Berul, C. I., Kass, D. A., Kuroski-de Bold, M. L., de Bold, A. J., Conner, D. A., Rosenthal, N., Cepko, C. L., Seidman, C. E., and Seidman, J. G. (2001). Cardiomyopathy in Irx4-deficient mice is preceded by abnormal ventricular gene expression. Mol. Cell Biol. 21, 1730-1736.

      Wang, G. F., Nikovits, W. Jr., Bao, Z.Z., and Stockdale, F.E. (2001). Irx4 forms an inhibitory complex with the vitamin D and retinoic X receptors to regulate cardiac chamber-specific slow MyHC3 expression. J Biol Chem. 276, 28835-28841.

      The methods section mentions the cNRE sequence, but this is their newly defined regulatory sequence based on the newly identified 6-bp sequence. It is therefore unclear why Hexad C was identified to be of interest, and not the GATA binding site for example, and whether other sequences in the promoter might have stronger effects on driving atrial-specific expression.

      As for the existence of binding sites other than Hexads A, B, and C, we cannot formally exclude the possibility that there may be other relevant regulators of the SMyHC III gene. But we note that the sequences we utilized were previously mapped through a promoter deletion-mutant approach by Wang et al., 1996 as the most powerful atrial activator(s) and ventricular repressor(s). We addressed these concerns in a new section entitled “Limitations of our work”.

      Concerning GATA regulation, Wang et al., 1996, 1998 characterized a GATA-4 site that drives generalized (atrial and ventricular) cardiac expression in quail cultures. However, we were unable to identify any relevant changes in cardiac expression in mutant GATA SMyHC III-HAP transgenic mouse lines produced with the same mutated promoter sequences described by Wang et al., 1996, 1998.

      The identification of Hexad C as an atrial activator was an experimental finding. We identified it as such because we had two important inputs. First, in 1997, we consulted with Ralff Ribeiro, a specialist on nuclear receptors, and he pointed out that downstream of the Hexad A + Hexad B VDRE/RARE (the ventricular repressor), there was a sequence with good potential for a nuclear receptor binding motif. This was exactly Hexad C. Then, we confirmed its potential for nuclear receptor binding by nuclear receptor profiling. With these two pieces of evidence, we thought that there was enough support to justify a mutant construct (Mut C). The experimental results we obtained in transgenic mice and zebrafish are consistent with the hypothesis that Hexad C does contain the long-sought atrial activator predicted by Wang et al., 1996 in atrial cultures. This seems to be the most important atrial activator (a seven-fold activator) predicted by a deletion approach to be located between -840 and -680 bp in Wang et al., 1996.

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Indeed, the zebrafish transgenic assays use the 32 bp cNRE, while in the mouse transgenic assays, a 72 bp region is used. This choice of sequence length is not justified.

      As stated above, our rationale was built as a continuation of the thorough work by Wang and colleagues in progressively narrowing down the location of relevant atrial stimulators and ventricular repressors. Throughout our work, we sought to obtain maximal coherence with previous studies (see references below) and to simultaneously probe cNRE function at an increased resolution. For that, we utilized previously described mutant SMyHC III promoter constructs (Wang et al., 1996) and introduced novel site-directed dinucleotide substitution mutants of individual Hexads in the SMyHC III promoter.

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Xavier-Neto, J., Neville, C. M., Shapiro, M. D., Houghton, L., Wang, G. F., Nikovits, W. Jr, Stockdale, F. E., and Rosenthal, N. (1999). A retinoic acid-inducible transgenic marker of sino-atrial development in the mouse heart. Development 126, 2677-2687.

      Bruneau, B. G., Bao, Z. Z., Fatkin, D., Xavier-Neto, J., Georgakopoulos, D., Maguire, C. T., Berul, C. I., Kass, D. A., Kuroski-de Bold, M. L., de Bold, A. J., Conner, D. A., Rosenthal, N., Cepko, C. L., Seidman, C. E., and Seidman, J. G. (2001). Cardiomyopathy in Irx4-deficient mice is preceded by abnormal ventricular gene expression. Mol. Cell Biol. 21, 1730-1736.

      Wang, G. F., Nikovits, W. Jr., Bao, Z.Z., and Stockdale, F.E. (2001). Irx4 forms an inhibitory complex with the vitamin D and retinoic X receptors to regulate cardiac chamber-specific slow MyHC3 expression. J Biol Chem. 276, 28835-28841.

      The decisions about which bases to mutate in the three hexads are also not clear. Why are the first two bases mutated in Hexad B and C and the whole region mutated in Hexad A? Is there a reason to believe these bases are particularly important?

      As for the reasons behind mutation of the first two bases in Hexad B and Hexad C, there were two:

      The first reason is that these point mutations in Hexads B and C were planned after the publication of Wang et al., 1996, which defined the major role of Hexad A in ventricular repression. After this discovery, we decided that a higher level of resolution in our mutation approach would be a better way to search for additional regulators of SMyHC III expression, including the atrial regulator that was readily apparent from the results shown in Wang et al., 1996, but had not yet been described.

      The second reason is that the first two nucleotides (purines) in a nuclear-receptor binding hexad are critical for the interaction between target DNA and transcription factors of the nuclear receptor family. Substituting pyrimidines for purines in the first two positions of a hexad drastically reduces the affinity of a nuclear response element, and that is why we chose to use TT substitutions in our mutant constructs. Please refer to: Umesono et al., Cell, 1991 65: 1255-1266 for a review; Mader et al., J Biol Chem, 1993 268:591-600 for a mutation study; Rastinejad et al., EMBO J., 2000 19:1045-1054 for a crystallographic study (as well as additional references listed below).

      Mader, S., Chen, J. Y., Chen, Z., White, J., Chambon, P., and Gronemeyer, H. (1993). The patterns of binding of RAR, RXR and TR homo- and heterodimers to direct repeats are dictated by the binding specificities of the DNA binding domains. EMBO J. 12, 5029-5041.

      Ribeiro, R. C., Apriletti, J. W., Yen, P. M., Chin, W. W., and Baxter, J. D. (1994). Heterodimerization and deoxyribonucleic acid-binding properties of a retinoid X receptor-related factor. Endocrinology 135, 2076-2085.

      Zhao, Q., Chasse, S. A., Devarakonda, S., Sierk, M. L., Ahvazi, B., and Rastinejad, F. (2000). Structural basis of RXR-DNA interactions. J. Mol. Biol. 296, 509-520.

      Shaffer, P. L. and Gewirth, D. T. (2002). Structural basis of VDR-DNA interactions on direct repeat response elements. EMBO J. 21, 2242-2252.

      The control mutant also has effects on the chamber distribution of GFP expression.

      We note that, in the mouse, MutS did not produce any major changes from the typical wild type phenotypes linked to SMyHC III-HAP transgenic hearts. We concluded, based on our data, that the spacing mutant worked reasonably well as a negative mutation control in mice. We agree that it would have been particularly elegant if a spacing mutant designed for the mouse context worked in exactly the same way in the zebrafish. However, the fact that there are slight differences in behavior for the mutated “spacing” constructs in species separated by millions of years of independent evolution is not really surprising, given that the amino acid sequence of transcription factors can diverge and co-evolve with binding nucleotides and end up drifting quite substantially from an ancestral setup. As we reiterate below, we consider more fundamental the fact that the cNRE is actually able to bias cardiac expression towards a model of preferential atrial expression, even in the context of species separated by millions of years of independent evolution.

      Two claims in the paper have weak evidence. Firstly, the conclusion that the cNRE is necessary and sufficient for driving preferential expression in the atrium. Deleting the cNRE does reduce the amount of atrial reporter gene expression but there is not a "conversion" from atrial to ventricular expression as mentioned in line 205. Similarly, a fusion of 5 tandem repeats of the cNRE can induce expression of a ventricular gene in the atria (I'm assuming a single copy is insufficient), but does not abolish ventricular expression.

      We agree that our labelling of the cNRE was perhaps too strong, and we have toned it down accordingly to incorporate the much more balanced concept that the cNRE biases cardiac expression towards a model of preferential atrial expression.

      However, after the corrections suggested, we believe our assertion is now justified. We show that in the mouse, removal of the cNRE is followed by a major reduction of atrial expression coupled to the release of a low, but quite clear level of expression in the ventricles, when compared to the transgenic mouse harboring the wild type SMyHC III promoter. Note that, as expected, the relative power of the cNRE to establish preferential atrial expression is higher in the mouse (a mammal) than it is in the zebrafish (a teleost), which is biologically sound, as mammals and avians are closer, phylogenetically, than teleosts and avians. Yet, the direction of change of expression in atria and ventricles was exactly as expected, if a given motif responsible for preferential atrial expression was removed (the cNRE in our case), that is: marked reduction in atrial expression and small (albeit clearly evident) release of ventricular expression. We believe that these directional changes observed in species separated by millions of years of independent evolution constitute very good biological evidence for the role of the cNRE in driving preferential atrial expression.

      Concerning the 5x fusion of cNREs, we chose to produce this multimer for safety purposes only, because we did not want to risk performing incomplete experiments and having to repeat them. However, more to the point, we later compared the efficiency of one (1) versus five (5) cNRE copies in a cell culture context and the results were not different.

      Secondly, the authors claim that the cNRE regulatory sequence arose from viral integration into the genomes of galliform species. While this is an attractive mechanism for explaining novel regulatory sequences, the evidence for this is based purely on sequence homology to viral genomes. And this single observation is not robust as the significance of the sequence matches does not appear to be adjusted for sequence matches expected by chance. The "evolutionary pathway" leading to the direction of chamber-specific expression in the heart as highlighted in the abstract has therefore not been demonstrated.

      We agree with the reviewer. Because of space constraints, we decided to omit a substantial part of our work from the initial submission of the manuscript. We now include the relevant data in the revised version. We thus mapped the phylogenetic origins of the SMyHC III family of slow myosins and then established how and when the cNREs became topologically associated with the SMyHC III gene. To do that, we repeat-masked all available sequences from avian SMyHC III orthologs. As will become clear below, the cNRE is a rare sequence, rather than a low complexity repeat. Our search for cNREs outside of the quail context (Coturnix coturnix) followed two independent lines. First, we took a scaled, evolution-oriented approach. Initially, we looked for cNREs in species close to the quail (i.e., Galliformes) and then progressively farther, to include derived (i.e., Passeriformes) and basal avians (i.e., Paleognaths) as well as external groups such as crocodilians. While pursuing this line of investigation, it became clear that the cNRE was a rare form of repetitive element, which showed a conserved topological relationship with the SMyHC III gene (i.e., cNREs flanked the SMyHC III genes at 5’ and 3’ regions). Using this topological relationship as a character, we determined when it appeared during avian evolution and then set out to establish the likely origins of this rare repetitive motif. This search for the origins of the cNRE entailed comparisons to databases of repetitive genome elements, until the extreme telomeric nature of the SMyHC III gene became evident. This finding directed us to the fact that the hexad nature of the cNRE is reminiscent of the hexameric character of telomeric direct repeats.
Because direct telomeric repeats are featured in the genomes of avian DNA viruses that can infect the germline and integrate into the avian genome, we focused our search for the cNRE on the members of the subfamily Alphaherpesvirinae (Morissette & Flamand, 2010). In this search, we utilized the human herpes simplex virus 1 (HSV1) as a general model for herpes viruses, and a set of four (4) members of the Alphaherpesvirinae subfamily that specifically infect Galliformes (i.e., GaHV1, the virus responsible for avian infectious laryngotracheitis in chickens; GaHV2, the Marek’s disease virus; GaHV3, a non-pathogenic virus; and MeHV1, the non-pathogenic Meleagrid herpesvirus 1 capable of infecting chicken and wild turkey) (Waidner et al., 2009). The search for cNREs in Alphaherpesvirinae was successful. We found six (6) cNRE hits in HSV1, one (1) in GaHV1, and none in MeHV1, GaHV2, and GaHV3. Our evolution-directed approach thus led to the direct recognition that cNREs can be found in the genomes of a family of viruses that contain members that infect avians and integrate their double-stranded DNA into the host germline (Morissette & Flamand, 2010). Therefore, as a second independent approach, as pointed out by the reviewer, we set out to further extend this proof of concept by broadening our search to all known sequenced viruses and performing an unbiased, internally consistent, and quantitative analysis of cNRE presence in viral genomes, as already reported in the initial submission of this manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript Nunes Santos et al. use a combination of computational and experimental methods to identify and characterize a cis-regulatory element that mediates expression of the quail Slow Myosin Heavy Chain III (SMyHC III) gene in the heart (specifically in the atria). Previous studies had identified a cis-regulatory element that can drive expression of SMyHC III in the heart, but not specifically (solely) in the atria, suggesting additional regulatory elements are responsible for the specific expression of SMyHC III in the atria as opposed to other elements of the heart. To identify these elements Nunes Santos et al. first used a bioinformatic approach to identify potentially functional nuclear receptor binding sites ("Hexads") in the SMyHC III promoter; previous studies had already shown that two of these Hexads are important for SMyHC III promoter function. They identified a previously unknown third Hexad within the promoter, and propose that the combination of these three (called the complex Nuclear Receptor Element or cNRE) is necessary and sufficient for specific atrial expression of SMyHC III. Next, they use experimental methods to functionally characterize the cNRE, including showing that the quail SMyHC III promoter can drive green fluorescent protein (GFP) expression in the atrium of developing zebrafish embryos and that the cNRE is necessary to drive the expression of the human alkaline phosphatase reporter gene (HAP) in transgenic mouse atria. Additional experiments show that the cNRE is a portable regulatory element that can drive atrial expression and demonstrate the importance of the three Hexad parts. These data demonstrating that the cNRE mediates atrial-specific expression are well done and convincing. The authors also note the possibility that the cNRE might be derived from an endogenous viral element, but further data are needed to support the hypothesis that the cNRE is of viral origin.

      Strengths:

      1) The experimental work demonstrating that the cNRE is a regulatory element that can mediate the atrial-specific expression of SMyHC III.

      We thank reviewer 3 for this thorough appreciation of our work and are pleased with the evaluation of our manuscript’s potential.

      Weaknesses:

      1) Justification for use of different regulatory elements in the zebrafish (32 bp cNRE) and the mouse transgenic assays (72 bp cNRE), and discussion of the impact of this difference on the results/interpretation.

      In general, throughout our work, we sought to obtain maximal coherence with previous studies (see references below) and to simultaneously probe cNRE function at an increased resolution. For that, we utilized previously described mutant SMyHC III promoter constructs (Wang et al., 1996, 1998) and introduced novel site-directed dinucleotide substitution mutants of individual Hexads in the SMyHC III promoter. Strictly speaking, the “72-bp construct” is not a 72-bp construct. It is a 5’ deletion construct in which 72 bp were removed from the 840-bp wild type SMyHC III construct, yielding a 768-bp SMyHC III promoter construct. Any directional changes in cardiac expression observed with the 768-bp construct, compared with the wild type promoter, were interpreted as reflecting the loss of regulators present in these 5’ 72 bp.

      Wang et al., 1996 and 1998 had already shown that Hexads A and B contained a functional VDRE/RARE, which acted as a ventricular repressor. Using the 768-bp SMyHC III promoter in mouse transgenic lines was thus a natural investigation step for us to evaluate whether regulation of the SMyHC III promoter in mice was similar to that in quail cardiac cultures. As shown in the manuscript, deletion of the 72 bp resulted in the release of a low level of expression in ventricles, consistent with the removal of a ventricular repressor (already described by Wang et al., 1996). It also showed a marked reduction in atrial transgene stimulation, suggesting the elimination of a very important atrial activator.

      In 1996, Wang and colleagues mapped an atrial activator to a sequence interval of 160 bp, between -840 and -680 bp (Wang et al., 1996). In our mouse transgenics, we reduced this interval to a mere 72 bp, between -840 and -768 bp. This was very useful information. Wang et al., 1998 showed that HF-1a, M-CAT, and E-box sites located between -840 and -808 bp did not influence atrial expression, so now we had a potential interval of only 40 bp between -808 and -768 bp. Further, our transgenic mice indicated that the GATA site located 3’ from Hexads A, B, and C (GATA site changed to a Sal I site at positions -749 to -743 bp) did not work as a general activator, as in the quail. Thus, the only good candidate for the atrial activator in mice inside the 40-bp fragment between -808 and -768 bp was the cNRE, with its three Hexads, A, B and the novel Hexad C. Because Hexads A plus B composed a functional VDRE/RARE that played a role in ventricular repression in the quail, we hypothesized that the atrial activator would be present in Hexad C. We then mutated the first two purines in Hexad C (the most important ones for nuclear receptor binding; please refer to Umesono et al., Cell, 1991 65: 1255-1266 for a review; Mader et al., J Biol Chem, 1993 268:591-600 for a mutation study; Rastinejad et al., EMBO J., 2000 19:1045-1054 for a crystallographic study, as well as additional references listed below) and performed the experiments that demonstrated a profound reduction in atrial expression in the mouse context, revealing the long-sought atrial activator.

      Mader, S., Chen, J. Y., Chen, Z., White, J., Chambon, P., and Gronemeyer, H. (1993). The patterns of binding of RAR, RXR and TR homo- and heterodimers to direct repeats are dictated by the binding specificities of the DNA binding domains. EMBO J. 12, 5029-5041.

      Ribeiro, R. C., Apriletti, J. W., Yen, P. M., Chin, W. W., and Baxter, J. D. (1994). Heterodimerization and deoxyribonucleic acid-binding properties of a retinoid X receptor-related factor. Endocrinology 135, 2076-2085.

      Wang, G. F., Nikovits, W., Schleinitz, M., and Stockdale, F. E. (1996). Atrial chamber-specific expression of the slow myosin heavy chain 3 gene in the embryonic heart. J. Biol. Chem. 271, 19836-19845.

      Wang, G. F., Nikovits, W. Jr., Schleinitz, M., and Stockdale, F. E. (1998). A positive GATA element and a negative vitamin D receptor-like element control atrial chamber-specific expression of a slow myosin heavy-chain gene during cardiac morphogenesis. Mol. Cell Biol. 18, 6023-6034.

      Zhao, Q., Chasse, S. A., Devarakonda, S., Sierk, M. L., Ahvazi, B., and Rastinejad, F. (2000). Structural basis of RXR-DNA interactions. J. Mol. Biol. 296, 509-520.

      Shaffer, P. L. and Gewirth, D. T. (2002). Structural basis of VDR-DNA interactions on direct repeat response elements. EMBO J. 21, 2242-2252.

      2) Is the cNRE really "necessary and sufficient"? I define necessary and sufficient in this context as a regulatory element that fully recapitulates the expression of the target gene, so if the cNRE was "necessary and sufficient" to direct the appropriate expression of SMyHC III it should be able to drive expression of a reporter gene solely in the atria. While deletion of the cNRE does reduce expression of the reporter gene in atria it is not completely lost nor converted from atrial to ventricular expression (as I understand the study design would suggest should be the effect), similarly fusion of 5 repeats of the cNRE induces expression of a ventricular gene in the atria but also does not convert expression from ventricle to atria. This doesn't seem to satisfy the requirements of a "necessary and sufficient" condition. Perhaps a discussion of why the expectations for "necessary and sufficient" are not met but are still consistent would be beneficial here.

      We agree with your reasoning. Our description of the cNRE was perhaps too strong, and we have toned it down accordingly in the revised manuscript to incorporate a much more balanced concept: that the cNRE biases cardiac expression towards a model of preferential atrial expression. After these corrections, we believe our novel assertion is justified. We show that in the mouse, removal of the cNRE is followed by a major reduction of atrial expression coupled to the release of a low, but quite clear level of expression in the ventricles, when compared to the transgenic mouse harboring the wild type SMyHC III promoter. Note that, as expected, the relative power of the cNRE to establish preferential atrial expression is higher in the mouse (a mammal) than it is in the zebrafish (a teleost), which is biologically sound, as mammals and avians are closer, phylogenetically, than teleosts and avians. Yet, the direction of change of expression in atria and ventricles was exactly as expected if a given motif responsible for preferential atrial expression was removed (the cNRE in our case), that is: marked reduction in atrial expression and small (albeit evident) release of ventricular expression. We believe that these directional changes observed in species separated by millions of years of independent evolution constitute very good biological evidence for the role of the cNRE in driving preferential atrial expression.

      3) The claim that the cNRE is derived from a viral integration is not supported by the data. Specifically, the cNRE has sequence similarity to some viral genomes, but this need not be because of homology and can also be because of chance or convergence. Indeed, the region of the chicken genome with the cNRE does have repetitive elements but these are simple sequence repeats, such as (CTCTATGGGG)n and (ACCCATAGAG)n, and a G-rich low complexity region, rather than viral elements; the same is true for the turkey genome. These data indicate that the cNRE is not derived from an endogenous virus but is a repetitive and low complexity region. These regions are expected to occur more frequently than expected for larger and more complex regions, which would cause the BLAST E value to decrease and appear "significant", but this is entirely expected because short alignments can have high E values by chance. (Also note that E values do not indicate statistical significance; rather, they are the number of hits one can "expect" to see by chance when searching a specific database.)

      We do understand the criticism, but we would like to advance another concept, based on a series of results that we obtained using bioinformatics-oriented and evolution-oriented analyses. We performed a cNRE scan in the Gallus gallus genome (galGal5), using varying numbers of nucleotide mismatches. When we searched the galGal5 genome with coordinates matching the localization of cNREs obtained using matchPattern with up to 8 mismatches, only thirty-one (31) and thirty-four (34) hits were found in the 5’ and 3’ strands, respectively. This indicates that a cNRE match is a rather uncommon finding in the Gallus gallus genome.

      A more systematic profiling of genome occurrence versus nucleotide mismatch indicated that a significant upward inflexion in the relationship between number of cNRE hits and divergence from the original cNRE version (Coturnix coturnix) is recorded only at 12 mismatches or greater. At 8 mismatches, the total number of cNREs on each DNA strand varied little among all avian species examined, remaining close to the average (31 +/- 2.2 cNREs for the 5’ strand, range 17-48; 34 +/- 3.3 for the 3’ strand, range 14-64). Consistent with the idea that the cNRE is a specific regulatory motif, rather than an abundant, low complexity sequence, there are only two cNRE occurrences in chromosome 19, which harbors AMHC1, the Gallus gallus ortholog of the Coturnix coturnix SMyHC III gene.
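
      The mismatch-tolerant scan described above can be illustrated with a minimal sketch. This is not the code used in the study (the response names matchPattern, which is an R/Bioconductor function); it is a brute-force Python stand-in that counts windows within a given Hamming distance of a query on both strands. The function names are hypothetical, and the two 32-mers are the chr19 hits quoted in the Figure 1 legend, used here only as toy input. Strands are labelled "+" and "-" as an assumption about how the "5’ and 3’ strands" of the response map onto a reference sequence.

```python
from typing import Dict, List, Tuple

# The two chr19 hits quoted in the Figure 1 legend, used here as toy sequences.
HIT_A = "CAAGGACAAAGAGGGGACAAAGAGGCGGAGGT"
HIT_B = "CAAGGACAAAGAGTGGACAAAGAGGCAGACGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def approx_hits(pattern: str, subject: str, max_mismatch: int) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) windows within max_mismatch of pattern."""
    k = len(pattern)
    return [
        (i + 1, i + k)
        for i in range(len(subject) - k + 1)
        if hamming(pattern, subject[i:i + k]) <= max_mismatch
    ]


def scan_both_strands(pattern: str, subject: str, max_mismatch: int) -> Dict[str, List[Tuple[int, int]]]:
    """Scan the given strand ("+") and its reverse complement ("-").

    Minus-strand hits are reported in plus-strand coordinates by searching
    for the reverse complement of the pattern.
    """
    return {
        "+": approx_hits(pattern, subject, max_mismatch),
        "-": approx_hits(reverse_complement(pattern), subject, max_mismatch),
    }
```

      Repeating the scan for increasing values of max_mismatch and tabulating the number of hits gives a hits-versus-mismatch profile of the kind summarised in the response. A brute-force scan like this is only practical for short toy sequences; a whole-genome search would need an indexed or vectorised method.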

      Figure 1: Number of cNRE hits to galGal5 according to maximum mismatches allowed: the cNRE is not an abundant low complexity sequence, but rather a rare repetitive sequence with a clear cutoff level of mismatches allowed. Consistent with this, there are only two (2) cNRE sequences in chromosome 19, the chromosome that contains the AMHC1 gene (the chicken ortholog of the quail SMyHC III gene).

      ## [1] chr19 [16510, 16541] * | 5’-CAAGGACAAAGAGGGGACAAAGAGGCGGAGGT-3’
      ## [2] chr19 [32821, 32852] * | 5’-CAAGGACAAAGAGTGGACAAAGAGGCAGACGT-3’

      In the evolutionary strategy, which we now include, we first mapped the phylogenetic origins of the SMyHC III family of slow myosins and then established how and when the cNREs became topologically associated with the SMyHC III gene. To do that, we repeat-masked all available sequences from avian SMyHC III orthologs. As will become clear below, the cNRE is a rare sequence, rather than a low complexity repeat. Our search for cNREs outside of the quail context (Coturnix coturnix) followed two independent lines. First, we took a scaled, evolution-oriented approach. Initially, we looked for cNREs in species close to the quail (i.e., Galliformes) and then progressively farther, to include derived (i.e., Passeriformes) and basal avians (i.e., Paleognaths) as well as external groups such as crocodilians. While pursuing this line of investigation, it became clear that the cNRE was a rare form of repetitive element, which showed a conserved topological relationship with the SMyHC III gene (i.e., cNREs flanked the SMyHC III genes at 5’ and 3’ regions). Using this topological relationship as a character, we determined when it appeared during avian evolution, and then set out to establish the likely origins of this rare repetitive motif. This search for the origins of the cNRE entailed comparisons to databases of repetitive genome elements, until the extreme telomeric nature of the SMyHC III gene became evident. This finding directed us to the fact that the hexad nature of the cNRE is reminiscent of the hexameric character of telomeric direct repeats. Because direct telomeric repeats are featured in the genomes of avian DNA viruses that can infect the germline and integrate into the avian genome (Morissette & Flamand, 2010), we focused our search for the cNRE on the members of the subfamily Alphaherpesvirinae.
In this search, we utilized the human herpes simplex virus 1 (HSV1) as a general model for herpes viruses and a set of four (4) members of the Alphaherpesvirinae subfamily that specifically infect Galliformes (i.e., GaHV1, the virus responsible for avian infectious laryngotracheitis in chickens; GaHV2, the Marek’s disease virus; GaHV3, a non-pathogenic virus; and MeHV1, the non-pathogenic Meleagrid herpesvirus 1 capable of infecting chicken and wild turkey) (Waidner et al., 2009). The search for cNREs in Alphaherpesvirinae was successful. We found six (6) cNRE hits in HSV1 and one (1) in GaHV1, but none in MeHV1, GaHV2, and GaHV3.

      Our evolution-directed approach thus led to the direct recognition that cNREs up to a cutoff mismatch value of 11 can be found in the genomes of a family of viruses that contain members that infect avians and integrate their double-stranded DNA into the host germline. Therefore, as a second independent approach, we set out to extend this proof of concept by broadening our search to all known sequenced viruses to perform an unbiased, internally consistent, and quantitative analysis of cNRE presence in viral genomes, as already reported in the initial submission of this manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      Areas for Improvement: While I believe the overall experiment seemed quite strong, the statistical approach does not seem in line with current recommendations. Most importantly, the authors appear to have used stepwise model reduction (including, importantly, removing non-significant fixed effects) to test the significance of predictors. Several simulation studies have shown that this increases the likelihood of false positive results (e.g., Mundry & Nunn 2009 Am. Nat., Forstmeier & Schielzeth 2011 Behav. Ecol. Sociobiol.). The above concern relates to the fact that the authors reported some "trending" results (e.g., 0.05 < P < 0.10) as being relevant, but not others. A revised statistical approach may clear up confusion with this. I give recommendations to the authors regarding these issues in the "recommendations for authors" section.

      We have changed our statistical approach as per the reviewer’s suggestions (see responses to essential point 1 above and specific comments below). The reanalyses did not qualitatively change any of our original findings.

      The authors also appeared to "Gaussianise" response variable distributions. As a reader, I could not understand whether this took the place of fitting models using an appropriate error distribution for the response variable (e.g., Poisson), or whether this was a necessary step just for the distributions of the model residuals. It is important to give more details on this, to know whether it did or did not affect the conclusions of the study.

      We have reanalysed the number of caring events using a GLMM with a negative binomial distribution (Table S4b), and time spent caring for clutches using a GLMM with a Gaussian distribution and log link function (Table S5b), rather than “Gaussianising” these response variables (lines 649–655). This new analytical approach did not qualitatively change our findings.

      Finally, but perhaps most importantly, the lack of a clear set of hypotheses relevant to the specific variables measured here made it hard to understand the Results section. In the Results, the relevance of a given predictor was only described after the statistical significance of that predictor was revealed. This gives the appearance that the authors measured a wide range of factors and only describe the relevance of the ones revealed as significant.

      We have included a paragraph at the end of the Introduction (line 86ff.) in which we provide a clear set of hypotheses relating to potential effects of chronic outgroup conflict on reproductive rate, investment and output. These hypotheses are supported by relevant background information and references, and we explicitly mention all our specific response variables that are analysed.

      Reviewer #3 (Public Review):

      In my opinion the striking thing about these results is that the intrusions were not physical encounters: it is not that the incurred groups were physically attacked or that eggs were eaten. Similarly, it wasn't that the 'invading' individuals were directly competing for food with the target group (or if they were eating some residual food then it would be no more than in the control condition). Although the strength of the paper is in the neat experimental design, this is unfortunately obscured due to insufficient explanation of the experiment prior to the results. But this shortcoming is superficial - the authors could move up some more details from the methods.

      We have expanded the description of our methods at the start of the Results section (line 114ff.).

    1. Author Response

      Reviewer #1 (Public Review):

      This work by Wei-Jia Luo and colleagues elegantly employs in vitro and in vivo models to demonstrate that within the mouse liver, macrophages respond to lipopolysaccharide (LPS) by releasing active IL-12 (IL-12p70), which is a heterodimer of IL-12p35 and IL-12p40. They observed that the availability of "free" IL-12p35 to this heterodimerization process is governed by the molecular chaperone HLJ1. In response to LPS, HLJ1 separates homodimerized IL-12p35 into monomers, which then can heterodimerize with IL-12p40 to form active IL-12p70. This active IL-12 is released from macrophages in the liver, which then act on neighboring natural killer T cells to release interferon gamma. This interferon gamma circulates systemically and is responsible for mortality in a mouse model of endotoxic shock.

      Overall, this work is mechanistically compelling and demonstrates a novel multicellular inflammatory pathway that contributes to death in a murine model of endotoxic shock. However, it is unclear if the observed pathway is limited to this highly reductionist model, or if it applies to models that better approximate the complexity of human sepsis. Indeed, the long-standing concept of "cytokine storm" as the major mediator of sepsis has largely failed to yield benefits in clinical trials. These numerous and repeated translational failures cast doubt on the translational validity of reductionist in vivo animal models of sepsis.

      We thank the reviewer for the affirmation. One of the major aims of our work is to identify a novel multicellular inflammatory pathway mediated by HLJ1 that contributes to endotoxic shock. We agree that, although our understanding of cytokine storm as the major mediator of sepsis has made dramatic progress over the past decade, these findings have not yet translated into effective treatments. As the reviewer mentioned, almost all clinical trials targeting cytokine effects have failed, especially in the context of sepsis. We also know that, among several explanations, the appropriateness of in vivo animal models is a concern (Chousterman et al., 2017). Some approaches to treating cytokine storm have aimed to target the direct tissue consequences of the inflammation cascade, such as those on the blood vessels (London et al., 2010). Another possible strategy is to target the signaling that promotes cytokine synthesis and secretion (Maceyka et al., 2012). It may be feasible to quell the cytokine storm after infection by targeting upstream signaling, and reducing cytokine synthesis as well as secretion is a valid alternative to direct cytokine antagonism (Chousterman et al., 2017). Furthermore, in this study we found that Hlj1−/− mice showed reduced IFN-γ and improved survival when treated with daily systemic antibiotics after CLP surgery (Figure 6), indicating that targeting cytokine storm in combination with antibiotics provides a promising therapeutic strategy to treat sepsis. Taken together, we think an HLJ1-targeting strategy might be a potential therapy for cytokine storm-associated sepsis. We have emphasized and discussed this concept in the Discussion of our revised manuscript (Page 19, line 441-453).

      We greatly appreciate that Reviewer #1 and the other reviewers raised this issue. We respond to the comments point by point below.

      This raises several specific concerns with regard to the model used by the investigators:

      (1) The authors use a massive dose of LPS that rapidly leads to the death of mice in 24 hours. This massive and rapid mortality is not consistent with human sepsis, which is a more crescendo course with a mortality of ~30%. Indeed, when the authors used a more clinically-relevant model of mild endotoxemia, HLJ1 appeared to have no impact on mortality (Figure 1A).

      Thank you for the comment. Indeed, since we observed that HLJ1 knockout mice could survive a high dose of LPS, we used 20 mg/kg LPS for the subsequent experiments because of this clear and significant phenotype. We also recognize the importance of administering lower doses of LPS. To address this issue, we performed additional experiments and made the following revisions.

      i. Because 4 mg/kg is a common non-lethal dosage to induce TLR4 and IFN-γ signaling (Kunze et al., 2019; Malgorzata-Miller et al., 2016), we performed an additional experiment with 4 mg/kg LPS according to the editor’s suggestion. As a result, Hlj1−/− mice showed lower serum levels of BUN, creatinine and ALT, and thus less severe organ damage, than Hlj1+/+ mice after 4 mg/kg LPS treatment. The data are shown in Figure 1C and D of revised Figure 1 (Figure 1).

      ii. We also performed an ELISA and found that serum levels of IFN-γ were lower in Hlj1−/− mice than in Hlj1+/+ mice after 4 mg/kg LPS injection. The result is shown in Figure 2C of revised Figure 2 (Figure 2).

      iii. Taken together, these results indicate that the effect of HLJ1 deletion on reducing IFN-γ and alleviating organ injury is also seen during moderate endotoxemia. We described and discussed the result in the revised manuscript (Page 6, line 134-141; Page 18, line 423-437).

      (2) LPS is a model of endotoxemia, not a model of sepsis. Accordingly, it is unclear if the protective benefit of blocking IL-12 will similarly be seen in a live-infection model of sepsis, in which inflammatory signaling may be necessary for pathogen clearance.

      We thank the reviewer for raising these critical issues and providing valuable suggestions. This issue was also mentioned by other reviewers. Although LPS-induced endotoxemia is a simple model with higher reproducibility and reliability compared with other sepsis models, it indeed cannot represent actual sepsis; it is based on the notion that it is the host’s response to bacteria, not the pathogen itself, that leads to mortality and organ failure (Deitch, 2005). Therefore, following the reviewers’ suggestion, we additionally performed a live-infection model of sepsis, cecal ligation and puncture (CLP), which resembles clinical disease and septic shock (Deitch, 2005), to confirm the importance of HLJ1 in sepsis. We found that IFN-γ expression was lower in the liver and spleen of Hlj1−/− mice compared with Hlj1+/+ mice (Figure 6A and B). We analyzed serum markers of organ dysfunction, and Hlj1−/− mice showed lower serum levels of BUN, creatinine and AST (Figure 6C). H&E staining showed kidney injury at the histological level after CLP surgery, with less severe kidney injury in Hlj1−/− mice than in Hlj1+/+ mice (Figure 6D). We further found that Hlj1−/− mice showed significantly improved survival compared with Hlj1+/+ mice when mice were treated with systemic antibiotics (Figure 6E). Taken together, we demonstrated that HLJ1 deletion attenuates CLP-induced sepsis with down-regulated IFN-γ, and concluded that the benefit of blocking IL-12 and HLJ1 can similarly be seen in a live-infection model of sepsis. The result is shown in revised Figure 6. The corresponding result was also added to the revised manuscript (Page 11-12, line 268-286). Please see it as well as the above responses to other reviewers.

      Page 11-12, line 268-286 "HLJ1 deletion protects mice from CLP-induced organ dysfunction and septic death. To address the question of whether HLJ1 also regulates IFN-γ-dependent septic shock in a live-infection model, we performed CLP (cecal ligation and puncture) surgery, which more closely resembles clinical disease and human sepsis. CLP significantly induced transcriptional levels of IFN-γ in the liver of Hlj1+/+ mice compared with mice receiving sham surgery, while Hlj1−/− mice showed significantly lower IFN-γ mRNA than Hlj1+/+ mice (Figure 6A). This phenomenon was not restricted to the liver, since lower expression of splenic IFN-γ was also found in Hlj1−/− mice (Figure 6B). The CLP surgery resulted in serious renal and liver damage, while Hlj1−/− mice showed alleviated organ dysfunction with significantly lower serum levels of BUN, creatinine and AST (Figure 6C). H&E staining showed kidney injury at the histological level after CLP, while Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 6D). However, there was no significant difference in survival when comparing Hlj1+/+ and Hlj1−/− mice (Figure 6E). We hypothesized that severe bacteremia contributed to mortality in mice that did not receive any treatment, so we treated mice with systemic antibiotics. As a result, Hlj1−/− mice displayed significantly improved survival compared with Hlj1+/+ mice when mice received daily systemic antibiotics after CLP (Figure 6E). These results imply that agents responsible for bacterial clearance can be combined with immune modulation such as HLJ1 targeting to improve the outcome of sepsis."

      (3) Finally, it is unclear if the findings are only relevant to mice, or if they also have relevance to humans.

      We agree that human studies are important, but there are practical difficulties to overcome, for example cohort identification, individual variation, and clinical considerations. This is a limitation of our work, since our findings are based only on animal models and human cell lines. We additionally performed CLP experiments, which are more relevant to human sepsis, although this is still not a human study. These data have been added as Figure 6 of our revised manuscript (Figure 6). Based on the present results, we plan to initiate specific clinical studies. For example, we plan to collect blood monocytes from critically ill ICU patients to see whether HLJ1 expression levels in monocytes are higher in patients with sepsis than in patients without sepsis. We also want to know whether HLJ1 expression levels in monocytes or in serum correlate with inflammatory markers such as C-reactive protein, procalcitonin, and lactate in sepsis patients, because we found that serum levels of HLJ1 correlated with IL-12 in mice. In our unpublished preliminary results, HLJ1 can be detected in the serum of patients with sepsis. This inspires us to investigate whether HLJ1 can serve as a diagnostic or prognostic marker in the future. We anticipate reporting these results in future publications. Thank you very much for your understanding.

      Reviewer #2 (Public Review):

      The authors show that HLJ1 converts misfolded IL-12p35 homodimers to monomers, which maintains bioactive IL-12p70 heterodimerization and secretion. In turn, this contributes to increased IL-12 activity, leading to enhanced IFN-gamma production and lethality in mice challenged with LPS to model sepsis.

      Strengths:

      • Huge and diverse dataset (e.g. in vivo, in vitro, single cell RNAseq, adoptive transfer etc.) with interesting findings that could be of relevance to the field.

      We deeply thank the reviewer for the affirmation. We hope our comprehensive dataset can provide novel insight of relevance to the field. With this information, we continue to investigate the underlying molecular alterations resulting from endotoxin-induced immune responses. We agree with the weaknesses raised by the reviewer, take them very seriously, and have revised the manuscript point by point. Thank you very much.

      Weaknesses:

      • The flow/narrative of the paper is very hard to follow. This may result from the fact that the order of presented results is a bit puzzling. Normally, one would add-in the cytokine results (now figure 3), after the survival curves in Figure 1. Furthermore, the flow cytometry data presented in Figure 4 is more or less a validation of the scRNAseq data presented in Figure 2 in another organ. Likewise, Figure 5 is sort of a validation of Figure 3 in another organ. The authors seem to jump from organ to organ, from in vivo to in vitro and vice-versa all the time which makes the paper extremely difficult to follow.

      We thank the reviewer for the valuable suggestion. We were also hesitant about this arrangement in our first submission. We rearranged our results so that the flow and narrative of the paper are easier to follow:

      1. We moved the results of Figure 3 to become Figure 2 so that the cytokine array results follow the survival curve results.

      2. The flow cytometry result presented in Figure 4 was moved to Figure 5 so that it follows the scRNA-seq result.

      3. The qPCR result for pro-inflammatory cytokines presented in Figure 5 was moved to Figure 2-figure supplement 1 so that it serves as a validation of the cytokine array in another organ.

      In addition, following other suggestions from the reviewers, we have rewritten the Introduction and Discussion sections and reorganized the whole manuscript so that it focuses more on the important issues. All modifications and rearrangements can be checked in the revised manuscript with changes tracked. Thank you for your kind suggestions.

      • Use of extremely high dosages of LPS.

      Thank you for the comment. This issue was raised by several reviewers and the editor. Indeed, since we observed that HLJ1 knockout mice could survive a high dose of LPS, we used 20 mg/kg LPS for the subsequent experiments because of this clear and significant phenotype. We also recognize the importance of administering lower doses of LPS. To address this issue, we performed additional experiments and made the following revisions.

      i. Because 4 mg/kg is a common non-lethal dosage to induce TLR4 and IFN-γ signaling (Kunze et al., 2019; Malgorzata-Miller et al., 2016), we performed an additional experiment with 4 mg/kg LPS according to the editor’s suggestion. As a result, Hlj1−/− mice showed lower serum levels of BUN, creatinine and ALT, and thus less severe organ damage, than Hlj1+/+ mice after 4 mg/kg LPS injection (Figure 1C). H&E staining showed kidney injury at the histological level after LPS treatment, while Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 1D). The data are shown in Figure 1C and D of revised Figure 1 (Figure 1).

      ii. We also performed an ELISA and found that serum levels of IFN-γ were lower in Hlj1−/− mice than in Hlj1+/+ mice after 4 mg/kg LPS injection. The result is shown in Figure 2C of revised Figure 2 (Figure 2).

      iii. Taken together, these results indicate that the effect of HLJ1 deletion on reducing IFN-γ and alleviating organ injury is also seen during moderate endotoxemia. We described and discussed the result in the revised manuscript (Page 6, line 134-141; Page 18, line 423-437).

      • Much of the presented data is replication of previous work. For instance, neutralization of IFN-γ (e.g. Billiau et al., Eur. J. Immunol. 1987; Car et al. J. Exp. Med. 1994) and anti-IL-12 (e.g. Zisman et al., Shock 1997) has been shown to lower mortality in LPS models in mice.

      We thank the reviewer for the reminder and apologize that our unclear description led to a misunderstanding. To identify the novel role of HLJ1 in sepsis carefully, we built our investigation on several well-established findings. Indeed, the roles of IFN-γ and IL-12 have been recognized in previous studies, and their neutralization has been reported to attenuate LPS-induced endotoxic shock. However, our study focused on the effect of HLJ1 deletion on the IL-12/IFN-γ axis and septic death. First, we observed that IFN-γ and IL-12 decreased after HLJ1 deletion during sepsis. We then used IL-12/IFN-γ neutralization and found that it improved survival in wild-type mice but not in Hlj1 knockout mice, suggesting the importance of HLJ1 in IL-12/IFN-γ-mediated mortality. Moreover, if the difference in mortality between genotypes disappears after IL-12 or IFN-γ neutralization, then we can infer that HLJ1 contributes to mortality mainly through IL-12 and IFN-γ signaling. These ideas came from a previous study published in Cell (Ponzetta et al., 2019). The authors elegantly demonstrated the role of Csf3r in the IL-12/IFN-γ axis and subsequent tumor incidence by showing that IFN-γ neutralization altered the phenotype in wild-type mice but not in knockout mice. This rationale prompted us to perform a similar neutralization experiment to understand the precise role of HLJ1 in sepsis.

      • No true sepsis model is used, only LPS. This is important, as for instance neutralization of IFN-γ and IL-12 has been shown to improve outcome in endotoxemia before (see above), but had no effect on survival in more relevant sepsis models such as cecal ligation and puncture (e.g. see Romero et al., Journal of Leukocyte Biology 2010; Zisman et al., Shock 1997). Furthermore, IFN-γ is even proposed (and used on a small scale) as therapy in sepsis patients to reverse immunosuppression.

      We thank the reviewer for raising these critical issues and providing valuable suggestions. They were also mentioned by other reviewers. Although LPS-induced endotoxemia is a simple model with higher reproducibility and reliability compared with other sepsis models, it indeed cannot represent actual sepsis; it is based on the notion that it is the host’s response to bacteria, not the pathogen itself, that leads to mortality and organ failure (Deitch, 2005). Therefore, we performed an additional model, cecal ligation and puncture (CLP), which resembles clinical disease and septic shock (Deitch, 2005), to confirm the importance of HLJ1 to human sepsis. Please see our revised Figure 6 (Figure 6) and the responses to other reviewers above.

      In accordance with the previous result from Romero et al. showing that IFN-γ neutralization did not improve survival, we observed similar survival rates between Hlj1+/+ and Hlj1−/− mice after CLP. However, when they treated mice with systemic antibiotics, IFN-γ knockout mice survived significantly better than wild-type mice (Romero et al., 2010). In the CLP model, it is possible that severe bacteremia contributed to mortality in an IFN-γ-independent manner in mice that did not receive antibiotics, so we treated mice with systemic antibiotics immediately after CLP. As a result, Hlj1−/− mice showed significantly improved survival compared with Hlj1+/+ mice when treated with systemic antibiotics after CLP surgery (Figure 6E), indicating that targeting cytokine storm in combination with antibiotics provides a promising therapeutic strategy to treat sepsis. The result is shown in Figure 6E of revised Figure 6 (Figure 6). This suggests that an HLJ1-targeting strategy can be combined with antibiotics as a combination therapy for future clinical applications. We emphasized and discussed this concept in the Discussion of the revised manuscript (Page 18-19, line 441-453).

    1. Author Response

      Reviewer #1 (Public Review):

      In their manuscript "CompoundRay: An open-source tool for high-speed and high-fidelity rendering of compound eyes", the authors describe their software package to simulate vision in 3D environments as perceived through a compound eye of arbitrary geometry. The software uses hardware accelerated ray casting using NVIDIA Optix to generate simulations at very high framerates of ~5000 FPS on recent NVIDIA graphics hardware. The software is released under the permissive MIT license, publicly available at https://github.com/ManganLab/eye-renderer, and well documented. CompoundRay can be extraordinarily useful for computational neuroscience experiments exploring insect vision and robotics with insect like vision devices.

      The manuscript describes the target of the work, realistically simulating vision as perceived by compound eyes in arthropods, and thoroughly reviews the state of the art. The software CompoundRay is then presented to address the shortcomings of existing solutions, which either oversimplify the geometry of compound eyes (e.g. assuming shared focal points), use an unrealistic rendering model (e.g. local geometry projection), or are slower than real-time.

      The manuscript then details implementation choices and the conceptual design and components of the software. The effect of compound eye geometries is discussed using some examples. The speed of the simulator depending on SNR is assessed and shown for three physiological compound eye geometries.

      I find the described open source compound eye vision simulation software extraordinarily useful and important. The manuscript reviews the state of the art well. The figures are well made and easy to understand. The description of the method and software, in my opinion, needs work to make it more succinct and easier to understand (details below). In general, I found relevant concepts and ideas buried in overly complicated meandering descriptions, and important details missing. Some editorial work could help a lot here.

      Thank you for the very positive feedback.

      Major:

      1) The transfer of the scene seen by an arbitrary geometry compound eye into a display image lacks information and discussion about the focal center/ choice of projection. I believe that only the orientation of ommatidia is used to generate this projection which leads to the overlap/ non-coverage in Fig. 5c. Correct? It would be great if, for such scenarios, a semi-orthogonal+cylindrical projection could be added? Also, please explain better.

      For clarification, CompoundRay allows for a number of projection modes from any 3D sampling surface to visualised 2D projections. This has now been made clearer with an updated Methods section “From single ommatidia to full compound eye” (lines 171-188), and also a clearer explanation of the display pipeline within the “CompoundRay Software Pipeline” section (lines 245-247).

      We note that Fig 5 is simply intended as an example of the extreme differences in information that can be provided by nodal (the current state of the art) and non-nodal imagers (as in biological systems). A user could indeed produce custom projections (as now noted in the future work section of the Discussion), such as semi-orthogonal+cylindrical projections, by modifying the projection shaders. However, we do not feel that this adds substantially to the desired message of Fig 5, as currently all view images are generated using the same projection method, allowing them to be compared. Further to this, a semi-orthogonal+cylindrical projection would only serve to display these types of eyes and would not be of significant use outside of this category of design. Rather, the utility of CompoundRay for research is now demonstrated by the inclusion of an entirely new example experiment (lines 394-467) (Fig 10), which compares artificial and realistic compound eye models in a visual tracking task.

      In addition, we note that specific references to the “orientation-wise spherical mapping” of images have been added to the appropriate image captions (Fig 5 & 6).

      Finally, we have attempted to be more explicit about the way that 2D projection systems work within CompoundRay (lines 182-185).

      2) It is clear that CompoundRay is fast and addresses complex compound eye geometries. It remains unclear, why global illumination models are discussed while the implementation uses ray casting to sample textures without illumination which is equivalent to projection rendering which runs fast on much simpler hardware. If the argument is speed and simplicity of writing the code, that's great, write it so. If it is an intrinsic advantage of the ray-casting method, then comparison with the 'many-cameras' approach sketched below should be done:

      In your model, each ommatidium is an independent pin-hole camera. Instead of sampling this camera by ray-casting, you could use projection rendering to generate a small image per ommatidium-camera, then average over the intensities with an appropriate foveation function (Gaussian in your scenario, but could be other kernels). The resolution of the per-camera image defines the number of samples for anti-aliasing, randomizing will be harder than with ray-casting ;). What else is better when using ray-casting? Fewer samples? Hardware support? Possible to increase recursion depth and do more global things than local illumination and shadows? Easier to parallelize on specific hardware and with specific software libraries? Don't you think it would make sense to explain the entire procedure like that? That would make the choice to use ray-casting much easier to understand for naive readers like me.

      Thanks for this feedback; we can see that it was misleading to include this in our previous Methods section. We have now reduced the discussion of global illumination models and moved it to the future work section at the end of the Discussion. We have also added a clarification to the end of this document that summarises this point, as it was raised by multiple reviewers (see Changes Relating to Colour and Light Sampling).
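
      To make the per-ommatidium sampling discussed in this exchange concrete, the following is a minimal Python sketch and not CompoundRay's actual implementation. It assumes each ommatidium is described by a view axis and an acceptance angle (treated here as the FWHM of a Gaussian, which is an assumption), draws ray directions with Gaussian offsets around the axis, and averages the radiance returned by a toy scene. All names (`sample_ommatidium`, `scene_radiance`, `perturb`) are hypothetical.

```python
import math
import random


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def perturb(axis, sigma, rng):
    """Offset a unit `axis` in its tangent plane by Gaussian amounts.

    For small `sigma` (radians) this approximates a Gaussian angular spread.
    """
    # Pick a helper vector that is not parallel to the axis.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(axis, helper))
    v = cross(axis, u)
    du, dv = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    return normalize(tuple(a + du * x + dv * y for a, x, y in zip(axis, u, v)))


def scene_radiance(direction):
    """Hypothetical toy scene: bright above the horizon (z > 0), dark below."""
    return 1.0 if direction[2] > 0.0 else 0.0


def sample_ommatidium(axis, acceptance_angle, n_samples, rng):
    """Monte Carlo estimate of the Gaussian-weighted radiance for one ommatidium."""
    # Assumption: the acceptance angle is the FWHM of the Gaussian kernel.
    sigma = acceptance_angle / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    axis = normalize(axis)
    total = sum(scene_radiance(perturb(axis, sigma, rng)) for _ in range(n_samples))
    return total / n_samples
```

      Because the directions are drawn from the Gaussian itself, a plain average of the samples stands in for the weighted integral; an ommatidium looking along the horizon of the toy scene should return roughly 0.5, with noise that shrinks as `n_samples` grows.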

      3) CompoundRay, as far as I understand, currently renders RGB images at 8-bit precision. This may not be sufficient to simulate the vision of arthropod eyes that are sensitive to other wavelengths and at variable sensitivity.

      Thanks for pointing out this easy-to-miss implementation detail. Indeed, you are correct that the native output is at 8-bit level as is standard to match display equipment. However, we note that the underlying on-GPU implementation operates at a 32-bit depth, so exposing this to the higher-level Python API should be possible, which could then be used as you suggest. We view adding enhanced lighting properties including shadows, illumination and higher bit depths so as to better support increased-bandwidth visual sensor simulation as future updates which we have now outlined in the Discussion (line 549-553).
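
      To illustrate why bit depth matters for this point, the sketch below (plain Python, independent of CompoundRay's API) quantizes floating-point intensities in [0, 1] to 8-bit levels and shows two dim intensities collapsing onto the same level. The helper name `to_uint8` and the example values are hypothetical.

```python
def to_uint8(intensity):
    """Quantize a float intensity in [0, 1] to an 8-bit level (0-255)."""
    clamped = min(max(intensity, 0.0), 1.0)
    return int(round(clamped * 255))


# Two dim intensities that differ by about 30% in relative terms.
a, b = 0.0030, 0.0039
levels = (to_uint8(a), to_uint8(b))  # both map to the same 8-bit level
```

      A floating-point output would keep such values distinct, which is the kind of headroom a simulated sensor with variable sensitivity would need.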

      Reviewer #2 (Public Review):

      In this paper, the authors describe a new software tool which simulates the spatial geometry of insect compound eyes. This new tool improves on existing tools by taking advantage of recent advances in computer graphics hardware which supports high performance real-time ray tracing to enable simulation of insect eyes with greater fidelity than previously. For example, this tool allows the simulation of eyes in which the optical axes of the ommatidia do not converge to a single point and takes advantage of ray tracing as a rendering modality to directly sample the scene with simulated light rays. The paper states these aims clearly and convincingly demonstrates that the software meets these aims. I think the availability of a high-quality, open-source software tool to simulate the geometry of compound eyes will be generally useful to researchers studying vision and visual behavior in insects and roboticists working on bio-inspired visual systems, and I am optimistic that the described tool could fill that role well.

      Thank you for the positive feedback.

      As far as weaknesses of the paper, the most major issue for me is that I could not find any example of why the additional modeling fidelity or speed is useful in understanding a biological phenomenon. While the work is technically impressive, I think such a demonstration would increase its impact substantially.

      An example experiment has been added as requested.

      I can identify a few more, relatively minor, weaknesses: the software tool is not particularly easy to install but I think this is due primarily to the usage of advanced graphics hardware and software libraries and hence not something the authors can easily correct. In fact, the authors provide substantial documentation to help with installation.

      Indeed, we have tried to ease installation as much as possible by providing detailed documentation. This has been updated since initial submission and has proven sufficient for multiple users. We have looked into dockerising the code but, as correctly identified by the reviewer, there are significant challenges due to proprietary hardware and its drivers.

      Another weakness of the tool, which the authors might like to address in the paper, is that there are some aspects of insect vision and optics which are not directly addressed. For example, the wavelength and polarization properties of light rays are hardly addressed despite extensive research into the sensation of these properties. Furthermore, the optical model employed here is purely ray based and would not allow investigating the wave nature of light which is important for propagation from the corneal surface to the photoreceptors in many species.

      Indeed, it is correct that the current implementation does not allow such advanced light modelling features, but as our initial aim was to allow arbitrary surface shapes, this was considered beyond the scope of this work. However, we have added a short description of extensions that the method would allow without significant architectural changes, which include many of those listed by the reviewer. As the renderer simulates light as it reaches the lens surface, it is hoped that further work will be able to use this natural boundary between the eye surface and its internals to build computational models that use the data generated in CompoundRay as a starting point to then simulate inside-eye light transport.

    1. Author Response

      Reviewer #3 (Public Review):

      Here, Johnstone et al. developed novel tools to study endogenous and tissue-specific circadian clocks, which control gene expression oscillation over a 24-hour period. They call these genetically encoded luciferase-based tools LABL (Locally Activatable BioLuminescence). Other known techniques monitor downstream products of circadian clock gene activity (ie, neuronal calcium imaging) or utilize terminal assays such as qRT-PCR or require removal of organs for ex vivo monitoring. The authors show that their LABL technique faithfully mimics the oscillations of gene expression seen with other techniques for broad circadian expression drivers and for neuronally specific expression drivers but show different patterns for non-neuronal, so-called peripheral clocks. These results suggest that the canonical hierarchy of central clocks regulating peripheral clocks may need closer re-examination.

      The conclusions of this paper are mostly well supported but three specific aspects need to be clarified or tested.

      1) Figures 5A, 6B, and 6E are critical for the conclusions of this paper and from what this reviewer can tell, they support these conclusions but the overlay of mutant and wild type on the same graphs obscures both. This reviewer would suggest including split graphs with wild type and mutant alone for independent evaluation.

      We agree with the reviewer that our well-intended attempt to compare two conditions ended up confusing both. We therefore show the wild-type graphs separately and earlier for clear demonstration that LABL can be used with drivers that target specific neuronal clusters.

      2) Luminescence is a well-established, high-resolution real-time monitor; at the same time, my one concern is that luminescence via luciferase and feeding of luciferin substrate might be dependent on the host animal's feeding patterns. How do we know that the peaks and troughs of luminescence are not due to peaks and troughs of feeding and metabolism rather than peaks and troughs of circadian clock gene expression? Can the authors offer evidence to support the latter?

      The reviewer makes a good point. We refer the reviewer to point 4 (“metabolism argument”) in the section on major concerns above.

      3) While the comparison of wild type to arrhythmic mutants is consistent with current data and seems to reflect faithful monitoring of tissue-specific circadian clock activity, the classic technique for demonstrating faithful monitoring of clock activity is to slow down or speed up the clock. The authors have themselves used this technique in previous publications, including using phosphosite-specific mutants of clock components and flies containing constitutively active or kinase-inactive regulators of clock activity. Another classic technique is to use short or long period mutants. Use of any of these types of mutants showing that they shift the luminescence rhythms generated by LABL would provide further evidence that LABL reflects endogenous, tissue-specific clock activity. Alternatively, monitoring the rhythm of a clock thought to be independent of central clock activity such as that in the antennae or Malpighian tubules and showing that this is not disrupted by central clock disruption would provide such support as well.

      We thank the reviewer for this suggestion. We refer the reviewer to the major concerns that we addressed above, point 1, where we describe recording perS and perL mutants.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Villalta, Schmitt, Estrozi and colleagues report their results on genome compaction in one of the most complex known viruses, the Mimivirus. This work will be of interest to a broad readership, and particularly to virologists and structural biologists. The authors describe a novel mechanism used by mimivirus to compact and package its 1.2 Mb dsDNA genome. In particular, the mimivirus genome is shown to be packed into magnificent cylinder-like assemblies composed of GMC-type oxidoreductases, presenting yet another remarkable case of enzyme exaptation. By using cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET), the authors determined the structures of such fibers in several relaxation states, which presumably represent different stages of nucleoprotein unpacking upon delivery into host cytoplasm. The authors also suggest (although do not directly visualize) that the lumen of the genomic fibers contains several viral enzymes, most notably, DNA-dependent RNA polymerase, which is necessary for cytoplasmic replication of the mimivirus. Overall, this is an important discovery, which further expands our appreciation of the "inventiveness" of viruses.

      We thank this reviewer for the positive and constructive comments. We now provide some additional data from unpublished follow-up studies, which we hope will help all reviewers assess the quality and reliability of our work.

      I am not an expert on helical reconstructions and cannot evaluate the validity of the models. Thus, my specific comments will focus on aspects of the work with which I am more familiar.

      1) In light of the presented results, it is reasonable to assume that GMC-type oxidoreductases of the mimivirus are very important for the formation of functional virions. However, in a previous study (PMID: 21646533), it has been shown that the genes encoding GMC-type oxidoreductases can be deleted from the virus genome (M4 mutant) without the loss of infectivity. The M4 virions were devoid of the external fibers decorating the icosahedral capsid, but the genome was still packaged. How do the authors reconcile these results with those presented in the present manuscript? This should be addressed in the Discussion section.

      In fact, like the reviewers, we initially assumed that the GMC-oxidoreductases were essential. Now, we believe it might be premature to assume that GMC-type oxidoreductases are the only type of proteins that can be involved in the scaffolding of the Mimiviridae genomic fibers. We managed to extract the genomic fiber of M4 (the isolate without GMC oxidoreductases). The fiber also has a rod-shaped structure but protein composition analysis of the purified fiber shows that different proteins are involved in its assembly.

      We hope the reviewers will accept that we reserve this finding for a subsequent publication.

      2) The authors state that mimivirus encodes two GMC-type oxidoreductases (qu_946 and qu_143) and that both could be fitted into the electron densities. However, I could not understand whether the authors think that the fibers are heteroassemblies of both oxidoreductases or different fibers are composed of different proteins, or only one is used for fiber formation. Please clarify. In case you are not able to distinguish between the two homologs (e.g., due to limited resolution), state so explicitly.

      We cannot discriminate between the two GMC-oxidoreductases because of their close identity (69% identity, 81% similarity) and the resolution of the map. We think that in most cells the qu_946 GMC-oxidoreductase is the more abundant of the two at the time of genome packaging (between 2 and 9 times more abundant, according to our proteomic study). In some cells, however, the second GMC-oxidoreductase could become the more abundant and, in that case, the genomic fiber is built using qu_143.

      3) I am slightly puzzled by the observed "ball of yarn". It is hard for me to imagine that a cylindrical container/fiber containing a continuous dsDNA genome could be bent or fragmented into bundles because this would break the protein-protein interactions holding the fiber together. In Figures 1C and S1, are these parts of the same fiber or multiple fibers coming out of one capsid? Related to this question - is there evidence (e.g., from qPCR) that Mimivirus carries a single copy of genomic dsDNA per capsid?

      We believe this reviewer should think in terms of packaging. The folded genome is packaged through two lipid membranes (the one lining the capsid interior and the one in the nucleoid) concomitantly with its wrapping by the protein shell ribbon. Thus, there is plenty of space in the nucleoid at the beginning of the packaging and the genomic fiber is gently folded inside. But as more genome needs to be packaged, this compresses the flexible fiber into the nucleoid until it is totally encased in the nucleoid. That also defines the size of the nucleoid in the icosahedral capsid. This tight packaging is exemplified in Fig 1A for instance or the AFM images of the nucleoid enclosed in P3 of this file.

      We provide a more general answer in the answers requested by the editor.

      We think that the entire genome can only be packaged in the capsid through its assembly within the protein shell. We also think the genomic fiber is progressively built on the genomic DNA while it progresses into the capsid, most likely by an energy-driven packaging machinery. This process can be compared to bacterial pili assembly, except that pili are built on the surface of the cell, while the genomic fiber is built into a compartment, the nucleoid, which forces it to fold; this is only possible because of the high flexibility of the genomic fiber. Thus, the entire genome corresponds to ~40 µm of genomic fiber, which when folded as a ball can entirely fit into the nucleoid. The organization of the genome in a large “tubular structure” and its folding inside the nucleoid compartment have been previously reported in AFM studies of the mimivirus particles (Kuznetsov, Y. G. et al. Virology 2010; Kuznetsov YG et al. J. Virol. 2013, Fig 15), in which the authors describe “highly condensed nucleoprotein masses about 350 nm in diameter within the inner membrane sacs of virions”, with tubular structures they refer to as “thick cables of the nucleic acid” (image P3 herein).

      4) The authors describe the interactions between the monomers in the dimer of qu_946 as well as between qu_946 and DNA. I would also like to see a brief description of protein-protein interactions between subunits within the same helical strand as well as between helical strands, which hold the whole assembly together (i.e., what are the contacts between green subunits as well as between green and yellow subunits shown in Fig 2C). The authors suggest that the shell "would guide the folding of the dsDNA strands into the structure" (L310). To support this statement, the authors could show the lumen of the fiber rendered by electrostatic potential.

      We thank this reviewer for these suggestions. An additional supplementary Table (Table S4) is now provided listing the various contacting residues in each genomic fiber map and for each GMC-oxidoreductase. The number of contacts obviously decreases in the relaxed structure, but even in the compact forms we noticed that there are relatively few intra- and inter-strand contacts, which may also explain the flexibility of the structure. We now provide a new Figure 3 in which the lumen of the fiber is rendered by electrostatic potential for the Cl1a map and each of the two GMC-oxidoreductases.

      5) Please provide some background information on the distribution of GMC-type oxidoreductases in other families of giant viruses, so that it is clearer whether the described packaging mechanism is specific to mimiviruses or is more widespread.

      This is a central point, also linked to the question about M4. In fact, like the reviewers, we initially assumed that the GMC-oxidoreductases were essential. Now, we believe it might be premature to assume that GMC-type oxidoreductases are the only type of proteins that can be involved in the scaffolding of the Mimiviridae genomic fibers.

      If this reviewer still thinks this is essential to this manuscript we can provide a multiple alignment of the GMC-oxidoreductases of members of each clade upon request.

      Reviewer #3 (Public Review):

      Since it was presented to the scientific community as a viral entity, mimivirus has the unlimited capacity to cause surprise and admiration. In this manuscript, Villalta, Schmitt, Estrozi, et al. and Abergel present how the mimivirus gigantic genome is organized into the virion. The authors succeeded in developing a protocol to trigger virus genome uncoating followed by genome-associated proteins purification. The presented data indicates that a helical shield composed of two GMC-type oxidoreductases is associated with the mimivirus genome, named genomic fiber. By cryo-EM, and cryo-tomography different forms and stages of the genomic fiber were detailed described, indicating the dynamics of fibers conformational changes, likely related to genome packing and uncoating during the virus replication cycle. In-depth analysis of a substantial number of individual virus fibers revealed that the mimivirus genome is folded and organized inside the aforementioned helical shield, which seems to be novel among giant icosahedral viruses. Proteomics in association with image analysis indicates that mimivirus packed genome forms a channel, which accommodates key enzymes related to early phases of the replication cycle, especially RNA polymerase subunits.

      I must disclose that I am not an expert on structural virology and proteomic analysis. Therefore, I don't feel I can contribute to the improvement of this kind of analysis. That said, I congratulate the authors for their efforts to make the manuscript story understandable to nonexperts.

      We are grateful to this reviewer for these positive comments.

      I have a few suggestions and comments:

      1) Please consider the "nucleocapsid" concept during genomic fiber presentation. I believe it fits in;

      We fully agree and this was why we referred to APBV-1. Obviously, it was not clear and we now explicitly use the word “nucleocapsid” in the text.

      2) The "ball of yarn" analogy is nice, but fig 1C shows several fibers unconnected (free) in one of their ends. I am wondering if it means that the genomic fiber is not a long-single structure covering the whole genome, but a bunch of several independent helical structures covering the whole genome and attached in such "ball of yarn". Like several threads connected. Could the authors clarify that please?

      In the “ball of yarn” structures, there are clearly breaks that give the impression of multiple fibers. Yet, these breaks are due to the multiple steps of the extraction, enrichment and purification treatment. The genomic fiber is built as a long (~40 µm) single structure folded in the nucleoid while it is loaded. As a result, it is tightly packed into the nucleoid and broken into fragments upon release due to the fragilizing treatment. As exemplified in the CryoEM image provided above (P9) on freshly opened capsids, these breaks appear to depend on the treatment. This reviewer could also look at the answer we provided to Reviewer 2 point 3 as this could help clarify how it is possible to package the genomic fiber and subsequently fold it into the nucleoid to the point where it is tightly packed and under pressure.

      3) Considering previously published data on proteomics of viral factories and transcriptomics of mimivirus: is there any temporal association between GMC-type oxidoreductases' peak of expression and genome replication during the viral cycle? what about RNA pol subunits? Are all those proteins highly expressed during the late cycle? or do they reach the peak concomitantly with genome replication? This information can support the discussion on the genome-fibers assembly during the cycle.

      We thank this reviewer for these suggestions. We have now added the time of expression of the proteins involved in the genomic fiber composition throughout the manuscript. We added explicit sentences in the main text for both the GMC-oxidoreductases and the RNA polymerase subunits. The RNA polymerase as well as proteins involved in mRNA maturation are in the virion (Table S2 B), and studies by others demonstrate that early transcription takes place in the nucleoid once it is transferred into the host cytoplasm (Reference 24). We also provide the reviewers with a link to the expression data for the different mimivirus genes: http://www.igs.cnrs-mrs.fr/mimivirus/

      4) Taken together, data seem convincing to demonstrate that the virus genome is located inside the helical shield. However, I believe that the authors could better explain why we only see 20 kb fragments in the gel, including in the control (in Fig S2).

      We hope our answers to this comment will convince this reviewer.

      Fig S2 corresponds to a regular 1% agarose gel and not to a PFGE gel. This gel was simply intended to show that there is DNA associated with the genomic fiber, not to show the size of the DNA: as the genomic fiber has been broken into pieces, we do not expect very high molecular weight DNA. We must point out that when the DNA is extracted from Mimivirus capsids using standard kits and pipetting, it also migrates at the top of the gel (Lane 1 in Fig. S2), while it would likely appear as a smear above 20 kb on a PFGE gel. By contrast, when the viral particles are put into plugs prior to lysis, the genomic DNA migrates at the proper size, as shown in the publication from Boyer et al. 2011 (reference 31), which shows that the genome of Mimivirus is a linear genome migrating around 1.37 Mb (Fig 1, Panel B, Lane M1). In P9 of this letter, an image of a long (> 6 µm) and flexible fiber is presented.

      Reviewer #4 (Public Review):

      In the manuscript "The giant Mimivirus 1.2 Mb genome is elegantly organized into a 30 nm helical protein shield", the authors show that, when subjected to low pH stress, the Mimivirus particle releases 30nm-diameter filamentous assemblies. These filaments consist of a protein shell that envelopes the Mimivirus genomic DNA. The protein shell is composed of two GMC-oxidoreductases, the same protein that forms the long fibers emanating from the capsid of the Mimivirus.

      Overall, despite being interested in the subject, this scientist was left confused about several aspects of the paper described below. The presentation of the material is also confusing.

      We hope the answers and images we provide to all Reviewers on pages 2 to 12 herein will clarify the various points raised by this reviewer.

      1) The presented data do not allow the estimation of the amount of mimivirus genome organized into 30 nm diameter filaments. Hence, the title of the paper is misleading.

      The entire genome should be packaged in the genomic fiber. This was already observed by others, and we now provide a published image of the nucleoid imaged by AFM. The image was extracted from Kuznetsov et al. J. Virol. 2013. See p9 of this letter.

      2)The filamentous structures are a result of extremely harsh treatment of the virus particle, which starts with a 1.5 hour-long incubation at pH 2. Do the filaments actually exist inside the virus particle as the title of the paper implies?

      The 1 h incubation at 30°C and pH 2 was only applied to recover the nucleoids (see Materials and Methods section “Nucleoid extraction”) presented in Fig S1A. Acidic treatment was never applied to produce the genomic fiber, as we noticed that it is sensitive to both temperature and acidic treatment. All steps of the extraction protocol were performed at pH 7.5 (section: “Extraction and purification of the mimivirus genomic fiber”). We must emphasize that the release of the genomic fiber can be seen at the very first step of the extraction protocol (protease treatment). The sample was also checked at each step of the protocol by negative staining TEM to assess the status of the genomic fiber. We had to optimize the protocol: a proteolytic treatment that was too mild led to too few opened particles, although the released genomic fiber was mostly compact, whereas a treatment that was too harsh opened all particles but left the genomic fiber mostly in the ribbon state. We had to compromise to get a decent amount of compact and relaxing structures to be able to perform the present work. We would like to stress that we could reproducibly obtain the genomic fiber from many preparations and that we could observe it with different virions (including M4), even using different protocols (only the one with the best yield is reported in the manuscript).

      In Figure 1B the genomic fiber can be seen inside a virion, still encased in the membrane compartment. These structures were not reported in previous cryo-EM analyses of the virions. As said above, they were only reported in AFM studies of the mimivirus particles (Kuznetsov, Y. G. et al. Virology 2010; Kuznetsov YG et al. J. Virol. 2013, Fig 15). See p9.

      Or [might] these filaments [form during] host take over?

      Or [perhaps] these filaments [result from a harsh in vitro treatment] and have nothing to do with either?"

      The first two questions can be answered with the help of cryoFIB tomography, which might be beyond the scope of a "paper revision". However, the properties of the two GMC-oxidoreductases in the presence and in the absence of genomic DNA must be examined in greater detail. Can these proteins, by themselves, form similar hollow filaments (or any filaments) when subjected to the same treatment as the virus?

      We find it difficult to imagine that such a complex structure could be an artefact of the treatment, for several reasons:

      • It is unlikely that simply putting the GMC-oxidoreductases together with DNA would result in a helical structure where the DNA is folded 5 times and internally lines the protein shell (extended data video 1 of one tomogram). It would be like crystallizing the proteins (in a heterogeneous sample) onto the folded DNA to form a helix with a hollow lumen. The crystallographic studies performed by others on the mimivirus GMC-oxidoreductase did not produce tubular structures either, although 3 crystal forms were reported. The proteins were overexpressed in E. coli, and no such structures bound to DNA were reported either.

      • Given the presence of compact and relaxed forms, the helix, once relaxed, cannot go back to a compact state passively by simply rewinding, which suggests that the relaxed forms are the result of decompaction of a constrained structure. This is also supported by the loss of DNA in the relaxed state Cl3. The last steps of unfolding correspond to the loss of one ribbon strand after the other.

      • The intra- and inter-strand contacts between chains are also scarce, which supports an active assembly of the structure. We now provide an additional supplementary Table S4 with the different contacts for the different states of the genomic fiber.

        3) Although the assignment of the qu_946 oxidoreductase to the corresponding cryo-EM density is correct (as the resolution is high enough), I am confused about the other oxidoreductase (qu_143). Where does it fit to? Which structure does it form?

      We cannot discriminate between the two GMC-oxidoreductases because of their close identity (69% identity, 81% similarity) and the resolution of the map. We think that in most cells the qu_946 GMC-oxidoreductase is the more abundant of the two at the time of genome packaging (between 2 and 9 times more abundant, according to our proteomic study). In some cells, however, the second GMC-oxidoreductase could become the more abundant and, in that case, the genomic fiber is built using qu_143.

      Equally important, what is going on with the N-terminal 50-residue domain of qu_946? Is there a space for it in the cryoEM map? Is it disordered?

      The N-terminal domain is only present in the fibrils decorating the capsids. As illustrated in Fig S12, MS-based proteomics shows that the peptide coverage of the GMC-oxidoreductases differs depending on whether they compose the fibrils or the genomic fiber. The N-terminal domain is clearly covered when the fibrils (data not shown) or intact virions are analyzed, and not covered when the analysis is performed on the genomic fiber. That is why we propose that this N-terminal domain could be an addressing signal (see main text) and that a protease could cleave it in the case of genomic fiber assembly.

      Main text: The proteomic analyses provided different sequence coverages for the GMC-oxidoreductases depending on whether samples were virions or the purified genomic fiber preparations, with substantial under-representation of the N-terminal domain in the genomic fiber (Fig. S12). Accordingly, the maturation of the GMC-oxidoreductases involved in genome packaging must be mediated by one of the many proteases encoded by the virus or the host cell.

      Indeed, there is no space to accommodate this domain as it would prevent the interaction between the protein shell and the DNA or/and induce an increase of the genomic fiber diameter that would be too big to be accommodated into the nucleoid.

      4) The bubblegram analysis is not very convincing. The bubbles appear to correlate with the length or thickness of the structure - the long or overlapped structures form bubbles. The bubbles may not be due to the presence of DNA.

      The point is that, as demonstrated by our structural studies, the relaxed structure has lost the DNA. This is why bubbles cannot be seen in the relaxed broken fibers. On long fibers still in compact form, the DNA is visible in the structure and bubbles can be seen. The evidence for the presence of DNA in the structure is also provided by the agarose gel of the purified genomic fiber and by the cryo-EM structures. Bubblegrams are just one additional analysis that was provided.

    1. Author Response

      Public Evaluation Summary:

      Predicting if a tumour has aggressive or metastatic characteristics would be of great utility in the clinic as it would help patient stratification and management. In this manuscript, Carrier and collaborators derive a signature for melanoma aggressiveness relying on methylated regions of tumour and cell line genomes. The identification of a 4-gene methylation biomarker for melanoma aggressiveness and survival is an important contribution. This manuscript is of relevance to clinicians and melanoma researchers interested in biomarker research.

      We would like to specify that the methylation of 5 CpGs was identified as a potential signature of the aggressiveness and survival of melanoma in primary tumors. The study has three key and original findings: 1/ the observation that robust DNA methylation traits of aggressiveness are independent of the physiological context; 2/ the methodology of combining DNA methylome analysis and chromosome cluster-based analysis, which can be applied beyond melanoma; and 3/ the identification of the methylation of 5 CpGs (and not genes) that provides a predictive value of the aggressiveness of the melanoma in primary tumors.

      Reviewer #1 (Public Review):

      In this manuscript, Carrier and collaborators derive a methylation signature for melanoma aggressiveness from the sequential analyses on various cell lines in different organisms and test it in a set of primary and metastatic melanoma tumours. However, I think that some of the claims are a little premature as a broader sample size would need to be tested to assess the signature robustness and applicability.

      Strengths

      • The approach the authors take is innovative and I agree with their premise that genes that make cells be more aggressive should be detected across different organisms.

      • Different organisms were evaluated.

      • Figures are illustrative and the narrative is very clear.

      We thank the referee for the comments.

      Weaknesses

      • The sample size is small. In my opinion, a broader and more diverse set of samples would need to be tested if authors suggest making a diagnostic kit with the genes in their signature

      We agree with the referee, and we are pursuing the project with the aim of developing an assay that measures the 5 CpGs simultaneously and can be easily used by dermatologists and other professionals. But this is beyond the scope of the manuscript, which reports an original strategy and the novel findings listed above.

      • A more comprehensive comparison with what other authors have found when doing similar studies would be needed to put in context their results.

      We have clarified this point in the revised version. Only two of the five CpGs that we have identified are available in public datasets. The complete signature of the five CpGs has not been analyzed in other studies.

      Reviewer #2 (Public Review):

      Carrier et al. sought to define the methylome associated with increased aggressiveness of melanoma, with the goal of identifying common changes in methylation and to define a methylation signature of disease progression. To do so, they analyzed 3 cell line pairs that either were established from the same patient (primary vs cutaneous metastasis) or that were a parental cell line and its derivatives generated through repeated transplantation and selection for the ability to metastasize to the lung. Among these pairs, 229 genes were identified as commonly hypermethylated. Interestingly, genomic mapping of these genes revealed that 74 of these genes localized to 9 methylation clusters, 34 of which had two CpGs and at least 40% differential methylation. Carrier et al. also performed Ingenuity pathway analysis, uncovering 116 genes among the 229 with putative cancer-associated functions. From these genes, 8 candidates were selected for validation in cell lines and patient samples. 4 out of 8 genes (MYH1, PCDHB16, PCDHB15, BCL2L10) showed differential methylation in patient samples, and their methylation status correlated with patient overall survival. Carrier et al. then devised a score based on methylation of these 4 genes, which performed better in predicting patient prognosis based on primary tumor methylation score than did the Breslow index. This methylation score could therefore be used as a biomarker of melanoma aggressiveness and this approach could be implemented in other tumor types.

      Overall, the approach appears to be well designed, the results are of good quality, and generally support the claims. For some aspects of the paper, the rationale is not immediately apparent and should be better described, for instance the choice of the 8 genes selected or validation appears arbitrary and the cut-off long term vs short term survival of patients (1 year) is not justified clinically or scientifically. Providing additional information will make this study clearer for the reader.

      We thank the referee for the comments. We have clarified these two points in the revised version. The 8 genes were chosen because they were distributed on chromosomes with clusters of methylation, correspond to peaks of hypermethylation, and have a potential functional role in cancer formation. One year was chosen because, at the time of the study, it was the average overall survival of patients diagnosed with cutaneous melanoma.

      Reviewer #3 (Public Review):

      The authors propose that the DNA methylation signature of tumor aggressiveness would be independent of the physiological context: starting from a human tumor, shared signatures relevant to aggressiveness should emerge independent of whether this trait was acquired in humans or whether cells have been implanted into rats or mice. In a multi-step selection process, they identified hypermethylated sites common to the most aggressive melanoma forms, analyzed the distribution of these sites in the genome, and validated these methylation peaks in cell lines and patient samples.

      The weakness is related to the use of murine cells and also to The Functional annotation and pathway analysis. The list of hypermethylated genes was imported into QIAGEN's Ingenuity® Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity). I am wondering if it would be more appropriate to use other platforms to explore the data.

      No murine cells were used in the study, only human-derived cells. This has been clarified in the text. Other platforms were used (Panther.db, KEGG pathway, DAVID, …) but the results were more complete with IPA.

      The strengths are related to the main strategy that identified a DNA methylation signature of five CpG sites in four gene promoters in primary tumors that could predict the overall survival of the patients and thus has potential diagnostic application. This strategy, which overcomes heterogeneity in tumors due to the environment, can potentially be generalized to other cancers involving DNA methylation alterations

      The authors combined analysis of the DNA methylome with the chromosomal location. The multistep strategy developed and used to identify differentially methylated genes predicting aggressiveness is originally identified as a common pattern or a specific signature of melanoma aggressiveness. The unique approach used in this study yielded a potential DNA methylation signature that correlates with outcomes.

      The description of a novel multistep approach allowed identifying a methylation signature of five CpGs in primary melanoma tissues that has the potential to predict survival outcomes in cutaneous melanoma patients. This integrated approach can be applied not only to other cancer types but also to other diseases or biological processes such as aging and development.

      We greatly appreciate the comments of the referee that underline the strengths of our study.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors aim to evaluate the flexibility of the amination network in E. coli. To achieve this, they knock out key enzymes GDH and GOGAT (which supply the majority of the cell's fixed nitrogen by aminating 2-oxoglutarate to glutamate), creating a glutamate auxotroph strain (glut-aux). They first consider whether exogenous amino acids can either replace glutamate, create glutamate through transamination, or be converted directly into glutamate. They found that many amino acids rescued growth of glut-aux, either through conversion to glutamate (proline, via putA), or transamination to glutamate (many, via aspC), and validate this finding with isotopic nitrogen labeling and demonstrating concentration-growth rate dependence. Then, for some amino acids that didn't initially rescue growth in the glut-aux strain, the authors engineer growth rescue through laboratory evolution, gene deletion, and exogenous transaminase overexpression. Finally, they propose that E. coli may accommodate non-canonical (non-glutamate) ammonium assimilation. Informed by the glut-aux rescue experiments, they engineer two strains that assimilate ammonium through alternative amino acids: aspartate (via native aspA overexpression) and leucine (via exogenous leucine dehydrogenase overexpression).

      Expanding the repertoire and characterization of auxotrophic microbial strains is an important goal for synthetic biology and metabolic engineering. By creating an E. coli glutamate auxotroph strain, demonstrating and expanding growth rescue on other amino acids, and engineering alternative ammonium entry points, the authors support their claims of flexibility and promiscuity in the cellular amination network. These claims are corroborated by comprehensive growth data and isotope labeling. While certain aspects of their investigation are not novel and the manuscript could benefit from more contextualizing, their findings will be of broad interest to researchers investigating nitrogen assimilation in microbes, and those seeking to engineer E. coli for bioproduction and novel metabolic circuits.

      Strengths:

      *The collection of growth rate data is comprehensive, and in combination with nitrogen isotope labelling, paints a clear picture of amine donation (and ammonium assimilation, in figure 7). The growth rate dependence experiments represent an impressive amount of work, and are particularly informative in the strain engineering experiments in figure 6.

      *The putA and aspC knockouts are elegant demonstrations of the specificity and promiscuity of E. coli's amination network, respectively. The contextualization with previous in vitro data was very informative, and reporting the minimal effect of the ybdL knockout demonstrated the importance of the glut-aux strain in assessing the promiscuity of various transaminases in their cellular context.

      *Engineering the growth of glut-aux on four amino acids that didn't originally rescue growth is impressive, particularly getting exogenous transaminases to work as intended. As mentioned in the manuscript, this shows the potential for this particular auxotroph strain to serve as a growth-based selection platform for alternative amine sources.

      *Engineering alternative ammonium assimilation through aspartate with a simple native AspA overexpression is a very strong demonstration of the flexibility of E. coli's amination network. This result may be useful for metabolic engineers looking to optimize E. coli for growth on formate and other low-energy substrates for the production of biofuel and high-value products.

      Weaknesses:

      *The framing of hypotheses for alternative routes of amine donor assimilation are clarifying to a reader unaware of the range of amine supplementation options available to E. coli: (i) replacement of glutamate, (ii) amine donation to 2-ketoglutarate to create glutamate, (iii) indirect amine donation to 2-ketoglutarate to create glutamate, and (iv) conversion to glutamate. However, outside of the figure 1 caption and lines 97-100 in the results section, the hypotheses are not mentioned again according to this classification. Directly after introducing figure 2, the authors discuss the possible ways that various amino acids rescue the growth of glut-aux and hypothesize that amine transfer is responsible. It would be immediately helpful to organize this discussion by the i-iv classification system introduced in the preceding paragraph. Similarly, figures 5-7 can be classified in this way. For example, the action of PutA in figure 5, where you say that proline is "metabolized... to glutamate", is unclear, and presumably refers to being "metabolically converted to glutamate," hypothesis (iv).

      We thank the reviewer for this comment and agree that repeating the classification when specific mechanisms are described is indeed very helpful for understanding the context. Correspondingly, we now refer to these classifications whenever addressing rescue mechanisms in the text passages e.g. in lines 132, 133, 152, 189 – 191, 197, 213, 240, 394, 417, 468, 471.

      *Construction of a glutamate auxotroph strain, by deletion of gdhA and gltBD, is well-established (Dougherty et al, 1993), and has been standardized in the Keio collection (Baba et al, 2006). While it is critical that the authors used the same lambda-red recombinase strain for all deletions after making the glut-aux strain, it should be made clear to readers what has been done before by adding some context here.

      We thank the reviewer for drawing our attention to this relevant publication (https://doi.org/10.1128/jb.175.1.111-116.1993), which we now cite in lines 90-91. However, the publication describes a D-glutamate auxotrophic E. coli strain with nonsense mutations in the genes dga (glutamate racemase murI) and gltS (involved in Glu-transport). Thus, this is a very different strain to the one that is the center of our study, which is auxotrophic for L-glutamate.

      *Most of the experiments were conducted in M9 media with glycerol as the carbon and energy source. Glycerol is utilized by oxidative phosphorylation in E. coli, like glucose, but is not a preferred carbon source. However, glycerol leads to higher growth rates when amines are supplied by amino acids rather than ammonia, due to imbalances in 2-ketoglutarate (Anat Bren et al, Sci Rep 2016). Knowing that cellular pyruvate and 2-ketoglutarate concentrations are different depending on carbon and nitrogen source, and both are relevant for thermodynamic favorability in cellular amination networks, the authors should justify why glycerol is the carbon source used for most experiments, as they justify fumarate in figure 7.

      We agree with the reviewer that metabolite concentrations differ depending on which carbon source is fed. Ideally, glucose would be used as the most favored carbon source. However, the presence of glucose leads to catabolite repression and hence might not provide the best testing conditions when analyzing the growth effects of additionally provided amines. Hence, in our opinion glycerol, which can be fed together with other supplements, was the better choice to test growth rescuing effects of amino acids. We now explain this to the reader from line 114 - 116.

      *In figure 3, it's unclear why AFLMPST and R are the only proteinogenic amino acids that are analyzed for 15N labeling. One might assume it's a technical issue for the mass spectroscopy data, but the relatively small selection of both amino-donor amino acids and 15N fraction amino acids makes initial interpretation of the figure confusing. Emphasizing that the 15N measurements are representative of all proteinogenic amino acids, and the amino-donors are representative of all amino acids that rescued growth for glut-aux would help.

      We agree with the reviewer that the amino acids selected need to be explained. From line 157, we now explain that we show amino acids covering the aspartate, glutamate, and serine families as well as representatives of branched-chain, aromatic, and arginine which contains an amine derived from the δ amino group of glutamine that originated from free ammonium.

      Additionally, figure 3B could benefit from greater distinction between amine groups and ammonium ions.

      We changed the figure according to the reviewer’s suggestion. In both Figure 1 and Figure 3B, free ammonium is now highlighted in boxes, making it easier to distinguish from amino groups.

      Also, there is presumably a typo in the "external amine donor" cartoon, with 14NH4+ in the grey circle rather than 14NH3+.

      The error was changed accordingly.

      *Adaptive laboratory evolution is not a fair description of how the authors found that a dadX mutation led to growth rescue of glut-aux+alaA on alanine (line 221). Although two weeks of growth may allow for evolution of E. coli in some cases, a single growth curve over two weeks is similar in duration and concept to some of the other (non-evolution) growth curve experiments (Figure 6C). Rather than being evolved from a series of mutations, the appearance of dadX mutants is much more likely the result of highly stringent selection on mutations acquired during outgrowth before the selection was applied. Given the inoculum size of ~10^6 cells from overnight culture, and E. coli's spontaneous mutation rate of ~10^-3 mutations per genome per generation, there is a reasonable probability of isolating one or more dadX mutant cells in the inoculum, which then expanded over two weeks (rather than evolved), given the growth rate of those mutants evident in figure 6A. Labeling this experiment as a spontaneous mutant selection of glut-aux+alaA engineered strain would make the aims and outcome of the experiment more transparent. Alternatively, one could report the growth data from the experiment, if available, or conduct a selective plating of prepared glut-aux+alaA inocula on M9+alanine plates to show the existence of a small mutant population.

      After checking again, we now show data from a plate reader experiment in Figure 6-figure supplement 1A showing the emergence of spontaneous mutants after a shorter cultivation time (110 and 130 h). We changed the text according to the reviewer’s suggestion and now describe the observation as spontaneous mutants (lines 259 - 260).
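
      The reviewer's argument about pre-existing mutants can be made concrete with a back-of-the-envelope Poisson estimate. The sketch below is illustrative only: the inoculum size (~10^6 cells) comes from the reviewer's comment, whereas the per-cell frequency of rescuing mutants (`mutant_frequency`) is a hypothetical parameter that was not measured or reported in this exchange.

```python
import math


def prob_at_least_one_mutant(inoculum_cells: float, mutant_frequency: float) -> float:
    """Poisson probability that an inoculum already contains >= 1 rescuing mutant.

    inoculum_cells   -- number of cells transferred into the selective medium
    mutant_frequency -- ASSUMED fraction of cells carrying a rescuing mutation
                        (hypothetical; not a value from the manuscript)
    """
    expected_mutants = inoculum_cells * mutant_frequency
    # 1 - exp(-x), written with expm1 for accuracy when x is small
    return -math.expm1(-expected_mutants)


if __name__ == "__main__":
    inoculum = 1e6  # cells, as quoted by the reviewer
    for freq in (1e-8, 1e-7, 1e-6, 1e-5):  # hypothetical mutant frequencies
        p = prob_at_least_one_mutant(inoculum, freq)
        print(f"mutant frequency {freq:.0e}: P(>=1 mutant in inoculum) = {p:.3f}")
```

      Under these assumptions, the chance of seeding at least one mutant rises steeply once the expected number of mutants per inoculum approaches one, which is the intuition behind describing the outcome as selection of spontaneous mutants rather than evolution.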

      *Beta-alanine and ornithine are important non-proteinogenic amino acids, but there are hundreds of others. It is unclear to the reader why they were selected for assessing amine donation to glut-aux, or why beta-alanine was selected for adding an exogenous transamination route. Stating the relevance of these amino acids to E. coli's amination network or metabolic engineering, or stating that they were serendipitous findings of rescue and no rescue of glut-aux by non-proteinogenic amino acids, would make the choices for strain engineering seem less arbitrary. Similarly, strains engineered to utilize glycine or serine as amine donors (fig 6), or aspartate or leucine as centers of ammonium fixation (fig 7), seem to be chosen arbitrarily out of many amino acids that did or did not initially rescue growth of glut-aux. Simply stating that these were the best (or worst) amine donors based on growth rescue in figure 2 would explain why the strain engineering was not systematic over all 20 proteinogenic amino acids for ammonium fixation or amine donation.

      We thank the reviewer for the constructive criticism. Our investigations first started with testing proteinogenic amino acids. After many of these amino acids rescued the growth of the glut-aux strain we investigated if also non-proteinogenic amino acids are able to complement the growth of the strain. Beta-alanine and ornithine were chosen because they are derived from aspartate (beta-alanine) and glutamate (ornithine), both belonging to families best-complementing growth of the glut-aux strain. We now comment on why we selected these amino acids in the text from line 110 - 114.

      In the paragraph describing engineering the use of beta-alanine, we now refer to the biotechnological interest of beta-alanine-derived products (lines 286 - 289).

      From line 321, we describe why engineering the use of glycine and serine as amine donors is relevant.

      In the ammonium assimilation section, we now comment from line 381 that aspartate and leucine represent amine donors supporting fast and very slow growth.

      Reviewer #2 (Public Review):

      The authors are trying to show the existence of a reversible amination network that allows nitrogen transfer via transaminases for synthesis of several amino acids. Nitrogen assimilation and distribution is known to start with ammonia assimilation via glutamate and glutamine synthesis, and subsequent transfer of the nitrogen via transaminases and amidotransferases. To demonstrate an amination network, i.e., reversible nitrogen transfer, the authors start with a glutamate auxotroph and provide a variety of compounds to determine which support growth. Growth implies the transfer of amino groups to glutamate. The authors show some pathways required for transfer of the amino acid nitrogen via genetic analysis. The concept of an amination network is clever since current thinking would suggest that nitrogen flows in one direction and is not reversible. The basic method is genetic which is sometimes supplemented with isotope dilution experiments. In addition to analysis of a possible amination network, experiments are presented that test whether alternate routes of ammonia assimilation (the source for amino groups) are possible. While the authors show that a reversible amination can exist (nitrogen flow from glutamate is known, while the authors show that nitrogen flow to glutamate is possible), they do not provide any evidence that the nitrogen flux to glutamate does exist in nature for wild-type strains. A genetic analysis with complex strains (multiple mutations) and very specific growth conditions cannot provide evidence for nitrogen flux to glutamate from other amino acids. Positive evidence requires a biochemical analysis: how much N15 from an amino acid is transferred to glutamate or other amino acids. Without such results, it cannot be established that amino groups can be transferred to glutamate at an appreciable level or that the amination network is reversible, which is an important conclusion of this work. Without such results, the proposed amination network is a theoretical possibility that is detectable only in genetically complex strains and specific medium. The impact of the work is more limited without a biochemical analysis.

      We agree with the reviewer’s comment that we use a synthetic system to study the amination network. The results obtained with this system do not necessarily describe the amination network operating in a wild-type strain, especially because of differing metabolite concentrations. Also, we are not claiming that these reverse reactions, e.g. for glutamate synthesis are happening in a wild-type E. coli and we are referring to the glut-aux strain in all relevant parts of the manuscript.

      Nevertheless, we believe that our approach of using the glut-aux strain as a readout reveals the potential of the network to interconvert amino acids into one another, although parts of the network may only become relevant under specific conditions in which glutamate availability is limiting.

      We made some amendments to the manuscript clarifying that we are using a synthetic strain for our analysis which may not reflect the intracellular amino acid concentrations in the wild-type (lines 145 – 148, lines 462 – 466).

      We agree that a further deep biochemical characterization of the amination network, as well as dynamic 15N-labeling experiments, will reveal very valuable information about the connections within the amination network. However, these analyses are beyond the scope of this manuscript and cannot be easily done.

      Strengths and weaknesses.

      The results suggest that the amino group can be readily transferred between keto acids via a network of transaminases in strains. Several of the proposed reactions are novel and have not been previously described, such as tryptophan as an amino donor. The idea and experimental design are clever.

      We thank the reviewer for the support.

      This work does not discuss several relevant topics. Please notice that the comment is that several issues are not discussed or considered. The topics overlap.

      The activities of transaminases are important for this study but are not discussed. A useful summary of transaminase levels is provided in the following reference: Mol Microbiol. 2014 94:843-56. doi: 10.1111/mmi.12801. PMID: 25243376. The 3 most abundant transaminases are SerC, AspC, and IlvE. The results from that paper are consistent with many results of this work. In the cited work, all defects in transaminases that result in phenotypes were complemented with all transaminase genes. The cited paper is directly relevant to this work.

      We thank the reviewer for the comment. Now, when referring to overlapping transaminase activities we cite the work of Lal et al. who identified high enzymatic redundancies within E. coli transaminases (lines 229 – 233).

      There is no discussion of metabolite levels. This is important since the most abundant metabolite in wild type E. coli is undoubtedly glutamate (see work by Rabinowitz), which by mass action will provide the direction for transaminase reactions. For the mutant strains used, glutamate might not be the most abundant metabolite, and reversible transaminases conceivably will flow in an unphysiological direction. It is likely that metabolite levels are substantially perturbed in the mutants analyzed, and that the proposed amination network (nitrogen flow to glutamate) requires these metabolic perturbations. Several of these perturbations are likely to be effects on glutamine synthetase (GS). The GltBD mutant should prevent induction of the Ntr response. However, given the unusual conditions assayed, this is not a certainty. Several amino acids inhibit GS activity, including serine, glycine, alanine, histidine, and tryptophan. The levels of metabolites needed to drive reactions toward glutamate synthesis may never occur.

      We agree with the reviewer that our synthetic strain’s metabolome is most likely different from that of the WT. From line 146 we clarify the difference in the glut-aux strain compared to the WT. Additionally, we mention the difference in the discussion from line 462.

      GS activity relies on glutamate availability and hence glutamate needs to be produced via the network first (regardless of which amine donor is provided). Thus, inhibition of GS would cause glutamine auxotrophy and hence a no-growth phenotype. However, since we see immediate growth on histidine and tryptophan, GS must not be inhibited completely. Thus we conclude that GS regulation is not stopping the strains from growing but might have an influence on growth velocity (from line 536).

      Additionally, growth with serine, glycine, or alanine was possible after engineering or adaptation. However, for the strain evolved towards growth with L-alanine, no mutations associable with GS regulation were observed.

      We now also emphasize regulatory/inhibitory effects in lines 446 and 536 - 546, and give more explanations on amino acid toxicity (lines 447 - 449).
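
      The mass-action argument raised in this exchange can be illustrated with a small calculation. The sketch below is not taken from the manuscript: it assumes a generic transamination (amino acid + 2-ketoglutarate <-> keto acid + glutamate) with an equilibrium constant of 1, and all concentrations are hypothetical values chosen only to contrast a glutamate-rich cell with a glutamate-depleted one.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)


def transamination_delta_g(amino_acid, ketoglutarate, keto_acid, glutamate,
                           k_eq=1.0, temp_k=310.15):
    """Gibbs energy (kJ/mol) for: amino acid + 2-ketoglutarate -> keto acid + glutamate.

    Concentrations may be in any common unit because only their ratio matters.
    k_eq = 1.0 is an ASSUMPTION for illustration, not a measured constant.
    A negative result means net flux toward glutamate is favourable.
    """
    reaction_quotient = (keto_acid * glutamate) / (amino_acid * ketoglutarate)
    return R_KJ_PER_MOL_K * temp_k * math.log(reaction_quotient / k_eq)


if __name__ == "__main__":
    # Hypothetical concentrations in mM (illustrative only)
    glutamate_rich = transamination_delta_g(
        amino_acid=1.0, ketoglutarate=0.5, keto_acid=0.1, glutamate=100.0)
    glutamate_poor = transamination_delta_g(
        amino_acid=1.0, ketoglutarate=0.5, keto_acid=0.1, glutamate=0.1)
    print(f"glutamate-rich cell: dG = {glutamate_rich:+.1f} kJ/mol")
    print(f"glutamate-poor cell: dG = {glutamate_poor:+.1f} kJ/mol")
```

      With these assumed numbers the sign of the Gibbs energy flips between the two scenarios, which is the qualitative point made by both the reviewer and the authors: the direction of amine transfer in the glut-aux strain need not match that in the wild type.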

      There is no discussion of the regulation of the enzymes involved. For example, aspA is controlled by several factors (EcoCyc). Is the control of the relevant enzymes consistent with the proposed amino transfer, or does the cell require a novel form of regulation and/or a suppressor mutation? Do the transaminase levels change during these experiments?

      We thank the reviewer for the constructive criticism. Since both AspA and LeuDH were overexpressed from plasmids with synthetic promoters, we don’t expect native transcriptional regulation to be relevant (mentioned in line 539). Since in both cases we obtained immediate growth in selective conditions, we don’t expect suppressor mutations to be required in the tested conditions. However, we recognize that a lack of regulatory control mechanisms like those present for GDH / GS will result in less flexible metabolic adaptation to changing conditions (lines 541 - 543), thus limiting the utilization space for these ammonium assimilation mechanisms. Although we didn’t measure transaminase levels, in line with the summarizing model suggested by reviewer #3, we now suggest transaminase upregulation as a possible adaptation mechanism of the glut-aux strain in the discussion (lines 486 - 493).

      The authors are imposing strong selective pressure (growth or no growth) and the possibility that suppressor mutations can rapidly accumulate is not discussed or assessed.

      The reviewer is right: under selective pressure, mutations can lead to misleading results. That is why we chose our preculture conditions accordingly. This is now clarified at the beginning of the results section in lines 118 - 121.

      The results of this paper make the implicit assumption that the transport of any amino acid that is added to the medium will not limit growth. Transport is undoubtedly often limiting. To avoid this problem, di- and tripeptides could have been used. Both are rapidly transported, and the amination network may prove to be larger than the results suggest. The use of dipeptides could increase the amino acids that can transfer amino groups, since their internal concentration would be higher. Experiments are not requested, but the authors should consider whether their proposed network is potentially larger. (It is understood that on one hand the reviewer is questioning whether a reversible network exists, and on the other hand that it may be larger. These are not incompatible since under conditions in which peptides are the amino source, the network may exist.)

      We thank the reviewer for this comment. In the discussion section we now refer to this point from line 445. Please note that we already mention in the results section that glutamate uptake might be limiting (lines 195 - 199).

      Furthermore, we discussed the topic of threonine, which can serve as a nitrogen source only if the expression of threonine dehydrogenase is induced by the presence of leucine (from line 449). For valine, we expect valine-based inhibition of acetohydroxy acid synthase needed for isoleucine biosynthesis to be toxic for all strains (lines 451 - 452). For beta-alanine, we showed that transaminase availability is limiting growth, and that reengineering growth was independent of transport mechanisms (from line 509).

      The proposed alternate ammonia assimilation pathway has some interesting conceptual issues that should be addressed. Ammonia assimilation is necessarily at the interface of carbon/energy and nitrogen metabolism. The incredibly complex control of ammonia assimilation via glutamine and glutamate has layers upon layers of regulation that ensure that energy is not drained when all nitrogen-containing compounds are present at sufficient levels. Any alternate ammonia assimilation pathway in nature must take this into consideration. It is predicted that the constructed strains in this study will poorly handle many environmental stresses and changing nutrient content. These considerations are largely theoretical but limit the ability of alternate pathways to exist in nature, except perhaps under certain conditions. It is not suggested that these issues should be addressed experimentally but it would be important to acknowledge them.

      We agree with the reviewer that this is a very valuable point to discuss. We added a section to the discussion from line 536.

    1. Author Response

      Reviewer #1 (Public Review):

      The data support the claims, and the manuscript does not have significant weaknesses in its present form. Key strengths of the paper include using a creative HR-based reporter system combining different inducible DSB positions along a chromosome arm and testing plasmid-based and chromosomal donor sequences. Combining that system with the visualization of specific chromosomal sites via microscopy is powerful. Overall, this work will constitute a timely and helpful contribution to the field of DSB/genome mobility in DNA repair, especially in yeast, and may inform similar mechanisms in other organisms. Importantly, this study also reconciles some of the apparent contradictions in the field.

      We thank the reviewer for these positive comments on the quality of the THRIV system, in helping us to understand global mobility and to reconcile the different studies in the field. The possibility that these mobilities also exist in other organisms is attractive because they could be a way to anticipate the position of the damage in the genome and its possible outcome.

      Reviewer #2 (Public Review):

      The authors are clarifying the role of global mobility in homologous recombination (HR). Global mobility is positively correlated with recombinant product formation in some reports. However, some studies argue the contrary and report that global mobility is not essential for HR. To characterize the role of global chromatin mobility during HR, the authors set up a system in haploid yeast cells that allows simultaneously tracking of HR at the single-cell level and allows the analysis of different positions of the DSB induction. By moving the position of the DSB within their system, the authors postulate that the chromosomal conformation surrounding a DNA break affects the global mobility response. Finally, the authors assessed the contributions of H2A(X) phosphorylation, checkpoint progression and Rad51 in the mobility response.

      One of the strengths of the manuscript is the development of "THRIV" as an efficient method for tracking homologous recombination in vivo. The authors take advantage of the power of yeast genetics and use gene deletions and as well as mutations to test the contribution of H2A(X) phosphorylation, checkpoint progression and Rad51 to the mobility response in their THRIV system.

      A major weakness in the manuscript is the lack of a marker to indicate that DSB formation has occurred (or is occurring). Although at 6 hours there is 80% I-SceI cutting, around 20% of the cells are uncut and cannot be distinguished from the ones that are cut (or have already been repaired). Thus, the MSD analysis is done in the blind with respect to cells actually undergoing DSB repair.

      The authors clearly outlined their aims and have substantial evidence to support their conclusions. They discovered new features of global mobility that may clear up some of the controversies in the field. They overinterpreted some of their observations, but these criticisms can be easily addressed.

      The authors addressed conflicting results concerning the importance of global mobility to HR and their results aid in reconciling some of the controversies in the field. A key strength of this manuscript is the analysis of global mobility in response to breaks at different locations within chromosomes. They identified two types of DSB-induced global chromatin mobility involved in HR and postulate that they differ based on the position of the DSB. For example, DSBs close to the centromere exhibit increased global mobility that is not essential for repair and depends solely on H2A(X) phosphorylation. However, if the DSB is far away from the centromere, then global mobility is essential for HR and is dependent on H2A(X) phosphorylation, checkpoint progression as well as the Rad51 recombinase.

      The Bloom lab had previously identified differences in mobility based on the position of the tracked site. However, in the study reported here, the mobility response is analyzed after inducing DSBs located at different positions along the chromosome.

      They also addressed the question of the importance of the Rad51 protein in increased global mobility in haploid cells. Previous studies used DNA damaging agents that induce DSBs randomly throughout the genome, where it would have been rare to induce DSBs near the centromere. In the studies reported in this manuscript, they find no increase in global mobility in a rad51∆ background for breaks induced near the centromere (proximal), but find that breaks induced near the telomeres (distal), are dependent on both gamma-H2A(X) spreading and the Rad51 recombinase.

      We thank the referee for his constructive comments on the strength of our system to accurately determine the impact of a DSB according to its position in the genome. Concerning the issue of damaged cells that were not detected, it is a very important and exciting issue because it confronts our data with the question of biological heterogeneity. We provide evidence on the consistency of our findings despite the lack of detection of undamaged cells.

      Reviewer #3 (Public Review):

      In this study, Garcia Fernandez et al. employ a variety of genetic constructs to define the mechanism underlying the global chromatin mobility elicited in response to a single DNA double-strand break (DSB). Such local and global chromatin mobility increases have been described a decade ago by the Gasser and Rothstein laboratories, and a number of determinants have been identified: one epistasis group results in H2A-S129 phosphorylation via Rad9 and Mec1 activation. The mechanism is thought to be due to chromatin rigidification (Herbert 2017; Miné-Hattab 2017) or general eviction of histones (Cheblal 2020). More enigmatic, global chromatin mobility increase also depends on Rad51, a central recombination protein downstream of checkpoint activation (Smith & Rothstein 2017), which is also required for local DSB mobility (Dion .. Gasser 2012). The authors set out to address this difficulty in the field.

      A premise of their study is the convergence of two types of observations: First, the H2A phosphorylation ChIP profile matches that of Rad51, with both spreading in trans on other chromosomes at the level of centromeres when a DSB occurs in the vicinity of one of them (Renkawitz 2014). Second, global mobility depends on H2A phosphorylation and on Rad51 (their previous study Herbert 2017). They thus address whether the Rad51-ssDNA filament (and associated proteins) marks the chromatin engaged during the homology search. They found that the extent of the mobility depends on the residency time of the filament in a particular genomic and nuclear region, which can be induced at an initially distant trans site by providing a region of homology. Unfortunately, these findings are not clearly apparent from the title and the abstract, and in fact somewhat misrepresented in the manuscript, which would call for a rewrite (see points below).

      The main goal of our study was to understand the role of global mobility in the repair by homologous recombination, depending on the location of the damage. We found distinct global mobility mechanisms, in particular in the involvement of the Rad51 nucleofilament, depending on whether the DSB was pericentromeric or not. It is thus likely that when the DSB is far from the pericentromere, the residence time of the Rad51 nucleofilament with the donor has an impact on global mobility. Thus, although our experiments were not designed to directly answer the question of the residence time of the nucleofilament, we now discuss in more detail the causes and consequences of the global mobility.

      To this end, they induce the formation of a site-specific DSB in either of two regions: a centromere-proximal region and a telomere-proximal region, and measure the mobility of an undamaged site near the centromere on another chromosome (with a LacO-LacI-GFP system). This system reveals that only the centromere-proximal DSB induces the mobility of the centromere-proximal undamaged site, in a Rad9- and Rad51-independent manner. Providing a homologous donor in the vicinity of the LacO array (albeit in trans) restores its mobility when the DSB is located in a subtelomeric region, in a Rad9- and Rad51-dependent fashion. These genetic requirements are the same as those described for local DSB mobility (Dion & Gasser 2012), drawing a link between the two types of mobility, which to my knowledge was not described. The authors should focus their message (too scattered in the current manuscript), on these key findings and the diffusive "painting" model, in which the canvas is H2A, the moving paintbrush Mec1, and the hand the Rad51-ssDNA filament whose movement depends on Rad9. In the absence of Rad51-Rad9 the hand stays still, only decorating H2A in its immediate environment. The amount of paint deposited depends on the residency time of the Rad51-ssDNA-Mec1 filament in a given nuclear region. This synthesis is in agreement with the data presented and contrasts with their proposal that "two types of global mobility" exist.

      The brush model is very useful in explaining the distal mobility, which indeed is linked to local mobility genetic requirements, but it is also helpful to think of a different model than the brush model when pericentromeric damage occurs. To stay in the terms of painting technique, this model would be similar to the pouring technique, when oil paint is deposited on water and spreads in a multidirectional manner. It is likely that Mec1 or Tel1 are the factors responsible for this spreading pattern. We therefore propose to maintain the notion of two distinct types of mobilities. Without going into pictorial techniques in the text, we have attempted to clarify these two models in the manuscript.

      The rest of the manuscript attempts to define a role in DSB repair of this phospho-H2A-dependent mobility, using a fluorescence recovery assay upon DSB repair. They correlate a defect in the centromere-proximal mobility (in the rad9 or h2a-s129a mutant) when a DSB is distantly induced in the subtelomere with a defect in repairing the DSB. Repair efficiency is not affected by these mutations when the donor is located initially close to the DSB site. This part is less convincing, as repair failure specifically at a distant donor in the rad9 and H2A-S129A mutants may result from other defects relating to chromatin than its mobility (i.e. affecting homology sampling, DNA strand invasion, D-loop extension, D-loop disruption, etc), which could be partially alleviated by repeated DSB-donor encounters when the two are spatially close. In fact, suggesting that undamaged site mobility is required for the early step of the homology search directly contradicts the fact that the centromere-proximal mobility induced by a subtelomeric DSB depends on the presence of a donor near the centromere: mobility is thus a product of homology identification and increased Rad51-ssDNA filament residency in the vicinity of the centromere, and so downstream of homology search. This is a major pitfall in their interpretation and model.

      We thank the referee for helping to clarify the question of the cause and consequence of global mobility. As he pointed out, the fact that a donor is required to observe both H2A phosphorylation and distal mobility implicates the recombination process itself, as well as the residence time of the Rad51 nucleofilament, in the γ-H2A(X) spreading and indicates that recombination would be the cause of distal mobility. In contrast, the fact that proximal mobility can exist independently of homologous recombination suggests that in this particular configuration, HR would then be a consequence of proximal mobility.

      In conclusion, I think the data presented are of importance, as they identify a link between local and global chromatin mobility. The authors should rewrite their manuscript and reorganize the figures to focus on the painter model that their data support. I propose experiments that will help bolster the manuscript conclusions.

      1) Attempt dual-color tracking of the DSB (i.e. Rad52-mCherry or Ddc1-mCherry) and the donor site, and track MSD as a function of proximity between the DSB and the Lac array (with DSB +/-dCen). The expectation is that only upon contact (or after getting in close range) should the MSD at the centromere-proximal LacO array increase with a DSB at a subtelomere. Furthermore, this approach will help distinguish MSDs in cells bearing a DSB (Rad52 foci) from undamaged ones (no Rad52 foci)(see Mine-Hattab & Rothstein 2012). This would help overcome the inefficient DSB induction of their system (less than 50% at 1 hr post-galactose addition, and reaching 80% at 6 hr). For the reader to have a better appreciation of the data distribution, replace the whisker plots of MSD at 10 seconds with either scatter dot plot or violin plots, whichever conveys most clearly the distribution of the data: indeed, a bimodal distribution is expected in the current data, with undamaged cells having lower, and damaged cells having higher MSDs.

      The reviewer raises two points here.

      The first point concerns the residence time of the Rad51 filament with the donor when a subtelomeric DSB happens. Measuring the MSDs as a function of the distance between the donor and Rad52-mCherry (or Ddc1-mCherry) would make it possible to decide whether global mobility is a cause or a consequence. If mobility is the consequence of (stochastic) contact, leading to more efficient homologous recombination, we would see an increase in MSDs only when the distance between donor and filament is small. Conversely, if global mobility is the cause of contact, the increase in mobility would be visible even when the distance between donor and filament is large. This would require a labelling system with three different fluorophores: one for global mobility, one for the donor, and one for following the filament. This triple labelling is still to be developed.

      The second point concerns the important question of the heterogeneity of a population, a central challenge in biology. Here we wish to distinguish between undamaged and damaged cells. Even if a selection of the damaged cells had been made, this would not entirely solve the inherent cell-to-cell variation: at a given time, it is possible that a cell, although damaged, moves little and, conversely, that a cell moves more even if not damaged. The question of heterogeneity is therefore important and the subject of intense research that goes beyond the framework of our work (Altschuler and Wu, 2010). However, to start clarifying whether a bias could exist when considering a mixed population (20% undamaged and 80% damaged), we analyzed MSDs using a scatter plot. We considered two populations of cells in which the damage is best controlled, i.e. i) the red population, which we know has been repaired and, importantly, has lost the cut site and will not be cut again (undamaged-only population), and ii) the white population, blocked in G2/M because it is damaged and not repaired (damaged-only population). These two populations show very significant differences in their median MSDs. We artificially mixed the MSD values obtained from these two populations at a rate of 20% undamaged-only cells and 80% damaged-only cells. We observed that the mean MSDs of the damaged-only and undamaged-only cells were significantly different. Yet the mean MSD of damaged-only cells was not statistically different from the mean MSD of the 20%-80% mixed cell population. Thus, the conclusions based on the average MSDs of all cells remain consistent.

      Scatter plot showing the MSD at 10 seconds of the damaged-only population (in white), the repaired-only population (in red), or the 20%-80% mixed population
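
      The mixing control described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the resampling scheme (drawing with replacement) is our choice, and a permutation test on the difference in means stands in for the statistical test, which the response does not name.

```python
import random
import statistics


def mix_populations(undamaged, damaged, frac_undamaged=0.2, n=1000, seed=0):
    """Build a mixed sample of n MSD values: a fraction `frac_undamaged`
    drawn (with replacement) from `undamaged`, the rest from `damaged`."""
    rng = random.Random(seed)
    n_undamaged = round(n * frac_undamaged)
    mixed = [rng.choice(undamaged) for _ in range(n_undamaged)]
    mixed += [rng.choice(damaged) for _ in range(n - n_undamaged)]
    return mixed


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means.
    Assumption: used here as a stand-in for the unspecified test."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[: len(a)]) - statistics.fmean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

      With measured MSD lists for the two sorted populations, one would compare `damaged` against `mix_populations(undamaged, damaged)` and `damaged` against `undamaged` with `permutation_pvalue`, and check whether only the second comparison is significant.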

      2) Perform the phospho-H2A ChIP-qPCR in the C and S strains in the absence of Rad51 and Rad9, to strengthen the painter model.

      ChIP experiments in mutant backgrounds as well as phosphorylation/dephosphorylation kinetics would corroborate the mobility data described here, but are beyond the scope of this manuscript. Yet, a phospho-H2A ChIP experiment was performed in a Δrad51 mutant in Renkawitz et al. 2013. In that case, γH2A propagation was restricted only to the region around the DSB, corroborating both the requirement for Rad51 in distal mobility and the lack of requirement for Rad51 in proximal mobility.

      3) Their data at least partly run against previously published results, or fail to account for them. For instance, it is hard to see how their model (or the painter model) could explain the constitutively activated global mobility increase observed by Smith .. Rothstein 2018 in a rad51 rad52 mutant. Furthermore, the Gasser lab linked the increased chromatin mobility to a general loss of histones genome-wide, which would be inconsistent with the more localized mechanism proposed here. Do they represent an independent mechanism? These conflicting observations need to be discussed in detail.

      Apart from the fact that the mechanisms in place in a haploid or a diploid cell are not necessarily comparable, it is not clear to us that our data are inconsistent with that of Smith et al. (Smith et al., 2018). Indeed, it is not known by which mechanisms the increase in global mobility is constitutively activated in a Δrad51 Δrad52 mutant. But according to their hypothesis the induction of a checkpoint is likely and so is the phosphorylation of H2A. It would be interesting to verify γH2A in such a context. This question is now mentioned in the main text.

      Concerning histone loss, it appears to be different depending on the number of DSBs. Upon multiple DNA damage following genotoxic treatment with Zeocin, Susan Gasser's group has clearly established that nucleosome loss occurs (Cheblal et al., 2020; Hauer et al., 2017). Nucleosome loss, like H2A phosphorylation as we have shown (Garcia Fernandez et al., 2021; Herbert et al., 2017), leads to increased global mobility. The state of chromatin following these histone losses or modifications is not yet fully understood, but the two could coexist. In the case of a single DSB by HO, it is the local mobility of the MAT locus that is examined (Fig. 3B in Cheblal et al., 2020). In this case, the increase in mobility is indeed dependent on Arp8, which controls histone degradation, and correlates with a polymer pattern consistent with normal chromatin. It is likely that histone degradation occurs locally when a single DSB occurs. Concerning histone loss genome-wide, the question remains open. If histone eviction nevertheless occurred globally upon a single DSB, both types of modifications could be possible. This aspect is now mentioned in the discussion.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors develop a computational framework for the in silico evolution of "digital organisms" -- in short, programs capable of executing instructions (reading inputs, performing operations with them and producing outputs) and replicating, potentially generating variation ("mutations") in the set of instructions of their offspring. They use this framework to compare the success of various selection algorithms in producing populations of digital organisms capable of carrying out a set of functions (Boolean logic and basic math operations). They study whether different treatments yield different results, focusing on whether selection algorithms from evolutionary computing could outperform strategies typically applied in artificial selection experiments in the laboratory.

      The authors' idea is original and intriguing. Their framework of "digital organisms directed evolution" could represent a powerful tool to further explore the potential transfer of strategies from the field of evolutionary computing to the field of microbial evolution. The inclusion of a "no selection" and a "random selection" control is very valuable (and not common in other studies on artificial selection at the population level). The sharp differences they find between selection schemes commonly used in the laboratory (elite, top-10% selection) and algorithms from evolutionary computing (lexicase, non-dominated elite, tournament selection) are interesting and could support the claim that the latter might be well suited for application to microbial evolution. However, I think there are some confounding factors that could be biasing these results, and these should be addressed so that the specific claims of the paper can be fully supported by the data.

      My main concern has to do with the observation that some selection protocols (elite, top-10% and tournament) are unable to maintain diversity in the task profiles. I am left wondering whether this is truly a limitation of those protocols, or if it is a (perhaps a bit trivial) consequence of the more general experimental design. Specifically, when selecting the populations to propagate into the next "meta-generation", a sample of organisms is taken. This sample is of only 10 individuals (1% of the maximum population size of 1000). In my mind, this could mean that populations where all (or most) organisms can perform multiple functions (say, populations of "generalists") are favored against populations of "specialists" where, even if all (or most) functions were covered at the population level, this coverage relied on the coexistence of multiple "strains" that performed only a few functions each with little overlap across strains. In other words, the experimental design could be introducing a (perhaps unacknowledged) selective pressure favoring populations of generalists. In fact, the observation that lexicase and non-dominated elite selection schemes seem to be able to overcome this potential bias and maintain a high diversity and spread of task profiles is interesting. However, I am not sure whether the relatively modest performance of the elite, top-10% and tournament protocols could be improved by lifting the selective pressure introduced at sampling.
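
      The sampling concern raised above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration (the function `sample_coverage`, the two idealized populations, and the use of 22 functions and a sample of 10 from the numbers quoted in this review); it is not the authors' sampling code, and whether it matches their implementation is not established here.

```python
import random

N_FUNCTIONS = 22   # number of designated functions mentioned in this thread
POP_SIZE = 1000    # maximum population size quoted by the reviewer
SAMPLE_SIZE = 10   # sample size quoted by the reviewer


def sample_coverage(population, sample_size, rng):
    """Number of distinct functions performed by a random sample of organisms.

    Each organism is represented by its task profile (a set of function ids).
    """
    sample = rng.sample(population, sample_size)
    covered = set()
    for task_profile in sample:
        covered |= task_profile
    return len(covered)


# Idealized "generalists": every organism performs every function.
generalists = [frozenset(range(N_FUNCTIONS))] * POP_SIZE

# Idealized "specialists": each organism performs exactly one function,
# yet the population as a whole still covers all functions.
specialists = [frozenset({i % N_FUNCTIONS}) for i in range(POP_SIZE)]
```

      In this toy setting a sample of 10 generalists always covers all 22 functions, while a sample of 10 single-function specialists can cover at most 10, even though both populations cover every function before sampling.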

      As a more minor comment, I think the paper could be made more easily accessible to readers outside of the field of evolutionary computing. I think a clearer analogy should be established early on between the behavior of the "digital organisms" in this work and that of real microbes. Although some aspects are straightforward (organisms are born, "execute a genetic program" and divide more or less efficiently depending on the instructions within that program), some details were difficult for me to understand. There are two problems with this: first, it is hard to create an intuition regarding what it means to "perform a function" or "mutate" in the context of a digital organism evolutionary process. It was also unclear to me whether the choice of giving functions a benefit at the population vs. at the individual level was arbitrary, or if it was somehow related with the intrinsic dynamics of the system. The meaning of "the environment" is also somewhat obscure: what exactly are "inputs"? Are the same inputs provided to every organism in every population and in every generation/meta-generation? How can a same program perform multiple functions? These questions were not obvious to me, and I had to carefully go through sections of the Supplementary Material to gain a sense of how these digital organisms behaved in practice. I think providing a more general intuition in this regard, even if at the expense of some details and technicalities, would help make the text more accessible to a broad audience. The second problem with this is that it makes it difficult to extrapolate the conclusions to a microbial evolution context. The authors themselves acknowledge multiple limitations, particularly the lack of ecological interactions and the simplicity of the environment. While these are reasonable minimal assumptions, they most likely affect the results. In microbial populations, interactions are common even in the simplest environments. 
The environment itself is modified by the organisms, leading to the creation of new niches into which additional species can be selected or evolve. These processes are critical for the diversity and function of microbial populations -- and in fact, it could be argued that many collective functions emerge from individuals' interspecific interactions and are not necessarily present at any single organism level. I understand that including these more complex mechanisms falls out of the scope of this work, and I believe that the simpler model presented here is a valuable starting point. However, I do think that specific claims in the text such as "our experiments suggest that steering evolution at the population-level is more challenging than steering at the individual-level" should be avoided, since one could easily imagine that this is a result of the assumptions of this specific model. And, again, I think establishing a more clear analogy between digital organisms and microbes would make it easier for a broader audience to understand these limitations.

      Thank you for your detailed summary and kind remarks. We very much appreciate all of your constructive feedback. In particular, thank you for identifying areas of our manuscript that could be made more accessible for a broader audience.

      In addition to the changes we made to address your specific recommendations (below), we made edits throughout the manuscript to address the general feedback/concerns from your summary:

      ● I think a clearer analogy should be established early on between the behavior of the 'digital organisms' in this work and that of real microbes.

      We made a number of edits to the description of digital organisms to help clarify this connection (see below). We also made edits throughout the manuscript in an effort to make it more easily accessible to audiences outside of evolutionary computing.

      ● …what it means to "perform a function" or "mutate" in the context of a digital organism evolutionary process.

      We edited our description of how digital organisms perform functions, adding an example to improve clarity ("Digital Organisms" subsection of the Digital Directed Evolution section). We expanded our description of how digital organisms mutate. We also included a reference to [Wilke and Adami, 2002], which nicely overviews the "biology" of the type of digital organisms (self-replicating computer programs) used in our model.

      ● It was also unclear to me whether the choice of giving functions a benefit at the population vs. at the individual level was arbitrary, or if it was somehow related with the intrinsic dynamics of the system.

      We agree that this was unclear. Roughly, the individual-level functions are simpler (i.e., require fewer instructions to encode) than the population-level functions. We clarified this in the caption for Table 1.

      ● What exactly are "inputs"? Are the same inputs provided to every organism in every population and in every generation/meta-generation? How can a same program perform multiple functions?

      When a digital organism is "born", we randomly generate a set of numeric values that the organism can access by executing an 'input' instruction, which will load the input value into one of the digital organism's memory registers. The same inputs are not provided to every organism in every generation/population. Programs perform multiple functions by performing the requisite computations (Table 1) on values they receive via executing 'input' instructions and then by executing an 'output' instruction. Each time an organism produces an output (by executing the 'output' instruction), we check to see if that output is the correct result for one of the 22 designated functions (Table 1) given the set of inputs available to the organism. We further clarified how inputs work and how programs can perform multiple functions in the "Digital Organisms" subsection of the Digital Directed Evolution section.
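
      The output check described above can be sketched roughly as follows. This is a simplified illustration, not our implementation: the 32-bit value width, the small subset of function names, and the helpers `make_inputs` and `functions_matched` are all hypothetical, and the actual 22 functions are those listed in Table 1.

```python
import random

MASK = 0xFFFFFFFF  # assumption: values are treated as 32-bit unsigned integers

# Hypothetical subset of designated functions, for illustration only.
ONE_INPUT = {"NOT": lambda a: ~a & MASK}
TWO_INPUT = {
    "NAND": lambda a, b: ~(a & b) & MASK,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "ADD": lambda a, b: (a + b) & MASK,
}


def make_inputs(rng, count=2):
    """Random inputs generated when an organism is 'born'."""
    return [rng.getrandbits(32) for _ in range(count)]


def functions_matched(output, inputs):
    """Names of all functions for which `output` is the correct result
    given the inputs available to the organism."""
    matched = set()
    for name, fn in ONE_INPUT.items():
        if any(fn(a) == output for a in inputs):
            matched.add(name)
    for name, fn in TWO_INPUT.items():
        # All two-input functions in this sketch are symmetric,
        # so each unordered pair of inputs is checked once.
        for i, a in enumerate(inputs):
            for b in inputs[i + 1:]:
                if fn(a, b) == output:
                    matched.add(name)
    return matched
```

      Under this sketch, an organism "performs multiple functions" over its lifetime when the union of `functions_matched` across all of its outputs contains more than one name.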

      ● The second problem with this is that it makes it difficult to extrapolate the conclusions to a microbial evolution context. The authors themselves acknowledge multiple limitations, particularly the lack of ecological interactions and the simplicity of the environment. While these are reasonable minimal assumptions, they most likely affect the results.

      We absolutely agree that our simplifications influence our results and that adding the capacity for more interactions is a critical next step for this work. We would not be surprised if more sophisticated artificial selection protocols were even more useful in the context of more complex ecological interactions than in the simple environments we evaluated. For example, if we had a measure for community stability, we could directly select on stability as an independent objective while simultaneously selecting on community functions.

      ● I do think that specific claims in the text such as "our experiments suggest that steering evolution at the population-level is more challenging than steering at the individual-level" should be avoided.

      This is fair. We intended to argue that steering evolution at the population-level (as is often done in directed microbial evolution) is more challenging overall than steering evolution in conventional evolutionary computing systems where each individual in a population can be independently evaluated and the selection protocol has access to high-resolution information about the individual's phenotype/genome. We narrowed the scope of this statement to the following: "While results across these two contexts are not directly comparable, we found steering evolution at the population-level to be more challenging than steering at the individual-level (as in conventional evolutionary computing)."

    1. Author Response

      Reviewer #1 (Public Review):

      The study by Jimenez et al. investigates the molecular mechanism by which dosage compensating (DC) condensins spread along the X chromosomes of C. elegans worms. It has been previously known that DC condensins are loaded onto X chromosomes at specific sites called rex, that are distributed along the whole length of the chromosome. Here, Jimenez et al. showed that an insertion of one or multiple rex sites into an autosome is sufficient for DC condensin recruitment and spreading. Using ChIP-seq, they show that DC condensins spread for hundreds of kilobases on both sides of the rex site, with occasional sites of accumulation. The authors used Hi-C to study the effect of rex insertion on the chromosome conformation. They found that individual rex sites form boundaries that insulate spatial contacts regardless of their orientation, while two adjacent insertion sites can form loop-anchored contact domains. These findings support the model in which DC condensins spread along the chromosome via the process of loop extrusion. In addition, the authors fused the X chromosome with chromosome V and demonstrated that condensins can spread for multiple megabases across the fusion site and induce local compaction of the affected region. Finally, they targeted a dCas9-Suntag complex to multiple adjacent copies of a repeat on chrX to demonstrate that condensins can accumulate at "bulky" obstacles.

      Overall, I find the experiments in this study are sufficient to support the key statements. My only comment is minor. In the discussion, the authors seem to imply that their data supports bi-directional loop extrusion by DC condensins (p.11 line 16). Yet, their data is consistent with a model where condensins are loaded in a random orientation, but then extrude loops only in one fixed direction. Along these lines, ref. [20] (Terakawa et al) is mentioned as supporting bi-directional extrusion, while this paper in fact demonstrated that, once loaded onto DNA, condensins keep moving in a single direction with barely any observed inversions.

      Data is consistent with a model, where condensins are loaded in a random orientation, but then extrude loops only into one fixed direction:

      We agree with this interpretation; it is now explicitly stated and incorporated into the model (see section on A model to explain X-specific recruitment of condensin DC and formation of loop-anchored TADs by rex sites).

      Reviewer #2 (Public Review):

      SMC complexes play critical roles in chromosome organization from bacteria to humans. Recent in vitro studies found that SMC complexes function by extruding DNA loops. In vivo evidence for the loop extrusion model is less direct. The study by Jimenez et al investigated the mechanism of a specialized SMC complex called Condensin DC that mediates dosage compensation in C. elegans. This is an excellent experimental system to study SMC action in vivo because the specific loading sequence (rex) for Condensin DC was identified. The authors inserted the sites ectopically into autosomes and found that Condensin DC was recruited to ectopic sites and spreads to long distances. In a strain with a fusion chromosome (X;V), the complex spread beyond ChX to ChV. Finally, the authors generated a dCas9 mediated protein roadblock to test whether a large protein barrier prevents Condensin DC from spreading.

      Strengths:

      The authors have an elegant experimental system to investigate SMC action in vivo. They have a comprehensive set of tools including ectopic loading sites, fusion chromosomes, dCas9 block, Hi-C, ChIP-seq and RNA-seq.

      Weaknesses:

      While the experimental system has great potential, some specific choices of insertion sites did not yield clear results and caused confusion. If they modify the location of rex site or the dCas9 binding sites, they might be able to bring more insights. I detail them below.

      1) The authors inserted rex sites to autosomes and observed recruitment of Condensin DC to the ectopic sites. The engineering of rex sites to ectopic locations was done before, so was the observation that these ectopic sites recruit Condensin DC and generate TAD border (Albritton 2018; Anderson 2019). The current study has 3 rex sites on ChII and has the potential to bring new insights on how multiple rex sites act cooperatively and how they create TAD borders. However, the results presented were not clear because the authors used rex sites with different strengths. The middle site did not form TAD loops with the other two sites. It is unclear whether the strength of the rex site matters or whether the distance between the sites matters. If they used only the two strong sites, or used all 3 sites of the same strength, the authors could have clarified this point.

      Reviewer comment that the middle rex in the three-rex insertion did not form TAD loops:

      Upon repeating Hi-C experiments in L3 (where Hi-C features are clearer compared to mixed developmental stage embryos) and additional analysis/visualization of the data (log2ratio to the wildtype condition), we show that the middle rex (rex-1) also forms TAD loops with both flanking, stronger rex-8 sites. The loop involving rex-1 (weaker rex) is clearly weaker than the TAD loop between the two flanking strong rex-8 sites.

      Reviewer question on “does the strength of rex sites matter”:

      It is likely that the strength of the rex contributes to the strength of TAD loops, based on the observation that the flanking rex-8 inserts form a stronger TAD loop than the one between rex-1 and rex-8. We agree with the reviewer that, to address the contribution of the strength of rex sites, we would need to insert pairs of rex sites of increasing strength at equal distances. However, insertion of single rex-1 did recruit condensin DC (Supplemental Figure 5A). Therefore, double rex-1 insertion is unlikely to work and necessitates the presence of a strong rex like rex-8 nearby. The use of a strong “super rex” clarifies that the rex sites need to both recruit and act as a boundary, and that both functions correlate with their strength (see section under Condensin DC is loaded at rex sites and spreads in either direction).

      Reviewer question on “does the distance of rex sites matter”:

      In Anderson 2019 (Figure S4), the authors inserted three strong rex sites greater than 1 Mb apart from each other on chromosome-I and observed no changes in the Hi-C matrix. In contrast, our three rex sites are inserted within a 100 kb region and show a loop-anchored TAD. This suggests that distance matters and that the distribution of rex sites contributes to cooperativity (see Discussion on The cooperativity of rex sites contributes to the X-specific recruitment and spreading of condensin DC).

      2) The authors used the dCas9 system to test the loop extrusion model. They found that DPY-27 is enriched at the dCas9 array. They concluded that the dCas9 array blocked Condensin DC spreading and this result supported the loop extrusion model. However, this interpretation is not supported by the DPY-27 enrichment profile or the HiC profile. If the authors were correct that Condensin DC, loaded on rex sites on either side of the array, extruded DNA loops and got blocked by the dCas9 array, we would expect DPY-27 enrichment to build up highest at the periphery of the array and lowest at the center of the array; we would expect a domain border to form at the array because of the lack of interactions between regions outside of the array. Yet, the DPY-27 ChIP profile is flat and there is no change in HiC profile. The near-identical shape of the dCas9 and DPY-27 ChIP-seq peaks is reminiscent of a technical bias of ChIP-seq, that is open chromatin is more "ChIP-able" (Teytelman PNAS 2013). It is possible that dCas9+sgRNA unwinding the DNA caused artifact in ChIP-seq. It is possible that a freely diffusing nuclear-localized protein will show the same ChIP profile at the dCas9 site with no biological relevance. Since this result is a major conclusion of the paper, it is necessary for the authors to perform a ChIP-seq control using a freely diffusing nuclear protein.

      We thank the reviewer for recognizing a potentially artifactual pattern and for urging us to perform additional controls. Please see our detailed response to Essential Revision point (3) and the new section (A dCas9-based block failed to recapitulate rex-like boundary on the X-chromosome).

      3) If the authors targeted dCas9 to a different site, they might be able to clearly show whether Condensin DC spreading is blocked by such road block. For instance, if they use the X-V fusion, and target dCas9 to a region on ChV but close to the junction, they could test their hypothesis by DPY-27 ChIP-seq.

      This is an excellent idea and one we had initially hoped to pursue years ago. However, this turned out to be a difficult experiment, as there is no unique repetitive sequence near the fusion site on ChrX;V and dCas9 results in ChIP artifacts in our existing system, as demonstrated in Supplemental Figure 4-1.

      4) The model (Fig 6) is confusing. The authors are trying to support the loop extrusion model in the text but their drawing is not loop extrusion (Banigan and Mirny 2020). The authors should clarify what they mean. For instance, after recruitment at rex site (red bar, with two arrows pointing left and right), Condensin DC was drawn to encircle a single piece of DNA as it moves to the left. It is not clear how the blue ring on the right can capture another piece of DNA and extrude a DNA loop and then later revert to encircling a single piece of DNA before approaching the green protein block.

      We updated the model figure (Figure 7) with clearly stated properties of the model in the legend and with figures more in line with the depictions of loop extrusion in previous work.

      Reviewer #3 (Public Review):

      1) The insertion locations of new rex sites are clear in the top panels of Figures 1A and S1A, but not in the bottom panels of these two figures. My interpretation of these figures is that the lines with pink and grey boxes are shown to help the reader understand how many rex sites are inserted in each line but the location of these boxes does not coincide with the actual location of the insertion sites. In the top panel of Figure 1A, it appears that the two "pink" sites correspond to two very large peaks of Dpy27, whereas in the bottom they appear to be present in a region devoid of Dpy27. Authors should fix this because it is very confusing, since the bottom panel suggests that condensin is recruited to rex sites and then spreads to other sites in the genome without any condensin remaining at the rex sites.

      We have updated the figure with a proper legend. The insertion sites are computationally annotated, with more intuitive cartoons shown on top (Figures 5 and 6).

      2) The idea that rex sites recruit condensin would require the existence of a sequence-specific DNA binding protein that binds to rex and then interacts with condensin. This protein would then release condensin, which would extrude away and stop at TSSs. Has this been actually shown previously or is this an interpretation of observations such as those shown in Figure 1A? If not, the results shown in Figure 1 would equally agree with a model suggesting that condensin loads randomly in both autosomes and the X chromosome, and extrusion is stopped by large protein complexes bound to rex sites, which explains the accumulation at these sites. TSSs contain large transcription complexes that are not sufficient to stop condensin on their own but are able to if the second anchor contains 1-2 rex sites. This would make more sense in the context of what is known about cohesin in mammals. If someone has unequivocally shown that this is not the case, authors should discuss this in the Introduction because most non-worm readers will be thinking in these terms.

      It has not been unequivocally shown that rex sites are the loading sites. However, all prior information better supports this conclusion. We did add previous work demonstrating that rex sites autonomously recruit the DCC on extrachromosomal arrays to help readers outside the field. We directly address the “loading at all chromosomes” model and more extensive comparison to the cohesin system in the result section (Condensin DC is loaded at rex sites and spreads in either direction) and the discussion (Previous models of condensin DC binding on the X chromosomes). We conclude that rex sites functioning as mere barrier elements are insufficient to explain the system and likely also function as loading sites for condensin DC.

      3) Figure 1A. Authors should not ignore the large Dpy27 peak in the worms with one rex insertion. What is at this site, where one can also observe a Dpy27 peak in wt worms? Are there similar sites in other regions of the autosomes of wt worms? If so, if these sites do not contain rex motifs, they may indicate alternative regions of the genome that can either recruit or stop extrusion of condensin.

      Binding of the DCC to the X and autosomes has been extensively analyzed in previous work using immunofluorescence (Csankovszki et al 2003 Science and other Meyer lab papers, FRAP and DPY-27 Halo localization in Breimann-Morao et al JCS 2022), ChIP-chip (Ercan et al Nature Genetics 2007, Jans et al Genes and Development 2009) and ChIP-seq (Albritton et al eLife 2017). Autosomal localization is not noticeable in IF data. In ChIP-chip and ChIP-seq, DCC is largely specific to the X-chromosome but there is some background signal on autosomes with a slight enrichment at active promoters, which are also shown to be more easily ChIPped in many studies. There is also some binding to repetitive regions, e.g. histone genes, but no specific features or strong autosomal binding sites emerged in previous analyses by our lab and others. In summary, like any ChIP-seq data, there is a non-zero number of peaks (based on MACS peak caller) on autosomes, which may be true off-target binding events or technical artifacts.

      The peak that was present in the original figure is not prominent upon new read mapping and normalization of the data and use of equal y-scales between the insertion region and a comparable X-chromosomal region (see the same site appearing as a blip just past 9 Mb in Supplemental Figure 5-1). We notice that such mapping and technical variability occurs at some regions that were not originally designated as “blacklisted” by modENCODE, but give spurious peaks.

      In summary, autosomal binding is far less than that of X (we provide ectopic versus X chromosome DCC binding in Figure 5 and cite prior literature using IF, FRAP, ChIP-chip, ChIP-seq data). In addition, we are aware of the shortcomings of the ChIP-seq technique, and only make relative conclusions. We show that when rex sites are inserted, binding near rex sites is higher than in regions that are farther away (‘spreading’) (Figure 5A, Supplemental Figure 5-1). We also show that relative to other autosomes, the mean signal on chromosome-II increases when rex sites are inserted (Figure 5B). This relative comparison allows us to say that insertion of rex sites increases the binding frequency on chromosome-II without knowing the true off-target/autosomal binding frequency prior to rex insertion.

      4) Please include a supplementary table describing all the Hi-C data used in the manuscript, including numbers of replicates, total number of sequenced read pairs, mapped reads, inter-and intra-chromosomal contacts, and number of contacts >20 kb and <20 kb.

      This is provided in the corresponding tabs of the Supplemental File 1.

      5) Page 8, lines 24-38. Based on this discussion, it is difficult to visualize what is happening. First, the authors suggest that condensin is recruited to the ectopic rex sites and "spreads" bidirectionally away from these sites to stop at various sites in the genome. Now, in this discussion, the authors suggest that rex sites containing condensin make loops. Does this happen without extrusion, just by the rex sites coming together in the 3D space? Are the loops formed through interactions between two condensin rings? When the authors say that condensin "spreads", does this take place by extrusion or a different mechanism? As mentioned in #2 above, everything would make better sense if the accumulation of condensin at rex sites is not a consequence of initial recruitment but rather a consequence of random loading followed by extrusion and retention at rex sites.

      Please see our new discussion sections (Previous models of condensin DC binding on the X-chromosomes and A model to explain X-specific recruitment of condensin DC and formation of loop-anchored TADs by rex sites) and Figure 7 for better explanation of why “recruitment everywhere” is insufficient to explain all aspects of the system and the data, as well as a more clear explanation of our model.

      6) Figure 2C. Were the interactions highlighted in this figure determined to be the only statistically significantly different between control and rex insertions or were they defined visually? The interaction between the center rex bait and the right rex pink site appears to be the same as in control. However, there seem to be some significantly visually different interactions between the center and right baits and other regions in the genome. Authors should test whether these interactions are statistically significant and, if so, what is located at these non-rex sites.

      We report the differences between the wild type and insertion strains in the revised manuscript by using log2ratio of the Hi-C matrices (Figure 6). Here, the stripes and the TAD insulation effects are more clear and indicative of barrier function of rex sites (also shown as insulation score between insertion/wild type shown below the matrices). The DPY-27 binding sites near the inserted rex sites tend to be actively transcribed genes. This is consistent with previously observed positive correlation between DPY-27 and Pol II ChIP-seq data at non-rex DCC sites on X-chromosomes and at the autosomal spreading region in the X;V fusion chromosome (Ercan 2009 Current Biology, and Street et al 2019 Genetics).
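
      For illustration only, the log2-ratio comparison of two Hi-C matrices mentioned above can be sketched as follows. This is a minimal sketch and not the pipeline used for Figure 6: the function name `log2_ratio`, the depth scaling by total contacts, and the choice to mark empty bins as NaN are all assumptions made for the example.

```python
import math


def log2_ratio(insertion, wild_type):
    """Return log2(insertion / wild type) per bin for two contact matrices.

    Hypothetical helper: both matrices (lists of rows) are first scaled to a
    total of 1 so that sequencing depth cancels out. Bins that are zero in
    either matrix are returned as NaN instead of +/- infinity.
    """
    if len(insertion) != len(wild_type):
        raise ValueError("matrices must have the same number of rows")
    total_ins = sum(map(sum, insertion))
    total_wt = sum(map(sum, wild_type))
    result = []
    for row_ins, row_wt in zip(insertion, wild_type):
        if len(row_ins) != len(row_wt):
            raise ValueError("matrices must have the same number of columns")
        out_row = []
        for a, b in zip(row_ins, row_wt):
            if a > 0 and b > 0:
                out_row.append(math.log2((a / total_ins) / (b / total_wt)))
            else:
                out_row.append(math.nan)
        result.append(out_row)
    return result
```

      Positive values mark bins with relatively more contacts in the insertion strain, negative values mark bins with relatively fewer, and a matrix that differs from wild type only in sequencing depth gives zero everywhere.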

      7) Figure 3A. The fact that rex sites can contain more than one motif, presumably a binding site for an unknown protein, complicates data interpretation. It would be helpful if the authors indicate at the top of Figure 3A the number of motifs and their orientation for each rex site currently shown. In the bottom panels of this figure, it appears that not all rex sites indicated at the top are able to "recruit" condensin. Authors should comment on this, and if there are differences in the number of motifs at these sites or the sequence of the motifs. Also, the newly inserted sites appear to "recruit" less condensin than some of the existing ones. Do the sites with the taller Dpy27 peaks have more motifs?

      To make the interpretation simple in the revised Figure 3, we provide the Hi-C matrix comparison between the two insertion strains where rex-8 was inserted in two opposite directions. Hi-C interactions are similar in two strains, thus rex-8 direction does not matter.

      Reviewer comment on “Rex sites containing more than one motif complicates data interpretation”:

      It is important to highlight that while rex-8 contains multiple motifs, as is the case for many strong rex sites (Albritton 2017, Figure 5A), they are all oriented in the same direction for rex-8. This is the main rationale for using rex-8 as opposed to other rex sites. The number and the orientation of motifs are indicated in the cartoon in the revised figure.

      Reviewer comment on “It appears that not all rex sites “recruit” condensin. Do sites with taller DPY27 peaks have more motifs?”

      This question has been addressed in our prior work, Albritton et al. 2017. As shown for binding motifs corresponding to transcription factors, not all 12-bp rex motifs are bound by the DCC; the strength and the number of motifs correlate with binding strength, but not absolutely (Albritton 2017, Figure 5A), and nucleotide perturbation of the motif abolishes binding (McDonel 2006, Figure 2). Similarly, the strength of binding also correlates with insulation strength (Anderson 2019, Figure S3B).

      8) It is unclear from the experiments described in Figure 1 how the formation of new loops would affect transcription. In Figure 3A, it appears that some of the Hi-C heatmaps show signal that could correspond to compartmental interactions. I wonder if the authors have tested whether the formation of new loops disrupts these interactions, which may contribute to the stabilization of promoter contacts and affect transcription. It may be informative to look at subtraction heatmaps between the new insertion data and control, although the Hi-C data in the center panel appears to have lower quality.

      Linking 3D interaction and transcription, while an ongoing project in our lab, is difficult. C. elegans has high gene density (~5kb/gene) with small gene length (average ~2kb). This makes analyzing E-P or P-P interactions as a result of rex insertion difficult. However, we agree that compartment analysis is a good starting point. We thus provide compartment analysis for the endogenous X (Figure 1) and discuss implications in repression using the X;V fusion experiment (Figure 2, see section Spreading of condensin DC entails loop extrusion but cannot sufficiently form TADs without rex sites). Future work involving higher resolution techniques such as Micro-C could better address the relationship between condensin DC mediated loops and promoter contacts related to transcription.

      9) Figure 4 and page 9 lines 16-36. It is not completely clear from the discussion of Figure 4 whether the Hi-C data from wildtype was obtained with fixed embryos whereas the data from X;V was obtained with unfixed embryos. If this is the case, it may not be appropriate to directly compare the two samples. When the authors say "the autosomal spreading region showed an increase in DNA contacts measured by Hi-C", is this within the region or between the region and other sites in the genome? Since the two datasets have been normalized to the same number of contacts, an increase in interactions within the chromosome V region adjacent to the X chromosome in the X;V sample could be explained if this region interacts less with the adjacent X chromosome. Authors should discuss in more detail how this analysis was performed and perhaps use subtraction heatmaps to illustrate the point.

      We repeated experiments in L3 with controls all performed under the same crosslinking conditions. We also adopted the use of log-derivatives to infer loop size, which has been shown to better capture the chromatin state than the P(s) (Polovnikov 2022). Schematics for sub-regions of the XV chromosomes, for which the log-derivative of P(s) is computed, are also provided for comparison, thus the readers can compare the region of condensin DC spreading (proximal V) to chromosome V that is unbound by condensin DC (distal V). We reason that the log-derivative of P(s), which uses the relative change in P(s), is better equipped to deal with the reviewer’s concerns regarding the relative nature of the normalized contact frequency matrix.
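
      As a rough illustration of the quantity discussed above, the slope of log P(s) against log s can be computed from a binned contact matrix as sketched below. This is a minimal sketch under stated assumptions, not the analysis code behind the revised figures: the function names are hypothetical, P(s) is taken as the plain mean of each matrix diagonal, and the derivative uses simple finite differences without the log-spaced binning or smoothing that a real analysis would typically apply.

```python
import math


def contact_probability(matrix, bin_size):
    """Mean contact frequency P(s) for each genomic separation s (in bp).

    Hypothetical helper: `matrix` is a square list of rows, `bin_size` is the
    bin width in bp, and the k-th diagonal holds all pairs separated by
    k * bin_size.
    """
    n = len(matrix)
    separations, probabilities = [], []
    for k in range(1, n):
        diagonal = [matrix[i][i + k] for i in range(n - k)]
        separations.append(k * bin_size)
        probabilities.append(sum(diagonal) / len(diagonal))
    return separations, probabilities


def log_derivative(separations, probabilities):
    """Slope of log P(s) versus log s at each separation with P(s) > 0.

    Uses central differences in the interior and one-sided differences at the
    two ends. Returns the retained separations and the matching slopes.
    """
    points = [
        (s, math.log(s), math.log(p))
        for s, p in zip(separations, probabilities)
        if p > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two separations with P(s) > 0")
    kept, slopes = [], []
    last = len(points) - 1
    for i, (s, _, _) in enumerate(points):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        d_log_p = points[hi][2] - points[lo][2]
        d_log_s = points[hi][1] - points[lo][1]
        kept.append(s)
        slopes.append(d_log_p / d_log_s)
    return kept, slopes
```

      Because the slope depends only on the relative change in P(s), multiplying the whole matrix by a constant leaves it unchanged. Computing it separately for sub-regions (for example, proximal V and distal V) then amounts to passing the corresponding sub-matrices to these functions.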

      10) Figure S4A. If there is an increase in condensin (Dpy27) in chromosome V and an increase in interactions in this region, would this imply that the "spreading" of condensin takes place by loop extrusion? Otherwise, the "spreading" of condensin as suggested in the model of Figure 6 would not create new interactions.

      Our data showing increased interactions specifically in the proximal V along with decreased compartmentalization at this region (Figure 2) indicate that at least some spreading involves loop extrusion. This is reflected in our discussion of the model (Figure 7).

      11) Figure 5 and page 10, lines 20-21. It is clear from Figure 5B that the presence of the block leads to an accumulation of condensin, although the bottom panels of Figure 5C suggest that this accumulation is lower than at the flanking rex-33 and rex-14 sites. However, contrary to the authors' conclusion that this is in vivo evidence for loop extrusion, the result may suggest the opposite. If condensin was extruding loops and stopped at the dCas9 site, it should have formed a loop. Were the same cells used for the ChIP-seq and Hi-C experiments? If not, one trivial explanation is that dCas9 failed to work in the cells used for Hi-C. Authors should comment on the fact that the rex-23 and rex-34 sites do not seem to be located at TAD boundaries, whereas TAD boundaries in the left region of the figure seem to lack rex sites.

      We agree with the reviewer’s prediction that stopping at the dCas9 site should have formed a loop. See our response to Essential Revision 2.

    1. Author Response

      Reviewer #1 (Public Review):

      Dosil et al. have extensively analyzed NK cell-derived extracellular vesicles containing miRNAs. They analyzed the miRNAs in NK cell-derived EVs and found that specific types of miRNAs are contained in NK cell-derived EVs. Furthermore, they found that NK cell-derived EVs have immunomodulatory functions for T-cell response as well as for monocytes and moDCs. This paper is well designed and provides important information on NK cell-derived EVs. However, it is unclear whether NK cell-derived EVs are different from EVs derived from other immune cells such as T cells and B cells.

      We thank the reviewer for his/her comments and for pointing out this key point.

      1) The authors analyzed human NK cell-derived EVs. The repertoire of miRNAs in NK-EVs may differ among individuals. It would be better to show the degree of individual differences.

      We thank the reviewer for highlighting this point and agree that miRNA content in NK-EVs differs among individuals. We have now included a separate table where we show the relative abundance of EV-miRNAs in secreting activated NK cells and their secreted EVs from small RNA sequencing data, and the corresponding plots, including statistics (new Figure 1-figure supplement 2B,C). However, it is important to highlight that the enrichment of these miRNAs in NK-EVs compared to their parental cells is consistent within individuals, as shown in Figure 1- figure supplement 2 and Supplementary Table S1 where all individual data are shown.

      Furthermore, to address the reviewer's concern about whether NK-EV content differs from that of EVs from other cell types, we have further analyzed the average ratio of EVs vs secreting cells from a recent article (11) and found that the enrichment of specific miRNAs in NK-EVs is rather cell specific and differs from other unrelated cells such as white fat and hepatic cells, as shown in Figure Review 1 below.

      Figure Review 1. Parental cell and EV expression of NK-EV enriched miRNAs

      2) The authors analyzed the effect of NK-EVs on T cell response in Fig. 4. However, it is possible that EVs affect T cell responses in a nonspecific manner. It may be necessary to include control EVs.

      To address this key point raised by the reviewer, several new experiments were performed.

      First, small EVs from two distinct human cell lines (namely HEK-293 human embryonic kidney cells and Raji B lymphoblast cells) were isolated following the differential ultracentrifugation protocol, as described in the methods section. These EVs had no impact on primary T cells isolated from healthy human donors, either in IFN-γ secretion (new Figure 3-figure supplement 3) or in activation, measured by CD25 expression (Figure 4-figure supplement 2E,F), which even decreased upon Raji B cell EV treatment under Th1-polarizing conditions.

      Also, three microRNAs that are preferentially excluded from the NK-EV fraction, namely hsa-miR-124, hsa-miR-3667 and hsa-miR-4158, were selected and loaded onto gold nanoparticles (new Figure 6-figure supplement 2), and their effects were evaluated in immunocompetent C57BL/6 mice after footpad injection. These nanoparticles showed none of the effects observed for NK-EV-enriched microRNAs, either in activation or in IFN-γ secretion (new Figure 6H).

    1. Author Response

      Reviewer 3 (Public Review):

      1) The overall research question and goal of this manuscript are unclear.

      The manuscript has been edited to improve clarity and emphasize the research goals.

      2) Many of the key experiments are executed in vitro, complicating results and making it hard to discern how some of the experiments translate to in vivo differentiation.

      As requested, we provide several new experiments showing CTL responses analyzed in vivo. Endogenous CTLs were analyzed using MHCI tetramers (Fig 2C, 2D and 5C), and transfers were used to compare mutant and wildtype CTLs in the same animals (Fig 1C) and for imaging flow cytometry (4B, 4C, 4E and 4F). Our in vivo and in vitro studies support similar conclusions.

      3) There is not a consistent approach of carefully examining Trm cells.

      We provide new figures showing IV staining (Fig 1C, 1D, 5C, 5D and supp Fig 1D). The data show that SMAD4 supports formation of KLRG1+ CTLs that localize in the vasculature. Many studies have shown that TGFb is involved in formation of TRM cells. To avoid duplication, we did not emphasize this subset.

      4) Many of the core findings in this manuscript have previously been reported either in the prior work from this group (J Immunol, 2015) or more recently by Wu et al. (Cell Mol Immunol 2020).

      This statement is not accurate. To our knowledge, only four papers have examined the regulatory functions of SMAD4 in peripheral CD8 T cells, using different promoters for gene-ablation. These studies had different objectives.

      i. Hu et al (2015) - distal Lck promoter. This paper did not examine gene expression, or utilize mice with multiple mutations. Our current paper is an extension of this work, with no duplication.

      ii. Cao et al. (2015) – proximal Lck promoter. This study did not examine gene expression, or utilize mice with multiple mutations. The cytokine response may have been altered by gene ablation during thymic development.

      iii. Wu et al (2021) – CD4-Cre was used to study regulation of CD103 by Ski/SMAD4. The role of SMAD4 during regulation of EOMES, CD62L, and KLRG1 was not analyzed. IV staining was not used.

      iv. Igalouzene et al (2022) - CD4-Cre was used to study autoimmune disease in mice that lack TGFbRII, with emphasis on cells in the GI tract.

      Wu et al (2021) analyzed CD103 expression using S4TR2-DKO and TR2KO cells, while similar comparisons with S4KO and control cells were not shown. The authors concluded that “Smad4 is required to limit CD103 expression in CD8+ T cells through a mechanism that is downstream of TGFβR”. This statement is not supported by our work. While their study shows that Ski is negatively regulated by TGFb, this is not the only mechanism that controls CD103 expression. We show that SMAD4 down-regulates the Itgae gene (CD103) and induces EOMES expression independently of TGFb. This novel observation has not been reported previously. Although this manuscript is an extension of our prior work, the data were generated using additional strains of genetically modified mice and experimental approaches that support novel conclusions.

      5) There is not a careful or consistent assessment of memory T cell populations in lymphoid or non-lymphoid compartments.

      To address this concern, we provide new figures showing IV staining (Figs 1C, 1D, 5C and 5D, Supplemental Fig 1D).

      6) SMAD4 was required to maintain EOMES expression in activated CTLs. This data is fairly robust; however, could this be due to differences in cell states rather than a direct role for SMAD4 in sustaining EOMES expression?

      For this study, we analyzed EOMES expression in vitro and at multiple time points after infection (in vivo). The data show that EOMES was consistently down-regulated in SMAD4-deficient CTLs at all time points regardless of phenotype (Figs 4A, 4C and 4E). Our prior work shows that cell state impacts susceptibility to TGFb, since TCM cells did not upregulate CD103 during stimulation with TGFb (Suarez et al 2019) and KLRG1+ CTLs maintained EOMES expression during stimulation with TGFb (Fig 4F). This point is mentioned in the text. Our revised manuscript includes ChIP-seq data showing that SMAD4 binds to the EOMES promoter (Fig 4G), indicating a direct role for SMAD4 during gene regulation.

      7) SMAD4 has multiple roles in regulating expression of CD103, including complementary or independent roles of Ski (last two sentences of paragraph describing Fig 3 in the results section). There was no assessment of Ski in the results of this study. Additionally, despite many conclusions about the roles of SMAD proteins in controlling gene expression, there are no experiments to assess binding of these factors (e.g. ChIP-qPCR).

      The objective of our study was to examine the role of SMAD4 during formation of TEFF and TCM cells. We did not study Ski expression, as interactions between SMAD4 and this molecule have been reported previously (Wu et al. 2020). We provide additional ChIP-seq data showing that SMAD4 binds to the EOMES promoter (Fig 4G).

      We thank the reviewers for many suggestions that have greatly improved the quality of our manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      Huisman et al. report a method for surveying tens of thousands of peptides for MHC II binding using a yeast display-based approach. The method is shown to cover the SARS-CoV-2 and dengue proteomes, providing a wide-ranging picture of peptides that may be recognized by T cells in infection and that may be used to develop T cell-directed vaccines. Three MHC II alleles are tested, serving as a proof-of-concept for wider application to additional alleles for broadened coverage of human MHC diversity. In addition, the method is directly compared to a computational MHC ligand predictor.

      The study has several strengths. Rigor is strong as the authors survey every 15-mer sequence in an antigen for binding MHC II using overlapping peptide libraries and consider various aspects of the peptide:MHC interaction in their yeast display-based system in defining what is a positive binder. In addition, there are important findings that emerge from the high-throughput MHC II binding approach, such as allele-specific binding preferences at defined positions in the MHC II binding groove, differences in binding motifs between randomized and defined peptide libraries that have implications for training prediction algorithms, and differences between experimental and computational methods for MHC II ligand discovery.

      We thank the reviewer for their comments about our manuscript.

      A discussion about the significance of the observation that the yeast display-based approach identifies MHC II ligands that are not found by NetMHCIIpan4.0 would enhance the paper. This is an important finding, on the one hand, because the method may provide new training data that will improve computational prediction accuracy. On the other hand, many of these sequences are low-affinity binders and may not be immunoreactive as peptide affinity drives T cell response (e.g., PMID: 16039577, PMID: 31253788). How this fits in the context of the oft-heard criticism that computational approaches overpredict would benefit the discussion, as well.

      We have expanded our Discussion to highlight these important discussion points. Specifically, we highlight caveats around low affinity MHC-binding peptides, including effects on immunodominance, as well as examples where low affinity peptides have proven relevant. We also add to our discussion of the utility of these data, emphasizing the potential use of yeast display datasets for augmenting current training data and importance of identifying algorithmic false positives.

      Related to this observation, the authors imply in parts (Abstract, Introduction) that the yeast display method is superior to computational predictions because it identifies MHC II ligands not discovered by computational algorithms, however, the current study is limited to three MHC II alleles, examines only one predictor, and does not provide evidence of T cell validation nor even discussion of the SARS-CoV-2 and dengue datasets in the context of published predictions, MHC II binding data, and immunological studies. The balanced approach taken in the Discussion where experimental and computational approaches are said to complement each other is constructive as it recognizes that both methods have advantages and disadvantages and is a good model for portraying their relationship in earlier parts of the paper.

      We thank the reviewers for this important feedback, and have tempered our language in the Abstract and Introduction to be more balanced, emphasizing how experimental and computational approaches complement one another.

    1. Author Response

      Reviewer 1 (Public Review):

      The paper by Chen et al studies inter-individual differences in the left-right asymmetry of the shape of the cerebral cortex. The authors introduce a novel shape asymmetry measure based on a spectral analysis of cortical geometry, reporting that relatively coarse scales of shape asymmetry are highly specific to individual study participants. Shape asymmetry (SAS) is shown to have associations with cognition and biological sex, but not handedness. Results suggest that shape asymmetry is not highly heritable, and that it is driven primarily by environmental rather than genetic influences.

      The paper has many strengths. The problem of investigating directional versus fluctuating asymmetry is clearly stated and biologically important. SAS is based on a sophisticated methodological approach and rigorously applied. The use of three datasets increases the generalizability of the results, and the comparison to fMRI measures provides important context. Weaknesses include the interpretability of the measure and some specific methodological issues that could be further addressed as discussed below.

      We appreciate the positive feedback. As in our responses below, we present new figures and analyses to increase the interpretability of the SAS and address the specific methodological issues.

      1) The lack of higher identifiability of fine-grained SAS is hard to understand. Given that secondary and tertiary sulci are not likely to change between time point 1 and time point 2, and that it is known that secondary and tertiary sulci vary more than primary sulci between people, this suggests that higher measurement error at finer scales may limit the comparisons between fine and coarse made in the paper.

      We appreciate the point. The spatial scale for optimal identifiability in our analysis included secondary sulci but captured limited information about tertiary sulci. While the general location of secondary and tertiary sulci may not change much within an individual over time, subtle changes in regional grey matter volume may alter the shape of surrounding sulci and gyri in such a way that variations at fine spatial scales carry less identifying information. Our shape measures are sensitive to both sulcal and gyral anatomy and other changes in cortical shape.

      Higher measurement noise at fine scales may indeed play a role. In our revised manuscript, we now include a demonstration of this effect in Figure 2—figure supplement 2 and have amended Lines 265-269 of the revised manuscript accordingly:

      "The reconstruction captures shape variations at a coarse scale, representing major primary and secondary sulci, but with minimal additional details. If we include additional eigenfunctions to capture more fine-scale anatomical variations, inter-session image differences increase, suggesting that finer spatial scales may be capturing dynamic aspects of brain structure that are more susceptible to increased measurement noise (Figure 2—figure supplement 2)."

      We have also amended Lines 435 to 439:

      "It is perhaps surprising that individual differences in cortical shape are most strongly expressed at coarse scales, given the known variability of fine-grained anatomical features such as the presence and trajectories of tertiary sulci. It is possible that local subtle changes in grey matter volume affect fine-scale geometry in such a way that it carries less identifying information, or that such fine scales carry too much measurement noise to be used for the purpose of identification."

      2) From a neuroanatomical perspective, it is not clear what individuals with different asymmetries of shape at different scales actually look like, which limits the interpretability of the measure.

      To improve the interpretability of the SAS, we have added one supplementary figure, which we refer to in Lines 171 to 173 of the revised manuscript:

      "In general, a brain with a higher degree of shape asymmetry has SAS values that more strongly depart from zero (Figure 1—figure supplement 1)."

      3) The possibility that image quality could affect measures of shape asymmetry is not addressed.

      Thank you for raising this important issue. The images of each dataset used in this study −OASIS-3, ADNI, HCP− all underwent correction of FreeSurfer segmentations and passed quality control procedures for each dataset. Indeed, the HCP dataset is widely accepted to include some of the best quality images among all open-source data. Therefore, our data are not uncharacteristically noisy. As indicated in our response to Comment 1, increased noise at fine-grained resolutions may affect identifiability at these scales. To further address this issue, we have added further details in Lines 674 to 681 of the revised manuscript about the correlation between the Euler number from FreeSurfer (1) and the SAS:

      "To further check the possible influence of image quality on the SAS, we first took the mean of the Euler number of the left and right hemispheres using FreeSurfer, which is widely used as an index of image quality (1-3), and then calculated the Pearson’s correlation between the mean Euler number and the SAS across the first 200 eigenvalues. For the HCP s1200 dataset, the correlations were all below 0.07 (PFDR > 0.05). For the OASIS-3, the correlations were all below 0.18 (PFDR > 0.05) at either time 1 or time 2 MRI session. These results indicate that image quality does not strongly influence the SAS, which is in line with past findings that the eigenvalues and eigenfunctions of the Laplace-Beltrami Operator are robust to image noise (4)."
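
      The quality check described above (one Pearson correlation per eigenvalue, followed by FDR correction) can be illustrated with the sketch below. This is only a minimal example under stated assumptions: the text does not say which FDR procedure was used, so the Benjamini-Hochberg step-up adjustment is assumed here, the function names are hypothetical, and the p-value for each correlation is assumed to come from a standard statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      In this setting, `x` would hold the mean Euler number per participant and `y` the SAS at one eigenvalue; repeating this for the first 200 eigenvalues gives 200 p-values to pass to `benjamini_hochberg`.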

      4) The paper does not address that different ways of measuring handedness could theoretically have different associations with asymmetry measures.

      Thank you for this comment. In our original analysis, we used the handedness measured by the Edinburgh Handedness Inventory (EHI) as a continuous variable from -100 to 100, with values closer to 100 representing stronger right-handedness. There are different cut-off scores to categorize handedness, but these thresholds are still arbitrary, and thus applying the EHI score as a continuous variable is a widely used approach (5, 6). Here, we tested two thresholds to categorize handedness. First, right-handed (EHI: 71 to 100), left-handed (EHI: -100 to -71), and ambidextrous (EHI: -70 to 70) (7-9); second, right-handed (EHI: 50 to 100), left-handed (EHI: -100 to -50), and ambidextrous (EHI: -49 to 49) (10, 11). The categorized handedness variable, regardless of the threshold, was still unrelated to the SAS (2 to 144 eigenvalues). We have amended Lines 818 to 828 of the revised manuscript to better clarify how handedness was measured:

      "The HCP dataset provides the handedness preference measured by the Edinburgh Handedness Inventory (EHI) (12). EHI is the most widely used handedness inventory (10, 13), with resulting scores ranging from -100 (complete left-handedness) to 100 (complete right-handedness) (12). Handedness preference is not a bimodal phenomenon (8), and cut-off scores to categorize handedness are still arbitrary. We therefore used the EHI score as a continuous variable in our main analysis, which is a widely used approach (5, 6). To further confirm the robustness of the relationship between handedness and the SAS, we tested two thresholds to categorize handedness. First, right-handed (EHI: 71 to 100), left-handed (EHI: -100 to -71), and ambidextrous (EHI: -70 to 70) (7-9); second, right-handed (EHI: 50 to 100), left-handed (EHI: -100 to -50), and ambidextrous (EHI: -49 to 49) (10, 11). Regardless of the threshold, the categorized handedness variable was still unrelated to the SAS (2 to 144 eigenvalues)."
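
      The two categorization schemes described above can be written as a single rule with a symmetric cut-off, as in the sketch below. This is an illustrative sketch only: the function name `categorize_handedness` and the symmetric `cutoff` parameter are assumptions, and scores are treated as lying on the -100 to 100 scale stated in the text.

```python
def categorize_handedness(ehi_score, cutoff):
    """Map an EHI score (-100 to 100) to a handedness category.

    Hypothetical helper: scores at or above `cutoff` are right-handed,
    scores at or below `-cutoff` are left-handed, and everything in
    between is ambidextrous. Use cutoff=71 for the first scheme and
    cutoff=50 for the second.
    """
    if not -100 <= ehi_score <= 100:
        raise ValueError("EHI score must lie between -100 and 100")
    if ehi_score >= cutoff:
        return "right"
    if ehi_score <= -cutoff:
        return "left"
    return "ambidextrous"
```

      With this rule, a participant scoring 60 is ambidextrous under the first scheme and right-handed under the second, which is why both schemes were checked.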

      Reviewer #2 (Public Review):

      Being a paleoanthropologist, I am not a real specialist of the neuroscientific field. For this reason, my understanding of the methods, and particularly of the mathematics behind them, may be partial. However, I am used to studies of bilateral variation of the brain. For these reasons, my comments mostly concern the theoretical framework of the study, the way the data are analysed and exploited, and the interpretations. The authors propose with this paper a new approach to characterize the main asymmetries of the whole cortical shape. This new tool is interesting and provides an original perspective on a longstanding question. Thanks to this approach, the authors identify interesting individual characteristics, as an individual's shape asymmetry appears to be a good parameter to identify each individual. I have more concerns about the application of this new tool in the context of earlier studies of human brain asymmetries, particularly when the authors contextualise their own research and results within the existing knowledge on the topic. From a methodological point of view, I would be interested in having more information about the identified bilateral variation for individuals and samples and a clearer characterization of different parameters for bilateral variation.

      We appreciate the feedback. As per our response to Comment 2 of Reviewer 1’s Public Review, our revised manuscript now includes a new Figure 2—figure supplement 2, which provides examples of how cortical shape asymmetries appear at different spatial scales for different values of the SAS.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper examines EEG responses time-locked to (or "entrained" by) musical features and how these depend on tempo and feature identity. Results revealed stronger entrainment to "spectral flux" than to other, more commonly tested features such as amplitude envelope. Entrainment was also strongest for lowest rates tested (1-2 Hz).

      The paper is well written, its structure is easy to follow and the research topic is explained in a way that makes it accessible to readers outside of the field. Results will advance the scientific field and give us further insights into neural processes underlying auditory and music perception. Nevertheless, there are a few points that I believe need to be clarified or discussed to rule out alternative explanations or to better understand the acquired data.

      We thank the Reviewer for taking the time to evaluate our manuscript and for the positive response. We have now conducted further analyses to strengthen our conclusion that neural synchronization was strongest at slower musical tempi and to rule out an alternative explanation that neural synchronization was strongest for music presented near its own original or “natural” tempo. We also added some points to the Discussion in response to your comments; revised text is reproduced as part of our point-by-point responses below for your convenience. The page and line numbers correspond to the manuscript file without track changes.

      1) Results reveal spectral flux as the musical feature producing strongest entrainment. However, entrainment can only be compared across features in an unbiased way if these features are all equally present in the stimulus. I wonder whether entrainment to spectral flux is only most pronounced because the latter is the most prominent feature in music. Can the authors rule out such an explanation?

      Respectfully, it is not fully clear to us based on the literature that entrainment can only be compared across features fairly when those features are equally present in the stimulus. Previous work in the speech domain has compared entrainment to amplitude envelope vs. spectrogram vs. a symbolic representation of the time of occurrence of different phonemes (Di Liberto et al., 2015). Work in the music domain has compared entrainment to amplitude envelope (and its derivative) vs. features quantifying melodic expectation (surprise and entropy, quantified using a hidden Markov model trained on a corpus of Western music; Di Liberto et al., 2020). In these papers, there was no quantification of the degree to which each feature was present in the stimulus material, and when comparing such qualitatively different features, it is not clear to us how one would do so. Nonetheless, these studies used the resulting TRF-based dependent measures to evaluate which feature best predicted the neural response. Here, although we do not know what acoustic feature might be most present / strongest in music, we believe that we can investigate the degree to which each feature predicts the neural response. In fact, we might argue the reverse of the logic in your comment – that the TRF results actually tell us which feature is perceptually or psychologically the most important in terms of driving brain responses, which may not be fully predictable from the acoustics of those features.

      From a data analysis perspective, we have independently normalized (z-scored) each feature as well as the neural data, as prescribed in Crosse et al., 2021, to try to level the playing field for the musical features we are comparing. Moreover, we made changes in the discussion to acknowledge your concern. The text is reproduced here for your convenience.

      p. 26, l. 489-497: “One hurdle to performing any analysis of the coupling between neural activity and a stimulus time course is knowing ahead of time the feature or set of features that will well characterize the stimulus on a particular time scale given the nature of the research question. Indeed, there is no necessity that the feature that best drives neural synchronization will be the most obvious or prominent stimulus feature. Here, we treated feature comparison as an empirical question (Di Liberto et al., 2015), and found that spectral flux is a better predictor of neural activity than the amplitude envelope of music. Beyond this comparison though, the issue of feature selection also has important implications for comparisons of neural synchronization across, for example, different modalities.”

      2) Spectral analyses of neural data often yield the strongest power at lowest frequencies. Measures of entrainment can be biased by the amount of power present, where entrainment increases with power. Can the authors rule out that the advantage for lower frequencies is a reflection of such an effect?

      Thank you for this insightful comment. In response to your comment and the comments of Reviewer 3, we normalized the TRF correlations, stimulus–response correlations, and stimulus–response coherences by surrogate distributions that were calculated separately for each musical feature and – importantly – for every tempo condition. Following Zuk et al., 2021, we formed surrogate distributions by shifting the relevant neural data time course relative to the stimulus-feature time courses by a random amount. We did this 50 times, and for each shift re-calculated all dependent measures. We then normalized our dependent measures calculated from the intact time series relative to these surrogate distributions by subtracting the mean and dividing by the standard deviation of the surrogate distribution (“z-scoring”). Since the approach of shifting the neural data leaves the neural time series intact, the power spectrum of the data is preserved, but only its relationship to the stimulus is destroyed. After normalization, the plots obviously look a little different, but the main results – a higher level of neural synchronization to slower stimulation tempi and in response to the spectral flux – remain.
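
      The normalization described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `surrogate_zscore`, the use of a plain Pearson correlation as the dependent measure, and the `min_shift` guard against near-zero shifts are all assumptions made for the example. Only the circular shifting, the 50 iterations, and the z-scoring follow the description in the response.

```python
import numpy as np

def surrogate_zscore(stimulus, neural, n_surrogates=50, min_shift=None, seed=0):
    """Z-score a stimulus-response correlation against circular-shift surrogates."""
    rng = np.random.default_rng(seed)
    stimulus = np.asarray(stimulus, dtype=float)
    neural = np.asarray(neural, dtype=float)
    n = len(neural)
    if min_shift is None:
        min_shift = n // 10  # hypothetical choice: keep shifts away from zero

    # Dependent measure on the intact (unshifted) time series.
    observed = np.corrcoef(stimulus, neural)[0, 1]

    # Circular shifts leave the neural time course (and its power spectrum)
    # intact but break its temporal relationship to the stimulus.
    null = np.empty(n_surrogates)
    for i in range(n_surrogates):
        shift = rng.integers(min_shift, n - min_shift)
        null[i] = np.corrcoef(stimulus, np.roll(neural, shift))[0, 1]

    z = (observed - null.mean()) / null.std()
    return z, null

# Toy data: a "neural" signal that partly follows the stimulus.
rng = np.random.default_rng(1)
stim = rng.standard_normal(2000)
eeg = 0.5 * stim + rng.standard_normal(2000)
z, null = surrogate_zscore(stim, eeg)
```

      In an actual analysis the same shifted data would be passed through every dependent measure (TRF correlation, SRCorr, SRCoh), separately for each musical feature and tempo condition.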

      The changes can be found throughout the manuscript, but especially on p. 11, l. 210-218, Figures 2-3 and a more detailed explanation in the Methods section.

      p. 39, l. 821-829: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”

      A related point, what was the dominant rate of spectral flux in the original set of stimuli, before tempo was manipulated? Could it be that the slow tempo was preferred because in this case participants listened to a most "natural" stimulus?

      This is a good point, thank you. We did two things to attempt to address this (see also the comment of Reviewer 3). First, the original tempo for each song can be found in Supplementary Table 1. To make the table more readable and more comparable with the main manuscript, we have updated the table and now state the original tempi in BPM and Hz. Second, we added histograms of the original tempi across all songs as well as the maximum amount by which all songs were tempo-shifted (i.e., the maximum tempo difference between the slowest (or fastest) version of each song segment compared to the original tempo). These histograms have been added to Figure 1 – figure supplement 2, and are paraphrased here for your convenience (p. 13 l. 265-273): The original tempo of the set of musical stimuli ranges between 1-2.75 Hz. This indeed overlaps with the tempo range that revealed strongest neural synchronization. When songs were tempo-shifted to be played at a slower tempo than the original, they were shifted by ~0.25-1.25 Hz. In contrast, shifting a song to have a faster tempo typically involved a larger shift of ~1-2.25 Hz. Thus, it is definitely possible that tempo, degree of tempo shift, and proximity to “natural” tempo were not completely independent values.

      For that reason, to investigate the effects of the amount of tempo manipulation on neural synchronization, we conducted an additional analysis. We compared TRF correlations for a) songs that were shifted very little relative to their original tempi to b) songs that were shifted a lot relative to their original tempi. We did not have enough song stimuli to do this for every stimulation tempo, but we were able to do the TRF correlation comparison for two illustrative stimulation tempo conditions (at 2.25 Hz and 1.5 Hz). In those tempo conditions, we took the TRF correlations for up to three trials per participant when the original tempo was around the manipulated tempo (1.25-1.6 Hz for 1.5 Hz or 2.01-2.35 Hz for 2.25 Hz) and compared them to those trials where the original tempo was around 0.75–1 Hz faster or slower than the manipulated tempo at which the participants heard the songs (Figure 3 – figure supplement 2). This analysis revealed that there was no significant effect of the original music tempi on the neural response (please see Materials and Methods, p. 40, l. 855-861 and Results p. 13, l. 265-273). In response to your and Reviewer 3’s comments, we also added this additional point to the discussion.

      p. 23-24 l. 427-436: “The tempo range within which we observed strongest synchronization partially coincides with the original tempo range of the music stimuli (Figure 1 – figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3 – figure supplement 3), and thus are not able to rule out the influence of the magnitude of tempo manipulation on other tempo conditions.”

      3) The authors have a clear hypothesis about the frequency of the entrained EEG response: The one that corresponds to the musical tempo (or harmonics). It seemed to me that analyses do not sufficiently take that hypothesis into account and often include all possible frequencies. Restricting the analysis pipeline to frequencies that are expected to be involved might reduce the number of comparisons needed and therefore increase statistical power.

      Although we manipulated tempo, and so had an a priori hypothesis about the frequency at which the beat would be felt, natural music is a complex stimulus composed of different instruments playing different lines at different time scales, many or most of which are nonisochronous. Thus, we analyzed the data in two different ways – 1) based on TRFs and 2) based on stimulus–response correlation and coherence. Stimulus–response coherence is a frequency-domain measure, and so it was possible to do exactly as you suggest here and consider coherence only at the stimulation tempo and first harmonic, which we did (Figure 2E-J). However, for the TRF analyses, we followed previous literature (e.g., Ding et al., 2014; Di Liberto et al., 2020; Teng et al., 2021), and considered broader-band EEG activity (bandpass filtered at 0.5-30 Hz). Previous work has shown that the beat in music evokes a neural response at harmonics up to at least 4 times the beat rate (Kaneshiro et al., 2020), so we wanted to leave a broad frequency range intact in the neural data. Despite being based on differently filtered data, we found that the dependent measures from the two analysis approaches were correlated, which suggests to us that neural tracking at the stimulation tempo itself was probably the largest contributor to the results we observed here.
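
      Restricting a frequency-domain measure to the stimulation tempo and its first harmonic can be sketched as below. This is an illustration under stated assumptions, not the SRCoh pipeline used in the study: `coherence_at` is a hypothetical helper that computes a Welch-style magnitude-squared coherence (Hann window, 50% overlap) and reads it out only at the requested frequencies, and the sampling rate, segment length, and toy signals are made up for the example.

```python
import numpy as np

def coherence_at(x, y, fs, freqs_of_interest, seg_len):
    """Magnitude-squared coherence between x and y at selected frequencies."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    window = np.hanning(seg_len)
    starts = range(0, len(x) - seg_len + 1, seg_len // 2)  # 50% overlap

    def segment_spectra(sig):
        segs = np.array([sig[s:s + seg_len] for s in starts])
        segs = segs - segs.mean(axis=1, keepdims=True)  # remove per-segment mean
        return np.fft.rfft(segs * window, axis=1)

    X, Y = segment_spectra(x), segment_spectra(y)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)

    out = {}
    for f in freqs_of_interest:
        k = int(np.argmin(np.abs(freqs - f)))  # nearest frequency bin
        sxy = np.mean(X[:, k] * np.conj(Y[:, k]))
        sxx = np.mean(np.abs(X[:, k]) ** 2)
        syy = np.mean(np.abs(Y[:, k]) ** 2)
        out[f] = float(np.abs(sxy) ** 2 / (sxx * syy))
    return out

# Toy example: stimulation tempo of 2 Hz, first harmonic at 4 Hz,
# and 3 Hz as an unrelated control frequency.
fs, tempo = 100.0, 2.0
t = np.arange(0, 120, 1 / fs)
rng = np.random.default_rng(0)
stimulus = (np.sin(2 * np.pi * tempo * t)
            + 0.5 * np.sin(2 * np.pi * 2 * tempo * t)
            + 0.1 * rng.standard_normal(t.size))
neural = (0.8 * np.sin(2 * np.pi * tempo * t - 0.5)
          + 0.3 * np.sin(2 * np.pi * 2 * tempo * t - 1.0)
          + rng.standard_normal(t.size))
coh = coherence_at(stimulus, neural, fs, [tempo, 2 * tempo, 3.0], seg_len=1000)
```

      Reading out only two bins per condition keeps the number of statistical comparisons small, which is the benefit the Reviewer points to.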

      Related to your comment, we added two points to our discussion, which we reproduce here for your convenience.

      p. 24-25, l. 453-461: “Regardless of the reason, since frequency-domain analyses separate the neural response into individual frequency-specific peaks, it is easy to interpret neural synchronization (SRCoh) or stimulus spectral amplitude at the beat rate and the note rate – or at the beat rate and its harmonics – as independent (Keitel et al., 2021). However, music is characterized by a nested, hierarchical rhythmic structure, and it is unlikely that neural synchronization at different metrical levels goes on independently and in parallel. One potential advantage of TRF-based analyses is that they operate on relatively wide-band data compared to Fourier-based approaches, and as such are more likely to preserve nested neural activity and perhaps less likely to lead to over- or misinterpretation of frequency-specific effects.”

      p. 29 l. 564-577: “Despite their differences, we found strong correspondence between the dependent variables from the two types of analyses. Specifically, TRF correlations were strongly correlated with stimulation-tempo SRCoh, and this correlation was higher than for SRCoh at the first harmonic of the stimulation tempo for the amplitude envelope, derivative and beat onsets (Figure 4 - figure supplement 1). Thus, despite being computed on a relatively broad range of frequencies, the TRF seems to be correlated with frequency-specific measures at the stimulation tempo. The strong correspondence between the two analysis approaches has implications for how users interpret their results. Although certainly not universally true, we have noticed a tendency for TRF users to interpret their results in terms of a convolution of an impulse response with a stimulus, whereas users of stimulus–response correlation or coherence tend to speak of entrainment of ongoing neural oscillations. The current results demonstrate that the two approaches produce similar results, even though the logic behind the techniques differs. Thus, whatever the underlying neural mechanism, using one or the other does not necessarily allow us privileged access to a specific mechanism.”

      Reviewer #2 (Public Review):

      Kristin Weineck and coauthors investigated the neural entrainment to different features of music, specifically the amplitude envelope, its derivative, the beats and the spectral flux (which describes how fast spectral changes are) and its dependence on the tempo of the music and self-reports of enjoyment, familiarity and ease of beat perception.

      They use and compare analysis approaches typically used when working with naturalistic stimuli: temporal response functions (TRFs) or reliable components analysis (RCA) to correlate the stimulus with its neural response (in this case, the EEG). The spectral flux seems the best music descriptor among the tested ones with both analyses. They find a stronger neural response to stimuli with slower beat rates and predictable stimuli, namely familiar music with an easy-to-perceive beat. Interestingly, the analysis does not show a statistically significant difference between musicians and non-musicians.

      The authors provide an extensive analysis of the data, but some aspects need to be clarified and extended.

      We thank the Reviewer for taking the time to evaluate and summarize our manuscript and for the great comments. We addressed the concerns and made changes throughout the manuscript, but especially in the introduction and discussion sections about the terminology (neural entrainment and neural measures), musical features of the stimuli, and musical experience of the participants. Below you can find the alterations described in more detail. The page and line numbers correspond to the manuscript file without track changes.

      1) It would be helpful to clarify better the concepts of neural entrainment, synchronization and neural tracking and their meaning in this specific context. Those terms are often used interchangeably, and it can be hard for the reader to follow the rest of the paper if they are not explicitly defined and related to each other in the introduction. Note that this is fundamental to understanding the primary goal of the paper. The authors clarify this point only at the end of the discussion (lines 570-576). I suggest moving this part to the introduction. Still, it is unclear why the authors use the TRF model and then say they want to be agnostic about the physiological mechanisms underlying entrainment. The choice of the TRF (as well as the stimulus representation) automatically implies a hypothesis about a physiological mechanism, i.e., the EEG reflects convolution of the stimulus properties with an impulse response. Please could you clarify this point? I might have missed it.

      Thank you for this valuable comment. We agree that it is fundamental to define and uniformly use terminology, and have made changes throughout the manuscript along these lines. First of all, we have changed all instances of “neural entrainment” or “neural tracking” to “neural synchronization”, as we think this term avoids evoking a specific theoretical background or strong mechanistic assumptions. Second, we have moved the Discussion paragraph you mention to the Introduction and expanded it. Specifically, we take the opportunity to address the association between specific analysis approaches (TRFs vs. stimulus–response correlation or coherence) and specific mechanistic assumptions (convolution of stimulus properties with an impulse response vs. entrainment of an ongoing oscillation, respectively). This allowed us to clarify what we mean when we say we prefer to stay agnostic to specific mechanistic interpretations. We are happy to have had the chance to strengthen this discussion, and think it benefits the manuscript a lot.

      We reproduce the new Introduction paragraph here for your convenience.

      p. 5-6, l. 101-123: “The current study investigated neural synchronization to natural music by using two different analysis approaches: Reliable Components Analysis (RCA) (Kaneshiro et al., 2020) and temporal response functions (TRFs) (Di Liberto et al., 2020). A theoretically important distinction here is whether neural synchronization observed using these techniques reflects phase-locked, unidirectional coupling between a stimulus rhythm and activity generated by a neural oscillator (Lakatos et al., 2019) versus the convolution of a stimulus with the neural activity evoked by that stimulus (Zuk et al., 2021). TRF analyses involve modeling neural activity as a linear convolution between a stimulus and relatively broad-band neural activity (e.g., 1–15 Hz or 1–30 Hz; Crosse et al., 2016, Crosse et al., 2021); as such, there is a natural tendency for papers applying TRFs to interpret neural synchronization through the lens of convolution (though there are plenty of exceptions to this, e.g., Crosse et al., 2015, Di Liberto et al., 2015). RCA-based analyses usually calculate correlation or coherence between a stimulus and relatively narrow-band activity, and in turn interpret neural synchronization as reflecting entrainment of a narrow-band neural oscillation to a stimulus rhythm (Doelling and Poeppel, 2015, Assaneo et al., 2019). Ultimately, understanding under what circumstances and using what techniques the neural synchronization we observe arises from either of these physiological mechanisms is an important scientific question (Doelling et al., 2019, Doelling and Assaneo, 2021, van Bree et al., 2022). However, doing so is not within the scope of the present study, and we prefer to remain agnostic to the potential generator of synchronized neural activity. Here, we refer to and discuss “entrainment in the broad sense” (Obleser and Kayser, 2019) without making assumptions about how neural synchronization arises, and we will moreover show that these two classes of analysis techniques strongly agree with each other.”

      2) Interestingly, the neural response to music seems stronger for familiar music. Can the authors clarify how this is not in contrast with previous works that show that violated expectations evoke stronger neural responses ([Di Liberto et al., 2020] using TRFs and [Kaneshiro et al., 2020] using RCA)? [Di Liberto et al., 2020] showed that the neural response of musicians is stronger than that of non-musicians as they have a stronger expectation (see point 2). However, in the present manuscript, the analysis does not show a statistically significant difference between musicians and non-musicians. The authors state that they had different degrees of musical training in their dataset, and therefore it is hard to see a clear difference. Still, in the "Materials and Methods" section, they divided the participants into these two groups, confusing the reader.

      Our findings are consistent with previous studies showing stronger inter-subject correlation in response to music in a familiar style vs. music in an unfamiliar style (Madsen et al., 2019) and stronger phase coherence in response to familiar relative to unfamiliar sung utterances (Vanden Bosch der Nederlanden et al., 2022). We actually don’t think our results (stronger neural synchronization for familiar music) or these previous results are incompatible with work showing that violations of expectations evoke stronger neural responses. This work either manipulated music so it violated expectations (Kaneshiro et al., 2020) or explicitly modeled “surprisal” as a feature (Di Liberto et al., 2020). Thus, we could think of those stronger neural responses to expectancy violations as reflecting something like “prediction error”. Our music stimuli did not contain any violations, and we were unable to model responses to surprisal given the nature of our music stimuli, as we better explain below (p. 27 l. 514-529). Thus, neural synchronization was stronger to familiar music, and we would argue that listeners were able to form stronger expectations about music they already knew. We would predict that expectancy violations in familiar music would evoke stronger neural responses than those in unfamiliar music, though we did not test that here. We now include a paragraph in the Discussion reconciling our findings with the papers you have cited.

      p. 27 l. 514-529: “We found that the strength of neural synchronization depended on the familiarity of music and the ease with which a beat could be perceived (Figure 5). This is in line with previous studies showing stronger neural synchronization to familiar music (Madsen et al., 2019) and familiar sung utterances (Vanden Bosch der Nederlanden et al., 2022). Moreover, stronger synchronization for musicians than for nonmusicians has been interpreted as reflecting musicians’ stronger expectations about musical structure. On the surface, these findings might appear to contradict work showing stronger responses to music that violated expectations in some way (Kaneshiro et al., 2020, Di Liberto et al., 2020). However, we believe these findings are compatible: familiar music would give rise to stronger expectations and stronger neural synchronization, and stronger expectations would give rise to stronger “prediction error” when violated. In the current study, the musical stimuli never contained violations of any expectations, and so we observed stronger neural synchronization to familiar compared to unfamiliar music. There was also higher neural synchronization to music with subjectively “easy-to-tap-to” beats. Overall, we interpret our results as indicating that stronger neural synchronization is evoked in response to music that is more predictable: familiar music and with easy-to-track beat structure.”

      Your other question was why we did not see effects of musical training / sophistication on neural synchronization to music, when other studies have. There are a few possible reasons for this. One is that previous studies aiming to explicitly test the effects of musical training recruited either professional musicians or individuals with a high degree of musical training for their “musician” sample. In contrast, we did not target individuals with any degree of musical training, but attempted this analysis in a post-hoc way. For this reason, our musicians and nonmusicians were not as different from each other in terms of musical training as in previous work. Given this, we have opted to remove the artificial split into musician and nonmusician groups, and now only include a correlation with musical sophistication (as you suggest in your next comment), which was also nonsignificant (Figure 5 – figure supplement 2).

      3) Musical expertise was also assessed using the Goldsmith Music Sophistication Index, which could be an alternative to the two-group comparison between musicians and non-musicians. Does this mean that in Figure 5, we should see a regression line (the higher the Gold-MSI, the higher should be the TRF correlation)? Since we do not see any significant effect, might this be due to the choice of the audio descriptor? The spectral flux is not a high-level descriptor; maybe it is worth testing some high-level descriptors such as entropy and surprise. The choice of the stimulus features defines linear models such as the TRF as they determine the hierarchical level of auditory processing, and for testing the musical expertise, we might need more than acoustic features. The authors should elaborate more on this point.

      It is true that the Goldsmith Music Sophistication Index serves as an alternative way of investigating the effects of musical expertise on neural synchronization to natural music, and we now include this approach exclusively instead of dividing our sample (see response to the previous comment). Indeed, if musical sophistication had an effect on the TRF correlations in this study, we would see a regression line in Figure 5 – figure supplement 2. Based on our experiment, it is difficult to assess whether the lack of a correlation between neural measures and musical expertise is due to our choice of stimulus features. That is because our experiment was designed to investigate the effects of fundamental acoustic features of music, and it was not possible to calculate high-level descriptors, such as the entropy or surprisal, for the music stimuli we chose to work with – the stimuli were polyphonic, and moreover were purchased in a .wav format, so we do not have access to the individual MIDI versions or sheet music of each song that would have been necessary to apply, for example, the IDyOM (Information Dynamics of Music) model. As we cannot rule out that the (lack of) effects of varying levels of musical expertise on TRF correlations is due to our choice of stimulus features, we added this to the discussion.

      p. 28 l. 541-546: “Another potential reason for the lack of difference between musicians and non-musicians in the current study could originate from the choice of utilizing pure acoustic audio-descriptors as opposed to “higher order” musical features. However, “higher order” features such as surprise or entropy that have been shown to be influenced by musical expertise (Di Liberto et al., 2020), are difficult to compute for natural, polyphonic music.”

      4) Regarding the stimulus representation, I have a few points. The authors say that the amplitude envelope is a too limited representation for music stimuli. However, before testing the spectral flux, why not test the spectrogram as in previous studies? Moreover, the authors tested the TRF on combining all features, but it was not clear how they combined the features.

      One of the main reasons that we did not use the spectrogram as a feature was that it wouldn’t be possible to use a two-dimensional representation for the RCA-based measures, SRCorr and SRCoh, so we would not have been able to compare across analysis approaches. However, spectral flux is calculated directly from the spectrogram, and so is a useful one-dimensional measure that captures the spectro-temporal fluctuations present in the spectrogram (https://musicinformationretrieval.com/novelty_functions.html). Thank you for making this important point; we added this explanation to the Materials and Methods section (p. 35 l. 726-727).
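
      How a one-dimensional spectral flux time course can be derived from a spectrogram is sketched below. The exact definition used in the manuscript is not stated in this response, so the sketch assumes a common novelty-function variant: the half-wave-rectified frame-to-frame difference of the magnitude spectrogram, summed over frequency. The function name, frame length, and hop size are hypothetical.

```python
import numpy as np

def spectral_flux(audio, frame_len=1024, hop=512):
    """Half-wave-rectified spectral flux from a short-time magnitude spectrum."""
    audio = np.asarray(audio, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(audio) - frame_len + 1, hop)

    # Magnitude spectrogram: one row per frame, one column per frequency bin.
    spec = np.array([np.abs(np.fft.rfft(window * audio[s:s + frame_len]))
                     for s in starts])

    # Frame-to-frame change; keep only increases in energy (onset-like events),
    # then collapse across frequency to obtain a single time course.
    diff = np.diff(spec, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)

# Toy example: a tone that jumps from 440 Hz to 880 Hz after one second.
fs = 8000
t = np.arange(fs) / fs
audio = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t)])
flux = spectral_flux(audio)
peak_frame = int(np.argmax(flux))  # frame transition with the largest spectral change
```

      Because the amplitude of the toy signal never changes, an amplitude envelope would stay flat across the pitch jump, whereas the flux peaks there; this is the kind of spectro-temporal information the feature is meant to capture.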

      Sorry for not explaining the multivariate TRF approach better. Instead of using only one stimulus feature, e.g., the amplitude envelope, several stimulus features can be concatenated into a matrix (with the dimensions: time T × 4 musical features M at different time lags), which is then used as an input for the mTRFcrossval, mTRFtrain and mTRFpredict functions of the mTRF Matlab Toolbox (Crosse et al., 2016) – actually this is exactly how using a 2D feature like the spectrogram would work. The multivariate TRF is calculated by extending the stimulus lag matrix (time course of one musical feature at different time lags, T × τwindow) by an additional dimension (time course of several musical features at different time lags, T × M × τwindow). We added an explanation to the Methods section of the manuscript and hope that it is now easier to understand:

      p. 39 l. 840-842: “For the multivariate TRF approach, the stimulus features were combined by replacing the single time-lag vector by several time-lag vectors for every musical feature (Time x 4 musical features at different time lags).”
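
      The construction of a multivariate lag matrix can be sketched as follows. This is a simplified illustration and not the mTRF Toolbox implementation: `lag_matrix` and `fit_trf` are hypothetical names, lags are expressed in samples, edges are zero-padded, and a plain ridge regression with an arbitrary regularization value stands in for the toolbox's cross-validated fitting.

```python
import numpy as np

def lag_matrix(features, lags):
    """Stack time-lagged copies of a (T, M) feature matrix into (T, M * n_lags)."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]  # univariate case: a single feature column
    T, M = features.shape
    X = np.zeros((T, M * len(lags)))
    for j, lag in enumerate(lags):
        shifted = np.zeros_like(features)
        if lag > 0:
            shifted[lag:] = features[:-lag]   # stimulus precedes the response
        elif lag < 0:
            shifted[:lag] = features[-lag:]
        else:
            shifted[:] = features
        X[:, j * M:(j + 1) * M] = shifted     # one block of M columns per lag
    return X

def fit_trf(features, eeg, lags, ridge=1.0):
    """Ridge-regression TRF; returns weights with shape (n_lags, n_features)."""
    X = lag_matrix(features, lags)
    XtX = X.T @ X
    w = np.linalg.solve(XtX + ridge * np.eye(XtX.shape[0]), X.T @ eeg)
    return w.reshape(len(lags), -1)

# Toy example: four features, of which two drive the simulated response.
rng = np.random.default_rng(0)
T, lags = 5000, range(10)
feats = rng.standard_normal((T, 4))
eeg = 0.1 * rng.standard_normal(T)
eeg[3:] += 1.0 * feats[:-3, 0]    # feature 0 acts at a lag of 3 samples
eeg[5:] += -0.5 * feats[:-5, 2]   # feature 2 acts at a lag of 5 samples
weights = fit_trf(feats, eeg, lags, ridge=1e-3)
```

      The univariate case is the same code with a single feature column, which is why moving from one feature to four (or to a full spectrogram) only widens the design matrix.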

      Reviewer #3 (Public Review):

      Subjects listened to various excerpts from music recordings that were designed to cover musical tempi ranging from 1-4 Hz, and EEG was recorded as subjects listened to these excerpts. The main and novel findings of the study were: 1) spectral flux, measuring sudden changes in frequency, was tracked better in the EEG than other measures of fluctuations in amplitude, 2) neural tracking seemed to be best for the slowest tempi, 3) measures of neural tracking were higher when subjects rated an excerpt as high for ease-of-tapping and familiarity, and 4) their measure of the mapping between stimulus feature and response could predict whether a subject tapped at the expected tempo or at 2x the expected tempo after listening to the musical excerpt.

      One of the key strengths of this study is the use of novel methodologies. The authors in this study used natural and digitally manipulated music covering a wide range of tempi, which is unique to studies of musical beat tracking. They also included both measures of stimulus-response correlation and phase coherence along with a method of linear modeling (the temporal response function, or TRF) in order to quantify the strength of tracking, showing that they produce correlated results. Lastly, and perhaps most importantly, they also had subjects tap along with the music after listening to the full excerpt. While having a measure of tapping rate itself is not new, combined with their other measures they were able to demonstrate that neural data predicted the hierarchical level of tapping rate, opening up opportunities to study the relationship between neural tracking, musical features, and a subject's inferred metrical level of the musical beat.

      Additionally, the finding that spectral flux produced the best correlations with the EEG data is an important one. Many studies have focused primarily on the envelope (amplitude fluctuations) when quantifying neural tracking of continuous sounds, but this study shows that, for music at least, spectral flux may add information that is tracked by the EEG. However, given that it is also highly correlated with the envelope, what additional features spectral flux contributes to measuring EEG tracking is not clear from the current results and worth further study.

      All four of their main findings are important for research into the neural coding of musical rhythm. I have some concerns, however, that two of these findings could be a consequence of the methods used, and one could be explained by related correlations to acoustic features:

      We thank the Reviewer for the very helpful review, the summary, and the great suggestions. We addressed the comments and performed additional analysis. We made changes throughout the manuscript, but especially 1) concerning the potential advantage of the neural response to slower music, 2) the effects of the amount of tempo manipulation on neural synchronization, 3) the SVM-related analysis and 4) the relation between stimulus features and behavioral ratings. The implemented modifications can be found below in more detail. The page and line numbers correspond to the manuscript file without track changes.

      The authors found that their measures of neural tracking were highest for the lowest musical tempos. This is interesting, but it is also possible that this is a consequence of lower frequencies producing a large spread of correlations. Imagine two signals that are fluctuating in time with a similar pattern of fluctuation. When they are correctly-aligned they are correlated with each other, but if you shift one of the signals in time those fluctuations are mismatched and you can end up with zero or negative correlations. Now imagine making those fluctuations much slower. If you use the same time shifts as before, the signals will still be fairly correlated, because the rates of signal change are much longer. As a result, the span of null correlations also increases. This can be corrected by normalizing the true correlations and prediction accuracies with a null distribution at each tempo. But with this in mind, it is hard to conclude if the greater correlations found for lower musical tempos in their current form are a true effect.

      Thank you for this great suggestion. We followed your lead (Zuk et al., 2021), and normalized all measures of neural synchronization (TRF correlation, SRCorr, SRCoh) relative to a surrogate distribution. The surrogate distribution was calculated by randomly and circularly shifting the neural data relative to the musical features for each of 50 iterations. This was done separately for every musical feature and stimulation tempo condition (Figures 2 and 3). After normalization, the results look qualitatively similar and the main results – spectral flux and slow stimulation tempi resulting in highest levels of neural synchronization – persist.

      The changes in the manuscript based on your comment (and the comment of Reviewer 1) can be found throughout the manuscript, but especially on p. 11, l. 210-218, Figures 2-3 and a more detailed explanation in the Methods section:

      p. 39, l. 821-829: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”
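
      The surrogate normalization described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: a plain Pearson correlation stands in for SRCorr, the function name `circular_shift_zscore` and the `min_shift` guard are assumptions, and only the 50 iterations and the circular shift of the neural data are taken from the response.

```python
import numpy as np

def circular_shift_zscore(neural, feature, n_iter=50, min_shift=1, seed=0):
    """Z-score an observed correlation against a circular-shift surrogate distribution.

    neural, feature: 1-D arrays of equal length (hypothetical stand-ins for one
    EEG component and one musical feature time course).
    """
    neural = np.asarray(neural, dtype=float)
    feature = np.asarray(feature, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(neural)

    # Observed synchronization measure (Pearson correlation as a stand-in).
    observed = np.corrcoef(neural, feature)[0, 1]

    surrogate = np.empty(n_iter)
    for i in range(n_iter):
        # Circularly shift the neural data by a random lag; both time courses
        # stay intact, only their alignment is destroyed.
        shift = rng.integers(min_shift, n - min_shift)
        surrogate[i] = np.corrcoef(np.roll(neural, shift), feature)[0, 1]

    # Subtract the surrogate mean and divide by the surrogate standard deviation.
    z = (observed - surrogate.mean()) / surrogate.std(ddof=1)
    return observed, z, surrogate
```

      Because slowly fluctuating signals yield a wider surrogate distribution, dividing by its standard deviation penalizes exactly the low-frequency inflation the reviewer describes.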

      If the strength of neural tracking at low tempos is a true effect, it is worth noting that the original tempi for the music clips span 1 - 2.5 Hz (Supplementary Table 1), roughly the range of tempi exhibiting the largest prediction accuracies and correlations. All tempos above this range are produced by digitally manipulating the music. It is possible that the neural tracking measures are higher for music without any digital manipulations rather than reflecting the strength of tracking at various tempi. This could also be related to the authors' finding that neural tracking was better for more familiar excerpts. This alternative interpretation should be acknowledged and mentioned in the discussion.

      Thank you for these important suggestions (see also comment #2 (part 2) from Reviewer 1). First, it is important to say that all music stimuli were tempo manipulated: even if the tempo of an original music segment was e.g. 2 Hz and the same song was presented at 2 Hz, it was still converted via the MAX patch to 2 Hz again (to make it comparable to the other musical stimuli). Second, it is true that we cannot fully exclude the possibility that the amount of tempo manipulation could have an effect on neural synchronization to music, meaning that less tempo-manipulated music segments (i.e. a stimulation tempo close to the original tempo) could result in higher neural synchronization. However, we have now conducted an additional analysis to address this as best we could.

      We compared TRF correlations for a) songs that were shifted very little relative to their original tempi to b) songs that were shifted a lot relative to their original tempi. We did not have enough song stimuli to do this for every stimulation tempo, but we were able to do the TRF correlation comparison for two illustrative stimulation tempo conditions (at 2.25 Hz and 1.5 Hz). In those tempo conditions, we took the TRF correlations for up to three trials per participant when the original tempo was around the manipulation tempo (1.25-1.6 Hz for 1.5 Hz or 2.01-2.35 Hz for 2.25 Hz) and compared them to those trials where the original tempo was around 0.75–1 Hz faster or slower than the manipulated tempo at which the participants heard the songs (Figure 3 – figure supplement 2). This analysis revealed that there was no significant effect of the original music tempi on the neural response (please see Material and Methods, p. 40, l. 855-861 and Results p. 13, l. 265-273). In response to your and Reviewer 1’s comments, we also added it to the discussion.

      p. 23-24 l. 427-436: “The tempo range within which we observed strongest synchronization partially coincides with the original tempi of the music stimuli (Figure 1 – figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3 – figure supplement 3), and thus are not able to rule out the influence of tempo manipulation on other tempo conditions.”

      We also provide more information to the reader about the amount of tempo shift that each stimulus underwent. We added two plots to the manuscript that show 1) the distribution of original tempi of the music stimuli and 2) the distribution of the amount of tempo manipulation across all stimuli (Figure 1 – figure supplement 2).

      Their last finding regarding predicting tapping rates is novel and important, and the model they use to make those predictions does well. But I am concerned by how well it performs (Figure 6), since it is not clear what features of the TRF are being used to produce this discrimination. Are the effects producing discriminable tapping rates and stimulation tempi apparent in the TRF? I noticed, though, that these results came from two stages of modeling: TRFs were first fit to groups of excerpts with different tapping rates or stimulation tempo separately, then a support vector machine (SVM) was used to discriminate between the two groups. So, another way to think about this pipeline is that two response models (TRFs) were generated for the separate groups, and the SVM finds a way of differentiating between them. There is no indication about what features of the TRFs the SVM is using, and it is possible this is overfitting. Firstly, I think it needs to be clearer how the TRFs are being computed from individual trials. Secondly, the authors construct surrogate data by shuffling labels (before training) but it is not clear at which training stage this is performed. They can correct for possible issues of overfitting by comparing to surrogate data where shuffling happens before the TRF computation, if this wasn't done already.

      Thank you for noticing this important point. You are absolutely right: when re-analyzing that part of the results based on your comment, we noticed that we had an error in our understanding of the analysis pipeline. Indeed, we first calculated two TRF models for the separate groups (e.g. stimulation tempo = tapping tempo vs. stimulation tempo = 2 × tapping tempo) based on all trials of each group apart from the left-out trial. Next, the resulting TRFs were fed into the SVM, which was used to predict the group. The shuffling of the surrogate data occurred at the SVM training step.

      Based on your comment, we tried several approaches to solve this problem. First, we calculated TRFs on a single-trial basis (instead of using the two-group TRFs as before, only one trial was used to calculate the TRFs) and submitted the resulting TRFs to the SVM. The resulting SVM accuracy was compared to a “surrogate SVM accuracy”, which was calculated by shuffling the labels when training the SVM classifier. Second, we shuffled, as you suggest, the labels not at the SVM training step, but instead prior to the TRF calculation. This way we could compare our “original” SVM accuracies (based on the two-group TRFs) to a fairer surrogate dataset. However, in both cases the resulting SVM accuracies were no better than those obtained with the surrogate data. Therefore, we felt that it was fairest to remove this part from the manuscript. We are aware that this was one of the main results of the paper and we are sorry that we had to remove it. However, we feel that our paper is still strong and offers a variety of different results that are important for the auditory neuroscience community.
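
      The point about where the labels are shuffled can be illustrated with a small sketch. It is not the TRF/SVM pipeline from the manuscript: group-mean templates stand in for the group-level TRFs, a nearest-template rule stands in for the SVM, and all names (`loo_accuracy`, `permutation_null`) are hypothetical. What it shows is that the labels are permuted before the first modeling stage, so every stage is refit on each surrogate.

```python
import numpy as np

def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a two-stage pipeline (templates, then classifier)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = len(labels)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # Stage 1: one template per group, fit without the left-out trial.
        templates = [features[train & (labels == c)].mean(axis=0) for c in classes]
        # Stage 2: assign the left-out trial to the closest template.
        dists = [np.linalg.norm(features[i] - t) for t in templates]
        correct += int(classes[int(np.argmin(dists))] == labels[i])
    return correct / n

def permutation_null(features, labels, n_perm=200, seed=0):
    """Null distribution built by shuffling labels BEFORE stage 1 is fit."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = loo_accuracy(features, labels)
    null = np.array([loo_accuracy(features, rng.permutation(labels))
                     for _ in range(n_perm)])
    # One-sided permutation p-value with the usual +1 correction.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p
```

      Shuffling only at the second stage would leave the group structure baked into the stage-1 models, which is why a null built that way can make accuracies look better than they are.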

      Lastly, they show that their measures of neural tracking are larger for music with high familiarity and high ease-of-tapping. I expect these qualitative ratings could be a consequence of acoustic features that produce better EEG correlations and prediction accuracies, especially ease-of-tapping. For example, music with acoustically salient events is probably easier to tap to and would produce better EEG correlations and prediction accuracies, which would explain why ease-of-tapping is correlated with the measures of neural tracking. To understand this better, it would be useful to see how the stimulus features correlate with each of these behavioral ratings.

      We agree that our rating-based results could be influenced by acoustic stimulus features (at least for ease of tapping; it is not clear to us why familiarity would be related to acoustics). As it is difficult to correlate stimulus features (time-domain, and one time course per song) with behavioral ratings (one single value per song per participant), we conducted a frequency-domain analysis on the musical features to arrive at a single value quantifying the strength of spectral flux at the stimulation frequency and its first harmonic. We calculated single-trial FFTs on the spectral flux (which was used for the main Figure 5) for the 15 highest- and 15 lowest-rated trials per behavioral category (enjoyment, familiarity, ease to tap the beat) and participant. We compared the z-scored FFT peaks at the stimulation tempo and first harmonic for the top- and bottom-rated stimuli. We did observe significant acoustic differences between top- and bottom-rated stimuli in each category, but the differences were not in the direction that would be expected based on acoustically more salient events leading to better TRF correlations, with the exception of ease of tapping. Easy-to-tap music did indeed have stronger spectral flux than difficult-to-tap music, which is intuitive. However, spectral flux was stronger for more enjoyed music (we did not see any significant differences between TRF correlations of more vs. less enjoyed music; Figure 5C) and for less familiar music (this is the opposite of what we saw for the TRF measures). Overall, given the inconsistent relationship between acoustics, behavioral ratings, and TRF measures, we would argue that acoustic features alone cannot explain our results (Figure 5 – figure supplement 1, p. 21 l. 381 – 387).
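
      A rough sketch of how a single value per trial can be read off the spectrum of a feature time course is given below. Several details are assumptions rather than the authors' procedure: the Hann window, z-scoring across all frequency bins, and taking "first harmonic" to mean twice the stimulation tempo. The function name `zscored_fft_peaks` is hypothetical.

```python
import numpy as np

def zscored_fft_peaks(feature, fs, tempo_hz):
    """Z-scored FFT amplitude at the stimulation tempo and at 2 * tempo.

    feature:  1-D time course (e.g. spectral flux of one trial)
    fs:       sampling rate of the time course in Hz
    tempo_hz: stimulation tempo in Hz
    """
    x = np.asarray(feature, dtype=float)
    x = x - x.mean()                            # remove the DC offset
    amp = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    # Express each bin relative to the whole amplitude spectrum.
    z = (amp - amp.mean()) / amp.std()

    # Pick the bins closest to the tempo and to twice the tempo.
    idx_tempo = int(np.argmin(np.abs(freqs - tempo_hz)))
    idx_harmonic = int(np.argmin(np.abs(freqs - 2.0 * tempo_hz)))
    return z[idx_tempo], z[idx_harmonic]
```

      The two returned values can then be compared between the top- and bottom-rated trials of each behavioral category.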

    1. Author Response

      Reviewer #2 (Public Review):

      Summary

      The research paper presents a modeling approach aimed at disentangling mothers' genetic effects on their offspring into two components: prenatal environment and postnatal environment. Specifically, the authors use SEM on adopted and non-adopted individuals from the UK Biobank and leverage the variation in genetic similarities from different family structures. Because the UK Biobank was not created as an adoption study, they build seven different family structures to include all possible family combinations that can provide information regarding the two parameters of interest: those representing prenatal and postnatal environment respectively. The model is applied to two phenotypes (birthweight and educational attainment) to illustrate it.

      The results indicate an 'expected pattern of maternal genetic effect on offspring birthweight' and 'unexpectedly large prenatal (intrauterine) maternal genetic effects on offspring education attainment'. The authors mention this result can likely be explained by adopted offspring being raised by biological relatives. They then show simulations supporting this hypothesis.

      We praise the authors for the complex analyses executed and the work done to create the model and make the scripts available to the research community. The models can be a valuable addition to the behavior genetics literature and to researchers' toolkits. We do however have a few concerns regarding 1. the meaning of the results, 2. model building decisions and the choice of sample and 3. the way some limitations are addressed. We go into more details for each of these points.

      1) Interest to study mothers' genetic effects as acting via the prenatal environment or the postnatal environment and the meaning of the parameters tested by the model.

      I think this is an interesting question and a useful distinction for a number of phenotypes and the authors use the adoption design in an innovative way to define and estimate parameters that correspond to this distinction. However, I would suggest that the expressions of prenatal environmental effect and postnatal environmental effect (as distinct pathways for mother's gene to be expressed) seem to be an overstatement.

      The definition of maternal genetic effects (effects of the mother's genotype on the child's phenotype, over and above any genetic transmission) cites Wolf & Wade 2009 (line 56), who describe the more general notion of 'maternal effects', defined as effects of the mother's genotype, phenotype (or both) on her offspring. I would argue that postnatal maternal genetic effects (as currently defined in the paper) are likely environmental effects and not only 'genetic effects'. These environmental effects are indeed partly influenced by the mother's genes, but also strongly affected by other variables such as culture, generation, SES, education. It is not possible to disentangle these effects in the design(s) used here.

      Although we have referred to the maternal effects estimated in our manuscript as “prenatal maternal genetic effects” and “postnatal maternal genetic effects”, all of these effects on the offspring are mediated through maternal phenotypes (which, as the reviewer correctly notes, will be influenced by both genes and the environment). In other words, the maternal PRS used in our study proxies some maternal phenotype/s that then form part of the offspring’s prenatal and/or postnatal environment, which then affects the offspring’s phenotype. We have referred to these effects as maternal genetic effects rather than just maternal effects to emphasize the causal link with the maternal genotype and the fact that we are only proxying that part of the maternal phenotype that is explained by the relevant genetic variation (NB. This is consistent with the Wolf & Wade 2009 definition of maternal effects i.e. “…the causal influence of maternal genotypes on offspring phenotypes…”). We agree with the reviewer that our model is not attempting to disentangle proportions of variance due to genetic and environmental factors (which is not its purpose).

      This consideration can affect the authors definition of the covariance between an adopted individual's genotype and phenotype as a function of prenatal (but not postnatal) maternal genetic effects (line 93-94). The authors current assumption does not consider the potential for environmental modulation of the effect of adopted mothers' genes (which are not zero for several phenotypes). Postnatal maternal genetic effects are thus also likely to capture and represent environmental differences.

      Assuming that adopted offspring are not biologically related to their adoptive mothers, then adopted individuals’ PRS should not be correlated with adoptive mothers’ PRS. The corollary is that adoptive mothers’ PRS should not influence the covariance between adopted individuals’ PRS and phenotype (i.e. regardless of whether there is environmental modulation of the effect of adopted mothers’ genes on offspring phenotype). It is true, however, that we do not consider genotype by environment interaction effects in our model, and that this is a limitation of our model. We allude to this important point several times in the Discussion:

      “Those assumptions explicitly encoded in Figure 1 include that the total maternal genetic effect can be decomposed into the sum of prenatal and postnatal components, that genetic effects are homogenous across biological and adoptive families, the absence of genotype x environment interaction…”

      And

      “In contrast, in our design it is more important that genetic effect sizes are homogenous across adopted and non-adopted individuals (i.e. no genotype by environment interaction)…”.

      At the request of the reviewer, we now include additional discussion of GxE and other assumptions of our model in further detail in Supplementary File 17.

      2) Model building decisions specific to the UK Biobank. One of the main issues is that the method is tested on a sample that is not built as an adoption design. This forced the authors to make decisions to circumvent this problem and led to important limitations that are not inherent to their method, but to the specific sample they applied it to.

      a) Having adoptive parents partly genetically related to the child is breaking the logic of the adopted design. Thus, it brings back the genetic confound (passive gene-environment correlation) problem of usual family-based design. In their case, it alters their ability to differentiate between prenatal and postnatal environment.

      We agree that the UK Biobank was never designed for this purpose, and that its adoption data are less than perfect. Nevertheless, we think that an important conclusion of our paper is that large-scale biobanks (which, because of their size, contain many hundreds/thousands of adopted individuals) can be used to partition maternal genetic effects into prenatal and postnatal components, provided good quality data on the adoption process and/or genetic information on the adoptive parents has been gathered.

      To help address the reviewer’s concerns we have created a Supplementary Table (Supplementary File 17) that summarizes some of the main limitations/assumptions of our model, whether they are specific to the UK Biobank dataset or intrinsic to our method, their consequences on model parameters, and possible options for addressing them.

      b) In section starting on line 426, the authors have included simulations to show how this issue could be addressed. However, it does not help the fact that in their model applied to the UK biobank, the information regarding the degree of genetic similarity between adopting parents and biological parents and the child is unknown.

      We agree, but we feel it is important to demonstrate (a) that cryptic biological relatedness between adopted individuals and their adoptive parents is a potential issue not only for our study, but for other studies attempting to utilize this information in the UK Biobank, and (b) that cryptic relatedness can be dealt with effectively through appropriate modelling in our SEM framework (i.e. even if it is not possible with the current data from UK Biobank). The corollary is that we recommend that the UK Biobank (and other large-scale biobanks) attempt to acquire information on adopted individuals and their parents through e.g. questionnaire.

      c) To address this problem in their analyses of the UK Biobank, the authors used (Lines 302 & 417) information regarding whether children were breastfed or not (on the basis that this knowledge would be more common if the child was raised by a biological family relative) to identify adopted singletons raised by biological relatives. However, this is, at best, a mediocre index of genetic relatedness. I can see other reasons for participants to know whether they were breastfed: because they were adopted at an older age, or because they are still (or have been) in contact with their biological mother. It is also possible, albeit rare, that adoptive parents may breastfeed a child via the use of drugs to stimulate milk production. Line 420: the fact that the prenatal maternal estimate became non-significant after removing participants that were breastfed does provide results more in line with what would be expected. But we can't use expected results as a basis to evaluate the validity of the approach. The absence of GxE and rGE are two other strong assumptions of the model whose violation could also produce this kind of unexpected result.

      We agree that (a) the inclusion of adopted individuals whose adoptive parents are biologically related to them is only one possible reason for unexpectedly strong prenatal maternal genetic effect estimates, and (b) attempting to remove these individuals from the analysis using a proxy like breastfeeding information is less than perfect. As indicated above, we now discuss in detail alternative explanations for our results including violations of assumptions regarding the absence of GxE and rGE, and other explanations (assortative mating, stratification etc) (see new text in the Discussion and Supplementary File 17).

      d) I would suggest discussing the issue of genetic relatedness between adopting parents and offspring in terms of passive rGE which is a common problem for the estimation of parental effects in every familial design.

      We now include mention of passive rGE in the Discussion:

      “Rather we hypothesize it is possible that our model could have been misspecified in that substantial numbers of adopted individuals in the UK Biobank may have in fact been raised by their biological relatives. This can be thought of as (unintentional) reintroduction of passive gene-environment correlation into the study. In other words, adopted children are brought up by their genetic relatives, who in turn provide the environment in which they are raised. This induces a correlation between adopted individuals’ PRS and their environment.”

      e) Line 291: why use an unweighted PRS for EY3 (Lee, 2018), while the usual way of computing PRS (as a weighted sum of risk alleles) was used for birthweight?

      We thank the reviewer for pointing this inconsistency out. We have now rerun the analyses using weighted and unweighted PRS for both birth weight and educational attainment. The reason for running both sets of analyses is that the GWAS used to select the SNPs (i.e. on which the weights are based) contains UK Biobank individuals. This may inflate the overall strength of association between the PRS and outcome through winner’s curse (although not differentially between individuals from adoptive and biological families). In contrast, unweighted scores should be much more robust to this inflation, and so are a useful sanity check on the results.
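
      For readers unfamiliar with the distinction, the two kinds of score can be sketched as below. This is a generic illustration under stated assumptions, not the scoring code used in the study: dosages are counts (0, 1 or 2) of the GWAS effect allele, the unweighted score is taken to be the count of trait-increasing alleles, and the function name `polygenic_scores` is hypothetical.

```python
import numpy as np

def polygenic_scores(dosages, betas):
    """Weighted and unweighted polygenic scores.

    dosages: (individuals x SNPs) array of effect-allele counts (0, 1 or 2)
    betas:   per-SNP GWAS effect sizes for the effect allele
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)

    # Weighted score: each allele count is multiplied by its GWAS effect size.
    weighted = dosages @ betas

    # Unweighted score: count trait-increasing alleles only. SNPs with a
    # negative beta are flipped so that the counted allele raises the trait.
    aligned = np.where(betas < 0, 2.0 - dosages, dosages)
    unweighted = aligned.sum(axis=1)
    return weighted, unweighted
```

      The unweighted score uses only the sign of each effect, so overestimated effect sizes in the discovery GWAS cannot enter through the weights.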

      3) Limitations

      As our Discussion is already very long, we have created a Supplementary Table (Supplementary File 17) that summarizes some of the main limitations/assumptions of our model, their consequences on model parameters, and possible options for addressing them. We also discuss specific concerns raised by the referee below.

      Assess other limitations of their method.

      a) limitation of the availability of birth father information,

      Our model does not require information on adopted individuals’ birth fathers (although it does require PRS on non-adopted individuals’ birth fathers, which is typically readily available). It does, however, make the assumption that fathers do not contribute prenatally to offspring traits, which we think is a reasonable assumption for the majority of offspring phenotypes. If PRS for adopted individuals’ biological fathers were available, then prenatal paternal genetic effects could be estimated as part of the model. To accommodate the reviewer’s request, we have included and discussed this limitation/assumption in more detail in Supplementary File 17.

      b) prenatal events uncorrelated with birthmother's genes (disease or accidents),

      We agree that our model assumes that maternal genotype is uncorrelated with prenatal environmental factors. We now discuss this assumption/limitation further in Supplementary File 17.

      c) Inferring prenatal environment effect from higher birth mother correlation compared to birthfather is subject to bias from measurement differences between the two (Loehlin, 2016).

      Whilst this is a limitation of adoption designs that estimate prenatal effects using the difference between maternal and paternal correlations with offspring phenotypes, this is not actually a limitation of our model. In our model we do not use (phenotypic) mother-child and father-child correlations (we use PRS-phenotype correlations). Also, in our model, information on the size of the prenatal (and postnatal) maternal genetic effects primarily comes from the difference between the PRS-phenotype covariance in adopted singletons compared to the PRS-phenotype covariance in non-adopted individuals (i.e. not from the difference between maternal and paternal correlations with offspring phenotypes). We state this in the Introduction and Methods e.g.:

      “Thus, the difference between the genotype-phenotype covariance in adopted and non-adopted singleton individuals provides important information on the likely size of postnatal genetic effects.”

      It is also worth noting, that in our model, the size of the paternal PRS-offspring association does not factor into the estimation of maternal genetic effects (nor does the difference between the maternal PRS-offspring phenotype association and the paternal PRS-offspring phenotype association). Also, our model takes into account if there are differences in the amount of (random) measurement error in adoptive and non-adoptive families.

      d) age at which the child is adopted (if the child has been partly raised by birth parents before adoption, it would bias (raise) the estimates of prenatal effects).

      We agree and now discuss this limitation further in Supplementary File 17.

      e) evocative rGE not mentioned. It has been shown that parents partly react to children's behaviors. Thus, the estimate of maternal genetic postnatal effects could be biased (lowered) by evocative gene-environment correlation. In other words, the model also assumes no evocative gene-environment correlation.

      We agree and now discuss this limitation in Supplementary File 17 (although we note that the effect that evocative rGE will have on the SEM parameters will depend on the direction of the gene-environment correlation).

      Final thoughts

      1) I would like a better case made for why it is important to distinguish genetic effects into prenatal and postnatal effect.

      We have included the following text in the Introduction:

      “Given the increasing number of variants identified in GWAS that exhibit robust maternal genetic effects, a natural question to ask is whether these loci exert their effects on offspring phenotypes through intrauterine mechanisms, the postnatal environment, or both. Indeed, resolving maternal effects into prenatal and postnatal sources of variation could be a valuable first step in eventually elucidating the underlying mechanisms behind these associations (Armstrong-Carter et al. 2020), directing investigators to where they should focus their attention, and in the case of disease-related phenotypes, yielding potentially important information regarding the optimal timing of interventions. For example, the demonstration of maternal prenatal effects on offspring IQ/educational attainment, suggests that if the mediating factors that were responsible could be identified, then improvements in the prenatal care of mothers and their unborn babies which target these factors, could yield useful increases in offspring IQ/educational attainment.”

      2) I would suggest the authors make a clear distinction between the limits inherent to their sample (UK Biobank) and those inherent to their methodological approach. I see that the method's important usefulness is plagued by limits inherent to the sample used. At the same time, I am not aware of the availability of a big enough sample of adopted children with genotypic information available to compute PRS.

      One of the main limitations inherent to our sample (UK Biobank) is the fact that currently we cannot be certain that adopted individuals are not biologically related to their adoptive parents. As we demonstrate, this limitation could be addressed if information were gathered regarding the relationships, which at least in principle could be done relatively easily in the UK Biobank (e.g. by questionnaire, or even better, by genotyping adoptive parents where possible). The SEMs could then be adjusted to take these relationships into account. We discuss this limitation, and many others, in Supplementary File 17, and divide the table according to whether the limitation is primarily a consequence of the dataset (UK Biobank) or the method more broadly.

      We agree with the reviewer that the size of adoption studies is currently limited (e.g. Texas Adoption Project; Colorado Adoption Study etc). Nevertheless, it is likely that the number of adopted individuals available in large-scale Biobanks will increase over time, in which case models like the one espoused in this manuscript will become increasingly useful. Importantly, our method does not require adoptive families in order to partition maternal effects, merely adopted singleton individuals, and reliable information on the biological relatedness (or lack thereof) of their adoptive parents. We feel therefore that it is important that this sort of information be gathered so that the adopted individuals within these large-scale resources can be leveraged to examine interesting questions like the ones discussed in our manuscript.

      We have added these points to the Discussion:

      “We argue that of greater consequence for the validity of our model is that any genetic relationship between adoptive and biological parents is accurately modelled and included in the SEM. Through simulation, we have shown that the consequences of model misspecification depend upon which biological and adoptive parents are related, the nature of this relationship, and the proportion of adopted individuals in the sample who have had their relationship misspecified. Our simulations also showed that correctly modelling this relationship returns asymptotically unbiased effect estimates and correct type I error rates. Clearly, knowing these cryptic relationships in the UK Biobank would allow us to properly model them and better estimate prenatal and postnatal maternal genetic effects using this resource. We emphasize that accurately modelling these relationships does not require that actual genotypes for adoptive and/or biological parents be obtained (although this would be advantageous in terms of statistical power) as our SEM allows us to model these relationships in terms of latent variables. Indeed, as large-scale resources like the UK Biobank become more common, we expect that the number of adopted individuals who have GWAS will also increase, and consequently models like the one espoused in this manuscript will become increasingly useful. High quality phenotypic information on these adopted individuals and their adoptive parents including whether they share any biological relationship will be critical to making the most of these resources.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have provided an impressive analysis of the effects of reporting of WGS results on IPC practices in 14 hospitals in the UK during the COVID-19 pandemic. After a median of 4 weeks, hospitals adopted a practice of "rapid" or "longer" turnaround phases for WGS reporting. After a median of 8 weeks, 8 of 9 "rapid" hospitals adopted the "longer" practice for a median of 4 weeks. After a median of 4 weeks, all 5 "longer" hospitals adopted the "rapid" practice for a median of 8 weeks. Hence, there were twice as many weeks with the rapid, compared to the longer reporting practice.

      The targeted turnaround times for reporting were 48 hours for the rapid and 5-10 days for the longer phase.

      The primary outcomes of the study were: (1) incidence of IPC-defined SARS-CoV-2 HAIs per week per 100 currently admitted non-COVID-19 inpatients, and (2) for each HOCI, identification of linkage to individuals within an outbreak of SARS-CoV-2 nosocomial transmission using sequencing data as interpreted through the SRT that was not identified by pre-sequencing IPC evaluation during intervention phases.

      Secondary outcomes were: (1) incidence of IPC-defined SARS-CoV-2 hospital outbreaks per week per 100 non-COVID-19 inpatients, (2) for each HOCI, any change to IPC actions following receipt of SRT report during intervention phases, (3) any recommended change to IPC actions (regardless of whether changes were implemented). The proportion of HOCI cases for which IPC reported the SRT report to be 'useful' was added as a further outcome.

      A total of 2170 HOCIs were recorded for the study between 15 October 2020 and 26 April 2021.

      The authors conclude that "While we did not demonstrate a direct impact of sequencing on the incidence of nosocomial transmission, our results suggest that sequencing can inform IPC response to HOCIs, particularly when returned within 5 days."

      The research question is very relevant and, as said, the amount of data collected is impressive. Yet, interpretation of the data, obtained in a real-life setting with all hurdles and complexities created by the pandemic situation, is challenging. I have several questions related to data interpretation and difficulties in accepting the overall positive interpretation of the findings when it comes to feasibility and potential impact. Especially, as I consider the real-time availability of WGS results in the participating hospitals to be (much) higher than it will be in hospitals in most other countries.

      We thank the reviewer for their summary and commentary on our work.

      Specific questions

      Not clear why sites started either with rapid or longer phase. Was there a randomization process? Please clarify.

      There was no randomisation process as the ordering of intervention phases was largely driven by logistical concerns that needed to be overcome in order to be able to run the intervention during a very unpredictable period of the pandemic. We have added the following text to the second paragraph of the Methods to clarify this:

      “The order of the intervention phases was pragmatically determined in some sites by the need to first run the ‘longer-turnaround’ phase to develop sample transport and sequencing procedures before attempting the ‘rapid’ sequencing phase, and the ordering was decided in the remaining sites to ensure a mixture of intervention phases over calendar time – there was no randomisation process in deciding the order of study phases.”

      From Figures S2 it is clear that the pandemic peaked, after which the curve declined when vaccination had started, and these curves seem to resemble the incidence rates of HOCI in the hospitals.

      Yes, there was also a full national lockdown in the UK in late 2020 and early 2021, which sharply reduced community incidence rates.

      I had difficulties in interpreting S3 and S4 where I think the authors incorporated these disease dynamics occurring outside the hospital setting on the HOCI incidences. I would be helped by a better explanation of what actually was done.

      Yes, this is correct. We have now added a more detailed model specification to the Appendix (also in response to the comments of Reviewer #3). We have also added some further explanation to the Figure captions: ‘The spline curves shown are estimated simultaneously within the final analysis model, and show how these factors have independent contributions to the prediction of the incidence rate for HAIs. The associations for each covariable indicated by model parameter point estimates are shown as solid lines, with 95%CIs shown as dashed lines.’

      It is not clear why the difference between the groups in the intervention (providing rapid or not so rapid WGS reports) was too small to have an impact, compared to the baseline period without WGS reporting. Surprisingly sites F and G appear to do significantly worse during the "rapid" phase, according to Fig1. Please clarify.

      With regard to why we did not detect an effect of the intervention on the incidence rate of hospital-acquired infections, the Methods section contains a relevant brief summary of our qualitative analysis within this study:

      “The SRT did provide new and valued insights into transmission events, outbreaks and wider hospital functioning but mainly acted to offer confirmation and reassurance to IPC teams. Critically, the capacity to generate and respond to these insights effectively on a case-by-case basis was breached in most sites by the volume of HOCIs, and the limits of finite human and physical resource (e.g. hospital layout).”

      We have now also added a sentence to flag this in the context of study limitations in the Discussion “Our qualitative analyses also found that the capacity of sites to react to information generated by the sequencing intervention was breached by the volume of HOCI and admitted COVID-19 cases in combination with the finite personnel resources and limited physical space for isolation that was available”.

      We have noted in the Results section that “Our analysis models reveal important findings beyond the effect of the intervention. The analysis model for the incidence of HAIs identified independent positive associations with the proportion of current SARS-CoV-2 positive inpatients, the local community incidence of new SARS-CoV-2 cases … and calendar time …”. The ‘rapid’ phases for sites F and G were conducted during periods of high community incidence of SARS-CoV-2, with a high proportion of current inpatients SARS-CoV-2 positive and at a relatively early stage in the national vaccination roll-out. These factors clearly outweighed any potential reduction in the incidence of HAIs associated with the sequencing intervention.

      We have added some further clarifications regarding the structure and interpretation of the analysis models for the outcome of the incidence of HAIs in response to comments from Reviewer #3.

      The 'health economic findings' miss the health component. The costs of the intervention are described in detail, but not the benefits of the intervention. Is it possible to calculate the costs required to prevent a single case of HOCI?

      Thank you for your comment. The scope of the economic evaluation was to evaluate the economic effects of SARS-CoV-2 genome sequencing in supporting infection control teams, not the health benefits of the intervention. The relevant outcome is therefore the benefit of rapid or slow return of the sequencing report, expressed as a potential reduction in resource utilisation and costs. A paper presenting the methodology and findings is in preparation. Here we present only the cost of the intervention. We will replace the “Health economic findings” heading with “Cost of SARS-CoV-2 genome sequencing”.

      Reviewer #2 (Public Review):

      This study evaluated the impact of rapid turnaround whole genome sequencing to discover unsuspected hospital-acquired SARS-CoV-2 on the incidence of hospital-acquired SARS-CoV-2 cases. Strengths of the study include the important question, the technical and logistic feat of making whole genome sequencing widely available, and the large number of participating sites.

      We thank the reviewer for their summary and commentary on our work.

      Major limitations of the study include the fact that only half of sequencing reports were returned to infection prevention programs and then only a small minority within the targeted reporting time-frame (5% for rapid phase, 21% for longer phase; median turnarounds were 5 days and 13 days respectively). This fundamentally undermines the premise of the study, namely to see if rapid turnaround of sequencing can impact infection control.

      We have now explicitly stated in the Results that the median turnaround times achieved were substantially longer than the target values, and have added that ‘…and more timely reporting of results might be associated with greater impact on IPC actions’ in the Discussion.

      More broadly, it does not appear that there was a standardized protocol on how hospitals were expected to respond to reports of clusters.

      Review of Table S2 suggests that many of the potential actions were things that in retrospect probably don't have too much impact on transmission (e.g. checking soap stocks, signage assessments). The kinds of things that I think might decrease nosocomial transmission include minimizing use of shared rooms, improving ventilation, increased use of N95/FFP2 respirators for source control, more frequent surveillance testing cadences, etc. These were not options on the response lists perhaps explaining the lack of impact on transmission.

      We have added the following paragraph to the Discussion: “Planning this study and developing the data collection forms during the early stages of a novel viral pandemic was challenging, as in the summer of 2020 there were still ongoing debates around the primary mode of viral transmission and optimal IPC practice, and global supply chains for personal protective equipment were strained. In the planning of an equivalent study now, there would be a greater focus on adjustments to ventilation, air filtration and respirator usage. It would also be possible to be more prescriptive and standardised regarding the recommended changes to IPC practice in response to sequencing findings.”

      Reviewer #3 (Public Review):

      This study, conducted in 14 acute hospital trusts in the United Kingdom, compared SARS-CoV-2 hospital infection outcomes in a four week baseline period with outcomes in periods with 'rapid' (<48h) and 'longer-turnaround' (5-10 day) sequencing with results fed-back to infection prevention and control teams using a bespoke sequencing reporting tool. The question of whether rapid sequencing of hospital-onset SARS-CoV-2 infection can, by informing infection prevention and control (IPC) actions, reduce nosocomial transmission is interesting and potentially important. To our knowledge, this study represents the first large-scale formal evaluation of such technology. While the results are, on the face of it, disappointing in that hospitals were largely unable to meet target turnaround times for sequencing and results provide no evidence of benefit of the intervention in reducing hospital-acquired infection (and in some cases, such as for the "hospital outbreaks" outcome, the confidence intervals are so wide as to be unable to rule out substantial benefits or harms of the intervention), there are a number of important strengths of the study. These include the relatively strong quasi-experimental design (a type of non-randomised cluster crossover), the pre-defined analysis plan, and adequate power for the primary outcomes.

      Limitations of the study include the practical difficulties that participating hospitals had in reporting sequencing results to the IPC teams in a timely manner that could be acted on and lack of sufficient consideration for the ways in which sequencing information could have directly informed IPC activities in ways that would have been likely to substantially reduce the spread of infection (for example, Table S2 reports changes to IPC as a result of sequencing reports which include generic activities such as "Assessment of alcogel stocks" or "IPC signage assessment", which seem like things which should be done anyway, and don't obviously depend on information from pathogen sequencing).

      We thank the reviewer for their summary and commentary on our work.

      There are also some aspects of transparency that need to be addressed: the analytical methods are not reported in sufficient detail to enable the work to be repeated, and the results are not reported in sufficient detail to enable an assessment of the appropriateness or otherwise of the statistical models used in the analysis. Additionally, while the study protocol specified six secondary outcomes, not all of these are reported even where it appears that some (partial) information is available for unreported outcomes.

    1. Author Response:

      Evaluation Summary:

      The authors analyze the mechanisms of entropically driven cooperativity in the human thymidylate synthase (hTS), an enzyme essential for DNA replication and a promising target for anticancer drugs. The authors conclude that the cooperative binding of dUMP ligands to its two identical sites arises from a disproportionate reduction in the enzyme's conformational entropy upon binding the first ligand. The results provide rare insights into the mechanisms of ligand binding for an essential human protein and should be of great interest to readers interested in enzyme structure/dynamics/function relationships, cooperativity and allostery, and possible drug targeting of thymidylate synthase.

      We would like to add that the disproportionate reduction in conformational entropy is entirely dependent on the presence of the flexible N-terminus, even though the N-terminus itself undergoes no detectable change in conformational entropy.

      Reviewer #1 (Public Review):

      Human thymidylate synthase (hTS) is relatively large for NMR standards (~72 kDa dimer) and so the authors use a battery of advanced, TROSY-based NMR experiments to investigate the structure and conformational dynamics of the enzyme in multiple binding states. In particular, they have acquired multiple and single quantum methyl CPMG and CEST data to probe µs-ms dynamics. These experiments showed that hTS undergoes exchange between active and inactive conformations. Analysis of residual dipolar couplings and chemical shift perturbation experiments indicated that the major conformational state revealed by CPMG and CEST corresponds to the active hTS conformation. This finding suggests that conformational selection is not the primary mechanism mediating cooperativity in hTS.

      To investigate if binding cooperativity in hTS is due to modulation of conformational entropy upon ligand binding, the authors have investigated ps-ns dynamics in hTS by means of 2H relaxation measurements. These measurements suggest that rigidification of the protein upon the first binding event is the primary origin of cooperativity in the hTS dimer. Indeed, acquisition of control experiments on systems that do not show binding cooperativity (i.e., the complex formed by dUMP with N-terminal truncated hTS and the complex formed by TMP with full-length hTS) do not show the same modulation of conformational entropy observed upon formation of the dUMP-hTS complex. Overall, I found this manuscript interesting and well-written. I found particularly fascinating the observation that cooperativity is driven by modulation of conformational disorder in the unstructured N-terminal tail, which is not directly involved in ligand binding. The experimental approach and analysis protocols are sound and the conclusions are well supported by the experimental data.

      We appreciate the positive comments. We note that the last statement about cooperativity is slightly misleading, as it is the presence of the N-terminal tail that enables modulation by conformational entropy, even though the entropy “at play” appears not to be in the tail itself.

      Reviewer #2 (Public Review):

      The principal objective of this work is to detail the basis for the enzyme's observed cooperative binding to dUMP, which was reported by the authors in a previous publication (Bonin et al. 2019 Biophys J). That paper showed (via ITC) that the binding of dUMP ligands to the protein's two identical sites cannot be explained by a simple thermodynamic model with a single affinity, but rather requires a cooperative model in which the second binding event is more favorable by 1.3 kcal/mol (~2RT), due in part to a much more favorable entropy change -TΔS. In this paper, the authors set out to test two possible cooperativity models consistent with that observation: (1) that binding of the first ligand results in stabilization of a binding-competent conformation (conformational selection), or (2) that a broad reduction in protein dynamics (conformational entropy ΔS_conf) upon binding the first ligand results in a smaller ΔS_conf penalty for binding the second ligand, and therefore a more favorable ΔG.
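      The "~2RT" figure quoted above can be checked with a short calculation. The sketch below converts the 1.3 kcal/mol difference into units of RT and into an approximate ratio of the two binding constants. The temperature (298.15 K) and the function name `cooperativity_factor` are assumptions made for illustration; neither is taken from the manuscript, and statistical factors for the two identical sites are ignored.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3

def cooperativity_factor(ddg_kcal_per_mol, temperature_k=298.15):
    """Hypothetical helper: express a free-energy difference in units of RT
    and as the corresponding fold change in binding constant.

    temperature_k defaults to 298.15 K, which is an assumption; the
    temperature of the ITC experiments is not given in the text above.
    """
    rt = R_KCAL * temperature_k          # thermal energy in kcal/mol
    ddg_in_rt = ddg_kcal_per_mol / rt    # dimensionless free-energy difference
    fold_change = math.exp(ddg_in_rt)    # ratio K2/K1 implied by that difference
    return ddg_in_rt, fold_change

ddg_in_rt, fold = cooperativity_factor(1.3)
print(f"{ddg_in_rt:.2f} RT, roughly {fold:.0f}-fold tighter second binding")
```

      Under these assumptions, 1.3 kcal/mol is a little over 2RT, which corresponds to a second binding event that is roughly an order of magnitude tighter than the first.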

      The authors perform an extensive series of sophisticated NMR experiments using a range of samples with specialized labeling patterns, particularly ILV methyl-13C, and ILV-methyl-13C-HD_2. These labeling patterns allow the investigators to record high-quality methyl NMR spectra on the large ~70 kDa hTS dimer, its Δ25 N-terminal deletion, alone and in complex with dUMP and dTMP. Insights into exchange dynamics come from 13C methyl CPMG and CEST relaxation measurements, which are sensitive to motions on timescales spanning µs to ms. Insights into ps-time scale dynamics and conformational entropy come from methyl-2H_1H_2 relaxation measurements, and are extrapolated using an empirical "entropy meter". Structural insights are obtained from measured and predicted amide 1H-15N residual dipolar couplings, solvent PRE measurements, and chemical shift perturbations.

      The structural context is largely framed by prior reports of hTS crystallizing in two distinct conformations, termed 'active' and 'inactive' (Chen et al. 2017). These states are described (page 3) as differing by the conformations of an active site loop. The authors posit that if the enzyme is exchanging between these two states, with the 'inactive' state being dominant, binding of a first dUMP to the enzyme will shift the population towards the 'active' state, therefore favoring an additional dUMP binding event. The structural differences between the 'active' and 'inactive' states are not well described, however, and since the enzyme must bind both the substrate dUMP and its co-substrate MTHF, it's not entirely clear why this is a reasonable premise. RCSB coordinates 5X5A and 1YPV are used as the reference structures for the 'active' and 'inactive' states, respectively. Computed RDC data (Fig. 4) indicate that they are quite different, but it would be helpful to have a description of the differences in the structures, why it is reasonable to hypothesize that one or more of them might have different affinities for dUMP, and how the sampling of the other state might be manifest in the subsequent NMR data.

      We agree that the manuscript would be improved by a more detailed description of the existing structural states of hTS. These structures correspond to apo and dUMP-bound hTS. In TS, the dUMP-bound conformations are generally highly similar to those with nucleotide and cofactor (or cofactor analog) both bound in the active site.

      The authors indeed observe strong dispersions in methyl CPMG relaxation data for ligand-free hTS (Fig. 2), and more than one dip in CEST profiles (Fig. 1). However, these data (esp. CPMG) are not well described by a global exchange process with a single set of rate constants and populations, indicating a more complex exchange between three or more states (Fig S1, S2). (This point could be better described - the authors conclude that the data do not fit a 2-state model, but it would be helpful to describe in the main text the analysis that brought them to that conclusion.) Since the data are not well described by a two-state model, the authors fit the data to a three-state "BAC" model, in which the major state A exchanges with two other states, B and C; the A-B exchange is referred to as "slow" (~240/s) and the A-C exchange as "fast" (>2000/s)(Fig 2). It could be clearer why that model is preferred over alternative three-state models.

      We appreciate the Reviewer’s accurate summary of the CPMG and CEST NMR relaxation data on hTS. The point is well taken that the steps taken during the fitting trials were not described in detail, and we will expand on the descriptions to make the process that led to the BAC model clearer.

      The authors compare backbone amide RDCs measured from the major state of the enzyme and its complex with dUMP with RDCs computed from the crystallographic "active" and "inactive" structures (Fig. 4). On the basis of its better agreement, they conclude that the major state is the "active" conformation. This may be a reasonable conclusion but merits additional discussion. Why are the predicted RDCs so different if the conformations only differ in a loop (as described on page 3)? Were the same alignment tensors obtained from the structural analysis? Provided the major state in solution is the "active" conformation, they conclude that they can rule out the conformational exchange mechanism of allostery. Again, this might be a reasonable interpretation, but it would be strengthened by describing the evidence that the crystallographically observed "active" and "inactive" conformations will have different affinities for dUMP.

      While the active site loop is one region of the protein that is dramatically different in the active and inactive structures, there are differences between the two beyond this loop. We will expand upon the description of these differences to clarify this point. We will also add details supporting the idea that the two conformations have different affinities for dUMP.

      The authors further examine differences in the methyl spectra between full-length hTS, which exhibits cooperative dUMP binding by ITC, and the Δ25 mutant, which does not (Figure 7). Since the methyl spectra are nearly superimposable, they conclude that the N-terminal region does not perturb the structure, though it is responsible for the observed cooperative behavior. Again, this might be a reasonable interpretation, but it is tempered by the inherent limitations of the observables, as the spectra only reflect the structure experienced by the labeled methyl groups, so the data are silent about other areas of the protein that might reflect structural changes.

      It is true that NMR probes are not available for every atom of the protein, and that methyl groups occupy discrete locations in the structure. Nevertheless, the amide NH spectra contain information from nearly all residues, and we believe that the methyl density is sufficient to draw basic structural inferences. We note that while there are fewer probes at the dimer interface (low density of methyls and lack of amides from extremely slow back exchange), significant structural changes there should be sufficient to produce detectable chemical shift changes at nearby observable probes.

      Having ruled out structural changes and conformational exchange as responsible for the cooperative behavior, the authors quantify the intrinsic conformational entropy of the enzyme. They use the 2H relaxation rates of suitably labeled methyl groups to compute the magnitude of the order parameter S^2 of each labeled methyl axis. They compute the change in conformational entropy ΔS_conf using the change in S^2 for each methyl as a proxy, and an empirically-derived "entropy meter" (Fig. 5). From this analysis they find a larger 'unfavorable' entropy change upon binding dUMP than to TMP, meaning that a larger reduction in conformational entropy is associated with cooperative binding. The reason is that if more than half of this entropic penalty is paid upon binding the first ligand, the second binding event can occur with a smaller entropy penalty and thus a more favorable affinity. These are not unreasonable conclusions; however, there are significant uncertainties in the data and the underlying assumptions. At a minimum, these uncertainties should be considered and discussed.

      We agree that the entropy meter is a method for estimation of entropy, with uncertainties. However, the method has been shown to be useful for a large set of proteins (and widely adopted) and gives an overall sense of conformational entropic effects. We have been rigorous about measurements of error in the 2H relaxation data, and in fact switched to 2H relaxation in CHD2 groups after determining from our own data that methyl 1H relaxation in CH3 groups appears to be less reliable. In the end, the trends presented from use of the entropy meter are also easily observed from changes in the raw methyl axis order parameters.

      The ΔS_conf conclusion at which the authors arrive is unfortunately mechanistically uninformative. In a statistical mechanical sense, a reduction in entropy arises from a reduction in accessible conformational states. If one could quantify the states that are excluded upon ligand binding, one might gain an understanding of the link between structure (ensembles) and thermodynamics. The "entropy meter" approach is not informative about 'which' states are lost, only that a reduction in disorder, extrapolated over the full protein, is associated with a bulk change in entropy.

      It would be nice to have the ability to identify such specific microscopic and transient states, but the current state of NMR cannot provide this level of conformational detail. We would like to reiterate that through the experimental strategy, we were able to identify the flexible N-terminus as a key element (i.e. mechanism) in the entropy effect underlying cooperativity in hTS dUMP binding.

  5. Jun 2022
    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to consider more the role of the predator in predator-prey interactions, particularly from a collective locomotion aspect. This is an aspect which at times has been overlooked, with many theories, experiments and models focusing largely on the prey response, independent of how the predator behaves. The major strengths are the (1) excellent writing, (2) quality of the figures, (3) quantity of data, and (4) question tackled. The major weaknesses are (1) the volume of information (as a reader, it is quite hard to distil key points from the sheer volume of what has been presented), (2) the confined captive environment making it difficult to draw comparisons with a wild-type scenario, and (3) lack of clarity about the wider implications of the work outside of the immediate field.

      We thank the reviewer for their thoughtful review and positive comments. To address the weaknesses highlighted by the reviewer, we have revised our manuscript throughout.

      Reviewer #2 (Public Review):

      The manuscript describes a laboratory-based predator-prey experiment in which pike hunt shiner fish as a way to gain insight into the selective pressures driving the evolution of collective behavior. Unlike the predictions of classical theoretical work in which prey on the edge of social groups are considered to be at highest risk of predation, the fish in the center of the school were primarily targeted by the pike. This is because the pike uses a hunting behavior in which it slowly moves to the center of the school, seemingly undetected, until it rapidly attacks prey directly in front of its snout. This study also differs from previous studies in that both the predator and prey motion are examined, and the success of predation attempts was precisely determined. While the study demonstrates why shiners would be under selective pressure to avoid the center of a school, I am not convinced that the results explain why shiners evolved to have schooling behavior.

      The reviewer indeed highlights one of the main findings of our study, that fish closer to the group center are more at risk of being attacked by pike. They also give a proper account of its possible explanation, and highlight some of the main ways in which our study differs from previous work. The reviewer states that our results do not explain why shiners evolved to school. We agree and note that we also don’t claim this anywhere in the manuscript. Rather, we state our study provides important new insights about differential predation risk in groups of prey and highlight the important role of predator attack strategy and decision-making and prey response, with potential repercussions for the costs and benefits of grouping.

      We have considerably revised our introduction to better explain the importance of understanding differential predation risk in animal groups (lines 36-50): A key challenge in the life of most animals is to avoid being eaten. Via effects such as enhanced predator detection (Lima, 1995; Magurran et al., 1985), predator confusion (Landeau and Terborgh, 1986), and risk dilution effects (Foster and Treherne, 1981; Turner and Pitcher, 1986), individuals living and moving in groups can reduce their risk of predation (Ioannou et al., 2012; Krause and Ruxton, 2002; Pitcher and Parrish, 1993; Ward and Webster, 2016). This helps explain why strong predation pressure is known to drive the formation of larger and more cohesive groups (Beauchamp, 2004; Krause and Ruxton, 2002; B. Seghers, 1974). However, the costs and benefits of grouping are not shared equally among individuals within groups, and besides differential food intake and costs of locomotion, group members themselves may experience widely varying risks of predation (Handegard et al., 2012; Krause, 1994; Krause and Ruxton, 2002). Where and who predators attack within groups not only has major implications for the selection of individual phenotypes, and thereby the emergence of collective behaviour and the functioning of animal groups (Farine et al., 2015; Jolles et al., 2020; Ward and Webster, 2016), but also shapes the social behaviour of prey and the properties and structure of prey groups. Hence, a better understanding of the factors that influence predation risk within animal groups is of fundamental importance.

      In the discussion, we now better explain the potential evolutionary consequences of the findings of our work (lines 456-466): Predation is seen as one of the main factors to shape the collective properties of animal groups (Herbert-Read et al., 2017) and has so far generally been seen as to drive the formation of larger, more cohesive groups that exhibit collective, coordinated motion (see e.g. Beauchamp, 2004; Ioannou et al., 2012; B. H. Seghers, 1974). Our finding that central individuals are more at risk of being predated could actually have the opposite effect, with schooling having a selective disadvantage and over time result in weaker collective behaviour and less cohesive schools. However, we do not deem this likely as selection is likely to be group-size dependent, as discussed above. Furthermore, our multi-model inference approach revealed that, despite more central individuals experiencing higher predation risk, being close to others inside the school was still associated with a lower risk of being targeted. As most prey experience many types of predators, including sit-and-wait predators and active predators that hunt for prey, the extent and direction of such selection effects will depend on the broader predation landscape in which prey find themselves.

      Major strengths of the paper include the precise recording of the location and orientation of all fish at all times during the experiments. This indeed provides a rich dataset that can be used to search for the factors that predict the likelihood of attack and escape with higher statistical power.

      The major concern I have about the manuscript is that the results somewhat contradict the aim of the paper as expressed in the introduction and discussion: that predator-prey interactions explain the emergent evolution of collective behavior. Figure 2C shows that fish in smaller clusters or those that were totally isolated experienced lower rates of predation and were not included in any subsequent analyses. This would suggest that shiners experiencing predation from pike would be under strong selection to avoid schooling behavior altogether. Can you compare the likelihood of predation for individuals in non-central school locations compared to individuals outside of schools altogether? It might be helpful to investigate whether other predators of shiners use predation strategies that target prey on the edge of the school to help explain why schooling could be useful. Did the likelihood of schooling decrease throughout the trials?

      The reviewer makes a good point regarding the observation that pike tended to mainly attack individuals in the main school, questioning if this would result in a selective disadvantage for schooling. We would like to point out that this result concerns the likelihood of an individual being attacked, not the likelihood of a successful attack. If we look at the latter, we find that 5 out of 8 attacks away from the main school were successful, a ratio that is actually similar to that of the main school. More importantly, when wanting to understand how predation risk is linked to group size one needs to look at the per capita risk. If we do that for the group size we used in our study, despite a moderately elevated risk of being predated in a large group, the shiners in the main school still had considerably lower individual risk of being killed than those that occurred in small sub-groups or were alone. We would like to note that in our study the shiners did not really show proper fission-fusion behaviour and by far the majority of the time the shiners were in one large cohesive school. Therefore, we feel our dataset is not suitable for a proper investigation of the role of group size in predation risk.

      We now clarify these points in the discussion (lines 467-471): While the finding that pike were more likely to attack the main school may also appear to indicate a selective disadvantage to school, calculating the per-capita-risk for each individual would actually reveal it is still safest to be part of the main school. Nevertheless, as the shiners in our study rarely exhibited fission-fusion dynamics we feel our dataset is not appropriate to make proper inferences about how predation risk is linked to group size.

      We have also slightly extended the relevant sentences in the results to further clarify the clustering results (lines 144-150): We found that, by and large, the shiners were organised in one large, cohesive school at the time of attack and rarely showed fission-fusion behaviour (merging and splitting of schools) during the trials. Only occasionally were there one or two singletons besides the main school (25 attacks) or multiple clusters of more than two fish (12 attacks; Figure 2C), which tended to exist relatively briefly (mean school size: 36.5 ± 0.8). In more than 80% of these cases, pike still targeted an individual in the main cluster (Figure 2C).
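
      The per-capita argument above can be illustrated with a short calculation. This is only a sketch under stated assumptions, not the analysis used in the study: the split of 38 fish in the main school and 2 singletons is a hypothetical configuration, the 0.8 is the lower bound quoted in the text ("more than 80%"), and targeting is assumed to be uniform within a cluster, which the study itself shows is a simplification. It also describes the risk of being targeted, not the risk of being killed.

```python
def per_capita_risk(p_target_cluster, cluster_size):
    """Chance that one given member is targeted in a single attack.

    Assumes the predator picks this cluster with probability
    p_target_cluster and then picks uniformly among its members
    (a simplifying assumption made for illustration only).
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    return p_target_cluster / cluster_size


# Hypothetical configuration: 40 shiners, 38 in the main school, 2 singletons.
# 0.8 is the lower bound reported in the text for attacks on the main cluster.
p_main = 0.8
main_risk = per_capita_risk(p_main, 38)
singleton_risk = per_capita_risk(1 - p_main, 2)

print(f"main school: {main_risk:.3f}, singletons: {singleton_risk:.3f}")
```

      Under these assumed numbers, an individual in the main school is targeted far less often per attack than a singleton, even though most attacks are directed at the main school.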

      We now also provide more discussion about other predator types being likely to attack central prey (lines 343-354): That predators may actually enter groups and strike at central individuals is not often considered (Hirsch and Morrell, 2011), possibly because it contrasts with the long-standing idea that predation risk is higher on the edge of animal groups (Duffield and Ioannou, 2017; Krause, 1994; Krause and Ruxton, 2002; Stankowich, 2003). However, our finding is in line with the predictions of theoretical work that suggest that the extent of marginal predation may depend on attack strategy and declines with the distance from which the predator attacks (Hirsch and Morrell, 2011). Furthermore, increased risk of individuals near the centre of groups may be more widespread than currently thought. Predators not only exhibit stealthy behavioural tactics that enable them to approach and attack central individuals, as we show here, but may also do so by attacking groups from above (Brunton, 1997) or below (Clua and Grosvalet, 2001; Hobson, 1963; but see Romey et al., 2008), and by rushing into the main body of the group (Handegard et al., 2012; Hobson, 1963; Parrish et al., 1989).

      We furthermore discuss the potential role of group size on the observed effects (lines 441-455): In particular, while group size is not expected to affect much whether ambush predators are likely to attack internal individuals, the specific risk of central individuals could both be hypothesized to decrease with group size, such as if the predator is more likely to attack when surrounded by prey, or to not be affected by it, such as if the predator actively targets central individuals. Whatever the process, the observed findings are likely for prey that move in groups of somewhat intermediate size; for very large groups, such as the huge schools encountered in the pelagic, ambush predators may simply not be able to attack the group centre due to spatial constraints. More generally, the tendency for predators to attack the centre of moving groups may depend on the medium in which the predator-prey interactions occur. As in the air there is potential for (fatal) collisions, and on land it is physically difficult for predators to enter groups and predators’ size advantage tends to be more limited, predators may be less likely to go for the group centre as compared to in aquatic or mixed (e.g. aerial predator hunting aquatic prey) systems. Hence, the important interplay we highlight between predator attack strategy and prey response may have different implications across different predator-prey systems and warrants concerted further research effort.

      Finally, in response to the reviewer’s question if the likelihood to school decreased through the trials, we did not see a change in packing fraction (median nearest-neighbour distance) with repeated exposure to the pike, but shiners increasingly avoided the area directly in front of the pike’s head (lines 182-186): While the shiners did not show a change in their packing fraction (median nearest-neighbour distance) with repeated exposure to the pike (F1,52 = 1.81, p = 0.185), they increasingly avoided the area directly in front of the pike’s head (Appendix 2 – Figure 1A) resulting in the pike attacking from increasingly further away (target distance: F1,52 = 45.52, p < 0.001, see Appendix 2 – Figure 1B,C). See also further Appendix 2.
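
      For readers unfamiliar with the measure, the median nearest-neighbour distance mentioned above can be computed from tracked positions roughly as follows. This is a minimal sketch assuming 2D coordinates in arbitrary units; the function name and the example coordinates are hypothetical and do not come from the study's tracking pipeline.

```python
import math
from statistics import median


def median_nearest_neighbour_distance(positions):
    """Median, over all individuals, of the distance to the closest other one.

    positions: list of (x, y) tuples with at least two entries.
    """
    if len(positions) < 2:
        raise ValueError("need at least two individuals")
    nearest = []
    for i, (xi, yi) in enumerate(positions):
        # Distance from individual i to every other individual; keep the smallest.
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(positions)
                if j != i
            )
        )
    return median(nearest)


# Hypothetical frame: three fish close together and one straggler.
school = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(median_nearest_neighbour_distance(school))
```

      Using the median rather than the mean keeps the measure from being dominated by a single distant individual, as the straggler in the example shows.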

      I am also curious whether tank size affects the behavior of the fish, both of the shiners and the pike. The pike seem to be approximately 1/3 the shortest length of the tank, and 6 inches of depth have constrained the movement to be mostly in the 2D plane. A lack of open space might limit the pike's ability to hunt in any way other than this stealthy strategy. Has this stealthy hunting strategy been described in other experiments in larger or more naturalistic conditions? Does open space affect the shiners' propensity to school? Although the manuscript describes that shiners tend to school near the surface of water, does the shallow depth affect the pike's behavior? The manuscript states that some pike never attacked -- were these the largest in the study?

      While the tank is small relative to the real world, we actually decided on this size of ~2 m² based on previous experimental work on predator-prey dynamics. As we stated in the methods of the original manuscript (lines 543-545), we expect that if a much larger space had been used, pike would actually still show the same approach and attack behaviour linked to their stealthy attack strategy. The stealthy hunting behaviour of pike and similar predators and their ability to thereby get very close to their prey has been described elsewhere (see e.g. references on lines 332-344 of the original manuscript).

      We now better explain the potential limitation of the arena size in the discussion (lines 472-480): Laboratory studies on predator-prey dynamics like ours do, of course, have their limitations. Although the size of the arena we used (~2 m²) is in line with behavioural studies with large schools of fish (e.g. Sosna et al., 2019; Strandburg-Peshkin et al., 2013) and experiments with live predators attacking schooling prey (Bumann et al., 1997; Magurran and Pitcher, 1987; Neill and Cullen, 1974; Romenskyy et al., 2020; Theodorakis, 1989), compared to conditions in the wild the prey and predator had limited space to move. However, as pike are ambush predators they tend to move relatively little to search for prey and rather rely on prey movement for encounters (Nilsson and Eklöv, 2008). Increasing tank size would have made effective tracking extremely difficult, or impossible, and while a much larger tank is expected to considerably increase latency to attack, we expect it to have relatively little effect on the observed findings.

      We agree that the shallow depth of the tank is a limitation of our study and may have somewhat restricted the pikes’ natural behaviour, although pilot experiments showed that the pike exhibited normal movements and attack behaviours. Fish were tested in very shallow water to be able to acquire detailed individual-based tracking of the schools as well as compute features related to the visual field of the fish. We would also like to note that both shiners and pike can often be found in the littoral zone and occur in very shallow water of only a few tens of cm (see e.g. Krause et al., 2000b; Pierce et al., 2013; Skov et al., 2018), with some experimental work furthermore showing that pike may actually prefer shallow water (Hawkins et al., 2005). We don’t think that increasing the depth of the tank would have considerably changed the predatory behaviour of the pike, as the pike would be expected to still use their stealthy approach to get close to their prey even if the prey school were more three-dimensional.

      We now provide a much more extensive discussion of the limited depth used in the discussion (lines 480-494): In terms of water depth, fish were tested in very shallow water. This was primarily done to be able to keep track of individual identities and compute features related to the visual field of the fish. Shiners naturally school in very shallow water conditions as well as near the surface in deeper water in the wild (Hall et al., 1979; Krause et al., 2000b; Stone et al., 2016) and also pike primarily occur in the shallow littoral zone, sometimes only a few tens of cm deep (Pierce et al., 2013; Skov et al., 2018). Furthermore, pilot experiments showed the pike did exhibit normal swimming and attack behaviour with attack speeds and acceleration comparable to previous work (Domenici and Blake, 1997; Walker et al., 2005). Recent other work on predator-prey dynamics did not find a considerable impact of adding the third dimension to their analyses (Romenskyy et al., 2020). Still, the water depth used is a limiting factor of our study and in the future this type of work should be extended to deeper water while still keeping track of individual identities over time. We expect that adding the third dimension would not change the stealthy attack behaviour of the pike and therefore still put more central individuals most at risk, but possibly attack success would be reduced because of increased predator visibility and prey escape potential in the vertical plane, which remains to be tested.

      We did not observe a relationship between pike size and tendency to attack.

      Reviewer #3 (Public Review):

      While it has long been clear that animals in groups (e.g., fish schools) benefit in terms of safety in numbers, there has also been a keen interest in which animals in the group are at higher versus lower risk (e.g., those in front, or along the edges) and how that might depend on the predator's attack strategy. This study addresses these important predator-prey details using a common predatory fish (northern Pike) attacking schools of prey fish (golden shiners). A strength of the study is that it uses cutting-edge video tracking and computational/statistical methods that allow it to quantify and follow each fish's (1 predator and 40 prey in a group) spatial position, relative spacing, orientation and even each individual's visual field and movement throughout each of 125 attacks. Most (70%) of these attacks were successful, but many were not. The variation in attack success allowed the investigators to do statistical analyses to identify key predator and prey behaviors that are associated with successful vs. unsuccessful attacks.

      The study yielded numerous interesting insights. While conventional wisdom pictures predators initiating an attack from outside of the group thus putting individuals at the group's edge at greatest risk, this study found that pike typically approached the school of prey head-on both in terms of the group's orientation and direction of movement, and often stealthily moved within the group before initiating an attack. To understand which prey individual was targeted by the predator, the highly quantitative video analyses examined 11 measures of each individual prey's position and orientation at the time that the pike initiated its attack. Of course, pike showed a strong tendency to target one of the 3 closest prey, particularly prey that were more or less directly in front of the pike. However, contrary to conventional wisdom, the analysis showed that targeted prey were closer to the center than the edge, and that an individual's position and orientation relative to other nearby prey also played an important role in whether it might be targeted by the predator. Not surprisingly, analyses showed that targeted prey were more likely to escape if they were further from the predator's head and if they exhibited higher maximum acceleration. Interestingly, during the actual strike, on average, the predator accelerated to a speed about 50% faster than the velocity of the targeted prey.

      A limitation of the study (that the authors describe and discuss) is that it was conducted in a tank with no spatial refuges whereas in nature, pike are often found in areas with vegetation, and schools of prey can often potentially respond to the presence of a predator by moving towards refuge (e.g., vegetation). Also, the study was done in very shallow water (6 cm) -- likely shallower than many, if not most, natural predator-prey interactions for these species. In deeper water, the predator-prey interaction might be better analyzed in three dimensions (i.e., also accounting for variation in vertical height in the water), though the authors argue that this conventional idea is not necessarily true.

      Overall, this study provides an impressive example of how the use of modern technology and statistical analyses allows us to better describe and understand the fine-scale behaviors that affect an interaction of high importance for ecology and evolution.

      We thank the reviewer for the care and attention put in their review and their detailed objective assessment of our study.

      Regarding refuge use, it is true that in the wild pike are often found in areas with vegetation, but it is actually predominantly younger pike seeking refuge among vegetation from predators themselves, including from cannibalism by larger pike (see Skov & Lucas, 2018 Chapter 5). Vegetation is also used by pike as background camouflage rather than a refuge per se, but due to their elongated body and narrow frontal body pike are able to approach and ambush prey when no vegetation is available, as we show in our study. During pilot experiments we did provide pike with refuges, but as they never used them, and as refuges would also have provided a place for fish to hide, which would have considerably impacted our ability to investigate predation risk within the schools, no refuges were provided during the experiment.

      We have now added an explanation about not using refuges in the discussion (lines 495-502): For our experiments we used a testing arena without any internal structures such as refuges. This was a strategic decision as providing a more complex environment would have impacted the ability of the shiners to school in large groups and would have led fish to hide under cover. Although studying predator-prey dynamics in more complex environments would be interesting in its own regard, it would not have allowed us to study the questions we are interested in about the predation risk of free-schooling prey. Furthermore, pilot experiments indicated that the pike never used refuges (consistent with previous work, see Turesson and Brönmark, 2004), so they were not further provided during the actual experiment.

      Regarding the shallow depth of the tank, we now better acknowledge this limitation and explain our reasoning (lines 480-482): In terms of water depth, fish were tested in very shallow water. This was primarily done to be able to keep track of individual identities and compute features related to the visual field of the fish. We would also like to note that both shiners and pike spend a lot of their life in the littoral zone and occur in very shallow water of only a few tens of cm (see e.g. Krause et al., 2000b; Pierce et al., 2013; Skov et al., 2018). Although the limited vertical space may have restricted the pikes’ natural behaviour to some extent, they did exhibit normal swimming and attack behaviour with attack speeds and acceleration comparable to previous work (Domenici and Blake, 1997; Walker et al., 2005). We now better discuss the limitation of the shallow depth used in the discussion on lines 477-494 (see also our responses above).

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, He and collaborators analyse eight samples from six patients with acral melanoma through single-cell RNA sequencing. They describe the tumour microenvironment in these tumours, including descriptions of interactions among distinct cell types and potential biomarkers. I believe the work is thoroughly done, but I have identified a few concerns in their depiction and interpretation of their results.

      Strengths:

      1) One of the few available single-cell studies of acral melanoma, including a non-European cohort of patients.

      2) Data will be very useful to study the immune landscape of these rare tumours.

      3) Data include adjacent tissue, primary tumours and a metastatic sample, covering all disease stages.

      4) Analyses seem to be carefully done.

      Things to improve:

      1) Figures need much more description to be understandable, in particular, axes should be clearly labeled and the colour code should be specified

      Thank you for your generous comments and suggestions. We have improved the integrity of some figures and added some figure legends. We believe this will further improve the quality of our manuscript.

      2) In some places, I would recommend the authors soften their interpretation of their analyses (for example, when they suggest targeting TNFRSF9+ T cells as a novel therapy), as these are nearly all bioinformatic in a small number of samples

      As for the conclusions of TNFRSF9, we indeed provided a possibility that TNFRSF9 may serve as a novel therapy. We made some changes to soften the statement. In addition, we have added instructions and explanations in the Discussion section.

      3) I don't think the experiments add much to the literature, as these test already known oncogenes on a common, non-acral melanoma cell line.

      Thanks for your comments regarding the experiments included in our study. We have pointed out this deficiency in the Discussion section, and made some experimental changes. For example, we have removed the TWIST1-related experiments from the main Results section and shown them only as non-focus work in the Supplementary Figure.

      It is difficult for us to obtain AM cell lines. No commercial AM cell lines can be purchased from ATCC or ECACC. AM cell lines are more difficult to establish and there are few reports on methods for establishing primary acral melanoma cell cultures (PMID: 22578220, PMID: 17488338). Some Japanese and Chinese researchers have isolated the primary generation of AM cells (e.g., PMID: 17488338, PMID: 22578220, PMID: 34097822), but due to the customs policy and the COVID-19 epidemic, we could not receive them within a short period. Moreover, these studies also stated their limitations; namely, that the stability during serial passaging had not been evaluated. Therefore, it may be very time-consuming to obtain operable AM cell lines for functional assays. However, our research group would like to have the opportunity to separate and culture primary cells in subsequent studies, and improve relevant experiments according to your valuable suggestions. Many thanks again for your comments.

      Reviewer #2 (Public Review):

      The study presented by Zan He et al dissects the main interactions between malignant and stromal cells present in acral melanoma samples and in adjacent tissues using single cell RNA sequencing. The study describes factors that allow communication between the different cell types, with a special focus on macrophages, lymphocytes and fibroblasts, along with malignant cells. Factors playing a role in cell-cell communication are identified and suggested to be relevant prognostic markers and/or attractive therapeutic targets.

      Historically, the study of acral melanomas has been neglected due to the low incidence among European descendants and this formed an important gap of knowledge in the field and hindered the development of effective therapies to control the disease. Therefore, studies that address this unmet need in melanoma research are very important and should be motivated. This includes single-cell sequencing studies that allow one to study the complexity of tumours, including microenvironment features that influence the development and effectiveness of certain types of treatment. The present study contributes information on how cells interact in the acral melanoma microenvironment and this could be a first step toward better understanding how these interactions influence acral melanoma development, progression, and therapy response.

      However, there are a few points that should be carefully considered. The authors use 3 adjacent tissues (which in theory is composed of normal skin next to a cancer lesion), 4 primary tumor samples, and one lymph node metastasis as a model to study tumor progression. Adjacent tissue is not considered a stage of tumour progression and the sample size is too small to rule out sample-dependent effects. The study is descriptive in nature and could better contextualize the findings regarding what is known for other subtypes of melanomas or other tumours. This is especially important to help readers understand why it would be relevant to study cutaneous melanomas located in acral skin. It would be helpful to explain how different it is from non-acral cutaneous melanoma, and what this study adds compared to other single-cell studies from cutaneous acral and non-acral melanomas.

      Thank you for your generous comments. It is not accurate to represent the adjacent tissue samples as ‘tumour progression’, and our study did not intend to focus on the tumour developmental process. We have revised the related descriptions in the text. Tumour adjacent tissues (ATs) have always been the focus of research on TMEs. Some studies suggest that there are many mutations and clonal expansions in normal tissues adjacent to cancer, which may be in a pre-cancerous state (PMID: 33004515), and many single-cell studies of tumours have also sampled and paired para-cancer tissues (e.g., PMID: 29988129; PMID: 35303421).

      The problem of sample size limits the generality of the results, as we pointed out in the Discussion section. Most acral melanoma (AM) patients opt for surgical resection at an early stage to avoid the possibility of metastasis. Hence, we rarely encounter patients with lymph gland (LG) metastases. We only collected one metastatic sample, because it is very rare in clinic. However, the sample has a high quality, such as a high cell activity of single cell suspension after dissociation (95.30%), and a rich amount of tumour cells and other stroma cells. Therefore, we added its sequencing data into the overall analyses, hoping to contribute to the comprehensiveness of resources and research.

      It is important to link this study with the findings regarding what is known for other subtypes of melanomas. We have now supplied a comparison of AMs with non-acral skin cutaneous melanomas (CMs), using published data. Your comments and advice are very helpful to us, and we believe that the current manuscript is more comprehensive and complete.

    1. Author Response

      Reviewer #1 (Public Review):

      COVID-19 epidemic conditions are rapidly changing due to behavioral changes, accumulating immunity from prior infections, vaccination roll-outs, and the emergence of new variants. In this analysis, the authors are using a simple mathematical model to reconstruct SARS-CoV-2 transmission dynamics in South Africa through different outbreaks with different prevalent variants. They estimate key characteristics of the epidemic in each of the nine South African provinces while accounting for multiple factors including changing detection rates, seasonality, nonpharmaceutical interventions, and vaccination. The paper is well written and addresses important questions in the field.

      The authors apply a model-inference system to estimate the background population characteristics (e.g., population susceptibility) before the emergence of the new variant, as well as changes in population susceptibility and transmissibility due to the new variant. They come up with projections of cumulative incidence, accumulation, and loss of population immunity over time for different provinces. Inference on the characteristics of different variants is also presented.

      The paper has a couple of key limitations.

      First, simple models come with strong assumptions. The simplicity of the model does not allow it to account for several important epidemic drivers, including i) heterogeneity in contact rates, acquisition risk, and severity (especially with respect to age), which may have a strong impact on the epidemic dynamics; ii) an all-or-nothing vaccine, which restricts the possible mechanisms of protection to be explored; and iii) using the same compartment for vaccinated and recovered from infection, which leads to the same duration of immunity and efficacy for these 2 groups. Second, I suspect that the model-inference system has some identifiability issues. It is unclear how it selects between scenarios with low transmissibility but high IFR and scenarios with high transmissibility but low IFR. Some characteristics (including IFR) were estimated independently for each wave and each province. However, correlations across provinces should be expected. The paper will benefit from a more detailed explanation and sensitivity analyses that show how model assumptions influence presented results.

      We thank the reviewer for the comments and suggestions. We agree that model assumptions can affect model estimations. The model used here simulates a single age group, which, when used alone, likely would not be able to capture the heterogeneity in contact rates, acquisition risk, and severity (especially with respect to age). However, a key difference in this study is that the model is used in conjunction with a statistical inference method, i.e. the Ensemble Adjustment Kalman Filter (EAKF), and multiple data streams (i.e., cases, deaths, mobility, vaccination, and weather data). The combined model-inference system (i.e., the model, data, and the filter) enables estimation of short-term dynamics (e.g., changes in IFR due to more infection in older age groups) during each given time step (here, each week).
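
      To make the role of the filter more concrete, the following is a minimal sketch of one scalar ensemble adjustment Kalman filter update in the general form described by Anderson (2001). It is an illustration under stated assumptions, not the implementation used in the study: the function name, the ensemble values, and the observation are all hypothetical, and the real system updates many state variables and parameters each week.

```python
from statistics import mean, variance


def eakf_update(obs_ens, param_ens, obs_value, obs_var):
    """One scalar EAKF step (sketch).

    obs_ens:   prior ensemble of the simulated observable (e.g. weekly cases)
    param_ens: prior ensemble of one unobserved parameter, same length
    obs_value: the actual observation
    obs_var:   assumed observation error variance
    Returns the adjusted (obs_ens, param_ens).
    """
    prior_mean = mean(obs_ens)
    prior_var = variance(obs_ens)
    if prior_var == 0:
        raise ValueError("prior ensemble has collapsed (zero variance)")

    # Combine prior and observation as two Gaussians.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_var)

    # Shift and contract the ensemble deterministically so that it has
    # exactly the posterior mean and variance.
    shrink = (post_var / prior_var) ** 0.5
    new_obs = [post_mean + shrink * (y - prior_mean) for y in obs_ens]

    # Pass each member's increment to the parameter by linear regression.
    param_mean = mean(param_ens)
    n = len(obs_ens)
    cov = sum(
        (x - param_mean) * (y - prior_mean) for x, y in zip(param_ens, obs_ens)
    ) / (n - 1)
    gain = cov / prior_var
    new_param = [
        x + gain * (y_new - y) for x, y_new, y in zip(param_ens, new_obs, obs_ens)
    ]
    return new_obs, new_param


# Hypothetical ensemble: simulated cases and a transmission-related parameter.
cases_ens = [80.0, 100.0, 120.0, 140.0]
param_ens = [0.8, 1.0, 1.2, 1.4]
new_cases, new_param = eakf_update(cases_ens, param_ens, obs_value=90.0, obs_var=400.0)
```

      Because the observation lies below the prior ensemble mean and the parameter is positively correlated with simulated cases in this toy ensemble, the update pulls the parameter ensemble downward. This is the mechanism by which repeated weekly updates can track changes that a fixed-parameter model would miss.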

      Indeed, we have used model-generated synthetic data, for which the true parameters are known, to test a similar model-inference system and shown that it is able to accurately estimate the underlying parameters as well as overall variant epidemiological quantities (i.e. immune erosion and change in transmissibility; see details in Yang & Shaman 2021 Nature Communications 12:5573). For this study, we additionally validate the model-inference estimates using three independent data streams (i.e., serology, hospitalization, and excess mortality data) and retrospective predictions (see “Model fit and validation” in Results of the main text). Further, when presenting model results for Gauteng and overall estimates for all nine South African provinces, we compare our model-inference estimates with available estimates in the literature; the consistency provides further support of the study findings (see the remaining Results sections).

      Per the reviewer's suggestion, in this revision, we have added more detailed explanation when presenting the estimates (e.g. population susceptibility and variant transmissibility). See e.g., Lines 129 – 138 and 186 – 200 in the main text. We have also added further discussion of the model-inference method (e.g. choice of prior range and diagnosis) in Appendix 1. In addition, we have added sensitivity analyses, in particular for the infection-detection rate in Gauteng during the Omicron wave, to show how model assumptions influence presented results. We have also plotted and shown the weekly estimates for all parameters included in our model-inference system (Appendix 1-figures 15-23). Visual inspection of these estimates indicates that posterior estimates for the model parameters are consistent with those reported in the literature, or changed over time and/or across provinces in directions as would be expected. Please see these supporting results in the new Appendix 1.

      Reviewer #2 (Public Review):

      CoVID models have, by necessity, exploded in complexity over the last year. The emergence of new variants with differential spread, the waxing and waning of population immunity, and the constant changes in reporting rates all seem to necessitate the addition of new internal model states and parameters. In the present study, Yang and Shaman have developed a robust methodology that can account for each of these complexities and applied it to reconstruct the first four waves of infections in each province of South Africa. Specifically, they employ an SEIR model with time-varying parameters estimated using a Kalman Filter. Although the model does not explicitly incorporate details such as the waning of immunity, it is present implicitly in the time-varying "population susceptibility" parameter. The authors validate their estimates of infection and CoVID-related death rates over time using seroprevalence, hospitalizations, and excess deaths, which were not used to calibrate the model. Furthermore, they have shown their model's ability to predict the course of waves that have already begun using retrospective predictions of past waves.

      Despite the validity of these methods, it is not clear what conclusions can be drawn. The authors claim that their analysis shows that 1) new waves of infection are still possible, 2) large new waves of deaths can still occur, and 3) any new variant likely requires a loss of pre-existing immunity. Unfortunately, it is not clear how the modeling analysis presented supports these ideas. All three of these conclusions involve the emergence of new variants, something which the model may not be suited for. The transmissibility of new variants has been trending upwards, according to their analysis, suggesting that invasion is a combination of increased transmission and increased loss of immunity. Finally, the Delta wave was not accompanied by a large increase in susceptibility and instead appears to largely have been driven by seasonal fluctuation and increased transmissibility.

      Overall, this work should be of great interest to those modeling CoVID or seeking to understand the history of the epidemic in South Africa.

      We thank the reviewer for the comments. First, regarding waning immunity, the SEIRSV model used here did account for waning immunity, via the term R/Lt in Eqn 1, where R is the number of recovered/immune individuals and Lt is the immunity period. This is briefly described in Lines 333 - 337 (grouped under “Virus-specific properties”). To clarify further, we have added a brief note: “Of note, the immunity period Lt and the term R/Lt in Eqn 1 are used to model the waning of immune protection against infection.”
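
      To make the role of the R/Lt term concrete, the following is a minimal sketch of one time step of a deterministic SEIRS model in which recovered individuals return to the susceptible pool at rate R/Lt. It is not the authors' SEIRSV model: vaccination, seasonality, stochasticity, and the filtering step are all omitted, and the function name, parameter names, and numbers are hypothetical.

```python
def seirs_step(S, E, I, R, beta, latent_period, infectious_period,
               immunity_period, dt=1.0):
    """Advance a simplified SEIRS model by one Euler step of length dt (days).

    All names are illustrative assumptions, not taken from the manuscript.
    """
    N = S + E + I + R
    new_exposed = beta * S * I / N * dt            # S -> E (new infections)
    new_infectious = E / latent_period * dt        # E -> I
    new_recovered = I / infectious_period * dt     # I -> R
    waned = R / immunity_period * dt               # R -> S: the R/Lt waning term

    S = S - new_exposed + waned
    E = E + new_exposed - new_infectious
    I = I + new_infectious - new_recovered
    R = R + new_recovered - waned
    return S, E, I, R


# With no active infections, immunity still wanes: 1000 immune individuals
# and a 100-day immunity period return 10 people to S in one day.
S, E, I, R = seirs_step(0.0, 0.0, 0.0, 1000.0, beta=0.5, latent_period=3.0,
                        infectious_period=5.0, immunity_period=100.0)
```

      In this sketch a shorter immunity period replenishes the susceptible pool faster, which is the mechanism the added note in the manuscript describes.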

      Second, the main conclusions and findings of this study are the model-inference estimates for the three SARS-CoV-2 variants of concern (i.e. Beta, Delta, and Omicron), as well as the inferred underlying dynamics. The three general observations we made in the initial submission are related to the SARS-CoV-2 dynamics observed in South Africa, as well as in other places. We have now revised the text to clarify this and provide more direct evidence drawn from specific findings here to support the discussed observations (see Lines 252 - 271).

      Reviewer #3 (Public Review):

      Overall, the authors sought to explain the epidemiological, behavioral, and immunological underpinnings across multiple COVID-19 waves in South Africa using an infectious disease model and statistical framework. In doing so, they hoped to learn about the different emerging variant properties and provide a modeling framework for understanding risk upon future variant emergence.

      Strengths:

      The manuscript uses an epidemiological and statistical modeling framework that has been validated across a number of different diseases, time periods, and regions.

      The researchers have validated their modeling results using multiple separate lines of evidence and data including laboratory results, seroprevalence, forecasting, and other epidemiological studies.

      While not independent from one another, agreement across multiple regions within South Africa enhances the confidence in modeling results.

      Weaknesses:

      The model complexity adds some opaqueness to the results due to the presence of many hidden parameters and potential correlations and interactions between them, so I suggest that the authors further validate the convergence of their model fitting and visualize the results of their hidden parameters.

      Conclusions justified:

      Overall I believe the conclusions the authors have provided are justified by their analysis. It appears their analysis is statistically rigorous, and there are multiple independent lines of evidence that agree with and validate their conclusions.

      We thank the reviewer for the comments and suggestions. In response, we have added plots to show estimates for all parameters (see Appendix 1-figures 15 – 23), in addition to those shown in the initial submission. We have also added a brief note in the main text on these results (see Lines 62-65). As the focus of this study is general COVID-19 dynamics and the epidemiological properties of SARS-CoV-2 variants of concern, we present the main estimates (e.g. population susceptibility, transmissibility, infection-detection rate, infection-fatality risk) in the main text, and provide the additional results for the supporting parameters (e.g. latent period) in Appendix 1.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors estimate growth curves ('nomograms') for hippocampal volume (HV) using Gaussian process regression applied to UK Biobank data and evaluate the influence of polygenic scores for HV on the estimated centile curves. By taking this into account, the centile scores are shifted up or down accordingly. The authors then apply this to the ADNI cohort and show that subjects with dementia mostly lie in the lower centiles, but this does not improve the prediction of transition from mild cognitive impairment to dementia.

      This paper is reasonably well written and the finding that centile curves for different phenotypes are sensitive to genetic features will be of interest to many in the field, albeit perhaps somewhat unsurprising given the polygenic score evaluated here is for the same phenotype under investigation (i.e. HV). I think using centiles derived from nomograms/normative models for precisely assessing both current staging and progression of neurological disorders is a highly promising direction. Regarding this manuscript, I have a few comments about the methodology and interpretation of results, which I will outline below.

      • My most significant concern is that the assumption of Gaussian residuals appears to be violated by the HV phenotypes that the authors fit their GP to. For example, in figure 2, the distribution is clearly skewed, and the lower centiles, in particular, are poorly fit to the data. First, please provide additional metrics to assess the fit and calibration of these models quantitatively (the latter can be done e.g. via Q-Q plots).

      Thanks for pointing this out. We are sorry for causing this confusion. The skew in the figure appears because the scatter plot overlaid on the GP-generated nomogram shows ADNI samples of all diagnoses, not the UKB training data used for the GP. The lower centiles are mainly occupied by the participants with AD or MCI (see the new plots in Figure 5). In addition, the healthy subjects from ADNI do indeed fit the model reasonably well. We have added a supplementary figure to show just the healthy subjects and have made the following edits in the text to address the confusion:

      Lines 143-149: “Nomograms of healthy subjects generated using the SWA and GPR method displayed similar trends (Figure 2; Supplementary Figure S8). … This extension allowed 86% of all diagnostic groups from the ADNI to be evaluated versus 56% in the SWA Nomograms (Figure 2; Figure 2 – Figure Supplement 2).”

      Lines 159-170 (description of figure 2): “Figure 2: Comparing Nomogram Generation Methods. Nomograms produced from healthy UKB subjects using the sliding window approach (SWA) (red lines) and gaussian process regression (GPR) method (grey lines) … The benefits of this extension can be seen with scatter plots of ADNI subjects of all diagnoses overlayed (E, F… A similar figure with only the Cognitively Normal ADNI subjects can be found in Figure 2 – Figure Supplement 2

      Second, I think if the authors wish to make precise inferences about the centile distribution for the reference model, then the deviation from Gaussianity ought to be accommodated in some manner. There are several options for this, including different noise models (e.g. Gamma, inverse Gamma, SHASH, etc), variable transformation, or quantile regression. One option that could be useful in the context of Gaussian process regression is the use of likelihood warping (see e.g. Fraza et al 2021 Neuroimage and references therein) which was originally developed for GP models. I would recommend the authors pursue one of these routes and provide metrics to properly gauge the fit.

      This is an excellent point. However, we believe that given that the training data indeed follows a Gaussian distribution (see new Figure 4 – Figure Supplement 3; reproduced below) across the relevant strata (sex, PGS) and across age groups, such modifications are not required.
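
      For readers unfamiliar with how centiles follow from a Gaussian predictive distribution, the sketch below shows the conversion in both directions, assuming the model returns a predictive mean and standard deviation at a given age. The function names and the example numbers are hypothetical and are not taken from the manuscript.

```python
from statistics import NormalDist


def centile(volume, predicted_mean, predicted_sd):
    """Percentile (0-100) of an observed volume under a Gaussian
    predictive distribution with the given mean and SD."""
    return 100.0 * NormalDist(mu=predicted_mean, sigma=predicted_sd).cdf(volume)


def centile_curve_value(predicted_mean, predicted_sd, percentile):
    """Volume lying on the requested centile curve (percentile in 0-100)."""
    return NormalDist(mu=predicted_mean, sigma=predicted_sd).inv_cdf(percentile / 100.0)


# Hypothetical example: a subject one SD below the predicted mean sits
# at roughly the 16th centile.
example = centile(volume=3600.0, predicted_mean=4000.0, predicted_sd=400.0)
```

      If the residuals were markedly non-Gaussian, this mapping would misplace the outer centiles, which is why the normality check on the training strata matters.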

      • Related to the above, it is likely that the selection of subjects with high/low polygenic scores for HV changes the shape of the distribution. It is currently impossible to assess this because no data points are shown in these cases. Please also add this information, along with comparable quantitative metrics to those for the models above.

      Thank you for bringing this up. We have now added a new supplementary figure with the shape of these distributions along with the Shapiro-Wilk test results for each of them. As can be seen, the Shapiro-Wilk test detects mild deviation from normality in some cases. However, given the size of the strata (N>2000), this is not surprising. Moreover, if a multiple-testing correction were applied across the 48 comparisons, none of the tests would be significant at the corrected threshold (P<0.001).
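
      The corrected threshold quoted above is consistent with a Bonferroni correction at a family-wise level of 0.05 over 48 tests, although the response does not name the correction method. The sketch below works through that arithmetic under this assumption; the function names and the example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction
    across all supplied tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# 0.05 / 48 is about 0.00104, i.e. roughly the P<0.001 threshold quoted.
threshold_48 = bonferroni_threshold(0.05, 48)

# Hypothetical p-values: nominally significant at 0.05, but not after correction.
flags = significant_after_correction([0.01] * 48)
```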

      • How did the authors handle site effects? There appears to be no adjustment for the fact that the ADNI data are acquired from different sites that were not used during the estimation of the normative models. I would expect to see this dealt with properly (e.g. via fixed or random effects included in the modelling) or at the very least a convincing demonstration that site effects are not clearly biasing the results.

      We agree that site effects are a major issue; we have rerun the application experiments after adjusting the ADNI volumes with NeuroCombat. The results did not change significantly, but we have replaced all reported results with the updated ones. In addition, we noted this in the methods section:

      Lines 442-445: Finally, we used NeuroCombat 1 to adjust across ADNI sites and harmonize the volumes with the UKB Dataset. To do this we modelled 58 batches (UKB data as one batch and 57 ADNI sites as separate batches) and added ICV, sex, and diagnosis (assigning all UKB as Healthy and using the diagnosis columns in ADNI) to retain biological variation.

      • How do the authors interpret the finding that the relationship between the polygenic scores and HV is different in the cohorts they consider (i.e. bimodal in UKB and unimodal in ADNI)? Does this call into question the appropriateness of the subsampled model for the clinical cohort?

      While we do see a bimodal distribution in UKB, the effect is not very strong, as the other reviewers commented. Therefore, we have de-emphasized this aspect. One reason may be that we detect the slightly bimodal aspect in UKB because of greater statistical power due to the larger sample size (one order of magnitude). Another factor is the SNP data used, i.e., differences in genotyping platform and imputation. This is also the reason why integrating PGS directly into the predictive model comes with additional challenges. We have addressed this topic briefly in our discussion: Lines 390-392: “Lastly, a recent study of PGS uncertainty revealed large variance in PGS estimates63, which may undermine PGS based stratification; hence a more sophisticated method of building PGS or stratification may improve results further.”

      • Perhaps the authors can comment on (or better, evaluate) how this genetic shift could be accommodated in normative models (e.g. the possibility of including polygenic risk scores as predictor variables in the normative model). This would remove the need for post hoc adjustment and would allow more precise control over the adjustment than just taking the upper/lower xxx % of the PGS distribution as is done in the current manuscript.

      We agree that integrating the genetics directly into the normative models is a great idea, and this is the direction we will explore in future work. However, PGS themselves are prone to show ‘site’ effects that depend on the genotyping method that was used as well as on the quality of genotyping and imputation. As a consequence, using the ‘raw’ PGS scores in predictive models brings its own challenges. Therefore, we feel that the current framework is simpler at this point and illustrates the potential of PGS when combined with normative models.

      • Related to my point above, it is perhaps unsurprising that the polygenic score for the HV phenotype influences the centile distribution. I think the paper would benefit considerably by also evaluating other polygenic scores (e.g., APOE4 as in some of the prior cited references). It would be interesting to compare the magnitude and shape differences for these adjustments. The authors can consider this an optional suggestion.

      Our rationale for focusing on HV PGS was that we sought to improve the accuracy of the normative model. Genetics influences HV, and this is a first attempt to adjust for this in the normative modeling framework. Indeed, APOE-e4 has a sizable effect on HV. However, this is most likely mediated by nascent accelerated neurodegeneration, i.e., Alzheimer’s disease. Thus, in our view, focusing on APOE-e4 would mean focusing on a disease effect. We address this issue briefly in the discussion (Lines 326-334). As a sensitivity analysis, we did indeed test other PGS, such as AD and Whole-Brain-Volume, and found that these do not affect the normative models for HV.

      Reviewer #3 (Public Review):

      Given the large variation in and high heritability of hippocampus volume in the population, taking out known variation in the healthy population is a nice way of reducing heterogeneity, and a step forward towards using normative models in clinical practice. The dataset the nomograms are based on is large enough to do so even when stratified by polygenic scores for hippocampal volume, and these provide interesting information on the role of genetics in hippocampus volume.

      There are however several concerns regarding the applicability of the models to the ADNI dataset. First, the lack of overlap in the age range between the dataset the model is trained on and the application to subjects that are outside that age range is questionable. The authors prefer Gaussian process regression (GPR) over a sliding window-based approach using the argument that the former allows for predictions in a larger age range but extrapolation beyond the reach of the data is usually not valid. The claim that Supplementary Figure 6 shows accurate extension beyond these limits is in my opinion not justified. If anything, we can be rather certain that the extensive growth of the hippocampus up to age 48 is not realistic (see e.g. Dima et al., 2022).

      As mentioned already in response to reviewer #1, this was a miscommunication on our side. We only used the ADNI samples that were within the age range of the models they were being plotted against. The GPR model did not require smoothing at the edges of the age-range and thus can support a wider age range than the SWA. This is why we stated that the extension of the nomograms enabled more of the ADNI dataset to be used, i.e., because otherwise these samples were outside the range of the model and could not be used.

      We have changed the following lines in the manuscript to make this idea explicit:

      Lines 477-478 (end of GPR methods section): “For both SWA and GPR models, we only tested the ADNI samples that lay within the age range of each model respectively.”

      Regarding the accurate extension claim, we have edited the line (411-412) in the discussion so that it now reads:

      Lines 347-348 “In fact, our GPR model can potentially be extended a few years beyond those limits”

      Thank you for pointing out the discrepancy in the hippocampal growth around age 48 with the results of Dima et al. 2022. Although sample sizes between the two studies are similar, the data availability in UKB for ages 45-50 is rather sparse (N<100; see new Figure 4 – Figure Supplement 3). Thus, the observed growth is likely due to under-sampling. The growth effect has been observed in other studies using UKB data7,8. We have noted this in the discussion:

      Lines 354-356: “However, there is a possibility that our results suffer from edge effects. For example, we suspect that the peak noted in the male nomogram is likely due to under-sampling in the younger participants.”

      Second, the drop in mean 'percentile' difference between high and low polygenic scoring individuals when one uses genetically adjusted nomograms seems nice, but this difference is currently just a number and the reader cannot see whether it is significant or clinically relevant.

      We have now provided a new figure (Figure 5) that shows the boxplots behind those numbers. The MCI-to-AD conversion analyses in the ADNI explored the clinical benefit of genetically adjusted nomograms. However, adjusted and unadjusted percentiles performed equally well. In the discussion we argue that the MCI stage is already too late and earlier stages may benefit from the increased precision:

      Lines 373-378: “However, despite this sizable effect, genetically adjusted nomograms did not provide additional insight into distinguishing MCI subjects that remained stable or converted to AD. Nonetheless, the added precision may prove more useful in early detection of deviation among CN subjects, for instance in detecting subtle hippocampal volume loss in individuals with presymptomatic neurodegeneration.”

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by O'Herron et al. describes an all-optical method combining optogenetic stimulation and 2-photon microscopy imaging to simultaneously manipulate and monitor brain microvasculature contractility in three dimensions. The method itself, which represents a microvasculature-targeted variation on a theme previously elaborated for simultaneous stimulation and monitoring of ensembles of neurons, employs a spatial light modulator (SLM) to create three-dimensional activation patterns in the brains of cranial window-model transgenic mice expressing the excitatory opsin, ReaChR, in mural cells (smooth muscle cells and pericytes) under control of the PDGFRβ promoter. The authors demonstrated that, by splitting a single 1040-nm stimulating beam into multiple beamlets using an SLM, this system is capable of optogenetically activating ReaChR at discrete depths in the neocortex, depolarizing mural cells and producing highly localized constrictions in targeted, individual microvessels. Using this system to investigate the kinetics of optogenetic-induced contraction and sensory-evoked dilation, the authors found that the onset of optogenetically evoked contraction was much more rapid than that of sensory-evoked dilation, concluding that the observed lag between sensory stimulation and vascular response does not reflect intrinsic limitations of mural cell contractile mechanisms but is instead attributable to the time course of neurovascular coupling mechanisms. They further found that by titrating the stimulation duration they could completely negate the vasodilatory response to a concurrent sensory stimulus.

      1) The red-shifted opsin, ReaChR, represents an improvement over opsins used in previously described 3D neuronal activation/monitoring systems. In particular, brief single-photon stimulation (100 ms) of ReaChR led to rapid, robust arteriole constrictions throughout the activation volume, whereas a previous generation ChR2 opsin required stimulation for seconds to achieve slowly appearing constrictions.

      Thank you for pointing out this key takeaway from our manuscript. In Figure 9 of the revised manuscript, we provide a comparison of ReaChR-induced vasoconstriction, with data previously collected across microvascular zones using line-scanning in ChR2-expressing mice. These data show how ReaChR produces faster and more potent vasoconstriction in alpha-SMA expressing SMCs and ensheathing pericytes, but has similar effects on the slow contraction with capillary pericytes.

      2) Single-photon stimulation was capable of completely stopping blood flow in a "first order pre-capillary branch". (Not clear what is meant by the phrase "pre-capillary branch"; anatomically, penetrating arterioles feed capillary branches.) While this speaks to the effectiveness of the method, it also highlights potential supraphysiological effects of stimulation and the importance of titrating stimulus intensity/duration to achieve physiologically meaningful responses.

      We have removed the term “pre-capillary” to avoid causing confusion, and now use the term arteriole-capillary transition to denote the alpha-SMA positive segment that lies between the penetrating arteriole (0th order) and the alpha-SMA low/negative capillaries (>4th order). The rationale for this terminology is provided in our new review (PMID: 34672718), which explains why the transitional zone should be considered a separate vessel type that is not arteriole and not capillary.

      We agree with the reviewer that titration of stimulation power/duration will be important and will depend on the application. We addressed this point by performing measurements of arteriole diameter with graded laser powers (Figures 5 & 7). There are many parameters to explore, but for the purposes of this manuscript, we clarify that the effect is titratable and that users should define physiological ranges in their specific circumstances, which may differ based on the experimental goals, age of mice, arteriolar size and vascular zone, and other factors.

      We also note that some applications may want to mimic pathophysiological levels of constriction, for example to mimic the effects of arterial vasospasm after subarachnoid hemorrhage, or ensheathing pericyte contraction with MCAo stroke (PMID: 26119027), or to examine the neural consequences of transient small vessel occlusion.

      3) In assessing effects of laser power, the authors assert that "increasing the laser power only slightly expanded the range of constriction". This seems a bit of an overstatement, given that increasing power (30-fold) had a greater effect on the spread (3x) than the magnitude (2x) of the response.

      Thank you for pointing this out. We have re-worded this section to avoid the overstatement and to emphasize the results more clearly on the spatial spread of constriction relative to laser power.

      The difference images in Figures 4B-C, G-H demonstrated that there was very limited spread of the constriction beyond the stimulation spots. We tested the effect of laser power on the spatial spread of constriction by stimulating with a broad range of power levels. We found that increasing the laser power led to a small increase in the spread of constriction. For example, a 30-fold increase in power (from 5 mW to 150 mW total power) led to ~3-fold increase in the spread of constriction (from ~25 µm to ~75 µm) (Figure 5A-H).

      4) The suggestion that penetrating brain arterioles possess a mechanism for upstream conduction of constrictive responses is intriguing (although this intrigue is tempered by the lack of experimental support for the operation of such a mechanism in the brain microvasculature).

      We are also intrigued by this hypothesis, which was supported by some evidence from a recent study of retinal vasculature. Kovacs-Oller et al. showed, using neurocytin tracer injections into capillary pericytes, that the pericytes are linked through gap junctions and that there is upstream directional diffusion of tracer. Further, they showed that electrical stimulation of a pericyte could lead to directional constriction from capillaries back to the arteriole in the retina (PMID: 32566247). The planar orientation of retinal vasculature makes this phenomenon easier to see. However, the 3D architecture of cortical vasculature is more challenging to study, particularly since the propagation along arterioles occurs along the Z axis, where spatiotemporal resolution of imaging is limited.

      Given our new data on the effects of laser power on axial spread (see reply to points 10-13 below) and the difficulty in separating active propagation from out-of-focus activation, we think there is not sufficient evidence to claim that penetrating arterioles are propagating the signal through some active process. Further experiments, including studies of the mechanisms involved, will be needed to address this hypothesis. Therefore, we have removed any discussion of potential propagation of the signal, and instead focus on the relationship between laser power and axial resolution of activation.

      5) The authors' premise for comparing contractile kinetics with sensory-evoked kinetics is flawed. In attempting to use the kinetics of optogenetic-induced constriction to infer something about the kinetics of sensory-evoked dilation, they are implicitly assuming that the kinetics of contraction and dilation processes intrinsic to mural cells are the same. This is highlighted by their use of the phrase "kinetics of the vasculature", which elides the possibility that dilation and contraction kinetics intrinsic to mural cells are different. Support for this latter possibility is provided by a previous report on renal afferent arterioles showing that the kinetics of myogenic constriction in arterioles are "substantially faster" than those of dilation (PMID: 24173354). Thus, their data do not rule out the possibility that the delay between sensory stimulation and vascular response reflects a slower intrinsic dilatory response rather than the time course of neurovascular coupling mechanisms. Furthermore, arterioles have an internal elastic lamina (IEL), which also determines the rates and degree of constriction and dilation. The IEL ends with the arterioles, and vessels with ensheathing contractile pericytes (and downstream) lack the constraints of the IEL.

      We thank the reviewer for this constructive critique. We agree that there are many issues in comparing kinetics between sensory evoked dilation and our optogenetic constriction. We have re-worded this section to avoid any mechanistic implications in the discussion of the kinetics of the different processes. However, we wish to still incorporate the details about the rapid kinetics of constriction to highlight the utility of the approach to intervene/perturb sensory-evoked responses, given that contraction can be titrated and precisely timed. We discuss the utility of this approach further below.

      6) It's not at all clear how overriding sensory-evoked dilation with optogenetically generated constriction provides a means for distinguishing neural activity from vascular responses. In particular, it is not clear how performing this maneuver while monitoring neuronal activity can provide the suggested insight into "aspects" of functional hyperemia that are essential to neuronal function beyond the relatively trivial observation that there is a point at which blood flow is too low to support continued neuronal activity.

      Thank you for raising this point. We have added more detail to our thoughts on why over-riding functional hyperemia could provide insight into the dependence of neural activity on the blood flow increase. Neural circuits are extremely complex with many different sub-types of neurons playing different roles. These subtypes have been shown to have different metabolic sensitivities and thus, may be differentially affected by blocking functional hyperemia (PMID: 26284893). This could lead to altered circuit activity which could have profound consequences for neural processing. Additionally, the energy budgets of different cellular functions within neurons are quite different (PMID: 22434069) and reducing available energy by blocking functional hyperemia could lead to differing degrees of dysfunction across important cellular processes (e.g. re-establishing the membrane potential, recycling neurotransmitters) which could again have important consequences for neural coding. Furthermore, it has been shown that there is a steep gradient of oxygen moving away from penetrating arterioles, and so neurons at greater distances from vessels may be differentially affected by blocking the hyperemic response (PMID: 21940458).

      7) With the exception of vasculo-neural coupling, where it would be the method of choice, the technology described leaves the impression of a capability in search of an application. That said, the ability to control blood flow to the point of completely stopping it may ultimately have applications in pathological settings.

      In addition to our response above on the utility of over-riding arteriole dilation during functional hyperemia, we have added to the discussion more potential uses of the technique. These include: (1) To be able to manipulate blood flow without using pharmacology or having to induce neural activity could be useful for a variety of studies involving intrinsic reactivity and compliance of vessels in both health and disease. (2) The different microvascular zones have distinct contractile kinetics. There are details that remain unstudied, such as the kinetics of different sized vessels, their location in the network, their identity as collateral arterioles or pial arterioles. Vascular optogenetics can dissect the contractile characteristics of different vessel types, similar to probing a circuit board. (3) Studies of the physiological significance of vasomotion, with respect to brain clearance of metabolic waste products. Being able to directly drive vasomotion and alter its amplitude and frequency will be an important tool for studies in this field. (4) Functional hyperemia is also impaired in many diseases, but this dysfunction could arise from impaired activity of neurons, astrocytes, or vessels. Therefore, a method to disentangle specific changes to blood vessels in vivo could be useful for understanding the vascular contributions to such diseases.

      Reviewer #2 (Public Review):

      The manuscript by O'Herron et al. describes a new technique for all-optical interrogation of the vasculature in vivo. They expressed optogenetic actuator ReaChR in vascular smooth muscle. They activated ReaChR using single-photon or 2-photon absorption. In both cases, they observed rapid and reversible constriction (presumably, due to Ca increase). Single-photon activation produced widespread constriction; two-photon activation allowed targeting of individual vessels. Using a commercial 2-photon system with a spatial light modulator on the photoactivation 1040-nm beam, they demonstrated localized constriction at multiple points along the small and large cerebral arterioles at once targeted by individual beamlets. Overall, this is a very interesting paper that clearly lays out the methodology and experimental design and carefully considers a number of potential limitations and pitfalls. This paper will serve as a valuable resource for a large community of eLife readers interested in cerebrovascular physiology in health and disease as well as in neurovascular coupling and interpretation of noninvasive imaging.

      Given the chronic nature of the optical window, it is not clear why imaging was done under anesthesia. This point requires explanation. There is a concern that targeting of the vessel wall is not possible in awake animals due to brain motion. If so, that would be a serious limitation of the methodology.

      To ensure that our method is compatible with awake experiments, we have added awake data to the manuscript (Figure 10). We show that individual vessels can be independently targeted in the awake animal and the outcomes are not profoundly different than in the anesthetized state. As with all awake experiments, due diligence must be taken to ensure the preparation is as stable as possible, and the occasional trial may have to be removed if motion artifacts are too large.

      Reviewer #3 (Public Review):

      Strengths: In the vascular field, previous implementation of optogenetics to constrict and dilate blood vessels, has used either single photon full field and fiber illumination, or alternatively confocal and 2-photon scanning of individual vascular segments with raster scanning. The former is limited in spatial precision, activating multiple vessels over a large area, whereas raster scanning is not ideal for accumulating currents and often results in slow temporal precision. Spatial light modulator (SLM) generated diffraction patterns to achieve patterned illumination have become increasingly used in neuroscience to achieve reliable 2-photon activation of targeted neuron populations. Here the authors use this technology to depolarize and constrict smooth muscle cells in vivo. By imaging and stimulating with 2 laser lines and different optical paths they are able to stimulate opsin expressing cells and image simultaneously, which is advantageous. By using the Red-shifted opsin ReaChR for their experiments, it is possible to combine this approach (cautiously) with imaging many of the classically used 2-photon fluorophores and genetic indicators, with excitation spectrums <1040nm. Future work using variations of the technique is likely to gain valuable insight into neurovascular biology.

      Weaknesses: A major limitation of the current study is that although the authors achieve high spatial precision of ReaChR activation in the xy plane, the axial precision appears extremely poor compared to what would have been expected. For example, in Fig. 5-1 (using a 0.8NA, 16x objective), the authors achieve equivalent levels of surface arteriole constriction even when the SLM is focused 200um above the brain, and even larger constrictions as they initially move the focus away from the imaging plane. Although the axial spatial resolution appears better with the 1.1NA - 25X objective, such a large point spread function largely limits the utility of the technique, as there will always be a concern as to whether the effects are spatially specific and not due to activation of vascular cells above and/or below the site of interest. This experiment that the authors have presented on axial precision is extremely important as it outlines a very important limitation of the technique (which is likely power dependent), but it remains to be completely characterized and understood. One possibility is that the power levels used by the authors are already above saturation, a problem raised by Rickgauer and Tank (2009; PMID: 19706471), and therefore they may be able to refine the axial precision by using lower power. Further controls would be valuable to understand the precise cause of this large axial spread as it doesn't quite add up with the diameter of the bleach spot shown in figure 5-1D (some suggestions outlined in recommendations to the authors).

      We agree with the reviewers on this point. We conducted several new experiments to help elucidate the limits of axial resolution. First, we have dropped the comparison between objectives with different NAs. This comparison led to unnecessary confusion, and it is common knowledge that lower NA objectives will have poorer resolution in the axial plane. We now mention this as a factor to consider, but have removed it from the figures. Second, we have shown, as the reviewer suggests below, that the stimulation power used has a dramatic effect on the axial spread of constriction (Figure 6E and Figure 7). Low powers indeed show a narrower axial spread. However, we typically use higher powers (near or above 100 mW) to generate large constrictions in penetrating arteries, and we also include these levels to show the greater axial spread they cause. In summary, we confirm with lower powers the 3D precision of the two-photon optogenetic technique, and we show that higher powers can be used to broadly constrict penetrating arterioles for studies seeking to modulate blood flow in columns of cortical tissue supplied by penetrating arterioles.

      Regarding the stated inconsistency with the bleached spots, we think this mostly has to do with the difference between photo-bleaching fluorescent material (requiring lots of laser power) and photo-activating opsin channels (which can be done with much less power for very sensitive opsins). Additionally, the slide we bleached is optimally activated at ~800nm and so our 1040 nm stimulation required enormous power to burn the spot.

      The current version of the paper also lacks adequate quantification of the results as it is composed primarily of representative examples, which limits a proper assessment of reproducibility and variability of the effects.

      We agree that showing population averages will be more informative to the field. In the original submission, we showed mostly examples because the large parameter space (size and number of spots, position on vessels, duration and intensity of stimulation; if a stimulation train, the duration, number, and inter-pulse interval of stimulation) was explored in the early data rather than picking one set of conditions. However, we have now collected new data where parameters were typically the same and included population average plots in the figures that previously had only individual examples (Figures 2G,I, 4I,M, 4-1C, 5I, 6E,F, 7, 11-2) as well as the new data (Figures 8, 9, 10).

    1. Author Response

      Reviewer #1 (Public Review):

      LaRue, Linder and colleagues present an automation (GLO-Bot) and analysis pipeline building on the previously developed GLO-Roots, which makes use of a constitutively expressed luciferase gene to image plant roots in thin soil containers (rhizotrons). After validation of the system using a set of 6 accessions, the authors then take advantage of the increased throughput to phenotype root system architecture (RSA) of 93 natural Arabidopsis accessions and perform genome-wide association to identify polymorphic genomic regions that are associated with specific RSA traits. I appreciate that the authors made all data available via zenodo.

      The authors succeeded in automating the GLO-Roots system. Overall, the GLO-Bot appears to be a nice platform to collect time-lapse images of root growth in soil-substrate using rhizotrons. The automation of the GLO-Roots system using the GLO-Bot is well described, although not in sufficient detail to be rebuilt by interested researchers, e.g. the software controlling the robot is not described or made available, precluding wide adoption of the method. The image processing pipeline is clearly described in the methods and in Figure 2. The pipeline is open source and available for use and appears to work well overall, although in some cases the vector representation of the root system appears to be incomplete.

      We thank reviewer #1 for raising these concerns. We have now made the general code for the software available (GitHub: https://github.com/rhizolab/rhizo-server). In addition, we uploaded the rhizotron laser cutting files (Zenodo DOI: https://doi.org/10.5281/zenodo.6694558) that would facilitate rebuilding the robot.

      We understand the concerns about the vector representations of the root system.

      These root system structures visible on the GLO-Bot images are indeed disconnected in many locations, due to variability in the reporter’s intensity and obstruction of the light path by soil particles. For traits like root angle, the disconnected nature of the root system is much less impactful as this method naturally uses “segments” of the root as individual elements for angle measurements.

      The authors then present a quantitative analysis of RSA using a set of 93 accessions, with 6 replicates per accession, generating a large dataset on the diversity of RSA in Arabidopsis. Using average angle per day, the authors identify SNPs that significantly associated with angle at 28 days after sowing, and they describe a correlation between this trait and the mean diurnal temperature range at the site where the accession was originally collected. The main weakness of the manuscript in its current form are some details of the quantitative genetic analysis. In my opinion the quantitative genetic analysis would benefit from additional quality control as there are peculiarities in the dataset that was used as the basis for GWAS.

      We understand the concerns from reviewer #1 about the quantitative genetic analysis. Ultimately, we performed the analyses in the way we explained in the paper with careful consideration. We have added descriptions of the rationale for choosing certain methods, which we hope clarify why we did the analyses in the way we did. We hope this paper serves as a resource for others to pursue additional studies on traits relevant to their research.

      Reviewer #2 (Public Review):

      Therese LaRue and colleagues have developed a second generation of the GLO-Roots system that had been developed in their lab and published in 2015. Importantly, the new system (GLO-Bot) and the analysis of the resulting images has now been largely automated and therefore provides a throughput allowing for genetic studies. In an impressive endeavor the authors have transformed more than 100 diverse accessions that had been selected using sensible criteria with the luciferase construct, which then allowed the RSA of these accessions to be measured using the GLO-Bot system. On a set of 6 diverse accessions, the authors carefully identify meaningful RSA traits that they then quantified in the accessions of a larger panel of almost 100 accessions. They also benchmarked the new imaging processing tools against gold-standard manual tools. Overall, they show that the data acquisition and analysis is reproducible and reasonably accurate. They then proceeded to conduct GWAS using the RSA traits and identified several significantly associated candidate SNPs. Finally, they correlated the RSA with environmental variables and found interesting correlations that are consistent with prior studies.

      Strengths:

      The manuscript presents interesting root phenotyping technology, a comprehensive atlas of RSA under rhizotron lab conditions in Arabidopsis, candidate genes potentially underlying RSA traits, and interesting associations of RSA and climate variables. This will be inspiring and useful to many other researchers and has the potential to be explored further in future studies.

      We thank the reviewer for the encouraging feedback.

      Weaknesses:

      Some aspects of the data analyses are not well described and should be described more. The trait data is heavily processed to "breeding values" and it is a bit unclear when unprocessed and processed trait data is used and why. Also, limitations and caveats are not discussed sufficiently. For instance, the authors should present and discuss the issues and caveats of measuring RSA that was generated in thin and not very wide soil sheets using the GLO-Bot system when natural growth in soil is usually largely unconstrained. Moreover, the analysis of potential candidate genes from the GWAS is not very well developed. Finally, the trait data was not available with the manuscript and a major impact of a resource like this will come from the data being fully available to the community.

      We appreciate the broad comments on the manuscript and have tried to address them through the specific responses below. Overall we believe the approaches we used are effective but with specific caveats and have used the revision as a means of better communicating the limitations of the approaches chosen.

      Reviewer #3 (Public Review):

      The authors provide a thorough description of a method to transform plants to be bioluminescent upon application of the required substrate such that roots are visible on the windows of rhizoboxes. They have expanded on previous work by automating the imaging process with a robot that moves rhizoboxes to an imager where images are captured. They have improved the image analysis pipeline to be mostly automated with a user presumably needed to run various scripts in batch mode on directories of images. One novel aspect of the image analysis pipeline is in using image subtraction to subtract the previous time root system from the current in order to identify new growth.

      We thank the reviewer for highlighting the strengths of the manuscript.
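
      The image-subtraction idea the reviewer mentions can be illustrated with a minimal sketch. It assumes binary root masks stored as nested lists; the function name `new_growth` and the toy masks are hypothetical and are not taken from the GLO-Roots pipeline, which may handle intensity thresholds and image registration differently.

```python
def new_growth(previous, current):
    """Return a mask of pixels that are root in `current` but not in `previous`.

    Both inputs are equally sized binary masks (1 = root, 0 = background).
    This is an illustrative assumption, not the authors' implementation.
    """
    return [
        [1 if cur and not prev else 0 for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# Toy masks for two consecutive imaging days (hypothetical data).
day_1 = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
day_2 = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

growth = new_growth(day_1, day_2)
new_pixels = sum(sum(row) for row in growth)
```

      In this toy case only the three pixels added on the second day remain in `growth`; root pixels already present on the first day are removed.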

      Overall, I think the authors provide a great amount of detail in parts needed and the methods, but some recommendations to increase reproducibility are more information about actual root traits measured. For example, one concern would be if root length is only summing pixels without considering diagonal pixels having a length of square-root of two, sqrt(2).

      This is a valid concern. Rather than just summing the pixels, the length of the segments is actually calculated using the “Feret Diameter” (or caliper length) function in ImageJ, which does take diagonals into consideration.
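
      To make the difference concrete, the following minimal sketch compares a naive pixel count with a caliper-style length for a purely diagonal segment. It treats pixels as points at their centers, which is a simplifying assumption; ImageJ's own Feret computation may differ in detail, and the function names here are hypothetical.

```python
import math
from itertools import combinations


def pixel_count_length(pixels):
    """Naive length: one unit per pixel, regardless of direction."""
    return len(pixels)


def caliper_length(pixels):
    """Largest distance between any two pixel centers (a Feret-like measure)."""
    return max((math.dist(a, b) for a, b in combinations(pixels, 2)), default=0.0)


# A five-pixel segment running along the diagonal (hypothetical data).
diagonal = [(i, i) for i in range(5)]

naive = pixel_count_length(diagonal)
caliper = caliper_length(diagonal)
```

      Here the endpoints are four diagonal steps apart, so the caliper-style length is 4·sqrt(2), whereas the naive count simply returns the number of pixels and takes no account of direction.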

      While the methodological aspects of the paper are compelling, the authors have furthered the significance through a biological application for genetic analysis among accessions of Arabidopsis and correlating root traits to climatic 'envirotypes' or data from the origin site of the respective accession. This genetic analysis would be furthered by greater consideration of time series analysis and multi-trait analysis, which is possible in GEMMA. The authors could consider genetic analysis of the PCA traits as well. Given the novelty of this type of time-series, multi-trait data - the authors can reach further here.

      Absolutely, PCA approaches to disentangle the phenotype space would be highly interesting to further investigate, which we started in the Supplemental Figure 8. This figure decomposes all the data points including replicates and temporal values of the same replicate. The PC1 therefore mostly captures how plants change over time, while PC2 seems to capture the main trade-off of wide/horizontal vs deep/vertical root architectures that we describe throughout the text. We could make use of this PC space to quantify the average value per genotype in PC2 and utilize this value for GWA, although it is not obvious how replicated and temporal measurements behave in PCA and what would be its consequences when computing a genotype value. There will definitely be interesting work that we aim to pursue in this direction in the future.

      Regarding the additional capabilities of GEMMA, we are not aware of a subtool that is able to analyze time series directly in GEMMA, but we will look into it. The multi-trait analysis in GEMMA is also interesting. We have utilized the multi-trait feature in the past, but this is limited to very few traits. We have 8 time points, thus 8 traits. For reference, when we have run multi-trait LMM with 2 traits, we have typically seen runtimes of ~9 days in large clusters. New tools continue to emerge in the field of quantitative genetics, such as the use of summary statistics of multiple GWAs to gain new insights, which we will pursue in the future. We have added possible future directions to the discussion section (page 14).

      As far as the general structure of the manuscript, I struggled with the results mixing in the methods such that I was never sure if the lack of detail in methods there would be addressed later, along with the mixture of discussions. Perhaps these are personal choices, but the methods were also after supplemental. I simply ask the authors to consider the reader here by being honest with my own experience reading this manuscript.

      We appreciate this comment of reviewer #3. Since this is a “Tools and Resources” article, we believe that a substantial part of the results section should include the methods that were applied. The methodology mentioned in the results section should always help the reader to understand the illustrated results in the figures. If readers would like to apply certain methods, however, more details can be found in the materials and methods section. We apologize if this was not always successful and led to confusion. In the final formatted version, all supplemental figures would be linked to the main figures so that the materials and methods section would follow the discussion.

      Overall, I believe this manuscript advanced root phenotyping by providing relatively high-throughput (imaging is slow due to the long exposure times) data and doing the time-series, multi-trait genetic mapping. The authors mention imaging shoots but no data is presented - presumably, it would be interesting to tie that in but there may be reasons not to. The authors could also discuss more the advantages of this approach relative to color imaging that has also advanced significantly since the original GLO-Root paper was released. Last, I am not sure the description of the 6 accessions study adds much value to the paper, and probably many other preliminary studies were done to prototype. Overall, this is fantastic and substantial work presented in a compelling way.

      Unfortunately, the shoot images that were taken did not have sufficient quality for further analysis and due to technical problems, the set of shoot images is not complete. We removed the part of shoot imaging from the text. It now reads: “Inside the imaging system, the rhizotrons were rotated using a Lambda 10-3 Optical Filter Changer (Sutter Instrument®, Novato, CA). If it was the first imaging day or a designated luciferin day (every six days), GLO-Bot added 50 mL of 300 μM D-luciferin (Biosynth International Inc., Itasca, IL) to the top of each rhizotron immediately before loading the rhizotron into the imager.”

      The advantage of the GLO-Roots method over color imaging is that it can capture a more complete image of root systems with finer roots (like Arabidopsis). We have added the possibility of using RGB imaging for bigger root systems to the discussion section (page 13).

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, Ströh et al. characterize the kinetics of osmium tetroxide staining of soft mouse brain tissue samples, the first step in many protocols aimed to prepare samples for electron microscopy imaging. The authors used time-lapsed single-projection X-ray images of the sample immersed in the staining solution to monitor the staining process. They have then been able to not only accurately model osmium tetroxide diffusion in the tissue across time and depth, but also to compare the performance of osmium tetroxide to other commonly used first reagents: osmium reduced in potassium ferrocyanide and the same reduced osmium in formamide. Overall, they provide a clear insight on the kinetics of osmium diffusion in tissue - obeying a long-established quadratic law - while also providing clear insight on how osmium concentration in the sample rises above its concentration in the staining solution. Finally, the authors also manage to put in perspective the effects of osmium reduction on the osmium staining of the tissue. Their results showcase that osmium reduction triggers a washout of the osmium in the sample and not only counteracts an osmium-triggered sample expansion but also manages to reverse its sign, resulting in sample shrinkage and even leading to sample degradation if left for long periods of time (evident after several tens of hours).

      One minor weakness of the manuscript is that it does not characterize the presence of osmium in the tissue after the water washes that typically follow osmium staining. That would provide a valuable control for the interpretation of the potassium ferrocyanide-triggered osmium washout. Also, it would provide a valuable insight on the presence of bound osmium in the sample at the moment of starting the next staining step in the protocol, which would facilitate escalating the use of their approach to modularly optimize complex heavy metal soft tissue staining protocols consisting of multiple successive steps.

      We added Supplementary Figures 8 and 9a, which show the dynamics of heavy metal washout in double-distilled H2O for 22 hrs after 22 hrs of incubation in 2% buffered OsO4. First, the sample was quickly flushed with double-distilled H2O four times to completely remove any buffered OsO4 solution. Subsequently, the sample was immersed in double-distilled H2O for 22 hrs. The accumulation of heavy metals decreases by about 6% at a depth of 100-1200μm. Our hypothesis is that this can be explained by unbound osmium that diffuses out of the sample and by the slight expansion of the sample (Supplementary Figure 9b). But in contrast to the washout effect in K4[Fe(CN)6] (Supplementary Figure 12c), the reduction in heavy metal density appears to be small.

      Reviewer #2 (Public Review):

      The investigators studied kinetics of Osmium Tetroxide diffusion in large chemical fixed biological samples. So far it has never been monitored so accurately. The use of micro CT scan images gives good insight in what is happening inside tissue blocks. The technical designed approach and mathematical analysis of the data, result in achieving the goal of opening the black box of staining. Other labs might use this X-ray method to understand their -sometimes very specific- conventional electron microscopy sample preparation protocols.

      The data shows accumulation of OsO4 in 4mm brain tissue blocks. Quantification of absorption intensity proves a quadratic dependence of time and sample size. OsO4 shows a homogeneous distribution after 20h in contrast to reduced osmium, which resulted in a heterogeneous distribution and a high intensity band at 300-880 µm depth.
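
      The quadratic dependence can be made tangible with a short sketch. It assumes that staining time scales with the square of sample size and uses the 20 h and 4 mm figures mentioned above purely as an illustrative reference point; the function name is hypothetical and the numbers are not a protocol recommendation.

```python
def scaled_staining_time(reference_time_h, reference_size_mm, size_mm):
    """Scale a reference staining time assuming time is proportional to size squared."""
    return reference_time_h * (size_mm / reference_size_mm) ** 2


# Illustrative reference point (assumption): 20 h for a 4 mm sample.
time_8mm = scaled_staining_time(20.0, 4.0, 8.0)  # doubling the size
time_2mm = scaled_staining_time(20.0, 4.0, 2.0)  # halving the size
```

      Under this assumption, doubling the sample size quadruples the required time, and halving it cuts the time to a quarter.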

      Adding formamide to reduced Osmium gives a more homogeneous spreading but a side effect of long incubation with Formamide is 10-15% expansion of the tissue (reduced Osmium alone shrinks 5%, Osmium alone expands 5%).

      To overcome heterogeneous spreading of reduced osmium the reagents were separated: the 1st osmium step was followed by a 2nd reducing ferrocyanide step. Surprisingly this led to wash-out of Osmium from the sample and was therefore not useful.

      The authors used equations and simulations to develop a diffusion-reaction-advection model. Four coupled processes of diffusion, binding, unmasking and expansion are described to explain the staining reaction.
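
      As a rough illustration of how diffusion and binding can be coupled numerically, the sketch below steps a one-dimensional explicit finite-difference scheme. It is not the authors' four-process model: it omits unmasking and expansion, and every parameter value and name is a hypothetical choice made only so that the example runs stably.

```python
def simulate_staining(n_cells=30, steps=500, d=1.0, k=0.1, dx=1.0, dt=0.2):
    """Explicit 1D diffusion with first-order binding to a finite pool of sites.

    free[i]  : concentration of unbound stain in cell i (bath value is 1.0)
    bound[i] : fraction of binding sites occupied in cell i (0 to 1)
    All parameters are arbitrary illustrative values (assumptions).
    """
    free = [0.0] * n_cells
    bound = [0.0] * n_cells
    free[0] = 1.0  # cell 0 represents the staining bath, held constant
    r = d * dt / dx ** 2  # must stay at or below 0.5 for stability
    for _ in range(steps):
        new_free = free[:]
        for i in range(1, n_cells):
            # Mirror the neighbor at the far end to impose a no-flux boundary.
            right = free[i + 1] if i + 1 < n_cells else free[i - 1]
            laplacian = free[i - 1] - 2.0 * free[i] + right
            binding = k * free[i] * (1.0 - bound[i])
            new_free[i] = free[i] + r * laplacian - dt * binding
            bound[i] += dt * binding
        free = new_free
    return free, bound


free, bound = simulate_staining()
```

      With these settings, occupancy is higher near the bath than deep in the sample, which is the qualitative gradient such a model is meant to capture; quantitative agreement with measured data would require the fitted parameters and additional processes described in the manuscript.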

      The future goal of this paper is to set up an in-silico model which can be used for e.g. precious samples and predicts processes in different types of samples. A lot more work needs to be done to get that far though since many more steps are involved in the sample preparation for electron microscopy to get decent morphology. Variations of tissue, cells, species, protocols, imaging techniques are numerous. To create a "one fits all" model is very ambitious.

      Strengths:

      The results are very well documented and the use of micro CT to monitor chemical processes will be useful to other laboratories to better understand complex sample preparation steps. It will certainly be used by others to adapt their protocols to specific specimens.

      Experiments were done consistently and accurately.

      Both the introduction and discussion are supported by a thorough literature search, which builds a thorough reference for laboratories interested in sample preparation for electron microscopy.

      Of specific interest are the reported effect of the commonly used osmium mixes on the overall tissue topology and the unexpected shrinkage/swellings. These undoubtedly will raise awareness in the community and should trigger careful (re)consideration of established protocols.

      This work could represent a stepping stone for those laboratories studying the ultrastructure of large specimens, in particular (but not exclusively) the neurobiology community.

      Weaknesses:

      The study is done on brain tissue which is heterogeneous and might make the extrapolations quite imprecise when modeling. It might have been easier to work with a more homogeneous sample like liver.

      On the tissue type as well. It would be interesting to have some data from other tissue types in order to extract how different various tissues would behave in terms of osmium penetration. Yet this might be slightly beyond the scope of this article.

      The fitted parameters of the diffusion-reaction-advection model (Supplementary Figure 7) capture the measured diffusion-reaction-advection kinetics of osmium staining in brain tissue well (Figure 2c, Supplementary Figure 1; residual standard error SEres=0.026 ± 0.005, mean ± s.d.), indicating that the heterogeneity of the brain tissue is not a major constraint for the accuracy of the presented model. But we do agree with the reviewer that the diffusion/reaction-advection kinetics are likely different in different tissue types. To illustrate this, we measured and modeled the staining kinetics of 2% buffered OsO4 in 4mm punches of liver tissue (Supplementary Figure 13). Interestingly, the effective diffusion coefficient appears to be >4X larger in liver tissue compared to brain tissue.

      Whilst the penetration of osmium is very important to achieve a good preservation of tissues and a good and homogenous contrast for EM, this step is also very delicate, as it can lead to tissue damages, especially when the reaction is not controlled. It is known, for a long time, that osmium can cause precipitates, loss of components (e.g. cytoskeleton) or even tissue destruction. One way to mitigate this has been to perform the osmium fixation at low temperatures, e.g. on ice. Yet, the authors don't report the temperature at which they performed their experiment. It is assumed that they worked at room temperature, whatever it could be within the Versa. This should be documented.

      All staining and washing steps have indeed been done at room temperature. We added that information accordingly in the sample preparation section of the methods.

      Moreover, and in line with the previous comment, it seems very important, if not crucial for this study, to thoroughly document the effect of long term exposure to osmium on the tissue integrity, at the ultrastructural level. The authors should perform the full workflow, i.e. down to the EM analysis, not necessarily on the full time series but at least on key timepoints. Assessment of various key components, e.g. synaptic structure, myelin sheets integrity, visibility of organelles, microtubules etc. would be very important. Indeed, what would be the interest of a 20 hours incubation in osmium if this would lead to a loss of the fine subcellular organization?

      We added Supplementary Figure 14 showing the ultrastructural preservation after 20 hours of incubation in OsO4.

      Another point that might be interesting to investigate and report on would be the potential damages caused by X-ray irradiation over long time periods. Does this interfere with the stability of the osmium solution? Of the sample itself?

      This is a very important point which has to be kept in mind for every staining solution and procedure studied by X-rays. For the staining solutions presented here, we did not find any difference in the qualitative appearance of the samples or the staining solutions with or without X-ray exposure.

      I am not able to comment on the modeling part itself, but it seems that the diffusion-reaction-advection model is based on many assumptions e.g. tissue density, isotropic expansion, homogeneous diffusion medium. The validation on experimental brain sample looks convincing, but it would be interesting to check how these could be generalized to a larger spectrum of biological material.

      As the reviewer points out, the fitted parameters of the diffusion-reaction-advection model (Supplementary Figure 7) capture the measured diffusion-reaction-advection kinetics of osmium staining in brain tissue well (Figure 2c, Supplementary Figure 1; residual standard error SEres=0.026 ± 0.005, mean ± s.d.), indicating that the heterogeneity of the brain tissue is not a major constraint for the accuracy of the presented model. But we do agree with the reviewer that the diffusion/reaction-advection kinetics are likely different in different tissue types. To illustrate this, we measured and modeled the staining kinetics of 2% buffered OsO4 in 4mm punches of liver tissue (Supplementary Figure 13). Interestingly, the effective diffusion coefficient appears to be >4X larger in liver tissue compared to brain tissue.

      Reviewer #3 (Public Review):

      Ströh et al use time-lapse X-ray imaging to monitor the diffusion of heavy metal stains into large brain samples. Uniform staining of large (thicker than 1 mm) tissue samples is a prerequisite for future whole-brain 3D EM reconstructions of synaptic connectivity. Until now, staining optimization has essentially been achieved through trial and error. The reported approach allows the rapid measurement of staining gradients and the determination of diffusion rates within tissue specimens. This offers the possibility to modify staining parameters with a more rapid turn-around. The authors develop a diffusion/binding model to describe the occupancy of free and masked osmium binding sites and fit the model parameters to the diffusion of osmium solution. The authors also demonstrate that an approach that separates the osmium staining and reduction steps seems to counterintuitively 'washout' the osmium in the tissue.

      While the approach seems promising as a diagnostic tool and offers a principled approach to gaining a better understanding of staining processes, a weakness is the lack of a demonstration that the x-ray imaged staining gradients correlate with what is actually observed under the electron microscope. For example, the figures show that reduced osmium stains tissue with a maximum intensity of ~1.1 (a.u.) compared to osmium alone at ~0.9 (a.u.). Because these intensities are not calibrated against the appearance of the staining in EM sections, their interpretation is limited.

      We added Supplementary Figure 14 showing the ultrastructural preservation after 20 hours of incubation in OsO4 and Supplementary Figure 15 that shows the covariation of the EM and X-ray pixel intensities. Note, in the presented study we acquired X-ray projection images that represent the cumulative tissue and heavy metal density along the direction of projection through a 4 mm thick tissue punch. Therefore our X-ray projection pixel intensities cannot directly be compared to the EM pixel intensity of a thin section. As has been shown previously (see for example Figure 4b in Mikula & Denk 2015), in computed X-ray tomograms the intensity scales linearly with EM intensity, if the pixel intensity in an EM section is compared to the corresponding reslice of a computed X-ray tomogram.

    1. Author Response

      Reviewer #1 (Public Review):

      Neutrophil extracellular traps (NETs) are defined as structures containing extracellular DNA co-localizing with granule-derived proteins, such as neutrophil elastase, and histones. While in in vitro assays a variety of protocols have been described to unambiguously detect and quantify NETs, in ex vivo tissue samples, quantification and demarcation of NETs from the remnants of other forms of neutrophil cell death such as necrosis is still challenging. The current manuscript by Tilley and colleagues describes a novel tool to perform that important task. The authors have discovered that human histone H3 is processed by serine proteases at a specific cleavage site during NET formation. They created a mouse monoclonal antibody to this cleaved histone H3 and assessed its performance as a tool to detect NETs in vitro and ex vivo.

      The paper is well-structured and written, thus presenting a valuable contribution to the field. There are some open issues with the manuscript which are not clear at this point:

      1. One major point is the dynamics of when this clipping occurs and whether it occurs extra- or intracellularly. The authors have used a serine protease inhibitor, AEBSF, which not only inhibits histone clipping but also NET formation and nuclear decondensation itself. I am therefore not sure if the conclusion can be drawn that histone H3 clipping is an intracellular event and "serine proteases cleave the N-terminus of H3 early during NET formation." This is also in open conflict with the study by Pieterse et al. (Ann Rheum Dis 2018) who demonstrated prevention of histone clipping by serine protease inhibitors working exclusively outside the cell.

      The reviewer raises an intriguing biological question. It will be very interesting to determine if the H3R49 clipping is happening intracellularly or extracellularly. This specific H3R49 cleavage does not appear until 120 min; however, larger cleavage products appear after 30 min PMA stimulation by western blot – Fig 1A and 3A – at time points in which we know the cell membrane is not permeable as determined by Sytox staining. However, we have not inferred from this that all cleavage is occurring intracellularly. Indeed, the kinetics of the appearance of the smaller 10kDa fragment, recognised by 3D9, suggests this specific cleavage likely occurs after permeabilization of the membrane, which is in line with the findings of Pieterse et al., that cleavage happens after lysis or permeabilization of the membrane. However, some cleavage may also happen intracellularly. A point to note is that, in their paper, they use a histone H3 antibody (#34) directed to residues 29-32, which includes a cleavage site we putatively identify in this study (Table 1): H3T32 – derived from the detection of the TGGVK peptide in Edman degradation. This site and the epitope of the Pieterse anti-H3 (#34) are N-terminal to the cleavage site H3R49. Thus, when cleaved at H3R49, a histone fragment may be released that contains this epitope. Using necrostatin, a reported inhibitor of membrane permeabilisation, Pieterse et al. show that the chromatin decondenses but is not externalised and this chromatin stains strongly for the histone H3 tail region (a.a. 29-32). They interpret this as the tails not being cleaved. However, we present an alternative hypothesis: the tails are cleaved, allowing chromatin decondensation, but the fragments remain concentrated in the cell as they cannot be dispersed through cell membrane permeabilization. A western blot of necrostatin-treated cells may shed light on whether cleavage is really not happening in the absence of membrane permeabilisation.
      One advantage of our cleavage site specific antibody is that it does not recognise free or cleaved tails, but the remaining histone that is still associated with the chromatin or NET. Furthermore, the epitope is not present if the histone has not been cleaved at H3R49. We hope that it will be a useful tool for the community in investigating the different molecular mechanisms at play during NET formation.

      We agree that we have not tied the actual proteolytic event to the serine proteases; however, we can say that their activity is needed for the events leading to histone H3 cleavage. Thus, we have modified our finding to reflect this. Further research is needed to explore which proteases, serine or otherwise, are mediating this cleavage event at H3R49.

      Line 98-100 “This data shows that the N-terminus of H3 is cleaved early during NET formation and that this event is dependent on serine protease activity.”

      1. Along these lines it is also confusing that the staining by an anti-citH3 Ab and 3D9 seems to be mutually exclusive. The authors mention this in the discussion and explain that 3D9 "may display a preference for more mature or proteolytically processed NETs." This seems to be hard to align with their claim that H3 is clipped early during NET formation. It would be important to show if citH3+ NETs progress into 3D9+ NETs. If this is not the case, that would potentially render a large part of the literature that has used citH3 staining for the detection of "NETs" useless.

      We are grateful to the reviewer for highlighting this confusion and we have made changes to the discussion to clarify our conclusions. We have now made a clearer distinction between general histone H3 cleavage and the specific cleavage site detected by 3D9 at H3R49. Histone clipping/cleavage is a process that occurs during NET formation, at various sites and time points. At later time points, the H3R49 site is cleaved and the antibody recognises this later cleavage.

      Line 267-270 Using a biochemical and proteomic approach, we determined that H3 is cleaved at multiple sites in its N-terminal tail during the course of NET formation and notably, at a novel cleavage site in its globular domain, H3R49, at ~120 min.

      Line 292-294 Thus, we propose that 3D9 will allow broad detection of NETs induced by varied stimuli but that it may display a preference for more mature or proteolytically processed NETs or NETs that are citrullinated to a lesser degree or not at all.

      We have also included a time course of H3cit and 3D9 staining with PMA demonstrating that, while most cells stain only for 3D9, a small percentage of cells stain for H3cit and an even smaller percentage are double positive (Figure 11-figure supplement 2). This suggests that, at least with PMA, a NOX dependent stimulus, citrullination does not commonly precede histone cleavage, although it may occur. Progression from citrullination to histone cleavage may be more common for NOX independent stimuli. Indeed, Nigericin also induces citrullination (90 min) but citrullination inhibitors do not affect the level of NET production (Kenny et al 2017, Elife. 2017, 6:e24437, Figure 6) and in the current manuscript we show that Nigericin produces NETs that are detectable with 3D9 and thus contain cleaved H3 at 2.5h. To us this suggests that the events of citrullination and histone cleavage are not mutually exclusive but that the hallmarks that remain once the NET is formed can be exclusive, specifically when looking at H3cit R2,8,17 and cleavage at H3R49. It would be very interesting to look at other citrullination markers, e.g. on H4, with a wide range of stimuli to see if the evidence of citrullination is more ‘long lived’ on a histone that is less susceptible to proteolysis, but this may be examined in future research.

      1. Non-suicidal pathways of NET formation were described, where parts of the nucleus are extruded but the cell remains intact and basal cellular functions of neutrophils are still carried out. These "vital NETs" are not addressed in the manuscript.

      The phenomenon of “vital NETs” and whether histone H3 cleavage occurs during this process would be a very interesting question to explore but, unfortunately, we are not yet able to investigate this using this antibody. The occurrence of vital NETs needs to be observed with live cell imaging studies (Yipp et al., 2013, doi: 10.1182/blood-2013-04-457671). However, our antibody selection method was optimised for use with fixed or denatured human neutrophil samples only.

      We have addressed this limitation and its consequences for choice of assay in the discussion in the new manuscript.

      Line 308-319

      “In this study we detect NETs in fixed or denatured human samples from in vitro experiments and histological samples……. thus, care should be taken in the design of future assays and selection of sample when detecting cleaved H3, NETs or vital NETs under native and mild detergent conditions. In particular, 3D9 is not suitable for direct detection of NETs in complex biological fluids and a sandwich approach or colocalization with a neutrophil granule protein is critical.”

      1. The authors show that neutrophils stimulated with C. albicans released NETs not bound by 3D9, which remains unexplained.

      For this study we use the histological definition of NETs as extracellular chromatin decorated with neutrophil granule proteins. Thus, we can describe these visible structures as NETs as they stain for DNA and neutrophil elastase. It is possible that they are citrullinated NETs, as observed by Kenny et al (Elife. 2017, 6:e24437, Figure 6). However, we cannot exclude that these are remnants of other forms of neutrophil cell death and this remains a problem faced by the entire field as recently reported by Boeltz et al., 2019 (DOI:10.1038/s41418-018-0261-x).

      We have expanded on this observation in the discussion: line 282-285

      Like citrullination, not all NETs contain H3 cleaved at H3R49. With Candida albicans, some NET-like structures were not 3D9 positive (Figure 6). These may be remnants of other forms of cell death or they may be citrullinated NETs as has been shown by Kenny et al (2017).

      1. The authors suggest careful validation for cross reactivity in samples under native and mild detergent condition, e.g. in serum samples. It would be good if this validation be performed in the current study.

      This is an informative experiment for users of this antibody and we thank the reviewers for recommending its inclusion. In preparation for using the antibody with blood samples, we performed some preliminary experiments to address whether the antibody was suitable for a simple ELISA in the presence of plasma/serum. We have now added Figure 3-figure supplement 4 addressing this. In this experiment, we examined the ability of 3D9 to react with healthy donor plasma and serum alone by direct ELISA (Figure 3-figure supplement 4A; biological replicate n=1). 3D9 strongly reacted with plasma but not with serum. We also examined detection by 3D9 of western blotted serum and plasma proteins separated by SDS-PAGE under reducing and non-reducing conditions (Figure 3-figure supplement 4B) and found that 3D9 also detected proteins, of higher molecular weight than cleaved H3, in plasma but not in serum. We conclude that the direct detection of NETs by 3D9 in plasma containing samples, and thus whole blood, is not possible due to cross reaction with a plasma protein(s). Based on this we caution against the use of 3D9 alone in serum containing samples, as it is challenging to ensure all plasma proteins are removed, and instead we advocate for the use of sandwich and colocalization approaches.

      In the main text we have included the following lines 152-156

      However, when we performed preliminary experiments to see if the antibody had the potential to work in blood samples, we observed a strong reaction of the antibody with a plasma protein(s), but not with serum-protein(s) as determined by direct ELISA and western blot (Figure 3-figure supplement 4). Therefore, 3D9 is not suitable for direct detection of NETs in biological fluids that may contain plasma proteins.

      In the discussion Line 314-319

      Furthermore, a preliminary investigation revealed 3D9 reacts with a plasma protein(s) in ELISA and western blot - albeit of a higher molecular weight – (Figure 3-figure supplement 4A) and thus, care should be taken in the design of future assays and selection of sample when detecting cleaved H3, NETs or vital NETs under native and mild detergent conditions. In particular, 3D9 is not suitable for direct detection of NETs in complex biological fluids and a sandwich approach or colocalization with a neutrophil granule protein is critical.

      And appropriate text has been added to the methods section line 568-576

      The same approach was used to examine 3D9 interactions with immobilised serum and plasma proteins. Plasma was isolated from whole blood collected with S-Monovette sodium citrate tubes (Sarstedt). Whole blood was centrifuged at low speed to minimise cell lysis (150 × g, 20 min with no brake). Prostaglandin E1 (1 µM) was added to inhibit platelet activation and samples were further centrifuged at 650 × g (8 min) to collect cell free plasma and further centrifuged at 2000 × g (10 min) before being aliquoted and stored at -80 °C. Serum was isolated by collection of whole blood in S-Monovette serum tubes (silicate clotting activator) and incubation with gentle rotation for 30 min at RT before centrifugation at 2000 × g at 4 °C (10 min) and collection of serum.

      Reviewer #2 (Public Review):

      In this study, Tilley et al. identified cleavage of histone H3 at R49 (H3R49) as a candidate marker of NETs and generated a H3R49 cleavage site monoclonal antibody (termed 3D9) as a potential tool to detect NETs in human samples. The antibody was validated using both in vitro assays and human tissues. Using human neutrophils, they demonstrated that 3D9 detects NETs induced by both ROS-dependent (i.e. PMA, heme in TNF primed neutrophils and Candida albicans) and ROS-independent stimuli (i.e. using the toxin nigericin from Streptomyces hygroscopicus). To demonstrate the specificity of 3D9, they first showed that 3D9 distinguishes NETs from other activated leucocytes in PBMCs after stimulation with PMA or nigericin. These studies also found that the anti-chromatin antibody PL2.3, broadly used to detect NETs, is not specific for NETs as it also stains nuclei of activated PBMCs. Moreover, they showed that 3D9 distinguishes NETs from other forms of neutrophil death, including spontaneous apoptosis, necroptosis induced by TNFα stimulation in the presence of a SMAC, and necrosis induced by the staphylococcal toxin α-haemolysin. Interestingly, these studies also found that the PL2.3 antibody is not specific for NETs, but also stains apoptotic cells. The detection of apoptotic cells as well as activated PBMCs by PL2.3 importantly questions previous studies in which PL2.3 has been used to specifically detect NETs. Finally, they showed that 3D9 labels neutrophils in inflamed human tissues, including tonsil, kidney, appendix and gallbladder. However, colocalization of 3D9 with other anti-NET antibodies (PL2.3 and anti-citrullinated histone H3, H3cit) is not impressive and particularly poor for H3cit. Since 3D9 detects both ROS-dependent and ROS-independent NETs, the authors concluded that histone H3 cleavage at R49 is a general feature of human NET formation. Therefore, the authors propose that the antibody 3D9 is a new tool to detect and quantify NETs in human samples.

      Some conclusions of this paper are well supported by data. However, the conclusion that this novel antibody can detect any form of human NETs is not demonstrated. The study needs a better validation of 3D9 using broader NET-inducing stimuli relevant for human diseases. In addition, this study needs to confirm the specificity of the anti-NET antibodies in tissues. Thus, for some aspects of this work, some data need to be clarified and extended.

      1. To validate that 3D9 detects all forms of NETs, the study used well-described NET inducers that generate ROS-dependent and ROS-independent NETs. PMA and nigericin are very useful in this regard because these are potent stimuli that induce NETs using either pathway (ROS-dependent and ROS-independent, respectively). However, neither PMA nor nigericin are stimuli relevant for human pathology. In particular, S. hygroscopicus (the source of nigericin) is not a human pathogen. The inclusion of NETs induced by heme in TNF primed neutrophils (a stimulus relevant for NETs in malaria) and by C. albicans is certainly an important complement for the study of 3D9 in ROS-dependent NETs associated with human diseases. However, the study is missing the analysis of ROS-independent NETs induced by stimuli associated with human illnesses.

      We thank the reviewer for the recommendation to look at additional stimuli. We apologize, as we have noted a mistake in the manuscript designating the disease relevant stimulus heme as NOX dependent. This has been corrected (line 302) to “both NOX dependent (PMA) and NOX independent (heme;nigericin) stimuli result in NETs”. Heme induces NETs in CGD patients lacking a functioning NADPH oxidase complex but intracellular ROS scavengers inhibit heme induced NETs (Knackstedt et al 2019 DOI: 10.1126/sciimmunol.aaw0336). Thus, this stimulus is ROS dependent while being independent of the NOX induced ROS burst, with the ROS likely being induced by a chemical reaction with heme itself. Because of this possible confusion around the designation of ROS dependency, we categorise our stimuli based on their NOX (in)dependency.

      In this manuscript we tested two NOX dependent and two NOX independent stimuli: these are the mitogen PMA and infections with Candida albicans (NOX dependent), and heme plus TNF and nigericin (NOX independent). C. albicans is a medically relevant pathogen. Heme, a product abundant in malaria, and TNF are a model for cytokine activation in Plasmodium infections. Nigericin, while not yet used in humans, is used in veterinary medicine and is being examined as an anti-tumor agent for the treatment of human colon cancer among others (DOI 10.1158/1535-7163.MCT-17-0906). Both PMA and Nigericin are strong inducers of the two canonical pathways to NET formation – NOX dependent and independent respectively.

      In our manuscript we do not claim that 3D9 can detect all NETs. Indeed, we attempt to demonstrate, both with Candida albicans and with the tissue sections, that there can be a variety of NETs of different flavours both in vitro and ex vivo. Like H3cit, not all NETs are cleaved at H3R49. However, histone cleavage at this site is a common feature in all the stimuli we examined and thus we believe it can be broadly used to detect NETs from a wide range of stimuli – although, not all NETs generated may be detected by this antibody. We have expanded on the observation of 3D9 negative NETs in response to Candida albicans.

      line 282-285

      Like citrullination, not all NETs contain H3 cleaved at H3R49. With Candida albicans, some NET-like structures were not 3D9 positive (Figure 6). These may be remnants of other forms of cell death or they may be citrullinated NETs as has been shown by Kenny et al (2017).

      We removed the description that “histone H3 cleavage at R49 is a general feature of human NET formation” and replaced it with “histone H3 cleavage at R49 is a common feature in human NET formation” (line 304) to convey the regular occurrence of histone H3 cleavage in NETosis with wide ranging stimuli, rather than it being an always present feature of all NETs.

      1. The number of diseases and stimuli associated with NETs is growing every day and it is unlikely that only two pathways defined by artificial stimuli (i.e. PMA and nigericin or calcium ionophores) can cover all mechanisms activated in humans to induce NETs. In the host, the neutrophil-pathogen interface is more complex than PMA or nigericin. For example, toxin-free S. aureus is known to induce NETs (J Cell Biol 2007, 176:231-41), but toxins released by S. aureus are also potent inducers of necrosis. Which of these stimuli may dominate during infection with S. aureus is unclear, underscoring the complexity of correlating biochemical features found in well-controlled NETs induced in vitro with changes in neutrophils found in tissues from human diseases. It is understandable that for the initial validation of 3D9, it is not possible to cover all potential inducers of NETs. However, there are diseases in which NETs have had a major impact and created new paradigms. NETs associated with malaria and C. albicans are interesting, but only cover a fraction of NET-inducing stimuli within a subset of diseases (i.e. infectious diseases). Importantly, autoimmune diseases are certainly one of the major groups of diseases in which the study of NETs has had the highest impact. In some cases, NETs are considered the driving cause of these illnesses. The analysis of NETs induced by autoantibody-antigen immune complexes (specifically anti-RNP and rheumatoid factor) would be needed to increase confidence in the validation of 3D9.

      In our hands we have not been able to produce NETs using anti-RNP complexes but we accept this criticism of the limited stimuli we have used.

      We hope, through this publication, to make the antibody available to the research community to explore which types of NETs may be the aggressors or modulators in varying disease contexts and in doing so, as a community, we can assess the usefulness of 3D9 as we have done for H3cit antibodies.

      1. When comparing the specificity of antibodies used to detect NETs, the study should include a similar analysis of 3D9, PL2.3, and H3cit. This is significant to interpret their different patterns of staining in tissues. Figure 7 and Figure 8-figure supplement 1 are missing the analysis of H3cit.

      We thank the reviewers for their critical understanding of the field and their suggestion to add a comparison of staining with H3 cit to Figures 7 and 8. Including comparison stains with PL2.3 will always be useful in ensuring we are examining all areas of putative NETs, and staining for histone citrullination and/or histone H3R49 cleavage will add specificity as they detect processes that take place during NET formation. However, in this study, we did not set out to qualitatively compare H3cit as a surrogate marker of NETs to our new antibody. It is possible that other mechanisms of cell death may involve citrullination but our aim in this study was to characterise 3D9 staining behaviour in varied contexts - not H3cit behaviour, which is an ongoing area of debate in the research community - Boeltz et al., 2019 (DOI:10.1038/s41418-018-0261-x).

      Through our investigation into the precise site of the histone cleavage we determined that, by nature of the epitopes of 3D9 and the most commonly used Abcam H3cit R2, R8, R17 antibody, they cannot stain the same individual histone. Thus, in theory, examining the staining patterns of 3D9 and H3cit, side by side, as we have done for PL2.3, will not provide insight as to which NETs or modes of cell death involve citrullination – only whether H3cit is present or absent under the conditions of our experiment.

      The finding that individual neutrophils or NETs showed such ‘either or’ (H3cit or cleaved H3) characteristics in the tissue sections was surprising and intriguing, and this has led us to propose that 3D9 may be distinguishing different types of NETs, specifically those that are more proteolytically processed at the H3 N-terminus or, as a reviewer has suggested, those with different degrees of citrullination. However, this will need further investigation that goes beyond the scope of this manuscript and will need to examine other citrullination sites, e.g. H4cit or even pan-citrullination, to determine if 3D9 distinguishes NETs with more or less citrullination.

      What we have developed in this manuscript is an additional antibody, another tool in the arsenal of NET researchers, to look at NETs (that may or may not also have been citrullinated at H3 R2, 8 or 17 or at a different histone during the process of NET formation). It is another tool for detection. We examine H3cit, 3D9 and H2B co-staining in our final tissue section figures and in doing so demonstrate the diversity of NETs in tissues and highlight how it is important to follow the histological definition of NETs and not rely on a single marker. Individually each antibody has failings but researchers who make use of multiple methods to assess NETs can be confident of their assessment of NETs in their studies.

      1. Among the different forms of neutrophil death used to validate 3D9, the study should also include pyroptosis. This form of cell death shares some common effector pathways with NETs. It is therefore important to demonstrate that 3D9 can distinguish NETosis and pyroptosis.

      We thank the reviewer for this suggestion and agree that pyroptotic cell death would be interesting to examine in neutrophils with respect to distinguishing different forms of neutrophil cell death from NETosis with 3D9. However, the variation in neutrophil responses to pyroptotic stimuli would warrant a deeper investigation beyond the scope of this manuscript. Unlike macrophages, human neutrophils are largely resistant to pyroptotic cell death when inflammasome pathways are activated (Chen et al 2014; Chen et al 2018; Karmakar et al 2020; Kovacs et al, Cell Rep. 2020 Jul 28; 32(4): 107967). Stimulation or infection of neutrophils with macrophage pyroptotic stimuli (intracellular pathogens or pathogen signals within the cytosol) elicits some of the characteristics of pyroptosis but not the typical cell morphology. In macrophages, pyroptosis is preceded by inflammasome (canonical and non-canonical) activation, resulting in activation of specific caspases, caspase mediated activation of Gasdermin D and assembly of a gasdermin pore in the plasma membrane. This leads to disruption of osmotic regulation, rapid cell death and the concomitant release of cytokines including IL-1b/IL-18 that are matured through caspase activity. In contrast, in neutrophils, inflammasome activation by infection with intracellular pathogens can produce diverse outcomes in terms of cell death depending on the specific intracellular pathogen or signal. While Salmonella triggers release of caspase 1 activated IL-1beta from neutrophils (Chen et al 2014, https://doi.org/10.1016/j.celrep.2014.06.028), the neutrophils do not undergo pyroptosis. Infection with Citrobacter rodentium triggers release of IL-1beta but the neutrophils go on to produce NETs in a manner that is inflammasome driven and caspase dependent (Chen et al 2018, Science Immunology, https://immunology.sciencemag.org/content/3/26/eaar6676.long).
Most recently, Kovacs et al have shown that instead of oligomerising at the plasma membrane, Gasdermin D forms pores in azurophilic granules and autophagolysosomes and that IL-1b release involves autophagy machinery. Thus, given the established variability in neutrophil pyroptotic-like responses and their inherent resistance to undergo typical pyroptotic cell death, it is unclear what value examining an individual pyroptotic stimulus will add in the context of characterising 3D9 staining of NETs. Indeed, in Fig 6 we use Nigericin as a stimulus to induce NETs which are detected by 3D9; it is a known activator of the inflammasome pathway and pyroptosis in macrophages but produces NETs in neutrophils.

      To make it clear that we have not exhaustively looked at all forms of neutrophil cell death we have modified the section title, figure titles and main body of text referring to the forms of neutrophil cell death.

      line 233-234 3D9 distinguishes NETosis from apoptotic, necroptotic and necrotic cell death in neutrophils.

      Line 273-274 It distinguishes netotic neutrophils from apoptotic, necrotic and necroptotic neutrophils in vitro.

      1. There is some evidence that H3 can be citrullinated at R49 https://www.caymanchem.com/literature/methods-in-citrullination-and-analysis-of-recombinant-human-histones. This modification would likely make H3 resistant to cleavage at this site. This may explain that the detection of the H3 fragment importantly decreases at 180 mins in NETs induced by A23187 (Kenny et al, Elife. 2017, 6:e24437, Figure 7), which is a potent inducer of histone citrullination. Thus, an alternative explanation to the lack of colocalization between H3cit and 3D9 in tissues is that these antibodies are detecting different types of NETs. H3cit may stain NETs in which citrullination is dominant, making H3 resistant to cleavage. In contrast, 3D9 may detect NETs in which H3 citrullination is absent or minimal (such as NETs induced by PMA, heme in TNF primed neutrophils, C. albicans and nigericin. Elife 6. 10.7554/eLife.24437) and therefore, H3 cleavage is fully efficient.

      We agree with the reviewer that it is possible that different types of NETs are being detected by 3D9 and H3 cit. In our initial submission we proposed that 3D9 might display a preference for proteolytically processed or ‘mature’ NETs. This is not at odds with, and is in fact complementary to, the hypothesis presented by this reviewer that 3D9 may detect NETs in which citrullination is minimal or absent. We have modified our discussion to reflect this.

      Line 292-294 Thus, we propose that 3D9 will allow broad detection of NETs induced by varied stimuli but that it may display a preference for more mature or proteolytically processed NETs or NETs that are citrullinated to a lesser degree or not at all.

      1. Another possibility of the lack of colocalization between 3D9 and H3cit in inflamed tissues is that analogous to PL2.3, H3cit is not specific for NETs and may be similarly detecting activated cells or some other forms of neutrophil death. Indeed, previous studies have shown that H3cit is generated during neutrophil activation and apoptosis (Sci Transl Med. 2013, 5:209ra150). If the authors show that PL2.3 and H3cit are not specific to detect NETosis, they should discuss the implications of these findings regarding all publications that have used these antibodies to mechanistically link NETs with specific human diseases.

      We thank the reviewer for this very interesting comment and the suggestion to compare H3cit and 3D9 staining in response to different stimuli in more depth. However, first we must clarify that PL2.3 has never been used by us as a specific marker of NETs. It is a general chromatin stain but when samples are minimally permeabilised it binds with greater intensity to decondensed chromatin. A neutrophil marker such as anti-NE is always needed to confirm the presence of NETs if using PL2.3. Using PL2.3 alone is not good evidence of NETs.

      We are aware that H3 citrullination can also occur in other pathways of cell activation and most notably in apoptosis and this is another reason for using sandwich or colocalization approaches with a granule protein to detect NETs. There is much literature that has been presented using only H3cit, particularly during the early work in the field. However, as of late, and thanks to robust discussion in the community, H3 cit alone is rarely presented as convincing evidence of NETs. However, with this in mind, we investigated 3D9 staining using varied modes of neutrophil cell death. Its failure to stain apoptotic, necroptotic and necrotic neutrophils suggested that it was, to date, our best candidate for a NET specific marker. However, to err on the side of caution, we still advocate for the use of sandwich and colocalization approaches until the exclusive NET specificity of cleavage at H3R49 can be established using 3D9 or subsequently developed site specific cleavage antibodies for different assays.

      1. In the analysis of inflamed tissues, it is assumed that finding neutrophils only means NETs. This gives the impression that other forms of neutrophil death have disappeared in humans. To validate the anti-NET antibodies in tissues, it will be useful to include co-staining with markers of other forms of neutrophil death. This analysis will help to increase confidence that 3D9, PL2.3 or H3cit are more likely to detect NETs in tissues rather than other forms of neutrophil death. This is important because in vitro studies are not analogous to in vivo processes.

      We thank the reviewer for raising this important point and we have added to the text to clarify that other forms of neutrophil cell death will likely be occurring in inflamed tissues.

      line 246-247 Neutrophils are recruited to sites of inflammation and depending on the context or the surrounding stimuli, they may undergo varied forms of cell death.

      While not presented in our figures, there is also a DNA stain. We attempted to add a 5th marker for apoptosis using anti-cleaved caspase 3 but the results were not interpretable. However, in vitro we clarified that 3D9 staining cells were negative for cleaved caspase 3, an apoptotic marker (Figure 8-figure supplement 2).

      1. NETs are believed to be pathogenic because this process has been associated with specific pathologies, e.g. infection, autoimmunity and cancer. However, the detection of NETs in any inflamed tissue suggests that this process is driven in response to any non-specific inflammatory stimuli. To clarify this discrepancy, it will be useful to know if the inflamed tissues are from specific diseases associated with the production of NETs.

      We thank the reviewer for highlighting this. Indeed, much of the literature refers to the pathogenic nature of NETs, but, as with most actions of the immune system, there is usually a balance or threshold, a level of inflammation and possibly NET formation that is appropriate for the host defence which later requires the resolution of inflammation and repair. Unfortunately, in this paper the samples are not from diseases where NETs are considered one of the drivers of pathology – e.g. lupus, and are instead from tissues characterised as ‘inflamed’. The appendix and gall bladder tissues in this study come from patients with appendicitis (the gall bladder was extracted at the same time). The tonsil and kidney samples come from a commercial source and are only noted as ‘inflamed’. A future line of research could be to compare citrullinated histones and 3D9 staining in diseased tissues and to assess if one type of NETs is more dominant in specific diseases or disease states.

      We added further detail of the descriptions to figure legends and methods.

      line 1086 Figure 9….(A) human tonsil, denoted ‘normal’ by commercial provider but showing infiltration of neutrophils demonstrating an inflammatory event. (B & C) human kidney, denoted ‘inflamed’ by commercial provider.

      line 1089 Figure 10. Comparison of Clipped H3, H3cit & H2B staining in the gallbladder from an appendicitis patient.

      Line 1094 Figure 11. Comparison of Clipped H3, H3cit & H2B staining in the appendix of an appendicitis patient

      Methods Line 722-725 Human tonsil (denoted normal but showing neutrophil infiltration) and inflamed kidney paraffin tissue blocks were purchased from AMSbio. Inflamed tissue from a gallbladder and appendix was obtained from archived leftover paraffin embedded diagnostic appendicitis samples.

      Reviewer #3 (Public Review):

      The authors have successfully characterized a specific mechanism that occurs during NET formation: a NET-specific histone H3 cleavage event. The monoclonal antibody 3D9 detects evidence of the proteolytic events that occur in NETosis, namely the proteolytic signature of histone cleavage at H3R49. Based on this finding they have developed a new method to detect and quantify NETs and differentiate NET formation from apoptosis or necroptosis. The method can be used to stain mixed cell populations or also human tissue material.

      The major strength of this manuscript is that it gives mechanistic insight into NET formation and at the same time presents a novel technique that shows several advantages compared to existing techniques.

      The methods and results are presented in detail and are well controlled.

      We thank the reviewer for their consideration and appraisal of our manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript by Nkosi et al. presents SARS-CoV-2-specific CD4 and CD8 responses in people living with HIV in South Africa two to four weeks after having experienced COVID-19. The authors look at the magnitude of the SARS-CoV-2 T cell responses in the three groups, as well as the T cell response breadth and cross-reactivity to a SARS-CoV-2 variant. The authors show that HIV treatment-naïve people had diminished SARS-CoV-2-specific T cell responses compared to healthy individuals and that these responses correlated with immune activation and HIV plasma viral load. This observation is not unexpected as we know very well that untreated HIV infection dampens immune responses in general. Importantly, people under suppressive ART mounted SARS-CoV-2-specific T cell responses comparable to healthy people, emphasizing the importance of HIV control by ART. Overall, although the message is not new, has limited interest to the field, and does not assess B cell responses, the data presented are clear and bring additional knowledge on T cell responses against SARS-CoV-2 in people living with HIV.

      We thank the reviewer for acknowledging the contribution of our study to the field regarding SARS-CoV-2 in people living with HIV. We respectfully disagree with the notion that our findings are not new because, as other reviewers have pointed out, this study is the first to report a significant reduction in cross-recognition of SARS-CoV-2 variants in individuals with unsuppressed HIV, using a new and more robust assay for detecting low-frequency responses. Importantly, we identify 4 spike mutations in Beta that abrogate T cell recognition by responses raised against wildtype.

      Reviewer #3 (Public Review):

      In this study, the authors investigated the impact of HIV infection on T cell immune responses to SARS-CoV-2 infection. To do this, in vitro stimulation with control and strain-specific peptides was used to activate T cells, and the secretion of IL2, IFNg, and TGFa was used as a proxy for T cell-mediated function. The authors also attempted to define peptide specificity to establish the breadth of the T cell responses and which mutations were responsible for any loss of cross-recognition. The results show that individuals with unsuppressed HIV infection, defined by a viraemia above 1000 viral copies per ml, had poorer T cell polyfunctionality compared to those who were HIV negative or aviremic. Unsuppressed HIV-infected individuals also had lower cross-reactive responses to SARS-CoV-2 variants dominating the first and second COVID-19 waves in South Africa. In contrast, aviremic HIV-infected individuals had similar responses to those observed in healthy individuals. The conclusions of this paper are well supported by the data. Using a flow cytometric approach and bulking of T cells enabled phenotypic and peptide-specific analysis. It is however worth noting that HIV infection may have a direct impact on the survival of cells in long-term cultures and outcomes from those assays may be more reflective of in vitro survival than the true in vivo situation. In addition, previous studies have shown notable levels of cross-reactive responses of SARS-CoV-2 and other human coronaviruses present prior to the pandemic; it is therefore surprising that very low levels of cross-reactivity were observed across SARS-CoV-2 variants even in the healthy individuals.

      This paper addresses a critical issue within the African context where HIV is prevalent and may have a direct impact on the continent's success in controlling SARS-CoV-2 infections. The low cross-reactive responses, more so in individuals with unsuppressed HIV, reduce the benefits of the first-generation SARS-CoV-2 vaccine, warranting additional consideration of emerging variants for future vaccine development. Unsuppressed HIV infection also places this population at increased risk of infection and of a more severe form of SARS-CoV-2 disease from future emerging variants. It is therefore important that uninterrupted ART is available to maintain viral suppression in HIV-infected individuals. Second-generation SARS-CoV-2 vaccine designs will have to consider emerging variants and the true longer-term benefits of vaccination.

      We thank the reviewer for highlighting the importance of this study, particularly in the context of African countries with a high HIV burden. A detailed explanation for the discrepancy between this study and others is provided in response to reviewer 1. Regarding the comment on cell survival, our SEB positive controls show similar cell survival between HIV negatives and viremics.

    1. Author Response

      Reviewer #1 (Public Review):

      1.1) Dixon and colleagues aim to fill a major gap in our understanding of the epidemiology of human disease caused by Taenia solium (taeniasis and cysticercosis), a major food-borne zoonotic cestode. They use a rather heterogeneous dataset comprising "age-prevalence" data to fit catalytic models and infer a key epidemiological parameter, the force of infection of taeniasis and cysticercosis.

      The authors are to be commended for exploring the scarce information regarding the prevalence of Taenia antigens and antibodies in different endemic settings. It remains unclear to me why were the much more numerous studies relying on fecal egg detection for diagnosing taeniasis not included in their analysis. One reason might be that their main focus is on T. solium infection and classic parasitological diagnosis cannot distinguish between T. solium and the even more common human pathogen, T. saginata - but the most common coproantigen detection method used worldwide (Allan et al., 1990) also fails to reliably distinguish between these two species.

      The first clear limitation of the primary datasets analyzed for addressing the prevalence of taeniasis is that they combine infections with different species of Taenia - T. solium, T. saginata, and perhaps T. asiatica.

      We thank the reviewer for this summary and for acknowledging how our study attempts to fill a key research gap, using scarce and heterogeneous data sources. We agree that the challenges around measuring T. solium taeniasis infection in humans stem from a lack of species-specific diagnostics. The reviewer rightly points out that our main justification for not including faecal egg detection is the inability to distinguish at the species level, with this currently remaining a challenge for the copro-antigen-based diagnostics.

      We have included copro-antigen-based surveys (but not surveys based on faecal egg detection, see below) because, for measuring T. solium-specific infection, the copro-antigen diagnostic characteristics (sensitivity and specificity), including for adapted protocols (e.g., Allan et al. 1990), are readily available to inform the necessary diagnostic adjustment process in the model. This is not the case for faecal egg detection approaches. It is our contention that there would be wide variation in protocols for faecal egg detection methods (without associated sensitivity and specificity estimates), which would make inclusion of the diagnostic adjustment step very challenging and uninformative. Including the diagnostic adjustment for copro-antigen-based surveys therefore provides a more robust approach to tackling the issue associated with cross-reactivity (suboptimal specificity), which is then reflected by further uncertainty in the force-of-infection (FoI) estimates.
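
      As a rough illustration of the diagnostic adjustment step described above, the sketch below uses the standard relationship between true and apparent prevalence for a test with known sensitivity and specificity. It is a minimal sketch under stated assumptions, not the model used in the study: the function names and the numerical values are hypothetical, and the responses refer to posterior estimates, which suggests the adjustment is made inside the model fit rather than by direct inversion as done here.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Apparent prevalence expected from an imperfect diagnostic.

    True positives (sensitivity * true_prev) plus false positives
    ((1 - specificity) * (1 - true_prev)).
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def adjusted_prevalence(observed_prev, sensitivity, specificity):
    """Invert the relationship above (Rogan-Gladen style), clipped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("test must be informative: sensitivity + specificity > 1")
    estimate = (observed_prev + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical values chosen for illustration only (not taken from the study).
apparent = observed_prevalence(true_prev=0.10, sensitivity=0.90, specificity=0.95)
recovered = adjusted_prevalence(apparent, sensitivity=0.90, specificity=0.95)
```

      With suboptimal specificity, false positives inflate the apparent prevalence, which is why a survey without sensitivity and specificity estimates cannot be adjusted in this way.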

      1.2) Second, they combine diagnostic data based on coproantigen and antibody detection for modeling the force of infection of taeniasis. These are data of a completely different nature. Although the authors use reverse catalytic models to account for "infection loss", they are coping with different biologic processes classified under "infection loss" - the slow decline in antibody responses vs. the sudden clearance of coproantigens following treatment or spontaneous worm elimination. In areas of high endemicity, people may be often reinfected ("infection acquisition") but antibody seroconversion rates will grossly underestimate reinfection rates if many individuals remain seropositive at the time they are reinfected.

      We thank the reviewer for raising this important distinction to shed light on the interpretation of models fitted to either antigen-based surveys (e.g., copro-antigen), which we use as a marker of infection acquisition (λinf), described on lines 530-532, and models fitted to antibody-based surveys (e.g., rES33-immunoblot for taeniasis), indicating seroconversion (λsero), as described on lines 529-530. The reviewer is correct that we assume these underpin different biological processes. In Figure 1 (note, now Figure 2), we qualify how we interpret these two quantities, λinf and λsero, in conjunction with the structure of the catalytic model configurations. We agree that the additional parameter in the reversible model, seroreversion (ρsero) or infection loss (ρinf), refers to different mechanisms, and we define these two parameters differently depending on whether the reversible model is fitted to antibody-based or antigen-based datasets. However, we agree that we could make this clearer in Table 1, by reinforcing in the Table title that models fitted to antibody data aim to estimate seroconversion and seroreversion (exposure), while models fitted to antigen data aim to estimate infection acquisition and infection loss (active infection). The title for Table 1 has therefore been amended as follows:

      “Table 1. Parameter posterior estimates for the best-fit catalytic models fitted to human taeniasis age-(sero)prevalence datasets (ordered by decreasing all-age (sero)prevalence). Parameters estimated from antibody-based datasets measure exposure dynamics, with seroconversion λsero and seroreversion ρsero rates. Parameters estimated from antigen-based datasets measure active infection dynamics, with infection acquisition λinf and infection loss ρinf rates.”
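
      For readers less familiar with catalytic models, the following sketch shows the textbook closed-form age-prevalence curves for a simple and a reversible catalytic model with constant rates. It is an illustrative sketch only: the function and parameter names (`lam` for λ, `rho` for ρ) and the example values are assumptions, and the exact model configurations fitted in the study may differ.

```python
import math


def simple_catalytic(age, lam):
    """Prevalence at a given age with constant acquisition rate lam and no loss."""
    return 1.0 - math.exp(-lam * age)


def reversible_catalytic(age, lam, rho):
    """Prevalence at a given age with acquisition rate lam and loss rate rho.

    lam plays the role of seroconversion or infection acquisition;
    rho plays the role of seroreversion or infection loss.
    """
    total = lam + rho
    if total == 0.0:
        return 0.0
    return (lam / total) * (1.0 - math.exp(-total * age))


# Hypothetical rates per year, for illustration only.
lam, rho = 0.10, 0.05
curve = [reversible_catalytic(age, lam, rho) for age in (0, 5, 20, 80)]
plateau = lam / (lam + rho)  # prevalence approached at older ages
```

      The same functional form applies whether the data are antibody-based or antigen-based; what changes is the biological interpretation of the two rates, which is the distinction drawn in the amended table title.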

      We do not currently include a model capable of estimating seroreversion and infection loss rates from a single dataset, so we keep our interpretations separate. The reviewer points out that seroconversion rates may underestimate infection loss rates, which is true where seroconversion rates are slow compared to resolution of natural infection, although we are not equating infection loss rates with seroconversion rates and keep these interpretations distinct (see Figure 2). This is, however, an important consideration to include in the Discussion, so we have added the following text:

      “In addition, reinfection of individuals with the adult tapeworm is also likely to occur, particularly in high-endemicity settings; therefore, the persistence of antibodies against the adult worm is likely to complicate the measurement of reinfection rates (where antibody seroconversion is equated to infection, although we take care to differentiate between these two processes when interpreting the λsero and λinf parameters). However, with the limited number of HTT-based surveys available to estimate antibody seroconversion and duration of antibody parameters, it is difficult to determine to what extent this is an issue.” (lines 386 – 395)

      1.3) The "human cysticercosis" component of the study also relies on antigen and antibody detection. The diagnostic methods are assumed to be both species-specific (i.e., they distinguish between T. solium and T. saginata) and, even more critically, to be stage-specific (i.e., they distinguish between antibodies elicited by exposure to T. solium cysticerci and those elicited by adult worms). This appears to be the case of the classic EITB assay, but it remains unclear whether the diagnostic method (López et al.) used in the large, nationwide Colombian dataset is sufficiently species- and stage-specific.

      We thank the reviewer for these comments. More generally, Taenia saginata does not cause cysticercosis in humans, but cross-reactivity with other parasites is an important consideration (see below). The assays on which the data we analysed are based, are generally highly (species- and stage-) specific, with values >90%, identified in the literature, being broadly consistent with our posterior estimates of diagnostic specificity. The sensitivity values (>80%, reported in the literature) are also in agreement with our posterior estimates of diagnostic sensitivity. However, we thank the reviewer for requesting more information on the López et al. (1988) human cysticercosis antibody diagnostic.

      The Flórez Sánchez et al. 2013 paper provides further background on the López et al. test, indicating: “diagnostic tool to determine exposure to parasitic infection through the ELISA test for the detection of anti-cysticercus immunoglobulin G (IgG) antibodies” (translated from Spanish). In addition, the test was “standardized and evaluated in Colombian patients with parasitologically proven neurocysticercosis, with a sensitivity of 100% and a specificity of 97.6% in serum samples and 100% in both values with CSF samples”. The authors also state that “in its standardization, cross-reactions with different infectious agents such as Taenia saginata, Hymenolepis nana, Echinococcus sp., Fasciola hepatica, Entamoeba histolytica, Ascaris lumbricoides, Mansonella ozzardi, Treponema pallidum, Cryptococcus neoformans and HIV were evaluated, which were discarded”.

      We therefore conclude that the López et al. assay is both species- (T. solium) and stage- (to cysticerci) specific.

      1.4) Finally, the brief description of the source studies overlooks basic information. Were study participants randomly sampled in each study site? What about sampling units - individuals or households? Are study sites representative of the countries?

      We agree with the reviewer that this additional information should be included, which we now introduce into Supplementary Table S1 (under the new column “Study design, sampling strategy and representativeness”).

      To summarise, study participants were randomly selected in 8 studies, and in 4 studies all eligible participants in study sites (e.g., specific village) were selected. In 3 studies (Moro et al. 2003; Nguekam et al. 2003; Weka et al. 2013), non-random sampling was performed or information was not available to assess the methodology adequately. The unit of randomisation was the household in 3 studies (as a first sampling stage) followed by all eligible household members being sampled (Gomes et al. 2002; Conlan et al. 2012; Wardrop et al. 2015). In 2 studies, households were randomised first, then one household member was randomly selected (Holt et al. 2016; Sahlu et al. 2019). In the study by Flórez Sánchez et al. (2013), three-stage sampling was conducted. In 1 study individuals were the units of randomisation (Edia-Asuke et al. 2015), and in 1 study the sampling information was not sufficiently clear (Theis et al. 1994).

      Although there was somewhat limited information to determine how representative of the countries the studies were, in several studies the authors indicated that the study sites, selected from different areas, were representative of specific socio-economic factors across a region (e.g., Conlan et al. 2012; Jayaraman et al. 2011). Other study sites were selected based on prior knowledge of the presence of high-risk factors for T. solium or prevalence of porcine cysticercosis (Sahlu et al. 2019; Kanobana et al. 2011; Edia-Asuke et al. 2015).

      1.5) Given the potential limitations inherent to the datasets analyzed, it remains uncertain whether the authors can provide "global force-of-infection trends" derived from a small number of studies with different diagnostic approaches - although they can surely describe productive ways of interrogating available data, point to their limitations and suggest standardized study designs that might generate better data for future pooled analyses.

      We agree and thank the reviewer for this feedback. We have addressed the issue of representativeness of each study (where information is available) under Authors’ response 1.4. In several studies, the authors indicated that the study sites, selected from different areas, were representative of specific socioeconomic factors across a region (e.g., Conlan et al. 2012; Jayaraman et al. 2011). We acknowledge that it was difficult to determine how representative some of the other studies were at country level. Having said that, we strongly contend that our analyses, taken in their entirety, do indicate substantial variation in FoI/seroreversion or infection loss estimates across a range of different epidemiological settings representing the major global endemic areas (e.g., South America, sub-Saharan Africa and Asia). The distribution of study sites is visually presented in Figure S2, which is now Figure 1– figure supplement 1: Geographical distribution of studies with human taeniasis (HTT) and human cysticercosis (HCC) age-(sero) prevalence data included in the final analysis (n = 16) by diagnostic method. For this reason, we believe this study reflects the variation in global trends, and therefore propose modifying the manuscript title as follows: “Global variation in Force-of-Infection trends for Human Taenia solium Taeniasis/Cysticercosis”.

    1. Author Response

      Reviewer #3 (Public Review):

      S Luo and co-workers asked whether NKT17 subset had lineage-specific requirement/s for thymic development beyond what is currently known. Further, they determined what role such a requirement played in activating NKT cells in vivo and in vitro. The strength of the report is the finding that DR3 functions as a selective co-stimulator of NKT17 subset. Experiments appear well-thought out and executed, and the emergent data reasonably carefully interpreted. Some points to consider:

      1) The statement in the abstract and elsewhere that "However, the molecular mechanisms that drive the thymic development and subset-specific activation of NKT17 cells remain mostly unknown" is incorrect. It is better to say, "Much is known yet how this subset develops in the thymus and is activated in the periphery is incompletely understood."

      We thank the reviewer for this suggestion. Accordingly, we have changed the original phrase in the abstract to “How NKT17 cells develop in the thymus and what stimulatory signals would trigger their activation remain incompletely understood.” (please see lines 35-36).

      2) In this regard, if subsets are already formed, why should there be a subset-specific mechanism/s of activation beyond affinity thresholds? The literature suggests that different routes of bacterial inoculation results in the activation of all subsets within a tissue where infection has occurred.

      This is an interesting question. Because all iNKT cells express the canonical invariant TCRα (Vα14-Jα18 in mice), it is reasonable to argue that the same microbial antigen would trigger the activation of all iNKT cells within a given tissue. On the other hand, there are clearly some cytokine receptors and co-stimulatory molecules that are specifically expressed on individual subsets, such as CD122 for NKT1 cells [Lee YJ et al., 2015, Nat. Immunol] and ICOS for NKT2 and NKT17 [Cameron G., 2018, Immunol. Cell Biol] etc., suggesting their role as subset-specific costimulatory molecules. Our identification of DR3 as a thymic NKT17 specific co-stimulatory agent is in support of the perspective that each iNKT subset might require selective co-stimulatory signals for their full activation. We regret that we cannot fully address this question in the current study, but we aim to expand on this question in the near future.

      3) If there is subset specific activation, does this mean that downstream responses from DR3 activation of NKT17 cells prevents the activation of NKT1 & NKT2 subsets? Otherwise, how does one reconcile with the inability of alphaGalCer to activate NKT1 & NKT2 subsets?

      This is an excellent point that we aim to address in our follow-up studies. It is well established that the generation of different iNKT subsets requires distinct strengths of TCR signaling, whereby NKT2 cells depend on strong agonistic TCR signals while NKT1 cells do so to a lesser extent. Whether DR3-activated NKT17 cells would prevent or dampen TCR signaling in other iNKT subsets, and whether there is a subset-specific effect of NKT17 cells on NKT1 versus NKT2 cells, is unclear to us. However, we will investigate this point as we have now secured DR3-deficient mice for our studies. We are currently in the process of back-crossing DR3-KO mice onto the BALB/c background, which will require several more months; we will then examine this issue in further detail. At this point, we consider assessing this issue as beyond the scope of this manuscript.

      4) The statement "However, the role of CD138 in NKT17 cell biology remains mostly unclear" is incorrect, as it was recently reported that CD138 serves as an NKT17 subset-specific marker but the development and function of NKT17 cells do not depend on CD138! The results presented herein also support this view of CD138 in NKT17 cell development and function, so there is nothing new here.

      Here, the reviewer is referring to our statement in line 60, where we are also citing the two publications that have studied the role of CD138 (Syndecan-1) on NKT17 cells in detail. Both studies report that CD138 is specifically expressed on NKT17 cells but that it is not required for their generation. Our statement was intended to highlight this point. To tone down this statement, we have now modified the sentence to “However, the role of CD138 in NKT17 cell biology is not yet fully understood and remains to be resolved”.

      5) The MS requires proper editing: e.g., "[Please add: Luo S., 2021, JCI Insight];" this incompleteness was found in the introduction.

      We apologize for this oversight. The reference is now properly formatted (line 61), and we have reviewed the revised manuscript multiple times to avoid other mistakes.

      6) Please provide original references to "Because NKT17 cells are the major producers of IL-17 in the thymus and in barrier tissues, such as the lung and skin ..."

      Here, the reviewer is referring to a sentence from the Introduction (lines 61-62). The tissue distribution of NKT17 cells has been previously reported, so we added the original reference [Lee YJ et al., 2015, Immunity] to the text (line 62) as requested.

      7) Whilst "unveiling a new layer of control in NKT17 cell biology" is quite interesting, it is not as surprising as relayed! That NKT cells use second signals to elaborate type I immune responses has been known for at least a couple of decades now.

      We concur that a second signal to control iNKT cell activation has been previously observed and documented. The novelty of our findings, however, lies in the fact that we identified a co-stimulatory signal that is specific to NKT17 cells and that this co-stimulatory signal is conveyed through the cytokine receptor DR3. Thus, we wish to keep this statement in our manuscript, and we hope that the reviewer agrees.

      8) "... we identified the TNF receptor superfamily member 25 (TNFRS25), also known as DR3 (Meylan et al., 2011), being highly expressed on thymic NKT17 cells (Figure 1A and 1B)" while true of BALB/c thymuses, seems less true of C57BL/6 thymuses based on their figure s1. This should be clearly stated in the results. Strain differences in NKT cell content and relative ratios of the subsets are known; hence, it is important to indicate of which strains a particular property/ies is true.

      We appreciate the reviewer’s comments on the strain difference. Prompted by the reviewer’s suggestion, we have expanded our analysis to peripheral organs (LN and lung) of both BALB/c and C57BL/6 mice. These experiments include the co-staining of DR3 and CD138 in thymic iNKT cells in C57BL/6 mice (Figure 1 – figure supplement 3A), and DR3 staining in NKT17 versus non-NKT17 (NKT1 and NKT2 cells) in the thymus, LN and lung of C57BL/6 and BALB/c mice (Figure 1 – figure supplement 3B). Consistent with the data shown by intracellular staining (Figure 1 – figure supplement 1), we found that DR3 is highly expressed on thymic NKT17 cells of both BALB/c and C57BL/6 mice. However, DR3 expression was substantially increased on non-NKT17 cells in peripheral tissues, indicating that highly selective DR3 expression on NKT17 cells is largely limited to thymus-resident NKT17 cells. We are now highlighting and explaining these observations in the Results section.

      9) It is indeed surprising that the cytokine profile post alphaGalCer+anti-CD3 stimulation was not assessed.

      In our original submission, we had omitted the cytokine expression part to streamline the narrative and to focus more on the DR3-induced activation of the thymic iNKT cells. We realize that this was not a wise decision, as it raised more questions than it answered. Therefore, we have now added data assessing cytokine production after α-GalCer and anti-DR3 stimulation. We found that α-GalCer stimulation in the presence of anti-DR3 significantly augmented IL-17 production in thymic NKT17 cells compared to α-GalCer stimulation alone (new Figure 3E). This co-stimulatory effect of DR3 was specific to NKT17 cells and IL-17 production, because we did not find an increase in IL-4 production in non-NKT17 cells (Figure 3 – figure supplement 5B).

      10) And lastly in a similar vein, a mechanism and the in vivo relevance of this curious co-stimulatory finding remain wanting.

      These are indeed important questions, and the reviewer reiterated these issues below in the “Recommendations for the authors”. Thus, we are providing answers to these issues, e.g. the mechanism and the in vivo relevance, in the section below. Of note, we acquired new mouse models (Nur77-GFP reporter mice and TL1ATg mice) and performed a series of new experiments to address these points, and we hope that these new data have satisfactorily addressed the reviewer’s concern.

    1. Author Response

      Reviewer #2 (Public Review):

      -Were there any post-translational modifications (phosphorylation etc) or endogenous lipids that need to be quantified to make sense of the data?

      A percentage of receptors could be phosphorylated; therefore, our results represent the average behavior of the population. This is a noteworthy point and we have now explicitly discussed this idea in the revised manuscript.

      In the in vivo experiments, heterogeneity in PTMs or local lipid environment of receptors could affect conformational change at the individual receptor level. For our analysis we integrate the intensities over the whole cell membrane, so the results represent the average behavior. Likewise, in the single-molecule FRET experiments many individual receptors are included in the analysis. Additionally, since the receptors are purified in the in vitro experiments, there is no further change in PTMs with application of drugs. We have added a sentence in the discussion to highlight the potential heterogeneity in PTMs and local lipid environment. We have also added a sentence to the methods to clarify how in vivo experiments are analyzed.

      Added to line 512 in discussion section: “Potential sources of heterogeneity arising from differences in post-translational modifications or differences in the local lipid environment, may affect receptor conformation. Therefore, our results represent the average of a heterogeneous population of such receptors.”

      Changed line 667 to: “ROIs used for analysis included the whole cell membrane for individual cells.”

      -mGluR2 is a dimer. I was expecting that at 15 uM of Glutamate, for example, one might see effects of a single protomer-bound receptor. If I'm not mistaken, some class C receptors don't activate their CRDs until both ligand binding sites in the VFT are bound. Looking at all of the profiles in the VFT, CRD, and 7TM, I don't see any evidence of the 2-site binding of glutamate at the VFT. Presumably, there are Hill slopes for all of these profiles?

      Based on our previous work with the wildtype receptor and with the receptor containing one glutamate-binding-deficient monomer, and on available structures, the CRD domains indeed do not significantly visit the active state unless both VFT domains are bound to glutamate and in the closed conformation. However, because activation involves progression through 2 intermediate states, we still expect to see a FRET change even when both VFT domains are not occupied simultaneously. We have now revised Table 1 to include Hill slopes. These data show that cooperativity is generally observed for the FRET sensors for all the ligands tested.
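
      To make the link between Hill slope and cooperativity concrete, the sketch below evaluates a generic Hill dose-response curve at two slopes. This is a minimal illustration under stated assumptions: the function name, the EC50 of 15 µM, and the slope values are hypothetical and are not taken from the revised Table 1, and the fitting procedure used in the study is not shown here.

```python
def hill_response(conc, ec50, hill_slope, bottom=0.0, top=1.0):
    """Normalized response of a Hill curve at ligand concentration conc.

    hill_slope > 1 gives a steeper curve, the usual signature of
    positive cooperativity; hill_slope == 1 is a simple one-site curve.
    """
    if conc <= 0.0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill_slope)


# Hypothetical parameters, for illustration only.
ec50 = 15.0  # micromolar
at_ec50 = hill_response(15.0, ec50, hill_slope=2.0)
non_cooperative = hill_response(30.0, ec50, hill_slope=1.0)
cooperative = hill_response(30.0, ec50, hill_slope=2.0)
```

      Both curves pass through the half-maximal response at the EC50, but the curve with the larger slope rises faster on either side of it, so a fitted slope above 1 is commonly read as evidence of cooperative binding.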

      Reviewer #3 (Public Review):

      -The main concerns I had were with respect to labelling stoichiometry of the mixed Cy3/Cy5 compounds or SNAP-tag labels. How was this controlled? Clearly, both label cells, as shown in supplemental data and the single molecule FRET data support that both sites are labelled. Are there any concerns about larger molecular complexes such as oligomers that may confound the simple interpretation of interactions between the dimers?

      Among class C GPCRs, only GABA receptors have been shown to be able to form oligomers efficiently. Subunit counting experiments have shown that mGluR2 is predominantly dimeric (> 90%) on the plasma membrane for the experimental conditions used in this manuscript (Levitz et al., 2016). The same result was obtained from live-cell FRET utilizing a dimer trafficking-control system (Maurel et al., 2008). This work also demonstrated that FRET occurred strictly for dimeric receptors labeled by both donor and acceptor fluorophores and not between neighboring receptors at the plasma membrane. Thus, receptors labeled with donor-only or acceptor-only do not contribute to the relative ΔFRET signal in response to treatment.

      -Some additional context might be a discussion of approaches used and results obtained for other types of conformational biosensors for GPCRs in other classes? Can we learn anything by comparison?

      We have revised the manuscript to include further discussion of results obtained from the use of other conformational sensors.

      Added to line 502: “Recent experiments have shown that GPCRs are dynamic (Nygaard et al., 2013) and undergo transition between multiple conformational states, including multiple intermediate states. For class A GPCRs, studies using conformational biosensors based on nuclear magnetic resonance (NMR) spectroscopy (Huang et al., 2021), double electron-electron resonance (DEER) spectroscopy (Wingler et al., 2019), smFRET (Gregorio et al., 2017), and fluorescent enhancement (Wei et al., 2022) have revealed the importance of conformational dynamics for receptor activation, ligand efficacy, and biased signaling.”

      Added to line 536: “Interestingly, the regulation of intermediate state occupancy has recently been shown to be a mechanism of allosteric modulation for other classes of GPCRs as well. NMR studies on the μ-opioid receptor (Kaneko et al., 2022) and cannabinoid receptor 1 (Wang et al., 2021) revealed that PAMs and NAMs regulate receptor function by acting on intermediate conformations in a manner similar to our findings for BINA and MNI-137. Collectively, these results suggest that designing compounds that regulate intermediate state occupancy is a plausible strategy for the development of allosteric modulators for mGluR2 and other families of GPCRs.”

    1. Author Response

      Reviewer #1 (Public Review):

      This study addresses the important question of understanding the cellular physiology of cholinergic interneurons in the striatum. These interneurons play a key role in learning and performance of motivated behaviors, and are central to movement disorders, psychiatric disease, and addiction. Their unique physiology, which includes tonic pacemaking activity and active conductances that shape integration of dendritic inputs, is critical to their function but is still incompletely understood. The authors cleverly integrate a series of innovative electrophysiological and optical approaches to gain insight into dendritic physiology of these neurons. Their creative approach yields some interesting and novel findings. However, there are technical and conceptual concerns that need to be addressed before these results can be readily interpreted. Some refinement of analysis and presentation, and potentially some additional experiments, will therefore be required to strengthen the conclusions and facilitate interpretation of the results.

      We believe that with several new sets of experiments and simulations, we have successfully refined the analysis and addressed the technical and conceptual problems. Indeed, we strengthened the conclusion with a novel pharmacological experiment that provided model-independent evidence of proximal-only boosting.

      Major concerns:

      1) This manuscript focuses on how the differential physiology of proximal and distal dendrites contributes to physiological activity and integration of inputs in cholinergic interneurons, suggesting that NaP and HCN currents act in concert to selectively boost inputs onto proximal dendrites (from thalamus), relative to inputs onto distal dendrites (from cortex). The results presented in Figures 1-4 are consistent with a distinct physiology of proximal-vs-distal dendrites based on purely electrical properties. Indeed, Figure 5 initially appears consistent with this model as well, since thalamic inputs (onto proximal dendrites) are boosted by an NaP conductance, while cortical inputs (onto distal dendrites) are not. This raises a key conceptual question: why are cortical inputs onto distal dendrites not boosted? Any depolarization of distal dendrites must pass through proximal dendrites before reaching the recording electrode at the soma. Shouldn't this signal be subject to the same active and passive conductances, and consequently the same boosting that shapes thalamic inputs onto proximal dendrites?

      You are absolutely right in the case of a linear model (passive or quasi-linear). However, for a nonlinear system, there can be preferential boosting of proximal inputs. The new Appendix 2 addresses this point with computer simulations.

      2) The quasi-linear approach to characterizing active and passive membrane properties is promising, and the choice of a cable-based model is well supported. However, the model itself is rather opaque, which limits confidence in the interpretation of the results. Additional analysis and description should be presented to alleviate concerns about whether the experimental data, which has a limited number of measurable values, may be over-fit by a model with too many free parameters. For example, why is the radius of the dendrite a free parameter that is allowed to vary in the full field vs proximal experiment (Lines 253-256) - and isn't it a serious red flag that the value returned for proximal dendrites is smaller than for the full field? Additional tables (e.g. fixed and free parameters and how they were determined), and figures (plots of how those parameters influence the fits, and how the parameters interact with one another) would considerably strengthen confidence in the conclusions drawn by the authors.

      Thank you very much for this comment. We have added to the new manuscript a table with all the parameters fit in the various figures, and have discussed the possible pitfalls of overfitting. Most importantly, we have provided a new appendix (#1) to the manuscript that explains the effects of the various model parameters in a systematic fashion, beginning with passive dendrites, followed by the effects of boosting and then the effect of restorative currents that give rise to resonances. This appendix addresses the questions raised by the reviewer regarding how the various parameters influence the fits.

      We apologize if we created confusion with respect to the meaning of the parameter r. It does not represent the radius of the dendrites (which is not explicitly represented at all, only implicitly through the space constant) but rather the electrotonic range of illumination. We indeed find that the fits consistently estimate a value of r for the proximal illumination that is smaller than that estimated for the full-field illumination, as it should be.

      Finally, our new pharmacological demonstration of differential boosting in the case of proximal vs. full-field illumination (see above) is entirely independent of the quasi-linear model fit. So the main thrust of the manuscript, which is to demonstrate a proximal localization of nonlinearities and its correspondence to the spatial localization of excitatory afferent inputs, is now achieved, at least vis-à-vis the NaP current, independently of the quasi-linear model. However, we still find the model useful as it is used to estimate the distribution of HCN currents and provides a framework to think about how to manipulate dendritic nonlinearities experimentally.

      3) Technically, the use of ChR2 to modulate dendritic currents is creative. While the authors rightly acknowledge that activation/deactivation kinetics of the ChR2 channel will contribute to filtering, this important point should be expanded with additional analysis and potentially with new experiments. Of particular concern is the transition of ChR2 channels to an inactivated state over the comparatively long oscillating light pulse in Figure 3. Inactivation of ChR2 is prominent over this timescale and would precisely co-vary with the shift in oscillation frequency. To address this, the authors should present a direct measurement of this inactivation and account for it in their analysis of the chirp data. Alternatively, the chirp stimulus could be presented backwards (starting at high frequency), so that comparison of forwards-vs-backwards chirp recordings could disentangle this artefact. Either one or both of these additional experiments would be critical for interpreting the roll-off in photocurrent responses at high frequencies reported in Figure 3.

      Touché! You were spot on with this critique and we were wrong. We have now conducted several new experiments (that appear in the main text and in Figure 3 and all its supplements) that show that including ChR2 kinetics explicitly in the model fits actually makes the fits more self-consistent and removes some of the glaring differences between the results from the somatic voltage perturbations (Figures 1–2) and the optogenetic illumination (Figure 3). So as per your request, we have now presented a direct measurement of the deactivation (Figure 3–figure supplement 1) and we have played the “chirp” backwards (Appendix 1–figure 2) to address the issue of inactivation.
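
      The forward-versus-backward chirp comparison discussed here can be illustrated with a minimal sketch. This is not the stimulus code used in the study: the function name chirp_intensity, the linear sweep, and the 0.5 to 20 Hz range, 10 s duration and 1 kHz sampling rate are all illustrative assumptions. The point is only that a time-reversed chirp visits the same frequencies in the opposite order, so a slow process such as ChR2 inactivation, which depends on elapsed time, no longer co-varies with frequency in the same direction.

```python
import math


def chirp_intensity(f_start, f_end, duration, sample_rate):
    """Light-intensity samples in [0, 1] for a linear frequency sweep.

    All parameter values used below are hypothetical, not taken from the study.
    """
    n = int(duration * sample_rate)
    sweep_rate = (f_end - f_start) / duration  # Hz per second
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Phase of a linear chirp: 2*pi * (f_start*t + 0.5*sweep_rate*t^2)
        phase = 2.0 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        # Offset and scale so the intensity is never negative
        samples.append(0.5 * (1.0 + math.sin(phase)))
    return samples


# Forward chirp: low to high frequency (assumed values)
forward = chirp_intensity(0.5, 20.0, 10.0, 1000)
# Backward chirp: the same waveform played in reverse, high to low frequency
backward = forward[::-1]
```

      If the high-frequency roll-off were caused by time-dependent inactivation rather than by filtering, the responses to forward and backward would be expected to differ at matched frequencies; if it reflects filtering, they should largely agree.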

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, the authors investigated the role of sleep and brain oscillations in visual cortical plasticity in adult humans. The authors tested the effect of 2 hours of monocular deprivation (MD) on ocular dominance measured by binocular rivalry. In the main MDN session, MD was performed in the late evening, followed by 2 hours of sleep, during which EEG was measured. After the sleep session, ocular dominance was measured, which was followed by 4 hours of sleep, then ocular dominance was measured again in the morning. The results show that the effect of MD was preserved 6 hours after MD. The effect of MD correlated with sleep spindle and slow oscillation measures. The questions asked by the study are timely and findings are important in understanding the visual cortical plasticity in human adults, but I have some concerns regarding the experimental design, analysis, and interpretation of the results, which are listed below.

      Thank you for the positive summary of our results.

      • The authors investigated EEG activities in the central and occipital regions. The results of the relationship between slow oscillations / sleep spindles and deprivation index are very interesting. However, it appears that the activities were averaged across hemispheres in the occipital region. Previous studies (e.g. Lunghi et al., 2011; Binda et al., 2018) have demonstrated that MD is associated with up-scaling of the deprived eye and with down-scaling of the non-deprived eye (page 11). I wonder whether sleep slow oscillations and / or spindles are modulated locally in the deprived occipital region? To answer the first question raised by the authors (how MD affects subsequent sleep), wouldn't it be important to compare between deprived vs. non-deprived regions?

      In humans, the pure monocular recipient cortical regions are very small and represent only the very far visual periphery. These regions are impossible to locate with EEG and are difficult to locate even with high-resolution fMRI (ref to Koulla CB). Visual cortical organization is based on the visual field map: neurons whose visual receptive fields lie next to one another in visual space are located next to one another in cortex, forming one complete representation of contralateral visual space, independently of the eye from which the visual information comes. However, at finer scales ocular dominance columns exist and Binda et al (2018) showed that in adult humans MD boosts the BOLD response to the deprived eye, changing ocular dominance of V1 vertices, consistent with homeostatic plasticity. All these are well-known facts in the vision community, and we believe it is not worthwhile to discuss them.

      • To answer the second question (how sleep contributes to consolidation of visual homeostatic plasticity), the authors compared the deprivation index between two sessions, the main MDN and a control MDM session. The experimental designs for these two sessions were quite different. For example, MD was conducted in the evening in MDN, whereas it was conducted in the morning in MDM. Since there may be circadian effects on plasticity (Frank, 2016), the comparisons between these sessions may not be sufficient in investigating the effect of sleep itself (it could be merely due to circadian effect).

      Thank you for raising this important issue. We performed the dark exposure experiment in the morning because we wanted to minimize the occurrence of sleep during the two hours spent by participants lying down in complete darkness. Preventing sleep under these conditions in the late evening would have been extremely challenging. In order to investigate a possible influence of the circadian rhythm on visual homeostatic plasticity and its decay over time, we have performed an additional experiment. In this experiment, we have tested the effect of 2h of monocular deprivation in the same participants either early in the morning or late at night (at a time of the day comparable to the MDnight and MDmorn conditions in the main study). We report the results of this control experiment in the supplementary materials (Figure S2). We found that the effect of monocular deprivation follows a similar timecourse for the two conditions (ocular dominance returns to baseline levels within 120 minutes after eye-patch removal). Moreover, we also report that the effect of MD is slightly (but significantly) larger in the morning, compared to the evening. The results of this experiment rule out a contribution of circadian effects and reinforce the evidence of a specific effect of sleep in maintaining visual homeostatic plasticity.

      • The authors argue that NREM sleep consolidates the effect of MD. However, consolidation may last days to months or even years (Dudai et al., 2015). Since the effect is gone in 6 hours or so, it may be difficult to interpret it as consolidation. Although the findings of the effects of sleep on ocular dominance plasticity are interesting, the interpretations of the results may need to be clarified or revised.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition and not merely darkness would be responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. Having said that, we would like to point out that the MD boost in amblyopic patients gets consolidated for up to one year and increases across night sleep as we reported in Lunghi, Sframeli et al (2019). Although these data strongly suggest that real consolidation may occur, we agree with the reviewer that our data did not directly address this question and have changed the manuscript accordingly.

      Reviewer #2 (Public Review):

      This manuscript is an interesting follow up on a substantial literature on the role of sleep in promoting critical period ocular dominance plasticity, and the role of sleep in promoting adult V1 plasticity following presentation of a novel visual stimulus. For nearly all of that literature (i.e. coming from cats and mice), the focus has mainly been on Hebbian mechanisms. The authors here propose to advance the field by investigating plasticity in adult human V1, which the authors consider to be homeostatic rather than Hebbian, and which the authors consider to be a form of sleep-dependent consolidation. This is an exciting goal, and the overall study designs and control will test the effects of brief MD and subsequent sleep or wake in the dark on V1 processing for the two eyes.

      Thank you for the positive commentary on our study.

      However, the outcomes of the study suggest that the changes observed in V1 across sleep may actually be the opposite of consolidation - rather it is decay of an effect on V1 function caused by prior wake experience (MD), which disappears over subsequent hours.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition and not merely darkness would be responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. We have revised the entire MS through the various sections to handle this important aspect and to consider that a classic correlate of memory consolidation during sleep (spindles density) also turns out to be associated with maintenance of the MD-induced ocular dominance effect.

      The authors claim differences due to sleep, but there is not a direct statistical comparison between sleep and awake-in-the-dark controls.

      We now directly compare the effect of monocular deprivation and its decay after two hours in the sleep vs dark exposure condition (MDnight vs MDmor). We now plot the results of the two conditions in the same graph (Figure 2). We found a significant interaction effect between the factors TIME (before and after) and CONDITION (MDnight and MDmor), indicating a specific role of sleep in prolonging the decay of short-term monocular deprivation.

      There is also no quantification of sleep architecture across the sleep period, to determine whether REM or NREM play a role.

      We have provided a summary table of sleep architecture in the revised version of the Supplementary Materials. The table shows descriptive statistics of sleep architecture on MDnight and CN. Also, we report the result of the paired comparison between the nights and the Spearman correlations between the deprivation indices (DI before and DI after) and the changes between the nights in sleep architecture. Tests indicate that MD does not produce any main effect on the sleep architecture and that there are no substantial associations found between sleep architecture parameters and deprivation indices. Thus, it appears that changes in SSO and spindle frequency and amplitude did not lead to an alteration in the amount of N2 or N3 sleep, as we might expect. At the beginning of the Results section we refer to the table and to the lack of statistically significant effects.

      Finally, while there are tests of changes in NREM oscillations with previous plasticity in wake, there are no direct tests of changes across sleep - i.e. the very changes that could be considered consolidation.

      We thank the reviewer for stimulating us to investigate whether there are any NREM parameters whose change within the sleep cycle can be related to the degree of plasticity maintenance observed at the end of the two hours of sleep.

      For this aim, we 1) partitioned SSO and spindle events into tertiles according to their occurrence time, 2) estimated the average measures of events belonging to the first and last tertile, and considered the variation between tertiles as an estimate of the changes across sleep. We then tested whether there is a consistent relationship between measures of individual retained plasticity (DI after) and changes in SSO and sleep spindles across sleep.

      We performed this across-sleep analysis on the SSO and spindle measures and, as previously explained, none of the parameters showed an association between its change across sleep and the individual DI after sleep. We report these results in the supplementary materials (Figure S8).
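
      The tertile procedure described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function name tertile_change is hypothetical, and the sketch assumes equal-count tertiles over events ranked by occurrence time, with the middle tertile absorbing any remainder.

```python
from statistics import mean


def tertile_change(event_times, event_values):
    """Mean of the last tertile minus mean of the first tertile.

    Events (e.g. SSOs or spindles) are ranked by occurrence time and split
    into three groups; the first and last groups each hold len // 3 events.
    """
    if len(event_times) != len(event_values):
        raise ValueError("times and values must have the same length")
    if len(event_times) < 3:
        raise ValueError("need at least three events to form tertiles")
    # Rank events by when they occurred during the sleep period
    order = sorted(range(len(event_times)), key=lambda i: event_times[i])
    ordered = [event_values[i] for i in order]
    k = len(ordered) // 3
    first_tertile = ordered[:k]
    last_tertile = ordered[-k:]
    return mean(last_tertile) - mean(first_tertile)


# Toy example with made-up values (e.g. an event amplitude per event)
times = [5, 1, 3, 2, 4, 6]
values = [50, 10, 30, 20, 40, 60]
change = tertile_change(times, values)
```

      One such change score per participant and per EEG measure could then be related to the individual DI after sleep, for instance with a rank correlation.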

      Finally, it is also not clear that the decay of response changes is due to homeostatic plasticity; it could be just that: decay of plasticity that occurred previously. The terminology used (e.g. consolidation, homeostatic vs. Hebbian) doesn't seem well founded based on the data.

      Thank you for raising an important point. In our study, homeostatic plasticity refers to the effect of short-term monocular deprivation (so the plasticity occurred before sleep). We have rephrased the interpretation of our results in terms of stabilization/maintenance rather than consolidation of plasticity.

      Regarding homeostatic vs Hebbian plasticity, there is broad agreement in the literature that the effects are indeed different. We now make clear in the text that Hebbian plasticity is usually associated with the boost of the signals most successful in driving a neuronal response or a behavior. Here, MD produced a boost of the unused, and probably silent, eye, and as such the boost is very difficult to explain in terms of Hebbian plasticity. We now make this clear in the introduction.

      Reviewer #3 (Public Review):

      In this study, Menicucci et al. induced plastic changes in ocular dominance by applying an eye-patch to the dominant eye (monocular deprivation, MD). This manipulation resulted in a shift toward even more dominance of the deprived eye, as assessed through a binocular rivalry protocol. This effect was stabilized during sleep whereas it quickly decreased in waking (in the dark). The authors interpret the MD effect as the result of cortical plasticity over primary visual areas and its maintenance during sleep as the consolidation of these changes. The authors thus connect their work to the literature on sleep consolidation. They further show that the magnitude of the MD effect is positively correlated with sleep markers that are involved in memory consolidation (slow oscillations and sleep spindles).

      However, I have first conceptual issues with this study. Indeed, previous findings on the replay of memories during sleep and their consolidation were mostly obtained in hippocampus-dependent forms of learning. Here, I do not really see what is it that would be replayed. Thus, I struggle understanding how rhythms, such as sleep spindles, that have been linked to the transfer of hippocampal memories to the neocortex, would be mechanistically associated with low-level plastic changes restricted to primary visual areas. In addition, the effects were observed over occipital electrodes, where sleep spindles are far fewer and lower in amplitude than other cortical regions. Furthermore, the association between MD-related plasticity and slow oscillations is interesting but, since these slow oscillations organize sleep slow waves, the lack of correlation with slow wave is surprising.

      We agree with the reviewer that many of our results are indeed surprising, especially those related to the involvement of the spindles, and for these reasons we believe that eLife would be the appropriate journal to present our work. At present, the fact that sleep spindles have been associated mainly with mediating the transfer of memory does not exclude a more general involvement in other sensory functions.

      Connected to these conceptual issues, I think the present work has some important methodological limitations. First of all, the analyses included a rather small number of participants, which could make some analyses, in particular correlational analyses, severely underpowered.

      We thank you for prompting us to emphasize this limitation. In the Participants section of Materials and methods, we now point out that, given the complexity of the experimental design, the need to characterize sleep through several different parameters, the sample size, and the need to correct for multiple tests, only associations with a strong effect size could be highlighted.

      Secondly, the approach used to explore the correlation between plasticity and sleep features focused on subset of electrodes (ROI) defined a priori. It is therefore difficult to conclude on the specificity of the results. Given the topographical maps provided by the authors, I am wondering if a more exhaustive analysis of the effect at the electrode level could not yield more robust findings.

      The need for ROIs is based on the interindividual variability of brain structures, in particular the large anatomical variability of V1 orientation implying a variably oriented dipole and a variable maximal representation of visual potentials over electrodes from Oz to CPz. Moreover, we have to cope with the volume conduction effect that limits EEG spatial resolution.

      With these limitations in mind, we very gladly adhere to the reviewer's request to evaluate the effects on individual electrodes in more detail. To this end we have prepared supplementary figures which show boxplots and scatterplots for the electrodes inside the ROIs to evaluate main effects and associations, respectively.

      Finally, given the number of features tested, I think it is important to clarify the strategy used to correct for multiple comparisons.

      We thank the reviewer for highlighting an unclear point. In the revised version of the Statistical analyses section, we have provided missing details of the procedure used for handling false positives due to multiple testing. Basically, we applied the FDR correction for each question we asked.

      For example, “at which time points does dominance remain significantly different from baseline?” or, “which EEG feature and in which area of the scalp shows changes significantly dependent on plasticity induced by monocular deprivation?” For each of these questions, we made a group of tests (for the first example, dependent on the number of points at which ocular dominance was assessed until the morning; for the second example, on the number of EEG features examined multiplied by the number of areas in which they were assessed) to which Benjamini & Hochberg's FDR correction was then applied.
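
      The per-question correction described above can be sketched as follows. This is a minimal illustration of the Benjamini and Hochberg step-up procedure, not the authors' analysis code; the function name benjamini_hochberg, the FDR level q = 0.05, and the p-values are assumptions chosen for illustration. Under the strategy described, the function would be called once per question, on the group of tests belonging to that question.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True if rejected at FDR level q.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k / m * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values for one question (one family of tests)
family = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(family, q=0.05)
```

      Correcting each question separately keeps the number of tests m in a family tied to that question (for example, EEG features multiplied by scalp areas), rather than pooling every test in the study into a single family.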

    1. Author Response

      Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1 and more drastic effects were observed in presumably more complex tasks (i.e. working-memory and bandit task). If mice were trained with these more complex tasks prior to training in the simple discrimination task, then the same manipulations produced large deficits suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view with differential signaling in these cortical areas depending on the task considered and the order in which they were presented to the animals. Overall the study is interesting and the fact that all tasks were assessed relying on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternate ways of interpreting the data than "training history".

      1) Theoretical framework:

      The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. delay condition which is indeed typically assessed in a Y- or a T-maze in rodent) or a two-armed bandit task (i.e. the switching task), respectively. So these three tasks are qualitatively different, are therefore reliant on at least partially dissociable neural circuits and this should be clearly analyzed to explain the rationale of the focus on the three cortical regions of interest.

      We are glad to see that the reviewer finds our study interesting overall and sees value in the experimental design. We agree that in the previous version, we did not provide enough motivation for the specific tasks we employed and the cortical areas studied.

      Navigating to reward locations based on sensory cues is a behavior that is crucial for survival and amenable to a head-fixed laboratory setting in virtual reality for mice. In this context of goal-directed navigation based on sensory cues, we chose to center our study on posterior cortical association areas, PPC and RSC, for several reasons. RSC has been shown to be crucial for navigation across species, poised to enable the transformation between egocentric and allocentric reference frames and to support spatial memory across various timescales (Alexander & Nitz, 2015; Fischer et al., 2020; Pothuizen et al., 2009; Powell et al., 2017). It furthermore has been shown to be involved in cognitive processes beyond spatial navigation, such as temporal learning and value coding (Hattori et al., 2019; Todd et al., 2015), and is emerging as a crucial region for the flexible integration of sensory and internal signals (Stacho & Manahan-Vaughan, 2022). It thus is a prime candidate area in the study of how cognitive experience may affect cortical involvement in goal-directed navigation.

      RSC is heavily interconnected with PPC, which is generally thought to convert sensory cues into actions (Freedman & Ibos, 2018) and has been shown to be important for navigation-based decision tasks (Harvey et al., 2012; Pinto et al., 2019). Specific task components involving short-term memory have been suggested to cause PPC to be necessary for a given task (Lyamzin & Benucci, 2019), so we chose such task components in our complex tasks to maximize the likelihood of large PPC involvement to compare the simple task to.

      One such task component is a delay period between cue and the ultimate choice report, which is a common design in decision tasks (Goard et al., 2016; Harvey et al., 2012; Katz et al., 2016; Pinto et al., 2019). We agree with the reviewer that traditionally such a task would be referred to as a working-memory task. However, we refrain from using this terminology because it may cause readers to expect that to solve the task, mice use a working-memory dependent strategy in its strictest and most traditional sense, that is mice show no overt behaviors indicative of the ultimate choice until the end of the delay period. If the ultimate choice is apparent earlier, mice may use what is sometimes referred to as an embodiment-based strategy, which by some readers may be seen as precluding working memory. Indeed, in new choice-decoding analyses from the mice’s running patterns, we show that mice start running towards the side of the ultimate choice during the cue period already (Figure 1—figure supplement 1). Regardless of these seemingly early choices, however, we crucially have found much larger performance decrements from inhibition in mice performing the delay task compared to mice performing the simple task, along with lower overall task performance in the delay task, indicating that the insertion of a delay period increased subjective task difficulty. As traditional working-memory versus embodiment-based strategies are not the focus of our study here and do not seem to inform the performance decrements from inhibition, we chose to label the task descriptively with the crucial task parameter rather than with the supposedly underlying cognitive process.

      For the switching task, we appreciate that the reviewer sees similarities to a two-armed bandit task. However, in a two-armed bandit task, rewards are typically delivered probabilistically, whereas in our task, cue and action values are constant within each of the two rule blocks, and only the rule, i.e. the cue-choice association, reverses across blocks. This is a crucial distinction because in our design, blocks of Rule A in the switching task are identical to the simple task, with fixed cue-choice associations and guaranteed reward delivery if the correct choice is made, allowing a fair comparison of cortical involvement across tasks.

      We have now heavily revised the introduction, results, and discussion sections of the manuscript to better explain the motivation for the tasks and the investigated brain areas. These revisions cover all the points mentioned in this response.

      Furthermore, we agree with the reviewer that the three tasks are qualitatively different and likely depend on at least partially dissociable circuits. We consider the large differences in cortical inhibition effects between the simple and the complex tasks as evidence for this notion. We also want to highlight that in fact, we performed task-specific optogenetic manipulations presented in the Supplementary Material to further understand the involvement of different areas in task-specific processes. In what is now Figure 1—figure supplement 4, we restricted inhibition in the delay task to either the cue period only or delay period only, finding that interestingly, PPC or RSC inhibition during either period caused larger performance drops than observed in the simple task. We also performed epoch-specific inhibition of PPC in the switching task, targeting specifically reward and inter-trial-interval periods following rule switches, in what is now Figure 1—figure supplement 5. With such PPC inhibition during the ITI, we observed no effect on performance recovery after rule switches and thus found PPC activity to be dispensable for rule updates.

      For the working-memory task we do not know the duration of the delay but this really is critical information; per definition, performance in such a task is delay-dependent, this is not explored in the paper.

      We thank the reviewer for pointing out the lack of information on delay duration and have now added this to the Methods section.

      We agree that in classical working memory tasks where the delay duration is purely defined by the experimenter and varied throughout a session, performance is typically dependent on delay duration. However, in our delay task, the delay distance is kept constant, and thus the delay is not varied by the experimenter. Instead, the time spent in the delay period is determined by the mouse, and the only source of variability in the time spent in the delay period is minor differences in the mice’s running speeds across trials or sessions. Notably, the differences in time in the delay period were greatest between mice because some mice ran faster than others. Within a mouse, the time spent in the delay period was generally rather consistent due to relatively constant running speeds. Also, because the mouse had full control over the delay duration, it could very well speed up its running if it started to forget the cue and run more slowly if it was confident in its memory. Thus, because the delay duration was set by the mouse and not the experimenter, it is very challenging or impossible to interpret the meaning and impact of variations in the delay duration. Accordingly, we had no a priori reason to expect a relationship between task performance and delay duration once mice have become experts at the delay task. Indeed, we do not see such a relationship in our data (see plot here, n = 85 sessions across 7 mice). In order to test the effect of delay duration on behavioral performance, we would have to systematically change the length of the delay period in the maze, which we did not do and which would require an entirely new set of experiments.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      We acknowledge that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. So trials with poor performance are thus providing an essential feedback which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      We completely agree with the reviewer that the periods following rule switches are an essential part of the switching task and of high interest. Indeed, ongoing work in the lab is carefully quantifying the mice’s strategy in this task and exploring how mice use errors after switches to update their belief about the rule. In this project, however, a detailed quantification of switching task strategy seemed beyond the scope because our focus was on training history and not on the specifics of each task. While we agree with the reviewer about the interesting nature of the switching period, it would be too much for a single paper to investigate the detailed mechanisms of each task on top of what we already report for training history. Instead, we have now added quantifications of performance recovery after rule switches in Figure 1—figure supplement 2, showing that rule switches cause below-chance performance initially, followed by recovery within tens of trials.
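
      As an illustration only, the kind of switch-aligned performance curve described above could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function name, the `window` default, and the per-trial input format (a 0/1 outcome and a rule label for each trial) are all hypothetical.

```python
def switch_aligned_accuracy(correct, rule, window=30):
    """Mean accuracy at each trial offset after a rule switch.

    correct: list of 0/1 outcomes, one per trial (hypothetical format).
    rule:    list of rule labels, one per trial (e.g. "A" or "B").
    window:  number of post-switch trials to report (assumed default).
    Returns a list of length `window`; an entry is None when no block
    is long enough to contribute a trial at that offset.
    """
    if len(correct) != len(rule):
        raise ValueError("correct and rule must have the same length")

    # A switch happens at trial i when its rule differs from trial i - 1.
    switches = [i for i in range(1, len(rule)) if rule[i] != rule[i - 1]]
    # Each post-switch block ends at the next switch or at the session end.
    bounds = switches + [len(rule)]

    sums = [0] * window
    counts = [0] * window
    for k, start in enumerate(switches):
        block_len = bounds[k + 1] - start
        for offset in range(min(window, block_len)):
            sums[offset] += correct[start + offset]
            counts[offset] += 1

    return [sums[o] / counts[o] if counts[o] else None for o in range(window)]
```

      In such a curve, values below 0.5 at the first offsets followed by a rise would correspond to the below-chance performance and subsequent recovery reported in the response.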

      2) Training history vs learning sets vs behavioral flexibility:

      The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are just simply provided with a cognitive challenge assessing behavioral flexibility given that they must identify the new rule while restraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      We thank the reviewer for raising these interesting ideas. We have now evaluated these ideas in the context of our experimental design and results. One of the main points to consider is that for mice transitioned from either of the complex tasks to the simple task, the simple task is not a novel task, but rather a well-known simplification of the previous tasks. Mice that are experts on the delay task have experienced the simple task, i.e. trials without a delay period, during their training procedure before being exposed to delay periods. Switching task expert mice know the simple task as one rule of the switching task and have performed according to this rule in each session prior to the task transition. Accordingly, upon the transition to the simple task, both delay task expert mice and switching task expert mice perform at very high levels on the very first simple task session. We now quantify and report this in Figure 2—figure supplement 1 (A, B). This is crucial to keep in mind when assessing ‘learning sets’ or ‘behavioral flexibility’ as possible explanations for the persistent cortical involvement after the task transitions. In classical learning sets paradigms, animals are exposed to a series of novel associations, and the learning of previous associations speeds up the learning of subsequent ones (Caglayan et al., 2021; Eichenbaum et al., 1986; Harlow, 1949). This is a distinct paradigm from ours because the simple task does not contain associations that are new to mice already trained on the complex tasks. Relatedly, the simple task is unlikely to present a challenge of behavioral flexibility to these mice given our experimental design and the observation of high simple task performance in the first session after the task transition.

      We now clarify these points in the introduction, results, and discussion sections, also acknowledging that it will be of interest for future work to investigate how learning sets may affect cortical task involvement.

      3) Calcium imaging data versus interventions:

      The value of the calcium imaging data is not entirely clear. Does this approach bring a new point to consider to interpret or conclude on behavioral data or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue, does this really reflect training history, or that a new rule now must be implemented or something else? I don't really get how this correlative approach can help to address this issue.

      We thank the reviewer for pointing out that the relationship between the inhibition dataset and calcium imaging dataset is not clear enough. We restricted analyses of inhibition and calcium imaging data in the switching task to the identical cue-choice associations as present in the simple task (i.e. Rule A trials of the switching task). We did this because we sought to make the fairest and most convincing comparison across tasks for both datasets. However, we can now see that not reporting results with trials from the other rule causes concerns that the reported differences across tasks may only hold for a specific subset of trials.

      We have now added analyses of optogenetic inhibition effects and calcium imaging results considering Rule B trials. In Figure 1—figure supplement 2, we show that when considering only Rule B trials in the switching task, effects of RSC or PPC inhibition on task performance are still increased relative to the ones observed in mice trained on and performing the simple task. We also show that overall task performance is lower in Rule B trials of the switching task than in the simple task, mirroring the differences across tasks when considering Rule A trials only.

      We extended the equivalent comparisons to the calcium imaging dataset, only considering Rule B trials of the switching task in Figure 4—figure supplement 3. With Rule B trials only, we still find larger mean activity and trial-type selectivity levels in RSC and PPC, but not in V1, compared to the simple task, as well as lower noise correlations. We thus find that our conclusions about area necessity and activity differences across tasks hold for Rule B trials and are not due to only considering a subset of the switching task data.

      In Figure 4—figure supplement 4, we further leverage the inclusion of Rule B trials and present new analyses of different single-neuron selectivity categories across rules in the switching task, reporting a prevalence of mixed selectivity in our dataset.

      Furthermore, to clarify the link between the optogenetic inhibition and the calcium imaging datasets, we have revised the motivation for the imaging dataset, as well as the presentation of its results and discussion. Investigating an area’s neural activity patterns is a crucial first step towards understanding how differential necessity of an area across tasks or experience can be explained mechanistically on a circuit level. We now elaborate on the fact that mechanistically, changes in an area’s necessity may or may not be accompanied by changes in activity within that area, as previous work in related experimental paradigms has reported differences in necessity in the absence of differences in activity (Chowdhury & DeAngelis, 2008; Liu & Pack, 2017). This phenomenon can be explained by differences in the readout of an area’s activity. We now make more explicit that in contrast to the scenario where only the readout changes, we find an intriguing correspondence between increased necessity (as seen in the inhibition experiments) and increased activity and selectivity levels (as seen in the imaging experiments) in cortical association areas depending on the current task and previous experience. Rather than attributing the increase in necessity solely to these observed changes in activity, we highlight that in the simple task condition already, cortical areas contain a high amount of task information, ruling out the idea that insufficient local information would cause the small performance deficits from inhibition. Our results thus suggest that differential necessity across tasks and experience may still require changes at the readout level despite changes in local activity. 
      We view our imaging results as an exciting first step towards a mechanistic understanding of how cognitive experience affects cortical necessity, but we stress that future work will need to test directly the relationship between cortical necessity and various specific features of the neural code.

      Reviewer #2 (Public Review):

      The authors use a combination of optogenetics and calcium imaging to assess the contribution of cortical areas (posterior parietal cortex, retrosplenial cortex, S1/V1) on a visual-place discrimination task. Head-fixed mice were trained on a simple version of the task where they were required to turn left or right depending on the visual cue that was present (e.g. X = go left; Y = go right). In a more complex version of the task the configurations were either switched during training or the stimuli were only presented at the beginning of the trial (delay).

      The authors found that inhibiting the posterior parietal cortex and retrosplenial cortex affected performance, particularly on the complex tasks. However, previous training on the complex tasks resulted in more pronounced impairments on the simple task than when behaviourally naïve animals were trained/tested on a simple task. This suggests that the more complex tasks recruit these cortical areas to a greater degree, potentially due to increased attention required during the tasks. When animals then perform the simple version of the task their previous experience of the complex tasks is transferred to the simple task resulting in a different pattern of impairments compared to that found in behaviorally naïve animals.

      The calcium imaging data showed a similar pattern of findings to the optogenetic study. There was overall increased activity in the switching tasks compared to the simple tasks consistent with the greater task demands. There was also greater trial-type selectivity in the switching task compared to the simple task. This increased trial-type selectivity in the switching tasks was subsequently carried forward to the simple task so that activity patterns were different when animals performed the simple task after experiencing the complex task compared to when they were trained on the simple task alone.

      Strengths:

      The use of optogenetics and calcium-imaging enables the authors to look at the requirement of these brain structures both in terms of necessity for the task when disrupted as well as their contribution when intact.

      The use of the same experimental set up and stimuli can provide a nice comparison across tasks and trials.

      The study nicely shows that the contribution of cortical regions varies with task demands and that longer-term changes in neuronal responses can transfer across tasks.

      The study highlights the importance of considering previous experience and exposure when understanding behavioural data and the contribution of different regions.

      The authors include a number of important controls that help with the interpretation of the findings.

      We thank the reviewer for pointing out these strengths in our work and for finding our main conclusions supported.

      Weaknesses:

      There are some experimental details that need to be clarified to help with understanding the paper in terms of behavior and the areas under investigation.

      The use of the same stimuli throughout is beneficial as it allows direct comparisons with animals experiencing the same visual cues. However, it does limit the extent to which you can extrapolate the findings. It is perhaps unsurprising to find that learning about specific visual cues affects subsequent learning and use of those specific cues. What would be interesting to know is how much of what is being shown is cue-specific learning or whether it reflects something more general, for example schema learning which could be generalised to other learning situations. If animals were then trained on a different discrimination with different stimuli, would this previous training modify behavior and neural activity in that instance? This would perhaps be more reflective of the types of typical laboratory experiments where you may find an impairment on a more complex task and then go on to rule out more simple discrimination impairments. However, this would typically be done with slightly different stimuli so you don't introduce transfer effects.

      We agree with the reviewer that investigating the effects of schema learning on cortical task involvement is an exciting future direction and have now explicitly mentioned this in the Discussion section. As the reviewer points out, however, our study was not designed to test this idea specifically. Because investigating schema learning would require developing and implementing an entirely new set of behavioral task variants, we feel this is beyond the scope of the current work. As to the question of how generalized the effects of cognitive experience are, our data in the run-to-target task suggest that if task settings are sufficiently distinct, cortical involvement can be similarly low regardless of complex task experience (now Figure 3—figure supplement 1). This finding is in line with recent work (Pinto et al., 2019), where cortical involvement appears to change rapidly depending on major differences in task demands. However, work in MT has shown that previous motion discrimination training using dots can alter MT involvement in motion discrimination of gratings (Liu & Pack, 2017), highlighting that cortical involvement need not be tightly linked to the sensory cue identity.

      It is not clear whether length of training has been taken into account for the calcium imaging study given the slow development of neural representations when animals acquire spatial tasks.

      We apologize that the training duration and the temporal relationship between task acquisition and calcium imaging were not documented for the calcium imaging dataset. Please see our detailed reply to the ‘recommendations for the authors’ from Reviewer 2 below.

      The authors are presenting the study in terms of decision-making, however, it is unclear from the data as presented whether the findings specifically relate to decision making. I'm not sure the authors are demonstrating differential effects at specific decision points.

      We understand that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      While we removed the emphasis on the decision-making process in our tasks, we found the reviewer’s suggestion to measure ‘decision points’ a useful additional behavioral characterization across tasks. So, we quantified how soon a mouse’s ultimate choice can be decoded from its running pattern as it progresses through the maze towards the Y-intersection. We now show these results in Figure 1—figure supplement 1. Interestingly, we found that in the delay task, choice decoding accuracy was already very high during the cue period before the onset of the delay. Nevertheless, we had shown that overall task performance and performance with inhibition were lower in the delay task compared to the simple task. Also, in segment-specific inhibition experiments, we had found that inhibition during only the delay period or only the cue period decreased task performance substantially more than in the simple task, thus finding an interesting absence of differential inhibition effects around decision points. Overall, how early a mouse made its ultimate decision did not appear predictive of the inhibition-induced task decrements, which we also directly quantify in Figure 1—figure supplement 1.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Kanca et al. presents a variety of valuable resources for the use of the Drosophila research community. As an update to the ongoing work of the Drosophila Gene Disruption Project, it includes hundreds of new transgenic fly lines each of which simultaneously knocks out a targeted gene and generates a driver that expresses the Gal4 transcription factor specifically in the pattern of that gene. The "KozakGal4" approach described supplements previous approaches of the GDP, including the powerful "CRIMIC" method, which inserts a synthetic exon containing a T2AGal4 module into an intron of the targeted gene. In the KozakGal4 method, the coding sequence of the native gene is completely replaced by Gal4, which the authors point out will allow them to target genes lacking (suitable) introns. In the KozakGal4 method, gene replacement is accomplished by targeted excision of the native gene using CRISPR-based technology and subsequent incorporation of a Gal4-encoding cassette by homologous recombination. The vectors developed by the authors to effect gene replacement are elegantly optimized to include all components necessary for native gene excision and efficient recombination of Gal4. These components include the guide RNAs (sgRNAs) that cleave flanking regions of the native gene, an sgRNA that liberates the Gal4 cassette from the vector, and short synthetic homology arms that provide effective, site-specific recombination. Importantly, the vectors are designed so that all gene-specific components can be synthesized in a single fragment that can be readily incorporated into the vector backbone followed by insertion of the Gal4 cassette.

      Overall, the technical advances described in the manuscript are impressive and the utility of the method is well demonstrated. The one exception is in the validation of Gal4 expression fidelity. As the authors note, fidelity could be compromised if regulatory information is removed along with sequences in and around a targeted gene. In addition, the introduction of new DNA at a particular locus may alter the regulation of gene expression. In any case, establishing the fidelity of expression of KozakGal4 lines is important and the data presented on this point is both confusing and incomplete. Rather than directly comparing the expression of selected KozakGal4 lines against the expression of the endogenous gene (e.g. by immunostaining, in situ hybridization, or by comparing tissue-specific reporter expression against expression in microarray-derived datasets such as Fly Atlas or modEncode), the authors use two indirect methods to demonstrate fidelity. One method uses VNC scRNAseq data together with the expression patterns of T2AGal4 lines that target genes co-expressed (at least in certain cell types) with the KozakGal4 line, while the other method uses phenotypic rescue by driving UAS-cDNA transgenes. The demonstrations are at best suggestive, and the rescue results presented are minimal, with no description of phenotypes, methods used to assay them, or quantification of rescue. There is thus insufficient information to form a judgment about fidelity and a more direct demonstration is needed.

      We appreciate that the manuscript can be strengthened by adding supporting evidence about the fidelity of GAL4 expression to the expression pattern of the targeted gene. The direct comparison of the GAL4 expression pattern to the expression pattern of the gene is a complex issue. The seemingly straightforward experiments of comparing the GAL4‐UAS reporter fluorescent protein expression pattern to the antibody staining of the targeted gene product suffer from multiple technical and practical issues: 1) The majority of the genes that we targeted are understudied and do not have a readily available antibody that would work for immunostaining. 2) Even if the antibodies were available, and even if the antibodies were completely specific, the staining pattern would likely be different from the GAL4‐UAS reporter expression pattern due to the subcellular localization of the gene product differing from the subcellular localization of the reporter. 3) The GAL4‐UAS system introduces a very high level of amplification of the signal compared to the expression of the gene product. We have reported the extent of this difference in the Lee et al. 2018 eLife paper where we used RMCE to convert the same MiMIC lines to EGFP protein trap alleles or T2AGAL4 gene trap alleles. The signals that we could detect in larval or adult brains looked qualitatively different. Comparing the expression pattern of the targeted gene's product to the KozakGAL4‐UAS reporter gene signal would suffer from the same issue.

      To overcome these issues, we decided to compare the GAL4 mRNA expression pattern of KozakGAL4 alleles to the mRNA expression pattern of the targeted gene. We employed smiFISH (single molecule Fluorescent In‐Situ Hybridization) in 3rd instar larval brains for 8 genes. We crossed the KozakGAL4 alleles of these genes to yw flies and performed co‐staining of GAL4 mRNA and the targeted gene's mRNA. In 7 cases where we could detect the mRNA expression of the gene product reliably, the GAL4 mRNA expression pattern overlapped with the mRNA expression pattern of the targeted gene, suggesting that the transcriptional regulation of KozakGAL4 in the locus reflects the transcriptional regulation of the targeted gene. We note that the signal-to-noise level is quite low for some of the in situ hybridization results. Hence, we attenuated the language about the expression patterns of KozakGAL4 alleles reflecting the expression domain of the targeted genes by adding the caveat that the regulatory elements in the coding regions and UTRs would be removed in these alleles. We include the smiFISH results as a supplementary figure and have added a paragraph describing the methodology to the text.

      The manuscript could be strengthened in a couple of other spots as well. There is little to no description in either the Introduction or Results/Discussion of similar knock-out/knock-in approaches, although gene-specific knock-ins of Gal4 have been generated in Drosophila using homologous recombination for some time, typically into the site of ATG start codons. CRISPR technology has only facilitated this approach, which has also been used to create gene-specific cre knock-ins in rodents. This is of potential interest since the authors mention that their approach can be generalized for use in other animals. A short overview of existing knock-in approaches and their limitations relative to KozakGal4 would therefore be useful. Also, the authors motivate the need for the KozakGal4 method by asserting that over 50% of Drosophila genes lack "suitable" coding introns for the integration of artificial T2AGal4 exons such as CRIMIC. This seems to unnecessarily overstate the actual need. The authors define a "suitable" gene as one that has an intron common to all its isoforms that is at least 100 nt long. The length requirement is justified based on the need for suitable sgRNA targets within the intron, but it's possible to use sgRNA targets outside the intron (as long as the homology domains replace this sequence). Also, the requirement of a sufficiently long intron common to all isoforms is quite stringent and could be relaxed if multiple T2AGal4 lines were made to target multiple isoforms. Presumably, multiple KozakGal4 lines will, in fact, also be required for genes that have multiple transcription start sites, if the expression patterns of all isoforms are to be reproduced. In general, there's no doubt about the utility of the KozakGal4 approach, but a more balanced presentation of its merits relative to other approaches seems warranted.

      We agree with the reviewer that the presence of a 100-nt-long coding intron in all annotated isoforms is a relatively stringent criterion for deeming a gene to be a suitable target for T2AGAL4 methods. This requirement can indeed be relaxed if the same gene is targeted with multiple T2AGAL4 alleles. Nevertheless, for the GDP project, our aim is to generate genetic reagents for as many conserved genes as possible to make them accessible to the research community. Multiple T2AGAL4 alleles that target individual splice isoforms can be generated by the laboratories that work on those genes, using the methodology that we describe in this paper. We have attenuated the language about the intron length requirements and included our justification for this requirement for the GDP project in the text.
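
      The suitability criterion discussed here (a coding intron of at least 100 nt that is shared by all annotated isoforms) can be written down as a short check. The sketch below is illustrative only and is not taken from the manuscript: the function names, the dictionary input format, and the use of half-open (start, end) coordinates are assumptions made for the example.

```python
def shared_introns(isoform_introns):
    """Return the set of introns present in every isoform.

    isoform_introns: dict mapping an isoform name to a list of
    (start, end) intron coordinates, half-open (assumed format).
    """
    intron_sets = [set(introns) for introns in isoform_introns.values()]
    if not intron_sets:
        return set()
    return set.intersection(*intron_sets)


def has_suitable_intron(isoform_introns, min_len=100):
    """True if some intron common to all isoforms is at least min_len nt."""
    return any(
        end - start >= min_len
        for start, end in shared_introns(isoform_introns)
    )


# Hypothetical gene with two isoforms sharing one 250-nt intron.
example_gene = {
    "isoform-RA": [(100, 350), (500, 560)],
    "isoform-RB": [(100, 350)],
}
print(has_suitable_intron(example_gene))
```

      Relaxing the criterion as the reviewer suggests would amount to evaluating each isoform separately rather than intersecting the intron sets across isoforms.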

      Reviewer #2 (Public Review):

      In this interesting paper, Kanca and coworkers present a set of updated constructs for the replacement of gene coding regions for instance by a Gal4 expression cassette or a GFP protein trap allele, enabling multiple research applications with the generated fly strains. The novel design now allows for the CRISPR-based targeting of almost any gene in Drosophila. The authors apply these novel tools and generate hundreds of fly lines that complement the pool of already existing strains in the Drosophila Gene Disruption Project. The authors report a high success rate for their HDR-mediated gene targeting strategy and show that they can even target genes that previously proved to be difficult to engineer. The authors validate the expression patterns of a set of lines - supported even by single-cell sequencing experiments - and provide strong evidence that the updated toolkit functions as expected.

      What may confuse the reader is that there are different targeting strategies that are presented with a strong focus on the validation of the expression cassettes used in combination with a specific targeting strategy (i.e., KozakGal4 or GFP protein trap). This leaves the reader with the impression that the insertion of a particular expression cassette would require a tailored targeting strategy, which is not the case. In fact, the majority of the paper deals with the description and extensive validation of small updates on already published methods for the generation of additional KO/Gal4 or eGFP trap lines. However, neither the updated knock-in/knock-out strategies described for the insertion of the KOZAKGal4 cassette at the beginning of the results section nor the experiments to GFP tag proteins at different positions in the open reading frames (Figure 5) are of sufficient novelty and technical advancement.

      What really warrants publication is the very elegant and universal method described in Figure 4 that requires only a single vector to be injected into fly embryos. The method is suited to precisely engineer any gene at will in combination with any HDR template. The very smart vector design allows for the directed insertion of custom and commercially synthesized HDR constructs as well as of a specific guide required to target and cut the gene of interest. This makes the method versatile, fast and cheaper with the benefit of being very efficient. This gRNA_int200 targeting strategy will be of broad interest, is straightforward to use and is expected to have a large impact - far beyond the fly community.

      We thank the reviewer for the constructive criticism and for seeing the benefits in our methodology. Although the KozakGAL4 and GFP knock‐ins in the genome are not conceptually new, the combination of our vector design makes the application of these concepts straightforward. Additionally, the extent of application and verification of GAL4 knock‐ins was limited compared to what we include in this manuscript which prompted us to include the KozakGAL4 and GFP knock‐in methodology in this manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Radtke et al. use a model of helminth infection in IL-4-IRES-eGFP (4get) mice, in which transcription at the Il4 locus is reported by eGFP, in order to define the transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in the mesenteric lymph nodes (mLN) and lungs. By infecting 4get mice with the hookworm Nippostrongylus brasiliensis, which is well described to induce a robust type 2 immune response, the authors isolated and sorted eGFP+CD4+ T cells from the mLN and lungs at day 10 post infection and performed single cell RNA-seq analysis using the 10X Chromium platform. Transcriptional profiling of activated CD4+ T cells with scRNA-seq has been performed in a murine model of allergic asthma, including the lung and lung-draining lymph nodes, but this study involved unbiased capture of all activated CD4+ T cells (Tibbitt et al., Immunity, 2019). Radtke et al. have used a distinct model with Nippostrongylus brasiliensis and have focused on sorting Il4-licensed, CD4+ T cells, allowing for a greater number of captured CD4+ T cells with a "type 2" lymphocyte program for single cell analysis. Furthermore, this study sought to identify distinct and overlapping transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in two "distant" tissues. In support of such an approach, there is growing evidence for tissue-specific and model-specific features of CD4+ T cell differentiation (Poholek, Immunohorizons, 2021; Hiltensperger et al., Nature Immunol, 2021; Kiner et al., Nature Immunol, 2021).

      Upon dimension reduction, the authors found mLN- and lung-specific clusters, including two juxtaposed clusters that form a "bridge" between the mLN and lung compartments, suggesting immigrating and/or emigrating cells. Consistent with previous studies, the dominant lung cluster (L2) exhibited unique expression of Il5 and Il13, enhanced IL-33 and IL-2 signaling, and exhibited an effector/resident memory profile. The authors did find a small cluster in the mLN (ML4) with an effector/resident memory signature that also expressed CCR9, suggesting the potential for homing to the gut mucosa. Whether this population is specific to the mLN or would also be found in the lung-draining lymph nodes remains unclear. In the mLN, the authors also describe an iNKT cell cluster with CCR9 expression and a CD4+ T cell cluster with a myeloid gene signature, but the significance of these populations remains unclear.

      The authors then use RNA velocity analysis to infer the developmental trajectory of Il4-licensed, CD4+ T cells from the two tissue sites. Consistent with previous studies, the authors found that T cell proliferation was associated with fate decisions. Furthermore, among the two lung CD4+ T cell clusters, L1 represents highly differentiated, effector Th2 cells while L2, which is juxtaposed to the mLN clusters, represents a population likely entering the lung with the potential to differentiate into L1 cells.

      Next, the authors perform TCR repertoire analysis. The authors identified a broad TCR repertoire with the majority of distinct TCRs being found in only one cell. Among the TCRs found in more than one cell, a substantial number of clones can be found in both tissue sites, which is consistent with the findings that individual CD4+ T cells clones can produce different types of effector cells (Tubo et al., Cell, 2013). The authors find significant overlap of clones between the mLN and lung. In addition, they also identify clones enriched in a particular site and suggest that this represents local expansion. However, an alternative possibility is that certain CD4+ T cell clones are expanded at a particular site because the specific TCR preferentially instructs a particular cell fate. For example, fate-mapping of individual naïve CD8+ T cells suggests that certain T cell clones exhibit a greatly heightened capacity to form tissue-resident memory T cells over other cell fates (Kok et al., J Exp Med, 2020). Lastly, the authors analyze CDR3 sequences, finding the most abundant CDR3 motif belonging to the invariant TCRa chain of iNKTs. Among conventional CD4+ T cells, the abundant CDR3 motifs were not restricted to an exact TCRa/TCRb combination beyond a slight preferential usage of the Trbv1 gene. While TCR repertoire analysis allows for defining clonal relatedness among Il4-licensed, CD4+ T cells, the importance and relevance of the above findings to the in vivo type 2 immune response remain unclear.

      There are several limitations of the study:

      (1) The authors use the term "Th2 cells" to describe all Il4-licensed, CD4+ T cells. While CD4+ T helper cell nomenclature has evolved, Th2 cells and Tfh2 cells are generally used to describe distinct subsets driven by unique transcriptional programs (Ruterbusch et al., Annu Rev Immunol, 2020). While previous data suggested that Tfh2 cells are precursors to effector Th2 cells, subsequent studies support a model in which Tfh2 and Th2 cells represent distinct developmental pathways and should be designated as distinct subsets (Ballesteros-Tato et al., Immunity, 2016; Tibbitt et al., Immunity, 2019). Consequently, the authors' broad use of "Th2 cells" and a description of "Th2 cell heterogeneity" includes CD4+ T cell subsets with distinct developmental pathways that include canonical Th2 cells as well as Tfh2 and iNKT cells. The clarity of the manuscript would be improved by describing eGFP+CD4+ cells as Il4-licensed, CD4+ T cells rather than Th2 cells.

      We thank the reviewer for the helpful comment and state now that our IL-4 reporter positive population also includes cells that don’t meet the Th2 criteria in the introduction (lines 76-78).

      (2) The authors used perfused lungs to isolate Il4-licensed, CD4+ T cells for scRNA-seq of "Th2 cells" in the lung tissue. However, previous studies indicate that leukocytes, including CD4+ T cells, in lung vasculature are not completely removed by perfusion, which confounds the interpretation of a tissue cell profile due to contaminating circulating cells (Galkina, E et al., J Clin Invest, 2005; Anderson, KG et al., Nat Protoc, 2014). This is particularly true in the lung and relevant as the authors found a lung cluster (L2) with a circulating signature and suggested that L2 may represent recent immigrant "Th2 cells". Thus, it is unclear whether the L2 cluster identifies immigrant Th2 cells or simply reflects circulating Th2 cells trapped in the lung vasculature. The study would benefit from using intravascular staining to discriminate cells within the lungs from those in the circulation (Anderson, KG et al., Nat Protoc, 2014) for the proper isolation of Il4-licensed lung CD4+ T cells to truly define immigrant "Th2 cells" within the lung parenchyma.

      Following the reviewer's suggestion, we performed intravascular staining to discriminate cells within the lungs from those in the circulation (new Figure 2—figure supplement 1). According to the vascularity staining method (with slightly increased time between i.v. injection and sacrifice compared to Anderson, KG et al., Nat Protoc, 2014, for a higher probability of successful staining), the L2 lung cluster is a mixture of circulating cells and immigrating cells, which we describe in the text (lines 210-213). The finding that the cells from the vasculature and the cells we classified as “migrating” seem to cluster together based on the similarity of their expression profiles on our UMAP further supports the classification of the L2 tissue fraction as “recent immigrants”. We thank the reviewer for this helpful comment, which improved the quality of the manuscript.

      (3) The authors describe T cell exchange/trafficking across organs. However, in general, inter-organ trafficking refers to lymphocyte trafficking between distinct non-lymphoid tissues, rather than trafficking between lymph nodes and peripheral tissues (Huang et al., Science, 2018). Rather than inter-organ trafficking, the authors have described shared and distinct features of Il4-licensed, CD4+ T cells from a draining lymph node of one organ (gut) and a distant non-lymphoid organ (lung). The experimental approach used makes interpretation of some of the findings challenging. Specifically, canonical effector Th2 cell differentiation is well described to occur via two checkpoints, including the draining lymph node and the peripheral (non-lymphoid) tissue (Liang et al., Nature Immunol, 2011; Van Dyken et al., Nature Immunol, 2016; Tibbitt et al., Immunity, 2019). In the draining lymph node, Th2 cells acquire the capacity to express IL-4 alone, but do not complete effector Th2 cell differentiation until trafficking to the inflamed peripheral tissues and receiving additional inflammatory signals. Consequently, it is unclear whether the differences identified in the mesenteric lymph node and lungs simply reflect well-described differences between the two Th2 cell checkpoints or organ-specific differences (gut vs lung). Il4-licensed, CD4+ T cells from the intestinal mucosa and lung-draining lymph node would also be needed to truly define organ-specific differences during helminth infection.

      Following the reviewer's suggestion, we now avoid the term “inter-organ trafficking” and have replaced it with “at distant sites” in the title. As the reviewer points out, we chose the setup of comparing a lymphoid and a non-lymphoid organ to acquire a broad picture of Th2 developmental stages in Nb infection. The limited overlap in clusters on the UMAP shows that expression profiles between MLN and lung strongly differ. However, this notion is not in conflict with cells of both organs being in a different developmental stage. We added information to highlight this in the manuscript (lines 99-101). Lung and MLN (rather than medLN and MLN) were selected to enable clonal relatedness/distribution analysis of T cells at distant sites. As part of the revision we additionally provide newly generated single cell sequencing data that compare medLN and MLN cells at day 10 after Nb infection and find that UMAP clusters are largely overlapping between medLN and MLN (new Figure 1—figure supplement 3). This suggests that there is no broad medLN/MLN site-specific signature present that would force the medLN and MLN cells to cluster apart. Addition of the newly generated medLN/MLN data to the lung/MLN UMAP based on shared anchors (Stuart et al. Cell. 2019) also leads to a clear separation between all LN and lung cells, supporting that cells don’t cluster due to a site-specific respiratory tract vs intestinal tract signature but likely based on developmental stages (new Fig. 1C,D). Exceptions are the defined effector clusters, which show signs of a site-specific signature (L1 expresses Ccr8, MLN4 and MLN6 express Ccr9; differences are also suggested by the clustering described in lines 247-252). A phenotype similar to the one observed on the transcriptional level is observed when we cluster medLN/MLN and lung cells based on surface marker expression suggested by scRNAseq after flow cytometric analysis, extending the analysis to medLN on the protein level (new Fig. 3). It would have also been interesting to include lamina propria T cells as the reviewer suggested, but we were not able to extract high-quality cells at day 10 after Nb infection, which is a common limitation in the Nb model.

      (4) The study includes a single time point (day 10) whereas Tibbitt et al. performed scRNAseq in the lung and lung-draining lymph node at multiple time points during type 2 immunity (Tibbitt et al., Immunity, 2019). As a result, it remains unclear how similarities or differences between the mesenteric lymph node and lung response would change over the duration of helminth infection, especially given the helminth life cycle involves multiple infection stages.

      As part of the revision we screened for surface marker expression in the single cell sequencing dataset on the transcript level and stained for these markers on the protein level (new Fig. 3 and Figure 3—figure supplement 1). This allows the populations defined by scRNAseq to be followed longitudinally (d0, d6, d8, d10) by flow cytometry during Nb infection. We compared medLN, MLN and lung. The dynamics of the response in the medLN and the MLN seem similar, with a small delay in the MLN compared to the medLN.

      Nb, with its relatively well-defined migratory path through the body, provides a relevant complex model antigen naturally present in the respiratory tract and the intestine during infection. However, working with a complex and relevant model often comes with limitations. While stage 4 larvae are found in lung and gut and certainly provide a shared antigen basis between both sites (migration stage from lung to intestine; Camberis et al. Curr Protoc Immunol. 2003), we also think that there is a reasonable number of antigens shared between different larval stages and antigen (either actively secreted or from dying larvae) that are systemically distributed. There are probably immunogenic differences between larval stages, but analyzing these is beyond the scope of the manuscript.

      While, for example, Tibbitt et al. nicely define cell clusters with a limited number of cells, they don’t include any TCR analysis or clonal information. Not much was known about the expansion of T cells in the different clusters in one organ and between organs, and we provide relevant data in this regard. Furthermore, HDM as an allergy model might invoke different Th2 differentiation pathways, as, for example, Tfh13 cells are found in allergic settings but not in worm models (Gowthaman U, Science. 2019). With our approach at the single-cell level we were able to show effective distribution of a number of T cell clones in a highly heterogeneous immune response and to describe and functionally validate successfully expanded clones / expanded TCR chains later on (see new Fig. 6). This kind of analysis has not been performed for a worm model before.

      (5) The study analyzed one scRNA-seq experiment that included two mice without validation via flow cytometry or other method to infer a role of a particular finding to the type 2 immune response in vivo.

      As noted above, we screened for surface marker expression in the single cell sequencing dataset on the transcript level and measured these markers on the protein level by flow cytometry, as the reviewer suggested. This allows the populations defined by scRNAseq to be followed longitudinally (d0, d6, d8, d10) during Nb infection (new Fig. 3). Furthermore, we added a newly generated set of scRNAseq data which confirms and extends findings made in the initial sequencing experiment (Fig. 1C,D and Figure 1—figure supplement 3). We also included validation experiments based on the performed TCR analysis: we retrovirally expressed three TCRs from our study and confirmed Nb-specific expansion for one of them in vivo (new Fig. 6 and Figure 6—figure supplement 1).

    1. Author Response

      Reviewer #1 (Public Review):

      In their paper, titled ‘Group II truncated haemoglobin YjbI prevents reactive oxygen species-induced protein aggregation in Bacillus subtilis’, Imai et al. suggest that the protein YjbI acts as a hydroperoxide peroxidase and therefore may protect cell-surface proteins from oxidation. Using AFM and contact angle measurements they show that yjbI mutants lead to changes in cell surface properties as well as to the formation of more hydrophilic biofilms, relative to the wild-type (WT) strain. Since both tasA and yjbI mutants exhibited similar phenotypic behaviour, the authors proposed a link between the two proteins, TasA and YjbI, and tried to establish this link in a series of biophysical and biochemical tests. This study touches upon an important question, how biofilms protect themselves from reactive oxygen species (ROS), which is nicely described in the introduction. The link between the above proteins is very interesting and relevant to the main question proposed in the study. However, the experiments presented do not always directly support the conclusions made.

      The points that I find necessary to clarify/extend:

      1) A major claim in the paper is that biofilms that do not harbour the tasA gene (tasA-) are flat, and therefore their contact angle is low, indicating that they are less hydrophobic than WT strains. However, biofilms of tasA mutants are normally not that flat (see for example Romero et al., PNAS 2010; Vlamakis et al., Genes and Development, 2008; Erskine et al., Molecular Microbiology 2018). As a matter of fact, even the WT biofilms that are used as a control in this study are much flatter than the biofilms that serve as standards in the papers referenced above.

      We appreciate the reviewer’s comment. As we explained above (answer to Essential Revisions, point 4), there are differences in the morphology of colonies between the 168 and NCIB3610 strains of B. subtilis, as previously pointed out in the literature (Arnaouteli et al. Nat. Rev. Microbiol., 19:600-614, 2021; Mielich-Süss and Lopez, Environ. Microbiol., 17:555-565, 2014). We employed B. subtilis strain 168 because this strain is a close representative of B. subtilis, as described by Zeigler et al. (J. Bacteriol., 190:6983-6995, 2008), and serves as a model organism for wider aspects of basic research, including oxidative damage responses.

      To clarify this point, we have added the following text to the revised manuscript in lines 269–277: “Most studies on biofilm formation in B. subtilis use the B. subtilis NCIB3610 strain as a model bacterium because of its ability to form well-structured three-dimensional biofilms (Arnaouteli et al., 2021, Mielich-Süss et al., 2014). The biofilms of the wild-type and tasA mutant strains of the B. subtilis 168 strain are known to be morphologically different from those of the B. subtilis NCIB3610 strain (Romero et al., 2010, Vlamakis et al., 2008, Erskine et al., 2018). In this study, the B. subtilis 168 strain was used because it is the most representative of B. subtilis and serves as a model organism for a wider range of research aspects (Zeigler et al., 2008) as we were not only interested in evaluating biofilm formation but also in more general aspects of oxidative damage responses in bacteria.”

      2) Figure 1. The authors use AFM phase imaging to probe differences in cellular stiffness. This AFM mode is not quantitative and the differences presented could also result from differences in adhesion between the tip and the sample. A more quantitative means to evaluate stiffness is a direct measurement of moduli in Force mode, a standard AFM module.

      Thank you for your comment. As mentioned above (answer to Essential Revisions, point 3), the AFM data have been removed.

      3) Line 147. The authors link between the lack of monomeric TasA in YjbI mutants and the formation of covalent cross linking in TasA aggregates. This is a strong statement that unfortunately is not supported by any of the experiments described in the manuscript.

      As mentioned above (answer to Essential Revisions, point 5), we have removed the statement regarding the lack of monomeric TasA in the mutant. The following has been included to highlight the potential involvement of covalent bonds in the TasA aggregate formation in lines 126–129 in the revised manuscript: “No monomeric TasA was detected in the insoluble fraction of the yjbI-deficient mutant strain. An aggregate of TasA was observed under strong reducing and heat-denaturing conditions in SDS sample buffer, suggesting that covalent bonds may be involved in aggregate formation.”

      4) The authors seek to make a connection between YjbI and TasA. However, this link is either not well established or only hinted indirectly in this manuscript, through precipitation assays, contact angle measurements and growth curves. To establish such a link, a more molecular approach is advised. Experiments that would provide a direct link between the two proteins and mark specific molecular changes of the proteins include for example titration NMR studies of labelled proteins (at least one of the proteins). In cases where the authors need to show protein localization to the cell surface, it would be of help to use TEM or high-end fluorescence microscopy.

      We thank the reviewer for this valuable advice. In response to this comment, we carried out additional experiments, as described above (answer to Essential Revisions, point 1). We will consider the suggested studies, mainly with a molecular approach including titration NMR and TEM, for future studies, as facilities for these specific studies are currently not available.

      5) This paper suggests that the protein YjbI acts as an electron donor. Given that there are other proteins with a similar role (in other organisms), it would be nice to show whether there is any homology (by sequence and/or structure) to these proteins.

      Thank you for your comments. We have added a description of animal peroxiredoxins and selenomethionine (with GSH or a thioredoxin system) that have been shown to scavenge protein hydroperoxides to the revised manuscript. We also added a description of how YjbI differs from peroxiredoxin and selenomethionine.

      The corresponding sentences have been added to the revised manuscript in lines 254–268: “Peroxiredoxins have been reported to repair intracellular protein peroxidation in mammals (Peskin et al., 2010). However, YjbI is distinct from peroxiredoxins in that it is a haem protein with no significant sequence homology (<15%). The second-order rate constants (M⁻¹·s⁻¹) for the reactions of mammalian peroxiredoxins 2 and 3 with BSA-OOH are 160 and 360, respectively, and have been shown to reduce protein peroxides more efficiently than GSH under physiological conditions (Peskin et al., 2010). Although direct comparison is difficult due to different experimental conditions, YjbI and peroxiredoxins are likely to have a similar catalytic rate, as both proteins can reduce BSA-OOH in the order of several mM in roughly 5 min at similar protein concentrations (Fig. 3e) (Peskin et al., 2010). Interestingly, selenomethionine can catalyze the removal of hydroperoxides from proteins in the presence of GSH or a thioredoxin system (Rahmanto & Davies, 2011). However, this system, as well as peroxiredoxins, localises in the cytoplasm of cells, which is a significant difference between YjbI and these proteins. Moreover, whether bacteria utilize peroxiredoxins and the selenomethionine system to remove hydroperoxides from proteins remains unclear.”

      6) (Minor point). The use of Pymol to demonstrate that the YjbI's pocket could serve as a binding site for haem molecule is nice, but using Molecular Dynamics (or any other calculation) would be more quantitative and convincing of the specificity of the interaction.

      We appreciate your comment regarding this point. However, we believe that analysis using molecular dynamics (or other calculations) is largely difficult because the structure of the hydroperoxidised protein substrates is not available. Further, the degree of similarity between the structure of TasA or BSA and the hydroperoxidised form is unclear. A calculation analysis with a small model substrate can be adopted in future work. Therefore, we only showed the surface opening of the YjbI structure, which is potentially relevant for binding to a hydroperoxidised protein substrate.

      Reviewer #2 (Public Review):

      In this study, Imai et al. uncover a role for the truncated haemoglobin protein YjbI in biofilm formation by the model bacterium B. subtilis. They show that yjbI gene disruption results in altered biofilms, with increased wettability and different matrix stiffness relative to cells. The absence of YjbI activity results in aggregation of the amyloid-like TasA matrix protein, and the biofilm wettability defect of the yjbI mutant can be recapitulated by anti-YjbI immune serum, suggesting that YjbI is located on the cell surface. Absence of YjbI also modestly increases the sensitivity of cells growing on agar plates to the oxidant AAPH. Using the model protein substrate BSA, purified YjbI can at least partially reverse oxidant-induced BSA aggregation in vitro, convincingly showing the YjbI has protein hydroperoxide peroxidase activity, which is evidently an unusual enzymatic activity. Finally, the authors examine lipid peroxidation and conclude that YjbI is not involved. The results are interesting in that they connect YjbI to a biofilm phenotype and convincingly show protein hydroperoxide peroxidase activity by a truncated haemoglobin protein, an activity not previously attributed to this class of proteins.

      The experiments are largely well done, but some of the corresponding conclusions are overinterpreted, connecting ideas without experimental support. Moreover, the yjbI mutant has a narrow and relatively mild phenotype.

      1) The paper identifies two separate properties of YjbI: its mutant phenotype with respect to biofilm formation, and its peroxidase activity against oxidant-induced aggregation of TasA and BSA. The authors conclude that these properties are connected, but this is not formally tested. While purified YjbI can reverse hydrogen peroxide-induced aggregation of purified TasA in vitro, and the yjbI mutant shows more TasA in the insoluble fraction of B. subtilis pellicle lysates, these experiments do not show that the TasA aggregates in pellicle lysates are caused by peroxidation, nor do they show that TasA aggregation is normally kept at bay by YjbI peroxidase activity (it is possible that YjbI has a separate role in biofilm integrity). Some experiments that might lend support to this connection include examining the biofilm phenotype of a catalytically dead point mutant of YjbI (perhaps Y25 or Y63, l. 298, or other residues informed by the crystal structure of Giangiacomo et al.) to establish whether peroxidase activity is important for biofilm formation. Such a mutant would be particularly valuable, as it could also be used to test whether inactivation of enzyme activity affects other phenotypes (cell stiffness, for example). Another approach would be to use a soluble antioxidant molecule, purified YjbI, or another peroxidase to see if the yjbI biofilm can be rescued.

      We greatly appreciate this comment, which is critical for improving our manuscript. To address this issue, we performed additional experiments using the Y25F, Y63F, and W69F variants of YjbI. The introduction of the Y63F variant gene into the yjbI-deficient strain failed to complement the defective phenotype of the yjbI-deficient strain in biofilm repellency (revised Fig. 1b). We found that the purified Y63F lost its hydroperoxide peroxidase activity (revised Fig. 3g). These results show a connection between the protein hydroperoxide peroxidase activity of YjbI and the abnormal biofilm phenotype of the yjbI-deficient strain. Accordingly, Figs. 1b and 3g have been added to the revised manuscript and figure descriptions have been included in lines 220–226, 322–324, and 327–328 (as explained above in the answer to Essential Revision, Point 2).

      2) The authors conclude on the basis of the AFM data in Fig. 1 that yjbI mutant cells are less stiff than WT cells, but the data only show relative stiffness. It is also unclear why a change in cell envelope stiffness would relate to biofilm wettability (ll. 130-131). If there truly is a change in cell envelope stiffness, a high-resolution, head-to-head AFM comparison of planktonically grown cells would be informative.

      We appreciate the reviewer’s comment on this point. As mentioned above (answer to Essential Revision, points 3 and 6), we realized that our interpretations of the AFM data were not appropriate and not relevant to biofilm repellency. Accordingly, the AFM data were removed.

      3) The data in Fig. 2F showing hypersensitivity of yjbI mutant cells to AAPH were generated in an unusual way: stationary-phase liquid culture was spotted on an LB plate, and the colonies were "fractionated" at the noted intervals and resuspended in saline for OD measurement. Measuring sensitivity to AAPH just in shaking liquid planktonic culture would make this phenotype more convincing. Under non-biofilm forming conditions, is a surface-associated peroxidase important for cell growth or survival under oxidant challenge?

      We appreciate your comment regarding this point and apologize for the error in the description “Planktonically grown B. subtilis strains under AAPH-induced oxidative stress” in the Methods section. No solid medium was used in the experiments. The description in the legend of Fig. 2f is correct. The sentence in the text has been rewritten in the revised manuscript in line 813.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Yildirim presents a method for labeling and tracking cells within organoids to enable the assessment of dynamic processes within the intact organoid. The authors use Third-harmonic generation (THG), an intrinsic signal which results from tripling the frequency of the excitation wavelength, and a modified three-photon microscope to identify and track cells within the 3D organization of cerebral organoids. Specifically, the authors focus on the ventricular zone in 35-day old organoids, when young DCX+ neurons are migrating into the cortical plate-like area of the organoid and show that THG can identify migrating cells. The authors then use a disorder model of Rett syndrome to validate the method and show that differences can be detected with their technique and, importantly, that the VZ volume is smaller and that radially migrating neurons have slower migration within RTT organoids.

      There are many strengths of the study including the use of multiple (two) isogenic pairs of control and RTT organoids, the critical comparison of the labeling method with standard markers, and the use of a relevant disease model to test the utility of the technique.

      We appreciate this constructive feedback from Reviewer 1 on both imaging system development and using it for a biologically important question.

      Reviewer #2 (Public Review):

      This paper reports an impressive technical development - Third Harmonic Generation (THG) three-photon, label-free imaging of intact cerebral organoids. This is the first paper to apply THG imaging to intact, three-dimensional organoids and offers some distinct advantages over other approaches in terms of being able to image the full depth of intact organoids. Using this approach, mutant organoids generated from Rett Syndrome patients were imaged, finding shorter migration distances, slower migration speeds, and more tortuous trajectories in these organoids. This work advances in a useful way the available imaging tools for intact, three-dimensional organoids, by allowing their full depth to be accessed. It is likely to have an impact both as a demonstration of what can be achieved through advanced bioimaging techniques and on the progress of the (recently rapidly advancing) cerebral organoids field. A caveat to the latter is that, due to the optics techniques involved, reproduction in a typical organoid/cell culture laboratory may be beyond the skill of most researchers in that field, although this could ultimately be addressed with commercialization (noting that the laser products needed are not completely "turn-key" yet).

      Strengths

      The fact that the authors were able to achieve a pulse width at the sample (in deep tissue) of 27 fs is a great technical achievement, which makes the results achieved in the paper possible. I can't emphasize enough how impressive that aspect of the paper is. As they note, pulse widths of < 30 fs have not previously been reported in such a scenario (and we would normally consider the 40-50 fs range as good going). This is a great technical achievement and is important given the apparent great sensitivity of three-photon efficiency to pulse width and shape. While the short pulse lengths are impressive, it is of interest to know how hard this will be in practice to reproduce in other laboratories. The authors might comment on how difficult it was to keep the pulse compressed to this level - was there any drift, and were adjustments needed to be made to the pulse compressor over the duration of the series of experiments?

      As well as making an impressive technical demonstration, the authors showed that it could be used to make useful measurements, showing that the system was capable of distinguishing some structural properties of mutant Rett Syndrome organoids from wild-type organoids, by means of time-lapse imaging of deep structures within the samples.

      Weaknesses

      There are some concerns about the statistical validity of the conclusions made, in particular for the analysis of the time-lapse imaging experiments. I am not convinced that the analysis made is statistically valid, due to bias effects introduced by pooling different lengths of time-lapse samples.

      We would like to thank Reviewer 2 for this constructive feedback. First, achieving shorter pulse widths is only possible by optimizing both laser and microscope parameters. Because we designed and implemented the custom-made microscope parts to match the laser parameters, we are able to achieve stable, short pulse widths in our lab in different kinds of tissue, including cerebral organoids.

      Second, we agree with Reviewer 2 that it is not reasonable to compare the migration parameters of cells that disappeared from the field of view before 12 hours with those of cells imaged for the full 12 hours. Therefore, we removed the data from cells that were not imaged for 12 hours and updated Figures 5 and 6.
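
      The exclusion step described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the `Track` record, its field names, and the toy values are all hypothetical, and the only detail taken from the response is the 12-hour requirement.

```python
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical record for one tracked cell (field names are assumptions).
    cell_id: str
    hours_imaged: float     # how long the cell stayed in the field of view
    path_length_um: float   # total distance travelled, in micrometres


def full_duration_tracks(tracks, required_hours=12.0):
    """Keep only cells that were followed for the whole imaging window."""
    return [t for t in tracks if t.hours_imaged >= required_hours]


def mean_speed_um_per_h(tracks):
    """Mean speed across tracks, each computed over its own imaged duration."""
    speeds = [t.path_length_um / t.hours_imaged for t in tracks]
    return sum(speeds) / len(speeds)


# Toy data: cell "c" leaves the field of view after 4 hours.
tracks = [
    Track("a", 12.0, 60.0),
    Track("b", 12.0, 36.0),
    Track("c", 4.0, 40.0),
]
kept = full_duration_tracks(tracks)
pooled_mean = mean_speed_um_per_h(tracks)   # includes the short track
filtered_mean = mean_speed_um_per_h(kept)   # full-duration tracks only
```

      In this toy data set the short-lived track is also the fastest one, so pooling it with the full-length tracks shifts the mean speed. That is the kind of bias the reviewer raised.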

      Reviewer #3 (Public Review):

      Yildirim et al describe a novel three-photon (3P) imaging approach which concomitantly addresses several notable roadblocks in the current state of the art when it comes to functional imaging using organoid cultures. The authors use a 3P system modified with custom laser and optics which enables label-free, deep, high-resolution, non-phototoxic, long-term imaging of intact brain organoids achieving close to 1 mm penetration and imaging periods up to 96 hours. Leveraging the capacity of third harmonic generation (THG) signal to differentiate regions with distinct cellular densities, the authors demonstrate effective, label-free demarcation of ventricular zone-like regions vs regions resembling the cortical plate. Moreover, through a set of well-designed and well-powered experiments, the authors apply their system to uncover structural and functional phenotypes in organoids derived from Rett's Syndrome patient lines and corrected isogenic lines. All without the need for a fluorescent label, they describe structural changes to VZ-like regions and migration deficits in cells that emanate from these regions. The imaging of migrating cells in relation to their VZ of origin reveals an especially novel look at the radial migration of cortical neurons in organoids, something which has not been possible to assay since the VZ structures remain deep within organoids and inaccessible to co-image with migrating cells using standard approaches.

      The study is highly innovative, and looking ahead, a standardized 3P/THG imaging platform that enables deep and label-free imaging of organoids at scale, holds a lot of promise in illuminating a lot of biology which currently remains beyond reach, as well as in designing large scale, non-invasive, multi-parameter phenotyping screens using patient samples. The manuscript is well-written and the results clearly demonstrated.

      We would like to thank Reviewer 3 for their constructive feedback on how our paper addresses and resolves notable roadblocks in the current state of the art of intact organoid imaging for modeling brain disorders.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors estimated the effectiveness of border restriction, testing and contact tracing in managing transmission of Covid19, and in detecting "missed" Covid19 cases. They developed a standard branching process model to disentangle the effects of each control measure on the incidence of missed infections, by fitting their model to data on cases both at the border and in the community of Singapore, from the beginning of the Covid19 outbreak until December 2021.

      The authors modelled detected and undetected community infections as two separate branches of the transmission tree. They then fitted their model to the observed incidence data to obtain an estimate of the number of missed infections. Through this method, they explained the importance and contribution of case ascertainment (through testing and contact tracing), as well as border and community restrictions, towards transmission reduction.
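
      To illustrate the general idea of a branching process with detected and undetected branches, here is a deliberately simplified sketch. It is not the model fitted in the paper: the function names, the Poisson offspring assumption, the single detection probability `p_detect`, and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate(generations, seeds, r_detected, r_missed, p_detect, rng):
    """Return per-generation (detected, missed) counts of new infections.

    Detected infections reproduce with mean r_detected, missed infections with
    mean r_missed; each new infection is detected with probability p_detect.
    """
    detected, missed = seeds, 0
    history = [(detected, missed)]
    for _ in range(generations):
        offspring = sum(poisson(rng, r_detected) for _ in range(detected))
        offspring += sum(poisson(rng, r_missed) for _ in range(missed))
        new_detected = sum(1 for _ in range(offspring) if rng.random() < p_detect)
        detected, missed = new_detected, offspring - new_detected
        history.append((detected, missed))
    return history


# Illustrative values only: missed infections transmit more than detected ones.
history = simulate(generations=5, seeds=10, r_detected=0.6, r_missed=1.2,
                   p_detect=0.5, rng=random.Random(1))
```

      In an inference setting, only the detected counts would be compared with notified cases, while the missed counts remain latent quantities to be estimated.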

      This modelling and inference framework could be applied to data from anywhere in the world to estimate the number of undetected infections when in lack of infection prevalence data.

      Strengths:

      For each of the five phases of border and public health measures put in place in Singapore in 2021, the authors successfully provided estimates on the number of undetected community cases, effectiveness of contact tracing and testing in finding unlinked cases, and effective reproduction numbers of both detected and undetected cases. All these estimates can be valuable to Covid modellers worldwide to either benchmark or parametrise their own model parameters, and to Singapore's public health officials to decide on future strategies of transmission prevention.

      Estimating infection prevalence and case ascertainment rates is one of the main challenges of Covid modellers everywhere. The authors' method to reconstruct the transmission tree of both known infections and undetected ones, and the subsequent fitting to observed data, could be used to estimate case ascertainment rate in the absence of prevalence surveys.

      The authors also found that contact tracing is only useful for transmission reduction when coupled with a high rate of case ascertainment. This is a well-known but important result, highlighting the need for more timely and accessible community testing.

      Weaknesses:

      The authors' models and estimates are mostly well supported by data, but the Methods need to be clarified and extended, and the results could be presented in a clearer way.

      The transmission model in particular needs to be presented in a more detailed way to avoid confusion around the modelling assumptions and to allow easy reproduction of the model by the reader.

      Thank you for this comment. As several mathematical notations were required, we have compiled all parameters and variables in Appendix Table 1 (tracked edits, pages 28 and 29) and indicated within the table the assumed parameters, distributions and priors, the unknown parameters to be modelled, and the derived parameters. The details of the model parameters and assumptions are also introduced in the main text, following the sequence of model building and disease transmission simulation, so we hope that this additional table will facilitate better understanding of the model framework.

      It would also be very useful to readers to visualize the different restriction measures in place together with the result graphs, to strengthen the link between the two and to highlight the effect that different border and public health regimes have on transmission and on the proportion of undetected infections, which the authors mention in the main text.

      Thank you for this comment. We have amended Figure 1 to incorporate short labels on key outbreak events and control measures to help the reader understand the changes in epidemic trajectory. Further details are documented in Figure 1-figure supplement 2 (a tabulated version of the previous Appendix 1 Figure 5, which showed the studied time periods for the wild-type SARS-CoV-2 and Delta variant outbreaks).

      While these results can definitely help the Singapore decision-makers design an efficient transmission control strategy, this paper could also be useful to researchers abroad. It is therefore important that the model is explained more clearly and that results and assumptions be benchmarked against those from some other country.

      Thank you for this comment. We have compared our findings on the burden of disease for SARS-CoV-2 wild type with those from other countries in the initial submission and have also expanded our Discussion section for findings related to the Delta variant as more studies were published since our initial submission. The amendments are as follows:

      “We estimated that the risk of ICU and fatality was 1.2% and 0.3% among wild-type SARS-CoV-2 infections across the study periods in 2020 and 0.2% and 0.2% among the Delta variant infections across the study periods in 2021, and our wild-type infection fatality ratio estimates were corroborated by early studies in other countries and regions Centre for Disease Control and Prevention (2020); Brazeau et al. (2020); Myerowitz-Katz and Merone (2020). Early in the outbreak of Delta cases in Singapore (Apr 1 to May 12, 2021), more than 60% of the cases were unvaccinated. The risk of ICU and mortality among all notified cases during this period was 1.7% and 1.4%. When accounting for missed infections, the risk of ICU and mortality among all infections was 0.3% and 0.2%. This infection fatality ratio is comparable with that of the wild-type SARS-CoV-2. Over 90% of the fatalities occurred in the elderly aged 60 and above and the 95%CI of the infection fatality ratio was 0.03–0.3%, which overlaps considerably with the infection fatality ratio in the elderly during the H1N1 influenza pandemic in 2009 Riley et al. (2011); Wong et al. (2013). In most studies, the case fatality ratio is more commonly reported. This was estimated to be 3.4% in South Africa Sigal et al. (2022) and some studies reported an approximately two times higher risk of death when compared to the wild type Li et al. (2021b). The healthcare systems in these countries were under pressure arising from the surge in Delta variant cases Detsky and Bogoch (2021); Maslo et al. (2022), which potentially contributed to a higher case fatality ratio. Our estimates of comparable infection fatality ratio for the wild-type and Delta variant in the overall population could also be attributed to better clinical management of COVID-19 cases over time, and availability of new pharmaceuticals Beigel et al. (2020); Goldman et al. (2020); Ohl et al. (2021).
      Despite the lowered burden of infection estimates, it is prudent to vaccinate a large majority of the population before relaxing COVID-19 control measures to keep the absolute number of deaths arising from a highly transmissible Delta variant low Li et al. (2021).”

      Reviewer #2 (Public Review):

      This work combines multiple data sources and a branching-process mathematical model to assess the effectiveness of specific types of COVID-19 interventions including contact tracing, border screening, and case finding. The focus is on the original SARS-CoV-2 (2020) and the Delta (2021) variant outbreaks in Singapore. Utilizing data on both linked and unlinked cases, the model is also used to predict the total number of missed infections.

      Strengths:

      The study provides a way to utilize data to understand the importance of various simultaneously employed intervention strategies throughout the pandemic. Given the constantly evolving state of the epidemic, this retrospective study provides insight into what interventions worked under which conditions. This will be valuable for policymakers to understand what types of strategies to prioritize.

      The underlying model formulation has been used previously to understand components of transmission during the COVID-19 pandemic. Case data from Singapore is used to fit the model. Model conclusions on the number of true infections are consistent with a published seroprevalence study.

      Weaknesses:

      The paper currently provides an incomplete description of the model. Multiple terms (e.g. cases vs infections) are used throughout the manuscript to have a precise meaning, but that is not apparent until reaching the Methods section at the end. As these terms could be misinterpreted, a precise definition should come earlier. More information is needed regarding the parameters. It is unclear which parameters were fit, and no parameter values were given. For the lognormal, a mean is given, but no standard deviation.

      Thank you for the comments. We have added a short summary of the model structure and definitions of key terms in the last paragraph of the Introduction section, with a reference to the Methods and Materials section for further elaboration. Furthermore, we have added a list of mathematical notations which contains the unknown parameters to be modelled, the derived parameters, and the observed distributions with their corresponding parameters.

      The identifiability of the model should be discussed. Given the wide range of some confidence intervals, it does seem that parameter identifiability could be a problem. The extent of this is hard to assess given the level of information on the parameters currently given.

      Thank you for this comment. Within the section “Effectiveness of case finding and contact tracing”, we have added a final line as follows:

      Across all time periods in 2020, $\epsilon_{cf}$ exhibits a wide confidence interval as a result of some correlation with the factor, $\rho$, which scales the extent of missed imported infections (Figure 2-figure supplement 3).

      Furthermore, under the discussion section, we have, in the original submission, provided suggestions on how to overcome this challenge.

      Furthermore, unlinked cases were generated by either missed imported or local infections with the former modelled as a factor of the notified imported cases, $\rho$. As such, the interaction of model parameters results in $\epsilon_{cf}$ estimates characterised by wide 95% credible intervals. To improve these estimates, we could further stratify exposure histories of unlinked cases by their interactions with travellers from countries with ongoing outbreak for model fitting.

      Reviewer #3 (Public Review):

      This paper presents a new mathematical method to estimate case ascertainment (the fraction of infections identified as cases) and applies it to COVID-19 data from Singapore over the period early 2020 to late (pre-Omicron) 2021.

      The method relies upon access to line-listed case data that includes whether or not a case was an importation or was (epidemiologically) linked or unlinked.

      Through the application of this method, new results on the contribution of case identification and contact tracing to reducing transmission are derived. Reproduction numbers for different classes of infections (importations, linked, unlinked) are computed.

      A sensitivity analysis establishes that inclusion of the richer case line-list information results in tighter credible intervals for key parameters of interest, although there is no 'gold standard' on which to evaluate if those estimates are more accurate.

      Of some interest, the results suggest that both the case- and infection-fatality-ratios were lower during the Delta wave in 2021 than the ancestral wave in 2020. The underlying reasons (particularly for the IFR) are unclear and not investigated, but are perhaps due to vaccination (although the paper reports approximately just 35% vaccination coverage in early 2021), age-specific effects or other socio-demographic effects.

      The method is based on a branching process model, for which the key elements are well described (noting some (likely) typographical errors in the typesetting of the equations), although the exact details of the computational implementation are not described, leading to some ambiguity in how the method is implemented.

      Thank you very much for the overall comments and suggestions.

    1. Author Response

      Reviewer #1 (Public Review):

      Psychiatric symptoms in Parkinson's disease are debilitating, but there are few treatments that effectively reduce these symptoms long-term. The mechanisms that cause psychiatric symptoms in Parkinson's disease are unknown. However, it has been known for decades that abnormal alpha-synuclein is found in the amygdala, a brain region important for the control of emotions. Nagaraj et al. present an article in which they attempt to characterize the differences in α-synuclein colocalization with vGluT1+ compared to vGluT2+ terminals in the BLA of a PFF mouse model. They successfully demonstrate convincing data that points to the preferential association of α-synuclein with vGluT1+ puncta and not vGluT2+ puncta. The authors also demonstrate that PFFs promote short-term depression of cortico-BLA synapses in response to repetitive stimuli which does not occur in vGluT2+ terminals.

      Clearly differentiating the association of α-synuclein with different glutamatergic terminals and cortical or thalamic projections, and the subsequent effect of abnormal α-synuclein and how it affects transmission in the BLA is novel and points to mechanisms of differential vulnerability to inclusions in different neuronal bodies.

      This study is one of the first to use electrophysiology to show that abnormal alpha-synuclein contributes to defects in the amygdala in Parkinson's disease. The study also pinpoints cortical-amygdala projections as the culprit in amygdala dysfunction. Therefore this study has a major impact in the field by determining how abnormal amygdala function caused by pathologic alpha-synuclein can potentially cause psychiatric symptoms in Parkinson's disease.

      The main weakness of the study is the lack of mechanism. Although the authors attempt to show that loss of synuclein in mice injected with PFFs is responsible for the amygdala defects, the data are insufficient to make this conclusion.

      We thank the reviewer for the comments on the importance of this work and suggestions for improvement. We agree with the reviewer that the initial submission was descriptive, thus we have performed additional experiments and analyses that have allowed us to reveal potential mechanisms underlying the changes in synaptic strength and plasticity, as outlined below.

      Reviewer #2 (Public Review):

      The data presented are clear and of high quality. The conclusion that alpha-synuclein aggregation and corresponding synaptic dysfunction preferentially occurs in vGluT1 expressing cortical inputs (as opposed to vGluT2 expressing thalamic inputs) to the BLA is convincing, but a few additional clarifications and experiments would greatly help describe the mechanism of synapse dysfunction. Overall this manuscript provides helpful insight into the circuit dysfunctions that may contribute to non-motor psychiatric symptoms that commonly occur in Parkinson's disease.

      1) The BLA is a relatively large structure, and the labeled terminal fields of cortical and thalamic inputs (figure 2) don't show matching patterns. It would be helpful to clarify where in the BLA recordings were made (and where high mag images in figure 1 were taken from).

      In the present study, immunohistochemistry (Figures 1 and 5) and electrophysiology (Figure 3, 4, and 6) data were collected from the medial part of the anterior basolateral amygdala (BLAam), where heavy αSyn pathology can be found (Figure 3B). This subregion of BLA also receives both cortical and thalamic inputs (Figure 3I, M). Images from similar rostrocaudal sections have been updated as representative images in the related figures.

      2) The short-term plasticity experiments shown in figure 4 are informative, but by themselves don't necessarily rule out post-synaptic mechanisms of adaptation. Since the mobilization of synaptic vesicles is likely involved, it would be helpful to also look at the effect of PFFs on release probability using paired-pulse ratios.

      To address the Reviewer’s comments, we performed additional experiments/analyses, as outlined below:

      • To assess postsynaptic adaptations at cortico-BLA synapses, we analyzed AMPA/NMDA ratio and AMPA receptor rectification index (Figure 4 K-M). Interestingly, we detected a reduced AMPA/NMDA ratio and an enhanced inward rectification of AMPA receptors at cortico-BLA synapses in slices from PFFs-injected mice. These data suggest an overall reduction of postsynaptic AMPA receptor function and an increased relative contribution of GluA2-lacking AMPA receptors to cortico-BLA transmission in PFFs-injected mice. These data and discussion are now included in the last paragraph of Page 17.

      • To assess changes in the initial release probability of SVs, we stimulated cortico-BLA synapses using paired electrical (20 Hz) and optogenetic (10 Hz) pulses. Our data showed no change in the ratio of EPSC2/EPSC1 between groups using either approach (Figure 4D-G), suggesting that the initial release probability was not altered in PFFs-injected mice versus controls. Thus, we propose that the development of αSyn pathology could affect SV mobility, leading to a slower refilling of the active zone from the reserve pool, which is then revealed by prolonged repetitive stimulation (Figure 6).

      • We assessed the quantal release at cortico-BLA synapses by replacing Ca2+ with Sr2+. We detected a decreased frequency of Sr2+-induced, optogenetically-evoked cortico-BLA EPSCs in slices from PFFs-injected mice (Figure 4H, I). Together with the unaltered density of cortico-BLA axon terminals and the unaltered initial release probability, this leads us to propose that αSyn pathology decreases the number of release sites at axon terminals. Testing this hypothesis warrants further studies using electron microscopy or expansion microscopy techniques.

      • Last, we quantified the density of vGluT1 in the BLA and did not detect a change in vGluT1 density between groups (Figure 4A-C), suggesting no degeneration of cortical axon terminals (as mentioned by Reviewer 3 below).

      Together, we conclude that both pre- and post-synaptic alterations contribute to the altered basal and dynamic cortico-BLA connection as αSyn pathology develops.

      3) PFFs reduce cortico-BLA EPSC amplitudes but not thalamo-EPSC amplitudes in response to single electrical and optogenetic stimuli (figure 2). In figure 4, however, the starting amplitudes appear to be similar (at least in the exemplar traces). I'm assuming this is because stimulus intensities were adjusted to achieve a similar starting point? If so, are differences in short-term plasticity also observed if similar stimulus intensities are used?

      As noted by the reviewer, stimulation intensity was adjusted to evoke first EPSCs of 200-300 pA for these experiments.

      In a subset of experiments, we delivered the same intensity of electrical stimulation to slices from control and PFFs-injected mice and repeated the repetitive stimulation experiments. We observed a very similar faster and stronger suppression of cortico-BLA EPSCs in slices from PFFs-injected mice (e.g., EPSC200/EPSC1, control = 0.35 ± 0.14, n = 5 cells; PFFs = 0.20 ± 0.04, n = 6 cells). We would like to point out that under such conditions, EPSCs from PFFs-injected mice showed smaller initial amplitudes and the subsequent EPSCs decayed quickly to noise level, making the quantification of later EPSCs less meaningful. Thus, these data were not included in the revised manuscript.

      Reviewer #3 (Public Review):

      1) In this manuscript, the authors try to address whether glutamatergic axonal terminals are differentially impacted by a-syn aggregation, a key pathology seen in Parkinson's disease. Using a-syn PFF injection, and a-syn KO mice, the authors show a few interesting findings: 1. After a-syn PFF injection in the BLA, the strength of the cortical inputs was selectively reduced, while leaving thalamic inputs unaffected. 2. There is an interesting parallel finding on the release probability of cortical glutamatergic synaptic transmission after a-syn PFF injection and in the a-syn KO mice. The key findings are interesting, showing selective vulnerability of glutamatergic synapses, in which vGluT1+ terminals are more profoundly affected by a-syn PFF or loss of function.

      The authors thank this reviewer for highlighting the importance of this work and the thoughtful comments.

      2) However, mechanistically, the authors implied that a-syn PFF induced aggregation sequesters soluble a-syn, acting more similar to a-syn KO conditions. This does seem to be plausible for the enhancement of release probability. But, what would be responsible for the reduction of cortico-BLA synaptic transmission? Previous studies showed that there was no neurodegeneration one month after a-syn PFF injections in the cortex.

      In the revised manuscript, we have now included additional experiments/analyses and further clarified potential mechanisms underlying the impaired basal and dynamic cortico-BLA transmission in the PFFs model. Please see our responses above to point #2 from Reviewer 1 and point #2 from Reviewer 2.

      3) Do the authors imply that some vGluT1+ terminals are lost after a-syn PFF injection? The authors did not quantify the number of vGluT1+ puncta in the BLA after a-syn PFF injections.

      We apologize for the confusing descriptions in the initial submission. We did not detect a reduction of vGluT1 density in PFFs-injected mice at 1 mpi (Figure 4A-C), indicating no degeneration of vGluT1+ cortical terminals.

      4) The authors also used intrastriatal a-syn PFF injection as a comparison. However, the data were not shown in the manuscript. Because striatum also receives convergent cortical and thalamic inputs, it would strengthen the conclusion if the authors systematically investigate corticostriatal vs thalamostriatal terminals in parallel.

      In the present work, we injected αSyn PFFs into the striatum to induce αSyn aggregation in the BLA through the seeding process and assessed its impact on synaptic transmission in the BLA. This intrastriatal PFFs model induces heavy pS129 pathology in the BLA and avoids local inflammatory responses at the injection site.

      Systematically studying corticostriatal and thalamostriatal transmission in the context of αSyn aggregation and PD pathophysiology would be the natural follow-up experiment. However, it is worth noting that PFFs injections into the striatum are likely to trigger local inflammation and chronic microglia activation, which may confound conclusions about striatal circuitry changes. We respectfully request to leave this question for future studies, since the present work was designed to understand the impact of αSyn pathology on amygdala function, which may be relevant to the biology of psychiatric defects in PD.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors present a viral dynamical model to predict the distribution of patient rebound times to bNAbs using only information about the population diversity at the onset of treatment. To parametrize this model, the authors identify mutational target sizes for bNAbs escape mutations from an analysis of deep mutational scanning data and infer the fitness costs of these mutations from a bNAb-free cohort. Paired with a rescaling factor that represents the amount of unsampled diversity in the reservoir, the authors have produced a model with few parameters that in aggregate does a good job of predicting trial outcomes in well-tracked cohorts. Using this validated model, they predict the percentage of late-rebounding viral populations treated with novel combination therapies, suggesting that three simultaneous bNAbs are required to prevent early rebound in the majority of individuals.

      Strengths:

      Because much of the model creation is driven by non-bNAb datasets, one of the major strengths is that the model is able to make predictions about rebound timing from very little data (i.e., population diversity before therapy). In doing so, it circumvents potential problems of overfitting limited data. In general, the analysis is careful and the authors derive many attributes from their data that answer important questions peripheral to the central stated goal. For example, they estimate the frequency of escape mutations arriving via mutation after therapy onset as opposed to those stemming from standing genetic variation before therapy onset. Additionally, they quantify the contribution of the unsampled genetic reservoir to escape dynamics.

      The paper is clearly written, and will be an asset to newcomers to the field.

      We thank the reviewer for the encouraging comments.

      Weaknesses:

      1) One potential weakness of the paper is that the model encodes all escape mutations as conferring a complete rescue effect in the presence of bNAbs. I didn’t see clear justification of this in the paper, and I’m not sure that evidence from the literature really suggests that this is true (or that is maybe only true for a subset of bNAbs). The IC50s of 3BNC117 to different viral isolates before and after treatment that are reported in the supplement of Caskey et al, 2015 show that there can be orders of magnitude differences in the evolved populations between individuals suggesting not all resistance is the same. The authors do not really consider that multiple smaller effect mutations combine to create larger effect escape phenotypes. While it’s possible that on these timescales, any viruses with positive growth rates should be sufficient to drive rapid population rebound and differences in these growth rates don’t matter, this argument wasn’t clearly articulated in the text.

      This is an excellent point and we have added a new Appendix (Appendix 3) to the manuscript discussing this matter in detail. There are two related phenomena to consider: incomplete neutralization and the effect of multiple mutations to create an escape variant.

      i. Regarding incomplete neutralization:

      We have added language in the Discussion section, Limitations (page 12, lines 464-471):

      "In our model of viral escape, we neglect the possibility of incomplete escape of the virus due to the reduced neutralization efficacy of bNAbs as their concentrations decay during trials. In Appendix 3, we show that this simplifying assumption is valid as long as the IC50 is not the same order of magnitude as the initial dosage concentration of the infused bNAb. Notably, the data from therapy trials used in this study fall into the regime for which we can neglect the impact of incomplete neutralization (Appendix 3-Figure 2). However, taking into account the dependence of viral fitness on bNAb concentration and its neutralization efficacy, as in the model proposed by [R34], could improve the long-term predictive power of our approach."

      Moreover, in Appendix 3 we explore the effects of incomplete neutralization on rebound trajectories. As we show in Appendix 3-Figure 1, if an antibody has an IC50 against the viral variant which is an order of magnitude above the initial antibody concentration, the viral dynamics very closely follow the idealized “escaped” trajectory (i.e., with complete neutralization). On the other hand, for an IC50 an order of magnitude below the initial concentration, the viral dynamics behave similarly to a completely neutralized virus, with a late rebound (later than 8 weeks). We found that the most important effect of incomplete neutralization on the dynamics of viremia occurs when the antibody has an IC50 against a resistant variant that is roughly of similar magnitude to the initial bNAb concentration in a patient’s serum; see Appendix 3-Figure 1. In Appendix 3-Figure 2 we show the distribution of IC50 and the initial bNAb concentration from the 10-1074 trial [R35] to see how often we would expect IC50 and initial concentration to be of the same order of magnitude. We find that the IC50 values in this trial are much lower (higher) for susceptible (resistant) variants compared to the initial bNAb concentration in all patients. Therefore, our simplified model assuming that a viral variant is either fully resistant or susceptible to a bNAb (i.e., no incomplete escape) is a reasonable approach for capturing the statistics of treatment failure at the concentrations tested in these trials. Nonetheless, developing a genotype-to-neutralization model such as the ones in ref. [R1,R2] may allow for a more nuanced approach to characterize neutralization in future work.
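
      The three regimes described above can be illustrated with a simple dose-response calculation. This sketch assumes a Hill curve with coefficient 1, which is a common textbook form and not necessarily the functional form used in Appendix 3. The function name and the arbitrary concentration units are likewise assumptions.

```python
def neutralized_fraction(concentration, ic50):
    """Fraction of virus neutralized under a Hill curve with coefficient 1 (an assumption)."""
    return concentration / (concentration + ic50)


c0 = 1.0  # initial bNAb concentration, arbitrary units

# IC50 ten times above, equal to, and ten times below the initial concentration.
regimes = {
    "resistant (IC50 = 10 x c0)": neutralized_fraction(c0, 10.0 * c0),
    "comparable (IC50 = c0)": neutralized_fraction(c0, c0),
    "susceptible (IC50 = 0.1 x c0)": neutralized_fraction(c0, 0.1 * c0),
}

for label, fraction in regimes.items():
    print(f"{label}: {fraction:.2f}")
```

      Under this assumed curve, a variant whose IC50 is tenfold above the initial concentration is neutralized by less than a tenth, while one whose IC50 is tenfold below is neutralized by more than nine tenths. Only the intermediate case sits far from both idealized extremes, which matches the regime the response singles out as the one where incomplete neutralization matters.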

      ii. Regarding the effect of multiple mutations for escape:

      One might still argue that the single-site substitution model of escape is limited, and that we have ignored the possibility of escape requiring multiple mutations from the consensus strain to generate a fully resistant and viable variant. However, viruses which require more than one mutation at treatment initiation cannot contribute to rebound because the neutralization timescale is too short for the virus to acquire more than one mutation. As we show in Figure 2-E, even acquiring one resistant mutation after infusion is rare. The de novo resistant population has frequency x(µ) ∼ 10^{-5} (equation 3, page 10). Requiring two independent mutation events instead of one would replace x(µ) with (x(µ))^2 (roughly 10^{-10}). Such a double-mutant population could never surpass the stochastic threshold for establishment, since the characteristic extinction frequency x_ext is on the order of 10^{-4} (Figure 2-D).
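
The order-of-magnitude argument above can be checked with a few lines of arithmetic. This is only a sketch using the rounded values quoted in the text; the helper name is hypothetical, and the two mutation events are treated as independent, as stated above.

```python
import math


def orders_below_threshold(frequency, extinction_threshold):
    """How many orders of magnitude a frequency lies below the extinction threshold."""
    return math.log10(extinction_threshold / frequency)


x_single = 1e-5            # de novo single-mutant frequency quoted in the text
x_double = x_single ** 2   # two independent mutation events
x_ext = 1e-4               # characteristic extinction frequency quoted in the text

if __name__ == "__main__":
    print("single mutant:", round(orders_below_threshold(x_single, x_ext), 2))
    print("double mutant:", round(orders_below_threshold(x_double, x_ext), 2))
```

With these rounded values the single-mutant frequency sits about one order of magnitude below the threshold, while the double-mutant frequency sits about six orders below it.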

      Although double-mutants are very unlikely to be produced de novo in the course of a therapy, they could still be considered as present in the background genome. We may therefore view the question of multiple mutations as being closely related to the background dependence of escape pathways. As we have noted with regard to the shortcomings of the DMS data (see answer to Q#3 from essential revisions), our incomplete understanding of epistatic effects remains a limitation of our analysis and more data would be needed to address this problem.

      2) The manuscript identifies a number of escape versus susceptible mutations based on DMS data and other patient-derived data taken from the literature. I remain incompletely convinced that these resistance mutations alone can explain population rebound in the clinical trial data that the authors fit. For example, for the trial on 3BNC117, this paper identifies four sites (279, 281, 282 and 459, listed in Appendix 1) where specific amino acid identities should confer resistance to 3BNC117. In looking at the genotypes of 10 viral populations treated with 3BNC117 and plotted in Figure 4 of this original paper (Caskey et al, 2015), only 1 of the 10 post-treatment viral populations has mutations at any of these four sites identified in this manuscript (279, 281, 282, 459). This suggests that the description of resistance mutations may not be sufficiently inclusive. The mutational target size is a critically important part of the model, so the potential for resistance outside of the identified ones could be problematic. Relating to the point above, these mutations may not have appeared in the screen for resistance mutations because they are of smaller effect. I would like the authors to try to demonstrate a better validation of their mutational targets.

      There are two papers describing results from this trial:

      (a) Caskey et al., Nature 2015 [R36] (the paper that the reviewer mentioned)

      (b) Schoofs et al., Science 2016 [R37]

      Sequences from some participants are included in only one of the studies. We considered sequences derived from whole genome sequencing (available at GenBank), originating from Schoofs et al. [R37]. Sequences from Caskey et al. [R36] were generated using high-throughput bulk sequencing. Therefore, there is some discrepancy between the escape sites that we identify from the whole genome sequencing data of ref. [R37], and those identified in ref. [R36]. Nonetheless, we do find escape pathways in ref. [R36] which match the results we found from data originating in ref. [R37]. These include escape substitutions at HXB2 sites 279 (seen in patient 2A1), 281 (in patients 2E2, 2A3), and 459 (in patient 2C5). See Figure 4-A in ref. [R36] for more details.

      We have fixed some references and added clarification in the Methods section: Data from bNAb trials (page 13 lines 516-518). The reviewer’s point is valid regarding the difficulties with the CD4bs bNAbs, and also that epistatic effects could influence the escape of a given variant on different genetic backgrounds. In the new appendix (Appendix 4) we discuss these issues in more detail; see our response to Q #3 of essential revisions.

      3) Maybe relatedly, the authors identify that there are potential difficulties in using the DMS data from the CD4 binding site antibodies 3BNC and VRC01, and so they supplement this analysis of escape-mediating variants with other data sources (paragraph starting on line 490). First, it would be useful to have more detail around how exactly these mutations were identified from these other sources. Second, it sounds like the mutations identified in DMS for 3BNC and VRC01 aren’t concordant with those that are observed in treated HIV populations. I’m not familiar enough with these trials to know whether there is sufficiently extensive patient genetic data for each of these bNAbs treatments that can be used to look for large effect escape mutations, but it would be useful to have some measurement of how predictive these DMS-identified mutations are of actual patient escape mutations. Could comparing these two distributions (of DMS-identified mutations and patient-identified mutations) in cases in which both are available give us more confidence about their performance when only DMS data is available?

      We agree with the reviewer that the DMS data required further consideration, and we have added a new appendix (Appendix 4). Following the reviewer’s suggestion to compare the predictive results of the DMS data vs. trial data, Appendix 4-Figure 1 now shows these predictions side-by-side. We also analyzed another trial with PGT121 for which we can now compare our predictions based on the escape variants inferred from the DMS data and from the trial data [R9]. Although it is difficult to generalize based on two bNAbs (10-1074 and PGT121), it seems that the DMS data may be slightly more optimistic, which accords with our intuition that the lack of diversity in the DMS parent strain may preclude background-dependent mutations which are more likely in vivo. We stress that, in our view, the PGT121 predictions are remarkably good considering there are no new parameters fit. These issues are discussed in detail in Appendix 4, the Discussion section, and the Methods section; see a more detailed response to Q#3 of essential revisions.

      4) It was not completely clear how the application of multiple bNAbs worked in the context of the model - did genotypes need to have one or more escape mutations for each bNAbs in order to replicate? For a three-bNAb combination therapy, is a virus carrying two escape mutations able to replicate?

      We have attempted to clarify with a direct statement:

      For multivalent treatment, a virus must be resistant to all antibodies comprising the treatment to have a positive growth after infusion. (page 4, line 162)
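
The rule stated above can be written as a short predicate. This is a sketch of the stated rule only, not of the full evolutionary model; the function name and the representation of a genotype as the set of antibodies it escapes are hypothetical, and the antibody labels are used purely as examples.

```python
def can_grow_under_cocktail(escaped_antibodies, cocktail):
    """Return True only if the genotype escapes every antibody in the cocktail.

    escaped_antibodies: set of antibody names this genotype is resistant to.
    cocktail: iterable of antibody names infused together.
    """
    return all(antibody in escaped_antibodies for antibody in cocktail)


if __name__ == "__main__":
    cocktail = ["10-1074", "3BNC117", "PGT121"]
    # A virus carrying escape mutations against only two of three antibodies
    # is still neutralized by the third and so cannot grow.
    print(can_grow_under_cocktail({"10-1074", "3BNC117"}, cocktail))
    print(can_grow_under_cocktail({"10-1074", "3BNC117", "PGT121"}, cocktail))
```

Applied to the reviewer's question, a virus carrying two escape mutations under a three-bNAb combination would not replicate under this rule unless those two mutations together confer resistance to all three antibodies.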

      5) The paper was quite brief in terms of placing its own work in the context of other modeling studies of bNAbs escape.

      We have substantially extended the introduction to discuss other studies on bNAb escape and to highlight our contributions in this work. Please see the answer to Q#1 of the essential revisions and also the red marked text in the introduction.

      6) The manuscript analyzes the use of bNAbs for suppression of viral load, but does not discuss what the model might tell us about maintaining viral suppression in individuals with suppressed viral loads transitioning off of ART (which seems like it might be a more likely use case for bNAbs in the future).

      Thanks for pointing this out to us. We have added the following paragraph to the Discussion section (lines 442-455):

      “One strategy to achieve a longer term treatment success is by combining bNAb therapy with ART. One main advantage of bNAb therapy is the fact that it can be administered once every few months, in contrast to ART, which should be taken daily and missing a dose could lead to viral rebound. Although multivalent bNAb therapy reduces the chances of short-term viral escape, viral escape remains a real obstacle for longer term success of a treatment with bNAbs. Alternatively, (fewer) bNAbs can be administered in combination with ART [R22,R23], whereby ART could lower the replication rate of the HIV population, reducing the viral diversity and the chances of viral escape. Specifically, we can expect the emergence and establishment of rare (i.e., strongly deleterious) escape variants against bNAbs to be less likely in ART+ patients, which suggests that fitness-limited bNAbs should be more effective in conjunction with ART. Still, more data would be necessary to understand the long-term efficacy of such augmented therapy, and specifically the role of viral reservoirs in this context. A modeling approach could then shed light on how ART administration and bNAb therapy could be combined to efficiently achieve viral suppression.”

      7) This model assumes that the pressure imposed by bNAbs is constant for the first 8 weeks. What are the half-lives of the bNAbs involved, and is this a fair assumption? For example, Kwon et al, J Virol 2016 suggests 10E8 has a half-life of 5 days. Wouldn’t this require ongoing infusions to keep clinically relevant levels of the bNAbs around?

      We thank the reviewer for pointing this out to us. Since submission we have learned that 10E8 is subject to several problems. It has displayed toxicity in trials, and this short half-life might be consistent with self-binding; see the trial report https://clinicaltrials.gov/ct2/show/NCT03565315. In contrast, the other antibodies we considered, including 10-1074 (half-life: 24.0 days in uninfected and 12.8 days in HIV-1-infected individuals [R35]), 3BNC117 (half-life: about 18 days based on [R37]), and PGT121 (half-life: about 20 days based on figure plots in [R9]), are likely to remain above the neutralizing threshold for the duration of the study. Indeed, avoiding escape is not the end of the story, and successful bNAb therapy would target longer-lived antibodies. We now point out this caveat in the Discussion section (lines 491-498):

      “It should be noted that our analysis in [figure 4c (antibody ranking)] only focuses on one aspect of therapy optimization, i.e., the suppression of escape. Other factors, including potency (neutralization efficacy) and half-life of the bNAb, or the patient’s tolerance of bNAbs at different dosages should also be taken into account for therapy design. For example, the bNAb 10E8, which we identified as one of the most promising mono-therapy candidates in Figure 4, is shown to be poorly tolerated by patients with short half-life [R24], making it undesirable for therapy purposes. Thus, the bNAb candidates shown in Figure 4C should be taken as a guideline to be complemented with further assessment of efficacy and safety for therapy design.”
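
As a rough illustration of the half-life argument in this response, the following sketch computes the fraction of the initial antibody concentration left at the end of an 8-week (56-day) window. It assumes simple first-order elimination, which is an idealization. The half-lives are those quoted in this exchange (the 5-day value for 10E8 is the reviewer's figure), and the function name is hypothetical.

```python
HALF_LIFE_DAYS = {
    "10-1074 (HIV-1-infected)": 12.8,
    "3BNC117": 18.0,
    "PGT121": 20.0,
    "10E8": 5.0,  # value quoted by the reviewer
}


def fraction_remaining(half_life_days, t_days=56.0):
    """Fraction of the initial concentration left after t_days (first-order decay)."""
    return 0.5 ** (t_days / half_life_days)


if __name__ == "__main__":
    for antibody, half_life in HALF_LIFE_DAYS.items():
        print(f"{antibody}: {fraction_remaining(half_life):.4f}")
```

Whether the remaining concentration stays above a neutralizing threshold also depends on the initial dose and on the IC50 of the circulating variants, which this sketch does not model.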

      Reviewer #3 (Public Review):

      The authors attempted to identify an optimal combination of broadly neutralizing antibodies (bNAbs) that can suppress escape of HIV-1 from the therapy. To do so, the authors fit a birth-death model of viral dynamics using published longitudinal HIV sequence data from 9 untreated patients. Using inferred quantities to parametrize the model subject to bNAb infusion, they predict the distribution of rebound times of HIV in therapy trials with two mono-therapies and their combination. Finally, using deep mutational scanning (DMS) data to identify escape-mediating variants against 9 bNAbs for HIV, they propose a triplet combination that may best suppress early viral rebound. While the goal is clear, there are a number of major weaknesses that curtail the quality of the work:

      1) First, the approach is not novel. It at best represents a synthesis of known methods and published data sets.

      We have now substantially extended and restructured the introduction section to highlight the novelty of our approach and compare our work to prior studies. In brief:

      We discuss the modeling and machine learning techniques trained on experimental data from neutralization assays against pseudo-viruses to characterize the efficacy of bNAbs and their combinations against different variants of HIV [R1–R3]. These modeling approaches to optimization view the infection as a static collection of viral strains to be neutralized as opposed to an actively evolving population. We then discuss the mechanistic models that have been developed to explain the dynamics of viremia in patients following passive infusion of bNAbs [R4–R9]. These detailed models use trial data to fit parameters in relation to a bNAb’s efficacy in clearing virions, reducing viral load, etc. Although many of the inferred parameters are common across studies, these detailed mechanistic models cannot easily generalize from one trial to another in order to predict the efficacy of a new bNAb mono- or combination therapy. We discuss that evolution of the HIV population is another key factor to consider in modeling the dynamics of viremia in response to therapy with ART or bNAbs. We then present our approach as a coarse-grained evolutionary model of viral response to bNAb infusion that uses genetic data of HIV in untreated patients to predict bNAb therapy outcome by characterizing the chances of viral escape from a given bNAb in patients. Although our model does not accurately reproduce the detailed dynamics of viremia in each patient and lacks the mechanistic insight of richer models proposed previously (these are not our goals), it can accurately predict the distribution of viral rebound times in response to passive bNAb infusions, a key measure of efficacy for a bNAb therapy trial.
      We then emphasize that our prediction for the viral rebound time in response to a bNAb relies on only a few patient-specific parameters (i.e., the genetic diversity of patients prior to treatment), and is primarily done based on the inferred genetic parameters from the deep sequencing of HIV-1 populations in a separate cohort of ART-naive patients. Therefore, we argue that our model could be used to guide therapy trial design by identifying optimal combinations of bNAbs to suppress evolutionary escape of HIV in patients.

      The detailed added text is marked in red in the introduction.

      In addition, in the Discussion section we have added the following text to argue how our approach can be used to identify combo therapies (lines 419-426):

      “Combination therapy with more than two bNAbs (or drugs in ART) has long been shown to be more effective in suppressing early HIV rebound, both in theory and practice [R1,R2,R4,R10–R12]. In addition to corroborating this conclusion quantitatively, we provide a method for assessing new bNAbs for which escape mutations are known. Our method can be understood as a tool to navigate the combinatorial explosion of higher order cocktails for which we cannot possibly test all combinations. By assessing the evolvability of resistance against different combinations we can identify the best therapies to target for clinical trial. Specifically, we show that to suppress the chance of viral rebound to below 1%, a combo-therapy with 3 bNAbs with a mixture of mutation- and selection-limited strategies that target different regions of the viral envelope is necessary. Such a combination can counter the full variation of viral diversity observed in patients. We found that PG9, PGT151, and VRC01, which respectively target V2 loop, Interface, and CD4 binding site of HIV envelope, form an optimal combination for a 3-bNAb therapy to limit HIV-1 escape in patients infected with clade B of the virus.”

      Taken together, our main biological results can be summarized as follows:

      (a) Predicting the distribution of bNAb treatment outcomes (short-term) for trials that have already been conducted (Figure 3-A)

      (b) Estimates for all combinatorial possibilities for 9 bNAbs studied in this work. (Figures 3 and 4-D)

      (c) Quantitative corroboration of importance of multivalent treatment and necessity of at least 3 bNAbs in effective therapy (Figure 4-D)

      (d) An identification and discussion of the importance of mutation-limited and selection-limited antibodies for therapy design (Figure 4-C)

      (e) Quantifying the relative importance of de novo mutations vs. standing variation via simulations (Figure 2)

      (f) Providing a quantitative approach for rational design of bNAb therapy

      In addition to the new biological insight, this work presents methodological novelty, specifically with regard to robust algorithms for inference of evolutionary parameters and the statistical tests to quantify the accuracy of such inferences. These include (but are not limited to):

      (a) Inference of mutational target size from the nucleotide substitution pathways for acquiring resistance (Figure 1-D, and Equations 25 and 26)

      (b) Bayesian posterior for the steady-state relative fitness values of resistant and susceptible variants from sequences of HIV populations in untreated patients (Equations 28-30 and Algorithm 4, Figure 4A and 4-supplement 2 for validation)

      (c) Minimum-disparity based approach for measuring the robustness of the inferred selection parameters and a minimum-disparity based approach for hypothesis testing in the context of censored and categorical data, since rebound times may fall into the categories of right-censored late rebounds (> 56 days) and no response (NR). (Equation 32 and Algorithm 5, and Figure 4-supplement 4)

      (d) Quantifying the critical extinction threshold for therapy success (Equation 17 and Figure 3-D)

      (e) Quantifying and inferring the contribution of the viral reservoir to treatment outcomes (Equation 35 and Figure 3-E)

      These statistical measures and modeling developments are likely to be helpful for inference of evolutionary parameters from sub-sampled genetic data in other evolving populations.
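
To make item (c) above more concrete, the sketch below shows one way to compare observed and predicted rebound outcomes when some of them are censored or categorical. It is an illustration under stated assumptions, not the procedure of Equation 32 or Algorithm 5: the weekly binning, the category labels, and the use of the squared Hellinger distance as the disparity measure are all hypothetical choices made here.

```python
import math
from collections import Counter

LATE = "late (>56 d)"   # right-censored: no rebound within the observation window
NO_RESPONSE = "NR"      # categorical outcome: no response to the infusion


def categorize(rebound_day, bin_width=7, horizon=56):
    """Map a rebound time in days (or None for no response) to a category label."""
    if rebound_day is None:
        return NO_RESPONSE
    if rebound_day > horizon:
        return LATE
    week = max(1, math.ceil(rebound_day / bin_width))
    return f"week {week}"


def empirical_distribution(rebound_days):
    """Relative frequency of each category in a list of outcomes."""
    counts = Counter(categorize(day) for day in rebound_days)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def hellinger_squared(p, q):
    """Squared Hellinger distance between two categorical distributions."""
    categories = set(p) | set(q)
    overlap = sum(math.sqrt(p.get(c, 0.0) * q.get(c, 0.0)) for c in categories)
    return 1.0 - overlap


if __name__ == "__main__":
    observed = [10, 21, 25, 60, None, 35]    # made-up outcomes for illustration
    simulated = [12, 20, 28, 70, 33, 41]     # made-up model draws for illustration
    p = empirical_distribution(observed)
    q = empirical_distribution(simulated)
    print(round(hellinger_squared(p, q), 3))
```

Treating late rebounds and non-responders as their own categories lets every patient contribute to the comparison, rather than dropping the outcomes for which no exact rebound time exists.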

      2) Second, the analyses and computational data cannot justify the major claims, in particular the prediction on optimal bNAb combinations - the central goal of this work. Specifically, match of rebound time distribution is only achieved for early rebound due to ineffective bNAbs. This limited validity under restrictive assumptions (within a limited time window) thus cannot validate the optimality of identified combinations that count on effective bNAbs for delayed rebound. More importantly, the proposed optimal combinations are highly sensitive to data quality and depth. In particular, DMS data cannot faithfully probe low-frequency variants that are chiefly responsible for rebound, which undermines the predictive power of the approach.

      1. With regards to the utility of bNAb therapy to suppress early viral rebound:

      We agree with the reviewer that the long-term efficacy of a bNAb treatment is important and we now discuss the nuances relevant to this issue in more depth in the manuscript; see the response to Q#4 of essential revisions. Nonetheless, we want to emphasize that our method can be understood as a tool to navigate the combinatorial explosion of higher order bNAb cocktails for which we cannot possibly test all combinations. By assessing the evolvability of resistance against different combinations we can identify the best candidate therapies that can be further tested in trials. In the manuscript we also discuss the potential role of augmenting bNAb therapies with ART to achieve a long-term success; see the response to Q#4 of essential revisions.

      2. With regards to the limited utility of DMS data to detect escape variants:

      We agree with the reviewer that the utility and limitations of the DMS data should be more clearly demonstrated. We have included a new Appendix (Appendix 4) to show the robustness of our predictions for rebound time distributions, when identifying escape sites from the DMS data versus the trial data. We performed this comparison for the 10-1074 and the PGT121 bNAbs, for which we have access to both the DMS and the trial data. Note that the analysis of the newly published therapy trial dataset with the PGT121 bNAb [R9] was added to the revised version of this manuscript. Please see the response to Q#3 of essential revisions for further details on our efforts to highlight the utility and limitations of DMS data for bNAb therapy design.

      3) Third, the main results are already known from earlier work. It has been long known that a combination of more than two bnAbs is more effective in suppressing early rebound than fewer. Moreover, it has been shown recently that bnAb (VRC01) infusion acts to amplify pre-existing bnAb-resistant viral strains, leading to fast HIV rebound. Hence, it is unclear what new insight this work confers.

      1. With regards to combination therapy:

      As noted in response to Q# 1 of the reviewer, we agree that combination therapy with more than two bNAbs has been shown to be more effective in suppressing early HIV rebound. This work corroborates this conclusion quantitatively and provides a method to assess the efficacy of new bNAb combinations to suppress viral rebound. For further details, please see our response to Q#1 of essential revisions.

      2. With regards to the role of pre-existing bNAb resistant variants in viral escape:

      One contribution of our work is in quantifying the role of pre-existing resistant variants for escape against bNAbs. Specifically, we used our evolutionary model to quantify the fraction of escape events driven by pre-existing bNAb-resistant variants versus viral escape due to spontaneous mutation, and showed that mutation-mediated escape accounted for less than 5% of escape; whether this fraction is 5% or 20% can be consequential for therapy design, yet difficult to infer from limited trial data. Although evidence from trial data (the VRC01 study pointed out by the reviewer) may be insightful, we believe that our quantitative assessment with mathematical modeling and evolutionary reasoning can be extended to a variety of bNAb studies to guide therapy design.

      4) Lastly, suppression of early rebound alone is not a sufficient measure of therapy efficacy. Late rebound is not necessarily a sign of viral control, but might instead indicate selection for cross-resistant viral mutants - an even more detrimental outcome. In addition, this work has neglected bnAb dynamics or influence of infused bnAbs on the response of endogenous B cells, which will be essential for understanding viral dynamics, especially when infused bnAbs are relatively effective at suppressing early rebound.

      1. Regarding selecting for cross-resistant viral mutants: All combination therapies are likely to select for cross-resistant viral mutants to some extent. To reduce the chances of emergence of cross-resistant variants, the therapy should keep viremia low to reduce the diversity of circulating strains. Although bNAbs (even in combinations) may not efficiently suppress long-term viral diversity, augmenting bNAb therapy with ART may be the solution to this problem. We now emphasize this issue in the Discussion section; see response to Q#4 of essential revisions.

      2. Regarding the neglected bNAb dynamics:

      We have added a new Appendix 3 and also added language in the Discussion.

      From the Discussion (Lines 464-472):

      In our model of viral escape, we neglect the possibility of incomplete escape of the virus due to the reduced neutralization efficacy of bNAbs as their concentrations decay during trials. In Appendix 3, we show that this simplifying assumption is valid as long as the IC50 is not the same order of magnitude as the initial dosage concentration of the infused bNAb. Notably, the data from therapy trials used in this study fall into the regime for which we can neglect the impact of incomplete neutralization (Appendix 3-Figure 2). However, taking into account the dependence of viral fitness on bNAb concentration and its neutralization efficacy, as in the model proposed by [R34], could improve the long-term predictive power of our approach.

      Moreover, in Appendix 3 we explore the effects of incomplete neutralization on rebound trajectories. As we show in Appendix 3-Figure 1, if an antibody has an IC50 against the viral variant which is an order of magnitude above the initial antibody concentration, the viral dynamics very closely follows the idealized “escaped” trajectory (i.e., with complete neutralization). On the other hand, for an IC50 an order of magnitude below the initial concentration, the viral dynamics behave similarly to a completely neutralized virus, with a late rebound (later than 8 weeks). We found that the most important effect of incomplete neutralization on the dynamics of viremia occurs when the antibody has an IC50 against a resistant variant that is roughly of similar magnitude to the initial bNAb concentration in a patient’s serum; see Appendix 3-Figure 1. In Appendix 3-Figure 2 we show the distribution of IC50 and the initial bNAb concentration from the 10-1074 trial [R35] to see how often we would expect IC50 and initial concentration to be of the same order of magnitude. We find that the IC50 values in this trial are much lower (higher) for susceptible (resistant) variants compared to the initial bNAb concentration in all patients. Therefore, our simplified model assuming that a viral variant is either fully resistant or susceptible to a bNAb (i.e., no incomplete escape) is a reasonable approach for capturing the statistics of treatment failure at the concentrations tested in these trials. Nonetheless, developing a genotype-to-neutralization model such as the ones in ref. [R1,R2] may allow for a more nuanced approach to characterize neutralization in future work.

      3. Regarding the endogenous B cell response:

      The reviewer’s point about the influence of the infused bNAbs on the response of endogenous B cells is very interesting. Unfortunately, we do not have data about these interactions and can only speculate about their relevance in bNAb therapy. Nonetheless, we should emphasize that our model does capture the impact of immune pressure on (pre-trial) viral populations in a coarse-grained way. The fitness values that we infer are not determined in a vacuum: they are derived from allele frequencies measured in evolving HIV-1 populations under the constantly changing immune challenge from a host’s endogenous immune system. Therefore, it is reasonable to assume that the changes in the immune challenge are to some extent accounted for in our inference of fitness costs for escape variants. The enormous variation in human immune response means that we can only expect to model these effects in aggregate, as an average over the range of immune systems we find in a dataset. But that we have a mechanism to capture these effects at all is likely a major contributor to the success of our predictions in clinical trials.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors set out to clarify the relationship between brain oscillations and different levels of speech (syllables, words, phrases) using MEG. They presented word lists and sentences and used task instructions to attempt to focus listeners' attention on different levels of linguistic analysis (syllables, words, phrases).

      1) I came away with mixed feelings about the task design: following each stimulus (sentence or word list), participants were asked to (a) press a button (i.e. nothing related to what they heard), (b) indicate which of two syllables was heard, (c) indicate which of two words was heard, (d) indicate which pair of words was present in the correct order. This task is the critical manipulation in the study, as it is intended to encourage (or in the authors' words, "require") participants to focus on different timescales of speech (syllable, word, and phrase, respectively). I very much like the idea of keeping the physical stimuli unchanged, and manipulating attention through task demands - an elegant and effective approach. At the same time, I have reservations about the degree to which these task instructions altered attention during listening. My intuition is that, if I were a participant, I would just listen attentively, and then answer the question about the specific level. For example, I don't know that knowing I would be doing a "word pair" task, I would be attending at a slower rate than a "word" task, as in both cases I would be motivated to understand all of the words in the sentence. I fully acknowledge my introspection (n=1) may be flawed here, but nevertheless, any additional support validating the effect of these instructions would help the interpretation of the MEG results.

      The reviewer points out that to do any task on sentences (such as a word task and a syllable task) participants’ strategy could be to understand the full meaning of the sentence and infer the lower level properties based on the understanding of the full sentence. We fully share this introspection, which would suggest that extracting sentence meaning is partly automatic (or at least a default mode of processing) and independent of the behavioral relevance. While the reviewer sees this as a downside of the design, this is part of what our study tried to disentangle (automatic versus task-dependent processing at lower frequency time-scales). If, as the reviewer points out, all processing of sentences were automatic, we should not find any effect of task (as the task should not affect the tracking response at all). We found that overall the tracking response is robust to task-induced manipulation of attention: the main effect that MI to phrases is higher for sentences than for word lists is robust across passive and task conditions. But that is not the whole story on the source level, where we do find some task effects, which indicates that task instructions do matter. This means that participants changed their strategy depending on the instructions, but that overall, tracking of linguistic structures such as phrases is automatic. We show that in the IFG, phrasal time scales are tracked more strongly (higher MI) during the phrase task than during the other tasks. This is also reflected in stronger STG-IFG connectivity during the phrasal versus passive task. These results speak against the interpretation that the task instructions did not alter attention during listening. While there are these subtle task differences, especially in IFG, overall our findings do speak for an automatic tracking of phrasal rate structure in sentences independent of task.
      We therefore concluded that “automatic understanding of linguistic information, and all the processing that this entails, cannot be countered to substantially change the consequences for neural readout, even when explicitly instructing participants to pay attention to particular time-scales” (line 548-549).

      The analysis steps generally seem sensible and well-suited to answering the main claims of the study. Controlling for power differences between conditions through matching was a nice feature.

      2) I had a concern about accuracy differences (as seen in Figure 1) across stimulus materials and tasks. In particular, for the phrase task, participants were more accurate for sentence stimuli than word list stimuli. I think this makes a lot of sense, as a coherent sentence will be easier to remember in order than a list of words. But, I did not see accuracy taken into account in any of the analyses. These behavioral differences raise the possibility that the MEG results related to the sentence > word list contrast in phrases (which seems one of the most interesting findings in IFG) simply reflect differences in accuracy.

      With the caveat of the concern regarding accuracy differences, the research goals were clear and the conclusions were generally supported by the analyses.

      Thank you for pointing this out. We have now taken accuracy into account in our analysis. It did not change any of our main findings or conclusions, and strengthened the argument that tracking of phrases in sentences vs. word lists is stronger. The influence of task difficulty is a relevant point to investigate (also see point 1 of reviewer 2 and point 4 of reviewer 3). To do so we added accuracy (per participant per condition) as a factor in the mixed model (as well as all interactions with task and condition) for the MI, power, and connectivity analyses at the phrasal rate/delta band. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.
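
      As a reading aid, the random-intercept model described above can be sketched as follows. This is an illustrative assumption about the model structure, not the exact specification used in the manuscript: the symbols ($y$, $a$, $\beta$, $u_p$) and the indexing by participant $p$, task $t$, and condition $c$ are introduced here for illustration only, and reference levels of the categorical factors are assumed to be fixed at zero.

```latex
% Hypothetical notation: y = MEG measure (MI, power, or WPLI),
% a = accuracy of participant p in task t and condition c.
\begin{aligned}
y_{ptc} ={}& \beta_0 + \beta^{T}_{t} + \beta^{C}_{c} + \beta^{A} a_{ptc}
            + \beta^{TC}_{tc} + \beta^{TA}_{t}\, a_{ptc} + \beta^{CA}_{c}\, a_{ptc}
            + \beta^{TCA}_{tc}\, a_{ptc} \\
           & + u_p + \varepsilon_{ptc},
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ptc} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      Under this sketch, the only participant-level random term is the intercept $u_p$; a random-slope model would add further participant-specific terms (for example for task or condition), which is what the reduced degrees of freedom rule out here.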

      For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was only (uncorrected) significant for the word combination task (F(1,24.8) = 5.296, p = 0.03 (uncorrected)) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was the strongest at high accuracies (see the figure below for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.

      MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.

      For power we did not find any effect of accuracy. For the connectivity analysis we found in the STG-MTG connectivity a significant condition*accuracy interaction (F(1, 80.23)=5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition (as the reviewer also indicated in point 1).

      After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).

      We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.

      The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factors). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analysis steps as before. Since we reduced our degrees of freedom, we could however only create random-intercept and not random-slope models”.

      The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracy*task*condition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).

      No relation between accuracy and power was found. For the connectivity analysis we found a significant condition*accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”

      The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant for understanding sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI) performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”

      Reviewer #2 (Public Review):

      In a MEG study, the authors investigate as their main question whether neural tracking at the phrasal time scale reflects linguistic structure building (testing different conditions: sentences vs. word-lists) or an attentional focus on the phrasal time scale (testing different tasks, passive listening, syllable task, word task, word combination/phrasal scale task). They perform the following analyses at brain areas (ROIs: STG, IFG, MTG) of the language network: (1) Mutual information (MI) between the acoustic envelope and the delta band neuronal signals is analyzed. (2) Power in the delta band is analyzed. (3) Connectivity is analyzed using debiased WPLI. For all analyses, linear mixed-models are separately conducted for each ROI. The main finding is that the sentence compared to the word-list condition is more strongly tracked at the phrasal scale (MI). In STG the effect was task-independent; in MTG the effect only occurred for active tasks; and in IFG additionally, the word-combining/phrasal scale task resulted in higher tracking compared to all other tasks. The authors conclude that phrasal scale neural tracking reflects linguistic processing which takes place automatically, while task-related attention contributes additionally at IFG (interpreted as combinatorial hub involved in language and non-language processing). The findings are stable when power differences are controlled. The connectivity analysis showed increased connectivity in the delta band (phrasal time scale) between IFG-STG in the phrasal-scale compared to the passive task (adding to the IFG MI findings). (Additionally, they separately analyze neural tracking at the syllabic and word time scale, which however is not in the main focus).

      Major strength/weaknesses of the methods and results:

      1) A major strength of the results is that part of them replicate the authors' earlier findings (i.e. higher tracking at the phrasal time scale for sentences compared to word-lists; Kaufeld et al., 2020), while they complement this earlier work by showing that the effects are due to linguistic processing and not to an attentional focus on the phrasal time scale due to the task (at least in STG and MTG; while the task plays a role for the IFG tracking). Another strength is that a power control analysis is applied, which allows excluding spurious results due to condition differences in power. A weakness of the method is that analyses were applied separately per ROI, and combined across correct/incorrect trials (if I understood correctly), no trial-based analysis was conducted (which is related to how MI is computed). Furthermore, several methodological details could be clarified in the manuscript.

      The authors achieved their aims by providing evidence that neuronal tracking at the phrasal time scale in STG and MTG depends on the presence of linguistic information at this scale rather than indicating an attentional focus on this time scale due to a specific task. Their results support the conclusion. Results would be strengthened by showing that these effects are not impacted by different amounts of correct/incorrect trials across conditions (if I understood that correctly).

      We thank the reviewer for her comments. It is correct that we collapsed across the correct and incorrect trials. We did so for several reasons (also see points 2 and 9 of reviewer 1 and point 4 of reviewer 3). First, our tasks function solely to direct participants’ attention to the various linguistic representations (syllables, words, phrases) and the timescales that they occur on. The three tasks are in a sense control tasks that manipulate attention while tracking during spoken language comprehension occurs, rather than tasks whose neural response is itself the object of study. For example, in a typical working memory paradigm, it is only during correct trials that the relevant cognitive process occurs. In contrast, in our paradigm, it is likely that spoken stimuli are heard and processed (in other words, that sentence comprehension and word list perception occur) even during incorrect trials in the syllable condition. As such, we do not expect MI tracking responses to explain the behavioral data. However, we agree it is crucially important to show that MI differences are not a function of task performance differences.

      Second, there are clear differences in the difficulty level of the trials within conditions. For example, if the target question was related to the last part of the audio fragment, the task was much easier than when it was related to the beginning of the audio fragment. In the syllable task, if the syllables also were (by chance) a part-word, the trial was also much easier. If we were to split the data into correct and incorrect trials, we would not isolate processes related to accurately processing the speech fragments, but would also confound the analysis with the difficulty level of the individual trials.

      To acknowledge this, we added this limitation to the methods. The methods now read: “Note that different trials within a task were not matched for task difficulty. For example, in the syllable task syllables that make a word are much easier to recognize than syllables that do not make a word. Additionally, trials pertaining to the beginning of the sentence are more difficult than ones related to the end of the sentence due to recency effects.”.

      To still investigate if overall accuracy influenced the results we did add accuracy (across participants) to the mixed models. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.

      For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was only (uncorrected) significant for the word combination task (F(1,24.8) = 5.296, p = 0.03 (uncorrected)) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was the strongest at high accuracies (see the attached figure for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.

      For power we did not find any effect of accuracy. For the connectivity analysis we found in the STG-MTG connectivity a significant condition*accuracy interaction (F(1, 80.23)=5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see figure below; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition.

      MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.

      After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).

      We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.

      The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factors). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analysis steps as before. Since we reduced our degrees of freedom, we could however only create random-intercept and not random-slope models”.

      The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracy*task*condition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).

      No relation between accuracy and power was found. For the connectivity analysis we found a significant condition*accuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”

      The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant for understanding sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI) performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”

      The findings are an important contribution to the ongoing debate in the field whether neuronal tracking at the phrasal time scale indicates linguistic structure processing or more general processes (e.g. chunking).

      Reviewer #3 (Public Review):

      This manuscript presents a MEG study aiming to investigate whether neural tracking of phrasal timescales depends on automatic language processing or specific tasks related to temporal attention. The authors collected MEG data of 20 participants as they listened to naturally spoken sentences or word lists during four different tasks (passive listening vs. syllable task vs. word task vs. phrase task). Based on mutual information and connectivity analysis, the authors found that (1) neural tracking at the phrasal band (0.8-1.1 Hz) was significantly stronger for the sentence condition compared to the word list condition across the classical language network, i.e., STG, MTG, and IFG; (2) neural tracking at the phrasal band was stronger (at least at trend level) for the phrase task than for the other tasks in the IFG; (3) the IFG-STG connectivity was increased in the delta band for the phrase task. Ultimately, the authors concluded that neural tracking of phrasal timescales relied on both automatic language processing and specific tasks.

      Overall, this study is trying to tackle an interesting question related to the contributing factors for neural tracking of linguistic structures. The study procedure and analyses are well executed, and the conclusions of this paper are mostly well supported by data. However, I do have several major concerns.

      1. The title of the manuscript uses the description "tracking of hierarchical linguistic structure". In general, hierarchical linguistic structures involve multiple linguistic units with different timescales, such as syllables, words, phrases, and sentences. In this study, however, the main analysis only focused on the phrasal band (0.8-1.1 Hz). It seemed that there was no significant stimulus- or task-effect on the word band or syllabic band (supplementary figures). Therefore, it is highly recommended that the authors modify the related descriptions, or explain why neural tracking of phrases can represent neural tracking of hierarchical linguistic structures in the current study.

      We thank the reviewer for this comment. We meant to refer to the task manipulation directing attention to different levels of representation across the linguistic hierarchy. We have changed the title to “Neural tracking of phrases during spoken language comprehension is automatic and task-dependent.” We hope this resolves any inadvertent confusion we created. Furthermore, throughout the manuscript we now make sure to describe the effects as occurring for phrasal tracking at low frequency bands and not across any hierarchical linguistic structure. We agree that our findings cannot speak for any task-dependent effects along the hierarchy, only that at the phrasal level there is a difference between sentences and word lists.

      2. In Methods, the authors employed MI analyses on three frequency bands: 0.8-1.1 Hz for the phrasal band, 1.9-2.8 Hz for the word band, and 3.5-5.0 Hz for the syllabic band (line 191-192). As the timescales of linguistic units are various and overlapped in natural speech, I wonder how the authors define the boundaries of these frequency bands, and whether these bands are proper for the naturally spoken stimuli in the current study. These important details should be clarified.

      The frequency bands of the MI analysis were based on the stimuli, or in other words, are data driven. They reflect the syllabic, word, and phrasal rates in our stimulus set (calculated in Kaufeld et al., 2020). They were calculated by annotating the sentences for syllables, words, and phrases and converting the rates of these linguistic units to frequency ranges. This information has been added to the manuscript. We acknowledge that, unlike in our stimulus set, the boundaries of these bands can overlap in natural speech, and we now also state this (“While in our stimulus set the boundaries of the linguistic levels did not overlap, in natural speech the brain has an even more difficult task as there is no one-to-one match between band and linguistic unit [26]”, line number 211-213).
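
      The conversion from annotated linguistic units to frequency ranges can be illustrated with a minimal sketch. It assumes that each band spans the lowest to the highest per-stimulus rate (units per second); this rule, the `Stimulus` fields, and the example counts are hypothetical and are not taken from the manuscript or from Kaufeld et al. (2020).

```python
from dataclasses import dataclass


@dataclass
class Stimulus:
    """One annotated audio fragment (all values here are made up)."""
    duration_s: float   # length of the fragment in seconds
    n_syllables: int    # number of annotated syllables
    n_words: int        # number of annotated words
    n_phrases: int      # number of annotated phrases


def unit_rates(stim):
    """Rate of each linguistic level in units per second (Hz)."""
    return {
        "syllable": stim.n_syllables / stim.duration_s,
        "word": stim.n_words / stim.duration_s,
        "phrase": stim.n_phrases / stim.duration_s,
    }


def frequency_bands(stimuli):
    """Assumed rule: band = (lowest, highest) rate across the stimulus set."""
    bands = {}
    for level in ("phrase", "word", "syllable"):
        rates = [unit_rates(s)[level] for s in stimuli]
        bands[level] = (min(rates), max(rates))
    return bands


def bands_overlap(bands):
    """True if any two bands share part of their frequency range."""
    ordered = sorted(bands.values())  # sort bands by their lower edge
    return any(
        lo_next <= hi_prev
        for (_, hi_prev), (lo_next, _) in zip(ordered, ordered[1:])
    )


# Two hypothetical stimuli, for illustration only.
example_stimuli = [
    Stimulus(duration_s=4.0, n_syllables=18, n_words=10, n_phrases=4),
    Stimulus(duration_s=5.0, n_syllables=19, n_words=11, n_phrases=4),
]

if __name__ == "__main__":
    bands = frequency_bands(example_stimuli)
    for level, (low, high) in bands.items():
        print(f"{level}: {low:.1f}-{high:.1f} Hz")
    print("overlap:", bands_overlap(bands))
```

      A check such as `bands_overlap` makes the point of the quoted sentence concrete: in a controlled stimulus set the bands can be kept separate, whereas rates measured on natural speech would often produce overlapping ranges.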

      3. What is missing in the manuscript are the explanations of the correlation between behavioral performance and neural tracking. In Results, the behavioral performance shows significant differences across the active tasks (Figure 1), but the MI differences across the tasks are relatively weak in IFG (Figure 3). In addition, the behavioral performance only shows significant differences between the sentence and word list conditions during the phrasal task, but the MI differences between the conditions are significant in MTG during the syllabic, word, and phrasal tasks. Explanations for these inconsistent results are expected.

      We answer this point together with point 4 below where we analyze the behavioral performance and the MEG responses.

      4. Since the behavioral performance of these active tasks is likely related to the temporal attention to relevant timescales of different linguistic units, I wonder whether there exist underlying neural correlates of behavioral performance (e.g., significant correlation between performance and mutual information). If so, it may be interesting and bring a new bright spot for the current study.

      The influence of task difficulty is a relevant point to investigate (also see point 1 of reviewer 2 and point 4 of reviewer 3). To do so we added accuracy (per participant per condition) as a factor in the mixed model (as well as all interactions with task and condition) for the MI, power, and connectivity analyses at the phrasal rate/delta band. Note that as for the passive task there is no accuracy, we removed the passive task from the analyses. We could also only run models with random intercepts (not random slopes), due to the reduced number of degrees of freedom when adding the factor accuracy to the models.

      For the MI analysis we only found an effect in MTG. Specifically, there was a three-way interaction between task, condition and accuracy (F(2, 91.9) = 3.4591, p = 0.036). To follow up on this three-way interaction we split the data per task. The condition*accuracy interaction was only (uncorrected) significant for the word combination task (F(1,24.8) = 5.296, p = 0.03 (uncorrected)) and not for any other task (p>0.1). In the word combination task, we found that the difference between sentences and word lists was the strongest at high accuracies (see the figure below for the predicted values of the model). One way to interpret this finding is that stronger phrasal-rate MI tracking in MTG promotes phrasal-rate processing (as indicated by accuracy) more in sentences than in word lists.

      MEG – behavioral performance relation. A) Predicted values for the phrasal band MI in the MTG for the word combination task separately for the two conditions. B) Predicted values for the delta band WPLI in the STG-MTG connection separately for the two conditions. Error bars indicate the 95% confidence interval of the fit. Colored lines at the bottom indicate individual datapoints.

      For power we did not find any effect of accuracy. For the connectivity analysis we found in the STG-MTG connectivity a significant condition*accuracy interaction (F(1, 80.23)=5.19, p = 0.025). The condition*accuracy interaction showed that lower accuracies were generally associated with stronger differences between the sentences and word lists (see the attached figure; the opposite of the MI analysis). Thus, functional connections in the delta band are stronger during sentence processing when participants have difficulty with the task (independent of the task performed). This could indicate that low-frequency connections are more relevant for the sentence than the word list condition.

      After correcting for accuracy there was also a significant task*condition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005 corrected), but not for the other tasks (p>0.1).

      We added the results of the accuracy analyses to the main manuscript and added a dedicated section to our discussion (page 21-22). Adding accuracy did not remove any of the effects we report in the original analyses. Therefore, none of these findings change the interpretation of the results, as the task still had an influence on the MI responses of MTG and IFG. The effect of accuracy in the MTG refined the results, showing that the effect was strongest there for participants with high accuracies. This relationship suggests a functional role of tracking through phase alignment for understanding phrasal structure.

      While the findings can explain some behavioral effects, we agree with the reviewer that the behavioral results and the MI results don’t align. We note that our use of tasks to guide attention to different timescales and linguistic representations differs from the use of, for example, a working memory task where only the correct trials contain the relevant cognitive process. In working memory type paradigms, the MEG data should indeed explain the behavioral response. Our study was designed to test for effects of task demands on the neural tracking response to speech and language. As we are only using the tasks to control attention, we do not attempt to explain behavior through the MEG data or differences in MI.

      Thus, the phrasal tracking cannot explain all of the behavioral results (point 3). It is at this point unclear what could have caused this effect, but it is quite likely that neural sources outside the speech and language ROIs we selected are in play. We discuss this now.

      The methods now read: “MEG-behavioural performance analysis: To investigate the relation between the MEG measures and the behavioural performance we repeated the analyses (MI, power, and connectivity) but added accuracy as a factor (together with the interactions with the task and condition factor). As there is no accuracy for the passive task, we removed this task from the analysis. We then followed the same analyse steps as before. Since we reduced our degree of freedom, we could however only create random intercept and not random slope models”.

      The results now read: “MEG-behavioural performance relation. We found for the MI analysis a significant effect of accuracy only in the MTG. Here, we found a three-way interaction between accuracytaskcondition (F(2, 91.9) = 3.459, p = 0.036). Splitting up for the three different tasks we found only an uncorrected significant effect for the condition*accuracy interaction for the phrasal task (F(1, 24.8) = 5.296, p = 0.03) and not for the other two tasks (p>0.1). In the phrasal task, we found that when accuracy was high, there was a stronger difference between the sentence and the word list condition compared to when accuracy was low, with stronger accuracy for the sentence condition (Figure 7A).

      No relation between accuracy and power was found. For the connectivity analysis we found a significant conditionaccuracy interaction for the STG-MTG connection (F(1,80.23) = 5.19, p = 0.025; Figure 7B). Independent of task, when accuracy was low the difference between sentence and word lists was stronger with higher WPLI fits for the sentence condition. After correcting for accuracy there was also a significant taskcondition interaction (F(2,80.01) = 3.348, p = 0.040) and a main effect of condition (F(1,80.361) = 5.809, p = 0.018). While overall there was a stronger WPLI for the sentence compared to the word list condition, the interaction seemed to indicate that this was especially the case during the word task (p = 0.005), but not for the other tasks (p>0.1).”

      The discussion now reads: “We found that across participants both the MI and the connectivity in temporal cortex influenced behavioural performance. Specifically, MTG-STG connections were, independent of task, related to accuracy. There was higher connectivity between MTG and STG for sentences compared to word lists at low accuracies. At high accuracies, we found stronger MTG tracking at phrasal rates (measured with MI) for sentences compared to word lists during the word combination task. These results suggest that indeed tracking of phrasal structure in MTG is relevant to understand sentences compared to word lists. This was reflected in a general increase in delta connectivity differences when the task was difficult (Figure 7B). Participants might compensate for the difficulty using phrasal structure present in the sentence condition. When phrasal structure in sentences is accurately tracked (as measured with MI) performance is better when these rates are relevant (Figure 7A). These results point to a role for phrasal tracking for accurately understanding the higher order linguistic structure in sentences even though more research is needed to verify this. It is evident that the connectivity and tracking correlations to behaviour do not explain all variation in the behavioural performance (compare Figure 1 with 3). Plainly, temporal tracking does not explain everything in language processing. Besides tracking there are many other components important for our designated tasks, such as memory load and semantic context which are not captured by our current analyses.”

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper the authors find that vitamin C (VC) enhances the differentiation of B cells to plasma cells (PC) in an in vitro culture system and link the treatment regimen to changes in the DNA methylation pattern associated with B cell differentiation. The work generally supports the conclusions. The differentiation of B cells to PC is critical to the induction of adaptive immunity to infection and vaccination.

      Strengths: 1) The major strength of the paper is the observation that VC greatly enhances plasma cell formation in the culture assay. 2) Because they have a two step differentiation process, the authors were able to narrow down the important point of VC action on the first step as IL-21 signaling did not change. 3) The authors focused the rest of the studies on the actions of the TET2/3 proteins, which connects the iron pathway as a cofactor for the TET enzymes and antioxidation nutrients such as VC. 4) The authors use a relatively novel chemistry to assess 5hmC levels. 5) The data appear to have been rigorously collected with an appropriate number of samples.

      We would like to thank you for the comments.

      Weaknesses: 1) The direct connection between IL-21 STAT3 signaling and the E58 region of the Prdm1 gene is not shown, but rather inferred from previous work in T cells. Because this is "the connection," they should attempt to show this by ChIP in their system. It should be possible as the experiments are in vitro and lots of cells can be generated with a high proportion of differentiating cells in the culture.

      We appreciate the Reviewer’s suggestions, which allowed us to identify another VC-regulated element E27 at the Prdm1 locus in B cells.

      1) We were unable to completely reproduce the previous T cell STAT3 ChIP-seq by Kwon et al., 2009 (Dr. Warren Leonard’s lab) due to the discontinuation of the antibody (Santa Cruz). Therefore, we have tested four anti-STAT3 antibodies (Cell Signaling Technology rabbit monoclonal #4904, #12640, #9145; Millipore rabbit polyclonal #06-596), and found that only the rabbit mAb #9145 anti-phospho-Stat3 (Tyr705) from CST generated a specific signal for ChIP, which was quantified by qPCR using an IL-21-induced positive control Mcl1.

      Surprisingly, the ChIP-seq result showed that the STAT3 binding patterns differ between B cells and the previously published data from T cells (New Fig. 8D and S8-1B). The difference is likely due to cell types, culture conditions, and the source of antibodies. Nonetheless, our STAT3 ChIP-seq data showed that VC could enhance STAT3 binding at the Prdm1 promoter and E27, a previously identified enhancer with a functional STAT3 motif at +27kb (New Fig. 8D) (Kwon et al., 2009). In order to prove that the DNA modification of this element was indeed regulated by VC, we analyzed the DNA methylation using bisulfite nanopore sequencing and showed that VC induced DNA demethylation at E27 (New Fig. 9A). Importantly, both VC-enhanced DNA demethylation (New Fig. 9B) and STAT3 binding (New Fig. 9C) at E27 were abolished in the Tet2/3-deficient B cells. Our data strongly corroborate the previously identified STAT3 binding site at E27 by Kwon et al., where they showed that the STAT3 motif is responsible for IL-21-induced Prdm1 expression. Our results further demonstrated that E27 is regulated by DNA methylation and is thus sensitive to the status of VC.

      2) From the 5hmC DNA dot blot, it is difficult to make the interpretation that there is an increase in activity of the TETs during the process as the VC samples look like naïve cells and there is a clear loss of 5hmC in the Mock treated samples that stays relatively the same during the differentiation process. A better description of the logic and new sites that go from 5mC to 5hmC is needed.

      We have now included additional explanation in the text and a new supplementary figure (NEW Fig. S7-1).

      Line 264 TET enzymes oxidize 5mC into 5hmC, a stable epigenetic modification and an intermediate for DNA demethylation (Fig. S7-1A)40,56,64. To confirm that VC indeed enhances TET activity, we used DNA dot blot to semi-quantitatively measure the level of 5hmC in B cells. Naïve B cells had the highest density of 5hmC compared with B cells from day four or day seven culture (Fig. 7A). The decreased 5hmC is consistent with the passive dilution of 5hmC after cell divisions, as the 5hmC modification pattern is not replicated on the newly synthesized DNA (Fig. S7-1B-C).

      Reviewer #3 (Public Review):

      This paper titled "Epigenetic remodeling by Vit C potentiates the differentiation of mouse and human plasma cells" is a very interesting paper with novel findings and mechanisms of Vit C in plasma cell differentiation. They elegantly show that Vit C enhances plasma cell differentiation and that the mechanisms involve TET activity and DNA demethylation. Their current model supports plasma cell differentiation of IgM, IgE and IgG1 in mouse which is a type 2 immune response model. If the effect of Vit C can be shown in a type 1 immune model, it would be important. Then, these data may help to support Linus Pauling's claims that Vit C prevents and alleviates the common cold if the model can be applied broadly.

      We sincerely appreciate and are thankful for the Reviewer’s comments. We hope our study will provide insight into how VC may contribute to a proper immune response.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript is of interest for neuroscientists studying neural circuit mapping in late larval, juvenile, and adult zebrafish. The work adapts and refines methods for retrograde viral tracing in zebrafish, using conditional and transneuronal DNA cargoes, to gauge the structure, connectivity, and function of neurons. Overall, the methods described in the paper, combined with a suite of viral constructs that are made available, represent a practical advance for virus-based neural circuit mapping in zebrafish, although a few aspects of experimental design and data interpretation require strengthening.

      This work provides methodological refinements and new constructs for retrograde neuronal tracing and functional testing of circuit elements in zebrafish. The authors of the manuscript put impressive efforts into developing methods that are compatible with currently available transgenic zebrafish lines. The authors developed the methods based on previously-described herpes simplex virus 1 (HSV1) and pseudotyped rabies virus (RV) with deleted G protein (RVΔG) as neuronal labeling tools. First, they explore and assess temperature's effect on viral infection efficiency. The results indicate that a temperature close to the viral host temperature is optimal. Second, they engineered HSV1 into the UAS system that either contained TVA or codon-optimized glycoprotein (zoSADG). In the lines that contained TVA, the authors delivered HSV1-UAS containing TVA to Gal4 zebrafish lines for specific cell type delivery. With Gal4/UAS, they expanded the tool to adapt the transgenic zebrafish system that is widely used. Because EnvA/TVA works as a system, they then inject EnvA-RVΔG to target neurons where TVA is prelocated for specific labeling. Because of the deleted glycoprotein in RV, the reproducibility of the virus was limited. Therefore, they showed another experiment that complemented the EnvA-RVΔG by co-injection of the HSV1 containing zoSADG (HSV1[UAS:zoSADG]) as a helper virus to assist RVΔG in the transneuronal spread. Using the resulting retrograde migration of RV, the authors visualized the first-order upstream connections labeled by HSV1-TVA+ neurons. Appropriate for a methodological paper, the function of the viruses is well described and their properties are well documented. In some cases, however, supporting data are thin or anecdotal, and do not always sufficiently support the manuscript's claims and conclusions. Further data, more nuanced interpretations, and/or more circumspect discussion points are needed to address these concerns.

      Strengths:

      1) HSV1 contains double-stranded DNA that can incorporate into the genome without using a complicated process to increase replication efficiency.

      2) Specific gene targeting with the EnvA-TVA system increases accuracy during gene delivery. The expanded toolkit enhances the targeting strategy to include a diversity of useful constructs for the structural and functional assessment of neural circuits.

      3) By making their toolbox compatible with the Gal4/UAS system, the authors leverage a large collection of Gal4 lines already available to the zebrafish community.

      4) The toolbox for virus-based circuit mapping is relatively immature in the zebrafish model. The methods and reagents introduced here complement the current anterograde tracing using VSV. They also fill a gap in viral tracing for circuit mapping in adult zebrafish, as the immune system in juveniles and adults tended to reduce the viral spread efficiency using other approaches.

      Weaknesses:

      1. One of the major concerns of using this method is temperature increase. In zebrafish, temperature increase has been used as a heat stressor and is known to accelerate and facilitate development at the larval stage and can also cause lethality. Because of this accelerated development, the neurons labeled with HSV1 under heated conditions might not be the consequence of efficient virus infection, but rather a byproduct of faster migration and differentiation of neurons and other cells. Although the authors stated that adult zebrafish could tolerate higher temperatures (see item 5, below), this is not the normal condition for mapping circuit function, and the virus, as indicated in the manuscript, is also used in larvae. Further justification will be required to convince the audience that the use of high temperatures is generally adaptable, including for mapping circuits involved in other circuits. This is especially a concern for the HPA, because of the challenges in distinguishing HSV1-induced oxidative stress from heat-induced neural stress.

      The reviewer raises the possibility that increased expression after injection of HSV1 at higher temperatures may reflect increased proliferation (accelerated development) rather than increased infection efficiency. This scenario implies that a substantial fraction of labeled neurons were infected as progenitors, which then divided and differentiated into neurons. This possibility, although formally possible, appears extremely unlikely for the following reasons.

      1. The difference in the number of labeled neurons is very large. If this difference were due to a difference in the speed of development, there should be an enormous difference in (brain) size between fish kept at different temperatures. If present, this size difference should be easily observable, at least in larvae. However, we did not observe an obvious size difference.

      2. Viral infection was studied primarily in adult fish where neurogenesis still occurs, but at low rates compared to development. Nevertheless, the difference in labeled neurons was very large.

      3. Many labeled neurons showed elaborated morphologies and long-range projections. It appears very unlikely that such neurons and their projections can arise by differentiation from precursors within the given incubation time.

      4. The quantification in Fig. 1C was performed specifically for neurons with long-range projections in adult fish. The virus was injected into the OB while the neuronal somata were located in the dorsal telencephalon. If these neurons arose from precursors at the injection site, it would have to be postulated that these precursors migrated to the dorsal telencephalon, differentiated into neurons, and developed projections back to the OB. It is extremely unlikely that this can occur within the time of incubation. Moreover, there is no biological evidence for the migration of neuronal precursors or differentiating neurons from the OB to the dorsal telencephalon.

      To further confirm that a speed-up of development cannot account for the observed difference in labeling we performed another variation of the experiment shown in Fig. 1: adult fish injected with HSV1[LTCMV:DsRed] into the OB were first kept at 36 deg for 3 days and then kept at 26 deg for 3 days before analysis (“36→26”). In these fish, DsRed expression in dorsal telencephalic neurons was indistinguishable from fish that were kept at 36 deg for the full period of 6 days. Fish that underwent the opposite temperature shift (“26→36”), in contrast, did not express DsRed in dorsal telencephalic neurons (Fig. 1C), despite the fact that they spent the same amount of time at each temperature.

      Hence, the time at increased temperature per se cannot account for the difference in expression, indicating that temperature affects the process of infection. The new results have now been integrated into Fig. 1.

      We cannot rule out that the temperature change affects stress levels and the HPA axis. However, as discussed in more detail below, swimming behavior was almost unchanged and obvious signs of stress were not observed. Moreover, please note that the temperature change can be restricted to the time around the virus injection, while any effects on behavior or neural activity will typically be examined several days later. Hence, effects of transgene expression will usually be evaluated at the standard laboratory temperature, long after the temperature change and the injection procedure.

      2) HSV1 infects various cell types, not limited to neurons. The authors in the manuscript mentioned the high infection rate of cells. They did not categorize whether all infected cells were neurons or mixed neurons and glia. The authors briefly mention glia in the RNA sequencing data, but knowing the cell types and location is critical for circuit mapping. In Figure S2A-D, it seems that some of the cells around the midline could be radial glia. Cell migration from the midline is abundant, with radial-glia at the early stage guiding neurons from the ventricular zone to the mantle regions. How do authors ensure that the increased infection at higher temperatures does not include glia with the elevated immune response?

      We do not claim that HSV1 infects only neurons. Indeed, HSV1 probably also infects glia, and the cells labeled in Fig. S2 are likely to include radial glia. However, this is not necessarily a disadvantage as additional specificity can be created by methods such as the Gal4 system. In fact, enhancing cell type specificity was a main motivation to combine HSV1 with the Gal4 system. A broad selectivity of the virus itself may then actually be considered an advantage because it allows for targeting of a broad spectrum of possible cell types. For example, HSV1 in combination with a transgenic line expressing Gal4 in glia (e.g., Tg[gfap:Gal4]) may be used to specifically interrogate glia cells if desired. We now discuss this issue of cell type specificity more specifically in the revised manuscript (ln 103-107; ln 339ff).

      3) One limitation with HSV1 is that it resides inside neurons for an unpredictable length of time before expression, which increases the latency for induction of TVA. This extended latency could reduce sample size or lead to missed temporal windows. This caveat should be discussed.

      We agree that the delay between HSV1 injection and transgene (TVA) expression may, in principle, decrease the efficiency of Rabies infection and retrograde tracing. We therefore performed a set of experiments in which we injected the Rabies virus 2 or 4 days after the HSV1. However, we observed lower, rather than higher, rates of Rabies infection, possibly because the sites of the two injections were not precisely identical. Hence, the advantage of staggered injections, if any, appears to be offset by variability in the location of injections, at least in our hands. Moreover, previous applications in rodents also reported high efficiency of Rabies infection when the Rabies virus was applied at the same time as the TVA expression construct (Vélez-Fort et al. 2014; Wertz et al. 2015). We now show results in Figure 4 – figure supplement 1 and discuss these issues briefly in the revised manuscript (ln 271-274; ln 677ff).

      4) In the manuscript, to achieve transneuronal labeling, the fish were exposed to three viruses across two injections. The approach also includes exposure to chronic heat, selection of TVA+ neurons from the first round of injection, and long periods of incubation between steps in the protocol. This is both labor-intensive and potentially challenging for the animals' health and survival. Because the rates of lethality and poor health are not quantified for times after the first injection, and because the efficiency of the labelling approach (assessed at the animal level) is not reported, it is difficult to judge whether the approach is efficient enough for experimental work, where a large n of animals will be necessary for multiple treatments. This is particularly the case for phenotyping where mutant lines may be predisposed to adverse effects from heat or other manipulations and interventions. The manuscript would ideally show the number of fish that 1) were injected, 2) were infected with the virus, 3) survived until the timepoint for data collection, and 4) yielded publishable data. The possible limitations for studying mutants, especially those susceptible to heat and infection, should be discussed.

      We agree that more information on the success rate and survival rates is desired. Previously, we had not explicitly reported survival rates because these were very high, and we apologize for not mentioning this explicitly. In the revised manuscript, we have now addressed this issue more specifically.

      Please note that all fish used in experiments are represented by individual data points in the figures (except for a very low number of fish that did not survive the injection); no fish were excluded from the analysis. This is now pointed out explicitly in Methods (“Statistical analysis”). Hence, the data in the figures show directly how many fish were infected with the virus (point 2 above; 100% of injected fish) and how many neurons were labeled in each fish. In all fish, images were acquired and the number of labeled neurons was quantified, implying that all fish yielded “publishable data” (point 4 above).

      The survival rate (points 1 and 3 above) was very close to 100% in adult fish, and very few fish were lost during the injection. This has now been quantified systematically for all experimental conditions. We directly compared the survival of fish that were not injected, injected with buffer, and injected with virus, either at standard laboratory temperature (typically 26 deg for adults, 28.5 deg for larvae) or at elevated temperature (36 or 35 deg, respectively). The results are shown in Figure 1 – figure supplement 1.

      In adult fish, survival rates were 100% under all conditions after single injections of HSV1 viruses. In larvae, some mortality was observed under control conditions that was slightly enhanced at elevated temperatures. We speculate that this is an indirect effect because larvae were kept in petri dishes in stagnant medium and water quality degrades more rapidly at higher temperatures. In any case, survival rates one week after injection were still relatively high (~50%). Moreover, for the first 3 days, survival rates were >90%. This appears particularly relevant because two or three days of exposure to high temperature are sufficient to achieve efficient expression. Survival rates were still 80 – 90% after two injections of HSV1 or after injections of rabies virus. Hence, the temperature shift should be compatible with a broad spectrum of practical applications. No effect of the HSV1 itself was detected on survival rates.

      5) The current videos do not provide a rigorous demonstration that animals routinely tolerate elevated temperatures or infection (S Movies 1-3). Rates of survival for these cohorts and quantification of their swim behavior (such as distance travelled) with statistics would be more convincing. This criticism applies even more strongly to the single video of a sick fish (S Movie 4), which the authors use to support a claim of a targeted circuit manipulation using TeTx.

      We have now quantified swimming behavior using two approaches. First, we compared the mean swimming speed between the six experimental groups used to determine effects of temperature and HSV1 on survival rates. Swimming was quantified at 27 deg after keeping fish at either 27 deg or at 36 deg for seven days. No significant difference in swimming behavior was observed (Figure 1 – figure supplement 2).

      In addition, we quantified swimming behavior of fish at room temperature (25 – 26 deg) or 36 deg. Fish were kept in groups of five and individual fish were tracked using a machine learning-based tracking software (DeepLabCut). This allowed us to quantify different behavioral components. We found that mean swimming speed was higher at 36 deg and fish stayed slightly higher in the water column. However, social distance and the visual appearance of swimming were not obviously different. Swimming speed was normal again when fish were returned to normal temperature after seven days at 36 deg. These data are now shown in Figure 1 – figure supplement 2A,B.
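
      The two behavioral measures named above (mean swimming speed and social distance) can be illustrated with a minimal sketch. It assumes that per-frame (x, y) positions have already been exported from the tracking software; the function names, the frame rate, and the example coordinates are hypothetical and are not taken from the study or from the DeepLabCut API.

```python
import math
from itertools import combinations


def mean_speed(track, fps):
    """Mean speed (distance units per second) of one fish.

    track: list of (x, y) positions, one per video frame (assumed input format).
    fps:   frames per second of the recording (assumed known).
    """
    if len(track) < 2:
        raise ValueError("need at least two frames to compute a speed")
    # Total path length: sum of distances between consecutive frames.
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    duration = (len(track) - 1) / fps
    return path_length / duration


def mean_pairwise_distance(positions):
    """Mean distance between all pairs of fish in a single frame.

    positions: list of (x, y) positions, one per fish.
    """
    pairs = list(combinations(positions, 2))
    if not pairs:
        raise ValueError("need at least two fish")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical example: three frames recorded at 2 frames per second.
example_track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
example_speed = mean_speed(example_track, fps=2)
example_spacing = mean_pairwise_distance(example_track)
```

      Averaging `mean_pairwise_distance` over all frames would give one social-distance value per group, which could then be compared between temperature conditions.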

      6) FACS sorting and transcriptomics is a very complex and not wholly informative approach for judging stress at the cellular and organismal level. First, stress level is best assessed with high temporal resolution and best measured through blood or whole body (for larvae) cortisol measurements. Second, it is best to judge stress circuits in zebrafish in the diencephalon-mesencephalon, for the HPA. Cellular stress could best be measured with IHC for oxidative stress in infected cells and for apoptotic cells in the wake of infections. Taking measurements from OB neurons, with RNA sequencing that followed the elimination of dead cells during tissue disassociation and cell sorting, could have missed elements of the stress process. The sequencing result from only live cells in the OB may not provide the most reliable evidence.

      We believe that there is a misunderstanding here. We did not analyze stress at the organismal level or activation of the HPA axis. In fact, we compared cells collected from the same individuals, which rules out any differences in organismal stress levels between samples. Organismal stress is not a topic of this study; addressing this is clearly beyond the scope of this study.

      The transcriptomics experiments were specifically designed to examine cellular stress caused by Rabies infection. We agree that the transcriptomics approach has limitations but we feel that the data nevertheless contain valuable information. Together with other findings (morphology, calcium imaging), they support the conclusion that infection by the Rabies virus (in the absence of G) does not cause excessive cellular toxicity on the timescales of our experiments, consistent with results from other species. We agree that it is possible that the Rabies virus has more subtle effects on cellular stress levels (or immune responses) but a detailed analysis of such effects is beyond the scope of the present study. This is now discussed explicitly (Results: ln 224-228; Discussion: ln 372-375). It is also possible that toxicity would occur on longer timescales. This may be expected based on findings in rodents but still leaves a broad time window for anatomical and functional experiments. This is now discussed explicitly (ln 378-381).

      7) The down-regulation in stress markers needs further discussion. Under chronic stress of heat exposure, exacerbation of HPA axis function could reduce glucocorticoids.

      Please note that control and infected cells were from the same animals. The temperature regime can therefore not explain the differences in gene expression. Please also note that animals were not exposed to elevated temperature for days prior to cell collection.

      Please also note that the down-regulation of genes was broad, affecting not only stress-related genes. Indeed, stress-related genes were not downregulated more frequently than other genes. Gene groups that were down-regulated most frequently are associated with immune responses. We therefore conclude that the downregulation of genes does not specifically reflect a stress response, and we speculate that it may reflect a general immune-related response. However, this is very hypothetical, and further studies are needed to understand the processes behind the observed pattern of gene regulation.

      This is now stated clearly in the revised text (ln 224-228; ln 372-375).

      8) Although it cannot be addressed for larvae, it is critical to report the sex ratio for your adults, since hormones affect stress and circuit formation.

      Adult fish of both sexes in approximately a 50:50 ratio were used to ensure that there is no sex-dependent bias in the data. However, the exact sex ratio in each experiment has not been recorded. This is now stated explicitly in Methods.

      Reviewer #3 (Public Review):

      Satou et al. report a viral toolbox by:

      1) Inventing a novel way through temperature-dependence of HSV1-mediated gene expression for adult and larval zebrafish;

      2) Employing the Gal4/UAS system to achieve cell type-specific expression in this model;

      3) Combining the modified rabies viruses and HSV1 for transneuronal tracing of neural circuits in zebrafish that is kept in a higher temperature environment.

      This toolbox in the manuscript will be of great interest to the neuroscience field when they are using zebrafish as a model.

      The strength is these novel methods will offer more experimental opportunities and will facilitate more exciting basic scientific discoveries. However, some concerns still exist as below:

      1) What is the mechanism of the temperature dependence of expression with HSV1 and rabies virus in this study? At least the authors should discuss it. Have the authors done experiments like this: after getting enough gene expression from these viruses when maintaining these fish at 35-37 degrees, bring them back to the normal temperature at which they usually live to see what happens? Does this higher temperature help the fish brain cells get infected with more viral particles or just help increase the expression level? Or does just the higher temperature help produce more proteins?

      The question raised by the reviewer is indeed interesting. We agree that it would be useful to know whether host-like temperature enhances the entry of the virus into the host cell (infection), viral replication/protein synthesis, or both. In the original manuscript, we reported results from a first experiment to address this question. In this experiment, we injected HSV1 at 26 deg and then increased temperature to 36 deg 3 days after infection (“26→36”). This protocol yielded low expression. We have now also performed the reverse temperature shift (“36→26”), as suggested by the reviewer. This protocol yielded high expression, comparable to the expression observed when fish were kept at 36 deg throughout (see Fig. 1). Together, these results suggest that temperature affected primarily the infection. However, additional, more advanced analyses are required to resolve to what extent temperature affects viral infection and viral replication/protein synthesis. This is now discussed explicitly (ln 332ff).

      2) The authors should address or discuss more whether the higher temperature affects these fishes' brain activity? The reason is if someone will use this method for a most important experiment like GCaMP7s calcium imaging, in order to get good expression with these viruses that authors described in the manuscript they should raise the temperature but they have no idea about whether these higher temperatures affect the behavior or brain activity in some special brain regions they are interested in.

      Please note that, in most applications, the temperature is increased only transiently around the time of injection for two or three days. Thereafter, fish can be transferred back to normal laboratory temperature without compromising transgene expression. Any follow-up experiments, e.g. analyses of behavior or neuronal activity, can therefore be performed at standard laboratory temperature, after fish were kept at this temperature for a few days. The temperature change should therefore have only minor, indirect effects on the results of behavioral or physiological experiments. We apologize if this was not evident and discuss this now explicitly (ln 334ff).

      In addition, we have analyzed the swimming behavior of zebrafish in more detail at elevated temperatures (36 deg for adults; 35 deg for larvae) and observed only minor differences in swimming behavior. These results are now reported in Figure 1 – figure supplement 2A. Moreover, we compared swimming behavior between control fish (kept at standard laboratory temperature) and fish that underwent a transient temperature change to 36 deg for 7 days before the test. No significant difference in swimming behavior was observed between groups (Figure 1 – figure supplement 2B). We therefore conclude that no obvious effects of temperature are observed at least at the behavioral level.

    1. Author Response

      Reviewer #2: Public review:

      This manuscript reports results from a sensitivity analysis done to assess jointly the contribution of various factors to the spread of P. falciparum malaria parasites that are resistant to antimalarial drugs. It also explores how probable parasite genotypes are to establish as a function of their consequent rate of spread.

      This manuscript's main contribution is its joint consideration of several factors not considered jointly before. The authors achieve their goal of doing a large joint analysis using computer simulations generated under a model framework that includes a model of malaria transmission and a model called an emulator. The malaria model has new features capturing different drug mechanisms and the capacity to track different degrees and types of drug resistance. It is very sophisticated but computationally expensive. The emulator emulates the input to output relationship of the sophisticated malaria model, thereby enabling the authors to do the large joint analysis, which would be computationally prohibitive using the malaria model alone. This is a practical solution to a computationally expensive problem. It could be applied to other computationally expensive models in epidemiology, if not already done so.
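
      The emulator idea described above can be illustrated with a minimal sketch: run the costly model at a few design points, fit a cheap surrogate to those input-output pairs, and then query the surrogate instead of the model. Everything here is an assumption made for illustration. The toy `expensive_model`, its single `half_life` input, and the piecewise-linear interpolation used as the surrogate are stand-ins, not the malaria model or the emulator type used in the manuscript.

```python
import bisect
import math


def expensive_model(half_life):
    """Stand-in for a costly simulation; returns a made-up 'rate of spread'."""
    return math.exp(-0.3 * half_life)


def build_emulator(model, design_points):
    """Run the model once per design point and return a cheap surrogate.

    The surrogate interpolates linearly between neighbouring design points,
    so it is only valid inside the range covered by the design.
    """
    xs = sorted(design_points)
    ys = [model(x) for x in xs]  # the only calls to the costly model

    def emulator(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("input outside the range the emulator was trained on")
        i = bisect.bisect_right(xs, x)
        if i == len(xs):  # x equals the largest design point
            return ys[-1]
        x0, x1 = xs[i - 1], xs[i]
        t = (x - x0) / (x1 - x0)
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    return emulator


# Five model runs are enough to answer many cheap queries afterwards.
emulator = build_emulator(expensive_model, [0.0, 1.0, 2.0, 3.0, 4.0])
predictions = [emulator(k / 10) for k in range(41)]
```

      A sensitivity analysis would then sweep many input combinations through `emulator` rather than through the model itself; in practice the surrogate would be multi-dimensional and its accuracy checked against held-out model runs.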

      The results are impactful because they reinforce the need for continued surveillance of resistance to so-called partner drugs and they reinforce our understanding of drug properties that best withstand resistance. Three drug profiles were investigated: two monotherapies and a combination therapy that combines the two drugs used as monotherapies. The properties of the drugs mimic the properties of the drugs used in artemisinin-based combination therapies (ACTs). (The drug that is like artemisinin and its derivatives has a short half-life, high maximum kill rate and parasites resistant to it can endure longer drug exposure times. The partner-like drug has a short half-life, low maximum killing rate and parasites resistant to it can endure higher drug concentrations.) ACTs are recommended for the treatment of malaria in almost all endemic countries. They include a fast-acting artemisinin derivative and a more slowly acting partner drug to kill residual parasites.

      Supported by their simulated data, the authors conclude that partner drug resistance likely promotes the establishment and spread of artemisinin resistance. They then go on to say that their results support the belief that partner drug resistance precedes the evolution of artemisinin resistance. This belief is consistent with the spread of artemisinin resistance in the Greater Mekong Subregion but not in Africa. It cannot be tested directly in this study because the malaria model does not capture the sequential evolution of resistance, but the arguments the authors use to extrapolate from their results are logical.

      Almost all the results are intuitive, and support previously published epidemiological and laboratory studies. Among the factors that can be acted upon, drug properties play an important role. Longer half-lives of the artemisinin-like drug hinder the spread of artemisinin-like resistance. Longer half-lives of the partner-like drug promote the spread of partner-like drug resistance but protect the artemisinin-like drug. Despite this protective effect, the authors conclude that "reducing the half-life of the partner drug in an ACT regimen could reduce the spread of resistance". Stated thus, this may seem counterintuitive. However, it is logical: longer half-lives of the partner drug likely compromise the artemisinin derivative in the long run by first promoting the emergence, establishment and spread of resistance to the partner drug. Nonetheless, it cannot be tested directly in this study because the malaria model does not capture the sequential nature of the evolution of resistance.

      Although this study makes an important contribution, it has some weaknesses. Firstly, it does not capture the sequential evolution of resistance. Secondly, it is important to note that monotherapies are no longer recommended for malaria treatment. Looking forward, the authors discuss briefly how their findings might extrapolate to triple combination therapies (TACTs), arguing that the two long-acting drugs of TACTs should ideally have matching half-lives. Although it seems reasonable to make this point based on extrapolation, a TACT-like drug profile merits full investigation. What would happen, for example, if the two long-acting drugs exert inverse drug pressure, selecting complementary mutations? Of course, it is not possible to consider all factors that might impact the spread of antimalarial drug resistance. Some potentially important factors not discussed presently in this manuscript include sub-quality drugs and additional factors that impact coverage, such as absorption (nutritional status). Recombination, an obligate stage of the malaria parasite lifecycle which does not feature in the malaria model, is mentioned briefly. Under a modified malaria model, recombination could affect some of the results at higher entomological inoculation rates (EIRs) because higher EIRs lead to more effective recombination. For example, resistance to the partner-like drug might not spread preferentially in high EIR settings when access to treatment is high. This is because the phenotypes of parasites resistant to partner drugs are typically encoded for by more than one mutation, so can thus be disrupted by recombination. Recombination could also affect the spread of artemisinin resistance. Although artemisinin resistance is typically encoded for by a single mutation, compensatory mutations elsewhere in the genome may play a role in mitigating the fitness cost. If so, recombination might restore the resistance cost in high EIR settings with low access to treatment. On the contrary, recombination could unite multiple mutations that encode drug resistance. In short, recombination could have a complicated and hard-to-intuit effect. It thus merits further investigation using a model.

      We thank the reviewer for his/her supportive public review and for sharing his/her expert view on factors not captured by our models. We do identify a typo for non-specialist readers: the “partner-like drug” has a long half-life (we call it a “typo” because subsequent comments show the Reviewer is well aware of this fact).

      (1) Sequential resistance and recombination

      We recognise that the absence of recombination in OpenMalaria is a limitation of our study. We have, in the past, seriously investigated whether to re-code our model to incorporate recombination, but for now, it would require fundamental, far-reaching changes to the mosquito model, parasite transmission model, and code. Additionally, representing recombination realistically would significantly increase both memory use and computational time. We have prioritised using existing functionality rather than committing to resource-intensive code revisions; the potential impacts of recombination on the establishment and spread of resistance are addressed below and in our discussion.

      Practically, the lack of recombination means that we can investigate the spread of resistance of one mutation at a time. For example, we could not simultaneously simulate the spread of a mutation that confers resistance to drug A and the spread of a mutation that confers resistance to drug B in a drug-sensitive parasite population. However, we could assume that resistance arises first to drug B and gets fixed before the emergence of resistance to drug A. Similarly, we assumed that the resistant genotypes had a fixed degree of resistance (i.e., a fixed number of mutations) and could not acquire a new mutation that could confer higher degrees of resistance across the simulation. However, we assessed how the selection pressure and impact of factors vary for different degrees of resistance (from low to high degrees of resistance). Thus, our model can capture the effect of a changing pattern of selection that occurred with the increasing degrees of resistance due to sequential evolution. However, we acknowledge that we did not dynamically model the sequential evolution of resistance. We added this remark to the discussion (L547-550).

      In our paper we highlighted that the consequence of not modelling recombination is that we overestimated the evolution of drug resistance in settings with a high rate of infection when the resistant phenotype involves multiple mutations. The reviewer rightly highlighted that this could also influence the evolution of resistance to artemisinin (despite being caused by one mutation) due to compensatory mutation. We added this point to the revised version of our paper. We have also added that the effect of recombination depends on the frequency of each mutation needed to confer resistance. When these mutations are present at a low frequency (such as during the establishment phase), recombination will have a stronger effect as resistant parasites are more likely to recombine with sensitive parasites. Thus, the resistant phenotype is more likely to be lost. However, the impact of recombination decreases when the frequency of resistant mutations increases because resistant parasites are more likely to recombine with a resistant parasite. Thus, the main consequence of not simulating recombination events was that in high transmission settings, our model overestimated the probability of establishment of resistant parasites that need multiple mutations to be drug-resistant or that require additional mutations to restore the fitness cost. This means that the difference between the probability of establishment in high and low transmission settings is probably larger than reported here (see figure 4). In addition, we overestimated the spread of these resistant parasites when the mutations were present in low frequencies. However, these assumptions likely did not impact the probability of establishment and rate of spread of parasites that only need one mutation to confer resistance or do not have a mutation that reduces the fitness cost associated with resistance.
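
      The frequency dependence described above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, random mating with no selfing, a resistance phenotype that needs two mutations separated by a recombination fraction r, and a population containing only fully resistant and fully sensitive parasites. None of these assumptions, nor the function name, come from OpenMalaria or from the study.

```python
def breakup_probability(p_resistant: float, r: float = 0.5) -> float:
    """Probability that a two-mutation resistant haplotype is broken up
    by recombination in a single mating.

    Illustrative assumptions (not taken from the study): random mating
    without selfing, two loci with recombination fraction r (0.5 means
    unlinked), and only two genotypes in the population: fully
    resistant (frequency p_resistant) and fully sensitive.
    """
    if not 0.0 <= p_resistant <= 1.0:
        raise ValueError("p_resistant must lie in [0, 1]")
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie in [0, 0.5]")
    # The haplotype can only be broken up if the mate is sensitive.
    p_mate_sensitive = 1.0 - p_resistant
    # In a resistant x sensitive cross, a fraction r of the progeny are
    # recombinants that carry only one of the two mutations.
    return p_mate_sensitive * r


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.9):
        print(f"resistant frequency {p:.2f}: "
              f"break-up probability {breakup_probability(p):.3f}")
```

      Under these assumptions, a resistant haplotype at 1% frequency is broken up in about half of its matings when the loci are unlinked, whereas at 90% frequency this happens in about 5% of matings, which is the qualitative pattern described in the response.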

      (2) Modelling Triple drugs (TACTs)

      As pointed out by the reviewer, monotherapies are no longer recommended for malaria. Here we assessed the impact of factors on the evolution of parasite-resistance to the short-acting drug and the long-acting drug used separately in (non-recommended) monotherapy to identify determinants specific to each drug profile. This allowed us to identify some determinants that would not have presented themselves if we had only examined drug combination therapy. Once we identified which factors drive resistance for each drug profile, we looked at the combination of these two drug profiles and observed how dynamics changed. As suggested by the reviewer, the next step would be to look at the evolution of resistance under triple combination therapy.

      Our study showed that resistance to the partner drug (long-acting, previously referred to as drug B) depends on the length of the selection window. This result supports the evidence that triple artemisinin combination therapies (TACTs) can delay the spread of resistance to partner drugs, as they can minimise the selection pressure that occurs during the selection window if the two long-acting drugs have the same half-lives. While we agree that future work could focus on selective pressures from different drug profiles in TACTs, this would require a very large study to look at different profiles of three drugs, and is outside the scope of what is already a very large study. Even so, we believe that no additional analysis is needed to support the points on TACTs in the paper, as they follow logically from our results.
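
      To make the link between half-life and selection window concrete, the following minimal sketch uses one common simplification: first-order elimination and a single threshold concentration per genotype. The function and parameter names are hypothetical, and the pharmacokinetic/pharmacodynamic model used in the study is considerably richer.

```python
import math


def selection_window_days(half_life_days: float,
                          threshold_sensitive: float,
                          threshold_resistant: float) -> float:
    """Length of the period during which a decaying drug still
    suppresses sensitive parasites but no longer suppresses resistant
    ones.

    Simplifying assumptions (not the study's model): concentration
    decays as C(t) = C_max * 2 ** (-t / half_life), each genotype is
    suppressed while C(t) exceeds its threshold, and C_max starts
    above the resistant threshold.
    """
    if half_life_days <= 0 or threshold_sensitive <= 0:
        raise ValueError("half-life and thresholds must be positive")
    if threshold_resistant < threshold_sensitive:
        raise ValueError("resistant threshold must be >= sensitive threshold")
    # Time needed for the concentration to fall from the resistant
    # threshold to the sensitive threshold.
    return half_life_days * math.log2(threshold_resistant / threshold_sensitive)
```

      In this simplification the window scales linearly with the half-life, so doubling the half-life of a long-acting drug doubles the period during which only resistant parasites can grow.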

      However, we agree that other factors are likely to play a role in the evolution of resistance under TACTs, such as the inverse selection pressure generated by some drugs, as highlighted by the reviewer. Understanding the impact of factors on the evolution of resistance to TACTs is an important question and should be further investigated. However, this question is outside the scope of our study, as it would require running many more analyses and considering additional factors (such as the inverse selection pressure generated by some drugs, the synergistic effect between drugs, or the fact that some mutations can confer some degree of resistance to multiple drugs), and could be considered in future work.

      (3) General comments

      As underlined by the reviewer, we did not directly assess the impact of sub-quality drugs and absorption (i.e. from poor nutritional status) on the rate of spread. However, one could extrapolate the impact of sub-quality drugs and absorption by recognising that both lead to a lower Cmax. In our study, we assessed the impact of low Cmax on the rate of spread, and thus one can extrapolate the effect of sub-quality drugs and absorption based on findings from our study. Factors such as poor nutrition could also affect other drug factors such as half-life, which we did not investigate, so we regard this as an important operational point made by the Reviewers which could be addressed in future studies.
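
      The extrapolation from a lower Cmax can be sketched with a one-line pharmacokinetic calculation. The example assumes first-order elimination and a single effective-concentration threshold. Both are simplifications introduced here for illustration, and the names are hypothetical rather than taken from the study.

```python
import math


def days_above_threshold(c_max: float, threshold: float,
                         half_life_days: float) -> float:
    """Time for which drug concentration stays above `threshold`,
    assuming C(t) = c_max * 2 ** (-t / half_life_days)."""
    if c_max <= 0 or threshold <= 0 or half_life_days <= 0:
        raise ValueError("all arguments must be positive")
    if c_max <= threshold:
        # The concentration never exceeds the threshold.
        return 0.0
    return half_life_days * math.log2(c_max / threshold)


if __name__ == "__main__":
    full_dose = days_above_threshold(c_max=16.0, threshold=1.0, half_life_days=5.0)
    half_dose = days_above_threshold(c_max=8.0, threshold=1.0, half_life_days=5.0)
    print(full_dose, half_dose)
```

      Under these assumptions, halving Cmax (for example through a sub-quality drug or poor absorption) shortens the time above the threshold by exactly one half-life.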

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides the first experimental evidence that some members of the newly discovered heliorhodopsins can function as proton channels. The authors provide evidence of this transport function as well as a characterization of the photocycle. The authors also demonstrate that these heliorhodopsin proton channels can be utilized as optogenetic tools. These findings should be of interest to a wide audience interested in membrane biophysics as well as in the development of tools for neuroscience.

      The authors present a very thorough characterization of several biophysical aspects of the transport properties and the photocycle of V2HeR3, as well as a phylogenetic analysis. Furthermore, the authors demonstrate that the V2HeR3 protein can be used as an optogenetic tool, albeit with limited capabilities.

      Though the experiments are carried out carefully and the results, in general, support the conclusions, some procedures and interpretation of results need to be expanded and/or clarified for a more general readership as well as for specialized readers.

      The manuscript will likely impact our understanding of the biophysics of bacteriorhodopsins in general and these new heliorhodopsins in particular, as well as serve as a platform to engineer these proton transporters for future use as tools in biotechnology and neuroscience.

      We thank all the reviewers for their careful reading of our manuscript and for providing valuable opinions. As we described, this is the first demonstration that some heliorhodopsins (HeRs) exhibit light-activated ion transport. By combining electrophysiological and spectroscopic experiments on wild-type and mutant V2HeR3, we describe here the molecular mechanism of ion transport. However, as all three reviewers pointed out, we did not pay much attention to the substantial peak photocurrent (I0) in our electrophysiological recordings. Thus, we performed additional experiments to characterize I0 in more detail, which are shown in Figures S6 and S7. Besides this, we have corrected and modified the entire manuscript. We hope the revised version has improved readability and is suitable for publication in eLife.

      Reviewer #3 (Public Review):

      The manuscript from Hososhima et al. entitled "Proton-transporting heliorhodopsins from marine giant viruses" reports for the first time proton-translocation activity for heliorhodopsins. Heliorhodopsin (HeR) is a newly discovered family of opsin proteins that are distinct from either type-1 or type-2 rhodopsins and are found in Archaea, Bacteria and Eukarya as well as giant viruses (Pushkarev et al. 2018). A unique feature of HeR is their inverted topology compared to the microbial and type-2 opsins. Despite the availability of detailed structural information on members of the HeR family (Kovalev et al. 2020, Lu et al. 2020 and Shihoya et al. 2019), their function and mode of action remain unknown. In this manuscript, the authors use the heterologous expression of synthesized HeR genes from giant viruses (V1HeR1-2,V2HeR1-3) to investigate the ion transporting properties of these viral HeRs (VHeR). Authors demonstrate that one of the viral HeR genes (V2HeR3) exhibits a unique photon-induced current that translocates protons across the membrane. Interestingly all other tested viral HeR do not show any proton-translocating activity (similarly to previously tested HeR such as Ehux-HeR, TaHeR or HeR 48C12) potentially pointing to enzymatic/signalling function of these members. Furthermore, the authors characterized the basic electrophysiological parameters of these photocurrent components in terms of their light sensitivity, kinetics, ion selectivity and more. A mutational study identifies key residues that are likely controlling the direction of ion transport. Protein purification and UV-VIS spectroscopy further reveal a prototypical slow photocycle that is similar to other HeR with maximum absorption of around 500 nm. The authors identify the M-state as a putative conducting state.

      Overall, the work demonstrates nicely the mode of operation for a member of the HeR family that will pave the way to understanding the biological role and evolution of these rhodopsins. Also, the absence of any ion-translocating activity for the other HeR genes potentially underlines the diverse functions that lie within this new opsins family. The authors hand-wavingly discuss the functional role of a proton-transport activity for V2HeR3 as either depolarizing the host cell and thereby facilitating entry into the cell, or preventing superinfection.

      The authors carefully chose their wording in the title as " ... proton-transporting", but then focused very much on channel activity on V2HeR3. Yet, the contribution of the passive conductance (I1 and I2) is rather small compared to the pump current I0. Could the author add some information on the initial pumping current in terms of kinetic (on- and off-kinetic for I0 are also important parameters to evaluate the potential application for HeR), ion selectivity, or spectral properties? Authors should also show wavelength dependence for all components (I0 - I2). Does it follow the spectroscopic absorption? I was a bit puzzled by the light intensity curve for the I0 component - why is the pump current not saturating at such high light powers (off-kinetic/photocycle does not look so fast to account for that!).

      We thank the reviewer for the valuable comments on the I0 component, which indeed exhibited the largest amplitude in the photocurrent recordings. We reanalyzed the data and performed additional experiments to characterize I0 more carefully with respect to ion selectivity and kinetics (Fig. S6 and Fig. S7). As shown in Fig. S5, I-V plots of I0 under various ionic conditions suggest that the I0 amplitude depends on intracellular pH. Thus, we propose that H+ is the permeant ion of I0. We also demonstrated that no other cation or anion is transported. As for kinetics, we showed in Fig. S7 that the peak time of I0 is about 0.9 ms regardless of membrane voltage, and the off-kinetics analysis revealed two time constants. We added an explanation of the above to the text (lines 145-158).

      As the reviewer suggested, it is important to assess the wavelength dependence of the photocurrent components. However, the light source of our equipment for action spectrum measurements does not provide enough light power to evoke a sufficient V2HeR3 photocurrent (0.5~1 mW/mm2 is needed, as judged from Fig. S2).

      As the reviewer pointed out, the I0 component does not saturate even at 25 mW/mm2, while I1 and I2 are already saturated at 0.5~1.0 mW/mm2 (Fig. S2). This result indicates that the light sensitivity of each current component differs. The molecular basis of this observation is still unknown, but we previously reported a similar property in a Na+ pump rhodopsin, KR2 (please refer to Fig. S1 in Hososhima et al., PLoS One 2021 Sep 10;16(9):e0256728).

      Authors should reconsider their terminology of the photocurrent components. I find peak photocurrent for I1 misleading, especially since I0 is called transient photocurrent. Maybe authors should stick to I0, I1, and I2? Also, is I1 an independent component? Ion selectivity looks very similar to I2, and maybe the observed overshoot at positive potentials (figure 1 C red arrow I1) is an effect of a slightly higher [H+] in the vicinity of HeR after the pumping current?

      Thank you for your important comment. We analyzed the I0 component in the revised manuscript. Although I0 presumably involves ion transport, it is difficult to distinguish an ionic current from an intramolecular charge displacement, as mentioned in the revised manuscript. We thus do not change the terminology (I0, I1, I2). Even though the ion selectivities of I1 and I2 are similar, we tentatively consider I1 to be independent of I2.

      In figure 1D authors should check if changes in Erev or photocurrent for NaCl_e and KCl_e are significantly different when compared to NMG at pH_e 7.4.

      As the authors claim that there is an extracellular binding site for Cl- based on their results in figure S4. The larger photocurrent for Na2SO4 is a bit puzzling. So there is also a binding site for SO42- or did the authors not correct for double the amount of sodium? Table S7 is potentially useful in this respect but would need to be filled out completely.

      We have rechecked Erev and the current amplitude for NMG+, Na+ and K+ in Fig. 1D. Although there is no significant difference in Erev for Na+ and K+, there is a statistically significant difference in current amplitude for K+. This indicates that H+ transport is somehow enhanced in the presence of extracellular K+. We explained this in the text (lines 115-117).

      We proposed a Cl- binding site because the current is reduced in the presence of Cl- (and also Br- and NO3-), i.e. binding of Cl- (or Br- or NO3-) somehow modulates H+ transport. However, the current amplitude in the presence of SO42- is similar to that with Asp-, indicating no effect of SO42-. Thus we conclude that there is a binding site for Cl-, but not for SO42-.

      We thank the reviewer for pointing out Table S7, which was completely empty in the submitted version. We have filled it in in the revised version.

      I believe that the use of HeR as an optogenetic tool is limited; the authors should not try to build such an artificial link to such an application. I believe their finding is of high value independent of optogenetic use. Yet, if the authors believe that it is hard to foresee how the community will embrace HeR, I suggest a more vigorous analysis. First, the expression in ND7/23 looks very cytoplasmatic (Fig 1B). Could the authors provide images from cortical neurons they use for the measurements in figure 1E (the image quality of all figures is very poor in the manuscript - it needs to be improved)?

      We thank the reviewer for the opinion on the scientific significance of our study and on the optogenetic application. We agree that the use of V2HeR for optical neuronal stimulation is limited because of the small ion conductance and the long photocycle. Temporal resolution is limited to about 1 Hz, whereas ChR2 and its variants enable much higher frequencies (40 Hz and even higher).

      But there could be room for V2HeR3 in specific applications. Oppermann et al. reported anion channelrhodopsins (MerMAIDs), which exhibit rapid and strong desensitization of their anion conductance (Ref. 1 below). Such a feature could be a disadvantage when continuous optical silencing is needed. But they demonstrated that MerMAIDs allow a transient suppression of individual action potentials without affecting subsequent spiking. Thus, V2HeR3 could be applicable for some specific purposes.

      Following the reviewer’s suggestion, we further analyzed the results. First, expression and membrane localization in ND7/23 cells were visualized by anti-cMyc antibody staining in Fig. 1B. The eGFP signal in Fig. 1B is observed in the cytoplasm because the eGFP domain is separated from the V2HeR3 domain by the P2A peptide located between the two domains (we have added text to explain this, lines 90-97).

      Unfortunately, we are not able to provide images of cortical neurons.

      Ref. 1. MerMAIDs: a family of metagenomically discovered marine anion-conducting and intensely desensitizing channelrhodopsins. Oppermann J, Fischer P, Silapetere A, Liepe B, Rodriguez-Rozada S, Flores-Uribe J, Peter E, Keidel A, Vierock J, Kaufmann J, Broser M, Luck M, Bartl F, Hildebrandt P, Wiegert JS, Béjà O, Hegemann P, Wietek J. Nat Commun. 2019 Jul 25;10(1):3315.

      Secondly, I do not agree that the experimental design the authors chose to test neuronal fitness after overexpression of HeR is appropriate. The electrical induction of APs (300pA, pulse width?) is not a good read-out for neuronal excitability levels (or alteration of those). Therefore the authors should measure rheobase (current steps or ramps). Additionally, parameters such as Ri, Cm, Vm, or Rm should be used to evaluate the fitness of cells.

      We thank the reviewer for the suggestions on the neuronal experiments. We injected a 300 pA current for 10 ms. We followed the two articles listed below for the current-injection conditions. The authors of Ref. 1 electrically stimulated neurons by current injection from 300 to 500 pA (Fig. 5b, c, d). In Ref. 2, the authors also injected from 300 to 500 pA (Fig. 5c, e, f). Thus, we believe that 300 pA is appropriate. However, we further analyzed our neuronal data and characterized the properties of the action potentials. We revised the figures and text (Fig. 1E-H and Fig. S8, lines 165-76).

      Ref. 1. MerMAIDs: a family of metagenomically discovered marine anion-conducting and intensely desensitizing channelrhodopsins. Oppermann J, Fischer P, Silapetere A, Liepe B, Rodriguez-Rozada S, Flores-Uribe J, Peter E, Keidel A, Vierock J, Kaufmann J, Broser M, Luck M, Bartl F, Hildebrandt P, Wiegert JS, Béjà O, Hegemann P, Wietek J. Nat Commun. 2019 Jul 25;10(1):3315.

      Ref. 2. Grimm C, Silapetere A, Vogt A, Bernal Sierra YA, Hegemann P. “Electrical properties, substrate specificity and optogenetic potential of the engineered light-driven sodium pump eKR2.” Sci Rep. 2018, 8(1):9316

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examined the relationships between humans' heartbeats and their ability to perceive objects using touch.

      Strengths: This study is a large and sophisticated one, with great attention to detail and systematic analysis of the resulting data. The hypotheses are clear and the study was carried out well. The presentation of the data visually is very informative. With such a large and high-quality set of data, the conclusions that we can draw should be clear and strong.

      Weaknesses: The main drawbacks for me were first, exactly how the data were analysed, and second that there seem to be too many results reported to get an overall view of what the study has found.

      First, there are always a number of choices that researchers can make when analysing their data. Too many choices in fact. So we always need to see a consistent, principled, and transparent account of how those choices were made and what the effects on the data were. At present, I think this needs to be improved, partly in the justification of the analyses that were done; partly by re-doing some analyses and the presentation of results.

      Second, I admit to being a little lost when trying to understand all of the analyses - why they were done, what choices were made, and what the findings were. In some cases, it felt a little bit like the analyses were decided on only quite late - after exploring the data. One clear way to address this would be to divide the main results into two kinds: confirmatory (those that the authors expected to do before the study was run), and exploratory (those that the authors decided to do only after seeing the data). This would be both good practice and would help to focus the reader on what are the most critical findings.

      Achievements: I think the presentation of results needs to be strengthened before I can decide whether the aims are achieved.

      Impact: This will also depend on the revision of the results.

      We thank the Reviewer for these comments. In the original manuscript we thought we had been clear as to which analyses were planned and which were exploratory. The planned analyses are in keeping with the previous studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021; Grund et al. 2021). The only exploratory analysis was the inclusion of touch variance as a covariate. We had not expected that participants would differ so much in how long they held their touch.

      Reviewer #2 (Public Review):

      In this article, the authors set out to discover whether the cardiac cycle influences active tactile discrimination, to better understand the putative relationship between interoception rhythms and exteroceptive perception. While numerous articles have looked at these relationships in the passive domain, here the authors designed an innovative active sensing task to better understand the interaction of sensorimotor processes with the cardiac rhythm.

      The authors report a series of consecutive analyses. In the first, they find that while active discriminative touch is not modulated by the cardiac cycle, non-discriminative touch is such that the start, median duration, and end time of touches are shifted forward along the cardiac cycle towards diastole. Next, the authors examined the proportion of total start and end touches within systole versus diastole and found that across both discrimination and control conditions, touch was roughly 10-25% more likely to terminate during diastole. Further, examining the median holding time, the authors found that touches initiated during systole were lengthened in duration, consistent with a perceptual inhibition by this phase. This last effect appeared to be greatest for the highest stimulus difficulty levels, further supporting the notion that some cardiac inhibition of sensory processing may be at stake. Finally, when examining physiological responses, the authors found that cardiac inter-beat intervals were lengthened during active touch, consistent with the hypothesis that the brain may exploit strategic cardiac deceleration to minimize inhibitory effects.

      Overall, the key effects of the manuscript are fascinating and robust. A major strength of the approach here is the task itself, which utilizes a well-controlled stimulus with multiple levels of task difficulty, as well as an elegant positive control condition. This enabled the authors to look rigorously at difficulty and stimulus condition interactions with the cardiac phase. This clearly pays off in the analyses, as the authors are able to construct a more informative story about how precisely cardiac timing events modulate perception.

      Statistically speaking, I found the overall approach to be rigorous and sound. The study is well powered for a psychophysical investigation of this nature, and the interpretation of results is based on robust effects in the presence of a strong positive control.

      We thank the reviewer for these positive comments on the original version of this paper.

      Reviewer #3 (Public Review):

      The manuscript presents a carefully designed and well-controlled study on active tactile perception and its relationship to internal bodily rhythms - the cardiac cycle. This work builds on previous studies which also showed that active perception/voluntary actions occur in certain phases of the cardiac cycle, but the previous research failed to show/was not designed to show the significance of these synchronizations for perception or behaviour. To my knowledge, this is the first report that seems to experimentally show that active perception in the cardiac diastole leads to behavioural advantages - better tactile discrimination.

      The manuscript itself is very clearly written, the introduction is concise but sufficient, while the results section is very well organised and I especially like how the authors guide the reader through the analysis and additional steps taken to understand the findings even better.

      Yet, despite careful study design, effective visualisations, and elegantly constructed story, there are some analytical choices that, in my opinion, are not sufficiently justified or explained (e.g., selecting a diastolic window equal in length to the duration of systole, instead of using the whole duration of diastole). Such analytical decisions could have (at least some) effects on the obtained results and thus conclusions drawn.

      We thank the Reviewer for these comments. The analyses referred to here were planned and specifically the choice of the windows for defining systole and diastole were identical to the studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021).
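
      For readers unfamiliar with this kind of windowing, the sketch below shows one way an event can be assigned a cardiac phase and a systole/diastole label from R-peak times. It is an illustration only: the fixed systole duration, the placement of the equal-length diastolic window at the end of the cycle, and all names are assumptions made here, and the exact definitions should be taken from the cited studies.

```python
import math
from bisect import bisect_right


def cardiac_phase(event_time, r_peaks):
    """Phase (radians, 0 to 2*pi) of an event within its R-R interval.

    `r_peaks` must be sorted R-peak times in seconds. Returns None if
    the event is not bracketed by two R-peaks.
    """
    i = bisect_right(r_peaks, event_time)
    if i == 0 or i == len(r_peaks):
        return None
    start, end = r_peaks[i - 1], r_peaks[i]
    return 2.0 * math.pi * (event_time - start) / (end - start)


def classify_window(event_time, r_peaks, systole_s):
    """Label an event 'systole', 'diastole' or 'other'.

    Assumed windows (hypothetical parameters): systole spans
    `systole_s` seconds after the preceding R-peak; the diastolic
    window has the same length and ends at the next R-peak. If the
    two windows overlap in a short cycle, systole takes priority.
    """
    i = bisect_right(r_peaks, event_time)
    if i == 0 or i == len(r_peaks):
        return None
    start, end = r_peaks[i - 1], r_peaks[i]
    if event_time - start < systole_s:
        return "systole"
    if end - event_time <= systole_s:
        return "diastole"
    return "other"
```

      Using windows of equal length keeps the chance probability of an event falling in either window the same, which is one reason such a design may be preferred over comparing systole with the whole, longer diastole.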

    1. Author Response

      Reviewer #1 (Public Review):

      Weakness:

      I do not believe that the data support the statement given in lines 396-400. The authors state that RecX interaction with the apo form of the RecA-ssDNA filament inhibits the transition to the ATP-bound state, which I believe is supported by their data that transition into the ATP-bound state is delayed following incubation with RecX. However, they go on to say that this is in line with previous reports which show that RecX blocks ATP hydrolysis by RecA. I think rather that their data suggests that RecX binds to inactive RecA and slows down binding of ATP by RecA. This would be in line with their hypothesis that RecX binding between monomers inhibits the cooperativity between the RecA monomers and slows down the apo-ATP transition (lines 440-442).

      It is our understanding that the Reviewer does not agree that our data is in line with the inhibition of RecA ATPase activity by RecX.

      RecA is a DNA-dependent ATPase and performs ATP hydrolysis when it is bound to DNA. The inhibition of RecA ATPase by RecX has been extensively reported [see ref 39 for example]. According to the available reports, the underlying mechanism is that RecX stimulates net disassembly of RecA-DNA complexes and thus reduces the overall rate of ATP hydrolysis. In this view, the inhibition of RecA ATPase is not direct but is a consequence of the RecX-stimulated disassembly of the RecA-DNA filaments.

      Our data indicate that RecX not only stimulates the disassembly of RecA-DNA filaments but also reduces the fraction of active ATP-bound states within the RecA-ssDNA filament. Since an ATP hydrolysis event requires an active ATP-bound RecA unit, a decrease in the number of active states within the RecA-ssDNA filament should reduce ATP turnover and, as a consequence, lower the overall observed rate of ATP hydrolysis.

      We agree that our data do not support the statement that RecX directly blocks ATP hydrolysis by RecA. We believe that RecX acts rather indirectly by reducing the fraction of active ATP-bound states within the RecA-ssDNA filament. To rule out misunderstanding and to adequately address the comment of the Reviewer the following changes in the text have been introduced:

      Original version: “This is in line with the reported inhibition of the RecA ATPase activity by RecX [39] and directly shows that RecX can effectively block ATP hydrolysis by preventing the corresponding conformational transitions within the RecA-ssDNA filament without actual displacement of RecA monomers from ssDNA as proposed in [25].”

      Revised version: “This is in line with the reported inhibition of the RecA ATPase activity by RecX measured in bulk [39] since depletion of the ATP-bound conformation within the filament should effectively reduce the overall rate of ATP hydrolysis. Interestingly, the possibility of RecX to inhibit RecA ATPase without actual displacement of RecA monomers from ssDNA was proposed previously in [25].”

      We consider the suggestion of the Reviewer, “I think rather that their data suggests that RecX binds to inactive RecA and slows down binding of ATP by RecA.” as one possible mechanism of how RecX retards the apo-ATP transition of the RecA-ssDNA filament. However, our results do not allow elucidating whether RecX slows down binding of ATP by RecA filament or RecX binding directly prevents the conformational change without affecting ATP binding by RecA. Interestingly, a rather elegant mechanism in which RecX prevents conformational changes of RecA filament directly (without affecting ATP binding) was proposed previously based on the results of electron microscopy studies [ref 25 in the manuscript]. Corresponding discussion concerning this mechanism was added in the Discussion:

      “It is noteworthy that previous electron microscopy studies provide a possible explanation of how RecX binding hampers the apo-ATP transition of the RecA filament. It was shown that the conformational change of the filament is accompanied by a large movement of RecA’s C-terminal domain, which is supposed to be allosterically coupled to the ATPase site [7]. According to low resolution electron microscopy studies, RecX binds from the C-terminal domain of one RecA subunit to the core domain of another [25]. Thus it was proposed that RecX inhibits RecA ATPase activity by preventing conformational transition through clamping RecA’s C-terminal domain. Although the proposed mechanism is in line with the results of the current study, we believe that additional research is required to elucidate the mechanistic basis of the RecX effect on the conformational transitions of the RecA-ssDNA filament.”

      Reviewer #2 (Public Review):

      In the last paragraph of the introduction, the authors describe their previous work, in which 3 mechanically distinct states of RecA-ssDNA filaments are identified. Yet in this paper the authors only refer to two mechanically distinct states: active (ATP-bound) and apo (ATP hydrolyzed). Is there a role for a third state in the model to describe these experiments? This should be addressed.

      Thank you for this comment. Indeed, we reported previously that the apo and ADP states differ in their mechanical properties and stability. However, binding of RecX to apo and ADP RecA-ssDNA filaments resulted in a similar slowdown of the subsequent transition to the active state. For this reason, the interaction of RecX with ADP RecA-ssDNA filaments was only briefly addressed in the original version of the manuscript (Lines 287-289 in the original version). To address this point more extensively, we added Figure S4 to the Supplementary and made the following corrections in the manuscript:

      Original version: “Similar results were obtained for the interaction of the RecX with the ADP-bound state of RecA-ssDNA filament (data not shown).”

      Revised version: “We also examined the interaction of RecX with ADP state of the RecA-ssDNA filament. Recently, it was shown that ADP and apo conformations represent two distinct inactive states of RecA-ssDNA filament [18]. Incubation of ADP-bound form of the RecA-ssDNA filament with RecX also resulted in the slowdown of the following decompression (Figure S4) similarly to the RecX interaction with apo RecA-ssDNA filament. Interestingly, ADP-bound RecA-ssDNA filaments exhibited greater stability when supplemented with RecX”

      Original version: “Unexpectedly, we discovered that RecX interacts with the compressed apo form of the RecA-ssDNA filament and inhibits its transition into the ATP-bound state.”

      Revised version: “Unexpectedly, we discovered that RecX interacts with the ADP and apo forms of the RecA-ssDNA filament and inhibits their transition into the ATP-bound state.”

      The presented model shows RecX binding specifically to inactive (ATP hydrolyzed) RecA proteins, reasoning that even in the presence of free ATP, patches of inactive RecA will be available for RecX binding. Thus, the model should be sensitive to the fraction of RecA units in the inactive state at equilibrium, which is not altered systematically in the described experiments. This inactive fraction would be determined by the balance of the rate of ATP hydrolysis and ATP binding. The latter could be altered by adjusting the concentration of free ATP in buffer before the introduction of RecX, with the model predicting shortening should be faster at lower ATP concentrations (RecX binding enhanced). Alternatively, the use of ATP analog ATP-gamma-S, which resists hydrolysis and stabilizes RecA filaments, should inhibit RecX binding and compaction according to the model. At least one of these experiments would help to validate the proposed model.

      We carried out the experiments with ATP-gamma-S suggested by the Reviewer. Figure 4 was updated, and text covering these results was added to the manuscript:

      “We also assessed RecXmNG binding to the active form of RecA-ssDNA using non-hydrolyzable ATP analog, ATPγS. The RecA-ssDNA filament was formed in the presence of 0.5 mM ATPγS, followed by incubation in the channel containing 1 μM RecXmNG and 0.5 mM ATPγS for 30 seconds and then was visualized in the channel containing 0.5 mM ATPγS and no proteins. As a result, average intensity of the tether was close to the background level (Figure 4D) indicating that RecXmNG did not remain bound to the active RecA-ssDNA filament. Thus we suppose that RecX interaction with the active form of the RecA-ssDNA filament is much weaker compared to the binding of RecX to the apo state. Interestingly, in the presence of ATPγS RecX did not induce any shortening of the RecA-ssDNA filaments (Figure S5) indicating the essential role of ATP hydrolysis in the RecX induced destabilization of RecA-ssDNA filaments.”
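
      As a hedged illustration of the reviewer's argument about the inactive fraction (a sketch, not a model taken from the manuscript), the balance between hydrolysis and ATP binding can be written as a two-state steady state. All symbols are assumptions introduced only for this sketch: $k_{\mathrm{hyd}}$ is an effective rate at which an ATP-bound RecA unit becomes inactive through hydrolysis, $k_{\mathrm{on}}$ is an effective second-order rate at which ATP binding returns an inactive unit to the active state, and $f_{\mathrm{inactive}}$ is the steady-state fraction of inactive units.

```latex
\[
\frac{d f_{\mathrm{inactive}}}{dt}
  = k_{\mathrm{hyd}}\,\bigl(1 - f_{\mathrm{inactive}}\bigr)
  - k_{\mathrm{on}}[\mathrm{ATP}]\, f_{\mathrm{inactive}} = 0
\quad\Longrightarrow\quad
f_{\mathrm{inactive}}
  = \frac{k_{\mathrm{hyd}}}{k_{\mathrm{hyd}} + k_{\mathrm{on}}[\mathrm{ATP}]}
\]
```

      Under these assumptions, lowering [ATP] raises $f_{\mathrm{inactive}}$, whereas a non-hydrolyzable analog ($k_{\mathrm{hyd}} \approx 0$) drives it toward zero, which is the direction of the ATPγS result quoted above. The sketch ignores cooperativity between neighboring RecA units and treats the apo and ADP states as a single inactive state.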

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides relatively convincing in vivo phenotype data in mice related to vertical sleeve gastrectomy (VSG) and provides some potential mechanistic insight. This study can potentially provide some therapeutic intervention strategies on combining VSG and immunotherapy in treating breast cancer. On the other hand, this paper also has some weaknesses especially related to the detailed molecular mechanism and characterization as described below:

      1. The major weakness lies in the characterization of which inflammatory response factors may mediate the phenotype of HFS VSG mice when compared to WM Sham mice. The data presented are currently limited mainly to RNA-Seq data, which lack detailed characterization.

      Thank you for your comments; addressing them has strengthened the manuscript. To address your concern, we have quantified IL-6 in plasma, which is significantly elevated in HFD-VSG vs WM-Sham. These data are now included as new Figure 3E. IL-6 signaling increases PD-L1 stability (Chan, Li et al. 2019, Li, Zhang et al. 2020). We show in new Figure 3F that IL-6 treatment in vitro increased PD-L1 protein in breast cancer cells, as measured by flow cytometry mean fluorescence intensity. We also included new GSEA identification of the hallmark IL6 pathway, which we present as new Figure 3G. Therefore, IL-6 is a potential inflammatory response factor that may mediate the phenotype of HFS VSG mice when compared to WM Sham mice. We have included these new plasma and in vitro data in the abstract, methods, and results, and added citations and discussion about IL-6 and PD-L1 stability in the discussion.

      2. The other significant weakness is the descriptive nature of the characterization of immune features in Fig. 4 for these mice. What is the potential mechanism regulating T cell signaling or cytolysis in HFS VSG mice vs WM sham mice? This at least needs some preliminary exploration and characterization.

      We appreciate your insight. To examine the potential mechanisms known to impact T cell signaling and activation (such as elevated cytolysis markers including granzymes), we examined immune cells that impair T cell activation by additional flow cytometric analysis. We now demonstrate that in HFD-VSG tumors, there is a unique VSG-specific elevation of PD-L1+ monocytic myeloid derived suppressor cells (M-MDSC) and PD-L1+ macrophages relative to all other diet and surgical groups. Relative to HFD-VSG tumors, tumor M-MDSC content was significantly lower, by 2.9-fold, in the WM-Sham group. Similarly, tumor PD-L1+ macrophage content was significantly lower, by 1.76-fold, in the WM-Sham group than in HFD-VSG tumors. This is important because PD-L1 positivity is a marker of immunosuppressive capacity in M-MDSCs and macrophages, which would impair T cell activation. These new data are now included as new Figure 4G-H. We have included details in the methods and results, and added citations and discussion about PD-L1 positivity on M-MDSCs and macrophages in the discussion.

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting and well-performed study that adds to the literature base. The authors investigated the role of a discrete brain pathway in binge drinking of alcohol. They adopted a multidisciplinary approach that overall suggested that alcohol-induced changes at synapses of anterior insula (AI) cortex inputs to the dorsolateral striatum (DLS) maintain binge drinking. Further, they suggest this may be a biomarker for the development of alcohol use disorder (AUD).

      Strengths:

      1. Extends previous studies and builds further evidence for AI→DLS involvement in aberrant alcohol intake.

      2. Adopts elegant approaches to isolate the defined connections. This included in vivo optogenetic stimulations (both open and closed loop), recording of defined synapses in slice preparations, and applying in vivo optogenetic stimulation parameters to isolated brain slices.

      3. Well-controlled for the most part, although at times the authors assert "specific" effects without unequivocal proof. For example, the insula also projects to the ventral striatum and this pathway has been implicated in regulation of alcohol intake in rodent models (Jaramillo et al., 2018), and is activated in heavy drinking humans during high threat related alcohol cue presentation (Grodin et al., 2018).

      4. Measures the microstructure of drinking behavior in subjects.

      5. Employed an artificial neural network and machine learning to interrogate data. After training the network it could predict both the fluid consumed (water vs alcohol) and the virus type based on drinking microstructure data.

      6. Applied a series of behavioral tests to confirm that stimulating the defined pathway was not in and of itself reinforcing or anxiogenic and did not alter locomotion.

      Weaknesses:

      1. Only used male mice; in humans, binge drinking in females is a major problem and rates of AUD between males and females have been converging in recent times (Grant et al., 2015).

      We took age-matched female mice that were injected with AAV-ChR2 into AIC and had them undergo the same 3 weeks of Drinking in the Dark to replicate the male data displayed in Figure 1 with an experimental focus on AIC inputs. We then performed whole cell patch clamp electrophysiology in DLS brain slices from these female mice. We measured optically evoked input-output responses (oEPSCs), NMDA/AMPA current ratios (oNMDA/oAMPA), and paired pulse ratios (oPPR). These data are presented in supplemental figure 4. In contrast to males, we did not observe any effect of alcohol consumption on AIC inputs into the DLS of female mice. We also combined the male and female datasets to test statistically for sex differences in these specific measures, as indicated by a main effect of sex and/or a sex x fluid interaction. We report these statistics in the text from lines 180 to 195, where we note that we did not find a sex x fluid effect for oEPSCs but did find a sex x fluid effect for our oNMDA/oAMPA synaptic plasticity measure. This finding further justifies conducting the behavioral experiments and circuit manipulations solely in male mice.

      While this is a fascinating sex difference and important data for the field, this manuscript is not specifically about exploring sex differences per se. We believe we have done our due diligence and correctly reported the existence of sex differences, or the possibility of sex differences, but the electrophysiological findings that we later modulate in vivo are only present in males. We point out that future work is needed to determine the contribution of circuit-specific changes in females at these synapses. Ultimately it will take much more work to fully elucidate sex difference circuit-specific mechanisms that we feel are far beyond the scope of this manuscript.

      2. At times over-interpreted, especially with regards to specificity.

      We are not exactly sure what the reviewer is referring to with “regards to specificity,” but we have done our best to address what we think they are asking and hope that we have adequately addressed this critique. We added sentences (lines 173-178) regarding alcohol-induced plasticity at other inputs to DLS that were not tested and (lines 442 - 446) how we are not sure whether these synapses control consumption of other non-alcohol substances (but point out our prior sucrose drinking data from Muñoz et al., Nat. Comm. 2018).

      3. Lacks a mechanism, although the authors do acknowledge this.

      This is just a first step towards discovering a mechanism. We previously identified an unusually alcohol-sensitive synapse and are now elucidating its behavioral role and some associated plasticity at that synapse that may be part of a mechanism. With our new single session alcohol data to compare our 3 week drinking data to, we are closer to beginning the process of discovering a mechanism. Additional work that is beyond the scope of this manuscript is needed.

      4. I would like some more discussion about the potential for this to be a biomarker in humans.

      We have removed language in the body of the manuscript and expanded on the implications of our findings at the end of our results and discussion from lines 514 to 548.

      Reviewer #3 (Public Review):

      Haggerty et al. assess how the projection from the agranular insular cortex to the dorsolateral striatum contributes to binge drinking in mice. The authors use whole-cell patch-clamp electrophysiology to examine synaptic adaptations following binge drinking (Drinking-in-the-Dark) in male mice, finding a constellation of changes that include increased AMPA and NMDA receptor function at insula synapses onto striatal projection neurons. They go on to assess a causal role for this projection in regulating binge drinking using optogenetics, finding that stimulating insula->striatal transmission in vivo reduces total ethanol consumed during DID, along with several specific behavioral measurements of drinking microstructure. One of the most interesting of these findings is a decrease in "front-loading", or drinking during the very beginning of the session, a phenotype that has been associated with problematic drinking and alcohol use disorder in humans. Finally, the authors use machine learning to build a predictive model that can reliably discern stimulated mice from controls. These studies improve our understanding of the neurocircuitry that mediates binge drinking and synaptic and circuit adaptations that occur following binge drinking. Experiments are blinded and performed in a rigorous manner, including physiological validation experiments in support of the in vivo optogenetic manipulation. Despite many strengths, there are significant limitations and gaps in the electrophysiology studies included in this version of the manuscript. As acknowledged by the authors, there are curious findings that are seemingly at odds with each other, and further studies addressing cell type specificity and/or feedforward inhibition would significantly improve the interpretation of this work. Furthermore, the manuscript would be significantly improved by an expanded Introduction containing more specific background information along with a standalone Discussion to place these findings within the broader literature. Lastly, a major limitation of these studies is the low number of mice used for the in vivo optogenetic control experiments and the exclusion of female mice throughout.

      Major concerns:

      1) Expanded Introduction and Discussion. The Introduction does not discuss and/or downplays historical literature investigating neuroadaptations following binge drinking. Studies examining changes in glutamate receptor function within striatal circuits should be discussed in greater detail, rather than the broad pass and review citation included. Behavioral studies examining how the function of the insula and DLS regulate ethanol exposure should also be discussed, especially including work examining the insula to accumbens pathway. It would also be worthwhile to reference human studies implicating the insula and DLS in AUDs.

      We have expanded the introduction and discussion to include these topics.

      2) It is difficult to form a comprehensive picture of the electrophysiological changes reported in Figure 1. The data seem to indicate increased AMPAR function, even more increased NMDAR function, decreased glutamate release probability, and decreased population spikes. These conflicting findings are acknowledged and there are two possible factors mentioned in the manuscript - differential engagement of MSN populations and changes in feedforward inhibition through local interneurons. I disagree with the authors' dismissal of potential MSN subtype-specific effects contributing to these discrepancies. Although AIC inputs innervate D1 and D2 MSNs comparably under control conditions, it is quite possible that the pathways are differentially altered following DID, as has been observed in many reports of alcohol or drug exposure (e.g. Cheng et al. Biological Psychiatry 2017). On the other hand, I wholeheartedly agree with the authors that AIC-driven feedforward inhibition through local interneurons (or even MSNs) could explain the curious divergence between the synaptic and population-level changes depicted in Figure 1. I think additional experiments to help connect the dots are critical in interpreting the changes described in this manuscript. The authors could consider targeted recordings from specific cell types (e.g. D1, D2, and/or interneurons), measurements of AMPA/NMDA receptor subunit stoichiometry, and/or additional experiments in conditions where feedforward transmission is blocked (e.g. PTX or TTX/4AP).

      The reviewer has excellent points that will help elucidate a mechanism. Many of these suggestions are planned experiments in our laboratory, but are, in our opinion, beyond the scope of the present manuscript. Please see our response to Reviewer #2’s 3rd stated weakness. We have revised the text to incorporate some of the points raised here.

      3) N=2 mice in the ICSS experiment in Figure 4J is not sufficient to interpret, and including error bars on this data set is misleading. There also appears to be a difference in distance traveled between GFP and ChR2 mice in Figure 4C, but statistics are not reported. It is also hard to understand what that might mean given the way these data are normalized.

      For this revised manuscript we reran this experiment with 6 animals per group and updated Figure 4 I and J and the accompanying methods section titled “Intracranial self-stimulation” to reflect the change. We also note that the new, correctly powered experiment confirmed the previous claim that AIC inputs to the DLS do not modulate operant responding behaviors.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Kamle and colleagues report that inhibition of host constitutively-expressed chitinase 3-like-1 (CHI3L1) increased epithelial expression of ACE2 and SPP, resulting in epithelial cell viral uptake of pseudoviruses that express the alpha, beta, gamma, delta or omicron S proteins. They further show that antagonism of CHI3L1 using anti-CHI3L1 or kasugamycin inhibits epithelial cell infection by the pseudoviruses with ancestral, alpha, beta, gamma S protein mutations. The in vitro data has relevance to SARS-CoV-2 pathogenesis and potentially has therapeutic implications in that the anti-CHI3L1 antibody and/or kasugamycin might be a treatment for this pandemic virus. These in vitro data are novel and the results are clear and convincing.

      We are pleased that the reviewer found these studies to be clear and convincing. We are also pleased that the reviewer recognizes the therapeutic potential of anti-CHI3L1 and kasugamycin in COVID-19.

      The most important challenge with this manuscript is whether these in vitro findings translate into inhibiting SARS-CoV-2 variants in vivo. Are the effects of anti-CHI3L1 or kasugamycin great enough to change the course of the disease? Given the limitations of the mouse model of human ACE2 expression, determining how effective this strategy is in disease pathogenesis is difficult to discern. Without in vivo results, the importance of the data in this manuscript is unknown and this is a significant limitation that should be certainly noted in the discussion and possibly the abstract.

      We agree that in vivo studies could add significantly to our understanding of the therapeutic potential of anti-CHI3L1 and kasugamycin. We cannot undertake these investigations at the present time because we lack access to a BSL-3 lab facility. As requested, we have added a paragraph to the discussion that addresses this limitation. This new paragraph can be seen on pages 13-14. We have also modified the final paragraph in the discussion to highlight the importance of in vivo investigations and the limitations of the K18-hACE2 mouse model of SC2 infection.

      Reviewer #2 (Public Review):

      The paper by Kamle et al. shows that CHI3L1 augments SARS-CoV-2 pseudovirus uptake in cells and that blocking CHI3L1 partially reduces uptake but the effect is not as efficient as some mAbs or soluble ACE2. A major limitation of the work is that all of the data are based solely on experiments with pseudovirus. To be impactful, work would need to be performed with live virus assays as well as in vivo with either K18 mice or hamster models.

      We agree that in vivo experimentation has the potential to strengthen the therapeutic potential of these studies. The reasons why these experiments cannot be undertaken at the present time are noted above and are addressed in the new paragraph that has been added to the discussion on pages 13-14. The limitations of the K18-hACE2 mouse model are also addressed above and commented on in the new paragraph.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors have used every possible combination and permutation of treatments at different stages of diapause and post diapause development in the mouse and used conditional gene knockouts at different stages to tease out the interactions of Foxa2 with Msx1 and LIF in the reactivation and implantation process in mice. The authors extend diapause further after treatments with progesterone and an estrogen-degrading chemical to show that this will prolong diapause in the presence of Msx1. Overall this study advances our knowledge of the cross-talk between uterine endometrium and the blastocyst during and after the remarkable phenomenon that is diapause.

      Strengths

      Demonstrating that Msx1 is critical to maintaining diapause, and that diapause is maintained in Foxa2 deficient mice have clarified their interactions. It is interesting that LIF triggers implantation on day 8 but cannot support the pregnancy to full term. Suppression of the estrogen effects by progesterone or fulvestrant increases the duration of diapause. Demonstrating that Foxa2 induces diapause via interactions with MSX1 shows Foxa2 plays such an important role in the control of diapause and adds another 'cog' to the complex wheel of its control.

      Weaknesses

      There is an assumption that everyone will understand the various manipulations that are done in this study - some effort needs to be made to clarify each experimental stage. How long are the embryos viable after the extension of the diapause by the various manipulations?

      The very positive review by a well-known expert in the field of diapause is reassuring, and we agree with her suggestions to improve the quality of the manuscript. As recommended, we now provide a scheme to summarize our findings to illustrate the length of embryo dormancy (see Fig. 7).

      Reviewer #3 (Public Review):

      Matsuo et al. have authored a manuscript describing the effects of depletion of the forkhead box gene, Foxa2, on embryogenesis and gestation in the mouse. The effects of this treatment are the induction of the diapause arrest in the development of the embryo and consequent dormancy. The manuscript is well prepared, and the figures, for the most part, are didactic and interpretable. Although the conclusions are interesting, the principal weaknesses of the manuscript are the lack of novelty and the perceived absence of some controls and follow-up experiments.

      Controls and Follow-ups:

      1) The Cre/lox system depletes rather than deletes genes. Although in situ data are presented, these are not judged to be quantitative. The usual qPCR analysis of tissues could have established the quantity of depletion. Stupid but can be done. This is important because the frequency of implantation sites in both Cre/lox models (lines 111-113) may be attributable to the residual expression of Foxa2.

      The Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ mouse models used in the current study have been used in previous studies (refs 7 and 8 in the manuscript). The deletion efficiency of Foxa2 in Foxa2f/fPgrCre/+ mice was examined by RT-PCR and IHC (figure 2 in ref 7), while the deletion efficiency in Foxa2f/fLtfCre/+ mice was examined by IHC (figure S1 in ref 8). The deletion efficiency of these drivers has been demonstrated in hundreds of publications since the generation of Pgr-Cre mice in 2005 and Ltf-Cre mice in 2014.

      Although these mouse lines have been used before, we confirmed the deletion of Foxa2 at the beginning of our study at protein levels (fig 1c) and RNA levels (fig 1d). We understand that the reviewer is trying to link the observation that some of the knockout animals still carried implantation sites on day 8 of pregnancy with the possibility that the deletion of Foxa2 is not complete. However, it is not uncommon to observe such phenotypes that are not fully penetrant even in systemic knockout mouse models. Nonetheless, we now provide real time PCR results of uterine Foxa2 on day 4 of pregnancy in all mouse models used in the current manuscript in the new supplemental figure 1.

      2) The most novel and salient finding of the present study is that the depletion of Foxa2 results in embryos that are in a state that "morphologically resembled dormant blastocysts". A useful experiment would have been to transplant these embryos to normal recipients or to culture them in vitro to determine whether they were capable of reactivation from the dormant state.

      Whether dormant embryos in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be reactivated is the main question we studied. The results in figures 4-6 address this question. The blastocysts in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be activated on day 4 as shown in figure 4b. Without any support, blastocysts in Foxa2f/fLtfCre/+ uteri can still be reactivated on day 8 (figure 4b). In the following experiments and results shown in figures 5 and 6, we tried to improve the uterine environment by supplementing progesterone and estrogen. Dormant embryos were successfully reactivated by a LIF injection and the pregnancies proceeded to full term.

      This reviewer suggests using normal recipients to test the reactivation of dormant embryos. Given that dormant embryos can be reactivated in a knockout uterine environment, embryo transfer experiments using normal recipients would be an additional measure to test the integrity of embryonic dormancy. The embryo transfer experiments may be a futile attempt in our study for the following reasons.

      The numbers of mated mutant females that yield blastocysts are relatively meager, and so are the numbers of blastocysts recovered, especially from diapausing donors. It is well known that implantation rates after blastocyst transfer are compromised due to the surgical trauma and anesthesia. Therefore, the results from these experiments may not provide meaningful information.

      Furthermore, during the pandemic our mouse colonies were drastically reduced, and we are still recovering from this downturn during this “New Normal”. Notably, pregnancy rate fluctuates throughout the year even if mice are housed in a controlled environment, and pregnancy rate is often relatively poor in mutant mice, depending on genetic background and diet (DOI: 10.1126/scisignal.aam9011). Most importantly, the viability of diapausing embryos is amply evident from our experiments (Figs. 4-6).

      3) Figure 3C indicates that embryos recovered on Day 8 had an extensive proliferation of ICM cells, but not trophoblast. Previous studies have explored the progression of entry and exit from diapause in the mouse (DOI: 10.1093/biolre/ioz017) showing that reactivation of the embryo from diapause commences in the ICM and then proceeds to the trophoblast. It therefore may be possible that proliferation in the trophoblast is not suspended, rather than the recovered blastocyst has resumed development and that mitotic activity has not yet reached the trophoblast.

      It is common to see Ki67 expression in the ICM of dormant embryos. Figure 4D from the paper quoted by this reviewer presents Ki67 staining on embryos undergoing diapause at different stages. In our study, we showed Ki67 staining on dormant embryos collected on day 8, which equals D7.5 in their figure. Our data in figure 3C are consistent with the observation shown there. Without LIF, embryos remain dormant in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri.

      4) In Figure 4B, neither the Ltf nor the Pgr Cre treated uteri appear normal on Day 8. This is not consistent with the conclusion in lines 170 et seq. of the manuscript. It is difficult to discern normality from Figure 4C, but it is clear that the PgrCre-lox uterus does not conform to the controls. It is later noted that there is edema in the uteri at this time in the Day 8-treated PgrCre/lox mice (lines 217-218).

      We have clarified our description.

      Lines 173-176: Notably, implantation sites with a normal appearance were observed in Foxa2f/fLtfCre/+ uteri when LIF was given on day 8 of pregnancy (Figure 4b), albeit Foxa2f/fPgrCre/+ uteri with edema have only faint blue bands. Histology of implantation sites confirmed this observation.

      In line 217, we stated that “the uterine edema in Foxa2f/fPgrCre/+ females two days after LIF injection on day 8…”. Figure 4B showed that Foxa2f/fPgrCre/+ uteri with edema have some very faint blue bands, suggesting an implantation-like reaction. However, we do not think they are real implantation sites, which is confirmed by figures 4c and e.

      5) In Figure 6B, the implantation sites appear substantially smaller in mice of both mutant genotypes. Supplemental Figure 4 suggests that this is not the case. It is unclear whether the samples chosen for figures are representative of the uteri and whether variation in the size of implantation sites was observed.

      In figure 6B, the Foxa2f/f uteri samples were collected on day 10 of pregnancy, the same day on which Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ tissues were collected. Since embryos implanted in Foxa2f/f uteri on the night of day 4 but in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri on day 8 after LIF injections, the implantation sites are bigger in Foxa2f/f uteri. However, in supplemental figure 4 the implantation sites were collected from Foxa2f/f females on day 6 of pregnancy, and they are similar in size to implantation sites collected from Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ females 2 days after LIF injection.

    1. Author Response

      Reviewer #1 (Public Review):

      There are various ways in which homothallism (self-fertility) has arisen in the fungal kingdom from supposed heterothallic (obligate outbreeding) ancestors. Understanding the genetic basis of homothallism is important from both a fundamental basis, as it provides intriguing evolutionary insights, and also from a practical viewpoint as it impacts on variation and sporulation of a species - of particular importance for pathogenic and species of economic importance. In the present study, the authors describe an investigation of the genetic basis of homothallism in Cryptococcus depauperatus, a fungus closely related to Cryptococcus species causing serious human lung disease. The authors use a combination of genome analysis and experimental gene expression and manipulation work to show that C. depauperatus has a novel form of homothallism never reported before from fungi. This involves loss of the homeodomain genes which normally control mating in basidiomycete fungi, and instead signalling by a cognate pheromone and pheromone receptor and pathway seems sufficient to achieve self-fertility and induction of the sexual cycle. This is a very interesting and significant finding, adding to knowledge in the fungal kingdom and beyond as to the evolution of sexual breeding systems in nature. Overall my conclusion is that the authors' claims and conclusions are justified by their data and the work presented has a large number of strengths, although there are some minor weaknesses and need to qualify one assertion as follows.

      Strengths

      (1) The work has been conducted to a very high and thorough standard and is very well written and illustrated throughout. The authors base their findings on a combination of genome analysis and experimental gene expression and manipulation work, together with additional work (e.g. microscopy and CHEF gel studies) where required. Results arising are then all subject to suitable statistical analysis.

      (2) Regarding the genomics part of the work particular credit is given for aspects such as the very high standard of bioinformatic analysis (e.g. use of both nanopore and Illumina sequencing methodologies and care taken in contig assembly) and presentation of data in figures; the thorough phylogenetic analysis involving over 4,000 protein-encoding genes in a concatenated study to show species relationships; careful checking of a range of mating genes to show mixed evolutionary origins of the de novo mating locus and other regions of the genome (e.g. via analysis of MYO2, STE12, STE11, STE20 origins from an a- or alpha-ancestor).

      (3) Regarding the experimental part of the work particular credit is given for aspects such as the de novo development of a gene transformation and selection system in C. depauperatus (based on Agrobacterium-mediated transformation); the use of a heterologous C. neoformans system to confirm the bioactivity of the putative MAT-2 alpha-type pheromone from C. depauperatus; very clever use of recessive drug resistance markers to select putative recombinant progeny and then use UV-induced markers to show recombination in intra-strain pairings; and analysis of expression of putative 'sex' genes during the sporulation cycle.

      (4) Novelty of the findings. Previous examples from the fungal kingdom have shown the evolution of homothallism by mechanisms such as the incorporation of complementary mating-type (MAT) genes into the same genome, mating-type switching, and unisexual mating. This is the very first study to describe a situation where the homeodomain genes that normally control sexual development in basidiomycete fungi have been lost, and sexual development is instead achieved by activation of a cognate pheromone and pheromone receptor system. To add further to the novelty is the fact that only one complementary pheromone precursor (a MAT-2 alpha-type) and pheromone receptor (a STE3 a-type) pair of genes were found, whereas normally in basidiomycete fungi and beyond a set of two complementary pheromone precursor and pheromone receptor genes are normally found in the same genome (i.e. an additional MAT-1 a-type pheromone precursor and STE2 alpha-type receptor gene).

      We thank the reviewer for highlighting the loss of the homeodomain genes as one of the main findings of our work. Indeed, these genes are invariably present and define a second compatibility checkpoint in basidiomycetes. However, we would like to note that whereas a set of two complementary pheromone precursor and pheromone receptor genes are normally found in the same genome in ascomycete fungi controlling cell-cell communication and fusion, in basidiomycetes, only one non-complementary pair of pheromone and receptors is typically found.

      (5) The work contains an appropriate balance of reporting and yet also some speculation, such as the model at the end suggesting possible evolutionary routes.

      (6) The work is very well referenced throughout. The work also has a very extensive set of supporting data included as supplementary files to support the assertions made.

      We would like to thank the reviewer for taking the time and effort necessary to review the manuscript. We are grateful for the reviewer’s view that this manuscript is well-written, exciting, and of high relevance for both colleagues studying fungi as well as those studying mating systems in eukaryotes, which is a highly active field of research attracting broad interest. We sincerely appreciate all the thoughtful comments and suggestions, which helped us to improve the quality of the manuscript. In our response, we submitted both track-changed and finalized edited copies of the manuscript. Line numbers refer to the revised untracked manuscript file.

      Weaknesses

      (1) The work only involves analysis of two isolates of C. depauperatus, whereas analysis of a wider range of isolates might have revealed additional insights. But to be fair to the authors, the species C. depauperatus has only been reported very rarely and the two isolates examined appear to be the only publicly available accessible isolates. The authors also concede themselves that additional isolates would ideally be examined in the future to see if the proposed models stand.

      We agree with the reviewer’s assessment that this work would benefit from the analysis of additional C. depauperatus isolates. However, as indicated in the manuscript and acknowledged by the reviewer, there are only two isolates presently available for study. Importantly, in a recent survey for mycoparasites of the coffee leaf rust (Guterres et al. 2021) two isolates were found that are apparently closely related to C. depauperatus based on morphological and molecular evidence (analysis of the 28S rDNA sequences). Unfortunately, attempts by the authors at isolating these strains in pure culture after prolonged storage (over 6 months), have repeatedly failed. We concur that increased sampling efforts will be required in the future to better understand the diversity, evolution, and biology of C. depauperatus.

      (2) The authors provide evidence for meiotic recombination based on a very low number of markers - just three UV-induced markers and two drug resistance markers. And recombination is only shown conclusively in a very limited number of progeny (as shown in Figure 8C). Based on this very limited dataset they then produce some centimorgan mapping data and compare this rate to kb/cM data from other Cryptococcus species. However, this is at best preliminary, pilot data and should be cautioned as such, and ideally, many more markers would be used. Though to be fair to the authors they only had one intra-strain 'cross' to work with so were very limited in the markers available, and even within this limited dataset, there was good evidence for some meiotic recombination.

      We concur with the reviewer that recombination frequency determined from only two UV-induced markers along the same chromosome should be seen as preliminary. Indeed, we recognize this limitation in our study by indicating that the genetic distances are likely underestimated, as multiple cross-over events between distant markers would skew these estimates (lines 623-627). Nevertheless, we consider that the data presented provide ample evidence that sexual reproduction in C. depauperatus involves a meiotic cycle with genetic exchange through recombination. It is currently unclear why the recovered viable strains did not accumulate additional mutations following UV mutagenesis, and future studies will be necessary to understand if this might be associated with changes in the DNA repair machinery.
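      As an illustration of why multiple cross-overs lead to underestimated distances, the arithmetic can be sketched as follows. This is a minimal sketch and not the calculation used in the manuscript: the function names, the progeny counts, and the 500 kb interval are hypothetical, and the Haldane mapping function is shown only as one standard correction that assumes no cross-over interference.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of scored progeny that are recombinant between two markers."""
    if total <= 0:
        raise ValueError("total progeny must be positive")
    return recombinants / total


def map_distance_cm(r: float, haldane: bool = False) -> float:
    """Convert a recombination fraction into a map distance in centimorgans.

    Without correction, 1% recombinants is read as 1 cM, which ignores
    double cross-overs. The Haldane function (assumes no interference)
    corrects for them and always gives a distance >= the naive one.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    if haldane:
        return -50.0 * math.log(1.0 - 2.0 * r)
    return 100.0 * r


def kb_per_cm(physical_kb: float, distance_cm: float) -> float:
    """Physical distance per unit of genetic distance."""
    return physical_kb / distance_cm


# Hypothetical numbers for illustration only.
r = recombination_fraction(recombinants=5, total=50)
naive = map_distance_cm(r)
corrected = map_distance_cm(r, haldane=True)
print(f"r={r:.2f} naive={naive:.1f} cM corrected={corrected:.2f} cM")
print(f"kb/cM (naive) = {kb_per_cm(500.0, naive):.1f}")
```

      With few progeny and few markers, both estimates carry wide uncertainty, which is why such values are best read as preliminary.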

      (3) Some minor errors and clarifications are required at various points in the manuscript.

      We thank the reviewer for their careful reading of the manuscript and for pointing out typographical errors, and the sections that would benefit from some clarification. We have carefully addressed all of these points in the revised manuscript.

    1. Author Response:

      Reviewer #1:

      The manuscript by Bellio and colleagues is based on the experimental model of T. cruzi infection in WT, MyD88-/- and IL-18-/- mice previously described by the same group in a 2017 eLife publication. The main message of the current study is that, in addition to IFN-g+ Th1 effectors, T. cruzi infection induces an even larger population of cytotoxic CD4+ T cells.

      The characterization of the cytotoxic CD4+ T cells is well documented. The data shown are convincing. However, since Burel et al. (2012) described the existence of a similar population in humans infected with P. falciparum (an intracellular pathogen), the authors should modify the statement (line 35-36) in the abstract.

      First, we would like to thank Reviewer #1 for the positive comments on our work.

      Please note that our statement in the abstract, “Here, for the first time, we showed that CD4CTLs abundantly differentiate during mouse infection with an intracellular parasite”, refers to mouse experimental models of parasite infection and not to human studies. We could not find any article with Burel JG as first author published in 2012; we believe that Reviewer #1 is referring to a study published in 2016 (Burel et al. PLoS Pathog. 2016 Sep 23;12(9):e1005839), in which a population of CD4 T cells with cytotoxic properties was described in humans after primary exposure to blood-stage malaria parasites. Please note that the finding of the important role of T-cell intrinsic IL-18R/MyD88 signaling for the development of a strong CD4CTL response is also part of the main message of our manuscript.

      Similarly, the title "Cytotoxic CD4+ T cells… predominantly infiltrate Trypanosoma cruzi-infected hearts" is an overstatement. If cytotoxic CD4+ T cells outnumber 10:1 IFN-g-secreting population (in lymphoid tissue) their higher representation in hearts of infected mice is not a selective phenomenon but rather expected.

      We would like to thank Reviewer #1 for this comment, giving us the opportunity to clarify this point. Of note, we were not referring to the ratio of CD4CTL to Th1 cells, but to the frequency of CD4CTL among all the CD4+CD44+ (activated/memory) T cells. In fact, as shown in Figure 7-figure supplement 2, (now added to the revised ms), we found that the frequency of GzB+ cells among all activated/memory CD4+CD44hi T cells is significantly increased in the heart compared to the frequency of GzB+ among CD4+CD44hi T cells found in the spleen. Please also note that the frequency of CD4+ T cells expressing both GzB and PRF also increases in the heart compared to the spleen (Fig. 7F, middle panel and Fig. 1D left panel). We are now including this information in the revised manuscript, clarifying this point.

      My major concern is that the function of these cells remains undefined. Are they beneficial or detrimental for the host? It appears that the authors themselves could not make up their minds. The GzB+ CD4+ T cells protect but do not decrease the parasite load (Fig 6G).

      Our results in the mouse model of infection with T. cruzi, employing the adoptive transfer of WT CD4+GzB+ T cells to the susceptible Il18ra-/- mouse strain, indicate a clear beneficial role of CD4CTLs in the acute phase of experimental T. cruzi infection. Significantly extended survival was observed in the group of mice receiving sorted CD4+GzB+ cells, without, however, a decrease in parasite load (Figure 6G). We would like to comment here that, in order to be beneficial to the host, an immune response does not always need to decrease the pathogen load. In fact, in certain circumstances, hindering an excessive inflammatory response (which can lead to host death) is an advantage for the host, even if this does not result in the reduction of pathogen numbers. The advantage conferred to the host by regulating the inflammatory response was probably also exploited in pathogen/host co-evolution, giving rise to chronic infections, where the host can survive for a longer period and the pathogen increases its chances of transmission (Schneider DS & Ayres JS., 2008, Nat Rev Immunol;8(11):889; Medzhitov R, et al, 2012, Science; 335(6071):936). Therefore, the results shown in Figure 6G are fully compatible with a potential regulatory role exerted by CD4CTLs, previously proposed by other authors (Mucida et al, Nat. Immunol. 2013), and point to the beneficial role of CD4CTLs for the host in the acute phase of infection with T. cruzi, probably by contributing to the decrease of immunopathology, the detrimental side of an exacerbated immune response, as discussed. Also favoring this hypothesis, the frequency of CD4CTLs expressing immunoregulatory molecules is increased when compared to other activated CD4+ T cell subsets (Figure 3 and new Figure 7-figure supplements 3 and 4). Please see our complete discussion on this subject in the revised manuscript.

      On the other hand, during the chronic phase of the disease, the persistence of the immune response against the parasite might involve functional changes in the CD4 T cell response. This hypothesis could explain the association found between CD4CTLs and cardiomyopathy in chronic Chagas patients. Therefore, a beneficial role for CD4CTLs in the acute phase is fully compatible with the hypothesis that, during the chronic response in a persistent infection, CD4CTLs might acquire a detrimental role, contributing to immunopathology. Of note, several studies in the literature have shown a beneficial role for Th1 cells during the acute phase of infection with T. cruzi, while the Th1 response has also been associated with a pathologic outcome during the chronic phase of Chagas disease (reviewed in Ferreira et al, 2014 World J Cardiol 2014 6(8):7820 and in Fresno & Girones, 2018, Front.Immunol. 9;351). Therefore, it is not implausible that the CD4CTL subpopulation could also display different roles in the acute versus the chronic phases of the infection with T. cruzi. However, at present, this hypothesis remains speculative, as stated in the manuscript discussion. An extensive investigation of the role of CD4CTLs, as well as of the immunoregulation mechanisms acting in chronic Chagas patients, needs to be conducted to fully answer this question, which is beyond the scope of the present work. Nevertheless, we acknowledge that the alternative possibility remains, in which the higher levels of CD4CTLs in chronic patients reflect elevated parasite burden and/or inflammation in the heart, without a direct involvement of this cell subset in the pathology. Please see our answer to Reviewer #2 on this topic and the inclusion of discussion clarifying this point in the revised manuscript.

      Are they terminally differentiated or "exhausted" effectors? GzB+ CD4+ T cells can be found in the hearts of chronically infected mice, but we do not know if they are specific for pathogen or self Ags. Do they express the markers of exhaustion on day 14 in the heart?

      1) We have commented in the first version of the manuscript that one of the limitations of our work is the fact that very few CD4 epitopes of T. cruzi presented by I-Ab have been described so far, and this limits the investigation on the specificity of CD4CTLs in our model. This is a very interesting and important question, which, however, is not possible to address in the present work.

      We would like to thank Reviewer #1 for the suggestion of performing a broader analysis of the expression of immunoregulatory markers associated with exhaustion and/or terminal differentiation, which adds to the understanding of CD4CTL biology in the model of acute infection with T. cruzi. Whether GzB+CD4+ T cells are terminally differentiated or "exhausted" effectors is an interesting and debated question. It was initially hypothesized that, since exhausted T cells share features with terminally differentiated T cells, a developmental relationship exists between these cell states (Akbar, A.N. & Henson, S.M., 2011 Nat. Rev. Immunol.11:289; Blank, C.U. et al, 2018, Nat.Rev.Immunol,19:665). However, subsequent studies showed that exhausted T cells seem to be derived from effector cells that retain the capacity to be long-lived (Angelosanto, J.M. et al., 2012, J. Virol. 86: 8161). In the first version of our manuscript, we investigated the expression of several markers associated with exhaustion, such as 2B4, Lag-3, Tim-3 and CD39, besides the downregulation of CD27 on GzB+ CD4+ T cells (Figures 1E, 3B, 3D-E and 5E). In general, cells losing the expression of CD27 have been characterized as Ag-experienced, further differentiated cells (Takeuchi and Saito, 2017, Front.Immunol. 8:194). Our finding that, unlike GzB-negative cells, most GzB+CD4+ T cells had lost the expression of CD27 suggested to us that CD4CTLs present in the spleen of mice infected with T. cruzi might be further differentiated T cells (Figure 3E). The transcription factor Blimp-1 controls the terminal differentiation of cells in a variety of immunological settings, and its high expression in CD4+ and CD8+ T cells is associated with the expression of immunoregulatory markers (Chihara, N. et al, 2018, Nature 558:454). The observed high expression of Blimp-1 by GzB+CD4+ T cells (Figure 5D) is also compatible with the hypothesis that CD4CTLs are terminally differentiated.

      Of note, most of the exhaustion studies were performed on CD8+ T cells and it is still not well established if this phenomenon is equally regulated in CD4+ T cells. We have now extended the investigation of the expression of terminal differentiation/exhaustion markers, including PD-1 staining, on GzB+PRF+ CD4+ T cells in the spleen and in the heart of infected mice. Results in Figure 7-figure supplement 3 show that CD44hiGzB+PRF+ CD4+ T cells compose the subset of activated cells among which the highest frequency of cells expressing these markers is found, both in the spleen and in the heart, at day 14 pi. The only exception was the equal ratio of cells expressing PD-1, and at equivalent levels, when comparing CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells in the spleen. Non-significant differences in the percentages of cells expressing PD-1 among CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells were found in the heart. However, the intensity of expression of the PD-1 marker (MFI) was significantly higher among CD44hiGzB+PRF+ compared to CD44hiGzB-PRF- CD4+ T cells infiltrating the heart. Furthermore, we also compared the frequency of CD44hiGzB+PRF+ CD4+ T cells expressing Lag-3, Tim-3, CD39 and PD-1, and their corresponding MFI values, between the spleen and the heart (Figure 7-figure supplement 4). Of note, while MFI values of Tim-3, CD39 and PD-1 expression were increased on CD4CTLs (CD44hiGzB+PRF+) in the heart compared to CD4CTLs in the spleen, Lag-3 expression levels were decreased on CD4CTLs infiltrating the cardiac tissue. Despite exhaustion being often seen as a dysfunctional state, it is important to note that the expression of these inhibitory molecules allows strongly activated T cells to persist and partially contain chronic viral infections without causing immunopathology, and that highly functional effector T cells can also express such inhibitory receptors (reviewed in Wherry, E.J., 2011, Nat. Immunol.,12:492; Blank, C.U. et al, 2018, Nat. Rev. Immunol., 19:665). Interestingly, only PD-1, but not Lag-3, Tim-3 or CD39, expression is upregulated on CD8CTLs in the heart relative to the spleen, an indication that the T. cruzi-infected cardiac tissue is a less so-called exhaustion-inducing environment compared to certain tumors (Figure 7-figure supplement 4). It is known that many immunomodulatory molecules, including Lag-3, Tim-3, PD-1 and CD39, are co-expressed as part of a module composing a larger co-inhibitory gene program, which is expressed in both CD4+ and CD8+ T cells under certain activation conditions, driven by the cytokine IL-27 (Chihara, N. et al, 2018, Nature 558:454). The opposing behavior of Lag-3 expression, which is downmodulated on CD4CTLs in the heart in comparison to the spleen, indicates that CD4CTLs infiltrating the heart are not typically exhausted cells. Of note, a recent study has shown that exhausted CD8+ T cells can partially reacquire phenotypic and transcriptional features of T memory cells, in a process that includes the downmodulation of Lag-3 expression (Abdel-Hakeem, M.S. et al, 2021, Nat.Immunol., 22:1008). As requested, these new data were included (Figure 7-figure supplements 3 and 4) and discussed in the revised manuscript.

      The factors that control differentiation of cytotoxic CD4+ T cells are the same as for IFN-g- Th1 cells. MyD-88-/- and IL-18-/- mice significantly lack both populations and succumb to T. cruzi infection. In their 2017 eLife publication, this group reported that survival of infected MyD-88-/- and IL-18-/- mice can be rescued by adoptive transfer of purified total WT CD4+ T cells, which was attributed entirely to their ability to secrete IFN-g (at least in the case of MyD-88-/- recipients). In the current study, the authors only used infected IL-18-/- recipients and show that this time transfer of GzB+ CD4+ T cells is sufficient to confer the protection. When compared with the old data, the rescue of the infected IL-18-/- with only GzB+ CD4+ T cells looks weaker (2 surviving animals out of 10 pooled from 2 experiments), strongly suggesting that IFN-g Th1 cells do play a significant role. It is unclear when the parasite load in Fig 6G was evaluated. It would be good to show deltaCT values for individual mice.

      We thank Reviewer #1 for the opportunity to clarify the point on the protective role of Th1 cells and CD4CTLs during T. cruzi infection and to better discuss our data. Please note that we do not question the beneficial role of Th1 cells in this infection model. In our paper published in 2017 in eLife, we showed that the adoptive transfer of IFN-g-deficient CD4+ T cells does not result in a decrease of parasite loads in susceptible recipient mice. These results are fully in agreement with the known beneficial role of Th1 cells during infection with T. cruzi, through the microbicidal action of IFN-g, which was also described by other groups.

      The new information that our present study brings is that the adoptive transfer of GzB+CD4+ T cells with poor (GzB-YFP+) or no (Ifng-/-) capacity of IFN-g secretion, also significantly extended survival of infected Il18r-/- mice, which have lower levels of both Th1 and CD4CTLs, compared to WT mice (Figure 6G and Figure 6-figure supplement 2). Please note that 3 (not 2) out of 10 mice that received GzB+CD4+ T cells survived. We stated in our discussion that, together, our present and past data demonstrate that both Th1 and CD4CTL are important for improving survival, although through different mechanisms, since adoptively transferred GzB+CD4+ T cells (as well as Ifng-/- CD4+ T cells) were not capable of reducing parasite load but, notwithstanding, extended survival.

      Following the guidelines of the Animal Care and Use Committee, in order to prevent/alleviate animal suffering, all laboratory animals found near death must be euthanized. Therefore, parasite load in the hearts was evaluated in mice found in the moribund condition (a severely debilitated state that precedes imminent death, as defined in Toth, L., 2000, ILAR J, 41:72), presenting unambiguous signals that the experimental endpoint had been reached. We have now included 2^DeltaCT values for individual mice in Figure 6G, as requested.
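      For readers unfamiliar with the notation, a per-mouse value of this kind can be computed as in the following minimal sketch. It is not the analysis code behind Figure 6G: the function name, the mouse labels, and the Ct values are hypothetical, and the sign convention (relative quantity = 2^(-(Ct_target - Ct_reference)), so that lower target Ct means more parasite DNA) is an assumption that may differ from the one used in the manuscript.

```python
def relative_quantity(ct_target: float, ct_reference: float) -> float:
    """Relative amount of target versus reference from qPCR Ct values.

    Assumes roughly 100% amplification efficiency, so that each cycle
    of difference corresponds to a two-fold difference in template.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values: (parasite target, host reference gene) per mouse.
ct_by_mouse = {
    "mouse_1": (25.0, 20.0),
    "mouse_2": (23.0, 20.0),
    "mouse_3": (27.5, 20.5),
}

for mouse, (ct_target, ct_reference) in ct_by_mouse.items():
    value = relative_quantity(ct_target, ct_reference)
    print(f"{mouse}: {value:.5f}")
```

      Reporting one value per animal, rather than a group mean, lets the spread between individual mice be seen directly.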

      Because donor IFN-g-/- CD4+ T cells do express IFN-gR (Supp Fig 6-2), IFN-g produced by IL-18-/- host cells could enhance the activity and/or help expand cytotoxic CD4+ T cells among the IFN-g-/- CD4+ donor population. To directly test the protective role of cytotoxic CD4+ T cells in the absence of IFN-g, the authors should treat infected IL-18-/- mice that have received IFN-g-/- CD4+ T cells with anti-IFN-gamma Ab.

      It is known that IFN-g is critically important for resistance against infection with T. cruzi. Accordingly, Ifng-/- mice are extremely susceptible, dying at early time points of infection (Campos, M. et al, 2004, J.Immunol, 172:1711). Of note, IFN-g production by other cell types, and not only by CD4+ T cells, is relevant for resistance against infection, as demonstrated for CD8+ T cells (Martin D & Tarleton R. Immunol Rev. 2004, 201:304). In our present work, we performed experiments where Ifng-/- CD4+ T cells were adoptively transferred to susceptible Il18ra-/- mice, with the goal of testing whether the transferred cells would be able to confer some increase in the survival time of infected mice, despite not being able to decrease parasite loads, a direct consequence of their deficiency in IFN-g production, as previously shown (Oliveira et al., 2017, eLife). In fact, this turned out to be the case, and we showed that the transfer of purified Ifng-/- CD4+ T cells extended survival (Figure 6-figure supplement 2). Of note, our data demonstrate that the percentage of GzB+CD4+ T cells is not affected in the total absence of IFN-g, since Ifng-/- mice display the same frequency of this cell population as found in WT mice (Figure 4B). The increased survival of adoptively transferred mice is compatible with a regulatory function of GzB+CD4+ T cells, which additionally express several immunoregulatory molecules, as shown. Whether IFN-g produced by the host is enhancing the activity and/or expanding cytotoxic CD4+ T cells among the transferred T cell population is not an essential point here, since we were not aiming to test the protective role of cytotoxic CD4+ T cells in the total absence of IFN-g in the host mice.

      The intracellular cytokine staining in this study appears to be suboptimal. Instead of stimulating with PMA/ionomycin in the presence of Golgi block, Roffe et al. (2012) stimulated lymphocytes with anti-CD3 prior to adding Brefeldin A, an important technical difference which may explain the rather low frequencies of IFN-g+ and IL-10+ cells in this study.

      We respectfully disagree with Reviewer #1 on this point. The frequency of IFN-g+CD4+ and IL-10+CD4+ T cells in the spleen of mice infected with T. cruzi Y strain obtained in our experiments is in the same range as what was previously described by other research groups investigating the immune response to this parasite, including studies that have employed anti-CD3 stimulation and brefeldin A, such as Jankovic, D. et al, 2007, JEM 204:273 (Fig.S1), cited in our manuscript (page 9, lines 218-219), among others (Nihei J et al, 2021, Front. Cell. Infect. Microbiol.11:758273; Martins GA et al, 2004, Microbes Infect 6:1133 – Fig.6B; Hamano S. et al, 2003, Immunity, 19:657- Fig. 2A). In the present work, we used the combination of monensin and brefeldin A after PMA/iono treatment, and found the same frequency of IFN-g+CD4+ T cells described in a previous study of our group, where staining was performed after incubation of splenocytes with parasite-derived protein extract and brefeldin A alone (Oliveira AC et al., 2010, PLoSPath 6(4):e1000870 –Fig. 8D). On the other hand, please note that the study cited by Reviewer #1 (Roffe et al., JI 2012) employed a different strain of T. cruzi, the Colombiana strain, which differs in several aspects from the Y strain used in our work. Colombiana induces a different pathology, with distinct kinetics. In that study, intracellular IFN-g and IL-10 detection was performed at a much later time point of infection (day 30 pi), and in cells infiltrating the heart, not the spleen. In summary, the frequencies of IFN-g- and IL-10-secreting CD4+ T cells described in our manuscript are comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups in the articles cited above.

      Reviewer #2:

      In this work, Professor Bellio and her colleagues provide compelling evidence to show unusually strong induction of cytotoxic CD4 T cells (CD4CTLs) in Trypanosoma cruzi-parasitized mice. Using genetic models and mixed bone marrow chimeras they dissect the signals responsible for CD4CTL induction in this infection and identify T cell-intrinsic IL-18R/MyD88 signaling as the key inducer. The CD4CTLs that clonally expand in T. cruzi infection outnumber CD4 cells with typical Th1 profile (IFN-γ secretion) and bear the hallmarks of CD4CTLs described in other model systems and in humans. Utilizing GzmbCreERT2/ROSA26EYFP reporter mice, the authors show that adoptive transfer of CD4 cells that have made GzB can increase the survival of T. cruzi parasitized Il18ra-/- mice. Finally, the authors describe a clear correlation between the frequency of CD4CTLs in the circulation of patients with T. cruzi-induced chronic Chagas cardiomyopathy, implying a pathogenic role for these cells in chronic disease.

      The findings reported here are an important addition to the understanding of both the origin of CD4CTLs and their potential role in host protection or disease. The evidence provided in support of the main claims is very strong and the association between CD4CTLs and Chagas disease quite intriguing. There are, however, some aspects of the work that would benefit from further clarification or experimental support, so that alternative interpretations of the data can be excluded.

      The defining characteristic of CD4CTLs that separates them from other CD4 subsets is the production of granzymes and perforin and, by extension, the ability to kill target cells in a granzyme/perforin-dependent manner. In contrast, all T cells can kill target cells via alternative mechanisms that are not dependent on granzyme/perforin, for example through expression of TNF family members. It would appear that much, if not most, of the killing activity of T. cruzi-induced CD4CTLs can be attributed to FasL (Fig. 1B). FasL-mediated killing is not restricted to CD4CTLs and as the title of one of the cited studies (Kotov et al., 2018) states, "many Th cell subsets have Fas ligand-dependent cytotoxic potential". It would be important to ascertain if expression of granzyme/perforin by CD4CTLs in T. cruzi infection is also associated with granzyme/perforin-dependent cytotoxicity. This affects the direct and indirect in vitro cytotoxicity assays, as well as the interpretation of in vivo protection.

      Similarly, the protective effect of transferring GzmbCreERT2/ROSA26EYFP reporter-positive cells to Il18ra-/- mice may not be necessarily mediated in a granzyme/perforin-dependent manner or by CD4CTLs for that matter. The reporter will mark cells that express GzB at the time of tamoxifen administration but does not guarantee that these cells will continue to express GzB or that they will prolong survival of recipients in a granzyme/perforin-dependent manner.

      While the authors provide evidence that GzB-producing cells are largely distinct from IFN-γ-producing cells, the reporter-positive cells may still contain genuine Th1 cells. Given Th1 cells have been previously found necessary for protection of Il18ra-/- mice in the T. cruzi model, can a role for Th1 cells in this transfer model be formally excluded? The authors do convincingly demonstrate that IFN-γ itself is not essential for protection, but that does not leave granzyme/perforin-dependent as the only other alternative. For example, the experiment described in Fig. 6G lacks an important control, the transfer of reporter-negative cells. What would the conclusion be if reporter-negative (but T. cruzi-specific) cells proved as protective as reporter-positive cells?

      We would like to thank Reviewer #2 for the positive comments on our study and for giving us the opportunity to better discuss and clarify the relevant points raised in this review.

      (i) Concerning the role of GzB/PRF in cytotoxicity: as explained in more details in our next answer to Reviewer #2, we have now shown that the cytolytic activity of the CD4 T cell subset differentiating in the murine T. cruzi-infection model is totally dependent on a GzB- and PRF-mediated mechanism.

      (ii) Concerning a possible role for Th1 in the adoptive transfer experiments: please note that the parasite load is not decreased by the adoptive transfer of CD4+GzB+ T cells (Figure 6G). Additionally, we showed that the adoptive transfer of Ifng-/- CD4+ T cells also extends the survival of infected mice (Figure 6-figure supplement 2), but does not decrease parasite levels (Oliveira et al., 2017). These results exclude a role for Th1 cells, which are known to exert an important microbicidal function through the production of IFN-g, as previously demonstrated by us (Oliveira, 2017) and other groups. Together, our present and past data support the notion that both Th1 and CD4CTL are important for extending survival, although through different mechanisms. Our results are in accordance with an immunoregulatory role played by CD4CTLs, likely through the GzB/PRF/FasL-mediated killing of infected APCs in an IFN-g-independent manner, although it is not possible to attribute the beneficial role of the adoptively transferred CD4CTLs exclusively to their cytolytic function, as discussed in the revised manuscript. Of note, we also show here that most CD4+GzB+PRF+ T cells express high levels of immunomodulatory molecules, raising the possibility that the beneficial role of adoptively transferred CD4CTLs might rely on the concerted action of their cytolytic function and immunomodulatory activity. Please see the full discussion on this point in the revised version of the manuscript.

      (iii) Concerning the adoptive transfer of GzB-EYFP-negative cells: unfortunately, GzB-EYFP-negative cells cannot be employed as a control, since in the GzmBCreERT2/ROSA26EYFP mouse line only 1-3% of total splenic CD4+ T cells express EYFP after induction by tamoxifen (Figure 2-figure supplement 3). This contrasts with the 10-40% of GzB+ and PRF+ cells among CD4+ T lymphocytes observed by intracellular staining. Consequently, the majority of the CD4+GzB+ T population is EYFP-negative in this system and thus cells sorted as “GzB-EYFP-negative”, based on the absence of EYFP expression, would not be bona-fide GzB-negative cells. If it were possible to sort GzB reporter-negative cells, Th1 cells would be among the sorted cells and upon adoptive transfer they would secrete IFN-g and, consequently, decrease the parasite load in recipient mice (Oliveira, 2017). However, in the absence of the proposed immunoregulatory action of CD4CTLs, Th1 cells transferred alone might also increase pathology and, consequently, it is possible that they would not extend survival, despite diminishing parasite load. It is expected that higher levels of extended survival would be attained when both Th1 and CD4CTLs are transferred, as discussed in the manuscript and in answer (ii) above. Importantly, please note that one current hypothesis is that CD4CTLs differentiate from Th1 and, therefore, the adoptive transfer of Th1 cells would not guarantee that Th1-derived CD4CTLs do not develop in vivo, unless specially engineered mouse strains, not available at present, were employed for these experiments.
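
      The arithmetic behind this argument can be made explicit with a minimal sketch. It bounds the share of GzB+ cells that would remain in an EYFP-negative sort, using only the percentage ranges quoted above. It assumes, for illustration only, that every EYFP+ cell is also GzB+ and that both ranges refer to the same total splenic CD4+ T cell population; the function name is hypothetical and is not part of the study.

```python
def eyfp_negative_gzb_bounds(eyfp_pct, gzb_pct):
    """Bounds (in percent) on the GzB+ share among EYFP-negative CD4+ T cells.

    eyfp_pct, gzb_pct: (low, high) ranges, as percentages of total CD4+ T cells.
    Assumption (illustrative): all EYFP+ cells are a subset of the GzB+ cells.
    """
    (eyfp_lo, eyfp_hi), (gzb_lo, gzb_hi) = eyfp_pct, gzb_pct
    # Lower bound: fewest GzB+ cells overall, with the most of them labeled.
    low = 100 * (gzb_lo - eyfp_hi) / (100 - eyfp_hi)
    # Upper bound: most GzB+ cells overall, with the fewest of them labeled.
    high = 100 * (gzb_hi - eyfp_lo) / (100 - eyfp_lo)
    return low, high


low, high = eyfp_negative_gzb_bounds((1, 3), (10, 40))
print(f"GzB+ among EYFP-negative CD4+ T cells: {low:.1f}% to {high:.1f}%")
```

      Under these assumptions, roughly 7% to 39% of the EYFP-negative CD4+ T cells would still be GzB+, which is why an EYFP-negative sort cannot serve as a GzB-negative control.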

      Reviewer #3:

      By modelling Trypanosoma cruzi infection in mice, the authors highlighted the presence of a subset of CD4 T cells expressing canonical markers and transcription factors of CTLs and capable of exerting antigen-specific and MHC class II-restricted cytotoxic activity. Mechanistically, using KO mice, the authors have shown that MyD88 expression is required for strengthening the CD4 CTL phenotype during the infection.

      Moreover, by investigating the presence of a previously published CD4 CTL gene signature in a mixed bone marrow chimera setting, they highlighted a cell-intrinsic role for MyD88 in imprinting the signature. The study also identifies IL-18R as a MyD88 upstream receptor potentially responsible for CD4 CTL development by showing that lack of IL-18R phenocopied MyD88 deficiency in failing to promote a CD4 CTL phenotype.

      Finally, by showing the direct correlation between perforin-expressing CD4 T cells in Chagas-infected individuals and parameters of heart dysfunction, the authors hinted at a possible involvement of CD4CTLs in a clinical setting.

      • The core finding of the paper, providing the first evidence of CD4 CTL development in a mouse model of intracellular parasite infection, is well supported by the data. The expression of markers correlated with CD4 cytotoxicity in other settings and the gene signatures fit the described phenotype well and suggest possible common features for CD4 CTL development across infection with different pathogens.

      This manuscript will advance knowledge of the involvement of non-canonical CD4 types in the immune responses to parasites. Moreover, the finding that CD4 CTLs are the predominant phenotype in organs important for viral replication implies an involvement of these cells in the development of the pathology that will have to be taken into account in future studies.

      • The parental relationship between CD4CTLs and Th1 remains unclear, and it is complicated by the low numbers of CD4 T cells producing IFNg (regarded as a hallmark of functional Th1) detected in the model. IFN-g production by CD4 is lower than 10% even when achieved by PMA/Iono stimulation, and half of Gzb+ CD4 stain positive for the cytokine. On the other hand, the putative transcription factor of Th1 development, Tbet, is expressed by all Gzb-positive CD4s. This discrepancy and the low number of IFNG+ cells should be better discussed by the authors.

      First, we would like to thank Reviewer #3 for the constructive criticism of our manuscript. Regarding the apparent discrepancy between the frequencies of IFN-g+ and T-bet+ CD4+ T cells in our model, please first note that the percentage of IFN-g+ CD4+ T cells detected in the present study is comparable to those found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups (please see our complete answer to Reviewer #1 on this topic). That said, we think that the apparent discrepancy between the expression of T-bet and the low fraction of GzB+CD4+ T cells producing IFN-g is a very interesting question. It is known that T-bet is a key transcription factor associated with the development of IFN-g-producing CD4+ T cells and that it also coordinates the expression of multiple other genes in CD4+ T cells and in other cell types. Also, T-bet can interact with other proteins, resulting in the induction or inhibition of key factors in T cell differentiation (reviewed in Hunter, 2019, Nat. Rev. Immunol, 19:398). Importantly, it has been shown that during the late stages of Th1 cell activation, T-bet recruits the transcriptional repressor Bcl-6 to the Ifng locus to limit IFNg transcription (Oestreich, 2011, JEM, 208:1001). Therefore, T-bet action is not limited to transactivation of the Ifng gene, but can also act as part of a negative-feedback loop to limit IFN-g production in certain cells. We do not believe that Bcl-6 is playing a role in CD4+GzB+ T cells in our model, since we found that the majority of CD4+GzB+ T lymphocytes express Blimp-1 (Figure 5D), and Blimp-1 and Bcl-6 are known to be reciprocally antagonistic transcription factors.

      However, the possibility remains that another repressor factor is downregulating Ifng gene transcription in the majority of T-bet+ CD4+GzB+ T cells, with the participation of T-bet or not. Of note, Blimp-1 was shown to be a critical regulator for CD4 T cell exhaustion during infection with T. gondii, and CD4+ T cells deficient in Blimp-1 produced higher levels of IFN-g in infected mixed-bone marrow chimeric mice reconstituted with WT and Blimp-1 conditional knock-out cells (Hwang, S., 2016, JEM 213:1799). Furthermore, Blimp-1 attenuates IFN-g production in CD4 T cells activated under nonpolarizing conditions and chromatin immunoprecipitation showed that Blimp-1 binds directly to a distal regulatory region in the Ifng gene (Cimmino, L. et al. 2008, JI 181:2338). We have also shown that, like Blimp-1, Eomes is expressed by around 60% of the GzB+CD4+ T cells (Figure 2G). It is known that Eomes controls the transcription of cytotoxic genes and promotes IFN-g production in CD8+ T cells, binding to the promoter of the Ifng gene. Interestingly, Eomes was also shown to participate in the induction of immunoregulatory/exhaustion receptors, such as PD-1 and Tim-3. Furthermore, deficiency of Eomes led to increased cytokine production (Paley, M.A. et al., 2012, Science 338: 1220). More recently, evidence in favor of the participation of Eomes in the repression of IFN-g production in TCR-gamma-delta T cells was also published (Lino, C. et al., 2017, EJI 47:970). Therefore, these studies indicate the complex control of the Ifng gene, in which T-bet, Eomes, Blimp-1 and possibly other TFs might play concerted roles. We think it would be interesting to investigate the role of Eomes and/or Blimp-1 in the repression of the Ifng gene in GzB+CD4+ T cells. Kinetics studies on the expression of these TFs may contribute to a better understanding of the parental relationship between CD4CTLs and Th1 cells, a fundamental question that is not yet completely understood. A comment on this subject was included in the revised manuscript.

      On the same note, while the confirmation of a CD4 CTL gene signature in the model is very convincing, it must be noted that the one used as a reference was obtained by performing single-cell RNA-seq, taking into account only IFNg+ CD4 cells and then comparing Gzb+ and Gzb- cells in that setting. The authors are instead using bulk RNA-seq and comparing populations of cells that would have no versus low levels of Th1. In this view, while the confirmation of the CD4 CTL signature is striking, addressing the relative relationship with Th1 cells is complicated. Using Gzb YFP reporters in the setting could help improve the resolution between the two subsets.

      Our analysis clearly demonstrated the presence of the CD4CTL signature among WT CD4+ T cells, and its absence among Myd88-/- CD4+ T cells from the same mixed-BM chimeric mice. Together with our past work (Oliveira, 2017) and results included in the present manuscript, this analysis strongly contributes to demonstrating the importance of T-cell intrinsic IL-18R/MyD88 signaling for the development of a robust CD4CTL response to infection with an intracellular parasite. Although these results argue in favor of a common origin for CD4CTLs and Th1 cells during infection, an interesting point is that Ifng-/- mice display the same percentage of GzB+CD4+ T cells as WT mice (Figure 4B), suggesting that GzB+CD4+ T cells might emerge independently of IFN-g-dependent Th1 cells. Therefore, the possibility remains that not all CD4CTLs are derived from the putative terminal differentiation of Th1 cells but that, instead, a divergence between the Th1 and CTL differentiation programs might occur at an earlier step. Although addressing this fundamental question goes beyond the possibilities of the present study, we believe that our results make an important and substantial contribution to the understanding of the biology of CD4CTLs in response to infection and highlight the importance of IL-18R/MyD88 signaling for the reinforcement and/or stabilization of CD4+ T cell commitment into the CD4CTL phenotype. Regarding the use of GzB-YFP reporters, please see our answer below.

      • The dependency on the Myd88/IL18r axis to promote CD4 CTLs is well characterized and the prolonged survival rate of IL18r-/- mice after the adoptive transfer of Gzmb YFP+ CD4 cells is very convincing. However, instead of using PBS as a control, the authors could have used YFP- or total CD4 cells for the task. While a previous publication already showed that protection was achieved by transferring the total CD4 population, comparing GzB+ versus GzB- cells would have added useful insights into the amount of protection conferred by the subtypes and the relative roles of CD4 CTLs and Th1 in the model. Parasitemia could also be reassessed in this view.

      We have already discussed the impossibility of sorting bona-fide GzB-negative cells from the reporter mouse strain available. Please see our complete answer to Reviewer 2 on this issue (iii) in this point-by-point letter. Moreover, due to the low percentage of GzB-EYFP cells labeled in the tamoxifen-treated reporter mice, a high number of mice is necessary for performing these adoptive transfer experiments. Unfortunately, due to the COVID-19 pandemic and its consequences on our animal facility, at present it is impossible to repeat this experiment including total CD4+T cells within a reasonable time. However, we have already shown in our past study (Oliveira, 2017), that the transfer of total WT CD4+T cells to Il18ra-/- mice, increased survival and lowered parasite load. On the other hand, our current data demonstrate that the adoptive transfer of GzB+CD4+ T cells increases survival but does not change the parasite load (Figure 6G). Therefore, these data strongly support that GzB+CD4+ T cells act in an IFN-g-independent way and, hence, differ from Th1 in the effector mechanism employed for extending survival of the recipient mice. In summary, our results favor the notion that CD4CTLs and Th1 cells have complementary roles, both being able to extend survival of recipient mice, although only Th1 are effective in lowering parasite load.

    1. Author Response

      Reviewer #3 (Public Review):

      In "Zika virus causes placental pyroptosis and associated adverse fetal outcomes by activating GSDME," Zhao et al. investigated the mechanism of fetal growth restriction caused by maternal Zika virus infection.

      Strengths:

      The in vitro studies (knockouts) are clear in showing a role for GSDME in cell death. They show that GSDME may be functioning similarly in several cell types in addition to placental cells. They also show that RIG-I recognition of the viral 5' UTR is critical for the cellular pyroptotic response. Using a pregnant mouse model, they show that GSDME knockout prevents disease in fetuses.

      Weaknesses:

      Given that the authors describe pyroptosis in other cell types, it seems possible that the effects of GSDME knockout on the fetus could be indirect and due to decreased pyroptosis elsewhere in the dams. How did GSDME knockout alter the clinical signs of disease (weight loss, histopathology) in the dams?

      The Gsdme-/- mice used in our study were kindly provided by Dr. F Shao, and it was demonstrated that deletion of GSDME has no effect on the development and immune system of mice (Wang et al., 2017). In our in vivo model, neither clinical signs of disease nor weight loss were observed in either ZIKV-infected WT or GSDME KO dams. This is consistent with the results of previous studies which revealed that ZIKV infection didn’t lead to any clinical symptoms in immunocompetent pregnant mice except placental damage and adverse fetal outcomes (Szaba et al., 2018; Barbeito-Andres et al., 2020). Our data showing extremely low viral loads in spleens, serum and brains of infected dams and no difference in viral titers between tissues of WT and Gsdme-/- dams (Figure 5G) also support that GSDME knockout didn’t alter the clinical signs of disease. The figure showing the weight change of dams (Figure S7A) and the related discussion were included in the revised manuscript. In addition, the vast majority of affected embryos underwent resorption, leaving only the placental residues and embryonic debris, so it’s hard to evaluate the function of GSDME by histopathological methods in embryos. In the remaining embryo samples, no obvious clinical signs were found in either WT or Gsdme-/- embryos. To address whether the effects of GSDME knockout on the fetus are due to decreased pyroptosis in placenta or dams, we tried to cross Gsdme+/- and Gsdme-/- mice and compare pathology in +/- to -/- littermates. However, the pregnancy and litter rates were too low: despite many attempts over a long period, we could not obtain enough data to draw conclusions. We have therefore added a discussion of these issues and of the limitations of our in vivo experiments to the revised manuscript (Line 463 to 482).

      Figure 5D/E/F and Figure 6C/D- how are the authors distinguishing between apoptosis and pyroptosis as the cause of cell death in the placental tissue?

      Due to the lack of effective commercial antibodies that can detect the activation of GSDME by immunohistochemistry or by Western blot within mouse tissues, we can only use PI staining to exclude early apoptosis, because PI cannot pass through the membrane of early apoptotic cells. To date, however, there is still no effective method to completely distinguish GSDME-mediated pyroptosis from the late stage of apoptosis, and some studies have also referred to GSDME-mediated pyroptosis as secondary necrosis (Rogers et al., 2017). In addition, our results showed that ZIKV could induce GSDME-mediated pyroptosis in primary mouse trophoblast cells (Figure S6), which supports the conclusion of our in vivo experiments. However, we recognize that this is a limitation of our in vivo model and that additional research efforts are required to address this issue in the future. A discussion of this issue has been included in the revised manuscript (Line 463 to 482).

    1. Author Response

      Reviewer #1 (Public Review):

      This well-written, convincing paper ties together three major topics. The authors first detail a general strategy to combine CRISPRi approaches previously reported by the authors in S. pneumoniae with FACS to identify larger or smaller cells. The goal is to carry out a CRISPRi screen coupled to FACS to identify genes whose decreased expression distorts pneumococcal cell shape, with focus here on increased cell size. A strength of this strategy, which the authors call SCRilecs-seq, is the availability of a robust CRISPRi system and fluorescent-protein labeling methods in S. pneumoniae that have been developed and reported in several previous publications from the Veening lab. Sorting based on increased forward light scattering (FSC) indicative of increased cell size led to the identification of 17 genes whose decreased expression leads to increased FSC. This set includes genes involved in cell division, peptidoglycan (PG) synthesis, and teichoic acid synthesis, including two operons in the mevalonic acid synthesis pathway. The paper then explores how mevalonic acid synthesis is linked to cell size and PG synthesis, and further, how inhibition of this pathway can be used to potentiate amoxicillin killing of resistant pneumococcal strains.

      The shift to an interesting biological problem means that the SCRilecs-seq method is presented here as a workable pilot method that could be further optimized. The authors point out in the Results and Discussion several classes of known essential genes that mediate cell size that were not detected in this first screen ("false negatives"), some of which are in operons with other essential genes. They also point out that the encapsulated D39 used forms chains of cells that must be separated by vigorous vortexing that could potentially lead to loss of sensitive cells. Besides mentioning some potential issues, the authors might consider including additional suggestions of steps that could be taken in future studies to further optimize the SCRilecs-seq method. For example, isogenic unencapsulated D39 mutants do not form chains and often show more severe cell division and PG synthesis phenotypes than encapsulated strains. Therefore, unencapsulated strains would obviate the need for vortexing and perhaps increase the shape phenotypes caused by expression inhibition of some genes. Additional suggestions about ways to optimize the method in the future would add to this paper.

      We have now added possible points of improvement of our screening approach, including the use of an unencapsulated strain, to our discussion.

      The second topic concerns how the mevalonic acid synthesis pathway causes elongation of pneumococcal cells. This section forms a playbook on following up candidates detected in SCRilecs-seq screens. The authors construct depletion/complementation merodiploid strains for operon 1 and operon 2 that encode genes in the pathway. They confirm elongation of cells by static and time-lapse phase-contrast microscopy and statistically robust determinations of cell lengths. They demonstrate complementation in strains with and without the fluorescent-protein reporters. As expected from the pathway, deletion mutants of operon 1 can be fed mevalonic acid for growth. These physiological data are of high quality. One question that is not fully resolved is why depletion of operon 1 or operon 2 expression in culture leads to a drop in culture OD that levels out, which is interpreted as surviving cells, whereas depletion on agarose pads leads to complete lysis by microscopy. Likewise, starvation of an operon 1 mutant for mevalonic acid leads to complete lysis of the culture. Whether the depleted cultures actually contain survivors or debris can be checked by Live/Dead staining. Suppressor accumulation is suggested, but seems not to have been tested. These phenotypes in different growth conditions might be tied together a bit more. Nevertheless, transformation assays and the mevalonic acid starvation experiments show that the pathways are essential in pneumococcus.

      Whereas data on suppressor accumulation was indeed not shown in our original manuscript, we did test this hypothesis and confirmed that the acquisition of suppressor mutations in depletion (inducible ectopic operon copy present) but not deletion (operon completely absent) strains is responsible for the apparent difference in behavior between these strains. Indeed, we find that cultures that manage to grow in absence of IPTG in these complementation strains are either mutated in the Plac promoter or in the lacI repressor. This is now mentioned in the revised manuscript and provided in the supplementary information of our revised manuscript.

      Experiments that localize FtsZ-rings or regions of active PG synthesis with fluorescent-D-amino acids led to a remarkable result. TEM indicates that elongated cells depleted for operon 1 or operon 2 expression start to pinch in a little, but stop. Similarly, depletion of operon 1 or operon 2 expression leads to elongated cells with multiple unconstricted FtsZ-rings and relatively faint bands of FDAA labeling at around the cell diameter. Constricted FtsZ-rings and FDAA labeling narrower than the cell diameter are absent. This pattern strongly supports the interpretation that septal ring closure and PG synthesis do not occur, while PG elongation continues at a reduced level. This pattern is highly reminiscent of cells depleted for the GpsB regulator that is required for septal closure and PG synthesis, but phenotypically acts like a negative regulator of PG elongation. An interesting future question is whether the residual peripheral PG elongation during depletion is carried out by PBP2b:RodA as during normal growth or by other PBPs as part of a stress response.

      Based on the results of our second sCRilecs-seq screen performed on a strain depleted for mevalonate, we conclude that Pbp2b:RodA is at least partly responsible for the observed cell elongation. Results indicate that when pbp2b or rodA expression is halted during mevalonate depletion, cells are not able to become as long as when Pbp2b and RodA are fully active. We have now explicitly mentioned this observation at line 539. Additionally, we have tried to characterize the contribution of RodA more quantitatively towards cell elongation in the mevalonate depletion phenotype. However, this double depletion strain is extremely sensitive to the amount of inducer for both rodA expression and the expression of the mevalonate genes. We were therefore unable to identify conditions in which cells behaved properly that could be used as a starting point to perform this experiment.

      The authors further test whether cell elongation upon mevalonate limitation occurs because of a reduced amount of undecaprenol-phosphate and whether this correlates with capsule, teichoic acid, or PG synthesis, which all use undecaprenol-phosphate as a carrier. In a well-designed series of physiology experiments, the authors first showed that mutants blocked later in the pathway to undecaprenol synthesis have the same elongation morphology as that caused by mevalonic acid depletion. In addition, only a late block in PG synthesis produced by MraY depletion produced cells with the elongation phenotype observed during mevalonic acid limitation. In contrast, deletion of capsule did not produce a growth defect, and depletion of teichoic acid synthesis led to elongated cells with a completely different cell morphology defect. The inference from the matching phenotypes is that the division defects caused by mevalonic acid limitation are largely caused by a lack of PG precursor, and this limitation leads to residual PG elongation, but no septal closure. Several reasonable hypotheses for the mechanism of this differential synthesis are presented in the Discussion for future testing. Although the physiological data are highly consistent with this interpretation, no direct determinations of the final Lipid-II precursor were included in this study.

      The authors did one further series of clever experiments to understand the phenotypes caused by mevalonic acid limitation. They applied the SCRilecs-seq method to cells depleted for mevalonic acid to identify decreased gene expression that would lead to even longer or smaller cells. The results show that depletion of gene expression involved in cell division or septal PG synthesis (e.g., FtsW, PBP2x) further elongated mevalonate-depleted cells, whereas genes involved in blocking protein expression, energy metabolism, or PG elongation (e.g., DivIVA, RodA, PBP2b) led to smaller cells.

      The last topic builds on the idea that synergy between two antibiotics is often greatest when they target the same process or pathway. Since blocking mevalonic acid synthesis limits undecaprenol and blocks PG synthesis, there may be synergy with the beta-lactam antibiotic amoxicillin, which inhibits PBP transpeptidase activity, especially in amoxicillin-resistant clinical strains. To test this idea, the authors built on a previous result from Staphylococcus aureus showing that the FDA-approved fertility drug, clomiphene, inhibits undecaprenol synthesis. The authors used cell elongation morphology and mevalonic acid depletion to confirm that clomiphene likely inhibits undecaprenol synthesis in S. pneumoniae, although Lipid II levels were not determined. In addition, synergy between clomiphene and amoxicillin inhibition was observed for the virulent, sensitive D39 strain and for three more recently isolated amoxicillin-resistant clinical strains. This synergy did not occur between clomiphene and antibiotics that do not inhibit PG synthesis. An attempt was made to test the synergy in a murine lung model of infection with one of the clinical strains (19F) that showed an intermediate synergy between clomiphene and amoxicillin. However, no synergy was detected in vivo, perhaps because of insufficient clomiphene dosing. Further experiments could try other clinical strains with greater in vitro synergy (e.g., 23F) or improved clomiphene dosing.

      We are indeed performing additional experiments to try to find conditions where the clomiphene-amoxicillin combination displays potentiation in vivo. Our current strategy is mainly focused on optimizing clomiphene to increase its potency in vivo. For example, clomiphene is a prodrug that is metabolized by the liver, but because clomiphene itself is capable of boosting amoxicillin action in vitro, this metabolic conversion is undesirable for our applications. Slight changes to the compound that prevent this conversion could therefore boost its activity. We hope that these and other minor adjustments will help to increase the in vivo efficacy of the clomiphene-amoxicillin drug combination. However, this is a work in progress and results will be presented at a later time.

      Together, this paper presents a number of important new findings that will strongly impact the field. A successful pilot screening method has been developed for S. pneumoniae, a major human pathogen. This method will undoubtedly be optimized, refined, and expanded in future screens. The screen pointed to the critical role of mevalonic acid in pneumococcal cell shape determination. While not totally unexpected, this paper is the first to systematically study this pathway in pneumococcus. This line of investigation led to the remarkable observation that mevalonic acid and undecaprenol deficiency preferentially blocks septal closure and septal PG synthesis over weakened PG elongation, not necessarily by the normal peripheral PG elongasome proteins. The basis for this phenotype will be explored in future studies. Finally, once characterized, depletion of undecaprenol synthesis showed synergy with amoxicillin in amoxicillin-resistant clinical strains of pneumococcus for the first time. Thus, this paper goes from unbiased screening, to characterization of a pathway that affects cell shape through modulation of PG synthesis, to manipulation of this pathway for antibiotic potentiation in resistant strains.

      Reviewer #3 (Public Review):

      This manuscript describes a powerful high throughput screening in the bacterial pathogen Streptococcus pneumoniae, that couples genome-wide CRISPR interference depletion with FACS sorting of elongated cells. This approach is therefore not limited to measuring changes in fitness but can be used to screen for mutants with any phenotype that can be detected by flow cytometry.

      The results from the screen uncovered an important role of the mevalonate pathway on cell length, as well as new factors with unknown functions required for proper cell morphology in Streptococcus pneumoniae.

      Upon depletion of mevalonate, overall peptidoglycan (PG) synthesis rate decreased. However, peripheral PG synthesis for cell elongation continued after septal peptidoglycan synthesis for cell division was inhibited. This suggests a form of regulation that directs PG synthesis towards septal or peripheral, depending on the availability of PG precursors. However, this mechanism is not identified.

      Finally, the authors use the knowledge gained from the screening to design a combination therapy of amoxicillin with clomiphene that resensitizes amoxicillin-resistant S. pneumoniae strains. This is in agreement with previous data from Eric Brown's lab showing that clomiphene potentiates the activity of β-lactam antibiotics against methicillin-resistant Staphylococcus aureus strains. The authors then show that the clomiphene/amoxicillin combination was not effective in vivo, using a murine pneumonia disease model, possibly because the active concentration of clomiphene in the lung was too low.

      The manuscript is very clear and the experiments were carefully done. There are two major points that should be addressed:

      1) Authors mention that depletion of mevalonate operons prevents cell division and leads to cell elongation. This conclusion is based on cell morphology (longer cells) and on microscopy experiments assessing peptidoglycan incorporation (Fig 4D). However, it is not clear from Fig 4D that there is no septal synthesis. For example, in the panel corresponding to depletion of mevalonate operon 1, the top cell is constricted in the middle and the RADA labelling shows a constricted ring, similar to what is seen for the wild type strain. Authors should point clearly to what they consider peripheral and septal synthesis to be; should overlay the green and red signals so that localization of PG synthesis over time can be more easily seen; and should discuss their data in the context of the recent model for peptidoglycan assembly during the cell cycle of S. pneumoniae by the Morlot group (https://doi.org/10.1016/j.cub.2021.04.041) which proposes that "the ovoid-cell morphogenesis (relies) on the relative dynamics between peptidoglycan synthesis and cleavage rather than on the existence of two distinct successive phases of peripheral and septal synthesis".

      Indeed, Figure 4D contains a cell that is still able to constrict slightly. We do not mean to claim that there is no constriction possible at all upon depletion of the mevalonate operons. However, constriction is very severely hampered and diminished strongly in comparison to a wild-type strain. We hope this can be appreciated qualitatively from the cells shown in Figure 4D and Figure 5F and from all acquired images that have been deposited on the EMBL-EBI BioImages Archive: https://www.ebi.ac.uk/biostudies/studies/S-BIAD477. Additionally, we have also attempted to quantify this phenotype. We recognize, however, that it was not apparent from our original manuscript how the distinction between septal and peripheral peptidoglycan synthesis was made. We have therefore now explicitly mentioned this at line 362: “Indeed, when defining septal peptidoglycan synthesis as FDAA bands that are less than 70% of the maximal cell width (i.e. sites of constriction, a criterion also used previously), quantitative analysis of hundreds of peptidoglycan synthesis sites demonstrated that the amount of septal synthesis is drastically reduced when transcription of the mevalonate operons is repressed (Figure 4E).”
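
      The 70% width criterion quoted above can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names, the input format (band width and maximal cell width in the same length unit), and the sample values are hypothetical and serve only to show how such a threshold separates septal from peripheral synthesis sites.

```python
def classify_pg_site(band_width, max_cell_width, threshold=0.70):
    """Label one FDAA band as 'septal' or 'peripheral'.

    A band narrower than `threshold` times the maximal cell width is
    treated as a site of constriction (septal); anything else is
    treated as peripheral. The 0.70 default follows the criterion
    quoted in the response; all other details are assumptions.
    """
    if max_cell_width <= 0:
        raise ValueError("max_cell_width must be positive")
    if band_width < threshold * max_cell_width:
        return "septal"
    return "peripheral"


def septal_fraction(sites, threshold=0.70):
    """Fraction of (band_width, max_cell_width) pairs classified as septal."""
    if not sites:
        raise ValueError("at least one synthesis site is required")
    n_septal = sum(
        1 for band, cell in sites
        if classify_pg_site(band, cell, threshold) == "septal"
    )
    return n_septal / len(sites)


if __name__ == "__main__":
    # Hypothetical measurements (band width, maximal cell width).
    example_sites = [(0.5, 1.0), (0.9, 1.0), (0.95, 1.0), (0.6, 1.0)]
    print(septal_fraction(example_sites))
```

      Comparing this fraction between a wild-type and a depletion strain would give the kind of quantitative read-out summarized in Figure 4E, under the assumptions stated above.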

      Figure 4D and Figure 5F now contain overlays of the sBADA and RADA signal so the progression of peptidoglycan synthesis can be interpreted more easily by the reader.

      Finally, we have included the recently formulated model for peptidoglycan synthesis postulated by the Morlot group in our manuscript. We have included this important work and have interpreted our results in light of this new and highly plausible model. This is part of our discussion starting at line 746. Additionally, we have rephrased some statements throughout our manuscript to allow for this new alternative interpretation of results.

      2) Authors propose that the elongation phenotype due to downregulation of the mevalonate pathway is caused by "insufficient transport of cell wall precursors across the cell membrane due to a limitation in the production of undecaprenyl phosphate (Und-P)". However, this conclusion is based almost exclusively on the similarity of phenotypes of the uppS deletion mutant and the mevalonate mutants. The levels of Und-P were not measured. An alternative to measuring these levels could be adding exogenous Und-P, which should revert the elongation phenotype. In S. aureus addition of exogenous Und-P suppressed the activity of clomiphene, indicating that cells are able to incorporate the compound (doi: 10.1073/pnas.1511751112)

      Many thanks for pointing us to this interesting result. We have referred to this experiment in the revised manuscript at line 572.

      The authors focus only on the elongation phenotype upon mevalonate depletion. However, cells are also considerably wider (cell width increases 1.35X). This is similar to what happens in Bacillus subtilis, where clomiphene was shown to cause swelling of cells (doi: 10.1073/pnas.1511751112). It would be interesting to discuss what may be the cause of this phenotype.

      Unfortunately, we do not know why cell width increases upon mevalonate depletion. However, it has been well established that both chemical and genetic inhibition of PBP2x, the PBP that is dedicated to septal cell wall synthesis, also leads to increases in cell width in S. pneumoniae which are often described as a “lemon-shape”. Why cells adopt this lemon-shape remains unknown and understudied. We can therefore currently not provide any plausible explanation for why cell width increases.

      We have, however, included a description of the Und-P inhibition phenotype of B. subtilis in our discussion section and have compared this phenotype to that of S. pneumoniae starting at line 705.

    1. Author Response

      Reviewer #1 (Public Review):

      The results are quite interesting and potentially have important therapeutic implications. Nevertheless, in the current form there are several weaknesses that diminish the strength of the findings.

      1) As the authors note, they do not provide direct evidence for the ultimate conclusion of the study that assembly with β2a and β2e subunits is necessary for CaV2.3 channels to contribute to pacemaking in SN DA neurons. The authors state siRNA knockdown experiments in SN DA neurons are technically challenging. Nevertheless, shRNA knockdown studies in SN neurons have been previously published. Such a study is critical to provide direct evidence for what would be a very important and impactful finding.

      Please refer to our detailed response to essential revision 1 above.

      2) Relative contribution of CaV1.3 (L‐type) and CaV2.3 channels to pacemaking in SN DA neurons. As the authors note, a phase III clinical trial for the L‐type channel blocker, isradipine, showed no efficacy for neuroprotection, even though some mice studies suggested this might be efficacious. On the other hand, the authors' previous work with CaV2.3 knockout mice suggest inhibition of this channel would be more appropriate for a neuroprotective response. It would be useful to get a direct comparison of the impact of isradipine and SNX‐482 on pacemaking in SN DA neurons (Figs. 1 and 2). If their impacts on pacemaking (and Ca2+ oscillations) are similar it would suggest something beyond the pacemaking Ca2+ influx could be responsible for neuroprotection (e.g. changes in NCS‐1 expression as previously suggested by the authors).

      The question about the relative contribution of Cav1.3 and Cav2.3 on pacemaking is complex due to the finding that different results have been obtained regarding the role of L‐type channels on pacemaking. In Cav1.3 knockout mice pacemaking frequency is normal (7, 8). Inhibition (of Cav1.2 and Cav1.3) by dihydropyridine Ca2+ channel inhibitors (e.g. isradipine, nimodipine) was found to inhibit pacemaking in some (e.g. 9‐11) but not in all (8, 12) reports. This seems to be dependent on experimental conditions, but the reasons for these discrepancies are currently unclear. Similarly, we find inhibition of pacemaking by SNX‐482 in cultured midbrain neurons (this paper) but, as previously reported, not in Cav2.3‐deficient mice (1). While this toxin is well suited to isolate Cav2.3‐mediated Ca2+ current components, effects on pacemaking in DA neurons have to be interpreted with more caution because (as clearly outlined in our original MS and our previous paper, 1), SNX‐482 is also a potent inhibitor of Kv4.3 channels. We consider this limitation even more in the discussion of SNX‐482 effects on pacemaking in cultured neurons (data now moved to Suppl Fig. 5) in the revised MS (end of page 15, top of page 16), although the SNX‐482 changes suggest an involvement of Cav2.3 for AP generation.

      Although we acknowledge the relevance of the question raised by the reviewer, based on our previous findings (1) the absence of an obvious role of Cav2.3 for pacemaking in SN DA neurons (despite their role for Ca2+ transients) as an experimental read‐out prevents a straightforward approach to study the contribution of different β‐subunits and their splice variants for this process.

      3) The slice recording data (Fig. 9) are confusing and raise concerns about adequacy of pharmacological isolation of CaV2.3 currents in this preparation. The accuracy of interpretation of the data in Fig. 9 rests critically on the idea that the cocktail of CaV channel blockers given successfully isolates CaV2.3 currents. Yet, the amplitudes of the exemplar currents shown for plus or minus the CaV channel blocker cocktail are almost the same. This cannot be due to CaV2.3 providing the dominant current in the slice preparation since addition of SNX‐482 only decreased Ca2+ current amplitude by 13% (Suppl Fig. 5). It is not clear to me why the steady‐state activation and inactivation curves experiments were not conducted in the cultured neuron preparation (Figs. 1 and 2) where there seems to be better control of pharmacological block of different Cav channel isoforms.

      We have now isolated SNX‐482‐sensitive currents not only in the cultured neuron preparation, as suggested, but also in SN DA neurons. The latter experiments gave essentially identical steady‐state inactivation parameters as compared to our "R‐type" current (current remaining in the presence of all other channel blockers). This now also allows a direct comparison of SNX‐482‐sensitive current properties in cultured neurons and in slices (see response above). We now also specifically discuss previous reports of SNX‐482‐sensitive R‐type components in the introduction to allow comparison of these reports with our findings. Please also note that in our legend to Fig. 9A (original MS, now Fig. 6) we explicitly stated that recordings of "similar amplitudes were chosen" to facilitate comparison of current kinetics. We still think that this makes sense and have kept this part of the figure, but we have now emphasized this point further in the figure legend (Fig. 6).

      4) While the transcript data show that β2a and β2e are present in SN DA neurons, numerically they would still represent only a minority of the beta subunits present (<25%). I don't think sufficient thought has been given to this in the discussion of the results. Unless there is some preferential association of CaV2.3 with β2a and/or β2e, there would be a mix of channels with the majority incapable of supporting pacemaking in SN DA neurons. Given this, one would not necessarily expect that the gating characteristics of CaV2.3 would be the same as what is obtained with reconstituted channels in tsA201 cells where all the channels are assembled with β2a or β2e (see point #5 below).

      We now give this important point more thought in the discussion and mention that our data would imply such a preferential association of Cav2.3 with β2a and/or β2e and provide possible explanations. In addition, as in the original MS, we also provide alternative interpretations (Discussion, pg 14, 2nd and 3rd paragraph).

      5) The V0.5,inact of putative CaV2.3 channels in SN DA neurons of ‐52.4 mV was said to be 'very similar' to the value of ‐40 mV that was observed in tsA201 cells. A difference of +12 mV in voltage‐dependent gating of ion channels is substantial and should not be brushed off. A more nuanced interpretation would be that in SN DA neurons CaV2.3 likely associates with other beta subunits in addition to β2a and β2e and so one would not necessarily expect the V0.5,inact to be the same as what is observed in reconstituted channels in tsA201 cells.

      The V0.5,inact of ‐52.4 mV refers to the control current. We correctly stated that the V0.5,inact of R‐type current was ‐47.5 mV (as also shown in Table 3), i.e. only about 7 mV more negative than in tsA‐201 cells. We have now rephrased this section because we also included new inactivation data for SNX‐482‐sensitive currents in cultured neurons and in SN DA neurons recorded in slices (Discussion, page 13, 2nd paragraph). As suggested, we no longer describe these values as "very similar" (difference ~5 mV).
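
      To make the size of these half‐inactivation differences concrete, the following minimal sketch evaluates a standard Boltzmann steady‐state inactivation curve for the V0.5,inact values mentioned above (‐52.4, ‐47.5 and ‐40 mV). The slope factor of 6 mV and the test potential are illustrative assumptions, not values from the manuscript, and the function names are hypothetical.

```python
import math


def available_fraction(v_mv, v_half_mv, slope_mv=6.0):
    """Boltzmann steady-state inactivation.

    Returns the fraction of channels still available (not inactivated)
    after holding at `v_mv`. `slope_mv` is an assumed slope factor.
    """
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))


def v_half_shift(v_half_a_mv, v_half_b_mv):
    """Difference between two half-inactivation voltages (a minus b), in mV."""
    return v_half_a_mv - v_half_b_mv


if __name__ == "__main__":
    test_potential_mv = -45.0  # hypothetical holding potential
    for label, v_half in [("control", -52.4), ("R-type", -47.5), ("tsA-201", -40.0)]:
        frac = available_fraction(test_potential_mv, v_half)
        shift = v_half_shift(v_half, -40.0)
        print(f"{label}: shift vs -40 mV = {shift:+.1f} mV, available = {frac:.2f}")
```

      Under these assumptions, a shift of several millivolts in V0.5,inact changes the available fraction noticeably at potentials near the midpoint, which is why the distinction between the control and R‐type values matters for the comparison.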

      Reviewer #2 (Public Review):

      This reviewer is very enthusiastic about the work but notes that most of the conclusions are based on data obtained by overexpressing Cav2.3 and accessory subunits in a heterologous expression system. The authors make a good argument for cross‐correlation between data in tsA‐201 cells and dopaminergic neurons, but it is unclear that the results will translate from one system to another. More data may be needed to do so (the reviewer does understand that these are challenging experiments), which the authors acknowledge in a section about the study's limitations. Based on this, it seems that the title is misleading without additional data supporting the role of Cav2.3 in dopaminergic neurons. Along the prior line, statements linking the study results to potential pathological implications seem a big stretch not supported by current data, and therefore should be eliminated.

      An issue with this manuscript is that the narrative and organization of the data are difficult to follow. The reviewer understands that the authors are weaving a complex story that involves using multiple techniques and approaches. Still, the way the data is organized and described makes the reader go back and forward to compare and contrast results constantly. This is further complicated by the fact that some experiments are done in dopaminergic neurons and others in tsA‐201 cells (the identity of the cell type used should be made clearer), the order of some figures is not appropriate (Supp Fig 1 for example) and some figure panels are not discussed (Supp Fig 5E to 5J).

      The MS has been completely rewritten, based on the additional SNX‐482 experiments we have now performed both in the cultured DA neurons and in the midbrain slices. We therefore also moved the data on the effects of SNX‐482 on the spontaneous activity of cultured neurons into the supplement to make the key results easier to follow. The cell type is now indicated in the headers of all tables and figure legends. We also changed the title to “β2‐subunit alternative splicing stabilizes Cav2.3 Ca2+ channel activity during continuous midbrain dopamine neuron‐like activity” to attenuate our previous statement regarding a role in dopaminergic midbrain neurons.

    1. Author Response

      Reviewer #2 (Public Review):

      We are in a golden age for comparative genomics and this is a prime example of the utility of the field. "Vision-related convergent gene losses reveal SERPINE3's unknown role in the eye" details the discovery of a function for a previously uncharacterized gene in regulating organ development in evolution. The authors intersect patterns of gene loss, quantified as the percentage of intact coding sequence, with visual acuity scores across Mammalia. This analysis identified 26 significant genes that have undergone convergent loss with phenotypic decreases of vision. Many of those genes have previously been annotated in the eye, indicating the analysis was successful and suggesting the uncharacterized genes may also have roles there.

      The authors ruled out the top hit due to its specific expression in the testis, and instead performed an in-depth characterization of the second hit, SERPINE3. This included an impressive breadth of comparative genomics across 430 placental mammals, carefully describing the many and diverse genetic perturbations of SERPINE3 in lineages with low visual acuity. These results are persuasive that SERPINE3 is involved in vision, and it is a great example and description of gene loss in adaptation.

      Critically, the authors validated the role of SERPINE3 in eye structure by confirming expression patterns in the eye, and characterizing its knockout in zebrafish, demonstrating both qualitative and quantitative impairments to eye structure. This is particularly satisfying as many comparative genomics studies make such associations but never validate the result. Here, validation of SERPINE3 was an undeniable success and puts a functional annotation to a previously uncharacterized gene. The utility of comparative genomics and zebrafish genetic models has been expertly capitalized upon and there is no doubt our knowledge of eye genetics has increased.

      We thank the reviewer for these kind words and the valuable comments that we addressed below.

      While these end results are certainly valuable to the community, details regarding the statistics and filters underlying the initial convergence analysis are too sparse to interpret. The impressive false discovery rate of the top hits is called into question when the top hit (corrected p-value < 1.1E-15 with visual acuity < 2) is subsequently skipped due to its specific expression in the testes. Given this disconnect, and without knowing the rationale and consequences of the various filters, it is difficult to get a sense of the accuracy and robustness of these p-values. Plots of p-value distributions across the dataset would demonstrate the method is statistically sound and would provide the backdrop to interpret the top hits of interest.

      We have now simplified the workflow to detect convergent gene losses in species with lower visual acuity values and explained the rationale of each step (this is detailed in the responses below). We would like to mention that our screen may find genes that are associated with other phenotypes that are shared between species exhibiting lower visual acuity values. For example, several of these species are subterranean mammals, which share other traits and adaptations to their environment. While we do not know which trait the loss of the testis gene TSACC is associated with, its FDR is only slightly lower than the FDR of the second-ranked SERPINE3 (FDR 1.1E-6 vs. 1.5E-6).

      As suggested, we plotted the distribution of the raw P-values of all 13172 genes for which we ran the phylogenetic least square approach. This distribution has a peak at low P-values, indicating that some genes are preferentially lost in the poor-vision mammals. The distribution also showed a peak at ~0.5 and at ~1. We investigated which patterns of the %intact reading frame values appear to contribute to these two peaks.

      Many genes with P-values of ~0.5 have one high-acuity species (blue), where the %intact value is slightly reduced, whereas other high- and poor-acuity (red) species all have a 100% intact reading frame. Two examples, where rhesus or dolphin have lower %intact values are shown below:

      Similarly, many genes with P-values of ~1 have two or more high-acuity species, where the %intact value is reduced, whereas all other species have a 100% intact reading frame.

      Since these genes have lower %intact values in a few high-acuity species, the high P-values likely capture a negative association with our trait of interest. While it is not clear why many P-values are around 0.5 or 1, it is clear that these genes are not associated with poor vision.
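
      The inspection of the raw P-value distribution described above can be sketched as a simple equal-width histogram. This is only an illustration under stated assumptions: the function name, the bin count, and the example values are hypothetical, and the code does not reproduce the authors' phylogenetic least square analysis.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Count P-values in `n_bins` equal-width bins covering [0, 1].

    A P-value of exactly 1.0 is placed in the last bin. Peaks in the
    first bin suggest genes associated with the trait; peaks elsewhere
    (e.g. near 0.5 or 1) need a separate explanation.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P-value out of range: {p}")
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical raw P-values, one per gene.
    example = [0.0, 0.04, 0.5, 0.5, 1.0, 1.0, 1.0]
    for i, count in enumerate(pvalue_histogram(example, n_bins=10)):
        print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}): {'#' * count}")
```

      Running the same binning on a control screen, such as the one described further below, would give a reference distribution against which the low-P-value peak can be judged.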

      Our main purpose of using the phylogenetic least square approach was to rank the genes by their association with the poor vision phenotype. Importantly, the top-ranked candidates are all preferentially lost in low-acuity mammals, which is evident from Figure 1A. Furthermore, for SERPINE3, where we experimentally confirmed an eye-related function, three different screens with different phenotype definitions robustly support a preferential loss in low-acuity species (detailed below).

      Notes on how many genes pass each filter, and what kinds of genes, would allow interpretation of possible bias in those filters and how they interact with the convergence analysis.

      We thank the reviewer for this suggestion. As detailed below, we have now simplified the filtering procedure, justified the filter steps in the revised methods section, and added a flowchart (Figure 1 - supplement figure 1) describing each step and how many genes passed each filter (below).

      For instance, the slight changes in visual acuity cutoffs have non-obvious operational consequences for vision, yet large impacts on the resulting gene lists, making it difficult to interpret how the measure is functioning. Most importantly, a negative control in the convergence analysis, demonstrating a null p-value distribution with the same filters, would assuage most concerns.

      The reviewer is correct that changes in the visual acuity cutoff lead to different gene lists because the screen searches for genes preferentially lost in different species. However, our screens using three visual acuity cutoffs consistently find SERPINE3 as a candidate in the top 8 genes (Figure 1 - source data 5,6), showing that the association with lower visual acuity is robust for this gene.

      As suggested, we have now run a negative control screen. For the negative control, we considered close relatives of the low-acuity species as trait-loss species. Specifically, we selected elephant, rhinoceros, horse, the two flying foxes, guinea pig, degu and squirrel. These 8 species represent five independent lineages. All other species (including the low-acuity species) were treated as trait-preserving species. A Forward genomics screen with otherwise identical filter parameters retrieved only two hits, TUBAL3 and TRIM52, which have no known function in the eye. This supports the specificity of our screen.

      We added this to the main text:

      “To confirm the specificity of these results, we performed a control screen for genes that are preferentially lost in high-acuity sister species of the low-acuity mammals. This control screen retrieved only two genes, none of which have known functions in the eye (Figure 1 - source data 4). Together, this shows that our genome-wide screen for genes preferentially lost in low-acuity species successfully retrieved known vision-related genes.”

      and Methods:

      “As a control to ensure that a Forward Genomics screen does not always retrieve vision-related genes, we ran a new screen, searching for genes preferentially lost in high-acuity sister species (elephant, rhinoceros, horse, two flying foxes, guinea pig, degu, squirrel) of the low-acuity mammals that we used in the original screen. All other species including the other high-acuity mammals were then treated as background (Figure 1 - source data 4).”

    1. Author Response

      Reviewer #1 (Public Review):

      In the article "Whole transcriptome-sequencing and network analysis of CD1c+ human dendritic cells identifies cytokine-secreting subsets linked to type I IFN-negative autoimmunity to the eye," Hiddingh, Pandit, Verhagen, et al., analyze peripheral antigen presenting cells from patients with active uveitis and control patients, and find several differentially expressed transcription factors and surface markers. In addition, they find a subset of antigen presenting cells that is decreased in frequency in patients with uveitis that in previous publications was shown to be increased in the eye of patients with active uveitis. The greatest strength of this paper is the ability to obtain such a large number of samples from active uveitis patients that are not currently on systemic therapy. While the validation experiments have methodologic flaws that decrease their usefulness, this study will still serve as a valuable resource in generating hypotheses about the pathogenesis of uveitis that can be tested in future projects.

      We thank the reviewer for the constructive comments and effort to review our work in detail.

      Since all CD36+CX3CR1+ cells are CD14+ (Figure 4D), how CX3CR1 ended up being differentially regulated in a similar way despite this population was excluded from 2nd bulk RNAseq data set should be commented on by the authors.

      We agree with the reviewer that the CD14 surface expression in relation to the black-gene module and CD36+CX3CR1+ DC3s requires more detailed analysis. As described in the results section, genes in this module are linked to both CD1c+ DCs and inflammatory CD14+ monocytes, which we cannot distinguish by bulk RNA-seq analysis. Therefore, we aimed to demonstrate that the black module is a bona fide CD1c+ DC gene signature that does not depend on CD14 surface expression. We showed that there was no difference in CD14+ cell fractions between the patient and control samples used for RNA-seq (see Fig. 1F). We have now investigated this further with additional data and experiments. We now show in Figure 2 Supplement 2A that CD14, as expected, does not correlate with the black module. To confirm this experimentally, we purified CD14+CD1c+ and CD14- CD1c+ DCs from 6 donors and subjected these to qPCR analysis to evaluate the expression of key genes from the black module (see revised Figure 2A). As illustrated in revised Figure 2 panel B, the expression levels of genes, including CD36 and CX3CR1, are not significantly altered between CD14+/- CD1c+ DCs, which supports that the identified gene module is also not dependent on CD14 surface expression by CD1c+ DCs. To assess whether the expression of the black module was also independent of CD14 in inflammatory disease, we used RNA-seq data from FACS-sorted CD14+CD1c+ DCs and CD14-CD1c+ DCs from patients with SLE and Scleroderma (GSE13731) and confirmed that the expression of the black module genes is independent of CD14 surface expression (see revised Figure 2 panel C). Finally, we removed CD14+ cells from the analysis in the 2nd bulk RNA-seq experiment to prove that the black module is indeed associated with uveitis independent of CD14+ expression, which allowed us to attribute the black module to CD1c+ DCs by bulk RNA-seq analyses.
      Also, more detailed analyses by flow cytometry (Revised Figure 4) and scRNA-seq (Figure 6) confirm these findings. For example, we show that the CD36+ CX3CR1+ DC3s are in fact a subset of CD14+ CD1c+ DCs (Figure 2 – Supplement 2) and that eye-infiltrating CD1c+ DCs that harbor the black module gene signature show increased CD36 and CX3CR1, but not CD14 (Figure 6C). We have addressed all these experiments and data in the result section on pages 12-13, 16, and 17, and in the discussion section on page 19. We hope the reviewer agrees that this has now been sufficiently addressed.

      Line 153: "...substantiates this gene set as a core transcriptional feature of human autoimmune uveitis." It would be difficult to argue that when only 137 of the 1236 DEGs from the first module are repeated in a validation data set that this is the core transcriptional set that defines the population in any uveitis. Further concerns include that the validation data set is not the same population, but rather a subset not containing CD14.

      We agree with the reviewer and have changed this in the result section to “substantiates this gene set as a robust and bona fide transcriptional feature of CD1c+ DCs in human non-infectious uveitis” on page 13. We agree that, as expected, the removal of CD14+ cells impacted the sensitivity of our analysis, but this strategy was required to attribute the black module to CD1c+ DCs. Our data support that the black module gene signature is not restricted to CD14+ CD1c+ DCs by demonstrating that its dysregulation in non-infectious uveitis can even be perceived in CD14- CD1c+ DCs. We now show that the replication of only a fraction of the genes of the black module is a consequence of reduced sensitivity to detect differentially expressed genes (Figure 2 – Supplement 1C), most likely due to the lower cell number after sorting out CD14+ cells. We have outlined this in greater detail in the result section on page 13. We hope the reviewer agrees this has now been adequately described.

      Line 220: Notch-dll experiments: with the experiments presented it is not possible to say that the changes are due to maintenance of CD1c+ DCs without further experiments outlining what NOTCH2 signaling changes throughout time. Is the population fully developed in the first 7 days of culture prior to adding NOTCH2 or ADAM10 inhibitors? Is there more apoptosis in this pathway? Less proliferation? It would be more accurate to say that there are fewer cDC2s after 14 days of culture without speculating the cause. In this experiment it is unclear why the gate of CD141/CD1c was chosen, as this appears to be in the middle of the population. In normal PBMCs CD141+ DCs would be CD1c negative; therefore why exclude the CD141hiCD1c+ and CD141loCD1c+ populations?

      We agree with the reviewer that in the current state the additional Notch-DLL experiments are inconclusive. Based on the comments from this reviewer, we believe the most appropriate experiments would be to show changes in the surface protein expression of CD36, CX3CR1 and other key surface markers of the black module upon inhibition of NOTCH2 or ADAM10. To this end, we repeated the experiments with human CD34+ HPC-derived DCs to measure cell subsets by flow cytometry using the same panel we used for the PBMCs. However, we experienced substantial autofluorescence of human CD34-HPC derived cultures (expected for the complex heterogeneous cellularity of these cultures and as previously reported for CD34+ cells; Donnenberg et al., Methods 2015) that introduced significant artifacts and interfered with optimal identification of CD1c+ DCs and their subsets (see example below). Unfortunately, we have so far been unable to control for this. Since we agree with the reviewer that in the current form the supplemental figure does not significantly contribute to the manuscript, we removed the supplemental figure entirely from the manuscript. We hope the reviewer agrees that we already provide several complementary lines of evidence that link NOTCH-RUNX3 signaling to the black module (Figure 3A-D), including RNA-seq data from NOTCH2-DLL experiments, and that the current data is sufficient to support the main conclusions of the manuscript. We hope the reviewer agrees with this proposal.

      Author response figure 1: Manual gating example of human CD34-HPC derived DCs shows substantial autofluorescence.

      Line 256: The hypothesis that the loss of CD36+CX3CR1+ cells was due to migration to the eye doesn't make sense based on volume and number of cells. 0.1% of all PBMC is ~1 × 10⁷ cells, and distributed throughout the eye would give about 1.3 × 10⁶ cells/mL of eye volume. This would make the eye turbid, which is not consistent with birdshot chorioretinopathy and would be rare in HLA-B27 anterior uveitis and intermediate uveitis.

      We agree with the reviewer and have changed this section of the manuscript (page 17) to “We speculated that the decrease in blood CD36+CX3CR1+ CD1c+ DCs was in part the result of migration of these cells to peripheral tissues (lymph nodes) and that these cells may also infiltrate the eye during active uveitis.”

      Line 267: Would have liked to see the gating of CX3CR1/CD36 cells be more consistent (there are overlapping CX3CR1+ and CX3CR1- populations in 5A, but in Figure 4 quadrants were used to define the populations when evaluating the numbers in uveitis and healthy controls). The populations in Figure 5 are more separated by CD36.

      We agree with the reviewer and have added a more detailed example of the gating strategy used to sort CD36/CX3CR1 subsets in Figure 5 – Supplement 1 including the expression of CX3CR1 and CD36 in the sorted populations.

      Line 269, IN VITRO stimulation: The experimental paradigm is set up to find a difference between cells but does not test any biologically relevant scenario. By sorting on a surface marker, then stimulating with the ligand for that receptor, the result better proves that CD36 is important in TLR2 signaling than it gives any information on how these dendritic cells might behave in uveitis.

      We agree with the reviewer that the connection between the cytokine expression of the CD1c+ subsets and non-infectious uveitis may benefit from additional experimental data. To this end, we profiled available eye fluid biopsies and paired plasma by Olink proteomics to measure 92 immune mediators from patients and controls from this study (and several additional samples, including aqueous humor from non-inflammatory cataract controls – see revised Figure 5 panel D). This analysis shows that cytokines produced by CD36+CX3CR1+ DCs such as TNF-alpha and IL-6 are specifically increased in eye tissue of patients, but not in blood. We hope the reviewer agrees that we have provided additional experimental data that links the functional differences in DC subsets to cytokines implicated in the pathogenesis of non-infectious uveitis.

      Reviewer #3 (Public Review):

      First, a note on nomenclature. The authors use the term 'auto-immune' uveitis to encapsulate three different conditions -- HLA-B27 anterior uveitis, idiopathic intermediate uveitis, and birdshot choroidopathy. While I would agree with this terminology for the third set, there is substantial controversy as to whether HLA-B27 is truly autoimmune or autoinflammatory. Indeed, one major hypothesis is that this condition is driven by changes in gut microbiome. Intermediate uveitis is even more problematic; a substantial number of cases of this condition will turn out to be associated with demyelinating disease, which has recently been linked to Epstein Barr virus disease. To my knowledge in none of these diseases has a definitive autoantigen been identified nor passive transfer via transfusion shown; I would suggest the authors abandon this terminology and simply refer to the conditions as they are called.

      We would like to thank the reviewer for the constructive suggestions. We agree and have changed the term “autoimmune uveitis” to “non-infectious uveitis” throughout the manuscript.

      Further, it would have been very desirable to compare the DC transcriptome for the other class of uveitic disease -- infectious -- for acute retinal necrosis or similar. As well it would have been very useful to compare profiles to other, related immune-mediated diseases such as ankylosing spondylitis.

      We agree with the reviewer that comparison of DC transcriptomes is useful for interpretation of biological mechanisms involved. This is precisely the reason we use (in Figure 3) comparison of our DC transcriptomic data to well-controlled transgenic models and DC culture systems. This revealed NOTCH2-RUNX3 signaling driving the uveitis-associated CD1c+ DC signature. We have now included transcriptomic data from CD1c+ DC subsets of type I IFN diseases SLE and Systemic Sclerosis in Figure 2. Although we agree that comparison to infectious uveitis would be interesting, bulk RNA-seq data from CD1c+ DCs are – to the best of our knowledge – unfortunately not available.

      Finally, it must be noted that looking for systemic signals in dendritic gene expression may be a bit of a needle in the haystack approach. Presumably, the function of the dendritic cells in uveitis is largely centered on those cells in the eye. It would have been highly desirable to examine the expression profile of intraocular DCs in at least a subset of patients who may have come to surgery (for instance, steroid implantation or vitrectomy).

      We agree with the reviewer that analysis of blood requires enormous efforts and controls to dissect disease-relevant changes in gene profiles of cDC2 subsets. We therefore designed a strategy that focuses on replication of gene modules and uses independent cohorts and complementary immunophenotyping technologies to detect key changes in specific subsets of CD1c+ DCs in uveitis patients. To further extend these analyses, we have now also detailed our analysis of intraocular DCs using single-cell RNA seq of eye fluid biopsies (aqueous humor) of HLA-B27 anterior uveitis (identical to our “AU” group of patients). As shown in revised Figure 6, we detected eye-infiltrating CD1c+ DCs and were able to cluster cells positive for the uveitis-associated black module (revised Figure 6B), which showed, as expected, that “black-module+” CD1c+ DCs show higher expression for CD36, CX3CR1, and lower RUNX3, but not CD14 (revised Figure 6C), closely corroborating our blood CD1c+ DC analyses. These DC3s were also found at higher frequency in the eye of patients with AU (Figure 6D). We hope the reviewer agrees we have substantially improved the analysis of intraocular DCs and that this has now been sufficiently addressed.

      It is also problematic that no effort has been made to assess the severity of uveitis. Flares of disease can range from extremely mild to debilitating. Similarly, intermediate uveitis and BSCR can range greatly in severity. Without normalizing for disease severity it is difficult to fully understand the range of transcriptional changes between cases.

      In our view, a key limitation in determination of uveitis severity for molecular analysis is the fact that objective biomarkers that assess disease severity across uveitis entities are lacking. Currently, disease severity is dependent on an array of clinical features (i.e., SUN criteria) which cannot be applied consistently to anterior, intermediate and posterior uveitis. For example, the severity of anterior uveitis is in part assessed by grading of inflammation in the anterior chamber, while the anterior chamber is (typically) not involved in Birdshot Uveitis (BU in this study). However, to allow the study of patients with high disease activity, we exclusively used systemic treatment-free patients that all had active uveitis at sampling at our academic institute, making the results highly relevant for understanding the pathophysiology of non-infectious uveitis. For this reviewer’s convenience, we have conducted additional analysis that includes key clinical parameters (anterior chamber cells, vitreous cells, and macular thickness for patients from cohort I). These data showed no clear clustering of patients based on any of the clinical parameters (revised Figure 1 – Supplement 2). We hope the reviewer agrees this has been addressed in sufficient detail.

      The use of principal component analysis for clustering may be underpowered; I would suggest the authors apply UMAP to determine if higher dimensional component analyses correlate with disease type.

      Upon request of the reviewer, we have conducted UMAP (with different tuning of hyperparameters) on the DEGs (cohort I, see image below). We believe that the UMAP analysis did not provide additional insights or correlate with disease type. We hope the reviewer agrees.

      The false-discovery rate in large transcriptomic projects is challenging. While the authors are to be commended for employing a validation set, it would be useful to employ a Monte Carlo simulation in which groups are arbitrarily relabeled to determine the number of expected false discoveries within this data set (i.e. akin to Significance Analysis of Microarray techniques).

      We determined the adjusted P values via the DESeq2 package (using the Benjamini-Hochberg procedure at a false-discovery rate of 5%). The results are shown in Supplemental File 1K-1M and analysis in Figure 1A.
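
      As an illustration of the Benjamini-Hochberg adjustment named above, the step-up rule can be sketched as follows. This is a minimal pure-Python sketch, not the DESeq2 code itself; the function name `bh_adjust` and the toy p-values are assumptions made purely for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, smallest p first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes (not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20]
padj = bh_adjust(toy_p)
# Genes called significant at a 5% false-discovery rate.
significant = [p <= 0.05 for p in padj]
```

      In this toy case only the first two genes pass the 5% threshold, even though four raw p-values are below 0.05.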

      I do not fully understand the significance of the mouse CD11c-Runx3delta mice. It appears these data were derived from previous datasets or from bone marrow stromal line cultures. Did the authors attempt to generate autoimmune uveitis (i.e. EAU) in these animals? Without this the relevance for uveitis is unclear.

      We did not attempt to induce experimental autoimmune uveitis in CD11c-Runx3delta mice. We used transcriptomic data from dendritic cells purified from this model to show that loss of RUNX3 induces a gene signature highly reminiscent of the gene module identified in non-infectious uveitis patients. Using enrichment analysis, we show that the transcriptome of patients is highly enriched for this signature which indicates that the decreased RUNX3 observed in patients underlies the upregulation of CD36, CX3CR1 and other surface genes. In other words, we used data from transgenic models to dissect which of the altered transcription factors were driving this gene module and we identified the RUNX3-NOTCH2 axis as an important contributor.

    1. Author Response

      Reviewer #1 (Public Review)

      The documented findings may be explained by the artifact of task design and the way the signals were calculated: The vmPFC was the only ROI for which a positive correlation was found between BGA and mood rating and TML. Instead, most other regions showed negative correlation (incl. da-Insula, dorsolateral prefrontal cortex, the visual cortex, the motor cortex, the dorsomedial premotor cortex, the ventral somatosensory cortex, and the ventral inferior parietal lobule). This can be purely an artifact of the task itself: In 25% of mood rating trials, subjects were presented with a question. They had to move the cursor from left (very bad) to the right (very good) along a continuous visual analog scale (100 steps) with left and right-hand response buttons. They even got a warning if they were slow. In 75% of trials, subjects saw none of this and the screen was just blank and the subjects rested.

      1) First of all, it is unclear if the 25% and 75% trials were mixed. I am assuming that they were not mixed as that could represent a fundamental mistake. The manuscript gives me the impression that this was not done (please clarify).

      If by 25% and 75% trials the Reviewer means rating and no-rating trials then yes, they were intermixed (following Vinckier et al., 2018). As explained in the initial manuscript, mood was rated every 3-7 trials (for a total of 25% of trials), and we used a computational model to interpolate mood (i.e., theoretical mood level) for the trials in between. This was implemented to avoid sampling mood systematically after every feedback and to test whether vmPFC and daIns represent mood continuously or just when it must be rated. We do not see how this could represent a fundamental mistake. Note that the associations between BGA and mood hold whether we use only rating trials, or only no-rating trials, or both types of trials.

      To better explain how ratings and feedback were distributed across trials, we have added a supplementary figure that shows a representative example (Figure S1). This plot shows that ratings were collected independently of whether subjects were in high- or low-mood episodes. In other words, the alternation between rating and no-rating trials was orthogonal to the alternation between low- and high-mood episodes.

      2) Assuming that they were not mixed, we are seeing the data from 75% of trials only. These trials would trigger increased BGA activity in the default mode areas such as the vmPFC, and opposite patterns in the salience, visual and motor areas. Hence the opposite correlations. The authors should just plot BGA activity across regions during rest trials and see if this was the case. That would provide a whole different interpretation.

      Even if there were opposite correlations induced by the alternation between rating and no-rating trials, they would be orthogonal to mood fluctuations induced by positive and negative feedback. There is no way these putative opposite correlations could confound the correlation between BGA and mood, when restricted for instance to rating trials only. In any case, what the data show is not an opposite correlation between vmPFC and daIns (see figure R1 below) but that these two regions, when included as competing regressors in the same model, are both significant predictors of mood level. This could not be the case if vmPFC and daIns activities were just mirror reflections of the same factor (alternation of rating and no-rating trials).

      We agree with the argument that performing a task may activate (increase BGA in) the daIns and deactivate (decrease BGA in) the vmPFC, but this average level of activity is not relevant for our study, which explores trial-to-trial fluctuations. It would only be problematic if the alternation between rating and no-rating trials was 1) correlated to mood levels and 2) inducing (anti)correlations between vmPFC and daIns BGA. The first assumption is false by construction of the design, as explained above, and the second assumption is empirically false, as shown below by the absence of correlation between daIns and vmPFC BGA. For each trial, we averaged BGA during the pre-stimulus time window (-4 to 0 s) and tested the correlation between all possible pairs of vmPFC and daIns recording sites implanted in the same subject (n = 247 pairs of recording sites from 18 subjects). We observed no reliable correlation between the two brain regions, whether including only rest (no-rating) trials, only rating trials, or all trials together (see figure R1 below). On the contrary, the positive correlation between mood and vmPFC, as well as the negative correlation between mood and daIns, was observed in all cases (whether considering rest, rating, or all trials together).

      Figure R1: Correlation between vmPFC and daIns activities. Bars show the correlation coefficients, averaged across pairs of recording sites, obtained when including all trials, only rest trials (no rating), or only mood-rating trials. The p-values were obtained using a two-sided, one-sample Student’s t-test on Fisher-transformed correlation coefficients. Note that performing the same analysis across subjects (instead of recording sites) yields the same result.
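
      The test described in this caption (Fisher transformation of each correlation coefficient, followed by a one-sample Student's t-test against zero) can be sketched as below. This is a minimal illustration under stated assumptions, not the analysis code used for Figure R1: the helper names and the toy coefficients are hypothetical, and only the t statistic is computed because the Python standard library has no t distribution.

```python
import math
import statistics


def fisher_z(r):
    """Fisher transformation: maps a correlation r in (-1, 1) to an unbounded scale."""
    return math.atanh(r)


def one_sample_t(values, mu=0.0):
    """Return the t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return (mean - mu) / (sd / math.sqrt(n)), n - 1


# Hypothetical correlation coefficients, one per vmPFC-daIns pair of sites.
pair_r = [0.05, -0.02, 0.10, -0.08, 0.01, 0.03]
t_stat, dof = one_sample_t([fisher_z(r) for r in pair_r])
```

      A two-sided p-value would then be read from the t distribution with `dof` degrees of freedom, for example with `scipy.stats.ttest_1samp` applied to the transformed values.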

      3) In addition, it is entirely unclear how the BGA in a given electrode was plotted. How is BGA normalized for each electrode? What is baseline here? Without understanding what baseline was used for this normalization, it is hard to follow the next section about the impact of the intracerebral activity on decision-making.

      The normalization we used is neutral to the effect of interest. Details of BGA computation are given in the Methods section (lines 746-751):

      “For each frequency band, this envelope signal (i.e., time varying amplitude) was divided by its mean across the entire recording session and multiplied by 100. This yields instantaneous envelope values expressed in percentage (%) of the mean. Finally, the envelope signals computed for each consecutive frequency band were averaged together to provide a single time series (the broadband gamma envelope) across the entire session. By construction, the mean value of that time series across the recording session is equal to 100.”

      Then, BGA was simply z-scored over trials for every recording site. Thus, there was no baseline correction in the sense that there was no subtraction of pre-stimulus activity. We agree this would have been problematic, since we were precisely interested in the information carried by pre-stimulus activity. By z-scoring, we took as reference the mean activity over all trials.

      We added the following sentence in the Methods section (lines 755-756):

      “BGA was normalized for each recording site by z-scoring across trials.”
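
      The two normalization steps quoted above (envelope expressed as a percentage of its session mean and averaged across frequency bands, then z-scoring across trials) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names and toy values are assumptions, and the use of the sample standard deviation in the z-score is also an assumption, since the text does not specify it.

```python
import statistics


def percent_of_mean(envelope):
    """Express one band's envelope as a percentage of its mean over the session."""
    mean = statistics.fmean(envelope)
    return [100.0 * x / mean for x in envelope]


def broadband_envelope(band_envelopes):
    """Average the normalized envelopes of all bands, sample by sample."""
    normalized = [percent_of_mean(env) for env in band_envelopes]
    return [statistics.fmean(samples) for samples in zip(*normalized)]


def zscore(values):
    """Z-score a list of per-trial values for one recording site."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD
    return [(v - mean) / sd for v in values]


# Toy data: two frequency bands, four samples each (arbitrary units).
bands = [[1.0, 2.0, 3.0, 2.0], [10.0, 10.0, 30.0, 30.0]]
bga = broadband_envelope(bands)
# Here each sample stands in for one trial's baseline BGA.
trial_bga = zscore(bga)
```

      Because every band is divided by its own mean before averaging, the resulting series has a mean of 100 regardless of the raw amplitude of each band, and the z-scored values are referenced to the mean over all trials rather than to a pre-stimulus baseline.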

      4) line 237: how was the correction for multiple comparisons done? Subject by subject, ROI by ROI, electrode by electrode? Please clarify.

      The correction for multiple comparisons was done using a classic cluster-based permutation test (Maris & Oostenveld, 2007, J. Neurosci. Methods) performed at the level of ROI.

      We have updated the section detailing this method in the manuscript (lines 807-818), as follows:

      “For each ROI, a t-value was computed across all recording sites of the given ROI for each time point of the baseline window (-4 to 0 s before choice onset), independently of subject identity, using two-sided, one-sample, Student’s t-tests. For all GLMs, the statistical significance of each ROI was assessed through permutation tests. First, the pairing between responses and predictors across trials was shuffled randomly 300 times for each recording site. Second, we performed 60,000 random combinations of all contacts in a ROI, drawn from the 300 shuffles calculated previously for each site. The maximal cluster-level statistics (the maximal sum of t-values over contiguous time points exceeding a significance threshold of 0.05) were extracted for each combination to compute a “null” distribution of effect size across a time window from -4 to 0 s before choice onset (the baseline corresponding to the rest or mood assessment period). The p-value of each cluster in the original (non-shuffled) data was finally obtained by computing the proportion of clusters with higher statistics in the null distribution, and reported as the “cluster-level corrected” p-value (pcorr).”
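
      The cluster-level part of this procedure can be illustrated with a short sketch. It is a simplification under stated assumptions, not the authors' implementation: the significance threshold is passed directly as a t-value (`t_threshold`), clusters are assumed to group contiguous time points of the same sign, and the null time series below are toy values standing in for the shuffled data.

```python
def max_cluster_stat(t_values, t_threshold):
    """Largest absolute sum of t-values over a run of contiguous
    supra-threshold time points that share the same sign."""
    best, current, sign = 0.0, 0.0, 0
    for t in t_values:
        s = (t > t_threshold) - (t < -t_threshold)  # +1, -1 or 0
        if s != 0 and s == sign:
            current += t  # extend the current cluster
        else:
            # Start a new cluster, or reset when below threshold.
            current, sign = (t, s) if s != 0 else (0.0, 0)
        best = max(best, abs(current))
    return best


def cluster_p_value(observed_stat, null_stats):
    """Proportion of null maxima that exceed the observed cluster statistic."""
    return sum(s > observed_stat for s in null_stats) / len(null_stats)


# Toy time course of t-values across the baseline window (original data).
observed = max_cluster_stat([0.5, 2.5, 3.0, 2.2, 0.1, -2.4], t_threshold=2.0)
# Toy time courses standing in for shuffled (null) data.
shuffled = ([0.1, 2.1, 0.3], [2.5, 2.6, 2.7], [0.0, 0.2, -0.4], [-2.2, -2.3, 1.0])
null = [max_cluster_stat(ts, 2.0) for ts in shuffled]
p_corr = cluster_p_value(observed, null)
```

      Taking the maximum cluster statistic from each shuffle is what controls for multiple comparisons across time points: an observed cluster is only judged against the most extreme cluster that chance alone produced anywhere in the window.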

      Reviewer #2 (Public Review)

      “This study used intracranial EEG to explore links between broad-band gamma oscillations and mood, and their impact on decisions. The topic is interesting and important. A major strength is the use of intracranial EEG (iEEG) techniques, which allowed the authors to obtain electrical signals directly from deep brain areas involved in decision making. With its precise temporal resolution, iEEG allowed the authors to study activity in specific frequency bands. While the results are potentially interesting, one major concern with the analysis procedure (specifically, grouping of all data across all subjects and performing statistics across electrodes instead of across subjects) reduces enthusiasm for these findings. There is also a question about how mood impacts attentional state, which has already been shown to impact baseline (pre-stimulus) broad band gamma.”

      Major comments

      1) The number of subjects with contacts in vmPFC, daIns, and both vmPFC and daIns should be stated in the manuscript so the reader doesn't have to refer to the supplementary table to find this information.

      These details have been added to the Results section (lines 236-242 and 258-262), as follows:

      “The vmPFC (n = 91 sites from 20 subjects) was the only ROI for which we found a positive correlation (Figure 2b; Source data 1; Table S2) between BGA and both mood rating (best cluster: -1.37 to -1.04 s, sum(t(90)) = 122.3, pcorr = 0.010) and TML (best cluster: -0.57 to -0.13 s, sum(t(90)) = 132.4, pcorr = 8 × 10⁻³). Conversely, we found a negative correlation in a larger brain network encompassing the daIns (n = 86 sites from 28 subjects, Figure 2b; Source data 1; Table S2), in which BGA was negatively associated with both mood rating (best cluster: -3.36 to -2.51 s, sum(t(85)) = -325.8, pcorr < 1.7 × 10⁻⁵) and TML (best cluster: -3.13 to -2.72 s, sum(t(85)) = -136.4, pcorr = 9 × 10⁻³). (…) In order to obtain the time course of mood expression in the two ROIs (Figure 2c), we performed regressions between TML and BGA from all possible pairs of vmPFC and daIns recording sites recorded in a same subject (n = 247 pairs of recording sites from 18 subjects, see Methods) and tested the regression estimates across pairs within each ROI at each time point.”

      2) Effects shown in figs 2 and 3 are combined across subjects. We don't know the effective sample size for the comparisons being made, and the effects shown could be driven by just a few subjects. If the authors compute trial-wise regressions between mood and BGA for each subject, and then perform the statistics across subjects instead of across electrodes, do these results still pan out?

      Yes, we have redone the analyses at the group level to get statistics across subjects (see response to essential revisions). All main results remained significant or borderline. In these group-level random-effect analyses, data points are subject-wise BGA averaged across recording sites (within the temporal cluster identified with the fixed-effect approach). We have incorporated these analyses into the manuscript as a supplementary table (Table S4). However, these statistics across subjects are less standard in the field of electrophysiology, as they are both underpowered and unadjusted for sampling bias (because the same weight is given to subjects with 1 or 10 recording sites in the ROI), so we prefer to keep the usual statistics across recording sites in the main text.

      These analyses have been incorporated into the Results section (lines 355-357), as follows:

      “We also verified that the main findings of this study remained significant (or borderline) when using group-level random-effects analyses (Table S4, see methods), even if this approach is underpowered and unadjusted for sampling bias (some subjects having very few recording sites in the relevant ROI).”

      The methods section has also been edited, as follows (lines 831-835):

      “To test the association between BGA and mood, TML or choice at the group level (using random-effects analyses), we performed the same linear regression as described in the electrophysiological analyses section on BGA averaged over the best time cluster (identified by the fixed-effects approach) and across all recording sites of a given subject located in the relevant ROI. We then conducted a two-sided, one-sample Student's t-test on the resulting regression estimates (Table S4).”
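
      The group-level approach quoted above can be sketched as follows: BGA is first averaged across a subject's recording sites, one regression estimate is obtained per subject, and the estimates are then tested against zero across subjects. This is a minimal sketch under stated assumptions: the helper names and toy data are hypothetical, the regression is reduced to a single-predictor least-squares slope of BGA on mood, and only the t statistic is computed.

```python
import math
import statistics


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def subject_estimate(site_bga, mood):
    """Average BGA across a subject's sites (trial by trial), then regress on mood."""
    mean_bga = [statistics.fmean(trial) for trial in zip(*site_bga)]
    return slope(mood, mean_bga)


def group_t(estimates):
    """One-sample t statistic against zero, with its degrees of freedom."""
    n = len(estimates)
    se = statistics.stdev(estimates) / math.sqrt(n)
    return statistics.fmean(estimates) / se, n - 1


# Toy data: four trials, three subjects with one or two recording sites each.
mood = [0.0, 1.0, 2.0, 3.0]
subjects = {
    "s1": [[0.0, 1.0, 2.0, 3.0], [0.0, 3.0, 6.0, 9.0]],  # two sites
    "s2": [[1.0, 1.5, 2.0, 2.5]],                         # one site
    "s3": [[0.0, 1.0, 1.0, 3.0]],                         # one site
}
estimates = [subject_estimate(sites, mood) for sites in subjects.values()]
t_stat, dof = group_t(estimates)
```

      Each subject contributes exactly one estimate here, whatever the number of sites recorded, which is the equal weighting that the response describes as unadjusted for sampling bias.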

      3) Furthermore, how many of the subjects show statistically significant regressions between BGA and mood at any electrode? For example, the error bars in fig 2b are across electrodes. How would this figure look if error bars indicated variance across subjects instead?

      Depending on the metrics (mood rating or theoretical mood level), statistically significant regressions between BGA and mood were observed in 4 to 6 subjects for the vmPFC and 5 to 9 subjects in the daIns. We provide these numbers to satisfy the Reviewer’s request, but we do not see what statistical inference they could inform (inferences based on number of data points above and below significance threshold are clearly wrong). To satisfy the other request, we have reproduced below Fig. 2B with error bars indicating variance across subjects and not recording sites (Figure R2). Again, to make an inference about a neural representation at the population level, the relevant samples are recording sites, not subjects. All monkey electrophysiology studies base their inferences on the variance across neurons (typically coming from 2 or 3 monkeys pooled together).

      Figure R2: Reproduction of Figure 2B with lower panels indicating mean and variance across subjects instead of recording sites (upper panels). Blue: vmPFC, red: daIns. Bold lines indicate significant clusters (p < 0.05).

      4) In panel f, we can see that a large number of sites in both ROIs show correlations in the opposite direction to the reported effects. How can this be explained? How do these distributions of effects in electrodes correspond to distributions of effects in individual subjects?

      In our experience, this kind of pattern is observed in any biological dataset, so we do not understand what the Reviewer wants us to explain. For any significant effect across samples, the distribution will simply include some samples with effects in the opposite direction. If there were no effects in the opposite direction, nobody would need statistics to know whether the observed distribution is different from the null distribution. In our case, the variability might have arisen from different sources of noise (in mood estimate, in BGA recording, in stochastic fluctuations of pre-stimulus activity, in the link between mood and BGA that may depend on unknown factors, etc.). This variability has typically been masked because, until recently, effects of interest were plotted as means with error bars. The variability is more apparent when plotting individual samples, as we did. It is visually amplified by the fact that outliers are as salient as data points close to the mean, which are way more numerous but superimposed. We have replotted below the panel f with data points being subjects instead of recording sites (Figure R3).

      Figure R3: Reproduction of Figure 2F with lower panels showing the distribution of regression estimates over subjects instead of recording sites (upper panels). Blue: vmPFC, red: daIns. Note that this is the only analysis which failed to reach significance using a group-level random-effect approach. This is not surprising as this approach is underpowered (perhaps in particular for this analysis over a [-4 to 0 s] pre-choice time window) and unadjusted for sampling bias (some subjects having very few recording sites in the relevant ROI).

      5) Baseline (pre-stimulus) gamma amplitudes have been shown to be related to attentional states. Could these effects be driven by attention rather than mood? The relationship between mood and decisions may be more complex than the authors describe, and could impact other cognitive factors such as attention, which have already been shown to impact baseline broad-band gamma.

      We agree with the Reviewer that the relationships between mood and decisions are certainly more complex in reality than in our model, which is obviously a simplification, as any model is. We also acknowledge that pre-stimulus gamma activity is modulated by fluctuations in attention. However, what was measured and related to BGA in our study is mood level, so it remains unclear what reason could support the claim that the effects may have been driven by attention. A global shift in attentional state (like being more vigilant when in a good or bad mood) would not explain the specific effects we observed (making more or less risky choices). If the Reviewer means that subjects might have paid more attention to gain prospects when in a good mood, and to loss prospects when in a bad mood, then we agree this is a possibility. Note however that the difference between this scenario and our description of the results (subjects put more weight on gain/loss prospect when in a good/bad mood) would be quite subtle. We have nevertheless incorporated this nuance in the discussion (lines 494-496):

      “This result makes the link with the idea that we may see a glass half-full or half-empty when we are in a good or bad mood, possibly because we pay more attention to positive or negative aspects.”

      6) The authors used a bipolar montage reference. Would it be possible that effects in low frequencies are dampened because of the bipolar reference instead of common average reference?

      This is unlikely, because the use of a common average reference montage has been shown to significantly increase the number of channels exhibiting task-related high-frequency activity (BGA), but not the number of channels exhibiting task-related low-frequency activity (see Li et al., 2018, Figure 5A-B). In addition, using a monopolar configuration would also have the disadvantage of significantly increasing the correlations between channels (compared to a bipolar montage). This would have therefore artificially induced task-related effects in other channels due to volume conduction effects (Li et al., 2018; Mercier et al., 2017).

      Reviewer #3 (Public Review):

      In this interesting paper, Cecchi et al. collected intracerebral EEG data from patients performing decision-making tasks in order to study how patients' trial-by-trial mood fluctuations affect the neural computations underlying their risky choices. They found that the broadband gamma activity in vmPFC and dorsal anterior Insula (daIns) are distinctively correlated with the patients' mood and their choice. I found the results very interesting. This study certainly will be an important contribution to cognitive and computational neuroscience, especially how the brain may encode mood and associate it to decisions.

      Major comments

      1) The authors showed that the mood is positively correlated in vmPFC on high mood trials alone and negatively correlated in daIns on low mood trials alone. This is interesting. But are those the trials in which these regions' activity predicts choice (using the residual of choice model fit)?

      This is an excellent point. The intuition of Reviewer 3 was correct. To test it, we performed a complementary analysis in which we regressed choice (model fit residuals) against BGA, separately for low vs. high mood trials (median-split). This analysis revealed that in the vmPFC, BGA during high mood trials positively predicted choices whereas in the daIns, BGA during low mood trials negatively predicted choices.

      We have added the following paragraph in the Results section (lines 328-337):

      “Taken together, these results mean that vmPFC and daIns baseline BGA not only express mood in opposite fashion, but also had opposite influence on upcoming choice. To clarify which trials contributed to the significant association between choice and BGA, we separately regressed the residuals of choice model fit against BGA across either high- or low-mood trials (median split on TML; Figure 3b). In the vmPFC, regression estimates were significantly positive for high-mood trials only (high TML = 0.06 ± 0.01, t(90) = 5.64, p = 2 × 10⁻⁷; two-sided, one-sample, Student’s t-test), not for low-mood trials. Conversely, in the daIns, regression estimates only reached significance for low-mood trials (low TML = -0.05 ± 0.01, t(85) = -4.63, p = 1 × 10⁻⁵), not for high-mood trials. This double dissociation suggests that the vmPFC positively predicts choice when mood gets better than average, and the daIns negatively predicts choice when mood gets worse than average.”

      Also, Figure 3 has been modified accordingly.
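
      The median-split analysis described in this response can be sketched as below: trials are divided at the median TML, and the residuals of the choice model are regressed against BGA within each half. This is only an illustrative sketch. The function names and toy values are hypothetical, the residuals are taken as given rather than computed from a choice model, and assigning trials at exactly the median to the low-mood half is an assumption.

```python
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def median_split_slopes(tml, bga, residuals):
    """Regress choice residuals on BGA separately for low- and high-mood trials."""
    cutoff = statistics.median(tml)
    trials = list(zip(tml, bga, residuals))
    low = [(b, r) for m, b, r in trials if m <= cutoff]
    high = [(b, r) for m, b, r in trials if m > cutoff]
    return {"low": ols_slope(*zip(*low)), "high": ols_slope(*zip(*high))}


# Toy data for one recording site: eight trials.
tml = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
bga = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
residuals = [0.1, 0.1, 0.1, 0.1, 0.0, 0.2, 0.4, 0.6]
slopes = median_split_slopes(tml, bga, residuals)
```

      In this toy example BGA predicts the residuals only in the high-mood half, which mimics the pattern reported for the vmPFC. In the analysis described above, one such estimate per recording site would then be tested against zero across sites.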

      2) It would be helpful to see how high-mood trials and low-mood trials are distributed. Are they clustered or more intermixed?

      We thank the Reviewer for the suggestion. To provide a more detailed view on how feedback history shaped mood ratings and TML, we added a supplementary figure that shows a representative example (Figure S1).

      3) I am not sure how I should reconcile the above finding of the correlation between mood and BGA on high-mood vs. low-mood trials, and the results about how high vs. low baseline BGA predict choice. I may have missed something related to this in the discussion section, but could you clarify?

      Following the Reviewer’s suggestion, we now demonstrate that the vmPFC positively predicts choice when mood gets better than average, and the daIns negatively predicts choice when mood gets worse than average (see response to first point).

      To clarify this, we have added the following paragraph in the discussion (lines 461-469), and a schematic figure summarizing the main findings (Figure 4).

      “Choice to accept or reject the challenge in our task was significantly modulated by the three attributes displayed on screen: gain prospect (in case of success), loss prospect (in case of failure) and difficulty of the challenge. We combined the three attributes using a standard expected utility model and examined the residuals after removing the variance explained by the model. Those residuals were significantly impacted by mood level, meaning that on top of the other factors, good / bad mood inclined subjects to accept / reject the challenge. The same was true for neural correlates of mood: higher baseline BGA in the vmPFC / daIns was both predicted by good / bad mood and associated with higher accept / reject rates, relative to predictions of the choice model. Thus, different mood levels might translate into different brain states that predispose subjects to make risky or safe decisions (Figure 4).”

    1. Author Response

      Reviewer #1 (Public Review):

      As we lack empirical data of the response of most species to environmental changes, developing predictive tools based on traits that are easier to access or infer may help us developing better management tools. This is the case even for terrestrial mammals, a rather well studied group but with a large study bias towards temperate Europe and North America. This study uses maximum longevity, litter size and body mass to predict the sign and size of the relationships between annual temperature and precipitation anomalies and population growth rates, using the Living Planet database for times series of abundance and Chelsa for weather anomalies. The authors use a Bayesian framework to relate the size and absolute magnitude of the relationships between detrended population growth rates and weather anomalies, the framework accounting for the uncertainty in estimates as well as phylogenetic dependencies. They did not find any systematic effects -- on average the slopes of the relationships were close to 0 -- but the magnitude of the coefficients decreases for species with high maximum longevity and low litter size. Therefore, this study points to possible predictions of the magnitude of the response to weather variability using simple demographic indices such as longevity and litter size. The study has clear limitations that are common to similar "meta-regressions" using publicly available databases, but they are not ignored when discussing the results. One would hope that such limitations would lead to improving the quality of such databases, both in terms of taxonomic and geographic coverage as well as quality of data.

      We would like to thank Reviewer 1 for their overall positive feedback and constructive comments on the method and our predictions. We have now included complementary analyses based on high-quality subsets (≥ 20-year records; using life history traits estimated from structured population models), have clarified our set of hypotheses and discussed our results accordingly. Detailed responses are given below.

      I would like to challenge the authors in terms of why one would expect relationships of a given sign or magnitude. First with respect to sign of relationships, even for the same species and the same weather parameters, one could expect different signs depending on where the study is done with regards to the climatic niche. If one is close to the warm (or wet) edge, any positive temperature (or precipitation) anomalies would probably have a negative effect, but the reverse would happen when close to the cold or dry edge. There are studies showing such demographic and growth rate variability differences. I find therefore hard to interpret the sign of such weather anomalies and what it tells us about the "effect" of weather variability.

      We think that this is an important point to discuss with respect to the importance of within-species variability in population dynamics. Certainly, from the results L203-206 it is clear that populations of the same species can have responses of differing signs. It is also interesting to note that this may be the result of a population’s position in the climatic niche. However, aside from exploring this for species with long-term demographic monitoring across the range, we do not feel that exploring this was in the scope of the current study across species. We agree fully however that adding this perspective to studies of how populations are responding to changing climates is critical. As well as the paper mentioned below by Gaillard et al. (2013), recent work in Plantago lanceolata with extensive spatial replication has also begun to reveal these within-range dynamics as a function of latitudinal or climatic gradients (Römer et al. 2021). We have added further discussion of this to the manuscript L330-340. We believe that this point adds to the context of our results highlighting variability within-species. In addition, we have clarified in the introduction that no clear directional responses of populations to weather anomalies were expected among and within species L133-135.

      Römer, G., Christiansen, D. M., de Buhr, H., Hylander, K., Jones, O. R., Merinero, S., ... & Dahlgren, J. P. (2021). Drivers of large‐scale spatial demographic variation in a perennial plant. Ecosphere, 12(1), e03356.

      Second with regards to the magnitude, it is clear that the maximum growth rate is strongly linked to maximum longevity and litter size -- slow species have a much lower maximum rate of growth than fast species. So, one would expect that variability of population growth rates is larger in fast species than slow species, and therefore the magnitude of their response to environmental variability. Now the question might also be whether weather variability explains a smaller or larger proportion of the variability in population growth rates -- that is, does weather have a relatively larger influence in fast species than slow species? You might have the answer but with the multiple standardizations of the response and predictor variables it is not obvious (that is, when you standardize the response and predictor variables, coefficients are correlations, but this is across species, not for a given population).

      The reviewer raises a very interesting and important point on whether the patterns we observe are simply a result of larger variability in growth rates in short-lived species. We have two responses to this point: 1) while there is indeed larger variation in the population growth rates of short-lived species, we believe that this variability is likely an evolved life-history strategy in response to the environment, and thus a key component of patterns we observe, 2) we also feel that our use of models that included annual effects, and state-space models with explicit process-noise terms, account for any confounding effect of this variation.

      To address the first point in more detail, we expect that life-histories (and thus population dynamics) are evolved responses to the environment (Stearns, 1992). For ‘fast’ organisms therefore, their intrinsic life-history strategy results in boom-bust population dynamics relative to ‘slow’ species. This is clearly observable in transient or non-asymptotic dynamics, where short-lived species more often have short-term population dynamics with a greater magnitude (Stott et al. 2011). On this point, we therefore argue that this variation in population growth is part of what we are trying to capture. Anomalies in the weather are therefore expected to act more strongly in ‘fast’ species. Following this point and the comments of Reviewer #3, we have now included more explicit hypotheses in terms of life-history L133-144.

      For the second point, while we may expect this variability to be the result of dynamics we are trying to capture, this does not preclude other sources of variation in population size confounding the patterns we could observe. For example, hunting pressure may influence both short-term population variability and long-term trends. As a result, we aimed to capture this residual variation using auto-regressive terms for year in our GAMs. While these terms do not explicitly model variability in population growth, they do account for a component of the trend, with variation (error around the trend, which is expected to be larger for fast species), and auto-regressive components of population change. Moreover, we did additional analyses using a state-space modelling approach. In the state-space approach, process noise, which in our case would equate to variability in population growth, is explicitly modelled and accounted for. We therefore believe that our analyses account for residual variability in population growth rates. State space models were also highly correlated with our auto-regressive GAMs, and we can therefore conclude that we do not expect that this variability influences our findings. We have now asserted this in the Methods section L531-535.

      Stearns, S.C., 1992. The evolution of life histories (No. 575 S81).

      Stott, I., Townley, S. and Hodgson, D.J., 2011. A framework for studying transient dynamics of population projection matrix models. Ecology Letters, 14(9), pp.959-970.

      Your analyses remove trends -- that is, climate or other systematic change as opposed to weather anomalies (yearly differences) -- and trends might be the main concerns in terms of conservation. This is made clear in the discussion but perhaps not as much in the introduction where you seem to focus on climate change (the title reflects this well, however, as you mention weather, not climate). This confusion between weather and climate is often made in the literature, when reference is made to climate effects rather than weather effects.

      We agree with the reviewer that climate and weather are often conflated in ecological studies. We apologise for this oversight in the introduction, and agree that the narrative and link to weather was not made explicit in the previous version. Following this point and the suggestions of Reviewer #3, we have now restructured large sections of the introduction to improve the clarity of our hypotheses. To address this point, we have now included specific introduction of different components of climate that species populations may respond to, including short-term extreme weather patterns as we explore in this study. Please find this revised section L80-97.

      Finally, I would like to see a measure of how good is the prediction you can make using traits. You may have "significant effects" but not helping much in terms of prediction (see PB Adler et al. 2011 in Science, for an example with species richness and productivity).

      On this point we disagree with the reviewer. The core of our analysis framework was to examine the predictive performance of models. We do not report any significant effects, and instead use Bayesian inference. Throughout the analysis framework, we used explicit tests of out-of-sample predictive performance with leave-one-out cross validation (Vehtari et al. 2017). This is asserted in the manuscript title and results section when introducing our spatial analysis L188-191. Cross validation was combined with model selection to test the predictive performance of a set of candidate models with respect to base models excluding predictors of interest. This predictive performance framework was not applied to examine the directional effects (question 1), as these models did not contain key predictors. However, model selections using predictive performance were done throughout questions 2 and 3, to explore spatial and life-history effects. We highlight this point in both the results L188-191 and methods sections L608-615. In the case of life-history, we found that out-of-sample predictions were improved relative to the base model when including univariate life-history traits, and thus life-history traits aid in predicting weather responses.
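
      The logic of comparing out-of-sample predictive performance between a base model and a model with a life-history predictor can be sketched as follows. This is a simplified stand-in, not the authors' analysis: it uses brute-force leave-one-out refitting of a least-squares line rather than the Bayesian leave-one-out procedure of Vehtari et al. (2017), and the variable names and simulated data are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def loo_mse(x, y, use_predictor):
    """Mean squared prediction error when each point is held out in turn."""
    errors = []
    for i in range(len(y)):
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        if use_predictor:
            intercept, slope = fit_line(x_train, y_train)
            pred = intercept + slope * x[i]
        else:
            pred = statistics.fmean(y_train)  # base model: intercept only
        errors.append((y[i] - pred) ** 2)
    return statistics.fmean(errors)

# Simulated data (assumption): weather response magnitude declines with a trait.
rng = random.Random(1)
trait = [rng.gauss(0, 1) for _ in range(100)]
response = [-0.4 * t + rng.gauss(0, 0.5) for t in trait]

base_error = loo_mse(trait, response, use_predictor=False)
trait_error = loo_mse(trait, response, use_predictor=True)
print(base_error, trait_error)
```

      A predictor is considered useful in this framing only if the held-out error drops relative to the base model, regardless of whether its coefficient would pass a significance test.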

      We did not explore the relative predictive performance of life-history traits with respect to other traits such as dietary specialisation, which have been shown to be important in climate responses (Pacifici et al. 2017). We believe that this would have been out of scope for the purpose of the current study, where we aimed to test specific hypotheses established in life-history theory.

      Pacifici, M., Visconti, P., Butchart, S.H., Watson, J.E., Cassola, F.M. and Rondinini, C., 2017. Species’ traits influenced their response to recent climate change. Nature Climate Change, 7(3), pp.205-208.

      Vehtari, A., Gelman, A. and Gabry, J., 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and computing, 27(5), pp.1413-1432.

      Reviewer #2 (Public Review):

      Jackson et al. present a global analysis of the effects of life history on the response of terrestrial mammal populations to weather, showing that litter size and longevity significantly alter how populations respond to anomalies in temperature and rainfall. The topic is highly interesting, as it has implications for what data we should monitor to make more reliable predictions about species' responses to climatic change, and how we should prioritise which species to conserve by identifying those which might be at greatest risk.

      The authors comprehensively validate their results with substantial secondary analyses, and I believe that their assertions are supported by the results presented here. Whilst global scale analyses such as this provide useful generalities, they should be taken as that: an investigation of the general trends observed across large spatial scales, and caution should be taken extrapolating too far away from the species which have been analysed for this study.

      We thank the reviewer for their positive feedback, and agree with not drawing too many generalities from our findings. In the first paragraph of the discussion L253-262, we now explicitly refer to the results in the context of mammal population-dynamics/conservation.

      Reviewer #3 (Public Review):

      In this study, the authors aim to investigate how mammalian species are likely to respond to climate change. To this end, they investigate the effects of weather anomalies on the growth rates of mammalian populations. They use long-term population records for 157 terrestrial mammals from the Living Planet database. They explore three different questions using a two-step modelling approach: (1) whether temperature and precipitation anomalies have significant effects on population growth rates across species; (2) whether responses differ among species and biomes; and (3) whether life-history traits explain species responses to weather anomalies.

      The work undertaken in this manuscript is of broad appeal in the field and has the potential to inform conservation. Overall, the methodology is sound and the modelling framework robust; the authors took care to test the robustness of their models by fitting alternative sets of models. The two-step design of this study is interesting and the choice of the study system is relevant for the questions the authors aim to tackle. The authors also paid attention to some important points that are at times overlooked such as resolving taxonomy before running their analyses. I also appreciated the fact that the authors made their code available.

      We thank the reviewer for their positive feedback on the manuscript, which highlights many of our key goals with the paper.

      I nevertheless think that, in its present form, the main weakness of this manuscript is the clarity of the writing, the framing of the study and the overall flow. I found the manuscript at times a bit difficult to follow. That said, I think there is much scope for the authors to improve it. First, I think the work would benefit from better explanation of the underlying hypotheses. Second, in some places I think the authors go into a lot of details at the expense of clarity. As such, I think the authors should strive to better balance clarity with detailed information (notably in the results and methods; adding summary sentences, for example, could help clarify these sections). Third, I think there is room for improvement in the narrative and the flow of the introduction and the discussion. Finally, I think stronger justifications are sometimes required regarding specific points of the analysis.

      I believe that the conclusions of this work are supported by the data and the analyses, and think they are of interest and relevant to the field. However, I think the discussion should highlight the main limitations of the study. In particular, I think the biases in the data should be discussed, and notably whether these biases are expected to affect the results (and if so, in what way).

      To conclude, I think that beyond the aforementioned weaknesses of this study, the results and the methods are of interest for the field. I think the modelling framework is applicable to other study systems and relevant to the field as well.

      We warmly thank the reviewer for their positive words and thorough constructive feedback. We have extensively re-worked large sections of the manuscript (particularly the discussion and introduction) based on these points, and done our best to address all of them. Generally, we have strived to improve the clarity and succinctness of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Guggenmos proposes a process model for predicting confidence reports following perceptual choices, via the evidence available from stimuli of various intensities. The mechanisms proposed are principled, but a number of choices are made that should be better motivated - I develop below a number of concerns by order of importance.

      I’d like to thank the reviewer for their thorough and excellent review. It’s no set phrase that this review substantially improved the manuscript.

      1) Lack of separability of the two metacognitive modules.

      Can the author show that the proposed model can actually discriminate between the noisy readout module and the noisy report module? The two proposed modules have a different psychological meaning, but seem to similarly impact the confidence output. Are these two mutually exclusive (as Fig 1 suggests), or could both sources of noise co-exist? It will be important to show model recovery for introducing readout vs. report at the metacognitive level, e.g., show that a participant best-fitted by a nested model or a subpart of the full model, with a restricted number of modules (some of the parameters set to zero or one), is appropriately recovered? (focusing on these two modules) This raises the question of how the two types of sigma_m are recoverable/separable from each other (and should they both be called sigma_m, even if they both represent a standard deviation)? If they capture independent aspects of noise, one could imagine a model with both modules. More evidence is needed to show that these two capture separate aspects of noise.

      Testing the separability of the two noise types (readout, report) is a great idea and I have now performed a corresponding recovery analysis. Specifically, I have simulated data with both noise types for different regimes of sensory and metacognitive noise. As shown in the new Figure 7—figure supplement 6, the noise type can be precisely recovered in the most typical regimes.

      I now refer to this analysis in the subsection 2.4 Model recovery (Line 521ff):

      “One strength of the present modeling framework is that it allows testing whether inefficiencies of metacognitive reports are better described by metacognitive noise at readout (noisy-readout model) or at report (noisy-report model). To validate this type of application, I performed an additional model recovery analysis which tested whether data simulated by either model are also best fitted by the respective model. Figure 7—figure supplement 6 shows that the recovery probability was close to 1 in most cases, thus demonstrating excellent model identifiability. With fewer trials per observer, recovery probabilities decrease expectedly, but are still at a very good level. The only edge case with poorer recovery was a scenario with low metacognitive noise and high sensory noise. Model identification is particularly hard in this regime because low metacognitive noise reduces the relevance of the metacognitive noise source, while high sensory noise increases the general randomness of responses.”
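
      The structure of such a model recovery analysis can be sketched with two toy models. The sketch below is illustrative only: Gaussian and Laplace noise models stand in for the noisy-readout and noisy-report models (they are not the models of the manuscript and do not use the toolbox), and the function names, sample sizes, and AIC-based decision rule are assumptions.

```python
import math
import random
import statistics

def loglik_gauss(data):
    """Maximized log-likelihood of a Gaussian (mean and variance fitted)."""
    mu = statistics.fmean(data)
    var = statistics.fmean([(d - mu) ** 2 for d in data])
    return -0.5 * len(data) * (math.log(2 * math.pi * var) + 1)

def loglik_laplace(data):
    """Maximized log-likelihood of a Laplace (location and scale fitted)."""
    med = statistics.median(data)
    scale = statistics.fmean([abs(d - med) for d in data])
    return -len(data) * (math.log(2 * scale) + 1)

def aic(loglik, n_params=2):
    return 2 * n_params - 2 * loglik

SIMULATORS = {
    "gauss": lambda rng: rng.gauss(0.0, 1.0),
    # The difference of two unit exponentials follows a Laplace distribution.
    "laplace": lambda rng: rng.expovariate(1.0) - rng.expovariate(1.0),
}
FITTERS = {"gauss": loglik_gauss, "laplace": loglik_laplace}

def recovery_probability(true_model, n_datasets=100, n_trials=500, seed=0):
    """Fraction of simulated datasets for which the generating model wins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        data = [SIMULATORS[true_model](rng) for _ in range(n_trials)]
        best = min(FITTERS, key=lambda name: aic(FITTERS[name](data)))
        hits += best == true_model
    return hits / n_datasets

recovery = {name: recovery_probability(name) for name in SIMULATORS}
print(recovery)
```

      Lowering `n_trials` in this toy example reduces the recovery probability, mirroring the dependence on trial number described in the quoted paragraph.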

      In principle, both noise modules can co-exist and model inversion should be possible (though mathematically more complicated). On the other hand, I anticipate that parameter recovery would be extremely noisy in such a scenario. For this work, I decided to not test this possibility as it would add a lot of complexity, with a high probability of ultimately being unfeasible.

      2) The trade-off between the flexibility of the model (modularity of the metacognitive part, choice of the link functions) and the generalisability of the process proposed seems in favor of the former. Does the current framework really allow to disambiguate between the different models? Or at least, the process modeled is so flexible that I am not sure it allows us to draw general conclusions? Fig 7 and section 3 of the results explain that all models are similar, regardless of module of functions specified; Fig 7 supp shows that half of participants are best fitted by noisy readout, while the other half is best fitted by noisy report; plus, idiosyncrasies across participants are all captured. Does this compromise the generalisability of the modeling of the group as a whole?

      This is a fair point and I understand the question has two components: a) is the model too flexible, potentially preventing generalized conclusions? b) is the flexibility of the model recoverable?

      Regarding a), I should emphasize that the manuscript (and toolbox) provides a modeling framework, rather than a single specific model. In other words, researchers applying the framework/toolbox must make a number of decisions: which noise type? which metacognitive biases should be considered? which link function? To ensure interpretability / generalizability, researchers have to sufficiently constrain the model. Due to this framework character, it makes sense that the manuscript is submitted under the Tools & Resources Article format rather than the Research Article format.

      On the other hand, I agree that it is the duty of the manuscript introducing the framework to provide all necessary information to help the researcher make these decisions. This is where the reviewer’s point b) is critical and I hope that with the new parameter and model recovery analyses in the present revision (see other comments) I meet this requirement to a satisfactory degree.

      To clarify the scope and aim of the paper, I now put a new subsection in front of the example application to the data from Shekhar and Rahnev, 2021 (Line 534ff):

      “It is important to note that the present work does not propose a single specific model of metacognition, but rather provides a flexible framework of possible models and a toolbox to engage in a metacognitive modeling project. Applying the framework to an empirical dataset thus requires a number of user decisions: which metacognitive noise type is likely more dominant? which metacognitive biases should be considered? which link function should be used? These decisions may be guided either by a priori hypotheses of the researcher or can be informed by running a set of candidate models through a statistical model comparison. As an exemplary workflow, consider a researcher who is interested in quantifying overconfidence in a confidence dataset with a single parameter to perform a brain-behavior correlation analysis. The concept of under/overconfidence already entails the first modeling decision, as only a link function that quantifies probability correct (Equation 6) allows for a meaningful interpretation of metacognitive bias parameters. Moreover, the researcher must decide on a specific metacognitive bias parameter. The researcher may not be interested in biases at the level of the confidence report, but, due to a specific hypothesis, rather in metacognitive biases at the level of readout/evidence, thus leaving a decision between the multiplicative and the additive evidence bias parameter. Also, the researcher may have no idea whether the dominant source of metacognitive noise is at the level of the readout or report. To decide between these options, the researcher computes the evidence (e.g., AIC) for all four combinations and chooses the best-fitting model (ideally, this would be in a dataset independent from the main dataset).”

      In addition, the website of the toolbox now provides a lot more information about typical use cases: https://github.com/m-guggenmos/remeta

      3) More extensive parameter recovery needs to be done/shown. We would like to see a proper correlation matrix between parameters, and recovery across the parameter space, not only for certain regimes (i.e. more than fig 6 supp 3), that is, the full grid exploration irrespective of how other parameters were set.

      The recovery of the three metacognitive bias parameters is displayed in Fig 4, but what about the other parameters? We need to see that they each have a specific role. The point in the Discussion "the calibration curves and the relationships between type 1 performance and confidence biases are quite distinct between the three proposed metacognitive bias parameters may indicate that these are to some degree dissociable" is only very indirect evidence that this may be the case.

      A comprehensive parameter recovery analysis is indeed a key analysis that was missing in the first version of the manuscript. I now performed several analyses to address this, rewrote and extended section 2.3 on parameter recovery. The new parameter recovery analysis was performed as follows (Line 455ff):

      “To ensure that the model fitting procedure works as expected and that model parameters are distinguishable, I performed a parameter recovery analysis. To this end, I systematically varied each parameter of a model with metacognitive evidence biases and generated data. Specifically, each of the six parameters (σs, ϑs, δs, σm, 𝜑m, δm) was varied in 500 equidistant steps between a sensible lower and upper bound. The model was then fit to each dataset. To assess the relationship between fitted and generative parameters, I computed linear slopes between each generative parameter (as the independent variable) and each fitted parameter (as the dependent variable), resulting in a 6 x 6 slope matrix. Note that I computed (robust) linear slopes instead of correlation coefficients, as correlation coefficients are sample-size-dependent and approach 1 with increasing sample size even for tiny linear dependencies. Thus, as opposed to correlation coefficients, slopes quantify the strength of a relationship. Comparability between the slopes of different parameters is given because i) slopes are – like correlation coefficients – expected to be 1 if the fitted values precisely recover the true parameter values (i.e., the diagonal of the matrix) and ii) all parameters have a similar value range which makes a comparison of off-diagonal slopes likewise meaningful. To test whether parameter recovery was robust against different settings of the respective other parameters, I performed this analysis for a coarse parameter grid consisting of three different values for each of the six parameters except σm, for which five different values were considered. This resulted in 3⁵·5¹ = 1215 slope matrices for the entire parameter grid.”
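
      The two ingredients of this analysis, the coarse parameter grid and the slope between generative and fitted values, can be sketched as follows. Several things here are assumptions rather than details from the manuscript: the grid values are placeholders, the Theil-Sen estimator is one possible choice of "robust" slope (the estimator actually used is not named in the quoted text), and the model fit is faked by adding noise to the generative value.

```python
import itertools
import random
import statistics

def theil_sen_slope(x, y):
    """Robust slope: median of all pairwise slopes (Theil-Sen estimator)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return statistics.median(slopes)

# Coarse grid: three values per parameter, five for sigma_m (placeholder values).
grid_values = {
    "sigma_s": [0.2, 0.5, 1.0],
    "theta_s": [0.0, 0.1, 0.2],
    "delta_s": [-0.1, 0.0, 0.1],
    "sigma_m": [0.1, 0.3, 0.5, 1.0, 2.0],
    "phi_m": [0.5, 1.0, 1.5],
    "delta_m": [-0.1, 0.0, 0.1],
}
grid = list(itertools.product(*grid_values.values()))
n_settings = len(grid)  # 3**5 * 5**1 grid points, one slope matrix each

# Toy recovery for one grid point: sweep one generative parameter and mimic
# an unbiased but noisy fit; a second parameter is held fixed.
rng = random.Random(0)
generative = [0.1 + 0.9 * k / 99 for k in range(100)]
fitted_same = [g + rng.gauss(0, 0.05) for g in generative]
fitted_other = [0.5 + rng.gauss(0, 0.05) for _ in generative]

diag_slope = theil_sen_slope(generative, fitted_same)      # expected near 1
offdiag_slope = theil_sen_slope(generative, fitted_other)  # expected near 0
print(n_settings, diag_slope, offdiag_slope)
```

      Unlike a correlation coefficient, the off-diagonal slope in this sketch stays near zero however many steps are simulated, which is the property the quoted paragraph relies on.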

      In addition, I computed additional supplementary analyses assessing a case with fewer trials, a model with confidence biases, and models with mixed evidence and confidence biases. For details about these analyses, I kindly point the reviewer to section 2.3. Together, these new analyses demonstrate that parameter recovery works extremely well across different regimes and for all model parameters, including the metacognitive bias parameters mentioned in the reviewer’s comment.

      1.8: It would be important to report under what regimes of other parameters these simulations were conducted. This is because, even if dependence of Mratio onto type 1 performance is reproduced, and that is not the case for sigma_m, it would be important to know whether that holds true across different combinations of the other parameter values.

      I now repeated this analysis for various settings of other parameters and include the results as new Figure 6—figure supplement 2. While the settings of other parameters affect the type 1 performance dependency of Mratio (with some interesting effects such as Mratio > 1), parameter recovery of sigma_m is largely unaffected. The same basic point thus holds: Mratio shows a nonlinear dependency with type 1 performance, but sigma_m can be recovered largely without bias under most regimes (the main exception is a combination of low sensory noise and high metacognitive noise under the noisy-readout model, which is also mentioned in the manuscript).

      Is lambda_m meaningfully part of the model, and if so, could it be introduced into the Fig 1 model, and be properly part of the parameter recovery?

      I now reworked the part about metacognitive biases to make it more consistent and to introduce lambda_m on equal footing with the other metacognitive bias parameters. I now distinguish between metacognitive evidence biases (the two main bias parameters of the original model, phi_m and theta_m) and metacognitive confidence biases, i.e. lambda_m and a new additive confidence bias parameter kappa_m. The schematic presentation of the model framework in Figure 1 has been updated accordingly.

      This change also complies with reviewer 2, who rightfully pointed out that the original model framework put much stronger emphasis on bias parameters loading on evidence than on confidence. The metacognitive confidence bias parameters are now also part of the parameter recovery analyses (Figure 7—figure supplement 2).

      While it is still feasible to combine the two evidence-related bias parameters and lambda_m – as queried by the reviewer – not all mixed combinations of evidence- and confidence-related bias parameters perform well in terms of model recovery (in particular, combining all four parameters; cf. Figure 7—figure supplement 3). Hence, a decision on the side of the modeler is required. I comment on this important aspect at the end of the section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery can become unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios one or two metacognitive bias parameters are a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

      4) An important nuance in comparing the present sigma_m to Mratio is that the present model requires that multiple difficulty levels are tested, whereas instead, the Mratio model based on signal detection theory assumes a constant signal strength. How does this impact the (unfair?) comparison of these two metrics on empirical data that varied in difficulty level across trials? Relatedly, the Discussion paragraph that explained how the present model departs from type 2 AUROC analysis similarly omits to account for the fact that studies relying on the latter typically intend to not vary stimulus intensity at the level of the experimenter.

      I thank the reviewer for this comment, which made me realize that I incorrectly assumed that my model requires multiple stimulus difficulty levels. The only parameter that would require multiple stimulus intensities is the sensory threshold parameter, but for this parameter I already state that it requires additional stimulus difficulties close to threshold (Line 147ff). Beyond that, I have now run extensive tests confirming that the model works just fine with constant stimuli. My reasoning mistake was related to the fact that I fit a metacognitive link function, which I thought would require variance on the x-axis; but of course there is already plenty of variance introduced through noise at the sensory level, so multiple difficulty levels are not required to fit the metacognitive level. I have now removed the relevant references to this requirement from the manuscript.

      Nevertheless, I agree that it is interesting to perform the comparison between Mratio and sigma_m also for a scenario with constant stimuli. See both the new Figure 6–supplement 1 with constant stimuli, and the (updated) main Figure 6 with multiple stimulus levels for comparison.

      The general point still holds also for constant stimuli: Mratio is not independent of type 1 performance. Thus, the observed dependence on type 1 performance is not due to the presence of varying stimulus levels. I now reference this new supplementary figure in Result section 1.8 (Line 389).

      5) 'Parameter fitting minimizes the negative log-likelihood of type 1 choices (sensory level) or type 2 confidence ratings (metacognitive level)'. Why not fitting both choices and confidence at the same time instead of one after the other? If I understood correctly, it is an assumption that these are independent, why not allow confidence reports to stem from different sources of choice and metacognitive noise? Is it because sensory level is completely determined by a logistic (but still, it produces the decision values that are taken up to the metacognitive level)?

      The decision to separate the two levels during parameter inference was deliberate. I now explain this choice in the beginning of Result section 2 (Line 416ff):

      “The reason for the separation of both levels is that choice-based parameter fitting for psychometric curves at the type 1 / sensory level is much more established and robust compared to the metacognitive level for which there are more unknowns (e.g., the type of link function or metacognitive noise distribution). Hence, the current model deliberately precludes the possibility that the estimates of sensory parameters are influenced by confidence ratings.”

      Indeed, I would regard it as highly problematic if the estimates of sensory parameters were influenced by confidence ratings, which are shaped by a manifold of interindividual quirks and biases and for which computational models are still in a developmental stage. Yet, from a pure simulation-based parameter recovery perspective, in which the true confidence model is known, using confidence ratings would indeed make sensory parameter estimation more precise (because of the rich information contained in continuous confidence ratings which is lost in the binarization of type 1 choices).
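
      The separation of levels described above can be illustrated with a minimal sketch of the first (sensory) stage only, in which sensory noise is estimated from type 1 choices alone and confidence ratings never enter the likelihood. The logistic form, the grid search, and the names `choice_nll`, `fit_sensory_noise` and `simulate_choices` are illustrative assumptions, not the implementation used in the toolbox.

```python
import math
import random


def choice_nll(sigma_s, stimuli, choices):
    """Negative log-likelihood of binary choices under an assumed logistic
    psychometric function with noise parameter sigma_s."""
    nll = 0.0
    for x, choice in zip(stimuli, choices):
        p_one = 1.0 / (1.0 + math.exp(-x / sigma_s))
        p = p_one if choice == 1 else 1.0 - p_one
        nll -= math.log(max(p, 1e-12))  # guard against log(0)
    return nll


def fit_sensory_noise(stimuli, choices, grid):
    """Stage 1: pick the sigma_s on the grid that minimizes the choice NLL.
    Confidence ratings are deliberately not used here."""
    return min(grid, key=lambda s: choice_nll(s, stimuli, choices))


def simulate_choices(stimuli, sigma_s, rng):
    """Generate synthetic type 1 choices from the same assumed logistic model."""
    return [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-x / sigma_s)) else 0
        for x in stimuli
    ]


rng = random.Random(1)
stimuli = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
choices = simulate_choices(stimuli, sigma_s=0.3, rng=rng)
grid = [0.05 * k for k in range(1, 21)]
sigma_hat = fit_sensory_noise(stimuli, choices, grid)
```

      In a second stage, the metacognitive parameters would be fitted to the confidence ratings with `sigma_hat` held fixed, so that idiosyncrasies of confidence reports cannot leak into the sensory estimate.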

      6) Fig 4 left panels: could you clarify the reasoning that due to sensory noise, overconfidence is expected, instead of having objective and subjective probability correct aligning on the diagonal? Shouldn't the effects of sensory noise average out? In other words, why would the presence of sensory noise systematically push towards overconfidence rather than canceling out on average?

      As an intuitive explanation consider the case that no signal is present in a stimulus, e.g., a line grating in a clockwise/counterclockwise orientation discrimination task with an angle of 0 degrees. Since there is no true information in the stimulus, type 1 performance will be at chance level irrespective of sensory noise.

      However, sensory noise matters for the metacognitive level. Assuming no sensory noise (i.e., sigma_s = 0), the observer’s stimulus/decision variable would be zero and thus confidence would be zero. Thus, confidence would exactly match type 1 performance. Yet, assuming the presence of sensory noise, the stimulus estimate (“decision value”) will be always different from point-zero, if ever so slightly. While the average estimate of the stimulus variable across trials will indeed cancel out to zero, each individual trial will be different from zero (in either direction) and hence also the confidence will be different from zero in each trial. Since confidence is unsigned, the average confidence will be greater than zero and thus give the impression of an overconfident observer.

      Note that this explanation was implicitly included in the paragraph on the 0.75 signature of confidence (“When evidence discriminability is zero, an ideal Bayesian metacognitive observer will show an average confidence of 0.75 and thus an apparent (over)confidence bias of 0.25. Intuitively this can be understood from the fact that Bayesian confidence is defined as the area under a probability density in favor of the chosen option. Even in the case of zero evidence discriminability, this area will always be at least 0.5 − otherwise the other choice option would have been selected, but often higher.”, Line 257ff).
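
      The zero-signal argument can also be checked numerically. The sketch below assumes that the decision value on each trial is drawn from a normal distribution centered on zero with standard deviation sigma_s, and that Bayesian confidence is the normal cumulative probability in favor of the chosen option; the function names are hypothetical and chosen for illustration.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_confidence_zero_signal(sigma_s, n_trials=100_000, seed=0):
    """Average confidence (probability-correct scale) when the stimulus
    carries no signal and only sensory noise drives the decision value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        y = rng.gauss(0.0, sigma_s)             # noisy decision value, mean zero
        total += normal_cdf(abs(y) / sigma_s)   # area in favor of the chosen side
    return total / n_trials


low_noise = mean_confidence_zero_signal(sigma_s=0.2)
high_noise = mean_confidence_zero_signal(sigma_s=2.0)
```

      Under these assumptions both values come out close to 0.75 regardless of sigma_s: the signed decision values average out to zero, but the unsigned per-trial confidence is spread evenly between 0.5 and 1, which yields the apparent overconfidence of 0.25 relative to chance performance.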

      7) The same analysis as Fig 6 but for noisy readout instead of noisy reports do not show the same results: both sigma_m and m-ratio vary as a function of type 1 performance. Does this mean that the present model with readout module does not solve the issue of dependency upon type 1 performance?

      I refer to this in the Result section: “The exception is a regime with very high metacognitive noise and low sensory noise under the noisy-readout model, in which recovery becomes biased” (Line 391ff). Indeed, the type 1 performance dependency of sigma_m recovery in this edge case is not as good as in the noisy-report model. However, note that recovery is stable across a large range of d’ including the range typically aimed for in metacognition experiments (i.e., medium performance levels to ensure sufficient variance in confidence ratings).

      It is also important to point out that a failure to recover true parameters under certain conditions is not a failure of the model, but a reflection of the fact that information can be lost at the level of confidence reports. For example, if sensory noise is very high, the relationship between evidence and confidence becomes essentially flat (Figure 3), producing confidence ratings close to zero irrespective of the level of stimulus evidence. It becomes increasingly impossible to recover any parameters in such a scenario. Vice versa if sensory noise is extremely low, confidence ratings approach a value of 1 irrespective of stimulus evidence, and the same issue arises. In both cases there is no meaningful variance for an inference about latent parameters. This issue is more pronounced in the noisy-readout case because it requires an inversion of precisely the relationship between evidence and confidence.

      8) In Eq8, could you explain why only the decision values consistent with the empirical choice are filtered. Is this an explicit modeling of the 'decision-congruence' phenomenon reported elsewhere (eg. Peters et al 2017)? What are the implications of not keeping only the congruent decision values?

      I apologize, this was a mistake in the manuscript. The integration is over all decision values, not just those consistent with the choice. I corrected it accordingly.

      Reviewer #2 (Public Review):

      This paper presents a novel computational model of confidence that parameterises links between sensory evidence, metacognitive sensitivity and metacognitive bias. While there have been a number of models of confidence proposed in the literature, many of these are tailored to bespoke task designs and/or not easily fit to data. The dominant model that sees practical use in deriving metacognitive parameters is the meta-d' framework, which is tailored for inference on metacognitive sensitivity rather than metacognitive biases (over- and underconfidence). This leaves a substantial gap in the literature, especially as in recent years many interesting links between metacognitive bias and mental health have started to be uncovered. In this regard, the ReMeta model and toolbox is likely to have significant impact on the field, and is an excellent example of a linked publication of both paper and code. It's possible that this paper could do for metacognitive bias what the meta-d' model did for metacognitive sensitivity, which is to say have a considerable beneficial impact on the level of sophistication and robustness of empirical work in the field.

      The rationale for many of the modelling choices is clearly laid out and justified (such as the careful handling of "flips" in decision evidence). My main concern is that the limits to what can be concluded from the model fits need much clearer delineation to be of use in future empirical work on metacognition. Answering this question may require additional parameter/model recovery analysis to be convincing.

      I thank the reviewer for these encouraging and constructive comments!

      Specific comments:

      • The parameter recovery demonstrated in Figure 4 across range of d's is impressive. But I was left wondering what happens when more than one parameter needs to be inferred, as in real data. These plots don't show what the other parameters are doing when one is being recovered (nor do the plots in the supplement to Figure 6). The key question is whether each parameter is independently identifiable, or whether there are correlations in parameter estimates that might limit the assignment of eg metacognitive bias effects to one parameter rather than another. I can think of several examples where this might be the case, for instance the slope and metacognitive noise may trade off against each other, as might the slope and delta_m. This seems important to establish as a limit of what can be inferred from a ReMeta model fit.

      This is an excellent point and was also raised by reviewer #1. See major comment 3 of reviewer #1 for a detailed response. In short, I now provide comprehensive analyses that demonstrate successful parameter recovery across different regimes and both noise types (noisy-readout, noisy-report). See Figure 7.

      Regarding the anticipated trade-offs between the confidence slope (now referred to as multiplicative evidence bias) and metacognitive noise / delta_m (now additive evidence bias), there is a single scenario in which this becomes an issue. I describe this in the Results section as follows (Line 480ff):

      “Here, the only marked trade-off emerges between metacognitive noise σm and the metacognitive evidence biases (𝜑m, δm) in the noisy-readout model, under conditions of low sensory noise. In this regime, the multiplicative evidence bias 𝜑m becomes increasingly underestimated and the additive evidence bias δm overestimated with increasing metacognitive noise. Closer inspection shows that this dependency emerges only when metacognitive noise is high – up to σm ≈ 0.3 no such dependency exists. It is thus a scenario in which there is little true variance in confidence ratings (due to low sensory noise many confidence ratings would be close to 1 in the absence of metacognitive noise), but a lot of measured variance due to high metacognitive noise. It is likely for this reason that parameter inference is problematic. Overall, except for this arguably rare scenario, all parameters of the model are highly identifiable and separable.” In my experience, certain trade-offs in specific edge cases are almost inescapable for more complex models. Overall, I think it is fair to say that parameter recovery works extremely well, including the ‘trinity’ of metacognitive noise / multiplicative evidence bias / additive evidence bias.

      • Along similar lines, can the noisy readout and noisy report models really be distinguished? I appreciate they might return differential AICs. But qualitatively, it seems like the only thing distinguishing them is that the noise is either applied before or after the link function, and it wasn't clear whether this was sufficient to distinguish one from the other. In other words, if you created a 2x2 model confusion matrix from simulated data (see Wilson & Collins, 2019 eLife) would the correct model pathway from Figure 1 be recovered?

      Great point. I introduced a new subsection 2.4 “Model recovery”, in which I demonstrate successful recovery of noisy-readout versus noisy-report models. See also my response to the first comment of Reviewer #1, which includes the new model recovery figure and the associated paragraph in the manuscript. The key new figure is Figure 7—figure supplement 6.

      • Again on a similar theme: isn't the slope parameter rho_m better considered a parameter governing metacognitive sensitivity, given that it maps the decision values onto confidence? If this parameter approaches zero, the function flattens out which seems equivalent to introducing additional metacognitive noise. Are these parameters distinguishable?

      Indeed, the parameter recovery analysis shows a slight negative correlation between the slope parameter (now termed multiplicative evidence bias) and metacognitive noise (Figure 7). As the reviewer mentions, this is likely caused by the fact that both parameters lead to a flattening/steepening of the evidence-confidence relationship. For reference, in the empirical dataset by Shekhar & Rahnev, the correlation between AUROC2 and the multiplicative evidence bias is almost absent at r = −0.017. Critically, however, while an increase of the metacognitive noise parameter σm will ultimately lead to a truly flat/indifferent relationship between evidence and confidence, the multiplicative evidence parameter 𝜑m only affects the slope (i.e., asymptotically confidence will still reach 1). This is one reason why parameter recovery for both σm and 𝜑m works overall very well. The differential effects of σm and 𝜑m are now better illustrated in the updated Figure 3:

      Also conceptually, the multiplicative evidence parameter 𝜑m plausibly represents a metacognitive bias, with either interpretation that I suggest in the manuscript: as an under/overestimation of the evidence or as an over/underestimation of one’s own sensory noise, leading to under/overconfidence, respectively. In sum, I think there are strong arguments for the present formalization and interpretation.

      • The final paragraph of the discussion was interesting but potentially concerning for a model of metacognition. It explains that data on empirical trial-by-trial accuracy is not used in the model fits. I hadn't appreciated this until this point in the paper. I can see how in a process model that simulates decision and confidence data from stimulus features, accuracy should not be an input into such a model. But in terms of a model fit, it seems odd not to use trial by trial accuracy to constrain the fits at the metacognitive level, given that the hallmark of metacognitive sensitivity is a confidence-accuracy correlation. Is it not possible to create accuracy-conditional likelihood functions when fitting the confidence rating data (similar to how the meta-d' model fit is handled)? Psychologically, this also makes sense given that the observer typically knows their own response when giving a confidence rating.

      While I agree of course that metacognitive sensitivity quantifies the confidence-accuracy relationship, a process model is a distinct approach and requires distinct methodology. Briefly, the current model fit cannot be improved upon, as it is based on a precise inversion of the forward model. Computing accuracy-conditional likelihoods would lead to biased parameter estimates, because it would incorrectly imply that the observer has access to the accuracy of their choice. While the observer knows their choice, as the reviewer correctly notes, they do not know the true stimulus category and hence not their accuracy.

      I argue in the manuscript that both approaches (descriptive meta-d’, explanatory process model) have their advantages and disadvantages. The concept of meta-d’ / metacognitive sensitivity does not care why a particular confidence rating is the way it is, or whether an incorrect response is caused by sensory noise or by an attentional lapse. On the one hand, this implies that one cannot draw any conclusions about the causes and mechanisms of metacognitive inefficiency, which could be perceived as a major drawback. In this respect, it is a purely descriptive measure (cf. last comment of Reviewer #1). On the other hand, because it is descriptive, it can simply compare the confidence between correct and incorrect choices and thus, in a sense, capture a more thorough picture of metacognitive sensitivity; that is, being metacognitively aware not only of the consequences of one’s own sensory noise (as in typical process models), but also of all other sources of error (attentional lapses, finger errors, etc.). I now added an additional paragraph in which I summarize the comparison of type 2 ROC / meta-d’ and process models along these lines (Line 800ff):

      “In sum, while a type 2 ROC analysis, as a descriptive approach, does not allow any conclusions about the causes of metacognitive inefficiency, it is able to capture a more thorough picture of metacognitive sensitivity: that is, it quantifies metacognitive awareness not only about one’s own sensory noise, but also about other potential sources of error (attentional lapses, finger errors, etc.). While it cannot distinguish between these sources, it captures them all. On the other hand, only a process model approach will allow to draw specific conclusions about mechanisms – and pin down sources – of metacognitive inefficiency, which arguably is of major importance in many applications.”

      • I found it concerning that all the variability in scale usage was being assumed to load onto evidence-related parameters (eg delta_m) rather than being something about how subjects report or use an arbitrary confidence scale (eg the "implicit biases" assumed to govern the upper and lower bounds of the link function). It strikes me that you could have a similar notion of offset at the level of report - eg an equivalent parameter to delta_m but now applied to c and not z. Would these be distinguishable? They seem to have quite different interpretations psychologically: one is at the level of a bias in confidence formation, and the other at the level of a public report.

      I substantially reworked the section about metacognitive biases, including an additive metacognitive bias (κm) also at the level of confidence. The previous version of the manuscript already included a multiplicative bias parameter loading onto confidence (previously referred to as ‘confidence scaling’ parameter, now multiplicative confidence bias λm), but it was considered optional and e.g. not part of the parameter recovery analyses.

      My previous emphasis on biases that load onto evidence-related variables was due to a more principled interpretation (e.g. ‘underestimation of sensory noise’), but I agree that metacognitive biases must not necessarily be principled and may be driven e.g. by the idiosyncratic usage of a particular confidence scale. Updated Figure 1 sketches the new, more complete model.

      Is a mix of evidence- and confidence-related metacognitive bias parameters distinguishable? I tested this in Figure 7—figure supplement 3.

      The slope matrices show that e.g., the model suggested by the reviewer (two evidence-related bias parameters 𝜑m and δm + an additive confidence-based bias parameter κm) is to some degree dissociable, although slight tradeoffs start to emerge with such a complex model. By contrast, a mix of only one evidence-related and one confidence-related bias parameter is much more robust. In general, I thus recommend using at most two metacognitive bias parameters, which are selected either based on a priori hypotheses or on a model comparison. I comment on the necessity of choosing one’s bias parameters in a new paragraph in section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery is more unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios 1 or 2 metacognitive bias parameters is a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

    1. Author Response

      Reviewer #1 (Public Review):

      This study produces conservative estimates of the rates of SARS-CoV-2 importation into Canada through February 2021. The study also estimates the relative rates of intra-provincial, inter-provincial, and international transmission by province. Because these rates are investigated over time periods with varying types of non-pharmaceutical interventions, the results provide foundational information on the impact of NPIs and rates of spread to and within Canada. These rates provide useful benchmarks for other regions and deepen our understanding of the natural history of SARS-CoV-2.

      Aside from a few places where speculation is unexpectedly mixed with careful data interpretation, the main limitation of the paper appears to be the unclear impact of sampling biases on the results. These biases occur inside and outside Canada. As the authors note, sequences are missing entirely from many countries and time periods where there was surely transmission. The analysis takes steps to mitigate this problem, but it is not clear how much distortion might remain. It is also unclear whether preferential testing or sequencing of specimens from recent travelers occurred and how strong this preference was (relative to sampling "random" community cases) in different places and times.

      These limitations are shared by many other phylogeographical analyses, but they raise the question of how literally the quantitative estimates and confidence intervals should be interpreted. My intuition is that some are much more robust than others, but this is left as an exercise.

      We have elaborated in the Discussion upon the high level of uncertainty that we have surrounding the exact estimations of importations. Throughout, we have emphasized that the relative dynamics are more important than the absolute estimates. Confidence intervals may underestimate the level of uncertainty.

      "Discussion

      Low sequence representation can lead to underestimates of total introductions if neither the index case nor its descendants were sampled, underestimates of sublineage size if not all descendants were sampled, and similarly, overestimates of the proportion of singletons, which may have been from unsampled transmission chains. Extrapolating an upper estimate of introductions is challenging in the absence of additional data. Clean genomes available in Canada prior to 1 March 2021 represented 4.2% of confirmed diagnoses (and 3.2% when 75% of Canadian sequences were retained). Diagnoses were estimated to represent about 9% of total cases in Canada up to September 2020, while other geographies ranged from 5% in Italy to 99% in Qatar (Noh & Danuser, 2021). The probability of a case being detected is affected by geography (sociodemographic structure, testing capacity and recommendations), by individual (age, contact-traced, political beliefs, co-morbidities), and by lineage (symptom severity, infectivity profile). Reason for sequencing is not always random (it could be for an outbreak investigation or to confirm VOC identity), and it varies over time by jurisdiction. As more sequences are generated and made available, we expect more descendants of previously identified sublineages than travellers or their recent contacts harbouring new sublineages or singletons. When sequencing efforts or resources are lower, travellers are a more efficient use of resources if prevalence is higher abroad than domestically, increasing the travel bias. Thus, importations do not scale linearly with sequence representation.

      In theory, the upper limit of importations by province could be estimated by adjusting for monthly sequence representation, case ascertainment rate, outbreak bias (ratio of probabilities of testing given infected for random versus outbreak-linked), and travel bias (ratio of probabilities of testing given infected for domestic versus travelling populations) over time, stratified by geography. More consistent inclusion of the reason for sequencing and testing in the publicly available metadata could facilitate better estimates of the extent of travel-related and outbreak-related bias. Additionally, prospective cohort studies or seroprevalence studies would ameliorate our estimate of the case ascertainment fraction."

      Reviewer #2 (Public Review):

      In this article entitled "early introductions of SARS-CoV-2 sublineages into Canada drove the 2020 epidemic", McLaughlin et al analyze genetic patterns in a large set of publicly-available SARS-CoV-2 sequences to characterize COVID-19 introductions and spread throughout Canada early in the pandemic. The authors conclude a majority of viral introductions into Canada can be traced to the United States via Quebec and Ontario. In addition, they report a reduction in viral importation into Canada following implementation of travel restrictions and other public health measures to reduce spread. The authors speculate that more rapid implementation of border controls and quarantine might have significantly reduced COVID-19 disease burden in Canada, at least early in the pandemic.

      Although many similar genomic epidemiology studies using SARS-CoV-2 data have been published, this is the first major study focused on Canada at a national scale. The authors download a large dataset from GISAID and use appropriate tools and methods to clean and subsample this dataset. They appropriately acknowledge the limitations of their dataset as a small subset of the total Canadian case counts. Although the work is largely retrospective, the authors argue and I agree that this work can be valuable in evaluating the effectiveness of public health interventions to reduce viral importation and spread and therefore can be informative of ongoing public health measures and useful in comparing viral dynamics to the present time (future work).

      While I believe it is ultimately worthy of publication, this article can be strengthened in a few key areas. Primarily, the authors do not assess the robustness of their results against alternative subsampling schemes. They subsample their global sequences proportionally to case counts, but retain all Canadian sequences. As a result, their dataset is skewed heavily to sequences collected during the winter and spring of 2020, which is not representative of case counts or of case distribution. Additionally, the study focuses primarily on international importations with very limited analyses and perspective on the role of person-to-person spread within Canada.

      Overall, this study deploys a set of tools used by many others in a new and important geographic region of Northern America. They make important, although retrospective, conclusions about the drivers of the COVID-19 pandemic in 2020 in Canada and conclude that a reduction of international travel and quarantine requirements were important measures to reduce spread.

      We thank the reviewer for their perspective on how we could better consider the sampling bias attributable to focusing on Canadian sequences and the extent of domestic transmission by province and sublineage. For the former, we have undertaken a sensitivity analysis to subsample the Canadian sequences at 25%, 50%, and 75%, which we believe addresses the bias in regards to contributions of domestic versus international importations. We have also added figures and description of the domestic circulation of dominant sublineages during the first two waves.
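
      A sensitivity analysis of this kind can be sketched as repeated random subsampling of the domestic sequence set at fixed fractions. The snippet below is only an illustration under simple assumptions (uniform sampling without replacement, one draw per fraction); the function names are hypothetical, and the actual analysis may have stratified or repeated the draws differently.

```python
import random


def subsample_fraction(sequence_ids, fraction, seed=0):
    """Draw a uniform random subsample (without replacement) that keeps
    the given fraction of the sequence identifiers."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    ids = list(sequence_ids)
    k = round(len(ids) * fraction)
    return rng.sample(ids, k)


def sensitivity_subsamples(sequence_ids, fractions=(0.25, 0.50, 0.75), seed=0):
    """Return one subsample per retention fraction, each with its own seed."""
    return {
        fraction: subsample_fraction(sequence_ids, fraction, seed + i)
        for i, fraction in enumerate(fractions)
    }


# Placeholder identifiers standing in for Canadian sequence accessions.
canadian_ids = [f"seq{i}" for i in range(1000)]
subsamples = sensitivity_subsamples(canadian_ids)
```

      Each subsample would then be combined with the international sequences and passed through the same phylogeographic pipeline, so that the estimated share of domestic versus international sources can be compared across retention levels.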

      Reviewer #3 (Public Review):

      The authors present a comprehensive description of the early importation and transmission dynamics of SARS-CoV-2 during the early stages of the COVID-19 epidemic in Canada. They implement phylodynamic analyses on a rich genomic data set generated within the country, contrasted to a vast collection of publicly available SARS-CoV-2 sequences from across the globe. Due to the vast quantities of genomic data available for this virus, they apply a downsampling scheme to generate a computationally manageable set of sequences on which analyses are run: this set includes all of the (high-quality) available sequences generated within Canada and a selection of sequences from other countries, which is proportional to the monthly reported COVID-19 cases in each of those countries. Following this step, the authors use a series of phylogenetic and phylogeographic methods to explore the number of importations of the virus to the country, the sources of these importations and the recipient provinces in Canada. They also characterise the sublineages that result from these importations (i.e., importations that result in onwards transmission), particularly regarding their size, duration and circulation between provinces.

      The authors make good use of an abundant collection of SARS-CoV-2 genome sequences collected across all of Canada, providing one of the most in-depth panoramas of the spatiotemporal spread of the virus in the country during 2020. While not all Canadian provinces are represented within the data set, it is evident that the ones that contain the largest urban areas and represent the main international travel hubs within the country are included. The characterisation of the sublineages that emerge from the inferred importation events are very comprehensive and highlight how the largest importation peak of 2020 was preceded by the implementation of non-pharmaceutical interventions, while also showing that overall introductions continued at considerably lower levels during the months where these interventions remained stringent. They also show how most of these earlier sublineages became 'inactive' (i.e., extinct or no longer represented in the country's genomic surveillance) while a small proportion of the earlier introductions did remain active for longer timespans. The exploration of the main hubs where importations were detected (Quebec and Ontario) and the role that these provinces had in seeding transmission lineages across other Canadian provinces provides an interesting picture of the domestic transmission dynamics for SARS-CoV-2.

      The attempt by the authors to identify the international sources of importation faces some challenges which arise from the vastly heterogeneous sequencing efforts by different countries across time. Phylogeographic methods have long been known to be sensitive to sampling bias; this is particularly the case for the COVID-19 pandemic where key territories presented well-documented underreporting of both new cases and viral genome sequences, likely introducing gaps in the available genomic data. The authors choose an interesting approach to address this bias, informing their downsampling by the monthly COVID-19 cases reported by the Johns Hopkins University Center for Systems Science and Engineering (through the 'coronavirus' R package). It is likely that this approach manages to account for some of the sampling bias between countries, but the lack of validation tests for the method and the lack of external confirmation of these results through complementary data sources warrant some careful interpretation of these findings and the uncertainty associated with them. Beyond the available sequence data, case reporting (e.g., data collected by the JHU-CSSE) has also been found to be heterogeneous across countries, particularly where diagnostic scale-up did not keep up with the local epidemic trends. These biases are less likely to affect some of the main identified sources of importation like the USA, but the possible effects for other locations will probably vary.

      With regard to the effect of the potential bias imposed towards identifying USA importation sources:

      Although the USA was highly represented in all of our subsamples as a result of its large contribution to COVID-19 cases in 2020 and high sequence availability during early months, our results suggest a greater effect than due to sampling alone. On average, the USA sequences represented 28.9% (28.7 - 29.2%) of total international sequences, yet accounted for 46.3% (44.0 - 48.7%) of all sublineages and 57.7% (55.6 - 59.8%) of singletons. Upon maximizing the number of Canadian sequences in the analysis, where global sequence representation was more normalized but less comprehensive, the USA sequences represented fewer of the international sequences (25.8%, 25.6 - 26.1%) and still accounted for 38.4% (37.0 - 39.8%) of sublineages and 46.4% (44.6 - 48.3%) of singletons.

      While individual reports of the early epidemics in specific provinces have been published, this is the first nation-wide analysis of the early COVID-19 epidemic in Canada. Given the geographical location and size of the country, these findings are key in understanding the early phases of the COVID-19 pandemic; they also add to the growing body of evidence describing the effects of multiple seeding events on the persistence of an epidemic caused by a respiratory pathogen, the speed at which such a pathogen can spread across large distances and the changes in transmission dynamics that accompany behavioural changes in human populations (in this case, derived from public health interventions). It is also important to highlight that the downsampling approach used by the authors to generate a computationally manageable data set could potentially be useful and applied to other contexts, following deeper exploration and validation.

    1. Author Response

      Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency-independent. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If, indeed, the authors still maintain that OHC eM is frequency-independent, à la Frank et al. (and in disregard of other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      We thank reviewer #1 for the suggestions on the kinetics of prestin and previous literature.

      Although there are no data (to the best of our knowledge) for electromotility (eM) in isolated basal murine OHCs, a more thorough review of the existing literature on the topic suggests that the assumed parameters are indeed a reasonably conservative estimate of eM in situ.

      Additionally, the OHC parameters are pessimistic enough to account for a doubling of effective capacitance due to NLC.

      Regarding the fallacy of composition, we are puzzled that the reviewer interpreted it as a “scorning” of the OHC biophysics, which is obviously important for cochlear function. The point raised is simple and rather obvious: a system built with low-pass filters is not necessarily itself a low-pass filter. This is illustrated by the analogy, familiar to electrical engineers, that high- and band-pass filters are often built by cascading and mixing the responses of low-pass filters. The “fallacy of composition” therefore lies in the conclusion that since eM is “low-pass”, it can’t possibly contribute to high-frequency amplification. Strikingly, this conclusion is often based on measured vibrations near the OHCs showing transfer functions with >30 dB peak-to-tail ratio, and that are somewhat consistent with the inner workings of cochlear models. That is, we are criticizing one specific interpretation of the biophysical data, certainly not suggesting that collecting and analyzing the data in the first place is unimportant.
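
The filter analogy can be made concrete with a minimal sketch, assuming ideal first-order low-pass stages. It shows only the generic engineering principle (subtracting a low-passed copy from the input gives a high-pass; subtracting two low-pass responses gives a band-pass). It is not a model of the cochlea or of OHC electromotility, and the function names and corner frequencies are arbitrary.

```python
def lowpass(f, fc):
    """Complex frequency response of an ideal first-order low-pass (corner fc)."""
    return 1 / (1 + 1j * f / fc)

def highpass_from_lowpass(f, fc):
    """Input minus its low-passed copy: behaves as a first-order high-pass."""
    return 1 - lowpass(f, fc)

def bandpass_from_lowpasses(f, f_lo, f_hi):
    """Difference of two low-pass responses (f_lo < f_hi): passes f_lo..f_hi."""
    return lowpass(f, f_hi) - lowpass(f, f_lo)

fc = 1000.0  # arbitrary corner frequency in Hz
for f in (10.0, 1000.0, 100000.0):
    print(f, abs(lowpass(f, fc)), abs(highpass_from_lowpass(f, fc)))
```

Every building block here is low-pass, yet the combined responses reject low frequencies. This is the sense in which the properties of the parts do not fix the properties of the whole.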

      Reviewer #2 (Public Review):

      In the inner ear, the cochlea transforms sound-induced vibrations into electrical signals that are sent to the brain. Cochlear outer hair cells (OHCs) are thought to amplify these vibrations, but it is unclear how amplification works. Sound-induced vibrations modulate the current entering an OHC, which drives its receptor potential, causing the OHC to change length. The change in length owing to the receptor potential variation, known as the OHC's electromotile response, depends on the size of the receptor potential. However, the receptor potential decreases with increasing sound frequency, because of the resistance (R) and capacitance (C) of the OHC's membrane. This paper addresses the RC problem, that is, the limitations on high-frequency amplification owing to the OHC's receptor potential decreasing with frequency.

      The authors use a well-known simplification of the RC problem and some back-of-the-envelope calculations to argue that OHCs can amplify sufficiently well at high frequencies to match experimental data, despite the decrease in their receptor potentials. They argue that changes to OHC properties along the cochlea allow them to amplify at high frequencies and that OHCs reduce noise and distortion. They argue against OHCs as being cochlear impedance regulators and that OHCs do not limit cochlear tuning.

      Figure 1 and Equations 1-6 are useful teaching tools but are not novel. The back-of-the-envelope calculations use these equations and a limited number of data points from the literature. There are many prior models that show amplification despite the RC problem, but they are not analyzed or discussed in much detail.

      How RC OHC filtering reduces noise without reducing the signal is not explained. The type of noise calculation done in Appendix 1 is well-known and the application is again a rough back-of-the-envelope calculation. Most of the statements about noise are not fleshed out or supported by calculations.

      The discussion about tonotopic variations has little new data. Fig. 2 uses two data points from the literature and an unpublished data point from a colleague. The fact that BM displacement is smaller at the base than at the apex is well known. There is speculation that reduced OHC motion is "effectively counteracted" by gradients in OHC capacitance and MET current, but no evidence is presented.

      The discussion about distortions is pedagogical but is again speculation without new or strong-supporting evidence. Fig. 3 argues that OHCs might reduce high-frequency distortions, but don't limit the cochlear amplifier. The plots shown are either well-known consequences of filtering or a summary of the authors' previous model data.

      The arguments against OHCs as regulators and that they don't limit tuning are not well fleshed out, speculative, and unsupported by new calculations or data.

      This paper does not clarify OHC operation or the RC problem, because it mixes speculation, limited data, and topics that are not clearly related to the problem.

      We agree with reviewer #2 that there are no new physics principles elucidated here, and that most of the discussion relies on simple calculations. But we believe that such simple calculations are the missing piece (absent in the literature) that allows one to appreciate the magnitude of the problem under examination, a magnitude typically inflated by focusing on quantities whose physical significance is uncertain. In other words, we believe that the simplicity of the calculations and physical reasoning is not a bug, but a feature of the paper.

      We believe that in his criticism regarding various topics of discussion presenting little or speculative new evidence, this reviewer might not have fully considered that most of the evidence provided here is fundamentally a physics-based review of the recent experimental data, incidentally the same type of data previously employed to argue that the RC problem is dramatic in the first place. Likely we didn't convey this message clearly enough in the manuscript.

      While the arguments against OHCs as regulators are not all new, they are often ignored (or perhaps forgotten) and we believe there is a value in synthesizing them all in one place. The support for these arguments comes from fundamental hydrodynamic principles, previous modeling studies, and most importantly from OCT data collected over the last 6 years. Of course, the discussion on the plausibility of suggested mechanisms lacking a concrete proposal cannot be 100% “analytic”.

      About noise and signal amplification, the missing piece perhaps is that distributed internal noise sources (e.g., thermal and shot noise) are independent of each other and hence spatially incoherent. While the manuscript doesn’t specifically deal with signal vs. noise amplification in cochlear models, spatially distributed amplification is known to boost signals more than internal noise—a principle universally used in telecommunications and addressed in >60-year-old literature.
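
The coherence argument can be illustrated with a small Monte Carlo sketch. It assumes N identical, in-phase signal contributions and N independent Gaussian noise sources; the function name and parameter values are hypothetical, and this is not a cochlear model. Under these assumptions the summed signal amplitude grows as N while the summed noise rms grows only as sqrt(N), so the power SNR grows roughly as N.

```python
import random

def summed_snr(n_sources, signal_amp=1.0, noise_rms=1.0, trials=20000, seed=1):
    """Power SNR after summing n_sources contributions.

    Signal contributions are identical and in phase (they add coherently);
    noise contributions are independent Gaussians (they add incoherently).
    """
    rng = random.Random(seed)
    signal = n_sources * signal_amp            # coherent sum of amplitudes
    noise_power = 0.0
    for _ in range(trials):
        total_noise = sum(rng.gauss(0.0, noise_rms) for _ in range(n_sources))
        noise_power += total_noise ** 2
    noise_power /= trials                      # estimates n_sources * noise_rms**2
    return signal ** 2 / noise_power

snr_1 = summed_snr(1)
snr_16 = summed_snr(16)
print(snr_1, snr_16, snr_16 / snr_1)           # ratio is close to 16
```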

      Reviewer #3 (Public Review):

      This paper discusses the effect of the low-pass filtering between outer hair cell transducer current and receptor voltage. The filter's cut-off frequency (where the response is down to 0.71 of its maximum) can be quantified by the resistance and capacitance of the hair cell's basolateral membrane. The capacitance value is determined mainly by the lipid membrane and is augmented by the charge movement of the piezoelectric prestin molecule, which endows the OHC with its electromotile properties. The OHC's capacitance (C) value is pretty well known. The resistance (R) is determined mainly by K+ channels in the basolateral membrane, a value that is also known reasonably well. The low-pass cut-off frequency is equal to 1/(2πRC) and has a value of ~1 to a few kHz - a value that has both experimental and theoretical support. The low-pass filtering of membrane voltage is important because the cell responds to membrane voltage by shortening and lengthening - this electromotility is thought to be key to the cochlea's operation and in particular to cochlear amplification, the process that enhances the magnitude and tuning of the cochlea's passive response to sound. However, the auditory system works to 80 kHz and even higher in some animals. Thus, it has been posed (let's say by team A) that the RC cut-off frequency value of a few kHz makes electromotility too slow to operate "cycle-by-cycle" up to several 10s of kHz. The article under review, representing team B, supports "cycle-by-cycle" action, arguing that the several kHz cut off frequency is not a problem and is even an advantage.
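
For orientation, the cut-off formula and the resulting attenuation can be evaluated with a short sketch. The resistance and capacitance below are round, assumed values chosen only to land in the "~1 to a few kHz" range quoted above. They are not measurements from the manuscript, and the single-pole filter is the usual simplification.

```python
import math

def rc_cutoff_hz(r_ohm, c_farad):
    """Corner frequency of a single-pole RC low-pass: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def lowpass_gain(f_hz, fc_hz):
    """Magnitude response of a single-pole low-pass, relative to its DC value."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** 2)

R = 10e6     # 10 MOhm membrane resistance (assumed, illustrative)
C = 10e-12   # 10 pF membrane capacitance (assumed, illustrative)

fc = rc_cutoff_hz(R, C)               # about 1.6 kHz for these values
gain_at_fc = lowpass_gain(fc, fc)     # about 0.71, the factor quoted in the review
gain_at_50k = lowpass_gain(50e3, fc)  # about 0.03, i.e. roughly -30 dB
print(fc, gain_at_fc, gain_at_50k, 20 * math.log10(gain_at_50k))
```

With these assumed values the receptor potential at 50 kHz is attenuated by roughly 30 dB relative to DC. This is the size of the effect that the two "teams" interpret differently.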

      The arguments put forward in favor of cycle-by-cycle action are:

      1. The size of the motions, even with the low-pass-filtered attenuation, is as large as or larger than those measured in the cochlea at high frequencies.

      2. Noise is often increasing as frequency decreases, thus low-pass-filtering is actually good, to reduce the predominantly low frequency noise.

      3. Harmonic distortion is at supra-CF frequencies, so it's good if the hair cell is low-pass-filtering to reduce harmonics.

      These three points are reasonable, and the quantification relating to statement 1 is convincing. However, the quantification associated with point 2 is muddled. The hair cell voltage signal is expressed in volts, but the noise value is given in terms of the current mediated by 1-5 channels. A quantitative comparison should be made, with signal and noise expressed in the same units, preferably volts and volts/root(Hz), with a bandwidth estimated. The appendix attempts to be more quantitative and something like that short appendix should be incorporated into the paper. If a quantitative comparison in standard units is not possible with current data, that can be stated and underscores that we really don't know whether the noise is a problem for cycle-by-cycle amplification. Point 3 is reasonable and nicely illustrated in Fig. 3B. I did not get anything from Fig. 3A and the corresponding discussion on page 8 lines 320-335. Panels C and D were under-explained and could be removed, and the caption's reference to "short wave hydrodynamics" was also under-explained.

      The arguments put forward to challenge gain control mechanics, which employ DC shifts to set effective operating conditions:

      4. Operation based on DC and quasi-DC operating points is sensitive to noise, which as noted above is often increasing as frequency decreases.

      5. Operation that employs a DC shift for operating point is likely to work in such a way to reduce stiffness, which has been shown to be inconsistent with active cochlear responses. For example, stiffness reduction would reduce traveling wave wavelength and thus alter the response phase and timing to a degree that has not been observed experimentally. This has long been known and relevant papers are cited.

      Point 4 was not convincing to me because the motions related to setting operating conditions could be larger than the nanoscale cycle-by-cycle response motions - thus these operating point motions could be above the noise values that seem limiting to cycle-by-cycle amplification. Point 5 is a nice reminder of the conclusion that, based on experimental findings and physics-based basic cochlear models, the cochlear amplifier must work by means of energy injection. This point was made clearly by Kolston (well cited in this paper) and later supported by other work.

      The present paper is informative in many ways and offers useful insights for further exploration. It is nicely written and illustrated. Because the signal and noise values are not quantified, the basic claim, that the cochlea amplifier can amplify a noisy signal effectively, is not convincing and that basic question is still unsettled. Overall, the paper would be improved if the claims and arguments were presented more tightly, with fewer digressions, and more modestly.

      We thank reviewer #3 for the many comments and suggestions.

      We agree that plotting the spectral density of a “near-threshold” OHC signal vs. inherent electric noise results in much simplification. Regarding noise and signal amplification, previous work on transmission lines points out that amplification is the way to increase SNR along the line.

      We believe that part of the ongoing confusion is that the problem is not how OHCs can amplify a “noisy signal” (the cochlea amplifies “noisy” sounds much as it amplifies pure tones) but how OHCs can amplify signals in the presence of internal noise. Amplification and detection are two distinct things, and signal amplification does not rely on detection. Detection is an intrinsically nonlinear decision process (e.g., signal present/absent). Amplification in relevant frequency ranges is what allows signals to be detected in the real world (e.g., radio receivers). The cochlea (as portrayed by classic theories) does not seem exceptional in this regard.

      We agree that the effect of noise on DC responses is not very clear in the manuscript. Although it is difficult to make quantitative statements on a hypothesis that lacks a concrete mechanistic proposal, ~63% of (inherent) electric noise power is confined below the RC corner frequency, i.e., the frequency band of the regulatory OHC. In the presence of (unavoidable) flicker and brown noise (e.g., Brownian motion of stereocilia), this percentage can only increase. Conversely, in the frequency band of OHC cycle-by-cycle amplification, the noise power is only a tiny fraction of the total.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Abdellatef et al. describe the reconstitution of axonemal bending using polymerized microtubules (MTs), purified outer-arm dyneins, and synthesized DNA origami. Specifically, the authors purified axonemal dyneins from Chlamydomonas flagella and combined the purified motors with MTs polymerized from purified brain tubulin. Using electron microscopy, the authors demonstrate that patches of dynein motors of the same orientation at both MT ends (i.e., with their tails bound to the same MT) result in pairs of MTs of parallel alignment, while groups of dynein motors of opposite orientation at both MT ends (i.e., with the tails of the dynein motors of both groups bound to different MTs) result in pairs of MTs with anti-parallel alignment. The authors then show that the dynein motors can slide MTs apart following photolysis of caged ATP, and using optical tweezers, demonstrate active force generation of up to ~30 pN. Finally, the authors show that pairs of anti-parallel MTs exhibit bidirectional motion on the scale of ~50-100 nm when both MTs are cross-linked using DNA origami. The findings should be of interest for the cytoskeletal cell and biophysics communities.

      We thank the reviewer for these comments.

      We might be misunderstanding this reviewer’s comment, but the complexes with both parallel and anti-parallel MTs had dynein molecules with their tails bound to two different MTs in most cases, as illustrated in Fig.2 – suppl.1. The two groups of dyneins produce opposing forces in a complex with parallel MTs, and the majority of our complexes had a parallel arrangement of the MTs. To clarify the point, we have modified the Abstract:

      “Electron microscopy (EM) showed pairs of parallel MTs crossbridged by patches of regularly arranged dynein molecules bound in two different orientations depending on which of the MTs their tails bind to. The oppositely oriented dyneins are expected to produce opposing forces when the pair of MTs have the same polarity.”

      Reviewer #2 (Public Review):

      Motile cilia generate rhythmic beating or rotational motion to drive cells or produce extracellular fluid flow. A cilium is made of nine microtubule doublets forming a spoke-like structure, and it is known that dynein motor proteins, which connect adjacent microtubule doublets, are the driving force of ciliary motion. However, the molecular mechanism that generates motion is still unclear. The authors proved that a pair of microtubules stably linked by DNA-origami and driven by outer dynein arms (ODA) causes beating motion. They employed an in vitro motility assay and negative stain TEM to characterize this complex. They demonstrated that stable linking of microtubules and ODAs anchored on both microtubules are essential for oscillatory motion and bending of the microtubules.

      Strength

      This is an interesting work, addressing an important question in the motile cilia community: what is the minimum system to generate a beating motion? It is an established fact that the dynein power stroke on the microtubule doublet is the driving force of the beating motion. It was also known that the radial spoke and the central pair are essential for ciliary motion under physiological conditions, but cilia without radial spokes and the central pair can beat under some special conditions (Yagi and Kamiya, 2000). Therefore, from a mechanistic point of view, they are not prerequisites. It is generally thought that a fixed connection between adjacent microtubules by nexin converts the sliding motion of dyneins to bending, but this was never experimentally investigated. Here the authors successfully enabled a simple system of nexin-like inter-microtubule linkage using the DNA origami technique to generate oscillatory and beating motions. This enables an interesting system where ODAs form groups, anchored on two microtubules, orienting oppositely and therefore causing tug-of-war type force generation. The authors demonstrated that this system, under constraints by DNA origami, generates oscillatory and beating motions.

      The authors carefully coordinated the experiments to demonstrate oscillations using optical tweezers and sophisticated data analysis (Fourier analysis and a step-finding algorithm). They also proved, using negative stain EM, that this system contains two groups of ODAs forming arrays with opposite polarity on the parallel microtubules. The manuscript is carefully organized with impressive movies. Geometrical and motility analyses of individual ODAs used for statistics are provided in the supplementary source files. They appropriately cited similar past works from Kamiya and Shingyoji groups (they employed systems closer to the physiological axoneme to reproduce beating) and clarify the differences from this study.

      We thank the reviewer for these comments.

      Weakness

      The authors claim this system mimics two pairs of doublets on opposite sides of the 9+2 cilia structure by having two groups of ODAs between two microtubules facing opposite directions within the pair. It is not exactly the case. In the real axoneme, ODAs form a continuous array along the entire length of the doublets, which means at any point there are ODAs facing opposite directions. In their system, opposite ODAs cannot exist at the same point (therefore the scheme of Dynein-MT complex of Fig.1B is slightly misleading).

      Actually, opposite ODAs can exist at the same point in our system as well, and previous work using much higher concentrations of dynein (e.g., Oda et al., J. Cell Biol., 2007) showed two continuous arrays of dynein molecules between a pair of microtubules. To observe the structures of individual dynein molecules we used low concentrations of dynein and searched for the areas where dynein could be observed without superposition, but there were some areas where opposite dyneins existed at the same point.

      We realize that we did not clearly explain this issue, so we have revised the text accordingly.

      In the 1st paragraph of Results: “In the dynein-MT complexes prepared with high concentrations of dynein, a pair of MTs in bundles are crossbridged by two continuous arrays of dynein, so that superposition of two rows of dynein molecules is observed in EM images (Haimo et al., 1979; Oda et al., 2007). On the other hand, when a low concentration of the dynein preparation (6.25–12.5 µg/ml (corresponding to ~3-6 nM outer-arm dynein)) was mixed with 20-25 µg/ml MTs (200-250 nM tubulin dimers), the MTs were only partially decorated with dynein, so that we were able to observe single layers of crossbridges without superposition in many regions.”

      Legend of Fig. 1(C): “Note that the geometry of dyneins in the dynein-MT complex shown in (B) mimics that of a combination of the dyneins on two opposite sides of the axoneme (cyan boxes), although the dynein arrays in (B) are not continuous.”
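
The mass-to-molar equivalences quoted in the revised Results text can be checked with a short sketch. The molecular masses below are round, assumed values (about 100 kDa for a tubulin dimer and about 2 MDa for the outer-arm dynein complex) that happen to be consistent with the quoted figures. They are not taken from the manuscript.

```python
def ug_per_ml_to_nM(conc_ug_per_ml, mw_da):
    """Convert a mass concentration (ug/ml) to a molar concentration (nM).

    ug/ml equals mg/L, so mol/L = (conc * 1e-3 g/L) / (mw g/mol),
    and multiplying by 1e9 gives nM: conc * 1e6 / mw.
    """
    return conc_ug_per_ml * 1e6 / mw_da

TUBULIN_DIMER_DA = 100_000       # assumed round value, ~100 kDa
OUTER_ARM_DYNEIN_DA = 2_000_000  # assumed round value, ~2 MDa

for c in (20.0, 25.0):
    print("MTs", c, "ug/ml ->", ug_per_ml_to_nM(c, TUBULIN_DIMER_DA), "nM")
for c in (6.25, 12.5):
    print("dynein", c, "ug/ml ->", ug_per_ml_to_nM(c, OUTER_ARM_DYNEIN_DA), "nM")
```

With these assumed masses, 20-25 µg/ml of MTs corresponds to 200-250 nM tubulin dimers and 6.25-12.5 µg/ml of dynein to roughly 3-6 nM, matching the ranges quoted above.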

      If they want to project their result to the ciliary beating model, more insight/explanation would be necessary. For example, arrays of dyneins at certain positions within the long array along one doublet are activated and generate force, while dyneins at different positions are activated on another doublet on the opposite side of the axoneme. This makes the distribution of dyneins and their orientations similar to the system described in this work. Such a localized activation, shown in physiological cilia by Ishikawa and Nicastro groups, may require other regulatory proteins.

      We agree that the distributions of activated dyneins in 3D are extremely important in understanding ciliary beating, and that other regulatory proteins would be required to coordinate activation in different places in an axoneme. However, the main goal of this manuscript is to show the minimal components for oscillatory movements, and we feel that discussing the distributions of activated dyneins along the length of the MTs would be too complicated and beyond the scope of this study.

      They attempted to reveal conformational change of ODAs induced by power stroke using negative stain EM images, which is less convincing compared to the past cryo-ET works (Ishikawa, Nicastro, Pigino groups) and negative stain EM of sea urchin outer dyneins (Hirose group), where the tail and head parts were clearly defined from the 3D map or 2D averages of two-dynein ODAs. Probably three heavy chains and associated proteins hinder detailed visualization of the tail structure. Because of this, Fig.2C is not clear enough to prove conformational change of ODA. This reviewer imagines refined subaverage (probably with larger datasets) is necessary.

      As the reviewer suggests, one of the reasons for less clear averaged images compared to the past images of sea urchin ODA is the three-headed structure of Chlamydomonas ODA. Another and perhaps the bigger reason is the difficulty of obtaining clear images of dynein molecules bound between 2 MTs by negative stain EM: the stain accumulates between MTs that are ~25 nm in diameter and obscures the features of smaller structures. We used cryo-EM with uranyl acetate staining instead of negative staining for the images of sea urchin ODA-MT complexes we previously published (Ueno et al., 2008) in order to visualize dynein stalks. We agree with the reviewer that future work with larger datasets and by cryo-ET is necessary for revealing structural differences.

      That having been said, we did not mean to prove structural changes, but rather intended to show that our observation suggests structural changes and thus this system is useful for analyzing structural changes in the future. In the revised manuscript, we have extensively modified the parts of the paper discussing structural changes (please see our response to the next comment).

      It is not clear, from the inset of Fig.2 supplement3, how to define the end of the tail for the length measurement, which is the basis for the authors to claim conformational change (Line263-265). The appearance of the tail would be altered, seen from even slightly different view angles. Comparison with 2D projection from apo- and nucleotide-bound 3-headed ODA structures from EM databank will help.

      We agree with the reviewer that differences in the viewing angle affect the apparent length of a dynein molecule, although the 2 MTs crossbridged by dyneins lie on the carbon membrane and thus the variation in the viewing angle is expected to be relatively small. To examine how much the apparent length is affected by the view angle, we calculated 2D-projected images of the cryo-ET structures of Chlamydomonas axoneme (emd_1696 and emd_1697; Movassagh et al., 2010) with different view angles, and measured the apparent length of the dynein molecule using the same method we used for our negative-stain images (Author response image 1). As shown in the plot, the effect of view angles on the apparent lengths is smaller than the difference between the two nucleotide states in the range of 40 degrees measured here. Thus, we think that the length difference shown in Fig.2-suppl.4 reflects a real structural difference between no-ATP and ATP states. In addition, it would be reasonable to think that the distributions of the view angles in the negative stain images are similar in both the absence and presence of ATP, again supporting the conclusion.

      Nevertheless, since we agree with the reviewer that we cannot measure the precise length of the molecule using these 2D images, we have revised the corresponding parts of the manuscript, adding description about the effect of view angles on the measured length in the manuscript.

      Author response image 1. Effects of viewing angles on apparent length. (A) and (B) 2D-projected images of cryo-electron tomograms of Chlamydomonas outer arm dynein in an axoneme (Movassagh et al., 2010) viewed from different angles. (C) apparent length of the dynein molecule measured in 2D-projected images.

      In this manuscript, we discuss two structural changes: 1) a difference in the dynein length between no-nucleotide and +ATP states (Fig.2-suppl.4), and 2) possible structural differences in the arrangement of the dynein heads (Fig.2-suppl.3). Although we realize that extensive analysis using cryo-ET is necessary for revealing the second structural change, we attempted to compare the structures of oppositely oriented dyneins, hoping that it would lead to future research. In the revised manuscript, we have added 2D projection images of emd_1696 and emd_1697 in Fig.2-suppl.3, so that the readers can compare them with our negative stain images. We had an impression that some of our 2D images in the presence of ATP resembled the cryo-ET structure with ADP.Vi, whereas some others appeared to be closer to the no-nucleotide cryo-ET structure. We have also attempted to calculate cross-correlations, but difficulties in removing the effect of MTs, which sometimes overlapped with a part of dynein, and in adjusting the magnifications and contrast of different images prevented us from obtaining reliable results.

      To address this and the previous comments, we have extensively modified the section titled ‘Structures of dynein in the dynein-MT-DNA-origami complex’.

      In Fig.5B (where the oscillation occurs), the microtubule was once driven >150nm unidirectionally and went back to the original position, before oscillation starts. Is it always the case that relatively long unidirectional motion and return precede oscillation? In Fig.7B, where the authors claim no oscillation happened, only one unidirectional motion was shown. Did oscillation not happen after MT returned to the original position?

      Long unidirectional movement of ~150 nm was sometimes observed, but not necessarily before the start of oscillation. For example, in Figure 5 – figure supplement 1A, oscillation started soon after the UV flash, and then unidirectional movement occurred.

      With the dynein-MT complex in which dyneins are unidirectionally aligned (Fig.7B, Fig.7-suppl.2), the MTs kept moving and escaped from the trap or just stopped moving probably due to depletion of ATP, so we did not see a MT returning to the original position.

      Line284-290: More characterization of bending motion will be necessary (and should be possible). How high frequency is it? Do they confirm that other systems (either without DNA-origami or without ODAs arraying oppositely) cannot generate repetitive beating?

      The frequencies of the bending motions measured from the movies in Fig.8 and Fig.8-suppl.1 were 0.6 – 1 Hz, and the motions were rather irregular. Even if there were complexes bending at high frequencies, it would not have been possible to detect them due to the low time resolution of these fluorescence microscopy experiments (~0.1 s). Future studies at a higher time resolution will be necessary for further characterization of bending motions.
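
      The detection limit mentioned above can be illustrated with a minimal sketch. It assumes that the stated time resolution (~0.1 s) corresponds to the interval between frames, which the response does not state explicitly; the function name is hypothetical. Under that assumption, the standard Nyquist criterion places the highest resolvable oscillation frequency at half the frame rate.

```python
def nyquist_frequency_hz(frame_interval_s: float) -> float:
    """Highest oscillation frequency resolvable at a given frame interval.

    Standard sampling criterion: half the sampling rate.
    """
    return 1.0 / (2.0 * frame_interval_s)


# Assumption: the ~0.1 s time resolution is treated as the frame interval.
frame_interval_s = 0.1
limit_hz = nyquist_frequency_hz(frame_interval_s)

# Bending frequencies reported in the response (Hz).
observed_range_hz = (0.6, 1.0)

for f in observed_range_hz:
    # Number of frames acquired per oscillation cycle at this frequency.
    frames_per_cycle = 1.0 / (f * frame_interval_s)
    print(f"{f} Hz: about {frames_per_cycle:.1f} frames per cycle, "
          f"resolvable: {f < limit_hz}")
```

      In this idealized picture the reported 0.6 to 1 Hz motions are sampled by roughly ten or more frames per cycle, whereas any motion faster than the computed limit would be aliased or missed, consistent with the need for higher time resolution noted above.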

      To observe bending motions, the dynein-MT complex should be fixed to the glass or a bead at one part of the complex while the other end is free in solution. With the dynein-MT-DNA-origami complexes, we looked for such complexes and found some showing bending motions as in Fig. 8. To answer the reviewer’s question asking if we saw repetitive bending in other systems, we checked the movies of the complexes without DNA-origami or without ODAs arraying oppositely but did not notice any repetitive bending motions. However, future studies using the system with a higher temporal resolution and perhaps with an improved method for attaching the complex would be necessary in these cases as well.

    1. Author Response

      Reviewer #2 (Public Review):

      Schrecker, Castaneda and colleagues present cryo-EM structures of RFC-PCNA bound to a 3'ss/dsDNA junction or nicked DNA, stabilized by the slowly hydrolyzable ATP analogue ATPγS. They discover that PCNA can adopt an open form that is planar, different from previous models for the loading of a sliding clamp. The authors also report a structure with closed PCNA, supporting the notion that closure of the sliding clamp does not require ATP hydrolysis. The structures explain how DNA can be threaded laterally through a gap in the PCNA trimer, as this process is supported by partial melting of the DNA prior to insertion. The authors also visualise and assign a function to the N-terminal domain in the Rfc1 subunit of the clamp loader, which they find modulates PCNA loading at the replication forks, in turn required for processive synthesis and ligation of Okazaki fragments.

      This work is extremely well done, with several structures with resolutions better than 3Å, which is a significant achievement given the dynamic nature of the PCNA ring loading process. To investigate the role of the N-terminal domain of Rfc1 in PCNA loading, the authors use in vitro reconstitution of the entire DNA replication reaction, which is a powerful method to identify specific defects in Okazaki fragment synthesis and ligation.

      Important issues

      1. Figure 3B,D,F. I would find them much more informative if the authors showed the overlay between atomic model and cryo-EM density in the main figure. If the figure becomes too busy, the authors could decide to just add additional panels with the overlay as well as the atomic models alone. I do not think that showing segmented density for the DNA alone, as done in Figure 6C, is sufficient. Also including the density for e.g. residues Trp638 and Phe582 seems important.

      We thank the reviewer for the suggestion. However, we have been unable to establish a way to show the density for both the protein and DNA in a meaningful manner due to the large number of atoms in the fields of view. For an example, please see Figure 1, which corresponds to Figure 3H. To aid the reader, we have revised several of the Figures and Figure Supplements to include density for the DNA.

      Consistent with our structures, recent work from the Kelch group has identified Trp638 and Phe582 as facilitating DNA base flipping (Gaubitz et al., 2022a). Despite the role in base flipping, no growth defects were observed in cells in which either of these residues were mutated and thus their functional role and the role of DNA base-flipping remains unclear.

      2. Cryo-EM sample preparation included substoichiometric RPA, which has been shown to promote DNA loading of PCNA by RFC. Would the authors expect a subset of PCNA-RFC-DNA particles to contain RPA as well? The glycerol gradient gel indicates that, at least in fraction 5, a complex might exist. If the authors think that the particles analyzed cannot contain RPA, it would be useful to mention this.

      We have no evidence to suggest that RPA cannot be present in the imaged particles. We have revised the text (lines 150 - 152) to clarify that while RPA was present in the sample, we did not observe any density that could not be assigned to either DNA, RFC or PCNA. We therefore suggest that RPA does not interact with the complex in a stable manner.

      3. Published kinetic data indicate that ATP hydrolysis occurs before clamp closure. To incorporate this notion in their model, the authors suggest that ATP hydrolysis might promote PCNA closure by disrupting the planar RFC:PCNA interaction surface and hence the dynamic interaction of PCNA with Rfc2 and -5 in the open state. In addition, ATP hydrolysis promotes RFC disengagement from PCNA-DNA by reverting from a planar to an out-of-plane state. This model appears reasonable and nicely combines published data with the new findings reported by the authors. However, the model is oversimplified in Figure 6, where the only depicted effect of ATP hydrolysis is RFC release. Perhaps the authors could use the figure caption to acknowledge that ATP hydrolysis likely still has a role in facilitating PCNA closure.

      We have revised Figure 6 to show that ATP hydrolysis may occur either before or after ring closure.

      4. Can the authors explain what steps should be taken to describe PCNA loading by RFC in conditions where ATP hydrolysis is permitted? How would such experiments further inform the molecular mechanism for the loading of the PCNA clamp?

      As highlighted in point 3 above and by the other reviewers, ATP and ATPγS may alter the behavior and energetic landscape of RFC. In our studies, ATPγS was added to trap the complex in a pre-hydrolysis state in which all components are assembled. We have added a section to the discussion noting the potential differences and highlighting the need for future studies to better elucidate the role of nucleotide hydrolysis. To achieve a hydrolysis-competent complex, one could apply time-resolved cryo-EM approaches where the complex is formed on the grids and quickly vitrified. Such an approach, particularly if coupled with stopped-flow kinetic analyses, may provide additional insights into the kinetics of loading of PCNA onto DNA by RFC.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors attempt to optimize the FluoroSpot assay to allow for the assessment of cross-reactive antibodies targeting conserved epitopes shared by multi-allelic antigens and those specific to a unique antigen variant at the B-cell level. This is a critical aspect to consider when identifying targets of a broad range of cross-reactive antibodies for vaccine development, and the antigen VAR2CSA used in this work is one that will benefit from the method described in the manuscript.

      Overall, this is a method manuscript with extensive detail of the assay validation process. The description of the assay performance steps using first monoclonal antibodies and later hybridoma/immortalized B cells was important to understand conditions that can influence the antigen-antibody interactions in the assay. This multiplex approach can assess the cross-reactivity of antibodies to up to four allelic variants of an antigen with the possibility to explore the affinity of an antibody to a particular variant using the RSV measurements. The validation of the assay with PBMC from malaria-exposed donors, both men and women (that naturally acquired high titers of antibodies to VAR2CSA during pregnancy), is a strength of this work as this is in the context of polyclonal antibodies with more heterogeneous antibody binding specificities.

      The ability of the assay to detect cross-reactive antibodies using all four tags appears highly variable even in the context of a monoclonal antibody targeting the homologous antigen labelled with all 4 tags.

      We understand the concern about variability, but we think that in general the assay was very consistent. Regardless of the configuration used, we detected strikingly comparable numbers of spots/well, especially when the homologous antigen labelled with four tags was used (Figure 2A). Similar consistency has been previously reported when a similar assay was used to study cross-reactivity in dengue-specific antibodies.

      Overall, it appears that the assessed antibody reactivity with TWIN tagged antigens was relatively low and this needs to be explained and discussed as the current multiplex method, as it is, might just be optimized for study of cross-reactive antibodies to 3 antigens.

      The LED380 (used to detect and visualize the TWIN tag) indeed gave more background than the other three detection channels. We normally observed a ring of fluorescence at the edge and the middle of the wells, accompanied by lower intensity of the spots. These two characteristics are apparent in the figures and RSV plots presented in the manuscript. To reduce these issues, we attempted to substitute the TWIN tag with a BAM tag detected with a peptide-specific antibody (data not presented). However, that approach did not improve the readout and we therefore decided to keep the TWIN-StrepTactin pair for all the experiments. Importantly, even with these issues, routine manual inspection of the wells confirmed that the Apex software automatically and efficiently counted “real” spots, giving us confidence in the performance of the assay. We acknowledge that exclusion of the LED380 data would lead to higher assay accuracy. However, it would result in reduced ability to assess broad antibody cross-reactivity, which was the primary objective of our study. We have added text briefly discussing this to the revised manuscript (lines 154-160).

      As acknowledged by the authors, the validation of this assay on PBMC from only 10 donors (7 women and 3 men) is a caveat to the conclusion and increasing this number of donors (the authors have previously excelled in B cells analyses of PfEMP1 proteins and would have PBMC readily available) will strengthen the validity of this assay.

      We thank the reviewer for this comment and agree the number of donors tested is far from sufficient to provide any conclusive evidence regarding frequencies of VAR2CSA-specific and cross-reactive B cells in the context of placental malaria. However, we firmly believe that the validation of the assay – which was the objective of the study – is sufficient, especially because we included human B-cell lines isolated from donors naturally exposed to VAR2CSA-expressing parasites. Future studies including more donors and full-length VAR2CSA antigens are certainly warranted. As the performance of the assay has now been validated (this manuscript) to our satisfaction, we are indeed planning such studies.

      Reviewer #2 (Public Review):

      The manuscript describes the development of a laboratory-based assay as a tool designed to identify individuals who have developed broadly cross-reactive antibodies with specificity for regions that are common to multiple variants of a given protein (VAR2CSA) of Plasmodium falciparum, the parasite that causes malaria. The assay has potential application in other diseases for which the question of acquisition of antibody-mediated immunity, either through natural exposure or through vaccination, remains unresolved.

      From a purely technical/methodological viewpoint, the work described is of high quality, relying primarily on the availability of custom-designed, in-house-derived protein and antibody reagents that had, for the most part, been validated through use in earlier studies. The authors demonstrate a high degree of rigour in the assay development steps, culminating in a convincing demonstration of the ability to accurately and reproducibly quantify cross-reactive antibody types under controlled conditions using well-characterized monoclonal antibodies.

      In a final step, the authors used the assay to assess the content of broadly cross-reactive antibodies in samples from a small number of malaria-exposed African men and women. Given that VAR2CSA is a parasite-derived protein that is exclusively and intimately involved in the manifestation of malaria during pregnancy, with specific localisation to the maternal placental space, the premise is that antibodies, including those with cross-reactive specificities, should be almost exclusively detectable in samples from women, either pregnant at the time of sampling or having been pregnant at least once. The assay functioned technically as expected, identifying antibodies predominantly in women rather than men, but it failed to identify broadly cross-reactive antibodies in the women's samples used, only revealing antibodies with specificity for just one of the different variants used. The latter result could have two mutually non-exclusive explanations. On the one hand, the small number of women's samples (7) screened in the assay could simply be insufficient, demanding the use of a much larger panel. On the other hand, for technical reasons the assay involves the use of only relatively restricted parts of the VAR2CSA protein, and this particular aspect may represent its primary limitation. In earlier work, the authors did identify broadly cross-reactive antibodies in samples from African women, but that work relied on the use of the whole VAR2CSA protein present in its natural state embedded in the membrane of the infected red cell, or as a complete protein produced in the laboratory. The important point is that the whole protein likely interacts with antibodies that recognize protein structures that the isolated smaller parts of the whole protein used in the assay fail to reproduce, and that the cross-reactive antibodies identified recognize these structures that are conserved across different VAR2CSA variants. The authors recognize these potential weaknesses in their discussion of the results. It is also possible that VAR2CSA variants expressed by parasites from geographically-distinct regions (Africa, Asia, South America) are themselves distinct, and this aspect could also have affected the outcome, since the variant protein sequences used in the assay were derived from parasites originating in these different regions.

      The assay could find application in the malaria research field in the specific context of assessments of antibody responses to a range of different parasite proteins that are, or have been, considered candidates for vaccine development but for which their extensive inherent allelic polymorphism has effectively negated such efforts.

      We thank the reviewer for the kind evaluation. We fully acknowledge the need for more comprehensive studies to assess the robustness of the pilot data regarding antibody cross-reactivity after natural exposure in the present study, which was aimed to document the performance of the complicated multiplexed assay rather than to provide such evidence. As mentioned above, we are currently planning such a study. We also acknowledge the need to assess the degree of cross-reactivity to full-length antigens rather than domain-specific components of them. This is obviously particularly true for large, multi-domain antigens such as PfEMP1 (including VAR2CSA). Such an exercise is complicated by the need for appropriately tagged antigens. We are intrigued by the apparent discrepancy between the degree of antibody cross-reactivity in depletion experiments using individual DBL domains of VAR2CSA (low cross-reactivity) versus full-length VAR2CSA antigens (very substantial cross-reactivity) reported by Doritchamou et al., and are keen to apply our approach to explore that finding. Therefore, as also mentioned above, we are currently planning a study employing tagged full-length VAR2CSA allelic variants as well.

    1. Author Response

      Reviewer #2 (Public Review):

      Portes et al. investigated the nanoscale architecture and dynamics of the osteoclast sealing zone using high-end microscopy techniques. They first use DONALD 3D single molecule localization microscopy on osteoclasts seeded on glass to study the lateral and axial localization of key components of the sealing zone. They show that for some components (vinculin, talin C-terminus), the axial localization was higher when molecules were in close proximity to the actin core while for other components (cortactin, actinin, filamin, paxillin), there was no difference in height as a function of distance from the actin core. They next show that random illumination microscopy (RIM) is a suitable microscopy technique to study the sealing zone of osteoclasts on a bone mimetic substrate. They continue to use RIM to show that the dynamics of neighbouring podosomes correlate up to a distance of about 1.5 µm. They next show that within the sealing zone, groups of podosomes are surrounded by the classical adhesion adaptor proteins such as vinculin, talin and paxillin while actinin is present at the periphery of all single cores. This suggests that the sealing zone has an "intermediate" level of organization and that groups of podosomes form a functional unit within the sealing zone. The authors lastly demonstrate that the fluorescence intensity of the cores within these groups correlates with the intensity of the adaptor proteins that surround the group and that the fluorescence intensities of the cores within one group also correlate with each other.

      Strengths:

      The authors use bone slices to evaluate the nanoscale organization of cytoskeletal components in the sealing zone. Podosome conformations in osteoclasts strongly depend on the substrate type and the usage of bone slices accurately mimics the physiological environment in which osteoclasts reside in vivo.

      The authors use state-of-the-art imaging approaches to evaluate the nanoscale organization and dynamics of multiple podosome components in the sealing zone.

      The identification of groups of podosomes that demonstrate correlated dynamics within the sealing zone is a novel finding that is convincingly demonstrated.

      We thank the reviewer for these encouraging comments and the valuable suggestions below.

      Weaknesses:

      The rationale for the analysis performed on the DONALD super-resolution images (explained in Figure S1) is unclear. The analysis is also not properly explained and it is unclear how the data should be interpreted or put into context. Specific comments related to this analysis:

      – The authors make a distinction between towards the internal or external part of the cell when it comes to the height of the investigated proteins but it is unclear why this is done. Also, while the authors make this distinction, no conclusions are derived from this distinction and only the height values from towards the internal part of the cell are mentioned in the text.

      As the sealing zone is usually located near the cell periphery, we wondered whether the proximity of the peripheral plasma membrane, and a possible difference in tension between the inner and outer parts, could influence the molecular architecture of the structure. This is why we distinguished between the inner and outer sides of the structures. However, our analyses revealed little difference between these two sides, the most striking being a closer proximity of vinculin to the cores on the outer side of the belt. We now make this explicit in the manuscript (P3, L113-116).

      • It is very much unclear how the distance of the investigated proteins towards the actin core is calculated. From Figure S1, it seems like a rectangle is taken that is centered around a podosome but the rectangle in the example contains more than one core. It seems like this would influence a proper interpretation of the data presented in the figures that contain the height values. The authors should better explain how the analysis was performed and how the analysis deals with the presence of multiple podosome cores in the rectangle of interest.

      We apologize for this omission. In order not to bias the analysis, the protein distance was calculated for all cores present, not just one. This is now specified in the legend of the figure.

      • In the text, the distance of the proteins with respect to the actin core is given (350nm-710nm depending on the specific protein and localization towards the external or internal part of the cell). It is mentioned that the measurements are not shown but it should be better explained how these numbers were derived from the data and the measurements (average, SD/SEM) should be shown.

      These values correspond to the maxima of the distributions of the different podosome markers shown in Figure 1G. Each of these proteins (vinculin, talin, filamin-A and paxillin) has a broad distribution marked by a depletion at the core, and not a peak as suggested by the first version of the manuscript. We propose not to indicate these values in the revised version in order to simplify the manuscript and not to confuse the reader.

      • Related to the previous comment. While it is mentioned that vinculin for example is located at ~500nm from the actin core, the height values (Figure 1E) are binned within 50nm of the core. This does not seem to match. It would be very helpful if the authors would add how many localizations are found so close to the core. Since this is expected to be low, it would also be valuable if the authors would discuss what this means for the difference in height between the molecules found close to and away from the core.

      Indeed, as shown in Figure 1G, vinculin is much less present in the center of actin cores than at 500 nm from these cores. The graph shown in Figure 1E, which shows the height of vinculin as a function of the distance to the core, without explaining the proportion of molecules detected, can indeed be confusing. This being said, a large number of molecules were detected, 197967 for the vinculin graph, including 5973 within 300 nm around the core, which is far from being negligible. To facilitate the understanding of this graph, as well as that of the graphs corresponding to the heights of the other proteins studied (Figures 1 and S2), we now superimpose on the height distributions, the frequency of the locations (new Figure 1E,F), still compiled in Figure 1G.

      • For cortactin, filamin A and actinin it is found that they reside on average at a height of approximately 150nm, even up to a large distance from the podosome core. It is unclear how these values should be interpreted. 150nm is way above the location where actin is expected to be (and also way above the average actin height that is found by the authors, with approximately 80nm more distant from the cores). The authors should add a discussion of what type of structures cortactin, filamin A and actinin would associate to at this position or how this height can be explained. This should also be included in the final model of Figure 6. In the current cartoon, filamin A for example seems to be associated with the integrins but this does not match with the height position observed by the authors.

      The average heights of cortactin, filamin-A and actinin are indeed around 150 nm, but are actually present over a wider range of heights (0-400nm), as shown in the histograms in Figure 1H. These values are therefore not inconsistent with the distribution of actin, which indeed has a lower average height, but is also present over this entire height (histogram now added in Figure 1H). These analyses suggest that there are different sets of actin filaments and that there is proportionally more cortactin, filamin-A and actinin on the high actin filaments, rather than on those close to the plasma membranes. To fully account for these results, we now point out the potential presence of different sets of actin filaments in the discussion (P7, L266-275) and corrected the model shown in the new Figure 6, placing a population of filamin A on the radial filaments, not just associated with integrins, and added filamin A and actinin in the side view of the model, to appreciate their likely localisation.

      The authors mention that the RIM resolution is 100nm and 300nm in the lateral and axial direction, respectively. This should also be confirmed on the bone slices with beads. It is well conceivable that the optical properties of bone have an effect on the optimal RIM resolution.

      In order to evaluate RIM resolution on osteoclast samples, as suggested by the reviewer, we performed experiments with beads and used the Fourier Ring Correlation (FRC) method (Nieuwenhuizen et al., Nat Methods 2013). This consists of acquiring two RIM images with two different speckle illumination sequences and comparing the correlations of the images in Fourier space. The following figure shows the correlation curve as a function of spatial frequency. The FIRE number, determined at the point where the FRC curve reaches a correlation value of 1/7, gives an estimate of the resolution of the image.

      Using this approach, we evaluated the resolution to be 125 nm on average.
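
      The FRC procedure described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used for the manuscript: it assumes two square, registered images of the same field, integer-radius rings, and a known pixel size; the function names and the pixel-size parameter are hypothetical.

```python
import numpy as np


def fourier_ring_correlation(img_a, img_b):
    """Return (spatial frequency in cycles/pixel, FRC value) for each ring."""
    if img_a.shape != img_b.shape or img_a.shape[0] != img_a.shape[1]:
        raise ValueError("expected two square images of identical shape")
    n = img_a.shape[0]
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))

    # Integer ring index (truncated radius) of every Fourier pixel.
    y, x = np.indices((n, n))
    ring = np.hypot(x - n // 2, y - n // 2).astype(int)
    n_rings = n // 2
    inside = ring < n_rings  # ignore the corners beyond the Nyquist ring

    # Per-ring sums of the cross term and of the two power spectra.
    cross = np.bincount(ring[inside],
                        weights=np.real(fa * np.conj(fb))[inside],
                        minlength=n_rings)
    power_a = np.bincount(ring[inside],
                          weights=(np.abs(fa) ** 2)[inside],
                          minlength=n_rings)
    power_b = np.bincount(ring[inside],
                          weights=(np.abs(fb) ** 2)[inside],
                          minlength=n_rings)

    frc = cross / np.sqrt(power_a * power_b)
    freq = np.arange(n_rings) / n
    return freq, frc


def fire_resolution(freq, frc, pixel_size_nm, threshold=1.0 / 7.0):
    """Resolution estimate: inverse of the first frequency where FRC < threshold.

    Returns None if the curve never drops below the threshold.
    """
    below = np.nonzero(frc[1:] < threshold)[0]  # skip the zero-frequency ring
    if below.size == 0:
        return None
    cutoff = freq[1:][below[0]]  # cycles per pixel
    return pixel_size_nm / cutoff
```

      In practice the curve is usually smoothed before the threshold crossing is read off, since individual rings at low spatial frequency contain few pixels and are noisy; the sketch omits this step for brevity.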

      The authors find three specific fluctuation periods (100s/25s/7s) but it is unclear what these periods mean. The authors only very briefly mention that these periods correlate with similar observations in macrophages but they should also add the implications of this finding and suggested a possible molecular mechanism that underlies these different fluctuations.

      We agree with this comment. So far, the mechanisms regulating these oscillations, whether purely mechanical or involving signaling, as well as their importance for podosome and sealing zone function, are not yet understood. In van den Dries et al. Nat Commun 2013 and Labernadie et al. Nat Commun 2014, it was shown that these oscillations in macrophage podosomes depend on myosin IIA activity. It would thus be interesting to explore the effects of drugs interfering with actin polymerization on both the periodicity and the spatial synchrony properties of the sealing zone. We now discuss this point in the manuscript (P7, L296-300).

      The authors find that actinin-1 is localized around the podosome cores while filamin and vinculin surround groups of podosomes. The current representative images, though, that are chosen to support this difference display a very different density in podosome cores. The filamin and vinculin images seem to have a much denser podosome content compared to the actinin and cortactin images. I would encourage the authors to select images that are more comparable to fully appreciate the difference in localization of the associated proteins.

      This is a good point. Indeed, not all sealing zones are alike, especially with respect to the density of actin cores. This is why we have chosen to show a gallery of different cases (now in Figure S7), and not to intentionally select always the same patterns in the main figures in order not to mislead the reader. It is important to note that whatever the actin density, we find the same locations for the different proteins.

      In Figure 4 and 5, the authors show that the sealing zone is subdivided in groups of podosomes and it is implied that these form functional units within the sealing zone. Yet, it is unclear how persistent these groups are. Considering the dynamic nature of podosomes in other cell types (and as also demonstrated in the supplementary movies) it is well conceivable that these groups continuously fuse and remodel. To better define the nature of these groups of podosomes, the authors should add an analysis on these podosome groups and measure parameters such as group stability, podosome number per group, group size etc. This would very much enhance the novel aspects of the findings in this paper.

      Following the reviewer’s suggestion, we have quantified the number of podosomes per group and the group size. Measurements of these islets of clustered cores showed that they were 2.3 +/- 2.1 µm² (average +/- SD) and contained 7 +/- 8 (average +/- SD) cores. These results are now included in the manuscript (P6, L213). Unfortunately, we could not accurately measure the stability of the clusters, as this would require a long, and challenging, time-lapse by RIM of osteoclasts expressing both paxillin-GFP and lifeact-mCherry, which we were able to achieve only on a few cells and on short timescales.

      The authors mention in the discussion that their finding about the groups of podosomes is very different from the "double circle" distribution found in previous publications. Yet, it is unclear what explains these different observations. While the authors use RIM super-resolution in this paper to assess the localization of the adaptor proteins, it is very unlikely that this is the source of this difference since the groups of podosomes would have been easily identified by conventional or confocal microscopy as well. The authors should add an extended discussion on how these differences could be explained and what this means for bone resorption properties.

      Indeed, our observation that the sealing zone is composed of islets of actin cores that are bordered by a network of adhesion complexes diverges from most of the previous studies describing a “double circle” organization. We believe that this difference may come, not only from the high resolution of our images, but mainly from the fact that most studies on the organization of sealing zones have been performed on mouse osteoclasts. We also believe that this particular organisation probably allows an efficient sealing of the osteoclast plasma membrane to the bone surface and maintains the resorption lacuna and the diffusion barrier. We now indicate this in the discussion (L7, P286-288).

    1. Author Response

      Reviewer #2 (Public Review):

      Huan-Huan et al. investigated the structure of phosphoribosyl pyrophosphate (PRPP) synthase (PRPPS) from Escherichia coli, a highly conserved enzyme from bacteria to mammals that catalyzes the synthesis of a key common compound for several metabolic pathways. Although the structure of this enzyme was known, the mechanism of regulation by ADP and AMP remained uncharacterized. Previously, the group of JE. Wilhelm found that PRPP synthetase from different eukaryotes assembles into long filamentous structures (named cytoophidia). The present study shows that PRPP synthetase filaments also form in bacteria, both in vitro and in vivo. Then, they determined the structure of two different forms of PRPPS filaments at atomic detail using cryo-electron microscopy. Combining structural data with mutagenesis and activity assays, they demonstrate that the enzyme is regulated differently by allosteric effectors when assembled into one filament form or the other.

      The strength of the manuscript is the high-quality cryo-EM data, which allows the reconstruction of two different filament forms bound to different ligands, the identification of a new regulatory site, and the description of the movements of the regulatory loop at the active site, which either blocks the active site (in filament type B) or hampers the binding and inhibition of ADP to the allosteric site (in filament type A).

      Based on the structural information, the authors designed point mutants that favor the formation of one filament type or the other. Using these mutants, the authors dissect the different responses of the two filament forms to the nucleotides that bind and regulate the reaction rate.

      The authors conclude that filament formation is not needed for PRPPS activity, but that the formation of these filaments is an additional layer to fine-tune its activity.

      Overall, the data are of high quality and the conclusions are of interest to understanding the significance of the organization of proteins into supramolecular membrane-less compartments.

      A similar filamentous organization is expected for this enzyme in other higher eukaryotes, including humans. Defects in the human enzyme are the cause of rare congenital diseases. Based on the current data, the authors speculate that the mechanistic effect for certain pathogenic variants could be affecting the formation of the filaments.

      This manuscript reveals that this enzyme is more complicated than initially expected. The newly proposed regulatory mechanism is not easy to understand, since ADP can inhibit or enhance the reaction depending on whether it binds to one regulatory site or to the other, but also by competing with ATP in the catalytic site. Some parts of the text and figures are not sufficiently clear and difficult to follow. The authors could make an effort to improve clarity and correct grammatical issues.

      We would like to thank the reviewer for the informative comments on our manuscript. We have endeavored to address them as fully as possible in our revision. The manuscript has been rewritten to improve the clarity and the presentation.

      Reviewer #3 (Public Review):

      This work aims to investigate the role of self-assembly in the regulation of enzyme activity of E. coli PRPS (EcPRPS). EcPRPS is an important enzyme in the biosynthesis of nucleic acids and some amino acids, forms micron-sized self-assemblies in cells (cytoophidia), is allosterically inhibited by ADP, and is activated by inorganic phosphate, Pi. The authors set out to investigate the structure and function of the filamentous form of the enzyme responsible for cytoophidia formation.

      The authors present two new, high-resolution cryo-EM structures of filamentous forms of EcPRPS, each assembling via unique stacking of EcPRPS hexamers. One form, type A, was formed in the presence of ATP and Mg2+, and the cryo-EM map was interpreted as containing one ADP and one R5P (ribose 5-phosphate) in the active site (as well as two Mg2+). The substrates of EcPRPS are ATP and R5P and the ADP occupies the expected ATP binding site, though neither ADP nor R5P was supplied in the experimental solution. Surprisingly, a second ADP molecule is also identified, bound near the active site at a location named site 2. In addition, one Pi is bound in the canonical allosteric site, site 1. A second type of filament (type B) was formed in the presence of Pi and contains two Pi, one in site 1 and one in the R5P binding site of the active site.

      Analysis of these two structures revealed a significant change in the positioning of a segment of the enzyme, named the Regulatory Flexible (RF) loop. In the type A filament structure, with the active site occupied by ADP and R5P, the RF loop interacts with the ADP in site 2. In the type B structure, with an empty site 2, the RF loop sits in a different position and occupies the ATP binding site of the active site. The suggestion made by these observations is that the binding of ADP in site 2 stabilizes the RF loop such that ATP may bind the active site, and without ADP in site 2, the loop will block ATP from binding the active site. This suggests that ADP is not merely an allosteric inhibitor, but could also act as an activator.

      As for allosteric site 1, which is occupied only by Pi in both structures, the authors suggest it binds ADP as seen in structures of homologous PRPS enzymes, and that this binding is the cause of allosteric inhibition by ADP. Structural comparisons suggest that the binding of ADP to site 1 is controlled by the position of the RF loop, which in turn can be controlled by interactions with ADP at site 2. The authors suggest that the binding of ADP to site 1 and to site 2 is mutually exclusive via this mechanism of RF positioning.

      In addition to the structural analysis, a strength of the paper is the use of point mutations to investigate the effects of eliminating one or both types of filaments on enzyme activity, cell growth, and cytoophidia formation. When either one or the other type of filament form is knocked out, cytoophidia still form, but when both types are knocked out using a double-mutant, no cytoophidia form. This suggests that both filament types form in cells and are responsible for cytoophidia formation. Effects on cellular growth were nominal, and largest only with the double-mutant, which grew much more slowly than the wild-type enzyme. At longer time points only, each single point-mutant showed faster growth than the wild-type enzyme. These results are interpreted by the authors to mean that both types of filaments form in cells and have functional consequences.

      In activity assays, the double-mutant, which does not form either type of filament, showed a very high level of sensitivity to allosteric inhibition by ADP, suggesting that filament formation mitigates this to some degree (and that filament formation is not necessary for allosteric inhibition by ADP). The absence of type B and presence of type A filaments leads to lower sensitivity to allosteric inhibition by ADP: lower than wild type EcPRPS and lower than when neither filament forms. Hence the presence of the type A and absence of type B filament leads to greater enzymatic activity and less allosteric inhibition by ADP than no filamentation or when both types are present. Absence of the type A filament appears similar to the double-mutant (which does not have filament) in that it is very sensitive to ADP inhibition. The conclusion is that the type A filament mitigates allosteric inhibition by ADP, while the type B filament is allosterically inhibited by ADP (similar to the non-filamenting enzyme).

      The work has a few weaknesses. First, R5P was not included in the solution used to prepare the type A filaments, yet is built into the cryoEM map. The map around the modeled R5P is not shown, making it difficult to assess this interpretation. Second, no filament structure with ADP in site 1 has been determined. Instead, the structure of a related PRPS with ADP in site 1 is shown, but the position of the RF loop in that structure does not occupy the ATP binding site as implied by the authors to be the function of this conformation (i.e. when ADP is bound in site 1).

      The authors also claim that the type B filament enhances inhibition, but in fact, shows similar inhibition to the enzyme which cannot form filaments. However, when type A filaments are present, it appears that type B filaments are necessary to allow for some allosteric inhibition by ADP. Though not discussed, it may be that levels of the two types of filaments are altered to control overall enzyme activity. In addition, much of the discussion deals with interpretations about binding affinities of ligands to various sites, but all evidence used is indirect, as no binding affinities are measured directly.

      Another weakness is in the investigation of site 2. The authors claim that ADP binding to site 2 enhances ATP binding in the active site, however, the mutation designed to disrupt ADP binding to site 2 results in reduced ATP in the absence of ADP, not in its presence.

      Finally, a discussion of the role of Pi, and how the choice of the two filamentous forms is chosen is not addressed in the study. The authors show compelling evidence to support their coexistence in vitro and in cells, and differing activity, however, how and why one is formed over the other has yet to be uncovered.

      We would like to thank the reviewer for the informative comments on our manuscript, and we have endeavored to address them as fully as possible in our revision. Specifically,

      1) We have added a supplementary figure showing the map around R5P. 2) We did not propose that, when ADP is bound to allosteric site 1, the RF loop occupies the ATP binding site. 3) We have tempered our claims on the type B filament in the absence of direct binding affinity measurements. 4) We have tempered our claims on the role of ADP binding to site 2. 5) We have added discussion on the role of Pi and on how one of the two filamentous forms is selected over the other.

    1. Author Response

      Reviewer #2 (Public Review):

      A 2-aminopurine fluorescence-based DNA melting assay is used to show that both 3' and 5' recessed DNA molecules (with ATPgS) exhibit an increase in fluorescence interpreted to mean that melting occurred (Supplementary Figure 1.4). Given the structure-based finding that the first molecule must bind first to enable binding of the second molecule, is it surprising that the 5' recessed molecule on its own is bound and melted (i.e., without the 3' recessed molecule binding first)?

      We thank the reviewer for the support. Our model (Figure 6) suggests that the non-primer duplex (second molecule, with a recessed 5’ end) binds and melts first at RFC’s external DNA binding site. Our 2AP experiments showing that the 5’ recessed molecule binds and is melted on its own (Figure 4—figure supplement 2) agree with this model.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Crawford et al. studied the protein composition of translating ribosomes in yeast under unstressed or stress conditions. To achieve this, the authors employed a combination of polysome profiling, which is a method that separates translating from non-translating ribosomes based on their sedimentation in sucrose gradients, and mass spectrometry. They identified aspartate aminotransferase, Aat2, as one of the proteins that is enriched in translating ribosomes in stressed cells. Crawford et al. went further to show that deletion of Aat2 impairs adaptation of yeast to osmotic stress and provided some evidence that Aat2 may play a role in integrated stress response. Finally, the authors show that the aminotransferase activity is not required for its function in translation and stress responses. Altogether, this study was found to be of high interest as it provides further potential insights in the molecular underpinnings of stress adaptation and further emphasizes potential unconventional roles of metabolic enzymes in regulation of translation. It was therefore thought that this study is likely to be of broad interest to the fields of biochemistry, molecular and cellular biology and beyond.

      Strengths: This study is based on an elegant combination of biochemical and genetic approaches. Evidence implicating Aat2 in oxidative stress response was found to be strong. In addition, it was appreciated that authors demonstrate that a potential role of Aat2 in regulation of protein synthesis under stress is independent of its aminotransferase activity.

      Weaknesses: The major weaknesses were thought to be related to the relative lack of the mechanistic evidence of how Aat2 is recruited to the ribosomes. In addition, factor(s) that transduce signals from stressors to entice Aat2:ribosome association remain(s) elusive.

      We thank the reviewer for their positive comments about our manuscript. To address the weaknesses: as outlined below a fraction of Aat2 is constitutively associated with ribosomes under the conditions we have investigated, rather than appearing during stress.

    1. Author Response

      Reviewer #1 (Public Review):

      “The presentation of the clinical data in table 1 is very short and patchy and seems incomplete, also some of the classifications don't appear to be correct E.g. PTEN hamartoma tumor syndrome is a genetically distinct entity, that does not harbor somatic PIK3CA mutations but rather germline PTEN mutations. There are 5 patients with CLOVES and 1 patient with KTS, these patients often have mixed (e.g. lymphatic-venous) malformations, are the analyzed samples truly pure LMs? There are some more instances where I wonder if the presented data allows the reader to understand the cases.”

      Thank you for this thoughtful comment. One of the limitations of our study is that the CGP cohort was a retrospective study of data available from an international reference laboratory. While the use of data from a reference laboratory enables the study of relatively high numbers of rare diseases, it limits us to only clinical information provided by the ordering physicians at the time of testing. The table data under the heading “Clinical syndrome” was collected from review of test order forms and pathology reports submitted to the reference laboratory at the time of testing. The scope of our study did not enable outreach to ordering physicians and pathologists to determine if and how the genomic results refined the working clinical diagnosis and/or pathologic diagnoses.

      To more accurately describe the source of data in the “clinical syndrome” column, we have now edited the column to read “Submitted clinical syndrome” and clarified this in the Results section and, further, have clarified the limitations of this study in more detail in the Discussion.

      “If the histology is described as kaposiform, these cases likely represent kaposiform lymphangiomatosis, which is a very different disease than common LMs. KLA belongs to the group of complex lymphatic anomalies and usually is caused by NRAS mutations, which would be in line with the presented data. Case 24 (conventional histology, NRAS mutation) could also be a generalized lymphatic anomaly. This distinction of common LMs and complex lymphatic anomalies (including GLA and KLA) should be made and should include what is known about the genetics of these diseases. Taken together with the first point, the presentation of the cases might benefit from a more structured description and classification.”

      We agree that our NRAS mutant tumors with kaposiform histology are compatible with the entity of kaposiform lymphangiomatosis (KLA), and have therefore added additional details about the clinical, histologic, and genomic features of KLA to the Introduction and Discussion.

      “In the discussion and other parts of the manuscript, terms describing LMs and tumors are interchanged frequently. This mistake is also present in the study protocol (NCT03941782), in which "locally advanced or metastatic cancer" is listed as an inclusion criterion. Other examples include "tumor nuclei". Much of the cited literature also focuses on oncology rather than vascular malformations. And LMs are directly compared to "other low-grade pediatric tumors".

      Also, clonality is a concept not too often used in vascular malformations, as an aberrant development of vascular structures during embryogenesis is seen as the cause of vascular malformations, as opposed to clonal expansion in tumors (but this might warrant further investigation in the field). Thus, the manuscript mixes tumors and malformations, however, it should be stressed, that LMs are not tumors but vascular malformations.”

      Thank you again. We have considered the diagnosis and classification of vascular anomalies (vascular malformations and others) to be a holistic integration of clinical examination, imaging studies, pathology diagnosis, and/or genomic results. As addressed above, one of the limitations of our study is that the CGP cohort was a study of data available from an international reference laboratory. While the use of data from a reference laboratory enables the study of relatively high numbers of rare diseases, it limits us to only clinical information provided by the ordering physicians at the time of testing and only to one representative pathology specimen submitted by the pathology laboratory. The scope of our study did not enable outreach to ordering physicians and pathologists to determine if and how the genomic results refined the working clinical diagnosis and/or pathologic diagnoses.

      Recognizing these limitations, we agreed to use the term lymphatic malformations to describe the lesions in our cohort. Lymphatic malformation is widely accepted to include a clinicopathologic continuum of benign tumors of lymphatic origin (https://rarediseases.org/rare-diseases/lymphatic-malformations/), including cystic lymphangioma, kaposiform lymphangiomatosis, and macro/microcystic lymphatic malformation. While evidently an imperfect solution, we have also clarified our use of this term in the Introduction.

      “The explanation for the reduced EF doesn't quite make sense, as there should be little blood flow into the LM. This is different from the Venot paper in Nature, where the reduced EF was due to the presence of an AV malformation.”

      We agree the EF was an unusual finding and our case with reduced EF is somewhat bewildering. However, EF was measured both by ECHO and by cardiac MRI with the same results. The LM in this patient was of a giant size and could potentially produce hemodynamic changes, as proposed in the Results. That is the only coherent explanation we could identify as the EF corrected after we achieved a response.

      “The data on the (back at the start of treatment of the patient novel) alpelisib is presented as a rather new finding. However, clinical data on its use in PROS diseases have already been published starting in 2018 (the paper from Venot is also mentioned in the manuscript). At the moment, an international clinical trial on alpelisib in PROS disease is recruiting, which could be mentioned.”

      Our index case of a giant LM treated with alpelisib was within an early trial when the drug was experimental, and it represents the only case of non-syndromic LM that achieved complete remission and remained in sustained remission for years, but relapsed after discontinuing the medicine. Thus, it provides mechanistic insight into the potential efficacy and need for continuous administration in sporadic non-syndromic cases. Interestingly, the FDA approval of alpelisib came while this manuscript was under review. The Discussion was revised to better describe the current state of the field.

      “Treatment: The rather high dose of 350 mg/d is not further discussed. Also, a patient like this would usually first receive sirolimus, especially back when alpelisib was started in this patient since it was much more experimental at that time point. This should also be explained.”

      When our trial patient was deemed a challenging surgical oncology case, we felt targeting the driver mutation made more sense than using mTOR inhibitors, and indeed we achieved complete response with the phase I selected dose with no detectable toxicity. The patient continues on alpelisib as of now, more than 5 years later.

      Reviewer #2 (Public Review):

      “This is not the first demonstration of somatic activating NRAS mutation associated with Kaposiform lymphangiomatosis. A prior study demonstrated that 10/11 patients with Kaposiform lymphangiomatosis had NRAS mutation (PMID 30542204).”

      Thank you for the suggestion to add clarity to the statement that our findings of NRAS in LM with kaposiform histology align with prior studies showing NRAS driver mutations in kaposiform lymphangiomatosis which is considered to be a distinct histologic and clinical entity. We have now refined the statement to highlight the relationship between the NRAS mutant cases in our cohort and kaposiform lymphangiomatosis.

      “This is not the first demonstration of somatic activating PIK3CA patients exhibiting malformation shrinkage with alpelisib and they only had a single treated patient. A prior study showed similar results in 6/6 patients treated with alpelisib for 6 months (PMID 34613809).”

      Correct. We have now added this reference to our Discussion, and we appreciate the replicability of recently emerging findings. Of note, our treated index case provides unique mechanistic insight in terms of speed and depth of response, durability, and need for continued therapy despite sustained complete remission in this setting, as discussed.

      “The effects of alpelisib on cell growth in vitro were tested on lymphatic cells from one patient, and the results would have been strengthened if cells from 2 or more patients had been tested.”

      We agree that additional cell lines would have been desirable; however, as this was a confirmatory study to more fully understand our index patient data, we believe that one cell line is sufficient to demonstrate the activity of alpelisib. Future studies will confirm the findings in other cell lines.

      “For the RNA-seq experiments, the significance of the gene expression remains unclear, especially given the numerous cell types present in their tissues.”

      We agree that these neoplastic lesions represent a mixture of cell types, most of which are reactive and non-clonal. But we still appreciated an intriguing gene expression profile that would otherwise provide other targets, such as JAK3, for this entity that appears druggable from different angles. New Figure 4 may enable the further interrogation of the relevant gene expression profiles.

    1. Author Response

      Reviewer #1 (Public Review):

      This study by Hurwitz et al. defines a functional relationship between the ISR and microtubule dynamics. This is mediated through the mRNA-specific translation of genes including ATF5 in the context of proteotoxic stress. They further show that this relationship is particularly important in the context of recovery from bortezomib treatment which in the WT setting leads to efficient clearance of protein aggregates. However, this process is significantly less efficient when the ISR is impaired through phosphorylation defective eIF2alpha. The authors use whole transcriptome ribosome profiling to identify mRNAs that are differentially translated upon treatment with bortezomib and uncover a subset of mRNAs which exhibit an increase in translation. This list of 24 mRNAs includes ATF5 which they functionally show is important for cell survival in the context of proteotoxic stress. Overall, this work is solid and provides new insights into SCC management of proteotoxic stress. This work reveals a new arm of cellular regulation dynamics that is controlled by the ISR and helps grow our understanding of how the ISR enables survival in the context of distinct stress.

      We thank this Reviewer for his/her comments. We are pleased to notice that he/she finds our work to be solid and innovative and that it will help to advance our understanding of how the ISR promotes recovery in the context of distinct stresses. We also appreciate this Reviewer’s enthusiasm for our work, which he/she defines as interesting and a strong contribution to the literature.

      Reviewer #2 (Public Review):

      The manuscript from Hurwitz et al. documents a connection between the integrated stress response (ISR) and a centrosome-mediated protein clearance mechanism, using skin carcinoma cells as a model system. Using ribosomal profiling and molecular approaches, the authors identify that upon stress, the ISR promotes a shift in the translation of centrosomal proteins required for the clearance of unfolded protein-enriched aggregates in the pericentrosomal area. Abrogating the ISR response sensitizes cancer cells, promoting cell death. The authors generated useful cellular tools for the community and information about the translational changes of specific proteins involved in the ISR response, paving the way for future studies.

      There are no major significant weaknesses, and the authors achieve their aim of dissecting the relevance of the ISR in skin carcinoma cells.

      We are grateful to this Reviewer for his/her comments. We are happy to note that this Reviewer acknowledges the generation of useful research tools for the scientific community and that our work will likely pave the way to future research endeavors on this topic. We are also pleased that he/she found our work to be convincing and statistically robust without any significant weaknesses.

      Reviewer #3 (Public Review):

      This is an interesting study that identifies why or how the ISR pathway regulates cell recovery upon proteotoxic stress, which is especially interesting in cancer cells resistant to proteasome inhibitors. The study concludes that only by favouring canonical translation initiation of mRNAs encoding microtubule cytoskeleton, centrosome and ATF5 proteins are necessary to recover from proteotoxic stress. The study is robust and uses advanced pre-clinical models and sequencing techniques to explore the translatome of stressed cancer cells.

      The authors claim that they find a proteotoxic mechanism exclusive to SCC stem cells. However the authors do not use stem cells, they work with primary SCC cells. They would need to actually show in stem cells that this is the case and that normal keratinocytes or epidermal stem cells do not use this exclusive mechanism. In addition, it would be very interesting to translate these findings into the clinic. It would be interesting to know how relevant this mechanism is for human tumour cells.

      We thank this Reviewer for his/her useful comments. We are pleased to notice that he/she found our study to be interesting and robust. We share this Reviewer’s excitement for our pre-clinical model and agree that future study should aim at dissecting the clinical relevance of the combination of proteasome inhibitors together with ISR inactivation.

      In our study, we used primary keratinocytes, which have been transformed by expression of mutant HRasG12V and deletion of TgfbrII. These cells have been previously characterized and used as a model of SCC stem cells (Yang et al., 2015). To address whether transformed keratinocytes are differentially sensitive to proteotoxic stress, we compared responses to bortezomib in primary keratinocytes isolated from littermates: WT control (TGFbrIIfl/fl), HRasG12V expressing, pre-transformed keratinocytes (HRasG12V; TGFbrIIfl/fl) and fully transformed keratinocytes (HRasG12V; TGFbrIID/D), which were used to generate S51 cells.

      We find that WT keratinocytes are significantly more sensitive to bortezomib, underscoring a unique way in which cancer cells are able to protect themselves against proteotoxic stress (Figure 1G and Figure 1-figure supplement 3A). We show that a major difference is the unique ability of cancer cells to translationally upregulate ATF5 during stress (Figure 7H). The ability of the cancer cells to do so relies upon the ISR, as when eIF2a phosphorylation is prevented by S51A, the cancer cells are no longer protected. In agreement with a role for ATF5 in centrosome dynamics, WT cells fail to increase MTOC size (Figure 4-figure supplement 3). Altogether these data suggest that SCC cells have acquired a resistance mechanism to proteotoxic stress, which allows rapid recovery and preservation of microtubule function.

    1. Author Response

      Reviewer #2 (Public Review):

      (1) The authors mention that they did not observe DA release at sites that did not also have bassoon puncta. However, the data in Figures S13A, and B suggests that this statement may be true only to a rough approximation. If possible, the authors should verify this statement by quantifying the DopaFilm signals at bassoon positive and bassoon negative areas.

      We thank the reviewer for this important comment. The image in Figure S13A/B (now Figure 6—figure supplement 2A/B) was taken with a low magnification objective and we can see how it may paint a somewhat unconvincing picture of the localization between Bassoon and ∆F/F responses. However, because the specimen from which the data is collected is no longer available, we are unable to reimage and reanalyze this particular data.

      To strengthen our claim, we instead chose to offer an analysis that quantitatively examines the correlation between Bassoon expression and ∆F/F hotspot activity from a different dopamine neuron. In this analysis, we show that DopaFilm activity detected at a location is strongly correlated with Bassoon expression at the same location. This new analysis is consistent with our observation that Bassoon plays an important role in orchestrating release in dendritic processes.
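
      The kind of location-by-location correlation described above can be sketched as follows. This is only an illustration: the response does not state which correlation statistic was used, so the Pearson coefficient is an assumption, and the per-location values are made up for the example rather than taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-location measurements, for illustration only.
bassoon_intensity = [0.1, 0.4, 0.5, 0.8, 1.0]     # normalized Bassoon signal per location
dff_peak = [0.02, 0.05, 0.09, 0.12, 0.20]         # peak dF/F at the same locations

r = pearson_r(bassoon_intensity, dff_peak)
print(f"Pearson r = {r:.2f}")
```

      A rank-based statistic such as Spearman's coefficient would be a reasonable alternative if the relationship is monotonic but not linear.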

      Changes to manuscript in response to this comment:

      (1) A new Figure 6—figure supplement 1 is added. (2) Accompanying text in manuscript referencing this new figure and updates to Methods section describing how this analysis was carried out.

      We modified the original sentence that read:

      “Importantly, while we observed that presence of Bassoon did not necessarily indicate presence of DopaFilm activity, we did not observe DopaFilm activity from a dendritic process that did not have Bassoon puncta (Figure S13).”

      The sentence now reads:

      “Importantly, density of Bassoon expression at a location correlated positively with the magnitude of ∆F/F activity measured by DopaFilm at the same location (Figure 6— figure supplement 1-2).”

      (2.1) In Fig 7C, the synaptobrevin2 staining does not seem to overlap well with the MAP2 or TH-GFP staining. Please comment on whether the synaptobrevin staining shown here represents staining in neighboring glutamatergic cells that are present in the co-cultures.

      The non-overlapping synaptobrevin-2 signals shown in Figure 7 are from other neurons in culture. We have clarified this in the text.

      Changes to manuscript in response to this comment:

      (1) We modified the figure caption for Figure 7 with this additional sentence:

      “Red puncta that do not colocalize with TH-GFP are from non-dopaminergic neurons in the co-culture system.”

      (2.2) Related to this, did the authors find any dependence of dopamine release here on glutamatergic transmission from cortical neurons? Please comment on this.

      We would first like to thank the reviewer for encouraging us to examine the effect that glutamatergic transmission may have on dopamine release in the co-culture system. To investigate this, we carried out experiments in which the AMPA-type glutamate receptor antagonist NBQX and the NMDA-type glutamate receptor antagonist D-AP5 were bath applied to the culture system while imaging release from dopamine neurons. We first examined neurons from which DopaFilm activity can be detected from spontaneous spiking events (that is, in which we applied no external stimulus to generate activity but from which we measured action potential driven, synchronous release). We imaged from these neurons under ACSF (our normal imaging buffer) and then applied NBQX (10 µM). DopaFilm activities that were detected before application of NBQX were absent in the post-drug imaging sessions (see Figure 3—figure supplement 2A-B). Application of NBQX was sufficient to abolish the activity. We observed this phenomenon from n = 4 dopamine neurons.

      Additionally, we examined the extent to which glutamatergic currents contributed to dopamine neuron depolarization during evoked activity imaging. To investigate this, we carried out evoked imaging before and after glutamate receptor blockade with a combined application of NBQX and D-AP5. Here, such treatment resulted in reduced DopaFilm activity as measured by the peak amplitude of ∆F/F traces and the area under the curve (AUC) of ∆F/F traces (Figure 3—figure supplement 2C-E).
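
      For readers less familiar with these two readouts, the following is a minimal sketch of how a peak amplitude and an AUC might be computed from a single fluorescence trace. It assumes a baseline defined as the mean of the first few frames and trapezoidal integration; the function names, the baseline window, and the example trace are hypothetical and are not taken from the manuscript.

```python
def delta_f_over_f(trace, n_baseline):
    """Convert raw fluorescence to dF/F using the mean of the first n_baseline frames as F0."""
    if n_baseline < 1 or n_baseline > len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = sum(trace[:n_baseline]) / n_baseline
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]


def peak_amplitude(dff):
    """Largest dF/F value in the trace."""
    return max(dff)


def auc_trapezoid(dff, dt):
    """Area under the dF/F curve by the trapezoidal rule, with frame interval dt (s)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(dff, dff[1:]))


# Hypothetical raw trace (arbitrary units) sampled every 0.5 s
raw = [100, 100, 100, 120, 110, 100]
dff = delta_f_over_f(raw, n_baseline=3)
peak = peak_amplitude(dff)
auc = auc_trapezoid(dff, dt=0.5)
```

      Comparing `peak` and `auc` for the same ROI before and after drug application would give the kind of paired readout described above.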

      Glutamatergic regulation of dopamine release is an interesting phenomenon that is worthy of a systematic exploration in an independent study. There are several outstanding questions and controversies in this field that are beyond the scope of the current study. However, these preliminary data suggest to us that our system may be suitable for examining how glutamatergic transmission regulates dopamine release, and if such regulation can occur independent of dopamine cell body firing. Our ability to measure dopamine release with synaptic spatial resolution may offer new insights into these phenomena in future studies.

      In light of these new experiments, we have made the following changes to the manuscript.

      Changes to manuscript in response to this comment:

      (1) A new supplementary figure (Figure 3—figure supplement 2) that shows the effect of glutamate transmission on DopaFilm activity.

      (2) A new paragraph in manuscript discussing these results. The new paragraph reads:

      “In our study, dopamine neurons are co-cultured with cortico-hippocampal neurons, and we explored if glutamatergic activity from neurons in co-culture could influence dopamine release. To investigate this, we carried out experiments in which AMPA-type glutamate receptor antagonist NBQX and NMDA-type glutamate receptor antagonist D-AP5 were bath applied to the co-culture system while imaging release from dopamine neurons. We first examined neurons from which DopaFilm activity can be detected from spontaneous spiking events in which we applied no external stimulus to generate activity. We imaged from these neurons under artificial cerebrospinal fluid (ACSF, our normal imaging buffer) and then bath applied NBQX (10 µM). DopaFilm activities that were detected before application of NBQX were absent in the post drug imaging sessions (Figure 3—figure supplement 2A-B). Application of NBQX was sufficient to abolish these activities. Additionally, we examined the extent to which glutamatergic currents contribute to dopamine neuron depolarization during evoked activity imaging. To investigate this, we carried out imaging before and after glutamate receptor blockade with a combined application of NBQX and D-AP5. Here, such treatment resulted in reduced dopamine release as measured by the peak amplitude of ∆F/F traces and the area under the curve (AUC) of ∆F/F traces (Figure 3—figure supplement 2C-E). In sum, these results indicate that DopaFilm offers an opportunity for direct measurement of dopamine release under pharmacological perturbations and suggests that our in vitro culture system may permit simplified explorations of local chemical circuitries that control dopamine release in the absence of complex circuit effects that may be encountered in vivo.”

      (3) In Figure 4, optical stimulation results in DA release and fluorescence increases at multiple hotspots. Interestingly, the change in fluorescence reaches very similar amplitudes across hotspots for each stimulation (compare dF/F at first red symbol across hotspots for example). Does this indicate saturation of the nanosensor? Interestingly, this seems to be true for the third stimulus as well, after depression when the signals are much smaller. By contrast in the dendrites, this doesn't seem to be the case, as shown in Fig S12. Better clarification on this point will inform whether DopaFilm can be used to probe synaptic release properties such as variance, etc. Please comment on this.

      The traces appear to have the same peak amplitude because of the scale of the figure. A closer look at the traces, when drawn on the same y-axis, shows that there is variability in the peak amplitude achieved across ROIs. To visualize the ∆F/F traces in a way that better conveys the variability observed from one hotspot to the next, we have created a new supplementary figure (Figure 4—figure supplement 1) that addresses this comment as well as several other comments. Additionally, we would like to point out that other data presented in this study show that there is variability in the measured peak ∆F/F values and sensor saturation is not observed (Figure 2B, Figure 3I, Figure 4E). However, it is possible, under some cases of high dopamine release, that we may observe a non-linear response from DopaFilm.

      Changes to manuscript in response to this comment:

      (1) We generated a new supplementary figure, Figure 4—figure supplement 1C-D, to go along with main Figure 4 to help improve the presentation and visualization of the data. Figure 4—figure supplement 1 has several panels that address this comment and several others from the reviewer.

      (4) Short-term plasticity. The authors suggest that dopamine neurons can sustain robust levels of release with no depletion but do not directly show this. Please provide time courses of DA release both from axons and dendrites during repeated stimulation. Relatedly, data shown in Fig 3I shows multiple stimulations without indicating the interstimulus interval. Please report the interstimulus interval for these experiments. The text mentions 1 stim per 2-3 min, but this is unclear. Lastly, optical stimulations in Figure 4C demonstrate multiple stimulations over time. It would be useful to see this quantified/normalized to the first stimulation.

      We would like to thank the reviewer for encouraging us to clarify some of our presentation of the data. Figure 3A was meant to provide the reader with evidence for robustness of response across stimuli, but we concede that more can be done to support our claims in quantitative terms. Accordingly, we have now provided the time course of responses from DopaFilm hotspots collected from an extended imaging session. We did this for axons and dendrites in two separate supplementary figures (Figure 3—figure supplement 1A/B for axons, Figure 5—figure supplement 4A-B for dendrites). In addition, we have clarified the interstimulus intervals both in the body of the text and in the figure captions. For data in Figure 3, the time on the x-axis provides the time interval between stimuli. These are multiple stimuli delivered within the same imaging session, and the short-term depression observed typically recovers after a rest period.

      Changes to manuscript in response to this comment:

      (1) We generated figure panels A and B in Figure 3—figure supplement 1 that will serve as a supplement for Figure 3. We produced similar data for dendrites in Figure 5—figure supplement 4A-B.

      (2) We clarify stimulation and imaging protocols in the accompanying figure captions.

      (5) Kinetics of release measured with DopaFilm. Figures 2B and 2D suggest that dendritic release is fast but the scaling of the traces shown makes it difficult to see onset timing. Please provide measurements of the averaged time-to-peak for both axonal DA release and somatodendritic release. Also, it would be helpful to the reader to discuss how these times compare to the time course of DA release as measured using dLight or GRABDA, carbon fibers and D2 IPSCs.

      Thank you for this comment. We appreciate that the discussion of the turn-on and clearance kinetics in axons and dendrites, and how these compare with other methods of measuring dopamine release, will be of interest to the broader readership. Accordingly, we now provide a comparison of time-to-peak (τpeak) and first order decay time constant (τoff) for activities measured in axons and dendrites. We discuss these values in the context of values reported for other tools.

      Changes to manuscript in response to this comment:

      (1) A new panel in supplementary Figure 2—figure supplement 1 is generated that provides information on the on and off temporal properties.

      (2) A discussion of these values is provided in the results section of the main text. The new text reads:

      “The turn-on and clearance kinetics of the measured transients in axons were 0.46 ± 0.16 s (Mean ± SD) for time-to-peak (τpeak) and 3.83 ± 0.8 s (Mean ± SD) for first order decay time constant (τoff) (Figure 2—figure supplement 1). The turn-on kinetics are slower than those reported for the genetically encoded dopamine sensors GRABDA (≈100 ms) and dLight (reported as τ1/2 of ≈10 ms followed by a plateau of ≈100 ms).32,33 On the other hand, decay kinetics appear to be slower than dLight (reported as τ1/2 ≈ 100 ms) and comparable to or faster than those reported for GRABDA (≈3–17 s for variants). For comparison, GIRK-current based dopamine dynamics measurements exhibited τpeak ≈ 250 ms whereas carbon fiber recordings peaked in τpeak ≈ 300 ms.34 This suggests that the kinetic properties of DopaFilm transients are comparable with the range of reported values from existing tools.”
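
      As a rough guide to what these two parameters mean operationally, the sketch below estimates a time-to-peak and a first-order decay time constant from a single ∆F/F transient. It assumes that τpeak is the time from stimulus to the maximum of the trace and that τoff comes from a single-exponential fit to the decay phase, here done by least squares on the log-transformed values. These definitions, the function names, and the synthetic trace are assumptions for illustration; the fitting procedure used for the reported values may differ.

```python
import math


def time_to_peak(times, dff, t_stim=0.0):
    """Return (time from stimulus to the maximum of dff, index of the maximum)."""
    i_peak = max(range(len(dff)), key=lambda i: dff[i])
    return times[i_peak] - t_stim, i_peak


def decay_tau(times, dff, i_peak):
    """Estimate tau_off by a log-linear least-squares fit of the post-peak decay.

    Assumes dff(t) ~ A * exp(-(t - t_peak) / tau) after the peak and that the
    trace decays toward zero; non-positive samples are skipped.
    """
    pts = [
        (t - times[i_peak], math.log(v))
        for t, v in zip(times[i_peak:], dff[i_peak:])
        if v > 0
    ]
    if len(pts) < 2:
        raise ValueError("not enough positive post-peak samples to fit")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in pts)
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("trace does not decay after the peak")
    return -1.0 / slope


# Synthetic transient: linear rise to a peak at 0.5 s, then decay with tau = 4 s
times = [0.1 * i for i in range(100)]
dff = [
    times[i] / 0.5 if i <= 5 else math.exp(-(times[i] - 0.5) / 4.0)
    for i in range(100)
]
tau_peak, i_peak = time_to_peak(times, dff)
tau_off = decay_tau(times, dff, i_peak)
```

      On real data a nonlinear fit with a baseline offset is often preferable, since the log-linear approach is sensitive to noise near zero.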

      Reviewer #3 (Public Review):

      Bassoon spots with no dopamine release. Do these silent sites always remain silent? What is the percentage of these 'silent' sites compared to all observed dopamine release events?

      We would like to thank the reviewer for encouraging us to compute the fraction of boutons that are release-competent in a given axonal arbor. We found that this value varied greatly, with some FOVs producing up to 65% release-capable boutons, whereas others had just a small fraction (5%) that participated in release.

      Changes to manuscript in response to this comment:

      (1) We added the following sentence to the results section at the relevant location:

      “The percentage of release-competent boutons varied greatly in axonal arbors, ranging from 5% in some FOVs to 65% in others, with a mean of 32% of putative boutons participating in release.”

    1. Author Response

      Reviewer #2 (Public Review):

      Use of binding models is common in rational drug design so it is understandable for the authors to pursue a binding model for MBDTA-2. It is difficult to assess the utility of the docking model for SAR development without a better understanding of how many docking conformation predictions the software provided and/or a measure of the docking score.

      We thank the reviewer for the comment. The predicted binding affinity is now described as follows: Line 158: ‘with a binding affinity of -6.2 kcal·mol-1’.

      The measure of the apparent Km values for the substrate and co-factor with MBDTA-2 at the sub-saturated IC50 values (6.92 and 8.58 micromolar) would help better understand the potential interaction between MBDTA-2 and the substrate and co-factor at the binding site.

      The point raised by the reviewer has been incorporated as a discussion point at Line 278: ‘Such rational design efforts could also be guided by additional kinetic assays in the presence of subsaturating amounts of MBDTA-2. Examining any changes in the KM values for the DHDPR substrate and cofactor under these conditions may provide further insights into MBDTA-2 interactions at the binding site.’

      Interpretation of the whole plant data for such an application would be clearer with the inclusion of the application rate and whole plant data for the positive control, chlorsulfuron PESTANAL.

      The chlorsulfuron positive control data has been added to Figure 5, panel A. The Figure 5 legend has been updated accordingly with the addition of ‘or 1200 mg·L-1 of chlorsulfuron.’ The claims for biological activity on Lolium rigidum have been altered as follows: Line 216: ‘as we have demonstrated that MBDTA-2 possesses herbicidal activity against one of the most problematic weed species to global agriculture’ deleted; Line 81: ‘we successfully extended our previous herbicidal activity studies’ changed to ‘we extended our previous in vivo activity studies’; Line 271: ‘herbicidal’ deleted. Line 211: the extrapolated application rate has been removed by deleting ‘(equivalent to 48 kg·ha-1)’.

      It would be interesting to get the authors' perspectives on opportunities to utilize the binding data for MBDTA-2 on DHDPS and the docking model data for MBDTA-2 on DHDPR to identify new analogs that could have increased affinity for both enzymes with the goal to increase the whole plant activity.

      We thank the reviewer for the comment. The possibility of utilising the crystallography and docking data for the rational design of more potent analogues has been noted in the Discussion with the following addition to Line 275: ‘For the future development of dual-target herbicides, the previously published DHDPS co-crystal structure (Soares da Costa et al., 2021) and the DHDPR binding model presented here could be used for the rational design of new MBDTA-2 analogues with increased target site activity.’

    1. Author Response

      Reviewer #1 (Public Review):

      Kim et al. demonstrated biphasic roles of ERK-MAPK-mTOR pathway in osteoblast differentiation. They first showed the administration of the MEK inhibitor trametinib increased bone formation and prevented bone loss in OVX mice. They also confirmed the effect of MEK inhibition on late phases of osteoblast differentiation in the culture of human bone marrow-derived mesenchymal stromal cells (hBMSCs). They then focused on the action of ERK-MAPK pathway on the late phase. Indeed, deletion of MEK1 and MEK2 in mature osteoblasts and osteocytes (Dmp1-cre-dKO) led to increased bone mass with augmented osteoblast function, which was also confirmed by an in vitro culture of the mutants' osteoblasts; Ocn-cre-mediated inducible deletion of MEK1 and MEK2 in mature osteoblasts resulted in the similar phenotypes. However, osteocyte apoptosis was increased in Dmp1-cre-dKO. Gene expression profile obtained by RNA-seq supported the mutants' osteoblast phenotypes. Besides osteoblast differentiation-related genes, angiogenic factors were upregulated in the mutants. Conditioned medium of the mutants' osteoblasts enhanced osteogenic potential of mouse BMSCs and in vitro capillary formation of endothelial progenitors. They further found that ERK inhibition augmented glutamine metabolism and mitochondrial function, possibly leading to enhancement of osteoblast function. Lastly, they demonstrated that mTORC2 and its downstream factor SGK1 was involved in the ERK inhibition-mediated osteoblast phenotypes. Based on these data, they propose that the ERK-mTORC2 axis, where ERK inhibits mTORC2, regulates osteoblast differentiation and angiogenesis. Overall this study is well performed, and the manuscript is clearly written.

      We thank the reviewer for summarizing and highlighting the significance of our manuscript. We appreciate the reviewer’s constructive suggestions and believe that addressing these points has strengthened the manuscript.

      Reviewer #2 (Public Review):

      The authors identified a unique role of the Erk signaling pathway in inhibiting osteoblastogenesis and bone formation at the late stage, whereas Erk has been widely shown to be essential for bone and osteoblast development. These data are also useful for readers. In vivo results including OVX experiments and phenotypes of Mek1/2 cKO mice are very interesting and useful information for the bone field, and probably for other fields with some interest. The idea of showing the mechanism through in vitro-based approaches sounds potentially interesting and novel. Conversely, although the authors claim that the Erk-mTOR2-SGK1 pathway plays a role in the phenomenon found in these in vivo experiments, the evidence is very weak and preliminary. More appropriate and straightforward approaches and experimental design could strengthen their conclusion. The results shown in this part of the manuscript are still phenomenological. Several important questions were not resolved. In particular, the linkage between MEK-Erk and mTOR2-SGK1 with mitochondria is still elusive. Rescue experiments in the cKO mice would be appreciated. Since the in vivo and in vitro experiments for the early and late stages of bone formation did not reflect their purpose very well, the authors could re-write the manuscript.

      We thank the reviewer for highlighting the significance of our manuscript. We agree that strengthening the mechanistic studies on the ERK-mTORC2-SGK1 pathway is important for the overall revised manuscript, and have added additional data on this topic.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article the authors use paired gene expression and chromatin accessibility data on isolated Sox9 positive progenitors to identify a role for Pi3K in lung epithelial differentiation and branching. The authors show some intriguing findings.

      We appreciate the generally positive and very constructive comments from both reviewers, which are highly aligned and helped us to improve the manuscript in revision. We have focused our time and attention on evaluation of the in vivo model of Shh-Cre/Pik3caf/f knockout animals, and have reorganized several figures to highlight these expanded data.

      However, some additional experiments are necessary to confirm/validate their conclusions. Some issues with the current experiments make them hard to interpret.

      1) While the experiments in Fig.6 show an increase in branching morphogenesis after treatment with different inhibitors, it is unclear whether this is because inhibition of Pi3K in the epithelium or mesenchyme.

      We agree that one of the major limitations of the lung explant model is the inability to isolate the effects of pharmacologic treatments to specific cell-types within the lung. We think that the data itself is robust in that it is highly repeatable and reproducible with two unrelated pan-class I PI3K inhibitors. However, given the significantly increased in vivo data present in this revision, we decided to omit these data in the current version of the manuscript in order to focus on the epithelial specific roles appreciable in the Shh-Cre knockout animals.

      2) Similarly it is difficult to assess whether the effect on Sox9+ EPCs is due to the inhibitor acting on the epithelium or mesenchyme.

      See response to #1 above.

      3) In the abstract the authors mention that prior to E13.5, SOX9+ progenitors are multipotent, generating both airway and alveolar epithelium, but are selective alveolar progenitors later in development. To further investigate this the authors isolated Sox9 positive progenitors at 11.5 and 16.5. The authors then as expected find some genes being differentially expressed in the progenitors at these different time points. However, while these changes in expression likely reflect the narrowed differentiation potential of the Sox9+ EPCs at E16.5 it is unclear whether this really helps to explain how Sox9+EPCs at E11.5 differentiate into proximal epithelium.

      We agree that a significant number of the differentially expressed genes and differentially accessible chromatin regions observed between E11.5 and E16.5 reflect the “narrowed differentiation potential” of the lung epithelium, but the regulators of this differentiation potential are incompletely understood. We specifically chose to further interrogate the role of Pi3K signaling in proximal-distal patterning of the lung epithelium both based on these data, as well as from our prior work on congenital pulmonary airway malformations (a disease most commonly characterized by expansion of cystic airway structures). Our added data in revision, evaluating the contribution of Pik3ca signaling to epithelial maturation, validate the idea that these data can be used to identify new regulators of distal progression of differentiation at minimum.

      4) qPCR in Fig.8 reflects the lack of airways but doesn't reflect their differentiation, it appears differentiation in club and ciliated cells still occurs but appears delayed. Differentiation of the bronchial epithelium occurs after Sox9+ EPCs have differentiated into Sox2+ airway cells. It is unclear if the differentiation of the Sox2+ airway epithelium is delayed or whether Pik3ca plays a role in the differentiation of these Sox2+ airway epithelial cells.

      While we agree that there is not a complete impairment of airway epithelial differentiation (i.e. there are still club and ciliated cells present), our data imply that differentiation of airway epithelium into mature secretory and ciliated cells appears to be more heavily impacted than the generation of Sox2+ epithelium itself. We have added additional data and restructured Figure 8 to make this clearer. Immunofluorescence microscopy for Sox2 from E12.5-E18.5 (new Figure 8, panels A-X) and quantification of Sox2 mRNA at E18.5 (new Figure 8, panel NN) are now shown. Although there does appear to be a decrease in the number of airway branches (consistent with what is also seen in the H&E time series and data shown in Figure 6), the airways are still lined with Sox2+ epithelium and the reduction in Sox2 mRNA transcript at E18.5 is relatively small (and doesn’t reach statistical significance). In contrast, there is a dramatic reduction in the secretoglobin proteins Scgb1a1 and Scgb3a2 at both the mRNA and protein level, and the ciliated cell marker Foxj1 (at mRNA level). Moreover, there is a dramatic reduction in the percentage of secretory and ciliated cells in the Pik3caShhCre lungs. Thus, although we cannot exclude that there is a smaller but significant impairment in the generation of Sox2+ epithelium, it appears that the more significant phenotype present is impaired differentiation of airway epithelium into mature airway epithelium. We anticipate that our follow-up studies will refine the precise molecular mechanisms by which PI3K signaling directs differentiation of the lung epithelium.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, Guo et al. investigated how DNA double-strand break (DSB) formation is regulated during C. elegans meiosis. Meiotic recombination initiates with programmed DSB formation, which is catalyzed by the Spo11 holoenzyme. In C. elegans, three SPO-11 cofactors have been identified so far. One of them is DSB-1, which is one of two homologs of Rec114. The authors first show that a phosphorylated form of DSB-1 appears as a slower migrating species on western blots. Using this as a readout, they demonstrate that phosphorylation of DSB-1 is dependent on two DNA damage sensor kinases, ATR (ATL-1) and ATM (ATM-1) and that dephosphorylation of DSB-1 is partially mediated by a member of PP4 phosphatase, PPH-4.1. It was previously shown that PPH-4.1 is required for multiple steps in meiotic chromosome dynamics, such as homolog pairing, synapsis, and DSB formation. Interestingly, heterozygous null mutation of atl-1, but not atm-1 deletion, restores meiotic DSB formation in pph-4.1 animals. It was further shown that DSB-1 contains five S/T-Q sites within its disordered region, and mutating all five sites leads to increased DSB formation and partially restores homologous pairing, DSB formation, and chiasma formation in pph-4.1 mutants. The rescue of homolog pairing was unexpected, and this illustrates a requirement of meiotic DSBs in enforcing correct pairing in C. elegans, similar to the case in other eukaryotes. The authors further demonstrate that DSB-1 phosphorylation occurs in an age-dependent manner, and this trend is not observed in dsb-2 mutants, which leads to a proposal that DSB-2 might have evolved to compensate for the decreased activity of the phosphorylated DSB-1 in older animals.

      Overall, this is a nice study illustrating the antagonistic relationship between ATL-1 and PPH-4.1 in regulating meiotic DSB formation. This work establishes that meiotic DSB formation is negatively regulated by ATL-1 in C. elegans, similar to what has been established in other organisms, and adds that a PP4 family member opposes this function of ATR. Rescued DSB formation and homolog pairing in pph-4.1; dsb-1(5A) is striking (even though it's partial), indicating that DSB-1 is a major target of PPH-4.1. Perhaps the partial rescue is somewhat expected, as PPH-4.1 is involved in many meiotic processes other than DSB formation. Therefore, more thorough analyses of the "rescued" phenotypes in pph-4.1 mutants, especially the status of SC assembly in both pph-4.1; dsb-1(phosphomutant series) and irradiated animals (with different doses), will help clarify some of the discussion points regarding the function of PPH-4.1 in processing recombination intermediates and the degree to which exogenous DSBs contribute to homolog pairing and synapsis in C. elegans. Another criticism is that this study only focuses on the phosphoregulation of DSB-1, while both DSB-1 and DSB-2 are C. elegans homologs of Rec114, and DSB-2 also contains four S/T-Q sites. Structural prediction of the putative DSB-1:DSB-2:DSB-3 and DSB-1:DSB-1:DSB-3 complexes in the discussion is very illuminating and suggests that perhaps the remaining DSB-1 in dsb-2 mutants is the pool that forms the DSB-1:DSB-1:DSB-3 complex and is prematurely phosphorylated by ATL-1 simply because of mass action. A model figure illustrating the phospho-regulation of DSB-1 (and maybe DSB-2) by ATL-1 and PPH-4.1 will greatly strengthen this paper.

      We have now examined the status of SC assembly in pph-4.1 combined with dsb-1 or atl-1/nT1 mutations and included these data, as discussed in our response to specific question #4 below. Further, to expand our understanding of possible phosphoregulation of DSB-2, we have generated and examined dsb-2 non-phosphorylatable mutants (4A) at its SQ sites. In contrast to dsb-1 non-phosphorylatable mutations, the dsb-2 (4A) mutation did not increase the number of DSBs, suggesting that DSB-2 is refractory to phosphoregulation. We now include these data in Figure 2–figure supplement 2. We also include a model figure of DSB-1 phosphoregulation in Figure 6B as suggested.

    1. Author Response

      Reviewer #2 (Public Review):

      First, I want to congratulate the author team on this manuscript, which I read with great pleasure. I think this will be a fine addition to the literature!

      The present MS by Clement et al. provides a comprehensive overview of the brain shapes of lungfishes. Besides previously known/described brain endocasts, the work includes models and descriptions of previously undescribed taxa. Notably, all CT data are deposited online following best practices when working with digital anatomy. The specimen sample is impressive, especially as the sampled material is housed in museums all over the world. Although the sample size may seem numerically low (12 taxa), this actually is a comprehensive sample of fossil (and extant) lungfishes in terms of what's preserved in the first place.

      The study at hand has several goals: (1) The description of lungfish brains for taxa that were previously undescribed; (2) the quantification of aspects of brain shape using morphometric measurements; (3) the characterization of brain shape evolution of lungfishes using exploratory methods that ordinate morphometric measurements into a morphospace.

      The provided 3D data and descriptions will serve as valuable comparisons in future lungfish work. This type of data is imperative for palaeontological studies in general, and the anatomical information will be extremely valuable in the future. For example, anatomical characters related to brain architecture have been shown to be informative about phylogeny in the past, and the presented data may inform future phylogenetic studies. The quantification of brain shape via (largely linear) measurements is relatively simplistic, and can thus only detect gross trends in brain shape evolution among lungfishes. The authors describe several such trends - such as high variation in the olfactory brain region in comparison to other parts of the brain. The results and interpretations drawn by the authors are supported by their data, and the approach taken is valid, even if more sophisticated shape quantification methods (e.g. 3D landmarking) and analytical methods (e.g. explicit phylogenetic comparative methods) are available, which could provide additional insights in the future.

      We agree with Reviewer #2 that 3D geometric morphometrics could have provided more sophisticated analytical methods. However, geometric morphometrics has some limitations with regard to the type of data that we analysed: (1) low sample size and (2) missing/incomplete data. Comprehensive coverage of brain shape would have required numerous landmarks (and semilandmarks) to represent its complexity.

      First, our sample size (12 taxa) is low (although it is an impressive sample size when considering the type of data). Although there is no universal rule concerning the ratio “number of specimens / number of landmarks” (Zelditch et al., 2012), ideally the sample size must be from two to three times the number of landmarks. Thus, with a sample size of 12 we could have used ca. 4-6 landmarks, which is very limited for describing complex shapes. In addition, in order to use geometric morphometrics (2D or 3D), the landmarks should be present on all the specimens. Because of the partial completeness of the studied fossils, the brain endocasts are not uniformly known for each species. Incomplete and deformed specimens prompt the removal of potential landmarks for analyses. Even using right-left reflexion of the endocasts, most specimens do not share all neurocranial information.

      We agree with Reviewer #2 that a phylogenetic PCA could have provided interesting analytical perspectives. Phylogenetic PCA is available for standard PCA, but it is uncertain whether it can be applied to Bayesian PCA (BPCA) or InDaPCA (the latter method was published very recently, and we have not found much literature about it). We did not find an adaptation of phylogenetic PCA to either BPCA or InDaPCA; we even contacted Liam Revell, who created the phylogenetic PCA, about this issue.

      The presented results and interpretations in this regard must be seen as a preliminary assessment of lungfish brain evolution, but it is clearly written and generally well performed.

      A potential shortcoming of the paper is the lack of explicit hypothesis testing, which is not problematic per se, but puts limits on the conclusions the authors can draw from their data.

      We decided to address the issues using exploratory methods rather than testing hypotheses. This is a more conservative approach, since this is the first quantitative analysis of dipnoan endocasts. Future analyses will be able to formulate hypotheses based on our interpretation of our exploratory approach. We hope to stimulate such hypothesis testing when further dipnoans are added in the future; however, one has to remember that ossified neurocrania are known in Devonian dipnoans and one partially ossified neurocranium is known in a Carboniferous dipnoan, whereas the remaining dipnoans have cartilaginous neurocrania, which limits the sample from which endocast data can be gathered.

      For example, the authors state that different anatomical parts of the labyrinth (particularly, the utricle with respect to the semicircular canals or saccule) may show modular dissociation from other labyrinth modules, based on the polarity of eigenvalue signs of the PCA analysis. I think this is fine as a first approximation, but of course there are explicit statistical tools available to test for modularity/integration, such as two-block partial least squares regression analysis (Rohlf & Corti 2000, Syst. Biol.). I don't see the lack of usage of such methods as problematic, because you cannot do everything in one paper, and the authors remain careful in their interpretation.

      We agree with Reviewer #2 that different geometric morphometrics methods have been developed to look at variational modularity; one of the co-authors (RC) has published several papers on patterns of morphological integration and modularity in fishes (see Larouche, Cloutier & Zelditch, 2015, Evol. Biol.; Lehoux & Cloutier, 2015, J. Exp. Zool. Mol. Dev. Evol.; Larouche, Zelditch & Cloutier, 2018, Sci. Rep.). Interesting a priori hypotheses of brain modules could have been formulated and tested for modularity using, for example, the Covariance Ratio (CR) and the distance matrix approach. However, the low sample size and the incompleteness of the data remain major constraints on testing modularity. We will endeavour to use such methods in future work as more complete material becomes available.

      It may be advisable, however, to add the odd sentence or statement about how some findings are preliminary or hypothesized, and that these should receive further treatment and testing using other methods in the future. I think this approach is actually very rewarding, because then you can inspire future work by outlining outstanding research problems that arise from the new data presented herein.

      We have now included an additional sentence early in the Discussion section stating: “We acknowledge that our investigation of lungfish brain evolution as elucidated from morphometric analysis of cranial endocasts is still preliminary in several respects. We hope that our study can inspire future work on the neural evolution of both fossil and extant lungfish.”

      In the following, I comment on a few aspects of the manuscripts. These represent instances where I had additional thoughts or ideas on how to slightly improve various aspects of the manuscript.

      1) Presentation of PCA results

      The authors provide several PCA analyses (preliminary analyses on partial matrices, BPCA, InDaPCA), and are very explicit about the procedures in general. For instance, I appreciate they explicitly state using correlation matrices for PCA analyses due to the usage of different measurement units among their data.

      Visually, the BPCA and InDaPCA are presented in figures 2 and 3, whereas the preliminary partial matrix PCAs are only reported as supplementary figures. While I don't object to any of this, I find the sequence of information given in the results section suboptimal.

      The figures have now been substantially reorganised to include more within the main body text and not as Supplementary Information, and we hope that this improves the sequence of information within the manuscript.

      The authors start by discussing the partial matrix analyses, although none of these analyses are visually/graphically depicted in the main text figures, and although their results do not seem to be of real importance for the narrative of the discussion. The other two PCA analyses actually are presented afterwards and separately, but they convey some common signals, particularly that the major source of variation seems to be a decreasing olfactory angle with increasing olfactory length, and a scaling relationship between all linear measurements (which all have the same eigenvector signs on the first PC axis). I wonder if an alternative way of presenting the PCA results would be better for this particular MS. For example, the authors could give "first level observations" first ("PCA analyses agree in X,Y,Y"), and then move to second order observations ("Morphospace of BPCA has some interesting taxon distribution with regard to chirodipterids"; "InDaPCA axis projections continuously retrieve clustering of specific variables"). I suspect this would shorten the text somewhat and could serve as a clearer articulation of the take home messages?

      In accordance with the suggestion of Reviewer #2, we have now provided “first level” observations based on the standard PCA. We have also added some further comments on the species distribution in the morphospaces.

      2) Selection of PC axes for interpretation

      You describe how you use the broken-stick method to decide how many PC axes are retained for the interpretation of results, which I agree is a good procedure. However, I have a few questions regarding this. First, in line 331 (description of InDaPCA) you state that the first three axes are non-trivial "based on the screeplot" - which got me confused because it sounds a bit like eyeballing off the screeplot. Have you used the broken stick method for all your PCA analyses?

      Originally, we used both the screeplot and the broken-stick method; however, we now use solely the broken-stick method to determine the number of non-trivial axes. We agree with Reviewer #2 that this method is more rigorous than the screeplot. Our choice is greatly inspired by the studies of Jackson (1993, Ecology) and Peres-Neto et al. (2005, Computational Statistics & Data Analysis). We have now edited the text so that our methods are clearer (and removed the text relating to the screeplot such as “based on the screeplot…”).

      The second question relates to the results of the broken stick method, which I did not find reported. Unless I am mistaken, for the xth axis, the method sums the fractions of 1/i (whereby i = x..n; n = number of axes), and divides this number by n to get a value of expected variation per axis. This number is then compared with the actual value of variance explained by the axis. So for the 1st of 17 axes, the broken-stick expectation is = (1 + 1/2 + .. + 1/17) / 17. If you apply this to your BPCA, the third axis' value (i.e., (1/3 + ... + 1/17)/17) is 0.114, which is smaller than the reported 0.120 that PC3 explains. Thus, following the broken stick method, PC3 does explain more variation that expected (and should thus be retained, contra your comment in line 311 which refers to two non-trivial axes)?

      We thank Reviewer #2, who took the time to validate each step of our analyses, for the insightful evaluation of our paper. Indeed, we agree with Reviewer #2 that, based on the broken-stick method, the third axis is non-trivial. The value for the third axis is 1.0531310. Thus, we are presenting these results as well as discussing the three PCA projections (axis 1 versus axis 2, axis 2 versus axis 3, axis 1 versus axis 3).
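
      The broken-stick expectation described in the reviewer's comment can be sketched as follows. This is only an illustration of the formula as the reviewer states it (sum 1/i from the axis index to n, then divide by n); the function name is hypothetical and is not taken from the authors' analysis scripts.

```python
def broken_stick_expectations(n_axes):
    """Expected fraction of variance per axis under the broken-stick model.

    For axis k (1-based) out of n axes: (1/k + 1/(k+1) + ... + 1/n) / n.
    """
    expectations = []
    for k in range(1, n_axes + 1):
        tail = sum(1.0 / i for i in range(k, n_axes + 1))
        expectations.append(tail / n_axes)
    return expectations


# 17 axes, as in the reviewer's BPCA example.
expected = broken_stick_expectations(17)
third_axis_expected = expected[2]

# The reviewer compares this expectation with the 0.120 reported for PC3.
pc3_is_nontrivial = 0.120 > third_axis_expected
print(round(third_axis_expected, 3), pc3_is_nontrivial)
```

      Under these assumptions the third-axis expectation rounds to the 0.114 quoted by the reviewer, so an axis explaining 0.120 of the variance would be retained.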

      Related to this potential issue is the presentation of the BPCA results in Fig. 2: You present loadings of three PC axes, although only the first two are considered in morphospace bi-plots and although the text also mentions only two non-trival axes. If the third axis is indeed non-trivial, then the loading-presentation could be retained in the figure, but then the authors should consider showing a PC1 vs. PC3 plot in addition to the currently presented biplot showing the first and second axis only. If the third axis indeed is trivial, as currently suggested by the text, then showing the loadings is unnecessary.

      We consider showing a biplot of PC1 vs PC3 unnecessary as those shown (PC1 vs PC2) already account for 83.4% of the variation captured. We have edited these figures so that the loadings related to PC3 have also now been omitted.

      It would be great if you clarify the usage/application of the broken stick method for all your PCAs. An easy way to report the results may be the add a row to each of your PCA loading tables in the supplements, in which you divide the actual value of variation explained by the value expected under the broken stick method - this way, all axes which explain more variation than expected by the stick method have values larger than 1, and axes which explain less have values lower than 1.

      We have taken this suggestion from Reviewer #2 on board and have now recalculated all values for the broken-stick method for each analysis; we also provide broken-stick values in their respective loading tables in the SI.
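
      As a rough sketch of the extra table row the reviewer suggests (observed variance divided by the broken-stick expectation, with values above 1 marking axes to retain), one could write something like the code below. The variance fractions are hypothetical placeholders chosen only so that PC3 equals the 0.120 mentioned in the review and PC1 + PC2 sum to 83.4%; they are not the authors' results, and the function name is likewise an assumption.

```python
def broken_stick_ratios(explained):
    """Observed variance fraction divided by the broken-stick expectation, per axis."""
    n = len(explained)
    ratios = []
    for k, observed in enumerate(explained, start=1):
        # Broken-stick expectation for axis k of n: (1/k + ... + 1/n) / n.
        expected = sum(1.0 / i for i in range(k, n + 1)) / n
        ratios.append(observed / expected)
    return ratios


# Hypothetical variance fractions for 17 axes (illustrative only):
# three leading axes, with the remaining variance spread evenly over the other 14.
explained = [0.60, 0.234, 0.120] + [0.046 / 14] * 14

ratios = broken_stick_ratios(explained)
# Axes whose ratio exceeds 1 explain more variance than expected by chance.
retained = [k for k, r in enumerate(ratios, start=1) if r > 1.0]
print(retained)
```

      Reporting the ratio rather than the raw expectation lets a reader see at a glance which axes are non-trivial, without having to compare two rows.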

      3) Missing commentary on allometry

      In basically all PCA analyses, the first PC axis seems to be dominated by allometric size effects, given that all linear measurements have the same eigenvalue signs. The authors do acknowledge this (lines 314-316; 335-336), but offer no further comment on size effects/allometry.

      We agree that normally the first axis represents variation related mainly to size changes and shape changes related to size (allometry). However, we are reluctant to assume that our first axis corresponds to evolutionary allometry. Among others, Klingenberg & Zimmermann (1992) and Klingenberg (1996) used standard PCA (or multi-group PCA) to disentangle evolutionary and ontogenetic allometry (as well as static allometry), mainly by analysing multiple specimens for each group (or species) in order to better partition the covariance. Since our sample is limited to 12 species, all of which are represented by a single specimen (except for Dipterus), it would be difficult to clearly discriminate variation associated with allometry. Even in a case of ontogenetic allometry, a sample size of 12 would have been too small to draw unambiguous conclusions about any variation.

      For example, it would be interesting to see how the linear measurements scale with overall head size. Similarly, the authors note that the semicircular canal measurements covary strongly, as do the utricle and saccule height/length measurements (paragraph line 346). Basically, it seems that the semicircular canal measurements scale with one another: as one gets bigger, so gets the other. It is interesting that the utricle does not seem to follow the same scaling pattern as the saccule and semicircular canals, and it would be good to hear if the authors think that there is a functional implication for this. Increases in utricular/saccular/semicircular canal sizes are usually explained by increased sensitivity - so is an increased utricular size a compensatory development to decreased semicircular canal+saccule size to retain an overall level of sensitivity, or does it maybe related to a relative change of importance of the specific functions, e.g. increased importance of linear accelerations in the horizontal plane with simultaneous decrease of importance of angular and vertical accelerations?

      We thank Reviewer #2 for this suggestion about scaling endocast measurements with overall head size. Our original study design also included measurements of dermal skulls, but we omitted this from the final version as the material available was far too incomplete to be able to conduct meaningful analyses. It is a topic that some of us (AC, RC) have already discussed as a potential future project.

      With respect to the functional implications of the modular dissociation of the labyrinths, we have expanded the final paragraph of the “implications for sensory abilities” within the Discussion, and similarly added the sentence “However, we acknowledge that it is difficult to determine if increased relative utricular size results from greater reliance of sensitivity in the horizontal plane alone, or if it expands to compensate for e.g. relative stagnation of the sacculus + semicircular canals in some way. Further studies, such as investigation of neuronal densities in extant lungfish labyrinths, may potentially in part clarify this uncertainty in future.”

      4) Labyrinth size

      With the above-mentioned utricular exception, labyrinth size measurements, particularly on the semicircular canals, seem to imply that there is a relatively consistent scaling relationship between the canals. When one canal gets larger, so do the others, perhaps thereby retaining canal symmetry across different absolute labyrinth sizes. Labyrinth size in tetrapods is often interpreted in relation to body size/mass or head size (e.g. Melville Jones & Spells 1963, Proc. R. Soc. Lond. Biol. Sci.; Spoor & Zonneveld 1998, Yearb. Phys. Anthr.; Spoor et al. 2002, Nature; Spoor et al. 2007, PNAS; Bronzati et al. 2021, Curr. Biol.), as deviations from the expected labyrinth size per head size indicate increased or decreased relative labyrinth sensitivities. Large relative head sizes of birds and (within) mammals have generally been interpreted as indicative of "active" or "agile" behaviour, although doubt has been cast on these relationships recently (e.g., Bronzati et al. 2021). Increased sampling of relative labyrinth size from various vertebrate groups would be important to better understand labyrinth size-function relationships. Melville Jones & Spells (1963) have shown that fishes have large labyrinth sizes compared to most tetrapods, but they don't have lungfish data and the large labyrinth sizes of fishes have often remained uncommented on in tetrapod works. I think this study offers a fantastic opportunity to provide comparative labyrinth size data for lungfishes. In this regard, it would be really interesting to quantify labyrinth size relative to head size, and show a respective (phylogenetic) regression analysis. Ideally, the size of the labyrinth could be quantified along the arc lengths of the semicircular canals, but other ways are also thinkable (for example a box volume of labyrinth size by the existing measurements, contrasted with a box volume of the skull, i.e. height × width × length).

      Firstly, many thanks for the suggested reading of Bronzati et al. (2021). While we consider a labyrinth-skull size regression analysis to be a worthwhile suggestion, we have chosen not to include one in this study, firstly because there is no phylogenetic regression based on the new methods that we are using, and secondly because it forms the basis of another study currently underway by some of the authors.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes a large field experiment in which diverse maize lines were grown under high and low nitrogen conditions. The authors surveyed the root microbiome and performed several different analyses in search of evidence of host control over the microbiome. These analyses include clustering the 16S amplicon sequencing data into 'microbial traits', searching for evidence of heritability and selection, GWAS and transcriptomics. The final section of the paper looks for patterns across the different analyses and highlights a single microbial trait, a corresponding GWAS hit and evidence that one allele correlates with both microbial abundance and plant health.

      The strengths of this manuscript include an impressive dataset, nicely compiled figures and an impressive compilation of distinct analyses to reduce complexity and reveal interactions. This manuscript is interesting on its own and will also serve as a template for future analyses on similar or distinct datasets. In several places, this manuscript could be improved by more clearly articulating the logic behind choices made for the analyses (see specific comments below).

      We thank the reviewer for the comments. We clarified the logic for the analyses as shown below.

      1). How the 'microbe traits' are defined is critical for interpreting the rest of the paper and so needs to be explained more clearly in this manuscript. From the text, I interpret that ASVs were first clustered into genera and then further clustered based on differential abundance between high and low N in both years. If so, this would seem to exclude a bunch of microbes that were specific to the 202 genotypes planted only in the second year. However, Supp Fig 2 shows a handful of 'microbe traits' that are not differentially abundant (grey). Please clarify. Also, the authors should clarify how the 'names' for each microbe trait were defined.

      In the revised manuscript, we added a section “Clustering of ASVs into microbial groups” in the materials and methods. Additionally, we added a supplementary figure (or Figure R1) to better explain how microbial groups were clustered. We also added the original plots used to define the 150 microbial groups in Supplementary File 6. See below for detailed responses.

      Added Text:

      Clustering of ASVs into microbial groups

      ASVs were clustered into groups of rhizosphere microbes at the family, genus, and species level using a procedure described previously (Meier et al., 2021). First, the 3,626 ASVs in the present study were grouped at the family level (the lowest taxonomic rank for which all ASVs were successfully annotated) and the phylogenetic tree derived from 16S V4 alignment was plotted alongside taxonomic annotation at the genus and species level. Because the ASVs are derived from short reads and may constitute variations not covered in the SILVA database, annotation at the genus and species level was often not possible. To close these gaps and form biologically meaningful groups of ASVs at low taxonomic ranks with better confidence, we examined the overall abundance of each ASV as well as the differential abundance in response to the N treatment alongside the sequence-based clustering. The premise here is that ASVs derived from biologically closely related individual microbes are similarly abundant in our dataset and respond similarly to the N treatment imposed on the field, in addition to having similar 16S sequences due to common ancestry. An example is given in Supplementary Figure 8 with a subset of ASVs assigned to the Burkholderiaceae family. The plots used to determine all 150 microbial groups in this study are available in Supplementary File 6.

      2) Throughout the manuscript, I expected more consideration of which microbe traits overlap between the different analyses. This comes in only at the final figure and so we never get a sense of how the different analyses overlap. For example, 150 microbe traits are defined and assessed for heritability and evidence of selection. I was not clear on whether the traits that were found to be heritable overlapped with those under selection (as one might expect). Lines 387-395 and Supp Fig 5 attempt to synthesize the different analyses but should be expanded to help the reader understand overlap between the analyses.

      We present evidence of links between microbe abundance, plant genetics, and plant performance through heritability analysis, estimation of selection, GWAS, and correlation of microbe abundance with the canopy coverage phenotype. Summarizing the overlap of these analyses of interest was challenging to do concisely for 150 microbial groups and two distinct N treatments, thus we chose to summarize the results of several analyses for the 62 microbial groups that showed either a positive or negative correlation with canopy coverage.

      Supplementary Figure 5 was revised to include the results of the estimation of selection. Furthermore, the section “Heritable and adaptively selected rhizobiota are associated with plant phenotypes” in the results was revised to better explain overlaps between the separate assays.

      Notably, not all traits that are heritable are expected to be under selection, as traits can be heritable, i.e., transmitted from one generation to the next, without impacting the fitness or performance of offspring individuals under the conditions under which recent natural and/or artificial selection has occurred. This point was added to the discussion.

      Lastly, the complete data with the observed values for all 150 microbial groups and all analyses are made available in Supplementary File 3.

      3) I gather that the authors performed the GWAS separately for each 'microbe trait' and nitrogen condition, but then searched for 'hot spots' where the data from different microbe traits was considered in a pooled manner. The logic behind this decision is not clear to me. Why would we expect different microbe traits to be co-localized in the genome?

      Our logic is that plant morphological (i.e., root architecture) or physiological (root exudation) changes may affect several rhizobiome traits, such that a GWAS signal controlling plant metabolism has an effect on several groups of microbes; therefore, it can be detected as a hotspot. In the revised manuscript we now explicitly describe our reasoning for the expectation of microbial trait hot spots in the section “Genes underlying microbe-associated plant loci are preferentially expressed in root tissue”.

      Lines 346-349 indicate that no plant loci were found to associate with microbe 'traits' under both nitrogen treatments and speculate as to why (dynamic interactions or not enough statistical power). The traits were defined based on robust differential abundance between nitrogen treatments and if I understand correctly, the GWAS was run separately for each trait and nitrogen treatment, so it seems logical that this method would only yield microbe trait associations that are differential between nitrogen treatments. If I did understand this correctly, I recommend emphasizing this point as it seems to indicate that the methods are working as expected.

      We are grateful for this suggestion. These points were elaborated in the revised manuscript in the section “Genes underlying microbe-associated plant loci are preferentially expressed in root tissue”.

      To investigate this, we performed GWAS using each of the 150 rhizobiome traits. This analysis was done separately for the -N and +N conditions, as N deficiency induces dramatic changes in plant metabolism, including changes in root gene expression (Choudhury et al., 2022) and root exudation (Zhu et al., 2016), and because N applied to the field directly impacts soil and rhizosphere microbiomes (Meier et al., 2021). However, it is important to emphasize that microbes which did not exhibit differential abundance in response to nitrogen were indeed included in our analysis. It is clear that the explanation of our methodology for grouping ASVs into functionally distinct clades/traits was a significant weakness in the previous version of our manuscript and caused significant confusion. We have revised the relevant sections substantially (see above).

      4) Having access to expression data for 298 genotypes is amazing. It would seem logical to try to more directly connect the MAPLs and microbe traits with this expression data. Do the lines that show association with the microbes also show higher expression of the corresponding gene?

      We agree with the reviewer that it would be logical to connect the MAPLs, gene expression, and microbe traits altogether. Modeling three distinct types of variables in a high dimensional genomics setting, however, is non-trivial. In fact, to address the challenge, another graduate student in our group has developed a genome-wide mediation analysis method with the ultimate goal of establishing a causal chain from genotype to intermediate molecular traits (i.e., gene expression or microbial traits) to plant phenotype. In a recent publication (Z. Yang et al., 2022, Genetics), our results from both simulation and empirical analyses suggested that this model could identify mediating genes with certain power, albeit with some limitations. We are working on improving the model and applying it to the current dataset by considering microbial groups as intermediate traits (mediators).

      We feel that conducting and including a microbiome mediation analysis would exceed the scope of the current study. In addition, the results may not be reliable, as the gene expression data published by Kremling et al. do not cover two distinct N treatments. The dataset associating MAPLs with particular genes presented here (Supplementary File 5) was provided to be used alongside new gene expression data to investigate particular associations in greater detail in more targeted experiments.

      The authors generated additional RNAseq data from 2 week old plants from 4 genotypes but the logic for the selection of these lines is missing and I am not sure about the relevance of this since the samples were collected from young plants. Is a nitrogen treatment effect observable at 2 weeks?

      As a small validation experiment, we selected 4 diverse and well characterized maize genotypes from the diversity panel to be grown under controlled greenhouse conditions. N treatments were included for consistency with the field experiment and because application of N fertilizer may have a direct influence on rhizosphere microbiomes independent from the host plant. The relevant section in the manuscript was revised for clarity as below.

      To complement the gene expression data provided by Kremling et al., we selected 4 diverse and well characterized maize genotypes (K55, W153R, B73, and SD40). Plants were grown in a controlled greenhouse environment under standard N and N deficient conditions and gene expression was analyzed in roots and shoots of two-week old seedlings (for details refer to Xu et al., 2022). In agreement with the dataset provided by Kremling et al., significantly higher expression of 97 MAPL genes was observed in root but not leaf tissue compared to (n = 44,049) other genes available in this dataset (Figure 3C). No strong physiological response to N deficiency was expected for 2-week-old seedlings and no significant differences were observed in the pattern of MAPL gene expression between the two N treatments.

      The authors conclude that the gene expression data is consistent with host control over root microbiome (line 371-373) but, as is, I'm not convinced that this analysis supports that statement. Fig 3C is striking on its own, but based on panel B, I suspect that a similar pattern would be observed for 'third leaf' and 'germinating shoot' so it is harder to make a direct connection with the microbe traits.

      We thank the reviewer for the comment. In Kremling’s experiment (Fig. 3B), leaf tissue samples from adult plants were taken both from the tip of the third leaf and from the base, and, similar to roots, we do observe higher expression of MAPL genes in the leaf base of adult plants, although other parts of the leaf show the opposite trend. We agree that the current dataset is insufficient to explain this observation, and a direct link between microbiome features and root gene expression cannot be conclusively established at the moment. We revised the wording in the relevant section:

      Revised Text: 371-373: Collectively, these data are consistent with the hypothesis that root-associated microbial communities are at least in part genetically controlled by the host plant in a process mediated by plant gene expression.

      5) The authors report that 62 microbe traits associated with canopy coverage, a very exciting result! However, again, this confuses me based on how the microbe traits were defined. To be considered a microbe trait, the microbes had to show differential abundance across the treatments. The logic for how this could manifest in phenotypic changes in both treatments needs to be elaborated.

      How the microbial groups were defined in this study was a major point of confusion, and we included a more thorough explanation of the procedure above. Differences in the pattern of response to N treatments (whether positive, negative, or no response), as well as differences in overall abundance, were used to separate sister clades of ASVs; however, clades of ASVs which did not show differential responses to N treatment were indeed included in our analyses. The purpose of this section was to associate the abundance of microbial groups under either N treatment (not to be confused with the differential abundance between N treatments) with plant performance in the field. More targeted experiments are required to determine the direction of causation and a potential mechanism by which microbe abundance could influence phenotypic changes in the host. This point was added in the discussion.

      6) The final figure summarizes correlations for one microbe trait across the different analyses and looks very promising, especially for noisy field data. The authors are careful to not overstate this finding, perhaps a bit too conservative. They see a significant correlation between microbe abundance and canopy coverage that also correlates with allele frequency. The difference in canopy coverage by allele frequency is not significant, but shows a similar trend and this is not necessarily surprising given all the other factors that will influence this one trait. I expected a comment on gene expression of the genes in the locus and perhaps a peek at the other plant traits to see if any of them also show a similar trend.

      We thank the reviewer for this suggestion. As indicated above, we are hesitant to make strong statements using the gene expression data published by Kremling et al. because experimental conditions differed from our study.

      As the reader may indeed wonder about the gene expression of the genes in the exemplary locus in Figure 5, we supplied gene expression information of the three relevant genes in an additional supplementary figure (or Supplementary Figure 10).

      We do see elevated gene expression in roots for two out of the three genes, which matches the previously observed trend. A brief literature review of the same genes indicates a possible link to root hair physiology and an altered microbiome. We revised the text as below and incorporated the speculation into the discussion.

      Reviewer #3 (Public Review):

      Strengths:

      The choice of 230 genotypes from a well-known maize diversity panel, with accompanying SNP genotype data, was a good one for this purpose, given the focus on selection during intensive breeding during the 20th century in heavily fertilized conditions.

      The very large dataset (N>3000 with replication of 230 genotypes) is a useful source of information on maize rhizosphere bacterial microbiomes, and the availability of the host genotype SNP data is an especially useful and unusual feature. The authors used a relatively newly developed (2018) Bayesian computational approach to characterize genetic architecture of rhizosphere composition, which is an interesting advance in the microbiome field. The same tool makes inferences about whether each SNP underlying rhizosphere features shows signatures of past selection (inferred as the variation in effect size relative to the minor allele frequency).

      Weaknesses:

      The BayesS results classifying rhizosphere-related SNPs as under positive, negative, or no selection appear to be over-interpreted. First, it is not clear that this method is meant for comparing current patterns of selection in contrasting environments (as in this N+/N- experiment), but rather for detecting signatures of selection in the distant past. Second, it IS clear that this method only reveals signatures of selection on a locus or SNP, and cannot confirm that selection is acting on a particular trait. The Zeng et al. 2018 paper states this quite clearly. The authors of this manuscript did not attempt to rule out that the loci classified as under selection do not have pleiotropic effects on (or are linked to) traits other than rhizosphere microbes. Occam's razor in this case suggests that these loci control root traits that are important for plant survival and also happen to affect microbiome composition. No functional benefit of these microbes was demonstrated beyond correlations with plant phenotypes.

      Thank you for the insightful comments. We agree with the reviewer that the signatures of selection reflect selection in the distant past during plant evolution rather than recent crop improvement processes, and that we cannot rule out the possibility that selection acts on plant fitness and in turn affects microbiome composition.

      Per the reviewer’s comments, we estimated the selection gradients (see detailed responses below) and substantially modified the selection section of the results.

      Additionally, we have clarified the concerns raised by the reviewer in the discussion section as below.

      The BayesS method leverages the relationship between the variance of SNP effects and MAF as a proxy of natural selection in the distant past. This method detects signatures of natural selection on SNPs associated with microbiome traits but is not indicative of selection acting on the particular microbes.

      To further confirm the beneficial effects of the microbes on plant fitness, additional functional analyses (i.e., inoculation experiments) are warranted; naturally occurring microbe-plant symbiosis may then be harnessed for further crop improvement.
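
      The relationship between SNP-effect variance and MAF described above can be illustrated with a minimal sketch. It assumes the BayesS prior for a nonzero SNP effect has variance [2p(1-p)]^S multiplied by a common effect variance, which is our reading of Zeng et al. (2018). The function name, the `sigma2_beta` parameter, and the example allele frequencies are hypothetical and are not taken from the manuscript.

```python
def bayess_effect_variance(maf, s, sigma2_beta=1.0):
    """Prior variance of a nonzero SNP effect, assumed to be [2p(1-p)]^S * sigma2_beta.

    maf         -- minor allele frequency p, in (0, 0.5]
    s           -- the selection parameter S (assumed form of the BayesS prior)
    sigma2_beta -- common effect variance (hypothetical default of 1.0)
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return (2.0 * maf * (1.0 - maf)) ** s * sigma2_beta


# Hypothetical allele frequencies: one rare SNP and one common SNP.
rare, common = 0.05, 0.5

# Ratio of rare-SNP to common-SNP effect variance for three values of S.
ratios = {
    s: bayess_effect_variance(rare, s) / bayess_effect_variance(common, s)
    for s in (-1.0, 0.0, 1.0)
}

for s, ratio in ratios.items():
    print(f"S = {s:+.1f}: rare/common effect-variance ratio = {ratio:.3f}")
```

      Under this assumed form, S = 0 makes effect variance independent of MAF, while S < 0 assigns larger effects to rarer variants. Either way, the quantity describes SNPs, not the fitness value of any particular microbe.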

      It is unclear how substantially different the N+/N- treatments were from each other. The entire experiment followed commercial corn, so presumably all plots had been fertilized within the past year. The soil chemical profiles were not subsequently tested, and basic analyses (such as comparison of plant growth in the two treatments) are missing.

      This is a valid concern. Urea (dry fertilizer) was applied to the +N plots as a source of N at a rate of 120 lbs/acre (approximately 134.5 kg/ha), and the -N plots received no treatment, on the assumption that N had been exhausted by the commercial corn planted in the previous year. In addition, we now cite the results of a UAV imagery analysis conducted on the same field experiment (Rodene et al., 2022), which compared plant growth in the two treatments and indicated different N responses between the +N and -N treatments. We have clarified this in the revised manuscript.
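
      The unit conversion for the application rate can be checked with a short sketch. The conversion factors are the standard definitions of the avoirdupois pound and the international acre; the function name is hypothetical.

```python
# Convert a fertilizer application rate from lb/acre to kg/ha.
LB_TO_KG = 0.45359237       # kilograms per avoirdupois pound (exact by definition)
ACRE_TO_HA = 0.40468564224  # hectares per international acre (exact by definition)


def lb_per_acre_to_kg_per_ha(rate_lb_acre):
    """Return the rate in kg/ha for a rate given in lb/acre."""
    return rate_lb_acre * LB_TO_KG / ACRE_TO_HA


rate_kg_ha = lb_per_acre_to_kg_per_ha(120)
print(round(rate_kg_ha, 1))  # 134.5
```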

      240-244: Paired-end 16S sequencing was performed on 3,313 rhizosphere samples collected from 230 replicated genotypes of the maize diversity panel (Flint-Garcia et al., 2005) in field experiments conducted under both +N and -N conditions (Materials and Methods, Supplementary Figure 1). At the time of sampling, phenotypic differences between +N and -N plots were observable through aerial imaging (Rodene et al., 2022). Sequencing produced 216,681,749 raw sequence reads representing 496,738 unique amplicon sequence variants (ASVs) (Materials and Methods).

      Furthermore, the assumption that microbiome differences between fertilization treatments are driven by some activity of the plant host (lines 94-99) is not justified - direct responses of the microbes to N addition would almost certainly be reflected in the rhizosphere, since rhizosphere microbiomes are almost entirely derived from the surrounding soil. No bulk soil samples were collected as controls to rule this out. There is clear evidence that N fertilization directly affects microbial communities in the soil. However, the direct effect of N fertilization as opposed to changes induced indirectly via altered

      This is also a valid concern, and we thank the reviewer for the insight. The issue of a direct effect of N fertilization on microbial communities as opposed to an indirect effect via the plant host was partially addressed in a previous study (Meier et al., 2021). The focus here was to find candidate associations between plant genetics, microbial groups, and plant performance under two N treatments, and to lay the basis for more targeted experiments in which causality can be inferred.

      We edited the text to clarify this:

      94-99: It was observed previously (Zhu et al., 2016) that soil microbial communities drastically change in response to N fertilization. In bulk soil, this is likely due to a direct effect of N application or lack thereof. In rhizospheres, however, only a subset of the observed changes can be attributed to direct effects of N fertilization, while particular microbial groups may be subject to indirect effects induced by the plant host in response to N availability or deficiency (Meier et al., 2021). A possible explanation for this could be that, for the vast majority of the interval between maize domestication and the present, beneficial plant-microbe interactions have evolved in low-input agricultural systems characterized by relative scarcity of nutrients, predominantly nitrogen (N) (Brisson et al., 2019).

      The authors report how various patterns differ between the N+ and N- treatments, for example, more rhizobiome features had nonzero heritability in N- than in N+. However, there is no statistical support for this apparent difference, i.e. no direct test of heritabilities in the two treatments. Nor do they test for possible differences (or lack thereof) in the magnitude of heritability between treatments. This incomplete style of analysis was repeated several times in the paper, e.g. for comparing patterns of selection between treatments, and for comparing correlations between rhizobiome features and plant traits between features.

      We agree that appropriate statistical tests can strengthen our results. See detailed response below (Reviewer #3 Recommendations for the authors).

      The methods sections contain inconsistencies and omissions that made it difficult to evaluate some of the claims. For instance, lines 147-148 describe collection of rhizosphere samples from entire root crowns, but the appendix (lines 774-778) describe collection of rhizosphere from roots that fit in 50 mL tubes. So it is unclear which part of the root crown was actually used, and whether the focal root type was consistent for all samples. Similarly, the appendix states that the B73xMo17 check genotype was used to correct for small-scale geographic differences (747-748), but no additional detail is provided nor are the results of this process reported. In general, the descriptions of statistical analyses lack important details. For example, by definition a constrained ordination (CAP) analysis requires a formula to be specified, but this was not reported in the paper, making it impossible to interpret the meaning of the constrained axes shown in the figures. Ordinations also require the use of a distance or dissimilarity metric, the choice of which affects interpretation - the metric used in this paper was not provided.

      We thank the reviewer for these comments. The relevant sections in the manuscript were revised as shown below to clarify the sampling procedure and provide missing information about the check plants and the CAP analysis.

      146-148: Eight weeks after planting (2018: July 10 and 11; 2019: July 30, 31 and August 1), plant roots were dug up to a depth of 30 cm and rootstocks were manually shaken to remove and discard loosely adherent bulk soil. For each plant, all roots thus exposed were cut into 5 cm pieces and homogenized, and 20-30 ml of randomly selected root material (with adherent rhizosphere soil) was collected to generate the rhizosphere samples (Supplementary Methods).

      159-163: Raw ASV reads were subjected to a series of filters to produce a final ASV table with biologically relevant and reproducible 16S sequences (Supplementary File 1). For the constrained ordination (CAP) analysis performed here, the weighted Unifrac distance metric was used with model “distance ~ year + genotype + nitrogen + block + sp + spb”. Only ASVs that were highly abundant and repeatedly observed in both years of sampling were considered for downstream analysis.

      746-749: In each of 12 split plot blocks per quadrant, at least one subplot was randomly selected and assigned the hybrid genotype (B73xMo17) to be used as a check to test for differences between geographical field locations. Two check genotypes (B73xMo17 and B37xMo17) were used in 2018, and a single check genotype (B73xMo17) was used in 2019. Plant growth across the field was determined to be uniform within quadrants using the checks, as reported in a sister study on the same experimental field (Rodene et al., 2022).

      774-778: To wash the tightly adherent rhizosphere soil layer off the roots, tubes were filled up to the 40 ml mark with autoclaved PBS buffer (46 mM NaH2PO4, 60 mM Na2HPO4, 0.02% Silwet-77), and shaken horizontally at 8000 rpm for 30s. Rhizosphere suspension was filtered through a 100 μm nylon cell strainer (Celltreat Scientific Products, Pepperell, MA, USA) into a fresh 50 ml tube to capture root debris and large soil particles.

      Finally, many of the analyses throughout the paper take the form of testing 150 different rhizobiome traits, one by one, and then reporting the number of significant results (e.g., differential abundance between N+/N-, significant heritability, selection, correlations with plant traits). This suggests a potentially severe risk of false positives due to repeated multiple testing. After the p-values are corrected for the very large number of statistical tests (using Bonferroni, FDR, or similar) many of the conclusions might change.

      We agree that the large number of tests may lead to false discoveries. To address this, a multiple testing correction method was applied to increase the stringency of the GWAS analysis.
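
      The response above does not name the correction that was applied. As an illustration only, the two approaches mentioned by the reviewer (Bonferroni and a Benjamini-Hochberg FDR adjustment) can be sketched as follows. The function names and the five p-values are hypothetical and do not come from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from five independent trait tests.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
bonf = bonferroni(p)
bh = benjamini_hochberg(p)

alpha = 0.05
print("raw significant:       ", sum(x < alpha for x in p))
print("Bonferroni significant:", sum(x < alpha for x in bonf))
print("BH significant:        ", sum(x < alpha for x in bh))
```

      In this toy example, four of the five raw p-values fall below 0.05, but only two remain below that threshold after either adjustment. This is the kind of change in conclusions the reviewer anticipates when many traits are tested one by one.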