  1. Jun 2021
    1. Author Response:

      Reviewer #1 (Public Review):

      In this work, Panigrahi et al. develop a powerful deep-learning-based cell segmentation platform (MiSiC) capable of accurately segmenting bacterial cells densely packed within both homogeneous and heterogeneous cell populations. Notably, MiSiC can be easily implemented by a researcher without the need for high computational power. The authors first demonstrate MiSiC's ability to accurately segment cells with a variety of shapes including rods, crescents and long filaments. They then demonstrate that MiSiC is able to segment and classify dividing and non-dividing Myxococcus cells present in a heterogeneous population of E. coli and Myxococcus. Lastly, the authors outline a training workflow with which MiSiC can be trained to identify two different cell types present in a mixed population using Myxococcus and E. coli as examples.

      While we believe that MiSiC is a very powerful and exciting tool that will have a large impact on the bacterial cell biological community, we feel explanations of how to use the algorithm should be more greatly emphasized. To help other scientists use MiSiC to its fullest potential, the range of applications should be clarified. Furthermore, any inherent biases in MiSiC should be discussed so that users can avoid them.

      We thank the reviewer for the positive feedback and for comments that will help disseminate MiSiC to the broad bacterial cell biology community, as intended. As described above, we have largely addressed this comment by writing a comprehensive handbook. As detailed below, we now also provide precise measurements of the MiSiC segmentation accuracy compared to ground truth for the various imaging modalities and bacterial species.

      Major Concerns:

      1) It is unclear to us how a MiSiC user should choose/tune the value for the noise variance parameter. What exactly should be considered when choosing the noise variance parameter? Some possibilities include input image size, cell size (in pixels), cell density, and variance in cell size. Is there a recommended range for the parameter? These questions along with our second minor correction can be addressed with a paragraph in the Discussion section.

      Setting the noise parameters is now detailed in the handbook (section 1.d). A set of rules of thumb and recommendations is provided. In addition, a paragraph explaining the importance of noise addition for images with sparse bacterial cell density has been added to the Results section.
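
      As a rough illustration of what "adding synthetic noise" means in practice, the following is a minimal sketch and not the MiSiC implementation. It assumes an image already scaled to [0, 1]; the function name `add_gaussian_noise` and the variance value are hypothetical choices made only for this example.

```python
import numpy as np

def add_gaussian_noise(image, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to an image scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, np.sqrt(variance), size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

# A perfectly flat background: the kind of region that over-smoothing can turn into false "cells".
flat_background = np.full((64, 64), 0.5)
noisy_background = add_gaussian_noise(flat_background, variance=1e-4)
print(noisy_background.std())  # close to sqrt(1e-4) = 0.01
```

      In such a sketch the variance would be raised only until spurious background objects disappear, since too much noise would also degrade the shape information of real cells.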

      “Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. SI images retain the shape/curvature information of the intensities in a raw image through eigenvalues of the Hessian of the image and an arctan function, creating smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: when an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects that are of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”
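
      The legend above describes the shape index (SI) only in words. The sketch below assumes the standard Koenderink definition, SI = (2/π)·arctan((k2 + k1)/(k2 − k1)) with k1 ≥ k2 the Hessian eigenvalues; whether MiSiC uses exactly this form, or finite differences as here, is an assumption. The function name `shape_index` and the synthetic blob are illustrative only.

```python
import numpy as np

def shape_index(image):
    """Shape index in [-1, 1] computed from the eigenvalues of the image Hessian."""
    d_row, d_col = np.gradient(image.astype(float))
    h_rr, h_rc = np.gradient(d_row)
    h_cr, h_cc = np.gradient(d_col)
    h_rc = 0.5 * (h_rc + h_cr)              # symmetrise the mixed derivative
    mean = 0.5 * (h_rr + h_cc)              # (k1 + k2) / 2
    half_gap = np.sqrt((0.5 * (h_rr - h_cc)) ** 2 + h_rc ** 2)  # (k1 - k2) / 2, with k1 >= k2
    # (2/pi) * arctan((k2 + k1) / (k2 - k1)); arctan2 keeps the flat case (0/0) finite.
    return (2.0 / np.pi) * np.arctan2(-mean, half_gap)

# Synthetic bright blob standing in for a cell body.
rows, cols = np.mgrid[0:65, 0:65]
blob = np.exp(-((rows - 32) ** 2 + (cols - 32) ** 2) / (2 * 5.0 ** 2))
si = shape_index(blob)
print(si[32, 32])  # a bright, dome-like blob maps to the "cap" end of the scale (about +1)
```

      Because the index depends only on the ratio of curvatures, it is insensitive to absolute intensity, which is consistent with the legend's remark that low-contrast debris can still be picked up.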

      2) Could the authors expand on using algorithms like watershed, conditional random fields, or snake segmentation to segment bacteria when there is not enough edge information to properly separate them? How accurate are these methods at segmenting the cells? Should other MiSiC parameters be tuned to increase the accuracy when implementing these methods?

      We thank the reviewer for raising this point, as it is important to make clear that post-processing algorithms can certainly improve the accuracy of MiSiC masks downstream. To show this specifically, we further processed MiSiC masks of Bacillus subtilis filamentous cells to resolve division septa using the watershed algorithm. This example is now provided as Figure S3. Importantly, there is no particular MiSiC adjustment that needs to be performed prior to running these processing steps, which can be done directly in ImageJ or its bacterial cell analysis plug-in, MicrobeJ. It is worth noting that the post-processing strategy may depend on the scientific question under consideration. In the handbook, we also give an example of post-processing methods that may be used.

      “Associated Figure S3. Refining cell separations with watershed. Watershed methods may be used to obtain a more accurate segmentation of septate filaments such as Bacillus subtilis. In this example applying this method to the MiSiC mask effectively resolves cell boundaries that are not captured in the prediction but are visible by eye (arrows).”
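
      To make the watershed idea concrete, here is a small self-contained sketch of a seeded watershed on the distance map of a binary mask. It is not the procedure used for Figure S3 (which the authors ran in ImageJ/MicrobeJ); the helper names, the brute-force distance transform and the hand-placed seeds are all assumptions chosen to keep the example short.

```python
import heapq
import numpy as np

def distance_to_background(mask):
    """Brute-force Euclidean distance from each foreground pixel to the nearest background pixel."""
    fg = np.argwhere(mask)
    bg = np.argwhere(~mask)
    dist = np.zeros(mask.shape, dtype=float)
    if len(fg) == 0 or len(bg) == 0:
        return dist
    d2 = ((fg[:, None, :] - bg[None, :, :]) ** 2).sum(axis=2)
    dist[mask] = np.sqrt(d2.min(axis=1))
    return dist

def seeded_watershed(landscape, seeds, mask):
    """Flood `landscape` from labelled seeds, lowest values first, staying inside `mask`."""
    labels = np.zeros(mask.shape, dtype=int)
    heap = []
    for label, (r, c) in enumerate(seeds, start=1):
        labels[r, c] = label
        heapq.heappush(heap, (landscape[r, c], r, c))
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
            if inside and mask[rr, cc] and labels[rr, cc] == 0:
                labels[rr, cc] = labels[r, c]
                heapq.heappush(heap, (landscape[rr, cc], rr, cc))
    return labels

# Two overlapping disks stand in for two cells merged into one mask object.
rows, cols = np.mgrid[0:30, 0:40]
disk_a = (rows - 15) ** 2 + (cols - 12) ** 2 <= 100
disk_b = (rows - 15) ** 2 + (cols - 28) ** 2 <= 100
mask = disk_a | disk_b
dist = distance_to_background(mask)
# Flooding the negated distance map makes the two basins meet at the narrow neck.
labels = seeded_watershed(-dist, seeds=[(15, 12), (15, 28)], mask=mask)
```

      In a real workflow the seeds would typically come from local maxima of the distance map rather than being placed by hand, and an optimized library routine would replace the brute-force distance computation.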

      3) Can the MiSiC's ability to accurately segment phase and brightfield images be quantitatively compared against each other and against fluorescent images for overall accuracy? A figure similar to Fig. 2C, with the three image modalities instead of species would nicely complement Fig. 2A. If the segmentation accuracy varies significantly between image modalities, a researcher might want to consider the segmentation accuracy when planning their experiments. If the accuracy does not vary significantly, that would be equally useful to know.

      This is a very important issue that was also raised by reviewer 3 and which we decided to address in full. For each imaging modality and distinct species, we measured the Jaccard index as a function of the threshold set for the Intersection over Union (IoU). The resulting curves are now provided in two separate figures (Figures 2 and 3) and a supplemental Figure S2; they provide a robust measure of the segmentation for each modality and tested species.

      “Figure 2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense E. coli microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=763, 811, 799 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Laplacian of Gaussian filter to improve MiSiC prediction (see Methods).”

      “Associated Figure S2. MiSiC predictions under various imaging modalities. a) MiSiC masks and corresponding annotated masks of fluorescence, phase contrast and bright field images of a dense M. xanthus microcolony. b) Jaccard index as a function of IoU threshold for each modality determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained curves are the average of analyses conducted over three biological replicates and n=193, 206, 211 total cells for Fluorescence, Phase Contrast and Bright Field, respectively (bands are the maximum range, the solid line is the median). The fluorescence images were pre-processed using a Laplacian of Gaussian filter to improve MiSiC prediction (see Methods). c) A human observer performs slightly worse than MiSiC. The same ground truth as used in Figure 2 (dashed lines) was compared to an independent observer’s annotation (solid lines) and Jaccard score curves were constructed as shown in Figure 2. BF: Bright Field, PC: Phase Contrast, Fluo: Fluorescence.”

      “Figure 3. MiSiC predictions in various bacterial species and shapes. a) MiSiC masks and corresponding annotated masks of phase contrast images of Pseudomonas aeruginosa (rod shape), Caulobacter crescentus (crescent shape) and Bacillus subtilis (filamentous shape). b) Jaccard index as a function of IoU threshold for each species determined by comparing the MiSiC masks to the ground truth (see Methods). The obtained Jaccard score curves are the average of analyses conducted over three biological replicates and n=1149, 101, 216 total cells for P. aeruginosa, B. subtilis and C. crescentus, respectively (bands are the maximum range, the solid line is the median). Note that the B. subtilis filaments are well predicted but edge information is missing for optimal detection of the cell separations.”
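
      The legends above report a Jaccard index as a function of IoU threshold. One common way to build such a curve, sketched here under the assumption that both the ground truth and the prediction are integer label images (0 = background), is to count a cell as detected when its best IoU exceeds the threshold and then compute TP / (TP + FP + FN). The authors' exact procedure is described in their Methods and may differ; the function names and toy masks are hypothetical.

```python
import numpy as np

def iou_matrix(truth, pred):
    """IoU between every ground-truth label and every predicted label (0 is background)."""
    t_ids = np.unique(truth[truth > 0])
    p_ids = np.unique(pred[pred > 0])
    iou = np.zeros((len(t_ids), len(p_ids)))
    for i, t in enumerate(t_ids):
        t_mask = truth == t
        for j, p in enumerate(p_ids):
            p_mask = pred == p
            inter = np.logical_and(t_mask, p_mask).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(t_mask, p_mask).sum()
    return iou

def jaccard_curve(truth, pred, thresholds):
    """Jaccard index TP / (TP + FP + FN) at each IoU threshold (thresholds >= 0.5)."""
    iou = iou_matrix(truth, pred)
    n_truth, n_pred = iou.shape
    curve = []
    for thr in thresholds:
        # With IoU strictly above 0.5, a cell can match at most one predicted object.
        tp = int((iou > thr).any(axis=1).sum())
        fp, fn = n_pred - tp, n_truth - tp
        curve.append(tp / (tp + fp + fn) if (tp + fp + fn) else 1.0)
    return curve

truth = np.zeros((20, 20), dtype=int)
truth[2:8, 2:8] = 1
truth[10:16, 10:16] = 2
pred = np.zeros((20, 20), dtype=int)
pred[2:8, 2:8] = 1          # perfect match, IoU = 1
pred[10:16, 11:17] = 2      # shifted by one pixel, IoU = 30/42
curve = jaccard_curve(truth, pred, thresholds=[0.5, 0.6, 0.8])
print(curve)  # the shifted cell stops counting as a match once the threshold passes 30/42
```

      Restricting thresholds to 0.5 and above avoids the need for an explicit one-to-one assignment step, which would be required at lower thresholds.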

      4) The ability of MiSiC to segment dense clusters of cells is an exciting advancement for cell segmentation algorithms. However, is there a minimum cell density required for robust segmentation with MiSiC? The algorithm should be applied to a set of sparsely populated images in a supplemental figure. Is the algorithm less accurate for sparse images (perhaps reflected by an increase in false-positive cell identifications)? Any possible biases related to cell density should be noted.

      In fact, MiSiC performs well with both densely and sparsely populated images. In the case of sparsely populated images, it is however possible that non-cell objects can occasionally appear in the MiSiC mask. As mentioned above, inclusion of noise can help remove these objects in sparsely populated images. This issue is now fully explained in a supplemental Figure S1. Of note, non-cell objects, if they were to remain after noise addition, can be eliminated using additional general morphometric filters or specific models fitting bacterial cells, for example those included in MicrobeJ and Oufti. These points are now clarified in the text.

      “Associated Figure S1. Background noise can lead to spurious cell detection by MiSiC. SI images retain the shape/curvature information of the intensities in a raw image through eigenvalues of the Hessian of the image and an arctan function, creating smooth areas corresponding to cell bodies and propagating noisy regions where there is no shape information. Thus, MiSiC segments the cells by discriminating between “smooth” and “rough” regions. In effect, when adjusting the size parameter, scaling smooths out the image noise, leading to background regions that have a smoother SI than in the raw image. Some of these areas could be falsely detected as bacterial cells. This effect is shown here: when an image with uniform and random intensity values is segmented with MiSiC with increasing smoothing (here using a Gaussian blur filter), spurious cell detection becomes apparent. In addition, since the SI keeps the shape information and not the intensity values, background objects that are of relatively low contrast (i.e., dead cells or debris) may be detected as cells. All these artifacts can be mitigated by adding synthetic noise to the scaled images.”

      and:

      “Along similar lines, non-cell objects can appear in the MiSiC masks and while some can be removed by the introduction of noise, an easy way to do it is to apply a post-processing filter, for example using morphometric parameters to remove objects that are not bacteria. This can be easily done using Fiji, MicrobeJ or Oufti.”
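
      A morphometric filter of the kind mentioned here can be as simple as an area gate on a labelled mask. The sketch below is illustrative only: it assumes an integer label image (0 = background), and the function name and area bounds are hypothetical, not values recommended by the authors or used by Fiji, MicrobeJ or Oufti.

```python
import numpy as np

def filter_by_area(labels, min_area, max_area):
    """Zero out labelled objects whose pixel area falls outside [min_area, max_area]."""
    areas = np.bincount(labels.ravel())
    keep = (areas >= min_area) & (areas <= max_area)
    keep[0] = False  # label 0 is background and is never kept as an object
    return np.where(keep[labels], labels, 0)

labels = np.zeros((20, 20), dtype=int)
labels[2:6, 2:12] = 1      # rod-like object, 40 pixels
labels[15:16, 15:17] = 2   # 2-pixel speck standing in for debris
filtered = filter_by_area(labels, min_area=10, max_area=200)
print(np.unique(filtered))  # only the background and the rod-like object remain
```

      Other descriptors (length, width, aspect ratio) could be gated in the same way once they are measured per object.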

      5) It is exciting to see the ability of MiSiC to segment single cells of M. xanthus and E. coli species in densely packed colonies (Fig. 4b). Although three morphological parameters after segmentation were compared with ground truth, the comparison was conducted at the ensemble level (Fig. 4c). Could the authors use the Mx-GFP and Ec-mCherry fluorescence as a ground truth at the single cell level to verify the results of segmentation? For example, for any Ec cells identified by MiSiC in Fig. 4b, provide an index of whether its fluorescence is red or green. This single-cell level comparison is most important for the community.

      We have now performed this comparison and determined Jaccard indexes for E. coli and Myxococcus detection using the individual fluorescence images as a reference (Figure 5b). Since we were only able to make this comparison in relatively small fields, we also kept the comparison of expected morphometric parameters in large images. Taken together, these data now demonstrate that the semantic classification, as performed, separates Myxococcus cells from E. coli cells well (see more details in our response to reviewer 3).
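
      The reviewer's suggestion of a per-cell red/green index can be sketched as follows. This is not the analysis behind Figure 5b (which reports Jaccard indexes); it assumes a label image plus two fluorescence channels on comparable intensity scales, and all names, including the `predicted` dictionary, are hypothetical.

```python
import numpy as np

def dominant_channel(labels, green, red):
    """Map each cell label to 'green' or 'red' according to its mean intensity per channel."""
    flat = labels.ravel()
    area = np.bincount(flat)
    mean_green = np.bincount(flat, weights=green.ravel()) / np.maximum(area, 1)
    mean_red = np.bincount(flat, weights=red.ravel()) / np.maximum(area, 1)
    return {int(k): ("green" if mean_green[k] > mean_red[k] else "red")
            for k in np.unique(flat) if k != 0}

labels = np.zeros((10, 20), dtype=int)
labels[2:5, 2:9] = 1
labels[6:9, 11:18] = 2
green = np.where(labels == 1, 0.9, 0.05)   # cell 1 carries the GFP-like signal
red = np.where(labels == 2, 0.8, 0.05)     # cell 2 carries the mCherry-like signal

calls = dominant_channel(labels, green, red)
# Hypothetical classifier output to be checked against fluorescence (Mx-GFP, Ec-mCherry).
predicted = {1: "Mx", 2: "Ec"}
expected = {"green": "Mx", "red": "Ec"}
agreement = sum(expected[calls[k]] == predicted[k] for k in calls) / len(calls)
```

      The resulting `agreement` fraction gives a single-cell measure of classification accuracy that complements the ensemble-level morphometric comparison.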

      Reviewer #2 (Public Review):

      Panigrahi and co-authors introduce a program that can segment a variety of images of rod-shaped bacteria (with somewhat different sizes and imaging modalities) without fine-tuning. Such a program will have a large impact on any project requiring segmentation of a large number of rod-shaped cells, including the large images demonstrated in this manuscript. To my knowledge, training a U-Net to classify an image from the image's shape index maps (SIM) is a new scheme, and the authors show that it performs fairly well despite a small training set including synthetic data that, based on Figure 1, does not closely resemble experimental data other than in shape. The authors discuss extending the method to objects with other shapes and provide an example of labelling two different species - these extensions are particularly promising.

      The authors show that their network can reproduce results of manual segmentation with bright field, phase and fluorescence input. Performance on fluorescence data in Fig. 1, where intensities vary so much, is particularly good and shows the benefits of the SIM transformation. Automated mapping of FtsZ shows that this method can be immediately useful, though the authors note this required post-processing to remove objects with abnormal shapes. The application in mixed samples in Fig. 4 shows good performance. However, no Python workflow or application is provided to reproduce it or train a network to classify mixtures in different experiments.

      We thank the reviewer for the positive comment. As discussed in our answer to reviewer 1, the classification presented in Figure 4 (now Figure 5) is meant to provide an example of how MiSiC can be further used to train networks to classify species in interspecies communities by generating two datasets, one per species of interest, to further train a U-Net. Here, the secondary U-Net was developed to specifically discriminate Myxococcus from E. coli, which is a very specialized application. Hence it was not included in the MiSiC package. Nevertheless the code is accessible at https://github.com/pswapnesh/MyxoColi (which is mentioned in the Methods).

      Performance was compared between SuperSegger with default parameters and MiSiC with tuned parameters for a single data set. Perhaps other SuperSegger parameters would perform better with the addition of noise, and it's unclear that adding Gaussian noise to a phase contrast image is the best way to benchmark performance. An interesting comparison would be between MiSiC and other methods applying neural networks to unprocessed data such as DeepCell and DeLTA, with identical training/test sets and an attempt to optimize free parameters.

      In fact, we believe that it does make sense to test how MiSiC performs in the presence of noise and to show that it is robust, making it suitable for use on complex multi-tile images. For this analysis we kept the comparison with SuperSegger, which provides a reference as it is done on a data set optimized for SuperSegger segmentation. Importantly, we keep the parameters constant throughout the analysis because it would not be feasible to tweak parameters tile-by-tile in a multi-tile image. This analysis shows that MiSiC is better adapted to this application.

      INSTALLATION: I installed both the command line and GUI versions of MiSiC on a Windows PC in a conda environment following provided instructions. Installation was straightforward for both. MiSiCgui gave one error and required reinstallation of NumPy as described on GitHub. Both give an error regarding AVX2 instructions. MiSiCgui gives a runtime error and does not close properly. These are all fairly small issues. Performance on a stack of images was sufficiently fast for many applications and could be sped up with a GPU implementation.

      We have updated the pip install script available on GitHub for MiSiCgui, which remedies some of these issues: there is no longer a NumPy error, the program closes properly, and there are only warning messages concerning future deprecations in the napari packages. We have tested it on Windows 10, Linux Ubuntu 18, and macOS Catalina. For the moment it seems impossible to install on macOS Big Sur, maybe due to the Python 3.7 requirement. We will work on this problem in the near future. We have removed the command line interface, as we are developing a future version that will offer an easier way to provide MiSiC as a napari or Fiji/ImageJ plugin.

      TESTING: I tested the programs using brightfield data focused at a different plane than data presumably used to train the MiSiC network, so cells are dark on a light background and I used the phase option which inverts the image. With default settings and a reasonable cell width parameter (10 pixels for E. coli cells with 100-nm pixel width; no added noise since this image requires no rescaling) MiSiCgui returned an 8-bit mask that can be thresholded to give segmentation acceptable for some applications. There are some straight-line artifacts that presumably arise from image tiling, and the quality of segmentation is lower than I can achieve with methods tuned to or trained on my data. Tweaking magnification and added noise settings improved the results slightly. The MiSiC command line program output an unusable image with many small, non-cell objects. Looking briefly at the code, it appears that preprocessing differs and it uses a fixed threshold.

      We thank the reviewer for testing the programs. Tiling-related artifacts may now be avoided by excluding a few pixels at the border in the new version of the MiSiC code. This is now implemented in the MiSiC.segment function as segment(im,invert = False,exclude = 16). Without seeing the reviewer's data it is difficult for us to see how the segmentation (which is said to be acceptable) could be further improved. The command line program has now been removed in favor of continuous development on the graphical interface.
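
      The general idea of discarding a border of each tile can be sketched as below. This is not the MiSiC code: only the parameter name `exclude` and its value of 16 are taken from the signature quoted above, while the tile size, the reflective padding and the helper names are assumptions made for illustration.

```python
import numpy as np

def process_tiled(image, func, tile=64, exclude=16):
    """Apply `func` to overlapping tiles and keep only each tile's interior."""
    step = tile - 2 * exclude                      # interior width kept from each tile
    padded = np.pad(image, exclude, mode="reflect")
    out = np.zeros(image.shape, dtype=float)
    for r in range(0, image.shape[0], step):
        for c in range(0, image.shape[1], step):
            patch = padded[r:r + tile, c:c + tile]
            result = func(patch)                   # must return an array of the patch's shape
            core = result[exclude:exclude + step, exclude:exclude + step]
            h = min(step, image.shape[0] - r)
            w = min(step, image.shape[1] - c)
            out[r:r + h, c:c + w] = core[:h, :w]
    return out

def corrupt_border(patch):
    """Stand-in for a model whose output is unreliable within 4 pixels of the tile edge."""
    bad = patch.copy()
    bad[:4, :] = 0
    bad[-4:, :] = 0
    bad[:, :4] = 0
    bad[:, -4:] = 0
    return bad

image = np.random.default_rng(0).random((100, 130))
stitched = process_tiled(image, corrupt_border, tile=64, exclude=16)
print(np.allclose(stitched, image))  # the corrupted tile borders never reach the stitched output
```

      The trade-off is extra computation, since with these illustrative values each tile contributes only its central 32 × 32 pixels.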

      Reviewer #3 (Public Review):

      The authors aimed to develop a 2D image analysis workflow that performs bacterial cell segmentation in densely crowded colonies, for brightfield, fluorescence, and phase contrast images. The resulting workflow achieves this aim and is termed "MiSiC" by the authors.

      I think this tool achieves high-quality single-cell segmentations in dense bacterial colonies for rod-shaped bacteria, based on inspection of the examples that are shown. However, without a quantification of the segmentation accuracy (e.g. Jaccard coefficient vs. intersection over union, false positive detection, false negative detection, etc), it is difficult to pass a final judgement on the quality of the segmentation that is achieved by MiSiC.

      We thank the reviewer for this comment. To address it, we divided the previous Figure 2 into two figures (and associated supplemental figures) separately showing how MiSiC performs (i) to segment two very distinct bacterial species, E. coli and Myxococcus, under various imaging modalities, and (ii) to segment other bacterial species: rods (P. aeruginosa), filaments (B. subtilis) and crescent shapes (C. crescentus). The results now clearly show both the strengths and limitations of the system.

      A particular strength of the MiSiC workflow arises from the image preprocessing into the "Shape Index Map" images (before the neural network analysis). These shape index maps are similar for images that are obtained by phase contrast, brightfield, and fluorescence microscopy. Therefore, the neural network trained with shape index maps can apparently be used to analyze images acquired with at least the above three imaging modalities. It would be important for the authors to unambiguously state whether really only a single network is used for all three types of image input, and whether MiSiC would perform better if three separate networks would be trained.

      A single network is used, taking a shape-index map rather than the original image as input. As mentioned by the reviewer, this is a major strength of the workflow given that it permits segmentation independent of the imaging modality, which we now measure for each modality.

      As the reviewer hints, three separate networks, one per modality (PC, Fluorescence and BF), could also be trained on raw images, allowing direct end-to-end segmentation. In theory, this could improve the segmentation (although this might lead to negligible benefits given the current segmentation quality).

    1. Author Response:

      Reviewer #1 (Public Review):

      The study by Diebold et al. describes a fast and scalable method that links bacterial plasmids to the organisms that harbor them. The authors then go on to apply this technique to track horizontal gene transfer in a complex bacterial population originating from clinical samples. There is no doubt that the development of such methodologies for better tracking plasmidic resistance genes and following horizontal gene transfer events is very important. The authors do a good job in optimizing their method to be a one-step process that has high sensitivity and relatively low error, while it can also be scaled, automated and used with multiplex primers. Subsequently, they apply this method to two clinical patient samples for which metagenomic data is available. In this case, they correctly identify expected relationships between beta-lactamase genes and specific bacterial taxa (and in particular K. pneumoniae), but also find that the same beta-lactamase genes are associated with organisms of the microbiome. With the exception of providing evidence that the association of particular genes with multiple organisms is not due to physical association of the bacteria in question, this is an interesting study putting forward a much needed technique for the study of antibiotic resistance but also other relationships in complex bacterial mixtures.

      We are very thankful for the positive review and the reviewer’s suggestion that we distinguish between gene transfer and physical association. We provide a detailed response to this in major point #1 of the review summary, but to summarize, we performed an OIL-PCR experiment to confirm that the results are indeed due to physical association of the bacteria and updated our manuscript accordingly.

      Reviewer #2 (Public Review):

      Diebold et al. developed a simplified and improved version of the epicPCR method applied to environmental samples. The results section describes well how they performed their development and supports the easy-to-use application. They clearly demonstrate that their methods could be used to screen association of specific genes to taxonomic markers in environmental microbial populations. They then apply their methods to human gut samples from hospitalized patients and demonstrate the utility of their methods to characterize the hosts of different targeted genes (notably AMR and plasmid-related genes). However, most of their results are based on previous studies on the same samples. Therefore, it appears difficult to know how their method can be used on new samples. Do they need to redo a classical metagenomic analysis in order to obtain data on new samples? What kind of metagenomic analysis is mandatory before performing their methods? What is the depth of the metagenomic analysis? Those are important questions, as it will clearly be more expensive to perform the whole metagenomic analysis.

      Thank you for pointing out the need to explain possible screening methods for OIL-PCR on unsequenced samples. We chose to use sequenced stool samples for testing the method in order to provide parallel validation of our results; however, we agree that metagenomic sequencing is not a practical or cost-effective way to select samples for OIL-PCR. qPCR is a more practical method to pre-screen samples for target genes before performing OIL, but we failed to include this important point in our discussion.

      Since drafting and submitting the manuscript, we have demonstrated that the three primers designed for OIL (forward, fusion, and nested primers) can easily be converted into probe-based qPCR assays by designing a fluorescent probe with the nested primer sequence. We have updated the discussion to convey this important feature of OIL-PCR.

      The conclusion of the paper is well supported by data, but the overall approach for new samples is never discussed. Moreover, the title appears somewhat misleading, as their methods do not allow plasmids to be clearly identified but rather link some targeted genes to taxonomic markers.

      Reviewer #3 (Public Review):

      This manuscript is composed of two parts. The first part describes development of an emulsion-based PCR fusion method, called OIL-PCR, for matching two specific gene sequences from the same cell. In this report these are beta-lactamase genes and the V4 section of rRNA, allowing the matching of this horizontally transferred gene with its donor sequence. The second part is a demonstration project that features the use of OIL-PCR to monitor horizontal transfer of beta-lactam genes between gut bacteria from the metagenomes of two neutropenic patients. OIL-PCR was set to multiplexed class A beta-lactam genes. This is a descriptive study that largely recapitulates a previously published work on these samples showing that the relatively unstudied Romboutsia commensal genus is a carrier of these plasmid-borne genes in patient metagenomes.

      Overall, this is a well-written manuscript. Data were comprehensively analyzed with appropriate controls. The figures are excellent.

      OIL-PCR is derived from other fusion PCR methods, especially epicPCR. There are some nice technical improvements described here, e.g., efficient lysis within emulsion droplets using Ready-Lyse lysozyme. This is an incremental technical advance for a fairly niche application (where you have known target genes and are concerned about potential culture bias) but it may be useful in particular for understanding HGT in microbiomes. There are some problems with the method that are brought to the foreground by the authors rather than quietly dropped, which is commendable.

      Thank you for acknowledging our effort to be up front about the strengths and weaknesses of OIL-PCR. We hope that this information will help inform other researchers in applying this method.

      One problem appears to be that the necessary dilution for single-cell PCR reduces the taxonomic diversity of the metagenome. The only way around this to achieve efficient sampling appears to be to perform multiple independent sequencing experiments and pool the results. Another feature of the system is that the accuracy falls slightly as the proportion of the target sequence in the community increases, for reasons that are not discussed. However, this effect is not great (97% accuracy at 10% proportion) and in most applications, the target cells will be a much lower proportion of the community.

      The results of the demonstration study on metagenomes from neutropenic patients are clearly described and provide a nicely worked example of combining this directed method with metagenome sequencing. The significance is limited but gives some descriptive hints about the mechanism of HGT between Romboutsia and Klebsiella.

      Other points:

      Unfortunately, there was no comparative test where the same samples were run against "competing" technologies (e.g., sequencing of cultured beta-lactam resistant strains, epicPCR, Hi-C or single-cell) to directly compare strengths (and weaknesses) of OIL-PCR.

      Thank you for this fair criticism that we did not compare OIL-PCR to other available methods. We address comparing OIL-PCR to Hi-C in our response to major point #4 (above). With regard to epicPCR, we did consider comparing OIL-PCR to epicPCR, but decided against it for two main reasons: 1) acquiring all the reagents necessary to perform epicPCR was cost-prohibitive (over $1,000 for the one demonstration experiment), and 2) a large motivation for the development of OIL-PCR is the difficulty of performing epicPCR. Although we believe that both epicPCR and OIL-PCR are robust methods, OIL-PCR is a shorter protocol that does not rely on hazardous, costly and difficult-to-obtain reagents. We were concerned that an inexperienced attempt by us to perform epicPCR would likely have yielded poor results and would not provide a fair comparison. Overall, we feel that the validation experiments we perform with OIL-PCR are enough to highlight both the strengths and weaknesses of the method.

      As protocol development is central to this manuscript, and one of the main advantages claimed for OIL-PCR is ease of use, the supplement should contain a detailed protocol for a control sample with a list of equipment and reagents needed and what results should be obtained. This could easily be adapted from the methods section, which is highly detailed. What is the estimated cost per sample of this procedure and how does it compare roughly with other methods (epicPCR and culture-based)?

      Thank you for the suggestion that we provide a detailed protocol. We hope that the inclusion of this step-by-step protocol will enable more labs to adopt the method. The cost of OIL is approximately $15 per replicate. The cost is largely driven by the large amount of Phusion polymerase needed, which is the same as in epicPCR. Culturing may be less expensive depending on the cost of reagents needed for media, antibiotics etc, but we do not feel the two are comparable. For example, even though we show that Romboutsia did not acquire resistance genes in this case, even if it had, culturing would not have captured it due to the difficult and specific culturing conditions required for growing most Romboutsia strains.

      Line 197-198 reference needed to the Kent et al study here? What is the reason that the Hi-C results from this manuscript are not compared to the results of the OIL-PCR experiments?

      Thank you for this suggestion. The congruence of our results highlights the strengths of both approaches. As we discuss in detail for major point 4 (above), the Hi-C and OIL-PCR results both correctly identify Klebsiella as a carrier of the plasmid with CTX-M and TEM. We have now added this to the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Chakraborty focuses on methods to direct dsDNA to specific cell types within an intact multicellular organism, with the ultimate goal of targeting DNA-based nanodevices, often as biosensors within endosomes and lysosomes. Taking advantage of the endogenous SID-2 dsRNA receptor expressed in C. elegans intestinal cells, the authors show that dsDNA conjugated to dsRNA can be taken into the intestinal endosomal system via feeding and apical endocytosis, while dsDNA alone is not an efficient endocytic cargo from the gut lumen. Since most cells do not express a dsRNA receptor, the authors sought to develop a more generalizable approach. Via phage display screening they identified a novel camelid antibody 9E that recognizes a short specific DNA sequence that can be included at the 3' end of synthesized dsDNAs. The authors then showed that this antibody can direct binding, and in some cases endocytosis, of such DNAs when 9E was expressed as a fusion with transmembrane protein SNB-1. This approach was successful in targeting microinjected dsDNA pan-neuronally when expressed via the snb-1 promoter, and to specific neuronal subsets when expressed via other promoters. Endocytosed dsDNA appeared in puncta moving in neuronal processes, suggesting entry into endosomes. Plasma membrane targeting appeared feasible using 9E fusion to ODR-2.

      The major strength of the paper is in the identification and testing of the 9E camelid antibody as part of a generalizable dsDNA targeting system. This aspect of the paper will likely be of wide interest and potentially high impact, since it could be applied in any intact animal system subject to transgene expression. A weakness of the paper is the choice of "nanodevice". It was not clear what utility was present in the DNAs used, such as D38, that made them "devices", aside from their fluorescent tag that allowed tracking their localization.

      We used a DNA nanodevice, denoted pHlava-9E, that uses pHrodo as a pH-sensitive dye. pHlava-9E is designed to provide a digital output of compartmentalization, i.e., its pH profile is such that even if it is internalized into a mildly acidic vesicle, the pH readout is as high as one would observe with a lysosome. This gives an unambiguous readout that distinguishes surface-immobilized probe from endocytosed probe.

      Another potential weakness is that the delivered DNA is limited to the cell surface or the lumen of endomembrane compartments without access to the cytoplasm or nucleus. In general the data appeared to be of high quality and was well controlled, supporting the authors conclusions.

      We completely agree that we cannot target DNA nanodevices to sub-cellular locations such as the cytoplasm or the nucleus with this strategy. However, we do not see this as a “weakness”, but rather as a limitation of the current capabilities of DNA nanotechnology. It must be mentioned that although fluorescent proteins were first described in 1962, it was 30 years before others targeted them to the endoplasmic reticulum (1992) or the nucleus (1993) (Brini et al., 1993; Kendall et al., 1992). Probe technologies undergo stage-wise improvements/expansions. We have therefore added a small section to the conclusions outlining the future challenges in sub-cellular targeting of DNA nanodevices.

      Reviewer #2 (Public Review):

      The authors demonstrate the tissue-specific and cell-specific targeting of double-stranded DNA (dsDNA) using C. elegans as a model host animal. The authors focused on two distinct tissues and delivery routes: feeding dsDNA to target a class of organelles within intestinal cells, and injecting dsDNA to target presynaptic endocytic structures in neurons. To achieve efficient intestinal targeting, the authors leveraged dsRNA uptake via endogenous intestinal SID-2 receptors by fusing dsRNA to a fluorophore-labeled dsDNA probe. In contrast, neuronal endosome/synaptic vesicle (SV) targeting was achieved by designing a nanobody that specifically binds a short dsDNA motif fused to the fluorophore-labeled dsDNA probe. Combining dsDNA probe injection with nanobody neuronal expression (fused to a neuronal vSNARE to achieve synaptic targeting), the authors demonstrated that the injected dsDNA could be taken up by a variety of distinct neuronal subtypes.

      Strengths:

      While nanodevices built on dsDNA platforms have been shown to be taken up by scavenger receptors in C. elegans (including previous work from several of these authors), this strategy will not work in many tissue types lacking these receptors. The authors successfully circumvented this limitation using distinct strategies for two cell types in the worm, thereby providing a more general approach for future efforts. The approaches are creative, and the nanobody development in particular allows for endocytic delivery in any cell type. The authors exploited quantitative imaging approaches to examine the subcellular targeting of dsDNA probes in living animals and manipulated endogenous receptors to demonstrate the mechanism of dsRNA-based dsDNA uptake in intestinal cells.

      Weaknesses:

      To validate successful delivery of a functional nanodevice, one would ideally demonstrate the function of a particular nanodevice in at least one of the examples provided in this work. The authors have successfully used a variety of custom-designed dsDNA probes in living worms in numerous past studies, so this would not be a technical hurdle. In the current study, the reader has no means of assessing whether the dsDNA is intact and functional within its intracellular compartment.

      We now demonstrate the use of a functional nanodevice to detect pH profiles of a given microenvironment. This functional nanodevice contains two fluorescent reporter dyes, each attached to one of the strands of a DNA duplex. Because pH readouts rely on ratiometric sensing, device integrity is essential.

      Coelomocytes are cells known for their scavenging and degradative lysosomal machinery. Previous studies of the stability of variously structured DNA nanodevices in coelomocytes have shown that DNA devices based on 38 bp DNA duplexes have a half-life of >8 hours in actively scavenging cells such as coelomocytes (Chakraborty et al., 2017; Surana et al., 2013). Given that our sensing in the gut as well as in neurons is performed <1 hour post feeding or injection, pHlava-9E is >97% intact.

      Another minor weakness is the lack of a quantitative assessment of colocalization in intestinal cells or neurons in an otherwise nicely quantitative study. Since characterization of the targeting described here is an essential part of evaluating the method, a stronger demonstration of colocalization would significantly buttress the authors' claims.

      We have now quantified colocalization in each cellular system. Please see Figure R1 below (Figure 1 Supplementary figure 1 and Figure 4 Supplementary figure 2 of the revised manuscript).

      Figure R1: a) Pearson’s correlation coefficient (PCC) calculated for the colocalization between R50D38 (red) and lysosomal markers LMP-1 or GLO-1 (green) in the indicated transgenic worms. b) & d) Representative images of nanodevice nD647 uptake (red) in transgenics expressing both prab-3::gfp::rab-3 (green) and psnb-1:snb-1::9E c - e) Normalized line intensity profiles across the indicated lines in b and d; f) Percentage colocalization of nD647 (red) with RAB3:GFP (green). Error bar represents the standard deviation between two data sets.
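      As a minimal sketch of the colocalization metric named in the caption, Pearson's correlation coefficient can be computed from paired pixel intensities in the two channels. The function name and the flat lists of intensities are illustrative assumptions, not the authors' analysis pipeline, which is not described here.

```python
import math

def pearson_cc(red, green):
    """Pearson's correlation coefficient between two equally long
    sequences of pixel intensities (hypothetical red and green channels)."""
    if len(red) != len(green) or len(red) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_r = sum(red) / len(red)
    mean_g = sum(green) / len(green)
    # Covariance and variances (unnormalized; the factors cancel in the ratio).
    cov = sum((r - mean_r) * (g - mean_g) for r, g in zip(red, green))
    var_r = sum((r - mean_r) ** 2 for r in red)
    var_g = sum((g - mean_g) ** 2 for g in green)
    if var_r == 0 or var_g == 0:
        raise ValueError("PCC is undefined for a constant channel")
    return cov / math.sqrt(var_r * var_g)
```

      Values near 1 indicate that the two channels vary together pixel by pixel, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.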

      While somewhat incomplete, this study represents a step forward in the development of a general targeting approach amenable to nanodevice delivery in animal models.

  2. May 2021
    1. Reviewer #3 (Public Review):

      This study investigates temporal orienting in individuals with cerebellar degeneration (CD) and control subjects during an orientation discrimination task with visual stimuli presented at near-threshold contrast. Participants reported their discrimination decision only after a random delay following target offset, which decreases the motor preparation component of the task in the interval-based condition. CD subjects showed similar visual discrimination performance to controls when cued by a rhythmic set of stimuli but showed no benefit when the target interval was presented aperiodically. The authors interpret these findings as evidence supporting the notion that the cerebellum plays a role in interval-based attentional orienting to proactively modulate perception. This is an elegantly simple experiment providing a novel observation in the field.

    2. Reviewer #2 (Public Review):

      The article by Breska and Ivry provides a nice, timely, and relevant continuation of their previous recent work on the role of the cerebellum in interval-based (but not rhythm-based) anticipation in time. While in their related prior work (in particular their recent articles in PNAS and Science Advances) the authors used simple reaction time tasks that made it difficult to attribute the observed effects to visual vs. motor anticipatory mechanisms, in the current work they used a perceptual discrimination task with a delayed response to focus on potential contributions of the cerebellum to temporal anticipation specifically for perceptual sensitivity (where the role of the cerebellum is less obvious, given it has traditionally been implicated more in motor control than in perception). They do so by comparing individuals with cerebellar degeneration to controls, and finding a selective impairment of the individuals with cerebellar degeneration to use interval-based temporal predictions to facilitate visual discrimination, while rhythm-based performance benefits are spared (providing a neat comparison and control).

      I have no major comments to detail. The short report is well written, complements related work by the authors nicely, and makes an important and novel contribution to the literature on temporal anticipation (while also having relevant implications more generally for views on the role of the cerebellum in cognition).

    3. Reviewer #1 (Public Review):

      Breska and Ivry tested the role of the cerebellum in temporal expectation, specifically in how temporal expectation affects perception. The question is interesting, as the neural mechanisms mediating the substantial effects of temporal expectation on perception are not well understood. The authors found that in a perceptual discrimination task, individuals with cerebellar degeneration (CD) showed reduced effects of temporal expectation on discriminability with interval timing cues, but intact effects with rhythmic cues. This shows that the role of the cerebellum in temporal expectation (which had been previously demonstrated by the authors) is not merely one of motor preparation. Rather, the cerebellum appears to play a causal role in bringing about the perceptual consequences of temporal expectation for predictable intervals. It also reveals differences between interval timing and rhythmic manipulations in terms of the mechanisms by which they affect perception.

      This is a straightforward study with a clean experimental approach and clear presentation of the data. However, I felt the manuscript would benefit from a more thorough analysis of the dataset, especially given the rarity of individuals with CD.

    4. Evaluation Summary:

      This study provides evidence that individuals with cerebellar degeneration show reduced effects of temporal expectation on perceptual discriminability with interval timing cues, but intact effects with rhythmic cues. The authors compare individuals with cerebellar degeneration to controls, and find a selective impairment of the individuals with cerebellar degeneration to use interval-based temporal predictions to facilitate visual discrimination, whereas rhythm-based performance benefits are spared. This study is of interest to psychologists and neuroscientists investigating prediction, perception, attention, and motor control, as it demonstrates a key role for the cerebellum in mediating the effects of interval-based temporal expectation on perception.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Author Response:

      Evaluation Summary:

      This paper will be of interest to biologists who study mechanisms of cell-to-cell variability in gene expression and those who wish to have a tool to alter variability in mammalian cells. Key regulators of gene expression variability in mammalian cells are identified and noise modulation in a synthetic system is shown. The data quality is high. A model for the origin of the observed noise is proposed, but will require some additional experimental evidence.

      We thank the reviewers for their thorough reviews, insightful critiques, and very constructive suggestions on our manuscript. They have genuinely helped us improve our work and manuscript. We have performed all the additional experiments suggested. We believe that our new results and revised manuscript answer the questions raised by the reviewers and editors.

      Reviewer #1 (Public Review):

      The manuscript aims to identify origins of stochasticity ('noise') in mammalian gene expression focused on the case when a single transcription factor controls the expression of a target gene. It also aims to devise strategies to control mean and variance of gene expression independently.

      The experimental approach uses a light-induced transcriptional activator in two stimulation modes, namely amplitude modulation (AM: time-constant light input) and pulse width modulation (PWM: periodic light inputs in the form of a pulse train). Perturbation experiments target histone-modifying enzymes to influence epigenetic states, with corresponding measurements of single-cell epigenetic states and mRNA dynamics to dissect mechanisms of noise control. Beyond this synthetic setting, the study is complemented by endogenous gene expression noise in human and mouse cells under the same perturbations.

      Major strengths of the study are:

      • The experimental demonstration that, and under which conditions PWM can reduce gene expression noise in mammalian cells; the corresponding data sets could be very valuable for further quantitative analysis.
      • Providing strong evidence via perturbation studies that the extent of gene expression noise is linked to chromatin-modifying activities, specifically opposing HDAC4/5 histone deacetylase activities and CBP/p300 histone acetyltransferase activities.
      • Proposing a positive-feedback model established by these two opposing activities that is consistent with the reported data from perturbation experiments and on chromatin accessibility / modification states.
      • Providing evidence that also in the natural (human and mouse cell) setting, the regulators HDAC4/5 and CBP/p300 contribute to the control of gene expression noise.

      We thank the reviewer for the careful analysis of our manuscript.

      Major weaknesses are:

      • Limited conceptual novelty because noise-reducing effects of PWM have been demonstrated and analyzed previously in synthetic systems in bacteria (with an engineered positive feedback loop; https://www.nature.com/articles/s41467-017-01498-0) and in yeast (with an engineered single transcription factor as in the present study: https://www.nature.com/articles/s41467-018-05882-2#Sec25).

      We appreciate that the reviewer pointed out two studies in E. coli and yeast that used similar PWM. We believe that their concepts are different. The concept of “stabilized unstable steady states” was specifically developed for the control of chaos in physics by Ott, Grebogi, and Yorke (OGY theory, https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.64.1196 ). Their motivation was to control chaos through feedback using small perturbations of the system. Non-feedback control with small periodic perturbations has also been shown to control chaos by stabilizing an unstable steady state. The E. coli work on stabilizing an unstable steady state could be considered an extension of these concepts to complex biological systems. In addition, the location of the unstable steady state in a bistable system would decrease with increasing light intensity, as shown by the black dashed line in Figure 2E, which is inconsistent with our result that the mean mRuby is monotonically correlated with the mean light intensity (Figure 1C).

      It is correct that Benzinger and Khammash hypothesized in their yeast paper that a cooperative TF-gene expression curve is sufficient to generate a bimodal distribution when the TF distribution is highly variable, as shown in Figure 1G. However, this is not the case in our study. In our experiment, GAVPO and mRuby expression do not exhibit clear cooperativity. In addition, the authors did not show bimodality unless a non-isogenic cell population was used (Fig. 3h in Benzinger and Khammash’s paper).

      • Insufficient evidence for the postulated bistability caused by positive feedback on chromatin states in the mammalian system analyzed, which has implications for the mechanistic explanations provided (e.g., if PWM allows rapid cell switching between 'high' and 'low' states as postulated).

      We agree with the reviewer that current technology limits the possibility of obtaining more direct evidence of bistability in chromatin states. Our scATAC-seq data show that chromatin openness oscillated between the light “on” and “off” phases, with reduced heterogeneity compared to the dark control. Our bulk data suggest that H3K27ac shows larger differences between the “high” and “low” states. A better measurement would be single-cell ChIP-seq for H3K27ac. However, current single-cell ChIP-seq technologies provide coverage that is too low (~1% of scATAC-seq reads) to support measurements at specific loci (https://www.nature.com/articles/s41592-021-01060-3, https://www.nature.com/articles/s41587-021-00869-9 ).

      • Limited theoretical support for the proposed (not directly observable) mechanisms that uses a mathematical model illustrating the potential consistency, but the model is not directly linked to the experimental data and hence of limited use for their interpretation.

      Our ODE model was not built to fit the experimental data. We used it to generate hypotheses about perturbations of HDAC4/5 and CBP/p300. The model prediction that inhibiting p300 reduces heterogeneity was validated in experiments.

      We have also built a stochastic model that contains all the processes in our ODE model and considers nine independent promoters, written code for a stochastic simulation algorithm similar to that in the yeast paper, and performed optimization. However, we do not have enough CPU time to fit the model to the experimental data and find the “global minimum” using the parallel tempering Monte Carlo method (https://pubmed.ncbi.nlm.nih.gov/19810318/).

      Overall, the authors achieved their aim of elucidating mechanisms for noise control in mammalian gene expression by identifying specific, opposing regulators of chromatin states, with clear support in the synthetic setting, and evidence in endogenous expression control. Conceptual advances regarding strategies for the external control of gene expression noise appear limited because of prior work, which includes more in-depth theoretical analysis in simpler (bacterial, yeast) systems.

      Hence, the likely impact of the work will be primarily on the more detailed (in terms of histone regulators, etc.) study of noise control in mammalian cells, while the data sets presented in the study could prove valuable for follow-up quantitative (model-based) analyses because they are unique in combining different readouts such as single-cell protein and mRNA abundances as well as histone and chromatin states.

      We appreciate the reviewer's finding that this manuscript elucidates molecular mechanisms that control mammalian gene expression noise in both synthetic and endogenous gene regulation.

      Reviewer #2 (Public Review):

      The manuscript describes a tool to independently tune mean protein expression levels and noise. Light induces dimerization and subsequent activation of transcriptional activator GAVPO. By introducing 5xUAS (a target sequence for dimerized GAVPO) upstream a mRuby reporter gene, the effect of light can be measured on mRuby mean and noise.

      By pulsing light at different periods (from 100-400 minutes), the authors reduce the mRuby noise for intermediate average light intensities. Notably, the pulses are all applied at an absolute light intensity of 100 uW/cm2, with the average light intensity being modulated through the light-off time periods. Therefore, as all periods tend towards 100 uW/cm2 average light intensity, the PWM duty cycles become more similar to the 100 uW/cm2 AM case.
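      The relation between pulse timing and average light intensity described above can be sketched as follows. This is only an illustration of standard PWM arithmetic: the function names and the 100-minute on / 300-minute off example are assumptions chosen for clarity, not the authors' illumination protocol.

```python
def duty_cycle(on_min, off_min):
    """Fraction of each PWM period during which the light is on."""
    return on_min / (on_min + off_min)

def mean_intensity(peak, on_min, off_min):
    """Time-averaged intensity of a pulse train that alternates between
    `peak` (e.g. 100 uW/cm2) and dark."""
    return peak * duty_cycle(on_min, off_min)

def off_time_for_mean(peak, on_min, target_mean):
    """Dark time per period needed to reach `target_mean` at a fixed on-time."""
    return on_min * (peak / target_mean - 1.0)

# Hypothetical example: 100 uW/cm2 pulses, 100 min on, 300 min off.
average = mean_intensity(100.0, 100.0, 300.0)
```

      As the off-time shrinks toward zero, the duty cycle approaches 1 and the pulse train becomes indistinguishable from constant illumination at the peak intensity, which is the convergence toward the AM case noted above.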

      Strengths:

      The proposed method is an elegant way to independently tune protein mean and noise. This would have a broad application in the field and is much needed to be able to study the consequence of protein expression noise, independently of mean. In addition, the authors use multiple powerful single-cell techniques to try and determine the mechanism underpinning the light-induced noise modulation.

      During constant exposure to light, increased light intensity increases the mean expression of mRuby, while decreasing the noise. This high noise is mostly due to observed bimodality in mRuby expression. Through ODEs and by using small molecule inhibitors, the authors show that this bimodality is caused by some cells being stably off, while other cells enter an on state. In this on state a positive feedback can occur where initial binding of dimerized GAVPO induces histone acetylation and chromatin accessibility, and thus stimulates further GAVPO binding. Bistability induced by constant light exposure is disrupted using small molecule inhibitors of CBP/p300 HAT activity, indicating that histone regulation is a cause for this observed bistability. The stable on state is demonstrated to be more active and accessible through ChIP-seq and ATAC-seq respectively.

      We appreciate that the reviewer recognizes that our method for independently tuning protein mean and noise has broad application and is much needed, and that we integrated multiple single-cell analyses to determine the noise control mechanism. We believe that this method will prove especially useful in cell fate control studies, in vitro with stem cell differentiation or in vivo with embryo development.

      Weakness:

      The single-cell ATAC-seq data indicate that pulsing light induces switching from an accessible (light on) to inaccessible (light off) chromatin state. The authors argue that the switching back into a chromatin inaccessible state prevents the positive feedback from occurring and thus reduces noise. However, there are weaknesses in the description of the mechanism by which the pulses modulate (i.e., reduce) noise. Overall, since these sections in the manuscript are not easy to understand, it is difficult to parse what mechanism the authors attribute to the observed noise reduction and to assess if the data support the conclusions.

      We apologize for the lack of clarity in this aspect. We have extensively rewritten the descriptions in the related sections. The PWM light intensity alternates between 100 uW/cm2 and dark, which correspond to the high and low monostable states, respectively. We therefore need to show whether the time spent in each state is sufficient. The scATAC-seq data indicate that one 150-minute pulse of 100 uW/cm2 light is sufficient to elevate chromatin accessibility while reducing cell-to-cell variation, two features of the high monostable state. The 450-minute dark period reduces chromatin accessibility: during this period, the cells fall back to the low monostable state because there is not enough activated GAVPO. H3K27ac has a larger dynamic range between the low and high states (Figure 3J), but single-cell ChIP-seq methods do not provide sufficient coverage to assess H3K27ac heterogeneity at the 5xUAS-mRuby loci. Nevertheless, indirect evidence from perturbation of p300 activation or GAVPO-p300 interactions supports this picture.

      The data from the single-mRNA live-cell imaging experiments are somewhat ambiguous and do not necessarily support some of the arguments. The conclusion that transcription, nuclear export, and mRNA degradation flatten the pulsatile chromatin caused by the PWM is not clear from the data. Especially, since most cells do not show any pulsatile behavior both in the single-cell ATAC-seq and the live-cell imaging data.

      We improved the presentation of the data. With the data presented on a logarithmic scale, it is visible that most cells exhibit pulsatile behavior (new Figure 5C). This can be further visualized by averaging over subpopulations of cells. As shown in Figure 5G in the revised manuscript, approximately 57% of cells show oscillations. The mean mRNA shows a damped periodic oscillation. The statement that nuclear export and mRNA degradation flatten the pulsatile chromatin caused by the PWM was postulated from rate constants in the literature and has been removed in the revised manuscript. The half-life of mRuby is about 24 hours, substantially longer than the period of PWM. We have added an analysis of single-cell mRuby dynamics with 400 min PWM, which do not exhibit periodic oscillations (Figure 5-figure supplement 2).

      Reviewer #3 (Public Review):

      The authors use a synthetic light-controlled transcription factor (GAVPO) to test a model of bistable gene expression that is hypothesized to originate from positive feedback via local histone modifications by trans-activator recruitment of CBP/p300 to facilitate open chromatin, which facilitates GAVPO binding, etc… Their proposed model for the origin of bistability is important because it should apply to any trans-activator that recruits CBP/p300 to modify chromatin and activate gene expression. The authors show that periodic modulation of light reduces the bimodal distribution at intermediate light-intensity levels to a unimodal distribution. This is an elegant demonstration of how GAVPO and different temporal patterns of light can reduce cell-to-cell variability in gene expression, if needed.

      Strengths:

      The authors generate an impressive amount of single-cell data of gene expression and chromatin state (flow cytometry, single-cell sequencing, live-cell MS2-tagging) at different intensity levels. The periodic modulation of GAVPO activity by light is a practical demonstration of how to sculpt the gene expression output in useful ways. This may be a very useful tool for future biologists.

      We thank the reviewer for the positive comments on the mammalian noise control mechanism we discovered and its broad implications.

      Weakness:

      The proposed model for bistability is not convincingly tested or supported by the existing data. Each reporter should exhibit a bistable response because the positive feedback is localized to the promoter via cis-effects on gene expression by local chromatin state/GAVPO binding. The authors show a bimodal distribution of gene expression in a population of cells, which is consistent with a bistable response in a single reporter gene. However, their strain has 9 independent reporters integrated into the genome. Thus, I would expect to see up to 10 peaks, not 2 peaks. Moreover, the mathematical model used to validate their observations does not model the total expression from 9 independent promoters, which is a critical omission given the cis-nature of the positive feedback loop. The fact that these 9 promoters generate 2 peaks at intermediate light intensity suggests that the GAVPO bistability likely originates from a trans-effect, i.e., either all 9 promoters are OFF or all 9 promoters are ON, not a cis-effect.

      We appreciate the reviewer’s insight. We agree that theoretically there could be up to 10 peaks. The separation between two adjacent “high” peaks is about 2-fold. The lowest CV measured experimentally for the high mRuby peak is about 0.47 (cells under maximum light with LMK-235 and A485, Figure 3B). This variation could overshadow the 2-fold differences in mean mRuby and prevent the recognition of multiple “high” peaks. On the other hand, the difference between the low state and any of the high states is large enough for them to be recognized as separate peaks. We simulated the case in which each of the 9 sites chooses the “low” or “high” state stochastically (Figure 3-figure supplement 2). The 9 potential high peaks are convolved into a broader peak, similar to experimental observations.
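      The peak-resolution argument above can be illustrated with a rough calculation. The sketch below is not the authors' simulation: it assumes that each active promoter contributes equally to mean mRuby, that each peak is lognormal with the quoted CV of 0.47, and that two peaks count as resolvable only if their separation on a log scale exceeds two standard deviations. All three assumptions, and the function names, are hypothetical.

```python
import math

def log_sigma_from_cv(cv):
    """Standard deviation of ln(X) for a lognormal X with the given CV."""
    return math.sqrt(math.log(1.0 + cv * cv))

def adjacent_peak_gaps(n_promoters):
    """ln-separation between the peaks for k and k+1 active promoters,
    assuming mean expression is proportional to the number of active promoters."""
    return [math.log((k + 1) / k) for k in range(1, n_promoters)]

def resolvable_pairs(n_promoters, cv, n_sigma=2.0):
    """Values of k for which the k and k+1 peaks are further apart than
    n_sigma log-scale standard deviations (an assumed heuristic)."""
    sigma = log_sigma_from_cv(cv)
    gaps = adjacent_peak_gaps(n_promoters)
    return [k for k, gap in enumerate(gaps, start=1) if gap > n_sigma * sigma]

# With 9 promoters and CV = 0.47, no adjacent "high" peaks pass the criterion.
high_peaks_resolved = resolvable_pairs(9, 0.47)
```

      Under these assumptions the largest gap between adjacent "high" peaks, ln 2 for one versus two active promoters, is smaller than twice the log-scale standard deviation, so the peaks would merge into one broad peak. A much tighter distribution, for example a CV of 0.1, would separate only the first few peaks.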

      We agree that our model is very simple and does not model the total expression from independent promoters. We have built a stochastic model that contains all the processes in our ODE model and considers nine independent promoters. Unfortunately, fitting it to the experimental data using the parallel tempering Monte Carlo method takes too much time.

      We performed additional experiments in which we mutated the p65AD of GAVPO to specifically reduce its interaction with CBP/p300. The disappearance of the bimodal distribution validates that the direct interaction between UAS-bound GAVPO and CBP/p300 causes the bistability, not a trans-effect through intermediates. We measured single-cell mRuby dynamics and selected cells with nearly identical GAVPO (Figure 2H). The mRuby-high cells rose earlier and stayed in the high state (red lines in Figure 2G), and the mRuby-low cells remained low (blue lines in Figure 2G). A few cells appear to make transitions between the two states. These data are consistent with a bistability model with small rates of stochastic transition between the states. Prior exposure to 100 uW/cm2 light also tilted the distribution toward the “high” state, validating the hysteresis property of the bistability (Figure 2I-J).

    2. Reviewer #1 (Public Review):

      The manuscript aims to identify origins of stochasticity ('noise') in mammalian gene expression focused on the case when a single transcription factor controls the expression of a target gene. It also aims to devise strategies to control mean and variance of gene expression independently.

      The experimental approach uses a light-induced transcriptional activator in two stimulation modes, namely amplitude modulation (AM: time-constant light input) and pulse width modulation (PWM: periodic light inputs in the form of a pulse train). Perturbation experiments target histone-modifying enzymes to influence epigenetic states, with corresponding measurements of single-cell epigenetic states and mRNA dynamics to dissect mechanisms of noise control. Beyond this synthetic setting, the study is complemented by endogenous gene expression noise in human and mouse cells under the same perturbations.

      Major strengths of the study are:

      • The experimental demonstration that, and under which conditions PWM can reduce gene expression noise in mammalian cells; the corresponding data sets could be very valuable for further quantitative analysis.
      • Providing strong evidence via perturbation studies that the extent of gene expression noise is linked to chromatin-modifying activities, specifically opposing HDAC4/5 histone deacetylase activities and CBP/p300 histone acetyltransferase activities.
      • Proposing a positive-feedback model established by these two opposing activities that is consistent with the reported data from perturbation experiments and on chromatin accessibility / modification states.
      • Providing evidence that also in the natural (human and mouse cell) setting, the regulators HDAC4/5 and CBP/p300 contribute to the control of gene expression noise.

      Major weaknesses are:

      • Limited conceptual novelty because noise-reducing effects of PWM have been demonstrated and analyzed previously in synthetic systems in bacteria (with an engineered positive feedback loop; https://www.nature.com/articles/s41467-017-01498-0) and in yeast (with an engineered single transcription factor as in the present study: https://www.nature.com/articles/s41467-018-05882-2#Sec25).
      • Insufficient evidence for the postulated bistability caused by positive feedback on chromatin states in the mammalian system analyzed, which has implications for the mechanistic explanations provided (e.g., if PWM allows rapid cell switching between 'high' and 'low' states as postulated).
      • Limited theoretical support for the proposed (not directly observable) mechanisms that uses a mathematical model illustrating the potential consistency, but the model is not directly linked to the experimental data and hence of limited use for their interpretation.

      Overall, the authors achieved their aim of elucidating mechanisms for noise control in mammalian gene expression by identifying specific, opposing regulators of chromatin states, with clear support in the synthetic setting, and evidence in endogenous expression control. Conceptual advances regarding strategies for the external control of gene expression noise appear limited because of prior work, which includes more in-depth theoretical analysis in simpler (bacterial, yeast) systems.

      Hence, the likely impact of the work will be primarily on the more detailed (in terms of histone regulators, etc.) study of noise control in mammalian cells, while the data sets presented in the study could prove valuable for follow-up quantitative (model-based) analyses because they are unique in combining different readouts such as single-cell protein and mRNA abundances as well as histone and chromatin states.

    3. Reviewer #3 (Public Review):

      The authors use a synthetic light-controlled transcription factor (GAVPO) to test a model of bistable gene expression that is hypothesized to originate from positive feedback via local histone modifications by trans-activator recruitment of CBP/p300 to facilitate open chromatin, which facilitates GAVPO binding, etc... Their proposed model for the origin of bistability is important because it should apply to any trans-activator that recruits CBP/p300 to modify chromatin and activate gene expression. The authors show that periodic modulation of light reduces the bimodal distribution at intermediate light-intensity levels to a unimodal distribution. This is an elegant demonstration of how GAVPO and different temporal patterns of light can reduce cell-to-cell variability in gene expression, if needed.

      Strengths:

      The authors generate an impressive amount of single-cell data of gene expression and chromatin state (flow cytometry, single-cell sequencing, live-cell MS2-tagging) at different intensity levels. The periodic modulation of GAVPO activity by light is a practical demonstration of how to sculpt the gene expression output in useful ways. This may be a very useful tool for future biologists.

      Weakness:

      The proposed model for bistability is not convincingly tested or supported by the existing data. Each reporter should exhibit a bistable response because the positive feedback is localized to the promoter via cis-effects on gene expression by local chromatin state/GAVPO binding. The authors show a bimodal distribution of gene expression in a population of cells, which is consistent with a bistable response in a single reporter gene. However, their strain has 9 independent reporters integrated into the genome. Thus, I would expect to see up to 10 peaks, not 2 peaks. Moreover, the mathematical model used to validate their observations does not model the total expression from 9 independent promoters, which is a critical omission given the cis-nature of the positive feedback loop. The fact that these 9 promoters generate 2 peaks at intermediate light intensity suggests that the GAVPO bistability likely originates from a trans-effect, i.e., either all 9 promoters are OFF or all 9 promoters are ON, not a cis-effect.
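      The counting argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the 9 reporters are identical, that each contributes one unit of expression when ON, and that expression is noise-free; the ON probability `P_ON` and both function names are hypothetical and not taken from the manuscript. In real data, expression noise would blur neighbouring levels, so the 10 levels are an upper bound on resolvable peaks.

```python
from math import comb


def cis_on_distribution(n_reporters: int, p_on: float) -> dict[int, float]:
    """P(k reporters ON) when each reporter switches independently (binomial)."""
    return {
        k: comb(n_reporters, k) * p_on**k * (1 - p_on) ** (n_reporters - k)
        for k in range(n_reporters + 1)
    }


def trans_on_distribution(n_reporters: int, p_on: float) -> dict[int, float]:
    """P(k reporters ON) when all reporters switch together (all-or-none)."""
    return {0: 1 - p_on, n_reporters: p_on}


N_REPORTERS = 9  # number of integrated reporters, as stated in the review
P_ON = 0.5       # hypothetical ON probability at intermediate light intensity

cis = cis_on_distribution(N_REPORTERS, P_ON)
trans = trans_on_distribution(N_REPORTERS, P_ON)

# Independent (cis) switching allows 0..9 reporters ON, i.e. 10 distinct levels;
# coupled (trans) switching allows only "all OFF" or "all ON".
print("cis model expression levels:  ", len(cis))
print("trans model expression levels:", len(trans))
```

      Under these assumptions, only the coupled (trans-like) model reproduces a two-peak distribution, which is the point the reviewer raises.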

    4. Reviewer #2 (Public Review):

      The manuscript describes a tool to independently tune mean protein expression levels and noise. Light induces dimerization and subsequent activation of the transcriptional activator GAVPO. By introducing 5xUAS (a target sequence for dimerized GAVPO) upstream of an mRuby reporter gene, the effect of light on mRuby mean and noise can be measured.

      By pulsing light at different periods (from 100-400 minutes), the authors reduce the mRuby noise for intermediate average light intensities. Notably, the pulses are all applied at an absolute light intensity of 100 uW/cm2, with the average light intensity being modulated through the light-off time-periods. Therefore, as all periods tend towards 100 uW/cm2 average light intensity, the PWM duty cycle becomes more similar to the 100 uW/cm2 AM case.
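      The relationship between pulse timing and average intensity described above can be sketched as follows. Only the 100 uW/cm2 pulse intensity and the 100-400 minute period range come from the text; the duty-cycle values and the function name `pwm_mean_intensity` are hypothetical choices for illustration.

```python
def pwm_mean_intensity(peak: float, on_minutes: float, period_minutes: float) -> float:
    """Time-averaged intensity of a square pulse train: peak * (on time / period)."""
    if not 0 <= on_minutes <= period_minutes:
        raise ValueError("on_minutes must lie between 0 and period_minutes")
    return peak * on_minutes / period_minutes


PEAK = 100.0  # uW/cm2, pulse intensity stated in the review

for period in (100, 400):           # minutes, the range stated in the review
    for duty in (0.25, 0.5, 1.0):   # hypothetical duty cycles
        mean = pwm_mean_intensity(PEAK, duty * period, period)
        print(f"period={period} min, duty={duty:.2f}: mean={mean:.1f} uW/cm2")
```

      At a duty cycle of 1 the light is never switched off, so PWM and constant (AM) illumination at 100 uW/cm2 coincide regardless of the period.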

      Strengths:

      The proposed method is an elegant way to independently tune protein mean and noise. This would have a broad application in the field and is much needed to be able to study the consequence of protein expression noise, independently of mean. In addition, the authors use multiple powerful single-cell techniques to try and determine the mechanism underpinning the light-induced noise modulation.

      During constant exposure to light, increased light intensity increases the mean expression of mRuby, while decreasing the noise. This high noise is mostly due to observed bimodality in mRuby expression. Through ODEs and by using small molecule inhibitors, the authors show that this bimodality is caused by some cells being stably off, while other cells enter an on state. In this on state a positive feedback can occur where initial binding of dimerized GAVPO induces histone acetylation and chromatin accessibility, and thus stimulates further GAVPO binding. Bistability induced by constant light exposure is disrupted using small molecule inhibitors of CBP/p300 HAT activity, indicating that histone regulation is a cause for this observed bistability. The stable on state is demonstrated to be more active and accessible through ChIP-seq and ATAC-seq respectively.

      Weakness:

      The single-cell ATAC-seq data indicate that pulsing light induces switching from an accessible (light on) to inaccessible (light off) chromatin state. The authors argue that the switching back into a chromatin inaccessible state prevents the positive feedback from occurring and thus reduces noise. However, there are weaknesses in the description of the mechanism by which the pulses modulate (i.e., reduce) noise. Overall, since these sections in the manuscript are not easy to understand, it is difficult to parse which mechanism the authors attribute to the observed noise reduction and to assess whether the data support the conclusions.

      The data from the single-mRNA live-cell imaging experiments are somewhat ambiguous and do not necessarily support some of the arguments. The conclusion that transcription, nuclear export, and mRNA degradation flatten the pulsatile chromatin caused by the PWM is not clear from the data, especially since most cells do not show any pulsatile behavior in either the single-cell ATAC-seq or the live-cell imaging data.

    5. Evaluation Summary:

      This paper will be of interest to biologists who study mechanisms of cell-to-cell variability in gene expression and those who wish to have a tool to alter variability in mammalian cells. Key regulators of gene expression variability in mammalian cells are identified and noise modulation in a synthetic system is shown. The data quality is high. A model for the origin of the observed noise is proposed, but will require some additional experimental evidence.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      The manuscript titled "The Shu complex prevents mutagenesis and cytotoxicity of single-strand specific alkylation lesions" investigates the biological function of the Shu complex in S. cerevisiae. The Shu complex, containing a DNA binding module comprised of the Csm2-Psy3 heterodimer, is conserved from budding yeast to man, and contributes to the defense against DNA damage caused by DNA alkylation. DNA alkylation occurs due to spontaneous reactions with metabolites and can be greatly increased by exogenous exposure to DNA alkylating agents. Therefore, it is an important question for how the Shu complex acts to detect and direct repair of alkylation damage. It has been well established that loss of the Shu complex sensitizes cells to alkylation damage, but the mechanism by which this complex locates sites of DNA damage and directs repair is not fully understood. This paper measures the methylation-induced mutation spectrum and uses genetic interactions to argue that the Shu complex may be involved in detecting and directing error-free repair of 3-methyl cytosine. This is a plausible hypothesis based on the body of previous work, however the evidence that Csm2-Psy3 directly detects 3-methyl cytosine sites is indirect. It would be highly significant if this complex recognizes many different structures, but future structural information is needed to understand how this could be possible.

      The strengths of the paper are in the use of whole genome sequencing to map mutation type and location in different genetic backgrounds and in the systematic testing for genetic interactions between csm2 and other DNA repair factors. It appears that the mutation spectra are very similar in the presence and absence of csm2, which suggests a broad role of the Shu complex in the cellular response to MMS.

      The impact of the work is that it could help to explain the cellular program for protection against DNA alkylating agents in budding yeast which has been a very valuable model eukaryotic organism, and raise new questions about how DNA alkylation repair pathways might function in humans that differ from yeast in important features such as in the presence of a direct repair pathway performed by ALKBH2 and ALKBH3.

    2. Reviewer #2 (Public Review):

      The manuscript entitled "The Shu Complex Prevents Mutagenesis and Cytotoxicity of Single-Strand Specific Alkylation Lesions" by Bonilla and colleagues reports that the yeast Shu complex promotes repair of 3meC in single-stranded DNA during S phase. Specifically, the authors show that mutations and cell lethality induced by MMS in csm2∆ cells are suppressed by overexpression of the human ALKBH2. Further, the authors find that the Csm2-Psy3 module of the Shu complex has increased affinity for 3meC-containing DNA relative to unmodified DNA. The authors propose a model, where the Shu complex binds to 3meC-containing DNA to facilitate HR-dependent post-replicative gap-filling.

    3. Reviewer #1 (Public Review):

      This study shows that the Shu complex is critical for 3meC damage tolerance in yeast, supporting the existence of a new pathway for the removal of an important DNA lesion that seems essential in yeast but likely contributes in other organisms. At the same time, it helps clarify the distinctive role of homologous recombination in double strand break repair and post-replicative repair.

    4. Evaluation Summary:

      This paper is of potential interest to an audience of DNA repair and cancer biologists because it seeks to refine the mechanism by which cells respond to DNA damage. By combining a number of genetic experiments based on cell survival of different mutant combinations and mutation analysis, their results support the view that Shu is critical for 3meC damage tolerance in yeast. Notably, expression of human ALKBH2, responsible for the repair of 3meC rescues the MMS-sensitivity of Shu mutants but not that of homologous recombination mutants. The study supports the existence of a new pathway for the removal of an important DNA lesion that seems essential in yeast, but likely contributes in other organisms, and helps clarify the distinctive role of homologous recombination in DSB repair and post-replicative repair. A few additional experiments are suggested to strengthen the mechanistic conclusions and better support the central model.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      The authors tackle an interesting question - whether the dentate gyrus is a locus of pathology in Scn1a+/- mice and uncover a strong phenotype - the granule cells of the dentate gyrus are over-activated and the EC to dentate pathway is prone to seizure genesis. In the discussion, they suggest that their results support the idea that the DG may be a common locus to several different types of epilepsy... an attractive hypothesis! There are several strengths of the paper. The team has done a nice job of presenting 'ground-truth' data that their measurements of dF/F across a large population of granule cells correlates with action potentials in these cells. As the authors point out, this is especially important when working in disease models in which the dF/F-action potential relationship may be altered. Throughout, the authors were also careful about considering the limitations of their various techniques and analyze the data in several ways to account for possible artifacts (e.g. ensuring that differences in activation are not arising because of slicing and consideration of kindling in later in vivo seizure threshold experiments). The experiments were well designed and appropriately interpreted.

      One of the most intriguing results of the work is that PV interneurons in the DG of Scn1a+/- show only very minor impairments in young adult animals (they show more spike accommodation than in control animals). Rather, it seems that the GCs receive enhanced excitation from the entorhinal cortex. They perform a set of pharmacological experiments to prove that PV interneurons (and more generally inhibition) do not account for the difference in granule cell activation - however, here it would be useful to see the data summarized more consistently. It is difficult to interpret the pharmacological results (both of which are presented as changes in dF/F0) with respect to the initial findings of the manuscript (presented as estimated activation across the entire population). A beautiful aspect of this work is that it goes from cells to circuits to intact brain (in vivo). They nicely show that the heightened excitation from the EC to the DG is sufficient to drive seizures in the Scn1a+/- mice, and finally that since PVs are intact, they can be harnessed to balance out the over-activation of GCs via optogenetic stimulation of PVs.

    2. Reviewer #2 (Public Review):

      Mattis et al have used a hemizygous mutant of the gene Scn1a to study changes underlying the severe epilepsy disorder Dravet syndrome. They describe a change in activation of the dentate gyrus in this mouse model, due to altered excitatory synaptic input. They show that this occurs in the age range after normalization of early inhibitory interneuron dysfunction. This provides an interesting potential mechanism by which neural circuit function is altered even after deficits in inhibition are seemingly corrected. They also report that stimulation of inputs to the dentate gyrus increases seizure susceptibility when body temperature is elevated. Overall these findings indicate a new form of circuit dysfunction that may underlie the etiology of this severe genetic epilepsy disorder.

      These findings are not fully complete, and the manuscript suffers from some flaws in experimental design.

      The most pressing issue is the lack of a counter-balanced design in experiments testing the ictogenicity of DG stimulation. The authors attempt to justify this by stating "there is a theoretical concern that seizure threshold on Day 2 (the second consecutive day of stimulation) could be lowered by a seizure 24 hours prior (a "kindling"-like phenomenon)". In the very next sentence, they cite a study in which this phenomenon has been shown (thus the concern is not theoretical). That said, this is not a semantic argument, but a flaw in experimental design. On day 1, the authors perform experiment A. On day 2, they perform experiment A+B. In an attempt to show that performing experiment A on day 1 does not by itself lead to changes in experiment A+B, they use a separate cohort and show that experiment A does not lead to changes in a repetition of experiment A. Unfortunately, this is not an adequate control. Experiment A+B involves a different set of stimuli, to which the response could very well be altered by the day 1 experiment, but this change would not be revealed with the described experimental design. Determining whether the effect shown in experiment A+B is genuine requires a more rigorous, counter-balanced experimental design in which one group undergoes experiment A followed by experiment A+B, and a second group undergoes experiment A+B followed by experiment A.

      The second major issue is a lack of wild type control groups for several experiments. The experiments presented in Figures 4, 6C and F, and 7 all lack the necessary wild type control measures. Wild type controls were done for Figure 6E, but the data are not presented in the figure.

      Some of the cell physiology experiments presented were not optimally designed to provide a relevant mechanistic follow-up to the major findings. For the first major finding of the paper, Figure 2 shows clear and interesting changes in DG activation in the mouse model, and Figure 5 reveals changes to synaptic excitation and inhibition in these neurons. Figure 3 and 4 present data showing changes to PV-interneuron intrinsic properties that only reveal themselves under very intense stimulation. While these findings are interesting and worthy of follow-up, the changes aren't relevant to the synaptic stimulation used in Figure 2.

      Finally, Figure 2 has missing data points, seemingly due to cropping of panels. Data visualization is problematic for this vital figure. The fit lines for individual experiments overwhelm the color-filled variance of the mean. Thus, the data in this figure are very difficult to read and interpret. The figure would benefit from including all the individual data points and summary data, but removing the individual fits or putting them into a supplement.

    3. Reviewer #1 (Public Review):

      Dravet syndrome is a developmental and epileptic encephalopathy resulting from mutations in a sodium channel subunit that is widely thought to cause disease by affecting synaptic inhibition. Here the authors use a well-established mouse model to show that circuit dysfunction results from excess synaptic excitation in the dentate gyrus, potentially providing new insight into the pathological mechanisms underlying seizure activity.

      Strengths of the study include the sophisticated approach of 2P Ca2+ imaging of population activity and whole-cell recording in slices that provide well-supported evidence that circuit dysfunction is independent of GABAergic inhibition. Weaknesses include some oversimplification of the results in the data interpretation such that not all the claims are fully supported and lack of in-depth analysis of the circuit dysfunction with a clear presentation of its developmental time course.

    4. Evaluation Summary:

      Dravet syndrome, a severe seizure disorder resulting from a sodium channel mutation, is widely thought to result from impaired synaptic inhibition. Here the authors present multi-level evidence that excess synaptic excitation in the dentate gyrus is a locus of pathology. These results provide new insight into pathological mechanisms in Dravet syndrome that will be of interest to a broad range of neuroscientists studying epilepsy, as well as the role of the hippocampus and synaptic alterations in neurological disease.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #2 (Public Review):

      In the manuscript entitled "The Crystal Structure of Bromide-bound GtACR1 Reveals a Pre-activated State in the Transmembrane Anion Tunnel", Li et al. analyzed the effect of bromide binding to GtACR1 by X-ray crystallography and electrophysiology. The authors propose that a bromide ion is bound to the intracellular pocket in the dark, inactivated state and induces a structural transition from an inactivated to a pre-activated state.

      I agree that some of the amino acid residues in the current crystal structure change their conformations compared to the previous one reported in 2019 (Li et al., 2019), and it is very impressive that the authors determined the structure using a state-of-the-art crystallography technique, IMISX. However, unfortunately, most of the conclusions and claims described in the manuscript are not well supported by the authors' data.

      1) The most serious problem is that the evidence of bromide binding is too weak. The authors showed the composite omit map in Supplementary Figure 1A, but they should present an anomalous difference Fourier map to validate the bromide binding. The authors also claim that they replaced the bromide ion with a water molecule, ran the PHENIX refinement, and observed a strong positive electron density at the bromide position in the Fo-Fc difference map (Supplementary Figure 1B). However, when I do the same thing using the provided coordinate and map (I really appreciate the honesty and transparency of the authors), I could not reproduce their result; a weak positive electron density is observed between the bromide position and Pro58 in chain A and there is no positive peak at the position in chain B (Fo-Fc, contoured at 3σ). I would like to know the occupancy and B-factor of the water molecule they show in Supplementary Figure 1B.

      In addition to the insufficient evidence, the current models of bromide ions have significant steric clashes. The PDB validation report shows that the top 5 serious steric clashes observed in the coordinate are the contacts between the bromide ions and surrounding residues (PDB validation report, Page 10). I analyzed them and found that the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are only 2.43Å and 2.36Å, respectively. The authors claim that such a close proline-halide interaction has also been observed in the structure of the chloride-pump rhodopsin ClR, but in the structure (PDB ID: 5G28), the distances between the chloride ion and CD and CG atoms of Pro45 are much larger (3.43 and 3.91Å, respectively) and there is no steric clash. Moreover, the authors claim that Pro58 changes its conformation by bromide binding, but it is very possible that the PHENIX program just displaces Pro58 to alleviate the steric clash between the proline and the bromide ion, so the authors should carefully check the possibility.
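      The distance comparison above can be made concrete with a rough sketch. It assumes Bondi van der Waals radii (C 1.70 Å, Cl 1.75 Å, Br 1.85 Å) and flags a clash when the radii overlap by at least 0.4 Å, a threshold commonly used in all-atom contact analysis; both are assumptions for illustration and not necessarily the exact criteria behind the PDB validation report. The function names are hypothetical, and only the four distances are taken from the review.

```python
# Bondi van der Waals radii in angstroms (assumed for this illustration)
VDW_RADIUS = {"C": 1.70, "Cl": 1.75, "Br": 1.85}
CLASH_OVERLAP = 0.4  # angstroms; assumed clash threshold


def vdw_overlap(elem_a: str, elem_b: str, distance: float) -> float:
    """Overlap of the two van der Waals spheres; positive means interpenetration."""
    return VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b] - distance


def is_clash(elem_a: str, elem_b: str, distance: float) -> bool:
    """True when the spheres overlap by at least the clash threshold."""
    return vdw_overlap(elem_a, elem_b, distance) >= CLASH_OVERLAP


# (halide, partner, distance in angstroms, label); distances quoted in the review
contacts = [
    ("Br", "C", 2.43, "GtACR1 chain A: Br to Pro58 CG"),
    ("Br", "C", 2.36, "GtACR1 chain A: Br to Pro58 CD"),
    ("Cl", "C", 3.43, "ClR (5G28): Cl to Pro45 CD"),
    ("Cl", "C", 3.91, "ClR (5G28): Cl to Pro45 CG"),
]

for halide, partner, dist, label in contacts:
    overlap = vdw_overlap(halide, partner, dist)
    print(f"{label}: overlap {overlap:+.2f} A, clash={is_clash(halide, partner, dist)}")
```

      Under these assumptions both bromide contacts overlap by more than 1 Å, whereas neither chloride contact in ClR reaches the threshold, in line with the reviewer's reading.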

      Overall, the authors should analyze the density again, provide more solid evidence for the bromide binding such as anomalous difference Fourier map, and if they could, they should correct the current significant steric clashes in their models.

      2) To analyze the functional importance of putative bromide binding, the authors prepared W246E and W250E mutants and analyzed their electrophysiological properties. Because tryptophan and glutamate are so different in terms of volume and charge, they should analyze other mutants as well. The authors claim that bromide is stabilized by a hydrogen bond interaction formed by the indole NH group of W246, so they should at least test the W246F mutant.

      3) The authors claim that the bromide binding in the intracellular pocket induces the conformational change of R94, but the causal relationship is doubtful. As mentioned in the manuscript, R94 forms a salt-bridge with D234 in chain A. However, the arginine has a completely different conformation and does not have any interaction with D234 in chain B. If the bromide binds both in chain A and B and induces the conformational change of R94, why does only R94 in chain A interact with D234? The authors changed the pH in the crystallization condition compared to their 2019 study (Li et al., 2019), so the pH may affect the protonation state of D223 and/or other titratable residues and induce the conformational change of R94. The authors should provide more solid evidence for the causal relationship between the bromide binding and the conformational change of R94.

      4) The authors assume that the conformational change of R94 creates a functional anion binding site with the Schiff base in GtACR1, but it is too speculative. If the anomalous difference Fourier map does not support the idea, they should delete it.

    2. Reviewer #1 (Public Review):

      The dark structure of GtACR1 has been almost simultaneously published at the end of 2018 and beginning of 2019 by the Deisseroth and Spudich groups, respectively. Both groups did not manage to solve a structure with an ion bound and there is very limited information on the open conformation of the channel. Both groups identified a central constriction site as being central for the gating mechanism but the Spudich group proposes two additional constrictions (C1 and C3). In this work Li et al are able to solve the structure of a GtACR1 with a bromide bound near C3, which clearly represents a significant step towards understanding the mechanism of light gated anion channels. The structure reveals that Br binds to the intracellular constriction site (C3) resulting in a small opening of C3. The data support the notion that the partial electropositivity of Pro58 together with two tryptophans play a critical role in anion interaction at C3, which was also confirmed by mutagenesis studies. In addition, there was a noteworthy conformational change in the Bromide bound protein in the extracellular constriction (C1), a 180 degree flip of Arg 94 resulting in a salt bridge to Asp 234 and a slight opening of the C1 constriction.

      While the data and conclusions are sound, the lack of discussion of their data in the context of the work of others is a bit surprising.

    3. Evaluation Summary:

      This manuscript reports a significant contribution towards an improved mechanistic understanding of light gated anion channels. The studies, which use the recently established method of in meso in situ serial data collection (IMISX), provide a basis for optimizing the anion channelrhodopsin GtACR1 from the alga Guillardia theta as a neuron-inhibiting optogenetics tool. The work will be of interest to anyone using optogenetics for functional studies. The reviewers had a few comments regarding technical aspects of the work.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    4. Author Response:

      Reviewer #1 (Public Review):

      The dark structure of GtACR1 has been almost simultaneously published at the end of 2018 and beginning of 2019 by the Deisseroth and Spudich groups, respectively. Both groups did not manage to solve a structure with an ion bound and there is very limited information on the open conformation of the channel. Both groups identified a central constriction site as being central for the gating mechanism but the Spudich group proposes two additional constrictions (C1 and C3). In this work Li et al are able to solve the structure of a GtACR1 with a bromide bound near C3, which clearly represents a significant step towards understanding the mechanism of light gated anion channels. The structure reveals that Br binds to the intracellular constriction site (C3) resulting in a small opening of C3. The data support the notion that the partial electropositivity of Pro58 together with two tryptophans play a critical role in anion interaction at C3, which was also confirmed by mutagenesis studies. In addition, there was a noteworthy conformational change in the Bromide bound protein in the extracellular constriction (C1), a 180 degree flip of Arg 94 resulting in a salt bridge to Asp 234 and a slight opening of the C1 constriction.

      While the data and conclusions are sound, the lack of discussion of their data in the context of the work of others is a bit surprising.

      We thank the reviewer for thorough reading of our submission and constructive criticism, which helped us to improve the quality of our manuscript. As requested, we added the following paragraph at the end of the Results section (lines 219-233):

      “Studies in 3 different laboratories have concluded that Asp234 is neutral in the dark state from measurements of the D234N mutant of GtACR1 by UV-vis absorption spectroscopy (Kim et al., 2018; Sineshchekov et al., 2016), Resonance Raman spectroscopy (Yi et al., 2016), and FTIR (Kim et al., 2018). Both studies of independently determined crystal structures of GtACR1 attribute the major component of its neutralization to hydrogen-bonding to Tyr207 and Tyr72 (Kim et al., 2018, Li et al., 2019), leaving open partial electronegativity of Asp234 participating in hydrogen-bonding to the protonated Schiff base (PSB). The Asp234 residue is expected to be functionally important given its proximity to the PSB and its nearly universal conservation in microbial rhodopsins. Kim et al (Kim et al., 2018) conducted an extensive analysis of Asp234 and report that the D234N mutation nearly abolished photocurrents. Reduced photocurrents to 20% of wild-type from the D234N mutation were also observed by Sineshchekov et al. (Sineshchekov et al., 2015). Differences in extent of photocurrent reduction are likely attributable to different assay conditions used in these studies. The electrostatic interaction of Arg94 with Asp234 in the pre-activated state may be correlated with the change in the electron conjugation of the retinylidene polyene chain in the dark that we observed by FTIR.”

      Reviewer #2 (Public Review):

      In the manuscript entitled "The Crystal Structure of Bromide-bound GtACR1 Reveals a Pre-activated State in the Transmembrane Anion Tunnel", Li et al. analyzed the effect of bromide binding to GtACR1 by X-ray crystallography and electrophysiology. The authors propose that a bromide ion is bound to the intracellular pocket in the dark, inactivated state and induces a structural transition from an inactivated to a pre-activated state.

      I agree that some of the amino acid residues in the current crystal structure change their conformations compared to the previous one reported in 2019 (Li et al., 2019), and it is very impressive that the authors determined the structure using a state-of-the-art crystallography technique, IMISX. However, unfortunately, most of the conclusions and claims described in the manuscript are not well supported by the authors' data.

      1) The most serious problem is that the evidence of bromide binding is too weak. The authors showed the composite omit map in Supplementary Figure 1A, but they should present an anomalous difference Fourier map to validate the bromide binding. The authors also claim that they replaced the bromide ion with a water molecule, ran the PHENIX refinement, and observed a strong positive electron density at the bromide position in the Fo-Fc difference map (Supplementary Figure 1B). However, when I do the same thing using the provided coordinate and map (I really appreciate the honesty and transparency of the authors), I could not reproduce their result; a weak positive electron density is observed between the bromide position and Pro58 in chain A and there is no positive peak at the position in chain B (Fo-Fc, contoured at 3σ). I would like to know the occupancy and B-factor of the water molecule they show in Supplementary Figure 1B.

      We appreciate the reviewer’s effort in analysis of our structure. As described in the Discussion section (lines 238-248), the identification of bromide is supported by multiple lines of evidence: (1) the composite omit map indicates the presence of bromide at the cytoplasmic port (Suppl. Fig. 1A-1B); (2) we exclude the possibility of a water at the bromide position as demonstrated in the Fo-Fc difference map (Suppl. Fig. 1C-1D); (3) the bromide binding site exhibits a chemical conformation similar to that seen in chloride-binding structures (Auffinger et al., 2004); (4) functional analyses of W250F and W246F are consistent with the H-bond interaction in the bromide binding site (Fig. 2B); (5) specific interaction of GtACR1 with bromide in the dark state was further demonstrated by FTIR analysis (Fig. 3). Differences in major bands that reflect the ethylenic (C=C) stretch mode of the retinylidene chromophore show a large bromide-induced alteration in the electron conjugation of the retinylidene polyene chain in the dark, confirming that bromide causes a significant structural change. In sum, these data confirm the bromide binding conformation in the structure.

      We agree with the reviewer that the signal of bromide in chain A is stronger than in chain B. We now address the difference throughout the main text and Suppl. Fig. 1. The datasets were collected at 0.91882 Å wavelength, but we did not detect any strong bromide signals in the anomalous difference Fourier map. This may be due to preferential orientation of the thin-plate GtACR1 crystals in the IMISX plate. The weak Br signals may also be attributed to the weak bromide binding conformation, its partial occupancy, and poor intrinsic order. It is not unusual that anomalous signals are influenced by the location of the scatterer. For example, in our previous structural determination of YfkE (Wu, PNAS 2013), seleno-methionine was used to label 12 native Met residues. However, we could identify only 10 Se positions and the other 2 Se were undetectable in the anomalous difference map, despite the dataset collection at the Se absorption peak wavelength. Therefore, the lack of strong anomalous signals does not exclude the presence of bromide in the structure.

      Regarding the reviewer’s question, the occupancy of the water is 1 and its B-factor is 71.

      In addition to the insufficient evidence, the current models of bromide ions have significant steric clashes. The PDB validation report shows that the top 5 serious steric clashes observed in the coordinates are the contacts between the bromide ions and surrounding residues (PDB validation report, Page 10). I analyzed them and found that the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are only 2.43 Å and 2.36 Å, respectively. The authors claim that such a close proline-halide interaction has also been observed in the structure of the chloride-pump rhodopsin ClR, but in that structure (PDB ID: 5G28), the distances between the chloride ion and the CD and CG atoms of Pro45 are much larger (3.43 and 3.91 Å, respectively) and there is no steric clash. Moreover, the authors claim that Pro58 changes its conformation upon bromide binding, but it is very possible that the PHENIX program just displaces Pro58 to alleviate the steric clash between the proline and the bromide ion, so the authors should carefully check this possibility.

      Overall, the authors should analyze the density again, provide more solid evidence for the bromide binding such as anomalous difference Fourier map, and if they could, they should correct the current significant steric clashes in their models.

      We thank the reviewer for pointing out the steric clashes. We have corrected them in the revised structure, as demonstrated in the latest validation report. As described in the Results section (lines 107-109), the distances between the bromide ion and the CG and CD atoms of Pro58 in chain A are now 3.6 Å and 3.1 Å, respectively (see the updated structure pdb), and the distances between the bromide ion and the CG and CD atoms of Pro58 in chain B are 4.0 Å and 3.2 Å, respectively, similar to the distances between the chloride ion and the CD and CG atoms of Pro45 in ClR (3.43 and 3.91 Å, respectively). These modifications do not alter the structure beyond the local binding site of the bromide, and do not change our conclusions.

      We do not agree that the Br⁻-induced conformational changes are due to the refinement program. To further confirm the Pro58 position, we performed a refinement in PHENIX with Pro58 and adjacent residues removed. The resulting electron density map shows positive electron density at the Pro58 position, confirming the conformational changes induced by bromide binding.

      2) To analyze the functional importance of putative bromide binding, the authors prepared W246E and W250E mutants and analyzed their electrophysiological properties. Because tryptophan and glutamate are so different in terms of volume and charge, they should analyze other mutants as well. The authors claim that bromide is stabilized by a hydrogen bond interaction formed by the indole NH group of W246, so they should at least test the W246F mutant.

      We thank the reviewer for this important suggestion, which helps confirm the bromide binding conformation. The glutamate substitutions were chosen to assess the specific anion selectivity and conductivity of GtACR1 due to the negative charge of its side chain. We now include the data of W246F and W250F in Fig 2B. W250F shows reduction of the current amplitude by 50%, whereas W246F behaves like WT. These results are consistent with the structural observations in which W250, but not W246, stabilizes bromide via H-bond interaction. These results are provided in the Results section (lines 136-142) and in the revised Fig. 2B.

      3) The authors claim that the bromide binding in the intracellular pocket induces the conformational change of R94, but the causal relationship is doubtful. As mentioned in the manuscript, R94 forms a salt-bridge with D234 in chain A. However, the arginine has a completely different conformation and does not have any interaction with D234 in chain B. If the bromide binds in both chain A and chain B and induces the conformational change of R94, why does only R94 in chain A interact with D234? The authors change the pH in the crystallization condition compared to their 2019 study (Li et al., 2019), so the pH may affect the protonation state of D223 and/or other titratable residues and induce the conformational change of R94. The authors should provide more solid evidence for the causal relationship between the bromide binding and the conformational change of R94.

      We did not change the pH in the crystallization condition compared to our previous crystallization of GtACR1. Both structures were obtained at pH 5.5 as noted in the manuscript. In our structure, the only bromide binding site was identified near C3 and no bromide was found at C1. We address this result in Discussion (lines 276-286) as follows:

      “The conformational change of Arg94 near C1 is not likely to be directly induced allosterically by bromide binding at distant C3 since it is only observed in chain A, not in chain B. Instead, this conformational change may reflect the intrinsic flexibility property of Arg94 in the tunnel in the bromide-bound state. Although both Arg94 of GtACR1 (in chain A) and Arg95 of ClR adopt a similar conformation (Fig. 4B), these two counterpart residues appear to be stabilized by distinct H-bond networks. In GtACR1, inward Arg94 only forms a salt-bridge with Asp234 and an H-bond with a water molecule (Suppl. Fig. 2A). However, in the ClR structure, in addition to the salt bridge, R95 is further stabilized by three polar residues, Asn92, Gln224, and Thr228, via two water molecules from the extracellular side of the protein (Suppl. Fig. 2B). The absence of these polar residues and waters in the vicinity may liberalize Arg94 and facilitate its flip-flopping in the tunnel of GtACR1.”

      4) The authors assume that the conformational change of R94 creates a functional anion binding site with the Schiff base in GtACR1, but it is too speculative. If the anomalous difference Fourier map does not support the idea, they should delete it.

      Our hypothesis (not an assumption) is based on the following facts: (1) both rhodopsin proteins, GtACR1 and ClR, transport the same halide substrates; (2) chain A of GtACR1 adopts a nearly identical chemical conformation to that in the chloride-binding site (site 1) of ClR, in which the counterpart residue R95 forms a chloride binding site with the Schiff base (Fig. 4B); and (3) Arg94 is important to the anion conductivity of GtACR1 (Li et al. eLife 2019). It is reasonable to hypothesize that Arg94 forms a putative anion binding site with the Schiff base in GtACR1. To make this hypothesis clear, we listed these facts in the text and rephrased our hypothesis as follows (lines 217-219): “Based on the similar chemical conformations (Fig. 4B), it is possible that Arg94 rotates its side chain to form an anion binding site with the Schiff base in GtACR1.”

    1. Author Response:

      Reviewer #1 (Public Review):

      The study by Hendley et al. takes advantage of duct-specific DBA-lectin expression to purify pancreatic ductal populations that were then subjected to scRNA-seq analysis. The ability to enrich for this relatively low-abundance pancreatic cell population resulted in a more robust dataset than had been generated previously from whole pancreas analyses. The manuscript catalogs several different gene clusters that delineate heterogeneous subpopulations within three pancreatic ductal populations in mice: pancreatic ductal cells, pancreatobiliary cells, and intrapancreatic bile duct cells. The additional comparison of the resulting datasets with published embryonic and adult datasets is a strength of the study; it allows the authors to subclassify the different ductal cell populations and facilitates the identification of potentially novel subpopulations. Pseudotime analysis also identified gene programs that led the authors to speculate on the existence of an EMT axis in pancreatic ducts. Overall, the data analysis is strong, but the authors tend to draw conclusions that are not fully supported by the presented data.

      The second half of this study focuses on three candidate proteins that were identified in the transcriptome analysis: Anxa3, SPP1 and Geminin. CRISPR-Cas9 was used to delete each gene in an immortalized human duct cell line (HPDE). Deletion of each gene resulted in increased proliferation; SPP1 mutant cells also displayed abnormal morphology. Additional functional studies of the cell lines or in mouse models suggested a role for SPP1 in maintaining the ductal phenotype and for Geminin in protecting ductal cells from DNA damage. Although the provided phenotypic analysis suggests important functional roles for these proteins, follow-up studies will be required to fully understand the role of these genes in homeostatic or cancer conditions.

      Strengths:

      1) Enrichment of pancreatic ductal populations enhanced the robustness of the scRNA-Seq dataset

      2) Quality of the sequencing data and extensive computational analysis is extremely good and more comprehensive than previously published datasets

      3) Comparative analysis with existing mouse and human data sets

      4) Use of human ductal cell lines and mouse models to begin to explore the function of candidate ductal genes.

      Weaknesses:

      1) There are many suppositions based on gene expression changes that are somewhat overstated.

      2) The conclusion that there is an EMT axis in pancreatic ducts is not fully supported by the gene expression and immunofluorescence data

      3) A good rationale for choosing Anxa3, SPP1 and Geminin for additional functional analysis is not provided. In addition, it isn't clear why Anxa3 function isn't pursued further.

      4) Although extensive models (transplanted cells for SPP1 and mouse conditional KOs for Geminin) were generated, the functional analysis for each gene is preliminary; additional longer term studies will be necessary to fully understand the role of these proteins in pancreatic duct development and cancer.

      We would like to thank the Reviewer for their fair and thoughtful review of our manuscript. We agree with the comments and have addressed them as described in detail below. In particular, we have focused on streamlining the presentation and description of our bioinformatic analysis, providing additional rationale for using the particular genes we focused on in the follow-up analyses, and including additional data to support the EMT axis.

      Reviewer #2 (Public Review):

      In this study the authors address the heterogeneity of mouse ductal cells at the single cell level and conduct functional studies for selected marker genes. They isolated duct cells using the DBA lectin as a molecular surface marker. This is a noteworthy approach as it does not rely on the specificity and expression levels of reporter lines. Isolated cells contained a majority of non-duct cells that were identified by their transcriptomic profile and excluded from further analysis. The transcriptomic profiles of bona fide duct cells were then subjected to standard analyses for differentially expressed genes, activated pathways and lineage relationships. Of particular interest is the comparison of these data with human data from a recently published study that used a different sorting strategy for duct cells. As more studies at the single cell level are conducted, these types of comparisons need to become part of them in order to derive commonalities and identify deficits due to methodological or technological limitations. The study was by necessity descriptive up to this point and the authors addressed this with functional studies on SPP1 and GMNN which suggested that SPP1 is necessary for the maintenance of the ductal differentiated phenotype whereas GMNN protects cells against DNA damage during increased proliferation triggered by chronic pancreatitis.

      It is an interesting study, but there are caveats, particularly concerning the functional studies. The functional analysis of SPP1 needs to be strengthened and some findings of the GMNN analysis need to be clarified. There is also an overreliance on the outcome of pathway analyses and upstream regulators, which are often treated as actual findings rather than possibilities to be explored in this or future studies. The single cell RNA-seq analysis would benefit from reducing speculation and restricting descriptions to the essential features of each cluster. Main figures for this analysis could also be simplified along the same lines.

      We thank the reviewer for appreciating our study as “interesting” and for considering our investigations as a “noteworthy approach”. We are glad that the reviewer acknowledges our efforts in delivering a manuscript with the necessary descriptive bioinformatics analysis followed by functional studies for select subpopulation markers. We also took the constructive criticism seriously and added new data to further substantiate our claims.

      Reviewer #3 (Public Review):

      In this study, the authors present a high-resolution single-cell transcriptomic atlas of the pancreatic ductal tree. Using a DBA+ lectin sorting strategy, murine pancreatic duct, intrapancreatic bile duct, and pancreatobiliary cells were isolated and subjected to scRNA-seq. Computational analysis of the datasets unveiled important heterogeneity within the pancreatic ductal tree and identified unique cellular states. Furthermore, the authors compared these clusters to previously reported mouse and human pancreatic duct populations and focused on the functional properties of selected duct genes, including Spp1, Anxa3 and Geminin. Overall, the results presented here suggest distinct functional roles for subpopulations of duct cells in the maintenance of duct cell identity and implicate them in chronic pancreatic inflammation. Finally, such detailed analysis of the pancreatic duct tree is also relevant in the context of cancer biology and might help elucidate the transition from pancreatitis to pancreatic cancer and/or different predispositions to cancer.

      The study is very well done, with careful controls and well-designed experiments.

      We thank the reviewer for appreciating our study as “very well done” as well as envisaging the potential relevance of our findings to cancer biology.

    1. Author Response:

      Reviewer #1 (Public Review):

      We thank Reviewer #1 for their valuable comments. We agree with the Reviewer that our current results are not sufficient to confirm the therapeutic effects. The statement related to therapy has been removed.

      The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. Endometrial cells from patients with and without fibrosis were subjected to expression profiling analysis, and circPTPN12 and miR-21-5p were found to be strongly differentially expressed in endometrial fibrosis, with circPTPN12 acting as an inhibitory factor for miR-21-5p. Through the use of various molecular approaches, the authors further show that miR-21-5p inhibition results in upregulation of ΔNp63α, a transcription factor that induces EMT. The role of circPTPN12 was also confirmed in vivo using a mouse model of mechanically induced endometrial fibrosis. The authors concluded that targeting the circPTPN12/miR-21-5p/∆Np63α pathway may be a therapeutic strategy for endometrial fibrosis.

      The authors clearly and convincingly show the involvement of circPTPN12/miR-21-5p/∆Np63α in EMT and its potential involvement in endometrial fibrosis. Whether or not this can be a therapeutic target is too preliminary to say at this point, first because the in vivo experiments confirm the link between circPTPN12/miR-21-5p/∆Np63α at the RNA level only (p63); it would be more convincing to see protein data as well.

      We did try to detect ΔNp63α protein in mice by immunochemistry and immunofluorescence, using three antibodies (CST, cat# 67825 and 39692; Abcam, ab124762). Unfortunately, we did not obtain positive results. However, ΔNp63α mRNA was significantly changed.

      The involvement of p63 in the process remains a little elusive in this paper.

      We have reported that ΔNp63α is ectopically expressed in endometrial epithelial cells in IUA patients (Cao et al., 2018), and showed that ΔNp63α promotes the expression of SNAI1 via the DUSP4/GSK3B pathway and induces EEC EMT and fibrosis (Zhao et al., 2020). We have added this description of ΔNp63α to the Discussion section (2nd paragraph).

      In addition, if the authors believe this pathway can be a real future target to treat endometrial fibrosis, they could better contextualise such a statement, specifically describe what kinds of therapeutic intervention they think of, like regression or prevention of fibrosis. These should be tested in vitro and in vivo.

      Our results showed that replenishing miR-21-5p can reverse EMT and alleviate endometrial fibrosis in vivo and in vitro. However, clinical use of miR-21-5p as a therapeutic intervention requires more research in other animal models such as rats, pigs, and non-human primates. Thus, we removed the therapeutic statements (page 1, Line 1-2; page 2, Line 37-40; page 4, Line 74-76; page 13, Line 273).

      More evidence of the involvement of circPTPN12/miR-21-5p/∆Np63α and the correlation between the three players using clinical material is also necessary.

      The involvement of ∆Np63α in endometrial fibrosis was demonstrated in our published paper, and those results are cited in this paper (Zhao et al., 2020). The correlation between circPTPN12 and miR-21-5p in clinical material is shown in Figure 2J. In vivo and ex vivo experiments confirmed that overexpression of circPTPN12 downregulates miR-21-5p and upregulates ∆Np63α (Figure 3H/Figure 4J/Figure 5B/Figure 5E). In addition, ex vivo experiments suggested that the decrease of ∆Np63α is secondary to the increase of miR-21-5p (Figure 4C-E).

    1. Joint Public Review:

      Strengths & Overall Comments:

      This behavioral study aims to provide an account of the spontaneous behavior of mice as they learn to explore a novel maze in search of a water reward. The authors analyze the trajectories of mice as they adapt to the labyrinth, with particular focus on decisions taken at nodes and T junctions. They describe extremely rapid learning of the route home and discontinuous exploratory learning, or 'light bulb' moments, as evidenced by instantaneous improvements in navigation performance. The authors capture most of the variance in their overall data with predictive Markov models that could account for many of the subsequent actions of the mouse as it moves from one node to the next. The study should be important to anyone who spends their time thinking about decision-making in mice. It highlights the importance of considering ethologically relevant tasks for understanding decision making in rodent species.

      In this submission, the authors introduce a new experimental paradigm for the study of decision making in naturalistic contexts, presenting an opportunity to observe these dynamics away from the standard two-alternative-forced-choice paradigm. The application of modern tracking and posture analysis to maze exploration by rodents generates rich and interesting data, and allows the authors to do their experiments with many animals, and with nearly no human interference or specific instructions. The design of the maze is clever, using an underlying tree-like structure (with the tree folded so it precisely and fully occupies a rectangular area), and relatively deep (6 branching points from main trunk to a leaf node). Mice explore this voluntarily, and water-restricted mice learn to find water rewards at a leaf of the maze. The authors thus study truly voluntary and highly interesting complex behavior, and in a high-throughput way. By studying the dynamics of a mouse in a maze, the authors perform a careful set of analyses, describing discontinuous learning dynamics and the effects of history on decision-making. These results should be of interest to a wide group of behavioral neuroscientists who are attempting to understand the neural basis of how animals make decisions in complicated natural environments.

      The data set released with this submission will be of broad use to the community, and we would not be surprised to see dozens of papers using it moving forward.

    2. Evaluation Summary:

      This study lays the groundwork for a new level of precision in understanding mouse navigation behaviour by studying complex decisions that approximate those made in the wild, but can nevertheless be analysed with mathematically precise tools. Several exciting observations are made about navigation strategy. The manuscript will therefore be of broad interest across behavioural neuroscience. However, in its current form, some questions remain about some of the major claims.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Reviewer #3 (Public Review):

      The authors have re-sequenced 310 quinoa accessions and carried out field phenotyping of the same set of accessions for two years in order to characterize genetic diversity and analyze the genetic basis of agronomically important traits.

      The main strength of the manuscript is that the authors have carefully characterized more than 300 quinoa accessions, achieving a sufficiently large population size for GWAS analysis with good statistical power. It is especially promising that the phenotypes all show high heritability. This indicates that the field phenotyping was of high quality and provides a good starting point for discovering relevant marker-trait associations. In addition, the authors provide convincing evidence for distinct population characteristics of highland and lowland quinoa, adding additional information compared to previous work (Maughan, 2012).

      The weak points are related to the genotype data and the conclusions drawn based on the GWAS analysis.

      1) An important issue is related to the relatively low depth of coverage (4-10x) that was used for re-sequencing. Across the accessions, there is a pronounced negative correlation between the mean sequencing depth and the heterozygosity level, indicating that heterozygotes are overcalled in individuals with low coverage. This also results in heterozygosity levels that are generally higher than expected for what is assumed to be mainly homozygous inbred lines.

      2) Another potential issue concerns SNPs called in repetitive regions. Among the significant GWAS SNPs identified, a very large proportion appears to be found in intergenic regions. While this does not rule out that some of them are genuinely important associations, it does suggest a potentially high level of noise in the GWAS results. In addition to the filtering already imposed, which includes a filter for mapping quality, the SNPs called in intergenic regions with unusually high coverage could be more closely examined to determine the extent of the issue. Masking repetitive genomic regions using RepeatMasker or similar programs could be useful.

      3) When the authors discuss their GWAS results, they frequently focus on cherry-picked candidate genes, although, in several cases, the top SNPs in the region in question are not found within these candidates. A broader focus on all genes within the LD blocks, while still mentioning the candidate genes, would be more informative.

      4) The manuscript includes statements that a particular genotype "results in" some phenotypic outcome, although no causal relationship has been demonstrated. In general, there is a tendency to draw overly strong conclusions based on the GWAS results.

      5) As this is primarily a resource paper, the authors should make the complete genotype and phenotype data as well as the layout of the field trials available. It would not be possible to reproduce the GWAS analysis based on the data included with the current version. They should also clarify how the quinoa accessions described will be made accessible to the community and provide all scripts used for data analysis through GitHub or a similar repository.

    2. Reviewer #2 (Public Review):

      A key genomic study on an emerging, nutritious, alternative grain crop.

      Deep genomic data on hundreds of land races/accessions.

      Population structure analysis could be enhanced.

      Agronomic growth and yield traits are correlated and environmentally sensitive.

      Genomic dissection via GWAS to multigenic loci with candidate genes add genomic prediction and selection.

      Inference on domestication.

    3. Reviewer #1 (Public Review):

      The paper details a whole genome re-sequencing of 310 accessions of quinoa. This provides a good glimpse of diversity in this orphan crop, plus the GWAS studies are able to help provide the foundations for identifying key genes in quinoa variation. This will certainly advance our knowledge of this increasingly important orphan crop.

      1) One issue that permeates the entire paper is that the analysis is fairly basic and the authors do not make full use of the data. The analysis of population diversity is restricted to PCA, ADMIXTURE and phylogenetic analysis. It would probably broaden the impact of the paper if they can do deeper analysis of quinoa diversity, maybe looking at demographic history, looking at selection of highland vs. lowland, etc.

      2) There is a focus on the rapid LD decay, which the authors attribute to the short breeding history and low selection. It seems like a stretch to draw this conclusion based solely on LD decay. As they point out, many other factors could account for this, and the authors should provide other lines of evidence to support this conclusion.

      3) The GWAS analysis is good and does provide a good foundation for quinoa genetics. The authors discuss possible candidate genes in these GWAS regions. For the thousand seed weight, the relatively small span of the GWAS peaks allows for localization of just a few genes in the GWAS region (CqPP2C5 and the CqRING). The GWAS region associated with flowering time is larger (1 Mb with 605 genes), but the authors focus on the GLX2-1 gene. This is again a stretch, as the large region precludes narrowing the candidate list unless there was a compelling mutation (for example a deletion or insertion of a major flowering time gene).

    4. Evaluation Summary:

      This is a comprehensive study of genomic and phenotypic diversity in the orphan crop quinoa. Based on whole genome resequencing of 310 accessions and field phenotyping of the same set of accessions for two years, the study identified the genetic basis of agronomically important traits. Based on this promising work, there will likely be scope for quick improvement of this orphan crop through breeding.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      Cole and co-authors report the development of a novel immunofluorescence technique, in which targets of interest are analysed over iterative cycles of staining, imaging and elution (stripping). This method allows for the multiplexed analysis of protein targets, well beyond the usual constraints of such techniques (limited by the availability of filters and non-overlapping wavelengths of fluorophores). The authors also present several applications of the technique, highlighting how the ability to record additional parameters (such as cell morphology) can be an advantage over more high-throughput methods such as spatially resolved transcriptomics.

      The technique has been carefully tested. Staining for the same markers after several rounds of stripping/reprobing shows high concordance, indicating that the iterative treatment and staining of the same tissue section is not altering the detection of protein markers.

      The authors tested staining with a total of 18 antibodies, and suggest that this number can be increased arbitrarily, as the number of iterations is not limited. Further, they suggest that this technique can be applied to virtually any tissue. It is quite possible that this technique can be readily applied to any other tissue, as the only constraint seems to be the robustness of the antibodies. The authors may include the suggestion that previous success of immunofluorescence on a particular tissue type could be a good indication for the success of the iterative staining.

      The proposed 4i method is quite interesting, has great potential and is likely to be of very wide interest.

    2. Reviewer #2 (Public Review):

      Methods to characterize cell types in intact tissue using large scale analysis of molecular expression profiles are now readily available, with the best example being in situ RNA sequencing (spatial transcriptomics). However, these methods depend on separate immunohistochemical investigations to define the precise cellular and subcellular distribution of the protein products. Cole et al use iterative indirect immunofluorescence imaging (4i, Gut et al Science 2018) to compare the immunoreactivity of an impressive 18 different molecules within the same brain sections containing the dentate gyrus from young and old mice. First, they demonstrate that the method can be applied not only to adult mouse brain tissue, but also to human embryonic stem cell derived organoids and mouse embryonic tissue, which is an advance on the original report (Gut et al 2018). This demonstration is particularly important as it shows the potential for applying 4i to different biological disciplines. The rest of the manuscript focuses on the mouse dentate gyrus (DG) at 2, 6 and 12 months of age in order to map the complex changes and associations in the tissue across age. Various combinations of the 18 molecules are used to define different cell types, and it is incredibly informative to be able to view so many molecules in exactly the same area; this will advance the field. This is the greatest strength of the manuscript. They find that neurogenic, radial glia-like stem cells (R cells) and proliferating cells are reduced in aged animals, as are immature (DCX+) cells, but claim that fluorescence intensity increases for the remaining R cells in 12-month-old mice. They report that the density of vasculature also decreased with age, as did the associated pericytes, but astrocytes associated with the blood vessels increased.

      The last part of the manuscript defines 'microniches' (random or targeted regions of interest within the DG) and attempts to show how cell types, especially Nestin+ R cells, change in their associations with vasculature within these sub-regions at 2, 6 and 12 months of age. It is a commendable approach and the authors use a variety of statistical tests to compare the different cell types. However, several parts of the methods are unclear and the results are reported in insufficient detail, which prevents full interpretation of the data and makes it difficult to determine whether all conclusions are supported.

      1) There are many factors that can affect the measurements of immunoreactive structures (Fritschy, Eur J Neurosci, 2008 vol 28, p. 2365-70). The main limitation is that insufficient detail is provided for the immunolabelling design and imaging parameters, and some details of the imaging analysis are unclear (below).

      a. In terms of immunohistochemistry, with the impressive number of tested antibodies, there is potential for variation due to antibody penetration, unreported combinations of secondary antibodies, tissue quality (variations in fixation), etc. It is difficult to have confidence in the conclusions based on a total of 3 mice per age group and a single 40 µm section per mouse. Ideally, to account for variability between individual sections, measurements should be taken from at least 3 sections per mouse and averaged, before averaging for the age group.

      b. Assuming there were 3 primary antibodies with 3 secondary antibodies per cycle before elution, were the combinations used consistent for all brain sections and mice? Was the testing and elution order the same (i.e. systematic)? There is a risk of cross-excitation and mis-interpretation of true immunoreactivity if spectrally close fluorophores for the secondary antibodies were selected for primary antibodies that recognize spatially overlapping structures. Can the authors show the cycle number and fluorophore for the examples in figures 1 and 2 to determine which markers were imaged together in the same cycle? This would give confidence to the methods for colocalisation and cell type descriptions. For example, can cross-excitation be ruled out for some of the signals in the images used in Fig 2 (duplicated in Fig 4) such as intensely immunopositive Laminin-B1 cells in the MT3 and Sox2 channels (2A) and Ki67, SOX2 and phospho-histone 3 channels (2C)?

      c. For image acquisition, details are required on the resolution (numerical aperture of the lenses) in order to interpret colocalisation measurements in the later figures. Which beamsplitters/filters were used, and was the same laser power used for the same markers over different specimens (important for interpreting figure 4 data)?

      d. For the analysis of ROIs (figures 3-6), were the 20x or 40x images used?

      e. Details of the antibody specificity controls should be provided.

      2) Numerous markers have been used to define different cells, but the proportions are not reported. For example, R cells are defined differently in figures 3 and 4. How many types of R cells (based on combinations of markers) were observed? High resolution examples of each defined cell type (neuronal and glial) would assist the reader in the confidence of the measurements (ideally as single channels side by side, with arrows indicating areas of detectable immunoreactivity that the authors would use to define each cell).

      3) The authors use HOPX and GFAP immunoreactivity and a lack of detectable S100beta immunoreactivity to distinguish R cells from triple immunopositive mature astrocytes. In Figure 3, the images are too low power to be able to confirm this. This part would benefit from some single cell examples showing the separate channels.

      a. Furthermore, the results (paragraph 2, page 7) report changes in cell number, but rather density is reported. Please either state the numbers or refer to density.

      b. Related to Fig 3, there are no details of the number of R cells counted in supplementary table 1. How were the density measurements obtained? How thick were the image stacks and how many R cells per section? Similarly, as stated in methods, for glial cells, 100 cells were randomly counted in each section (presumably the same count for each age), so how was it reported that specifically the numbers of astrocytes were reduced and no significant differences in other glial cell types? (bottom of p.7)

      4) An increase in fluorescence intensity for HOPX and MT3 (also marks R cells) was observed with age (Fig 4), with methods stating that the 5 ROIs used to calculate the background intensity were measured at each [optical?] slice for where the cells were measured, to account for unequal antibody penetrance. Several clarifications are required in order to interpret these results: For the example HOPX images in Fig 4A, for the 2 month old mouse, the background is low, whereas for 12 months, the background is far higher, meaning different background ROI values. Can this difference be explained by differences in laser power, contrast adjustments, optical slice thickness, or whether these are maximum intensity projections of different z thickness? These values must be reported, and for each image presented in the manuscript, details must be included as to what type of image (z-projection or single optical slice, z thickness). Was the optical section(s) of the 12 month mouse imaged closer to the surface of the section for this example in Fig 4A? Were cells sampled at all depths of the imaged volume? Did the antibody show better penetration in the 12 month old mice than the 2 month old mice? How many optical slices would a cell soma cover? In these cases, how was the fluorescence intensity measured? If a soma covered several optical slices, which one was selected for the ROI measurement?

      5) The described methods for studying cellular interactions are not clear, making it difficult to interpret the associations between vasculature, cell types, and age. How was colocalisation defined, and at what resolution? For example, it is expected that GFAP would be associated with but not directly colocalized with collagen IV (Fig 5). In these cases, the manuscript would benefit from high resolution examples of this colocalization/interaction. How many ROIs were taken, how exactly were the ROIs for cell types associated with collagen IV selected, was this in 2D or 3D?

      6) The methods for random microniches are difficult to follow, as are the methods for investigating the associations of other markers to radial processes of R cells. Please provide a definition of a 'spot'. Again, details of the micron per pixel resolution and optical slice thickness would help in the interpretation of results. Additionally, if possible, illustrated examples of the full procedure for niche mapping should be provided in order to follow how the measurements were collected.

    3. Reviewer #1 (Public Review):

      Overall the analysis is conducted well and is convincing. The characterisation of neural stem cells using 7 markers as well as their morphology and position, is particularly thorough.

      My main criticism is that the study purports to address the effect of aging but the ages analysed only range from 2 months to 12 months. As 12-month-old mice are still middle-aged, it is difficult to conclude anything about the process of ageing, which is usually studied in much older mice (18-24 months). Indeed, some of the changes that the authors associate with an "ageing phenotype" appear in microniches already in 2-month-old mice and are predominant at 6 months. This suggests that the authors are documenting the transition from an immature/juvenile state, which is predominant in 2-month-old mice, to a mature/adult state, which already appears at 2 months but becomes predominant at 6 and 12 months. Importantly, this adult state, including the reduced number of neural stem cells, might not be dysfunctional but, on the contrary, may perform very well its role of producing small numbers of new neurons as required during adult neurogenesis.

      Another, lesser concern is that, based on antibody staining performed in tissues from 2-month and 12-month-old mice, conclusions are made on the different expression levels of HOPX, MT3 and LaminB1 analysed at different ages. This assumes that the efficiency of antibody staining is the same in different samples analysed in parallel but this is not shown.

    4. Evaluation Summary:

      The objective of this study is to develop a novel immunofluorescence technique allowing for the multiplexed analysis of protein targets. This 4i method is an important technical advance that will be of great interest for the scientific community.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Joint Public Review:

      Church et al. carry out a mechanistic study focused on regulation of PKA activity at a specific multiprotein complex nucleated by the scaffolding protein AKAP79. The manuscript presents a rigorous biochemical approach combined with computational modeling to address fundamental issues related to PKA signaling. This is a very important but complex system and the authors have nicely addressed it using in vitro approaches. The in vitro data provide evidence that suggests that the phosphatase calcineurin (CaN), by dephosphorylating the PKA regulatory subunit type II (RII), promotes rapid re-association of the PKA catalytic subunit (C) to RII, leading to PKA inactivation. The model proposed is that this modality of PKA inactivation takes place selectively at the multiprotein complex organized by AKAP79, where CaN, PKA and PKA phosphorylation targets are co-localized: the proximity of CaN to RII at the AKAP79 complex would enhance the efficiency of RII dephosphorylation by one order of magnitude, allowing fast re-association of C and RII subunits. This would reduce the proportion of free C subunits and therefore the level of local PKA substrate phosphorylation. Using the purified FRET reporter AKAR4 to measure PKA activity, they further confirm that the level of phosphorylation of this PKA target at a given cAMP concentration depends on the ability of CaN to interact with AKAP79. Based on these findings the authors conclude that CaN anchored to AKAP79 dephosphorylates AKAP79-anchored RII, leading to fast recapturing of C and inhibition of PKA catalytic activity. They then create a kinetic model for this process where cAMP and calcium are working in opposing ways. Notably, the authors also provide an estimate for the concentration of RII subunits in the hippocampal CA1 neuropil layer and find that this falls within the range at which CaN efficiently dephosphorylated RII in vitro.

      In the context of compartmentalized cAMP/PKA signaling, this mechanism would provide yet another regulatory feature to ensure specific control of target phosphorylation at individual subcellular locations. For example, in dendritic spines PKA regulates long-term depression (LTD) of CA3-CA1 hippocampal synapses via phosphorylation of AMPA-type glutamate receptors, which is facilitated by simultaneous interaction of receptor and kinase with AKAP79. In this context, at a given cAMP concentration, CaN-dependent inhibition of PKA activity would selectively attenuate AMPA phosphorylation and LTD, while PKA may still be able to phosphorylate targets at other sites.

      The paper presents very clear biochemical data but can be further strengthened by some additional attention to the following:

      While the in vitro data convincingly demonstrate the requirement for CaN to be anchored to AKAP79 for efficient dephosphorylation of RII and confirm that phosphorylation of RII at S98 results in more active PKA, the requirement for RII to be anchored to AKAP79 for this regulation is not investigated, leaving open the possibility that the more efficient dephosphorylation of RII in vitro may be due to increased catalytic activity of CaN when the phosphatase is associated with AKAP79c97.

      The authors show convincingly that the pRII subunits are better substrates when the AKAP scaffold is present. However, they need to address the relevance of having the enzyme (CN) and the substrate (pRIIb holoenzyme) scaffolded to the same complex so that diffusion is no longer a rate-limiting factor in the catalytic event. Are MM kinetics relevant for this process? This is a single molecule event that does not necessarily require that the product be released. Instead the product is returned to the active site of the cleft of the C-subunit in the holoenzyme:CN complex where in the cell it is rapidly re-phosphorylated. Also the authors could show what happens when you have a 1:1 concentration of CN and pRIIb. Following this single transfer event does not require dissociation of the holoenzyme and is likely to be more physiologically relevant.

      Do the authors know if calcium vs. Mg influences this process? Calcium stabilizes the product whereas Mg stabilizes the substrate in the case of the kinase. If calcium levels are high following release of the phosphate, would this tend to keep the phosphorylated holoenzyme in a more inhibited state until calcium went down and cAMP went up?

      This process will take place at membranes which may play a significant role in determining whether the A-subunit is released into solution or not.

      Another important question to consider is whether it is even necessary to dissociate the holoenzyme complex at all. Is it sufficient, for example, to simply unleash the linker region of the RII subunit and thereby open up the active site cleft of the C-subunit? Since the tail of the channel is also tethered nearby, it is perfectly reasonable to catalyze this event without dissociating the complex especially given earlier data by Wang, et al showing that the holoenzyme is very stable even when the key arginines in the inhibitor site are mutated. The same motif has access either to the active site of the C-subunit or to the active site of calcineurin in a cAMP/Ca++ dependent cycle. This leaves the phosphorylated tail of the channel free to be dephosphorylated by other phosphatases that are also tethered to AKAP79 and leaves CN committed to recycling of the RII holoenzyme. In principle this does not require dissociation of the RII holoenzyme if CN is tethered nearby. This is a very fundamental question.

      One point that is not addressed in the study and is important for the interpretation of the results is whether interaction of CaN with AKAP79c97 increases CaN activity per se, such that the more effective dephosphorylation of RII is not due to the physical proximity of CaN to RII on the AKAP but to a more active CaN. This could be addressed by testing the dephosphorylation rate of a phospho-substrate other than 32P-RII, in the absence and in the presence of AKAP79c97 or by repeating the experiments shown in Fig 1 in the presence of the AKAP79c97 variant where the PKA (391-400) anchoring site has been removed.

      AKAR4 is a reversible reporter of PKA activity, so it is surprising that the authors find that its phosphorylation is not affected by CaN. One possibility is that AKAR4 is not a good substrate for CaN. However, multiple studies have shown that AKAR4 can effectively be dephosphorylated. The ability of CaN to dephosphorylate AKAR4 should be investigated further to demonstrate more robustly that, in the in vitro experimental conditions used, the observed reduced phosphorylation of AKAR4 is due to less active PKA rather than more active CaN. This could be done, for example, by repeating the experiments summarized in Figure 3-figure supplement 1C & D using a different phosphatase, to ascertain that the experimental conditions allow for detection of AKAR4 dephosphorylation.

      One limitation of the in vitro work is that only AKAR4 is used to measure the level of PKA dependent phosphorylation. AKAR4 is not a natural substrate for either PKA or CaN and the accessibility of the phosphorylation site to these enzymes may be different than for physiological targets. In addition, AKAR4 is not anchored to AKAP79 and may not be the ideal reporter to investigate the effects of CaN-dependent regulation of PKA targets associated to AKAP79.

      Stoichiometry of free RII subunits. The authors have shown convincingly that the RII subunits in particular are present in excess of the C-subunits, and this has led to some new concepts for PKA signaling. There are two questions that need to be addressed here; addressing them in the Discussion may be adequate, but they do need to be addressed. The first is whether there are separate pools of free RII subunits and holoenzymes within single cells. This is essential for the model of PKA signaling taking place in the presence of a 10-fold excess of free RII subunits. Are the dissociated R-subunits in the same subcellular location? The second is whether the free RII subunits are bound to cAMP. The cAMP-free subunits are noticeably less stable and are degraded more rapidly than the holoenzymes, so are these free R-subunits bound to cAMP? If not, are they bound to something else that keeps them stable? RII subunits do not form membrane-less puncta as was recently reported in Cell by Zhang, but is there some other mechanism that allows for the sequestration of large amounts of free RII subunits?

      Do you need to saturate all four sites to have an active C-subunit that can phosphorylate the tail of a channel? This relates to the question above. Perhaps this would not be measured by the AKAR4 reporter, but could it be sensed if AKAR4 were fused to the tail of AKAP79 so that it would be tethered close by, similar to the tail of the channel?

      Stoichiometry of two calcineurins vs. one RII holoenzyme or one? The authors need to address this stoichiometry question more rigorously. It is quite fundamental for their assays. Does the computational model provide any ability to ascertain stoichiometry of the productive complex?

      While it is true that neither S/A or S/E will be substrates for CN, they will in fact have a different effect on the RII holoenzyme. Ser/Ala and Ser/Glu mutants are, in principle, quite different in terms of their accessibility to the active site of the C-subunit vs. the active site of CN. The Ser/Ala mutant, for example, should be locked into the active site of the C-subunit, and this would be presumably strengthened by ATP since this is a pseudosubstrate. Does the affinity for C-subunit change in an ATP-dependent manner? The Ser/Ala mutant should be a good inhibitor that cannot be regulated by phosphorylation. It could be activated by high concentrations of cAMP but not by the cAMP signaling that is being described here. The Ser/Glu mutation would favor docking into the active site of CN but would be trapped in this state as it also could not be dephosphorylated. Is this consistent with the models proposed by the authors?

      The in vivo work to assess the physiological relevance of this proposed new modality of PKA regulation is very preliminary. By overexpressing S97A and S97E mutants of RII in hippocampal neurons the authors confirm that modulation of PKA sensitivity to cAMP via RII phosphorylation affects spine density. However, no experimental data directly assess the role of CaN-dependent dephosphorylation of RII at the AKAP79 complex and there is no evidence that this mechanism regulates AMPA phosphorylation or phosphorylation of other physiologically relevant targets. Thus, the caveats that are associated with the system, and in particular the physiological relevance of the analyses, need to be addressed. Conclusions based on the preliminary 'in cell' data on physiological relevance should be appropriately tempered.

    2. Evaluation Summary:

      This manuscript will be of interest to neuroscientists as well as a broad audience of cell biologists, as it provides new insight into the myriad of cellular functions regulated by the well-studied cAMP-dependent protein kinase, PKA. Rigorous biochemical data supports a model for PKA inactivation wherein dephosphorylation of the PKA regulatory subunit within a multiprotein complex leads to rapid capture of the PKA catalytic subunit limiting signaling duration. Overall, the biochemical data and modeling support the conclusions although a few details can be addressed further and the in vivo data remains preliminary. The work nevertheless presents exciting findings that provide a tantalizing mechanism to selectively modulate PKA activity at precise subcellular locations.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      The manuscript by Xiang and Bartel explores the molecular coupling of poly(A) tail length and translational efficiency (TE) in frog oocytes and various mammalian cell lines. From their experiments they draw several broad conclusions. First, limiting amounts of PABPC in frog oocytes are the basis for coupling between poly(A) tail length and TE. Second, in mammalian somatic cell lines PABPC contributes little to TE, with TUT4- and TUT7-mediated uridylation promoting degradation of transcripts with short poly(A) tails. Overall, the experimental design is excellent. The conclusions drawn from the frog oocytes are strongly supported by the data provided whereas the cell line studies are more open to interpretation due to the drastic consequences of PABPC depletion.

    2. Reviewer #2 (Public Review):

      Poly(A) tails are generally thought to stabilize mRNAs and promote translation. However, the mechanisms of this process have been difficult to experimentally assess due to the essential nature of poly(A) binding proteins, homeostatic mechanisms in gene expression, and the pleiotropic effects of altering the transcription, translation or mRNA decay machinery. The length of poly(A) tails is directly proportional to translational efficiency in early development - the longer the tail, the more efficiently the mRNA is translated - possibly through a closed loop model. However, experiments in other cells, as well as in vitro reconstitution and imaging of single mRNAs in cells, do not support either coupling of poly(A) tail length and TE, or the closed loop model. Thus, it appears that there is a switch from embryonic to post-embryonic regulation of TE. The mechanistic basis for this switch was unclear.

      Here, Xiang and Bartel use reporter assays and transcriptome-wide sequencing technologies, alongside other complementary experiments, to determine the specific circumstances that permit coupling of poly(A) tail length and translational efficiency. The authors are able to synthesize many observations - both from their own lab and from others - to come up with a unified hypothesis. Many of the individual findings have been previously reported or hypothesized but no other work has brought all of these together in one study.

      Overall, the data strongly support the conclusions. Importantly, several different cell types and systems are used. In addition, a number of different methods support the work - including reporter assays, global analyses, experiments in extracts, oocytes and cell lines, etc.

      A description of events that lead to the switch from embryonic to post-embryonic regulation is still lacking. However, the insight provided here is substantial. It will have influence on many areas of study of gene expression - for example, it helps to explain discrepancies in miRNA function.

    3. Reviewer #1 (Public Review):

      This is an excellent manuscript in which Bartel and colleagues use an abundance of approaches to provide compelling evidence relevant to the coupling between poly(A)-tail length and translational efficiency. Without reiterating the results, the data are convincing and the paper is clearly written. Any concerns are too trivial to articulate.

    4. Evaluation Summary:

      This manuscript addresses a long-standing question, namely how does the poly(A) tail influence translational efficiency? It will therefore be of broad interest to readers from many areas of molecular biology including those interested in translation, mRNA stability, development and gene expression in general. The authors convincingly set out three criteria that must be met for coupling of poly(A) tail length with translation.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Author Response:

      Reviewer #1 (Public Review):

      Strengths:

      1) The model structure is appropriate for the scientific question.

      2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

      3) Generally speaking, the analysis supports the conclusions.

      Other considerations:

      1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to the different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

      We agree that this would be an important extension for future work and have indicated this in the Discussion, along with the types of data necessary to fit such models:

      “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

      2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

      Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

      We have highlighted this important limitation in the Discussion:

      “Finally, we have estimated model parameters using a single cross-sectional serosurvey. To improve estimates and the ability to distinguish between model structures, future studies should use longitudinal serosurveys or case data stratified by race and ethnicity and corrected for underreporting; the challenge will be ensuring that such data are systematically collected and made publicly available, which has been a persistent barrier to research efforts \cite{Krieger2020-ss}. Addressing these data barriers will also be key for translating these and similar models into actionable policy proposals on vaccine distribution and non-pharmaceutical interventions.”

      3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

      We agree this would be of interest and have extended the range of R0 values. Figure 1 has been updated accordingly (see below); we also updated the text with new findings: “After fitting the models across a range of $\epsilon$ values, we observed that as $\epsilon$ increases, HITs and epidemic final sizes shifted higher back towards the homogeneous case (Figure \ref{fig:model2}, Figure 1-figure supplement 4); this effect was less pronounced for $R_0$ values close to 1.”

      Figure 1: Incorporating assortativity in variable exposure models results in increased HITs across a range of $R_0$ values. Variable exposure models were fitted to NYC and Long Island serosurvey data.

      4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

      We have conducted additional analyses exploring the important suggestion by the reviewers that social distancing could affect these conclusions. The text and figures have been updated accordingly:

      “Finally, we assessed how robust these findings were to the impact of social distancing and other non-pharmaceutical interventions (NPIs). We modeled these mitigation measures by scaling the transmission rate by a factor $\alpha$ beginning when 5\% cumulative incidence in the population was reached. Setting the duration of distancing to be 50 days and allowing $\alpha$ to be either 0.3 or 0.6 (i.e. a 70\% or 40\% reduction in transmission rates, respectively), we assessed how the $R_0$ versus HIT and final epidemic size relationships changed. We found that the $R_0$ versus HIT relationship was similar to in the unmitigated epidemic (Figure 1-figure supplement 5). In contrast, final epidemic sizes depended on the intensity of mitigation measures, though qualitative trends across models (e.g. increased assortativity leads to greater final sizes) remained true (Figure 1-figure supplement 6). To explore this further, we systematically varied $\alpha$ and the duration of NPIs while holding $R_0$ constant at 3. We found again that the HIT was consistent, whereas final epidemic sizes were substantially affected by the choice of mitigation parameters (Figure 1-figure supplement 7); the distribution of cumulative incidence at the point of HIT was also comparable with and without mitigation measures (Figure 2-figure supplement 8). The most stringent NPI intensities did not necessarily lead to the smallest epidemic final sizes, an idea which has been explored in studies analyzing optimal control measures \cite{Neuwirth2020-nb,Handel2007-ee}. Longitudinal changes in incidence rate ratios also were affected by NPIs, but qualitative trends in the ordering of racial and ethnic groups over time remained consistent (Figure 3-figure supplement 3).”

      Figure 1-figure supplement 6: Final epidemic sizes versus $R_0$ in variable exposure models with mitigation measures for $\alpha = 0.3$ (top) and $\alpha = 0.6$ (bottom). NPIs were initiated when cumulative incidence reached 5\% in all models and continued for 50 days. Models were fitted to NYC and Long Island serosurvey data.

      Figure 1-figure supplement 7: Sensitivity analysis on the impact of intensity and duration of NPIs on final epidemic sizes. HIT values for the same mitigation parameters were 46.4 $\pm$ 0.5\% (range). The smallest final size, corresponding to $\alpha = 0.6$ and duration = 100, was 51\%. Census-informed assortativity models were fit to Long Island seroprevalence data. NPIs were initiated when cumulative incidence reached 5\% in all models.

      See points 1 and 2 above for examples of additional data required.
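      The NPI scenario described in this response (transmission scaled by $\alpha$ once cumulative incidence reaches 5\%, for a fixed number of days) can be illustrated with a minimal sketch. This is not the authors' race/ethnicity-structured model: it assumes a single homogeneously mixing population, and the function name `seir_final_size`, the latent and infectious periods, the seed fraction, and the Euler time step are all illustrative assumptions.

```python
def seir_final_size(r0, alpha=1.0, npi_days=0.0, trigger=0.05,
                    latent_days=3.0, infectious_days=5.0,
                    dt=0.05, t_max=1000.0, seed_frac=1e-4):
    """Final cumulative incidence of a one-group SEIR model (fractions of the population).

    Transmission is multiplied by `alpha` for `npi_days` days, starting when
    cumulative incidence first reaches `trigger`. The default periods are
    assumed values chosen for illustration, not taken from the manuscript.
    """
    sigma = 1.0 / latent_days        # rate of leaving the exposed class
    gamma = 1.0 / infectious_days    # recovery rate
    beta = r0 * gamma                # unmitigated transmission rate

    s, e, i = 1.0 - seed_frac, 0.0, seed_frac
    cumulative = seed_frac           # everyone ever infected so far
    npi_start = None
    t = 0.0
    while t < t_max:
        # Switch the NPI on the first time the trigger is crossed.
        if npi_start is None and npi_days > 0 and cumulative >= trigger:
            npi_start = t
        npi_active = npi_start is not None and t < npi_start + npi_days
        b = beta * alpha if npi_active else beta

        # Forward-Euler update of the SEIR equations.
        new_exposed = b * s * i * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        cumulative += new_exposed
        t += dt
    return cumulative


def homogeneous_hit(r0):
    """Classical herd immunity threshold for a homogeneous population."""
    return 1.0 - 1.0 / r0
```

      Calling `seir_final_size(3.0)` and `seir_final_size(3.0, alpha=0.3, npi_days=50.0)` contrasts the unmitigated and mitigated final sizes at $R_0 = 3$. In this simplified setting `homogeneous_hit` depends only on $R_0$, which is in line with the response's observation that the HIT is largely insensitive to the mitigation parameters while the final size is not.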

      Minor issues:

      -This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

      To clarify this, we have replaced instances of “activity level” (and similar) with “total contact rate”, indicating the total number of contacts per unit time per individual; e.g. “The estimated total contact rate ratios indicate higher contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

      We have also clarified our definition of contacts: “We define contacts to be interactions between individuals that allow for transmission of SARS-CoV-2 with some non-zero probability.”

      -The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

      We have revised the abstract to convey these same concepts in a more accessible manner: “A simple model where interactions occur proportionally to contact rates reduced the HIT, but more realistic models of preferential mixing within groups increased the threshold toward the value observed in homogeneous populations.”

      -I would cite some of the STD models which have used similar matrices to capture assortative mixing.

      We have added a reference in the assortative mixing section to a review of heterogeneous STD models: “Finally, under the \textit{assortative mixing} assumption, we extended this model by partitioning a fraction $\epsilon$ of contacts to be exclusively within-group and distributed the rest of the contacts according to proportionate mixing (with $\delta_{i,j}$ being an indicator variable that is 1 when $i=j$ and 0 otherwise) \cite{Hethcote1996-bf}:”
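      The equation that follows this sentence in the manuscript is not reproduced in the response. As a hedged illustration, the sketch below uses the standard preferred-mixing form found in the STD modeling literature, $c_{ij} = \epsilon \delta_{i,j} + (1 - \epsilon)\, a_j N_j / \sum_k a_k N_k$, where $a_j$ and $N_j$ are assumed symbols for the total contact rate and size of group $j$. The function name and argument names are illustrative, and the authors' exact parameterization may differ.

```python
def mixing_matrix(contact_rates, group_sizes, epsilon):
    """Fraction of group i's contacts that are made with group j.

    A fraction `epsilon` of contacts is reserved for one's own group; the
    remainder is spread across all groups in proportion to each group's
    share of total contacts (proportionate mixing).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    # Total contacts offered by each group: contact rate times group size.
    weights = [a * n for a, n in zip(contact_rates, group_sizes)]
    total = sum(weights)
    k = len(weights)
    return [
        [
            epsilon * (1.0 if i == j else 0.0)      # within-group share
            + (1.0 - epsilon) * weights[j] / total  # proportionate share
            for j in range(k)
        ]
        for i in range(k)
    ]
```

      With `epsilon = 0` every row equals the proportionate-mixing distribution, and with `epsilon = 1` the matrix is the identity (fully assortative). Each row sums to 1 for any value in between.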

      -Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

      We have added these helpful examples into the text: “Variable susceptibility to infection across racial and ethnic groups has been less well characterized, and observed disparities in infection rates can already be largely explained by differences in mobility and exposure \cite{Chang2020-in,Zelner2020-mb,Kissler2020-nh}, likely attributable to social factors such as structural racism that have put racial and ethnic minorities in disadvantaged positions (e.g., employment as frontline workers and residence in overcrowded, multigenerational homes) \cite{Henry_Akintobi2020-ld,Thakur2020-tw,Tai2020-ok,Khazanchi2020-xu}.”

      -Line 193: "Higher than expected" -> expected by who?

      We have clarified this phrase: “The estimated total contact rate ratios indicate higher exposure contacts for minority groups such as Hispanics or Latinos and non-Hispanic Black people, which is in line with studies using cell phone mobility data \cite{Chang2020-in}; however, the magnitudes of the ratios are substantially higher than we expected given the findings from those studies.”

      -A limitation that needs further mention is the fact that race & ethnic group, while important, could be sub classified into strata that inform risk even more (such as SES, job type etc.)

      We agree and have added this to the Discussion: “Fourth, due to data availability, we have only considered variability in exposure due to one demographic characteristic; models should ideally strive to also account for the effects of age on susceptibility and exposure within strata of race and ethnicity and other relevant demographics, such as socioeconomic status and occupation \cite{Mulberry2021-tc}. These models could be fit using representative serological studies with detailed cross-tabulated seropositivity estimates.”

      Reviewer #2 (Public Review):

      Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to be commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

      We thank the reviewer for the supportive comments.

      1) Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

      We have included this important point in the limitations section on potential serosurvey biases: “First, biases in the serosurvey sampling process can substantially affect downstream results; any conclusions drawn depend heavily on the degree to which serosurvey design and post-survey adjustments yield representative samples \cite{Clapham2020-rt}. For instance, because the serosurvey we relied on primarily sampled people at grocery stores, there is both survival bias (cumulative incidence estimates do not account for people who have died) and ascertainment bias (undersampling of at-risk populations that are more likely to self-isolate, such as the elderly) \cite{Rosenberg2020-qw,Accorsi2021-hx}. These biases could affect model estimates if, for instance, the capacity to self-isolate varies by race or ethnicity -- as suggested by associations of neighborhood-level mobility versus demographics \cite{Kishore2020-sy,Kissler2020-nh} -- leading to an overestimate of cumulative incidence and contact rates in whites.”

      2) It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

      We agree with this important point and indeed this was the goal, in part, of the analyses in Figure 2. We have added additional text to the Discussion highlighting this: “Projecting the epidemic forward indicated that the overall HIT was reached after cumulative incidence had increased disproportionately in minority groups, highlighting the fundamentally inequitable outcome of achieving herd immunity through infection. All of these factors underscore the fact that incorporating heterogeneity in models in a mechanism-free manner can conceal the disparities that underlie changes in epidemic final sizes and HITs. In particular, overall lower HIT and final sizes occur because certain groups suffer not only more infection than average, but more infection than under a homogeneous mixing model; incorporating heterogeneity lowers the HIT but increases it for the highest-risk groups (Figure \ref{fig:hitcomp}).”

      For vaccination, see our response to Reviewer #1 point 4.

      3) The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

      This is a great suggestion, but this type of data has generally not been available due to privacy concerns from disaggregating mobility data by race and ethnicity (Kishore et al., 2020). Instead, we modeled NPIs as mentioned in Reviewer #1 point 4, with the caveat that reduction in mobility was assumed to be identical across groups. We added this into the text explicitly as a limitation: “Third, we have assumed the impact of non-pharmaceutical interventions such as stay-at-home policies, closures, and the like to equally affect racial and ethnic groups. Empirical evidence suggests that during periods of lockdown, certain neighborhoods that are disproportionately wealthy and white tend to show greater declines in mobility than others \cite{Kishore2020-sy,Kissler2020-nh}. These simplifying assumptions were made to aid in illustrating the key findings of this model, but for more detailed predictive models, the extent to which activity level differences change could be evaluated using longitudinal contact survey data \cite{Feehan2020-ta}, since granular mobility data are typically not stratified by race and ethnicity due to privacy concerns \cite{Kishore2020-mg}.”

      Reviewer #3 (Public Review):

      Ma et al. investigate the effect of racial and ethnic differences in SARS-CoV-2 infection risk on the herd immunity threshold of each group. Using New York City and Long Island as model settings, they construct a race/ethnicity-structured SEIR model. Differential risk between racial and ethnic groups was parameterized by fitting each model to local seroprevalence data stratified demographically. The authors find that when herd immunity is reached, cumulative incidence varies by more than twofold between ethnic groups, at approximately 75% of Hispanics or Latinos and only 30% of non-Hispanic Whites.

      This result was robust to changing assumptions about the source of racial and ethnic disparities. The authors considered differences in disease susceptibility, exposure levels, as well as a census-driven model of assortative mixing. These results show the fundamentally inequitable outcome of achieving herd immunity in an unmitigated epidemic.

      The authors have only considered an unmitigated epidemic, without any social distancing, quarantine, masking, or vaccination. If herd immunity is achieved via one of these methods, particularly vaccination, the disparities may be mitigated somewhat but still exist. This will be an important question for epidemiologists and public health officials to consider throughout the vaccine rollout.

      We thank the reviewer for the detailed and helpful summary and suggestions.

    2. Reviewer #3 (Public Review):

      Ma et al. investigate the effect of racial and ethnic differences in SARS-CoV-2 infection risk on the herd immunity threshold of each group. Using New York City and Long Island as model settings, they construct a race/ethnicity-structured SEIR model. Differential risk between racial and ethnic groups was parameterized by fitting each model to local seroprevalence data stratified demographically. The authors find that when herd immunity is reached, cumulative incidence varies by more than twofold between ethnic groups, at approximately 75% of Hispanics or Latinos and only 30% of non-Hispanic Whites.

      This result was robust to changing assumptions about the source of racial and ethnic disparities. The authors considered differences in disease susceptibility, exposure levels, as well as a census-driven model of assortative mixing. These results show the fundamentally inequitable outcome of achieving herd immunity in an unmitigated epidemic.

      The authors have only considered an unmitigated epidemic, without any social distancing, quarantine, masking, or vaccination. If herd immunity is achieved via one of these methods, particularly vaccination, the disparities may be mitigated somewhat but still exist. This will be an important question for epidemiologists and public health officials to consider throughout the vaccine rollout.

    3. Reviewer #2 (Public Review):

      Overall I think this is a solid and interesting piece that is an important contribution to the literature on COVID-19 disparities, even if it does have some limitations. To this point, most models of SARS-CoV-2 have not included the impact of residential and occupational segregation on differential group-specific covid outcomes. So, the authors are to be commended on their rigorous and useful contribution on this valuable topic. I have a few specific questions and concerns, outlined below:

      1) Does the reliance on serosurvey data collected in public places imply a potential issue with left-censoring, i.e. by not capturing individuals who had died? Can the authors address how survival bias might impact their results? I imagine this could bring the seroprevalence among older people down in a way that could bias their transmission rate estimates.

      2) It might be helpful to think in terms of disparities in HITs as well as disparities in contact rates, since the HIT of whites is necessarily dependent on that of Blacks. I'm not really disagreeing with the thrust of what their analysis suggests or even the factual interpretation of it. But I do think it is important to phrase some of the conclusions of the model in ways that are more directly relevant to health equity, i.e. how much infection/vaccination coverage does each group need for members of that group to benefit from indirect protection?

      3) The authors rely on a modified interaction index parameterized directly from their data. It would be helpful if they could explain why they did not rely on any sources of mobility data. Are these just not broken down along the type of race/ethnicity categories that would be necessary to complete this analysis? Integrating some sort of external information on mobility would definitely strengthen the analysis.

    4. Reviewer #1 (Public Review):

      Strengths:

      1) The model structure is appropriate for the scientific question.

      2) The paper addresses a critical feature of SARS-CoV-2 epidemiology which is its much higher prevalence in Hispanic or Latino and Black populations. In this sense, the paper has the potential to serve as a tool to enhance social justice.

      3) Generally speaking, the analysis supports the conclusions.

      Other considerations:

      1) The clean distinction between susceptibility and exposure models described in the paper is conceptually useful but is unlikely to capture reality. Rather, susceptibility to infection is likely to vary more by age whereas exposure is more likely to vary by ethnic group / race. While age cohorts are not explicitly distinguished in the model, the authors would do well to at least vary susceptibility across ethnic groups according to different age cohort structure within these groups. This would allow a more precise estimate of the true effect of variability in exposures. Alternatively, this could be mentioned as a limitation of the current model.

      2) I appreciated that the authors maintained an agnostic stance on the actual value of HIT (across the population & within ethnic groups) based on the results of their model. If there was available data, then it might be possible to arrive at a slightly more precise estimate by fitting the model to serial incidence data (particularly sorted by ethnic group) over time in NYC & Long Island. First, this would give some sense of R_effective. Second, if successive waves were modeled, then the shift in relative incidence & CI among these groups that is predicted in Figure 3 & Sup fig 8 may be observed in the actual data (this fits anecdotally with what I have seen in several states). Third, it may (or may not) be possible to estimate values of critical model parameters such as epsilon. It would be helpful to mention this as possible future work with the model.

      Caveats about the impossibility of truly measuring HIT would still apply (due to new variants, shifting use & effectiveness of NPIs, etc.). However, as is, the estimates of possible values for HIT are so wide as to make the underlying data used to train the model almost irrelevant. This makes the potential to leverage the model for policy decisions more limited.

      3) I think the range of R0 in the figures should be extended to go as low as 1. Much of the pandemic in the US has been defined by local Re that varies between 0.8 & 1.2 (likely based on shifts in the degree of social distancing). I therefore think lower HIT thresholds should be considered and it would be nice to know how the extent of assortative mixing affects estimates at these lower R_e values.

      4) line 274: I feel like this point needs to be considered in much more detail, either with a thoughtful discussion or even with some simple additions to the model. How should these results make policy makers consider race and ethnicity when thinking about the key issues in the field right now such as vaccine allocation, masking, and new variants? I think to achieve the maximal impact, the authors should be very specific about how model results could impact policy making, and how we might lower the tragic discrepancies associated with COVID. If the model / data is insufficient for this purpose at this stage, then what type of data could be gathered that would allow more precise and targeted policy interventions?

      Minor issues:

      -This is subjective but I found the words "active" and "high activity" to describe increases in contacts per day to be confusing. I would just say more contacts per day. It might help to change "contacts" to "exposure contacts" to emphasize that not all contacts are high risk.

      -The abstract has too much jargon for a generalist journal. I would avoid words like "proportionate mixing" & "assortative" which are very unique to modeling of infectious diseases unless they are first defined in very basic language.

      -I would cite some of the STD models which have used similar matrices to capture assortative mixing.

      -Lines 164-5: very good point but I would add that members of ethnic / racial groups are more likely to be essential workers and also to live in multigenerational houses

      -Line 193: "Higher than expected" -> expected by who?

      -A limitation that needs further mention is the fact that race & ethnic group, while important, could be sub classified into strata that inform risk even more (such as SES, job type etc.)

    5. Evaluation Summary:

      This excellent paper by Ma and colleagues assesses the role of assortative mixing in regards to racial and ethnic disparities to estimate herd immunity thresholds (HIT) for SARS-CoV-2. The paper is conceptual in nature and builds on similar models which have been particularly useful to understand the dynamics of sexually transmitted diseases. The model is explained well and the paper is clearly written. The conclusions are justified by the analysis. One limitation is that the model is trained against a single cross-sectional seroprevalence estimate (one in NYC & one in Long Island) which allows for multiple models (ranging from homogeneous mixing to proportionate mixing) to recapitulate the data and in turn does not allow general estimates of HIT for these regions. It is also unclear if a more realistic epidemic simulation that included repeated waves of infection &/or vaccine roll out would change the conclusions regarding HIT according to race and ethnicity.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

    1. Author Response:

      Reviewer #1 (Public Review):

      The gist of this work is that the simple concept of a solubility product determines a threshold for phase separation, thereby enabling buffering even in systems where phase separation is driven by heterotypic interactions. The solubility product or SP is determined by the number of complementary interaction sites and the coordination number i.e., the number of bonds one can make per site.

      The work appears to be motivated by two questions: Are concentrations buffered in systems where heterotypic interactions drive phase separation thereby negating the presence of a rigorously definable saturation concentration? This question was motivated by work from Klosin et al., showing how phase separation can enable buffering of noise in transcription. They relied on the concept of a saturation concentration. In a paper that followed a few months after, Riback et al., showed that the concept of a saturation concentration ceases to exist, as defined for systems where phase separation is driven purely by homotypic interactions. This was taken to imply that the formation of multicomponent condensates via a blend of homotypic and heterotypic interactions causes a loss of buffering capacity afforded by phase separation. The second question motivating the current work is the apparent absence of a theoretical framework for "varying threshold concentrations" in systems governed by heterotypic interactions.

      Using two flavors of simulations, the authors propose that the SP sets an upper limit on the convolution of concentrations that determine phase separation. They show this via simulations where they follow the formation of clusters formed by linear multivalent macromolecules and monitor the emergence of a bimodal distribution of clusters. In 1:1 mixtures of multivalent macromolecules they find that SP sets a threshold beyond which a bimodal distribution of clusters emerges. The authors further find that SP sets an upper limit even in systems that deviate from the 1:1 stoichiometry.
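      The buffering idea summarized above can be illustrated with the textbook ionic-salt analogy that motivates the solubility product. The sketch below is an idealization, not the authors' NFsim or SpringSaLaD simulations: it assumes a 1:1 condensate in which the product of free concentrations is pinned exactly at the threshold, whereas the reviewed work describes SP as an upper limit. The function name `dilute_phase` and the arbitrary concentration units are assumptions.

```python
import math


def dilute_phase(a_total, b_total, ksp):
    """Free concentrations of A and B for an idealized 1:1 'salt'.

    Below the threshold (a_total * b_total <= ksp) nothing condenses.
    Above it, an amount x of each component leaves the dilute phase so
    that (a_total - x) * (b_total - x) == ksp.
    Returns (free_a, free_b, condensed_amount).
    """
    if a_total * b_total <= ksp:
        return a_total, b_total, 0.0
    # Smaller root of x^2 - (A + B) x + (A B - Ksp) = 0, which keeps
    # both free concentrations positive.
    x = 0.5 * ((a_total + b_total)
               - math.sqrt((a_total - b_total) ** 2 + 4.0 * ksp))
    return a_total - x, b_total - x, x
```

      For unequal totals, for example `dilute_phase(40.0, 10.0, 25.0)`, the two free concentrations differ from each other and shift as either total changes, yet their product stays at `ksp`. This is the sense in which the product, rather than either individual concentration, is buffered.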

      The authors proceed to show that the SP is influenced by the valence of multivalent macromolecules. They also demonstrate that short rigid linkers can cause an arrest of phase separation through a so-called "dimer trap" reminiscent of the "magic number" postulate put forth by Wingreen and colleagues.

      Is the work significant, novel, and timely? Effectively the authors propose that the driving forces for phase separation can be distilled down to the concept of a solubility product. Given prior knowledge of the valence, coordination number, and affinities can one predict concentration thresholds for phase separation? The authors suggest that this can be gleaned from either network based simulations, which are very inexpensive, or through more elaborate simulations. They further propose that it is the solubility product that sets the threshold.

      It is worth noting that the authors are quantifying what is known in the physical literature as a percolation threshold. The seminal work of Flory and Stockmayer dating back to the 1940s showed how one can calculate a percolation threshold by taking in prior knowledge of valence, coordination numbers, and affinities whilst ignoring cooperativity. These ideas have been refined and advanced in several theoretical contributions by various labs. While none of the papers in the physical literature use the concept of a solubility product, they rely on the concept of a percolation threshold because the transition to large, system-spanning clusters is a continuous one and it is debatable if this is a bona fide phase transition. Rather it is a topological transition.

      Yes, we agree that the novelty and importance of our work rests in the application of the simple and accessible concept of solubility product, which has not been previously considered in relation to LLPS. The relationship of our analysis to the physics underlying phase diagrams is discussed in a new paragraph within the Discussion.
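      For readers unfamiliar with the Flory-Stockmayer percolation threshold raised in the comment above, a minimal mean-field sketch follows. It assumes a two-component system in which A (valence `f`) binds only B (valence `g`), all sites are equivalent and independent, and intramolecular loops are ignored, so percolation occurs when $p_A p_B (f-1)(g-1) = 1$. The function name `gel_point` and the `site_ratio` parameter (total A sites divided by total B sites) are illustrative, and this is not the calculation performed in the reviewed paper.

```python
import math


def gel_point(f, g, site_ratio=1.0):
    """Critical bound fractions (p_a, p_b) at the mean-field gel point.

    Every bond uses one A site and one B site, so p_b = site_ratio * p_a.
    Substituting into p_a * p_b * (f - 1) * (g - 1) = 1 gives p_a.
    Returns None when no system-spanning cluster can form.
    """
    branching = (f - 1) * (g - 1)
    if branching <= 0 or site_ratio <= 0:
        return None  # a monovalent component only caps chains
    p_a = 1.0 / math.sqrt(site_ratio * branching)
    p_b = site_ratio * p_a
    if p_a > 1.0 or p_b > 1.0:
        return None  # would need more than 100% of sites bound
    return p_a, p_b
```

      Under these assumptions, raising the valence lowers the fraction of sites that must be bound before large clusters appear, which matches the qualitative dependence on valence discussed in the review.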

      As for novelty, unfortunately the authors disregard prior work that showed how linker length impacts local vs. global cooperativity in phase transitions that combine phase separation and percolation. Ref. 23 is the work in question and it is mentioned in passing, even though the contributions here are entirely a redux.

      We have eliminated the results on how molecular structural features control LLPS to fully focus our paper on the Ksp concept, as suggested by the Editor. However, in our original manuscript, we described results not just related to linker length, but also steric effects.

      The concept of a solubility product, introduced here to model / understand phase behavior of multivalent macromolecules, is an interesting and potentially appealing simple description. It might make the understanding of phase transitions more accessible, but it has problems: (a) it does not define phase separation; rather it defines percolation transitions; (b) without prior knowledge of the relevant quantities, the solubility product cannot be readily inferred, even from simulations, although one can scan parameter space to arrive at predictions regarding the apparent valence and coordination numbers. (c) the solubility product does not tell us much about properties of condensates, interfaces, or the driving forces for phase transitions that are influenced by the collective effects of interaction domains / motifs and spacers.

      Recent papers have drawn attention to the potential importance of buffering as a biological function of biomolecular condensation, and also the failure of buffering in heterotypic LLPS. We felt that the Ksp would help “rescue” the idea of buffering, as Reviewer 1 has so aptly put it below. We have refocused the paper to emphasize this. Of course, we describe this for a series of ideal systems with known valency and affinities. However, theoretical systems are always “ideal” and the deviations from ideality are what make experiments so vital. We have added a paragraph in the Discussion that relates our work to the physics of phase transitions, providing 2 citations, [13, 21], to support taking the percolation threshold as a proxy for the phase boundary. We also point out at the end of the Discussion, how the Ksp concept might be validated experimentally and might be useful in categorizing the effective valency of molecules comprising a cellular condensate.

      Finally, as for the absence of a theoretical explanation for the apparent loss of buffering in systems with heterotypic interactions, the authors would do well to see the work of Choi et al., published in PLoS Comput. Biol. in 2019. Figure 12 in that work clearly establishes that the concentrations of A and B species in the coexisting dilute phase are set by the slopes of tie lines - the lines of constant chemical potential. These slopes are set by the relative strengths of homotypic vs. heterotypic interactions, and to zeroth order, that is the physical explanation.

      We apologize for missing this very relevant work and have now cited it several times in the paper. However, as Reviewer 1 states, Figure 12 treats the potential competition between homotypic and heterotypic interactions within a system. We did not address this in our paper. Rather, for our purposes, homotypic interactions are a special case that still fits within the solubility product framework. We do now address the relationship of tie-lines in phase diagrams to the Ksp in the Discussion paragraph mentioned above.

      Reviewer #2 (Public Review):

      This paper asks whether systems composed of more than one component (heterotypic) that undergo liquid-liquid phase separation will follow the same rules as ionic solutions. The question is motivated by (i) the behavior of homotypic solutions, where after phase separation, monomer concentrations remain fixed despite addition of new components, which is not true for heterotypic systems and (ii) the known behavior of multivalent ionic salts. This idea has not previously been tested. They show quite clearly through simulations that the solubility product, Ksp, can be used as a quantitative metric to delineate phase transition behavior in heterotypic systems. This is a valuable contribution to the understanding of phase separation in these systems, and could be impactful in analyzing experimental observables, at least in vitro, to determine the valency of interacting systems. It provides a relatively straightforward conceptual basis for observed partitioning of components into dilute and dense phases. The result seems robust and likely to be reproducible experimentally and through alternative simulation studies, particularly given its established history in quantifying the related phenomena in ionic salts.

      A weakness is the rather qualitative comparison to experiment, which is justified by the authors based on the unknown valency of the experimental system. There is also no quantitative comparison between simulation types (spatial vs non-spatial). However, the simulations do seem sufficiently detailed to test and validate the Ksp concept.

      Strengths:

      • The paper is very focused, and uses multiple simulation 'experiments' to test the role of the Ksp in delineating the phase transition, showing good agreement for multiple systems, with both matched and distinct stoichiometries between the components. They see typical behavior at the phase transition point, where they observe the largest variability or fluctuations in the formation of the dense phase. Thus the results strongly support the conclusion that the Ksp delineates phase transitions in these 2-3 component systems.

      • A comparison is made to a recent experimental result with three components, showing qualitative agreement with an observed lack of buffering, which was unexpected at the time due to the behavior observed for homotypic systems. Here this result is now rationalized via the Ksp, which does plateau despite the monomer concentrations changing.

      • Spatial simulations probe the role of structure and flexibility in impacting phase separation, finding general agreement with previously published experimental and modeling work. These observations about flexibility and matched valency are also relatively intuitive.

      Weaknesses

      • There is no quantitative comparison between the two simulation approaches (spatial and non-spatial), which should be straightforward. Using the same composition and KD in both types of simulations and directly comparing outcomes would help explain when and why the spatial simulations differ from the non-spatial ones; see the comments below:

      • A related methodological point: On Line 97 it states that NFSim does not allow intramolecular bonds to form, but this is not true. On one hand, they can be written out explicitly. E.g. A(a1!1, a2).B(b1!1, b2)->A(a1!1, a2!2).B(b1!1, b2!2), would form a second bond between an AB complex that already had one bond. While quite tedious, these could be enumerated, allowing for the zippering effect they see spatially, although the rates would not be bimolecular. This would still leave out intra-complex bonds between proteins without a direct link. However, based on the NFsim website, by default it does in fact allow these types of intra-complex bonds to be formed (http://michaelsneddon.net/nfsim/pages/support/support.html) see "Reactant Connectivity Enforcement". So it is not clear to me which option was used in this paper. According to what is written in the methods, no intra-complex bonds are formed, but this is not the default in NFsim and is indeed allowable.

      The reviewer misinterpreted this admittedly unclear statement: “The binding rules only allow inter-molecular binding; internal bond formation within the molecular clusters is not permitted, as NFsim cannot account for proximity of binding sites within clusters.” We did not intend this to imply that NFsim does not support intramolecular binding; rather, we meant that our choice was to only allow intermolecular bond formation. We made this choice because, being non-spatial, NFsim cannot account for spatial proximity or steric effects. We have clarified this in the revised manuscript as follows: “We chose binding rules to only allow inter-molecular binding; we felt this was appropriate because NFsim cannot account for spatial proximity of binding sites or steric crowding within clusters.”

      • The spatial simulations do not show the bimodal distribution under the fixed concentrations (Fig S9). This is a significant difference from the non-spatial result. They attribute this to a 'dimer trap', but given they see the dense phase in the clamped monomer simulations, this cannot be the only explanation. What about kinetic effects, due to the differences in initial concentrations of monomers in the two simulation approaches? The rate constants are not listed anywhere. They only seem to see large clusters at fixed concentrations for the mismatched sizes (Fig S12B), where the Ksp behavior does not hold. Can they increase monomer flexibility more and start to see bimodal at fixed concentration, or change the rates and see a bimodal distribution?

      In general, there is a limited ability of a small number of molecules in the FTC simulations to form a clear bimodal distribution, whether spatial or non-spatial. This is directly demonstrated in Figure 1C, where the non-spatial simulations become increasingly bimodal as the number of molecules increases, keeping concentration constant. Because of the greater computational cost of SpringSaLaD calculations, we kept the FTC simulations in Figure 7 to 200 molecules. However, the histograms that are averaged over 50 runs obscure the clear separation that is apparent when examining molecule size distribution in individual trajectories for the FTC case. We now include these in the supporting figures as Figure 1- figure supplement 3 (NFsim) and Figure 7- figure supplement 2 (SpringSaLaD). Above Ksp, we see a consistent group of small oligomers (which is reinforced in the averaged histograms) and individual large clusters (which are smeared out in the average histograms). As Reviewer 2 noted, we were also able to convincingly demonstrate bimodality at and above Ksp with the CMC simulations, which are allowed to continue until they stochastically nucleate large clusters and take off.

      All the FTC simulations are run to steady state, so only the Kds should matter, not the rate constants, which were actually available in the input files in the Git repository; we have now included the SpringSaLaD rate constants in the manuscript as well.

      • Related: I am surprised that the sterically hindered monomers would not form large clusters at fixed concentration, as it looks like it is impossible for them to 'zipper' up their binding sites and become trapped in dimers. Is the distribution at fixed concentrations bimodal? The data is not shown.

      We have removed the additional spatial simulation Results for structures other than the one in Figure 7 as requested by the Editor. We hope to thoroughly explore the molecular-structural determinants of Ksp and LLPS in a subsequent paper.

      Reviewer #3 (Public Review):

      In this work, Chattaraj and colleagues utilize simulation models to study collective behaviors of molecules with multiple binding sites (multivalency). When the concentrations are low, the molecules do not bind to each other frequently, and they are called free. On the other hand, if the concentrations increase, they start to bind and eventually form a wide network of molecules connected by molecular binding. This transition can be considered as a model for liquid-liquid phase separation. Their major claim is that the solubility product, a simple product of the concentrations of the free molecules, can be used as a proxy to the phase separation threshold (known as the saturation concentration). They observed in various simulation conditions that as the total concentration of molecules increases, the solubility product first increases but eventually converges to a certain value, and the value is consistent over different simulation conditions. The value is the upper limit of the solubility product, after which the molecules start to form a molecular network.
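      The idea of a solubility product as "a simple product of the concentrations of the free molecules" can be made concrete with a small sketch. The concentrations below are invented for illustration and are not taken from the manuscript. The optional exponents are an assumption by analogy with ionic salts; with the defaults the function is the plain product described above.

```python
def solubility_product(free_a, free_b, exp_a=1, exp_b=1):
    """Product of free (dilute-phase) concentrations of two components.

    exp_a and exp_b are hypothetical stoichiometric exponents, included only
    by analogy with ionic Ksp; the defaults give the simple product.
    """
    return (free_a ** exp_a) * (free_b ** exp_b)


# Hypothetical free concentrations (arbitrary units) of A and B above threshold.
# Neither concentration is buffered on its own, yet the product is constant.
dilute_phase = [(10.0, 40.0), (20.0, 20.0), (40.0, 10.0)]
products = [solubility_product(a, b) for a, b in dilute_phase]

if __name__ == "__main__":
    for (a, b), ksp in zip(dilute_phase, products):
        print(f"[A]free={a:5.1f}  [B]free={b:5.1f}  product={ksp:6.1f}")
```

      In this toy case each free concentration varies fourfold while the product stays fixed, which is the kind of behavior the solubility product is meant to capture.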

      After establishing the model, they tested systems with different valences. Higher valency leads to reduction of the threshold (and phase separation occurs at lower concentrations). The theory was also valid for systems with non-equal valences (e.g. pentavalent A + trivalent B). They applied their models to a three-component system, and found that the results qualitatively explain the published experimental patterns. Lastly, using off-lattice coarse-grained simulations, they show that the linker flexibility and the spacing of binding sites are important determinants of the threshold, which confirms the findings from other computational and experimental works.

      The authors successfully defend their claim by using different types of simulations, and their methods to crosscheck the physical validity of their models may be useful for other simulation works. For example, the authors checked if increasing the number of molecules and reducing the system size give the same results for equal concentrations. Also, they employed two different methods (so-called FTC and CMC in the manuscript) to determine the threshold concentrations. However, the conclusions are not easily transferable to real biopolymer systems, since it is hard to determine the valences (and binding affinities) of biopolymers such as intrinsically disordered proteins.

      Our work was motivated by recent work highlighting the importance of buffering as a biological function of biomolecular condensation, but also the failure of buffering in heterotypic LLPS. We realized that Ksp offers a more general framework than buffering that encompasses complex multicomponent (heterotypic) systems. But our original manuscript was not sufficiently focused on this primary motivation and has been revised accordingly. Of course, we used simulations on ideal systems to establish this idea. We suggest at the end of the discussion that the Ksp concept may potentially be used to derive effective parameters for experimental systems.

    2. Reviewer #3 (Public Review):

      In this work, Chattaraj and colleagues utilize simulation models to study collective behaviors of molecules with multiple binding sites (multivalency). When the concentrations are low, the molecules do not bind to each other frequently, and they are called free. On the other hand, if the concentrations increase, they start to bind and eventually form a wide network of molecules connected by molecular binding. This transition can be considered as a model for liquid-liquid phase separation. Their major claim is that the solubility product, a simple product of the concentrations of the free molecules, can be used as a proxy to the phase separation threshold (known as the saturation concentration). They observed in various simulation conditions that as the total concentration of molecules increases, the solubility product first increases but eventually converges to a certain value, and the value is consistent over different simulation conditions. The value is the upper limit of the solubility product, after which the molecules start to form a molecular network.

      After establishing the model, they tested systems with different valences. Higher valency leads to reduction of the threshold (and phase separation occurs at lower concentrations). The theory was also valid for systems with non-equal valences (e.g. pentavalent A + trivalent B). They applied their models to a three-component system, and found that the results qualitatively explain the published experimental patterns. Lastly, using off-lattice coarse-grained simulations, they show that the linker flexibility and the spacing of binding sites are important determinants of the threshold, which confirms the findings from other computational and experimental works.

      The authors successfully defend their claim by using different types of simulations, and their methods to crosscheck the physical validity of their models may be useful for other simulation works. For example, the authors checked if increasing the number of molecules and reducing the system size give the same results for equal concentrations. Also, they employed two different methods (so-called FTC and CMC in the manuscript) to determine the threshold concentrations. However, the conclusions are not easily transferable to real biopolymer systems, since it is hard to determine the valences (and binding affinities) of biopolymers such as intrinsically disordered proteins.

    3. Reviewer #2 (Public Review):

      This paper asks whether systems composed of more than one component (heterotypic) that undergo liquid-liquid phase separation will follow the same rules as ionic solutions. The question is motivated by (i) the behavior of homotypic solutions, where after phase separation, monomer concentrations remain fixed despite addition of new components, which is not true for heterotypic systems and (ii) the known behavior of multivalent ionic salts. This idea has not previously been tested. They show quite clearly through simulations that the solubility product, Ksp, can be used as a quantitative metric to delineate phase transition behavior in heterotypic systems. This is a valuable contribution to the understanding of phase separation in these systems, and could be impactful in analyzing experimental observables, at least in vitro, to determine the valency of interacting systems. It provides a relatively straightforward conceptual basis for observed partitioning of components into dilute and dense phases. The result seems robust and likely to be reproducible experimentally and through alternative simulation studies, particularly given its established history in quantifying the related phenomena in ionic salts.

      A weakness is the rather qualitative comparison to experiment, which is justified by the authors based on the unknown valency of the experimental system. There is also no quantitative comparison between simulation types (spatial vs non-spatial). However, the simulations do seem sufficiently detailed to test and validate the Ksp concept.

      Strengths:

      • The paper is very focused, and uses multiple simulation 'experiments' to test the role of the Ksp in delineating the phase transition, showing good agreement for multiple systems, with both matched and distinct stoichiometries between the components. They see typical behavior at the phase transition point, where they observe the largest variability or fluctuations in the formation of the dense phase. Thus the results strongly support the conclusion that the Ksp delineates phase transitions in these 2-3 component systems.

      • A comparison is made to a recent experimental result with three components, showing qualitative agreement with an observed lack of buffering, which was unexpected at the time due to the behavior observed for homotypic systems. Here this result is now rationalized via the Ksp, which does plateau despite the monomer concentrations changing.

      • Spatial simulations probe the role of structure and flexibility in impacting phase separation, finding general agreement with previously published experimental and modeling work. These observations about flexibility and matched valency are also relatively intuitive.

      Weaknesses

      • There is no quantitative comparison between the two simulation approaches (spatial and non-spatial), which should be straightforward. Using the same composition and KD in both types of simulations and directly comparing outcomes would help explain when and why the spatial simulations differ from the non-spatial ones (see subsequent comments below):

      • A related methodological point: On Line 97 it states that NFSim does not allow intramolecular bonds to form, but this is not true. On one hand, they can be written out explicitly. E.g. A(a1!1, a2).B(b1!1, b2)->A(a1!1, a2!2).B(b1!1, b2!2), would form a second bond between an AB complex that already had one bond. While quite tedious, these could be enumerated, allowing for the zippering effect they see spatially, although the rates would not be bimolecular. This would still leave out intra-complex bonds between proteins without a direct link. However, based on the NFsim website, by default it does in fact allow these types of intra-complex bonds to be formed (http://michaelsneddon.net/nfsim/pages/support/support.html) see "Reactant Connectivity Enforcement". So it is not clear to me which option was used in this paper. According to what is written in the methods, no intra-complex bonds are formed, but this is not the default in NFsim and is indeed allowable.

      • The spatial simulations do not show the bimodal distribution under the fixed concentrations (Fig S9). This is a significant difference from the non-spatial result. They attribute this to a 'dimer trap', but given they see the dense phase in the clamped monomer simulations, this cannot be the only explanation. What about kinetic effects, due to the differences in initial concentrations of monomers in the two simulation approaches? The rate constants are not listed anywhere. They only seem to see large clusters at fixed concentrations for the mismatched sizes (Fig S12B), where the Ksp behavior does not hold. Can they increase monomer flexibility more and start to see bimodal at fixed concentration, or change the rates and see a bimodal distribution?

      • Related: I am surprised that the sterically hindered monomers would not form large clusters at fixed concentration, as it looks like it is impossible for them to 'zipper' up their binding sites and become trapped in dimers. Is the distribution at fixed concentrations bimodal? The data is not shown.

    4. Reviewer #1 (Public Review):

      The gist of this work is that the simple concept of a solubility product determines a threshold for phase separation, thereby enabling buffering even in systems where phase separation is driven by heterotypic interactions. The solubility product or SP is determined by the number of complementary interaction sites and the coordination number i.e., the number of bonds one can make per site.

      The work appears to be motivated by two questions: Are concentrations buffered in systems where heterotypic interactions drive phase separation thereby negating the presence of a rigorously definable saturation concentration? This question was motivated by work from Klosin et al., showing how phase separation can enable buffering of noise in transcription. They relied on the concept of a saturation concentration. In a paper that followed a few months after, Riback et al., showed that the concept of a saturation concentration ceases to exist, as defined for systems where phase separation is driven purely by homotypic interactions. This was taken to imply that the formation of multicomponent condensates via a blend of homotypic and heterotypic interactions causes a loss of buffering capacity afforded by phase separation. The second question motivating the current work is the apparent absence of a theoretical framework for "varying threshold concentrations" in systems governed by heterotypic interactions.

      Using two flavors of simulations, the authors propose that the SP sets an upper limit on the convolution of concentrations that determine phase separation. They show this via simulations where they follow the formation of clusters formed by linear multivalent macromolecules and monitor the emergence of a bimodal distribution of clusters. In 1:1 mixtures of multivalent macromolecules they find that SP sets a threshold beyond which a bimodal distribution of clusters emerges. The authors further find that SP sets an upper limit even in systems that deviate from the 1:1 stoichiometry.

      The authors proceed to show that the SP is influenced by the valence of multivalent macromolecules. They also demonstrate that short rigid linkers can cause an arrest of phase separation through a so-called "dimer trap" reminiscent of the "magic number" postulate put forth by Wingreen and colleagues.

      Is the work significant, novel, and timely? Effectively the authors propose that the driving forces for phase separation can be distilled down to the concept of a solubility product. Given prior knowledge of the valence, coordination number, and affinities can one predict concentration thresholds for phase separation? The authors suggest that this can be gleaned from either network based simulations, which are very inexpensive, or through more elaborate simulations. They further propose that it is the solubility product that sets the threshold.

      It is worth noting that the authors are quantifying what is known in the physical literature as a percolation threshold. The seminal work of Flory and Stockmayer dating back to the 1940s showed how one can calculate a percolation threshold by taking in prior knowledge of valence, coordination numbers, and affinities whilst ignoring cooperativity. These ideas have been refined and advanced in several theoretical contributions by various labs. While none of the papers in the physical literature use the concept of a solubility product, they rely on the concept of a percolation threshold because the transition to large, system-spanning clusters is a continuous one and it is debatable if this is a bona fide phase transition. Rather it is a topological transition.

      As for novelty, unfortunately the authors disregard prior work that showed how linker length impacts local vs. global cooperativity in phase transitions that combine phase separation and percolation. Ref. 23 is the work in question and it is mentioned in passing, even though the contributions here are entirely a redux.

      The concept of a solubility product, introduced here to model / understand phase behavior of multivalent macromolecules, is an interesting and potentially appealing simple description. It might make the understanding of phase transitions more accessible, but it has problems: (a) it does not define phase separation; rather it defines percolation transitions; (b) without prior knowledge of the relevant quantities, the solubility product cannot be readily inferred, even from simulations, although one can scan parameter space to arrive at predictions regarding the apparent valence and coordination numbers. (c) the solubility product does not tell us much about properties of condensates, interfaces, or the driving forces for phase transitions that are influenced by the collective effects of interaction domains / motifs and spacers.

      Finally, as for the absence of a theoretical explanation for the apparent loss of buffering in systems with heterotypic interactions, the authors would do well to see the work of Choi et al., published in PLoS Comput. Biol. in 2019. Figure 12 in that work clearly establishes that the concentrations of A and B species in the coexisting dilute phase are set by the slopes of tie lines - the lines of constant chemical potential. These slopes are set by the relative strengths of homotypic vs. heterotypic interactions, and to zeroth order, that is the physical explanation.

      Overall, the two interesting observations are that the percolation threshold can be cast as a solubility product and that this product sets an upper limit on joint concentration thresholds for phase separation, even in systems with heterotypic interactions, thereby rescuing the concept of buffering.

    5. Evaluation Summary:

      Recent experiments have raised questions regarding concentration buffering provided by the formation of multicomponent biomolecular condensates via phase separation driven by heterotypic interactions. In this work, Chattaraj et al. demonstrate that the concept of a solubility product, used to describe the solubility limits of ionic solutions, sets an upper limit on concentration thresholds, even in systems where the driving forces for phase separation are primarily heterotypic in nature. Their work suggests that the concept of a solubility product rescues the concept of buffering via phase separation.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Author Response:

      Reviewer #2 (Public Review):

      The manuscript by Li et al describes the development of styrylpyridines as cell permeant fluorescent sensors of SARM1 activity. This work is significant because SARM1 activity is increased during neuron damage and SARM1 knockout mice are protected from neuronal degeneration caused by a variety of physical and chemical insults. Thus, SARM1 is a key player in neuronal degeneration and a novel therapeutic target. SARM1 is an NAD+ hydrolase that cleaves NAD+ to form nicotinamide and ADP ribose (and to a small extent cyclic ADP ribose) via a reactive oxocarbenium intermediate. Notably, this intermediate can either react with water (hydrolysis), the adenosine ring (cyclization to cADPR), or with a pyridine containing molecule in a 'base-exchange reaction'. The styrylpyridines described by Li et al exploit this base-exchange reaction; the styrylpyridines react with the intermediate to form a fluorescent product. Notably, the best probe (PC6) can be used to monitor SARM1 activity in vitro and in cells. Upon validating the utility of PC6, the authors use this compound to perform a high throughput screen of the Approved Drug Library (L1000) from TargetMol and identify nisoldipine as a hit. Further studies revealed that a minor metabolite, dehydronitrosonisoldipine (dHNN), is the true inhibitor, acting with single digit micromolar potency. The authors provide structural and proteomic data suggesting that dHNN inhibits SARM1 activity via the covalent modification of C311 which stabilizes the enzyme in the autoinhibited state.

      We thank Reviewer #2 for the positive comments and suggestions!

      Key strengths of the manuscript include the probe design and the authors' demonstration that the probes can be used to monitor SARM1 activity in vitro in an HTS format and in cells. The identification of C311 as a potential reactive cysteine that could be targeted for drug development is an important and significant insight.

      Key weaknesses include the fact that dHNN is a highly reactive molecule and the authors note that it modifies multiple sites on the protein (they mentioned 8 but MS2 spectra for only 5 are provided). As such, the compound appears to be a non-specific alkylator that will have limited utility as a SARM1 inhibitor. Additionally, no information is provided on the proteome-wide selectivity of the compound.

      Although dHNN may react with cysteines in general, our results indicate that it specifically targets Cys311. Quantification of cysteine-containing peptides of other proteins showed no dHNN modification. We therefore conclude that dHNN shows significant specificity for Cys311 of SARM1. Some other SH-reactive agents we tested show little inhibition of SARM1. The evidence for Cys311 being dominant includes quantification of the intensity of the modified peptides, normalized to that of the corresponding total peptides (with or without modification), which shows that the modification is mainly on Cys311 (Figure 5-figure supplement 1). The dominant role of Cys311 is also confirmed by our mutagenesis and structural studies. Our results strongly suggest that C311 is a druggable site for designing allosteric inhibitors against SARM1 activation.

      dHNN is effective in inhibiting SARM1 activation and AxD in the low micromolar range, making it a useful inhibitor. Considering that the neuroprotective effect of NSDP, an approved drug, may well be due to dHNN, labeling it as an inhibitor of SARM1 serves to focus more attention on it.

      Revisions have been made in the Discussion.

      An additional key weakness is the lack of any mechanistic insights into how the adducts are generated. Moreover, it is not clear how the proposed sulphonamide and thiohydroxylamine adducts are formed.

      From the images presented, it is unclear whether there is sufficient 'density' in the cryoEM maps to accurately predict the sites of modification.

      Please refer to Fig. 5F, in which we show a close-up view of dHNN in the ARM domain. dHNN (purple) is linked to residue C311 and forms hydrophobic interactions with the surrounding residues E264, L268, R307, F308, and A315. The extra electron densities near residue C311 fit the shape of dHNN and are shown as grey mesh.

      Finally, the authors do not show whether the conversion of PC6 to PAD6 is stable or if PAD6 can also be hydrolyzed to form ADPR.

      PAD6 is stable and cannot be hydrolyzed by the activated SARM1, as shown in the following figure. The reactions contained 10 μM PAD6, 100 μM NMN, and either 2.65 μg/mL SARM1 or a blank as a control. The PAD6 fluorescence was monitored for one hour and did not change in either group.

    2. Reviewer #2 (Public Review):

      The manuscript by Li et al describes the development of styrylpyridines as cell permeant fluorescent sensors of SARM1 activity. This work is significant because SARM1 activity is increased during neuron damage and SARM1 knockout mice are protected from neuronal degeneration caused by a variety of physical and chemical insults. Thus, SARM1 is a key player in neuronal degeneration and a novel therapeutic target. SARM1 is an NAD+ hydrolase that cleaves NAD+ to form nicotinamide and ADP ribose (and to a small extent cyclic ADP ribose) via a reactive oxocarbenium intermediate. Notably, this intermediate can either react with water (hydrolysis), the adenosine ring (cyclization to cADPR), or with a pyridine containing molecule in a 'base-exchange reaction'. The styrylpyridines described by Li et al exploit this base-exchange reaction; the styrylpyridines react with the intermediate to form a fluorescent product. Notably, the best probe (PC6) can be used to monitor SARM1 activity in vitro and in cells. Upon validating the utility of PC6, the authors use this compound to perform a high throughput screen of the Approved Drug Library (L1000) from TargetMol and identify nisoldipine as a hit. Further studies revealed that a minor metabolite, dehydronitrosonisoldipine (dHNN), is the true inhibitor, acting with single digit micromolar potency. The authors provide structural and proteomic data suggesting that dHNN inhibits SARM1 activity via the covalent modification of C311 which stabilizes the enzyme in the autoinhibited state.

      Key strengths of the manuscript include the probe design and the authors' demonstration that the probes can be used to monitor SARM1 activity in vitro in an HTS format and in cells. The identification of C311 as a potential reactive cysteine that could be targeted for drug development is an important and significant insight.

      Key weaknesses include the fact that dHNN is a highly reactive molecule and the authors note that it modifies multiple sites on the protein (they mentioned 8 but MS2 spectra for only 5 are provided). As such, the compound appears to be a non-specific alkylator that will have limited utility as a SARM1 inhibitor. Additionally, no information is provided on the proteome-wide selectivity of the compound. An additional key weakness is the lack of any mechanistic insights into how the adducts are generated. Moreover, it is not clear how the proposed sulphonamide and thiohydroxylamine adducts are formed. From the images presented, it is unclear whether there is sufficient 'density' in the cryoEM maps to accurately predict the sites of modification. Finally, the authors do not show whether the conversion of PC6 to PAD6 is stable or if PAD6 can also be hydrolyzed to form ADPR.

    3. Reviewer #1 (Public Review):

      The authors aimed to develop cell-permeable small molecule probes that can monitor the activity of SARM1, an enzyme that hydrolyzes NAD+ and is thought to be important for axon degeneration. They successfully achieved this goal using the base exchange activity of SARM1 to make a donor-π-acceptor type of fluorophore. The best probe described in the manuscript is PC6. A number of experiments were carried out to rigorously test that the probe works as expected. PC6 has a number of nice features. It is cell permeable, gives a much stronger signal than any other known probe for SARM1, is specific for SARM1 and does not detect the activity of CD38 (another enzyme that has similar activity), and allows detection of endogenous SARM1 activation in neurons.

      Using PC6, the authors were able to monitor SARM1 activity in neurons treated with vincristine and demonstrated that SARM1 activation precedes axon degeneration and is important but not sufficient for axon degeneration. Most importantly, using this probe to monitor SARM1 activity, they screened a library of about 2000 drug molecules and discovered that a hypertension drug, nisoldipine, could inhibit SARM1. Surprisingly, further studies showed that a derivative of nisoldipine, dehydronitrosonisoldipine (dHNN, present in the nisoldipine compound used), is actually the inhibitor of SARM1. They then carried out nice mechanistic studies (including mass spectrometry and cryo-EM structures) showing that dHNN inhibits SARM1 by covalently modifying the Cys311 residue in the ARM domain. The dHNN binding site is similar to the previously established NAD+ inhibitory site.

      Overall, the probe is novel with many useful features, the study is rigorous and rather complete, and the conclusion is well supported. I believe the study will be important for the field and will be well received by the field.

      The only minor thing is that the writing can be further improved, especially in the introduction section.

    4. Evaluation Summary:

      SARM1, an enzyme that can convert NAD+ to ADP-ribose or cyclic ADP-ribose, is implicated in axon degeneration. This manuscript describes the development of small molecule probes that can detect the activity of SARM1 in live cells. In the course of the work, a small molecule derived from a hypertension drug was discovered as an effective SARM1 inhibitor. Although the activity probes are novel, the mechanism of SARM1 inactivation by dHNN has not been established. The probe and the inhibitor described in the manuscript could lead to future therapeutic development targeting SARM1 to treat axon degeneration.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Author Response:

      Reviewer #1 (Public Review):

      The work by Wang et al. examined how task-irrelevant, high-order rhythmic context could rescue the attentional blink effect via reorganizing items into different temporal chunks, as well as the neural correlates. In a series of behavioral experiments with several controls, they demonstrated that the detection performance of T2 was higher when occurring in different chunks from T1, compared to when T1 and T2 were in the same chunk. In EEG recordings, they further revealed that the chunk-related entrainment was significantly correlated with the behavioral effect, and the alpha-band power for T2 and its coupling to the low-frequency oscillation were also related to behavioral effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

      Overall, I find the results interesting and convincing, particularly the behavioral part. The manuscript is clearly written and the methods are sound. My major concerns are about the neural part, i.e., whether the work provides new scientific insights to our understanding of dynamic attention and its neural underpinnings.

      1) A general concern is whether the observed behavioral related neural index, e.g., alpha-band power, cross-frequency coupling, could be simply explained in terms of ERP response for T2. For example, when the ERP response for T2 is larger for between-chunk condition compared to within-chunk condition, the alpha-power for T2 would be also larger for between-chunk condition. Likewise, this might also explain the cross-frequency coupling results. The authors should do more control analyses to address the possibility, e.g., plotting the ERP response for the two conditions and regressing them out from the oscillatory index.

      Many thanks for the comment. In short, the enhancement in alpha power and cross-frequency coupling results in the between-cycle condition compared with those in the within-cycle condition cannot be accounted for by the ERP responses for T2.

      In general, the rhythmic stimulation in the AB paradigm prevents EEG signals from returning to the baseline. Therefore, we cannot observe typical ERP components purely related to individual items, except for the P1 and N1 components related to the stream onset, which reveal no difference between the two conditions and are trailed by steady-state responses (SSRs) resonating at the stimulus rate (Fig. R1).

      Fig. R1. ERPs aligned to stream onset. EEG signals were filtered between 1–30 Hz, baseline-corrected (-200 to 0 ms before stream onset) and averaged across the electrodes in left parieto-occipital area where 10-Hz alpha power showed attentional modulation effect.

      To further inspect the potential differences in the target-related ERP signals between the within- and between-cycle conditions, we plotted the target-aligned waveforms for these experimental conditions. As shown in Fig. R2, a drop of ERP amplitude occurred for both conditions around T2 onset, and the difference between these two conditions was not significant (paired t-test estimated on mean amplitude every 20 ms from 0 to 700 ms relative to T1 onset, p > .05, FDR-corrected).

      Fig. R2. ERPs aligned to T1 onset. EEG signals were filtered between 1–30 Hz, and baseline-corrected using signals -100 to 0 ms before T1 onset. The two dashed lines indicate the onset of T1 and T2, respectively.
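
      For readers unfamiliar with the correction mentioned above, the following is a minimal sketch of one common FDR procedure (Benjamini-Hochberg). It is an assumption that this particular procedure matches the one used in the analysis; the function name and the p values are hypothetical and serve only as an illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected under BH-FDR.

    Hypothetical helper for illustration; not the analysis code of the study.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Toy p values (made up), e.g. one per 20-ms time window.
toy_p = [0.001, 0.04, 0.03, 0.5]
toy_rejected = benjamini_hochberg(toy_p)
```

      In this toy case only the smallest p value survives, because each sorted p value is compared against a threshold that grows with its rank.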

      Since there is a trend of enhanced ERP response for the between-cycle relative to the within-cycle condition during the period of 0 to 100 ms after T2 onset (paired t-test on mean amplitude, p = .065, uncorrected), we then directly examined whether such post-T2 responses contribute to the behavioral attentional modulation effect and behavior-related neural indices. Crucially, we did not find any significant correlation of such T2-related ERP enhancement with the behavioral modulation index (BMI), or with the reported effects of alpha power and cross-frequency coupling (PAC). Furthermore, after controlling for the T2-related ERP responses, there still remains a significant correlation between the delta-alpha PAC and the BMI (rpartial = .596, p = .019), which is not surprising given that the PAC is calculated based on an 800-ms time window covering more pre-T2 than post-T2 periods (see the response to point #3 for details) rather than around the T2 onset. Taken together, these results clearly suggest that the T2-related ERP responses cannot explain the attentional modulation effect and the observed behavior-related neural indices.
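
      To make the idea of "controlling for" a third variable concrete, here is a minimal sketch of a first-order partial correlation computed from pairwise Pearson coefficients. It is an illustration under assumptions: the function names and the toy numbers are hypothetical, and the sketch is not claimed to be the exact procedure behind the value reported above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (made up): x and y stand in for a neural index and a
# behavioral index, z for the covariate to be controlled.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
z = [1.0, -1.0, -1.0, 1.0]
r_partial = partial_corr(x, y, z)
```

      In this toy case z is uncorrelated with both x and y, so the partial correlation equals the plain correlation (0.6); when the covariate shares variance with either variable, the two values diverge.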

      2) The alpha-band increase for T2 is indeed contradictory to the well known inhibitory function of alpha-band in attention. How could a target that is better discriminated elicit stronger inhibitory response? Related to the above point, the observed enhancement in alpha-band power and its coupling to low-frequency oscillation might derive from an enhanced ERP response for T2 target.

      Many thanks for the comment. We have briefly discussed this point in the revised manuscript (page 18, line 477).

      A widely accepted function of alpha activity in attention is that alpha oscillations suppress irrelevant visual information during spatial selection (Kelly et al., 2006; Thut et al., 2006; Worden et al., 2000). However, it becomes a controversial issue when there exists rhythmic sensory stimulation at alpha-band, just like the situation in the current study where both the visual stream and the contextual auditory rhythm were emitted at 10 Hz. In such a case, alpha-band neural responses at the stimulation frequency can be interpreted as either passively evoked steady-state responses (SSR) or actively synchronized intrinsic brain rhythms. From the former perspective (i.e., the SSR view), an increase in the amplitude or power at the stimulus frequency may indicate an enhanced attentional allocation to the stimulus stream that may result in better target detection (Janson et al., 2014; Keil et al., 2006; Müller & Hübner, 2002). Conversely, the latter view of the inhibitory function of intrinsic alpha oscillations would produce the opposite prediction. In a previous AB study, Janson and colleagues (2014) investigated this issue by separating the stimulus-evoked activity at 12 Hz (using the same power analysis method as ours) from the endogenous alpha oscillations ranging from 10.35 to 11.25 Hz (as indexed by individual alpha frequency, IAF). Interestingly, they found a dissociation between these two alpha-band neural responses, showing that the RSVP frequency power was higher in non-AB trials (T2 detected) than in AB trials (T2 undetected) while the IAF power exhibited the opposite pattern. According to these findings, the currently observed increase in alpha power for the between-cycle condition may reflect more of the stimulus-driven processes related to attentional enhancement. However, we don’t negate the effect of intrinsic alpha oscillations in our study, as the current design is not sufficient to distinguish between these two processes. 
      Also, we have to admit that “alpha power” may not be the most precise term to describe our findings of the stimulus-related results. Thus, we have specified it as “neural responses to first-order rhythms at 10 Hz” and “10-Hz alpha power” in the revised manuscript (see page 12 in the Results section and page 18 in the Discussion section).

      As for the contribution of T2-related ERP response to the observed effect of 10 Hz power and cross-frequency coupling, please refer to our response to point #1.

      References:

      Janson, J., De Vos, M., Thorne, J. D., & Kranczioch, C. (2014). Endogenous and Rapid Serial Visual Presentation-induced Alpha Band Oscillations in the Attentional Blink. Journal of Cognitive Neuroscience, 26(7), 1454–1468. https://doi.org/10.1162/jocn_a_00551

      Keil, A., Ihssen, N., & Heim, S. (2006). Early cortical facilitation for emotionally arousing targets during the attentional blink. BMC Biology, 4(1), 23. https://doi.org/10.1186/1741-7007-4-23

      Kelly, S. P., Lalor, E. C., Reilly, R. B., & Foxe, J. J. (2006). Increases in Alpha Oscillatory Power Reflect an Active Retinotopic Mechanism for Distracter Suppression During Sustained Visuospatial Attention. Journal of Neurophysiology, 95(6), 3844–3851. https://doi.org/10.1152/jn.01234.2005

      Müller, M. M., & Hübner, R. (2002). Can the Spotlight of Attention Be Shaped Like a Doughnut? Evidence From Steady-State Visual Evoked Potentials. Psychological Science, 13(2), 119–124. https://doi.org/10.1111/1467-9280.00422

      Thut, G., Nietzel, A., Brandt, S., & Pascual-Leone, A. (2006). Alpha-band electroencephalographic activity over occipital cortex indexes visuospatial attention bias and predicts visual target detection. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 26(37), 9494–9502. https://doi.org/10.1523/JNEUROSCI.0875-06.2006

      Worden, M. S., Foxe, J. J., Wang, N., & Simpson, G. V. (2000). Anticipatory Biasing of Visuospatial Attention Indexed by Retinotopically Specific α-Band Electroencephalography Increases over Occipital Cortex. Journal of Neuroscience, 20(6), RC63–RC63. https://doi.org/10.1523/JNEUROSCI.20-06-j0002.2000

      3) To support that it is the context-induced entrainment that leads to the modulation in AB effect, the authors could examine pre-T2 response, e.g., alpha-power, and cross-frequency coupling, as well as its relationship to behavioral performance. I think the pre-stimulus response might be more convincing to support the authors' claim.

      Many thanks for the insightful suggestion. We have conducted additional analyses.

      Following this suggestion, we have examined the 10-Hz alpha power within the time window of -100–0 ms before T2 onset and found stronger activity for the between-cycle condition than for the within-cycle condition. This pre-T2 response is similar to the post-T2 response except that it is more restricted to the left parieto-occipital cluster (CP3, CP5, P3, P5, PO3, PO5, POZ, O1, OZ, t(15) = 2.774, p = .007), which partially overlaps with the cluster that exhibits a delta-alpha coupling effect significantly correlated with the BMI. We have incorporated these findings into the main text (page 12, line 315) and the Fig. 5A of the revised manuscript.

      As for the coupling results reported in our manuscript, the coupling index (PAC) was calculated based on the activity during the second and third cycles (i.e., 400 to 1200 ms from stream onset) of the contextual rhythm, most of which covers the pre-T2 period as T2 always appeared in the third cycle for both conditions. Together, these results on pre-T2 10-Hz alpha power and cross-frequency coupling, as well as its relationship to behavioral performance, jointly suggest that the observed modulation effect is caused by the context-induced entrainment rather than being a by-product of post-T2 processing.

      4) About the entrainment to rhythmic context and its relation to behavioral modulation index. Previous studies (e.g., Ding et al) have demonstrated the hierarchical temporal structure in speech signals, e.g., emergence of word-level entrainment introduced by language experience. Therefore, it is well expected that imposing a second-order structure on a visual stream would elicit the corresponding steady-state response. I understand that the new part and main focus here are the AB effects. The authors should add more texts explaining how their findings contribute new understandings to the neural mechanism for the intriguing phenomena.

      Many thanks for the suggestion. We have provided more discussion in the revised manuscript (page 17, line 447).

      In brief, our study demonstrates how cortical tracking of feature-based hierarchical structure reframes the deployment of attentional resources over visual streams. This effect, distinct from the hierarchical entrainment to speech signals (Ding et al., 2016; Gross et al., 2013), does not rely on previously acquired knowledge about the structured information and can be established automatically even when the higher-order structure comes from a task-irrelevant and cross-modal contextual rhythm. On the other hand, our finding sheds fresh light on the adaptive value of the structure-based entrainment effect by expanding its role from rhythmic information (e.g., speech) perception to temporal attention deployment. To our knowledge, few studies have tackled this issue in visual or speech processing.

      References:

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

      Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., & Garrod, S. (2013). Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain. PLoS Biol, 11(12). https://doi.org/10.1371/journal.pbio.1001752

      Reviewer #2 (Public Review):

      In cognitive neuroscience, a large number of studies proposed that neural entrainment, i.e., synchronization of neural activity and low-frequency external rhythms, is a key mechanism for temporal attention. In psychology and especially in vision, attentional blink is the most established paradigm to study temporal attention. Nevertheless, as far as I know, few studies try to link neural entrainment in the cognitive neuroscience literature with attentional blink in the psychology literature. The current study, however, bridges this gap.

      The study provides new evidence for the dynamic attending theory using the attentional blink paradigm. Furthermore, it is shown that neural entrainment to the sensory rhythm, measured by EEG, is related to the attentional blink effect. The authors also show that event/chunk boundaries are not enough to modulate the attentional blink effect, and suggest that strict rhythmicity is required to modulate attention in time.

      In general, I enjoyed reading the manuscript and only have a few relatively minor concerns.

      1) Details about EEG analysis.

      First, each epoch is from -600 ms before the stimulus onset to 1600 ms after the stimulus onset. Therefore, the epoch is 2200 ms in duration. However, zero-padding is needed to make the epoch duration 2000 ms (for 0.5-Hz resolution). This is confusing. Furthermore, for a more conservative analysis, I recommend also analyzing the response between 400 ms and 1600 ms, to avoid the onset response, and showing the results in a supplementary figure. The short duration reduces the frequency resolution but still allows seeing a 2.5-Hz response.

      Thanks for the comments. Each epoch was indeed segmented from -600 to 1600 ms relative to the stimulus onset, but in the spectrum analysis, we only used EEG signals from stream onset (i.e., time point 0) to 1600 ms (see the Materials and Methods section) to investigate the oscillatory characteristics of the neural responses purely elicited by rhythmic stimuli. The 1.6-s signals were zero-padded into a 2-s duration to achieve a frequency resolution of 0.5 Hz.
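
      The relation between padded duration and frequency spacing can be sketched as follows. This is an illustration only: the sampling rate, the function name, and the test signal are assumptions, and a plain discrete Fourier transform stands in for whatever spectral routine was actually used.

```python
import cmath
import math


def zero_padded_spectrum(signal, fs, padded_duration):
    """Amplitude spectrum after zero-padding `signal` to `padded_duration` seconds.

    Hypothetical helper: a direct DFT, adequate for short illustrative signals.
    """
    n_padded = round(padded_duration * fs)
    padded = list(signal) + [0.0] * (n_padded - len(signal))
    freqs, amps = [], []
    for k in range(n_padded // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_padded)
            for n, x in enumerate(padded)
        )
        freqs.append(k * fs / n_padded)  # bin spacing is fs / n_padded = 1 / duration
        amps.append(abs(coeff))
    return freqs, amps


fs = 100.0                    # assumed sampling rate in Hz (not taken from the study)
n_samples = round(1.6 * fs)   # 1.6 s of signal from stream onset
signal = [math.sin(2 * math.pi * 2.5 * n / fs) for n in range(n_samples)]

freqs, amps = zero_padded_spectrum(signal, fs, padded_duration=2.0)
bin_spacing = freqs[1] - freqs[0]            # 0.5 Hz for a 2-s padded epoch
peak_freq = freqs[amps.index(max(amps))]     # 2.5 Hz for this test signal
```

      Note that zero-padding refines the spacing of the frequency bins (here to 0.5 Hz) by interpolating the spectrum; it does not sharpen the spectral peaks beyond what the 1.6 s of actual data allow.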

      According to the reviewer’s suggestion, we analyzed the EEG signals from 400 ms to 1600 ms relative to stream onset to avoid potential influence of the onset response, and showed the results in Figure 4. Basically, we can still observe spectral peaks at the stimulus frequencies of 2.5, 5 (the harmonic of 2.5 Hz), and 10 Hz for both power and ITPC spectrum. However, the peak magnitudes were much weaker than those of 1.6-s signals especially for 2.5 Hz, and the 2.5-Hz power did not survive the multiple comparisons correction across frequencies (FDR threshold of p < .05), which might be due to the relatively low signal-to-noise ratio for the analysis based on the 1.2-s epochs (only three cycles to estimate the activity at 2.5 Hz). Importantly, we did identify a significant cluster for 2.5 Hz ITPC in the left parieto-occipital region showing a positive correlation with the individuals’ BMI (Fig. R3; CP5, TP7, P5, P7, PO5, PO7, O1; r = .538, p = .016), which is consistent with the findings based on the longer epochs.

      Fig. R3. Neural entrainment to contextual rhythms during the period of 400–1600 ms from stream onset. (A) The spectrum for inter-trial phase coherence (ITPC) of EEG signals from 400 to 1600 ms after the stimulus onset. Shaded areas indicate standard errors of the mean. (B) The 2.5-Hz ITPC was significantly correlated with the behavioral modulation index (BMI) in a parieto-occipital cluster, as indicated by orange stars in the scalp topographic map.

      Second, "The preprocessed EEG signals were first corrected by subtracting the average activity of the entire stream for each epoch, and then averaged across trials for each condition, each participant, and each electrode." I have several concerns about this procedure.

      (A) What is the entire stream? It's the average over time?

      Yes, as for the power spectrum analysis, EEG signals were first demeaned by subtracting the average signals of the entire stream over time from onset to offset (i.e., from 0 to 1600 ms) before further analysis. We performed this procedure following previous studies on the entrainment to visual rhythms (Spaak et al., 2014). We have clarified this point in the “Power analysis” part of the Materials and Methods section (page 25, line 677).

      References:

      Spaak, E., Lange, F. P. de, & Jensen, O. (2014). Local Entrainment of Alpha Oscillations by Visual Stimuli Causes Cyclic Modulation of Perception. The Journal of Neuroscience, 34(10), 3536–3544. https://doi.org/10.1523/JNEUROSCI.4385-13.2014

      (B) I suggest doing the Fourier transform first and averaging the spectrum over participants and electrodes. Averaging the EEG waveforms requires the assumption that all electrodes/participants have the same response phase, which is not necessarily true.

      Thanks for the suggestion. In an AB paradigm, the evoked neural responses are sufficiently time-locked to the periodic stimulation, so it is reasonable to quantify power estimate with spectral decomposition performed on trial-averaged EEG signals (i.e., evoked power). Moreover, our results of inter-trial phase coherence (ITPC), which estimated the phase-locking value across trials based on single-trial decomposed phase values, also provided supporting evidence that the EEG waveforms were temporally locked across trials to the 2.5-Hz temporal structure in the context session.

      Nevertheless, we also took the reviewer’s suggestion seriously and analyzed the power spectrum on the average of single-trial spectral transforms, i.e., the induced power, which puts emphasis on the intrinsic non-phase-locked activities. In line with the results of evoked power and ITPC, the induced power spectrum in context session also peaked at 2.5 Hz and was significantly stronger than that in baseline session at 2.5 Hz (t(15) = 4.186, p < .001, FDR-corrected with a p value threshold < .001). Importantly, Pearson correlation analysis also revealed a positive cluster in the left parieto-occipital region, indicating the induced power at 2.5 Hz also had strong relevance with the attentional modulation effect (P7, PO7, PO5, PO3; r = .606, p = .006). We have added these additional findings to the revised manuscript (page 11, line 288; see also Figure 4—figure supplement 1).
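
      The distinction between the three measures discussed above can be sketched with toy signals. This is a minimal illustration under assumptions: the function names, the sampling rate, and the noise-free sinusoidal "trials" are hypothetical, and the average of single-trial power is labeled here following the usage in this response.

```python
import cmath
import math


def fourier_coeff(signal, fs, freq):
    """Length-normalised complex Fourier coefficient of `signal` at `freq` (Hz)."""
    n = len(signal)
    return sum(
        x * cmath.exp(-2j * math.pi * freq * i / fs) for i, x in enumerate(signal)
    ) / n


def spectral_measures(trials, fs, freq):
    """Return (evoked power, mean single-trial power, ITPC) at `freq`."""
    n_trials = len(trials)
    coeffs = [fourier_coeff(t, fs, freq) for t in trials]
    # Evoked power: average the waveforms across trials, then transform.
    mean_wave = [sum(t[i] for t in trials) / n_trials for i in range(len(trials[0]))]
    evoked = abs(fourier_coeff(mean_wave, fs, freq)) ** 2
    # Mean single-trial power: transform each trial, then average the power.
    single_trial = sum(abs(c) ** 2 for c in coeffs) / n_trials
    # ITPC: length of the mean unit phase vector across trials.
    itpc = abs(sum(c / abs(c) for c in coeffs)) / n_trials
    return evoked, single_trial, itpc


def make_trials(phases, fs=100.0, duration=1.6, freq=2.5):
    """Toy trials: one sinusoid per entry in `phases` (radians)."""
    n = round(duration * fs)
    return [[math.sin(2 * math.pi * freq * i / fs + p) for i in range(n)] for p in phases]


n_trials = 20
locked = make_trials([0.0] * n_trials)  # identical phase on every trial
scattered = make_trials([2 * math.pi * k / n_trials for k in range(n_trials)])

locked_measures = spectral_measures(locked, 100.0, 2.5)
scattered_measures = spectral_measures(scattered, 100.0, 2.5)
```

      With identical phases all three measures are high (ITPC of 1), whereas with phases spread evenly around the circle the evoked power and ITPC collapse toward zero while the mean single-trial power is unchanged. This is why averaging waveforms first presupposes a consistent response phase across trials.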

      2) The sequences are short, only containing 16 items and 4 cycles. Furthermore, the targets are presented in the 2nd or 3rd cycle. I suspect that a stronger effect may be observed if the sequences are longer, since attention may not entrain well to the external stimulus until after a few cycles. In the first trial of the experiment, the participants may not have a chance to realize that the task-irrelevant auditory/visual stimulus has a cyclic nature and it is not likely that their attention will entrain to such cycles. As the experiment proceeds, they learn that the stimulus is cyclic and may allocate their attention rhythmically. Therefore, I feel that the participants do not just rely on the rhythmic information within a trial but also rely on the stimulus history. Please discuss why short sequences are used and whether it is possible to see buildup of the effect over trials or over cycles within a trial.

      Thanks for the comments. Typically, to induce a classic pattern of AB effect, the RSVP stream should contain 3–7 distractors before the first target (T1), with varying lengths of distractors (0–7) between two targets and at least 2 items after the second target (T2). In our study, we created the RSVP streams following these rules, which allowed us to observe the typical AB effect that T2 performance deteriorated at Lag 2 relative to that at Lag 8. Nevertheless, we agree with the reviewer that longer streams would be better for building up the attentional entrainment effect, as we did observe that the attentional modulation effect ramped up as the stream proceeded over cycles, consistent with the reviewer’s speculation. In Experiments 1a (using auditory context) and 2a (using color-defined visual context), we adopted two sets of target positions—an early one where T2 appeared at the 6th or 8th position (in the 2nd cycle) of the visual stream, and a late one where T2 appeared at the 10th or 12th position (in the 3rd cycle) of the visual stream. In the manuscript, we reported T2 performance with all the target positions combined, as no significant interaction was found between the target positions and the experimental conditions (ps > .1). However, additional analysis demonstrated a trend toward an increase of the attentional modulation effect over cycles, from the early to the late positions. As shown in Fig. R4, the modulation effect grew stronger and reached significance for the late positions (for Experiment 1a, t(15) = 2.83, p = .013, Cohen’s d = 0.707; for Experiment 2a, t(15) = 3.656, p = .002, Cohen’s d = 0.914) but showed a weaker trend for the early positions (for Experiment 1a, t(15) = 1.049, p = .311, Cohen’s d = 0.262; for Experiment 2a, t(15) = .606, p = .553, Cohen’s d = 0.152).

      Fig. R4. Attentional modulation effect built up over cycles in Experiments 1a & 2a. Error bars represent 1 SEM; * p < 0.05, ** p < 0.01.

      However, we did not observe an obvious buildup effect across trials in our study. The modulation effect of contextual rhythms seems to emerge quickly: the effect is evident in the first quarter of trials in Experiment 1a (t(15) = 2.703, p = .016, Cohen’s d = 0.676) and in the second quarter of trials in Experiment 2a (t(15) = 2.478, p = .026, Cohen’s d = 0.620).

      3) The term "cycle" is used without definition in Results. Please define and mention that it's an abstract term and does not require the stimulus to have "cycles".

      Thanks for the suggestion. By its definition, the term “cycle” refers to “an interval of time during which a sequence of a recurring succession of events or phenomena is completed” or “a course or series of events or operations that recur regularly and usually lead back to the starting point” (Merriam-Webster dictionary). In the current study, we stuck to the recurrent and regular nature of “cycle” in general while defining the specific meaning of “cycle” by feature-based periodic changes of the contextual stimuli in each experiment (page 5, line 101; also refer to Procedures in the Materials and Methods section for details). For example, in Experiment 1a, the background tone sequence changed its pitch value from high to low or vice versa isochronously at a rate of 2.5 Hz, thus forming a rhythmic context with structure-based cycles of 400 ms. Note that we did not use the more general term “chunk”, because arbitrary chunks without the regularity of cycles are insufficient to trigger the attentional modulation effect in the current study. Indeed, the effect was eliminated when we replaced the rhythmic cycles with irregular chunks (Experiments 1d & 1e).

      4) Entrainment of attention is not necessarily related to neural entrainment to sensory stimulus, and there is considerable debate about whether neural entrainment to sensory stimulus should be called entrainment. Too much emphasis on terminology is of course counterproductive but a short discussion on these issues is probably necessary.

      Thanks for the comments. As commonly accepted, entrainment is defined as the alignment of intrinsic neuronal activity to the temporal structure of external rhythmic inputs (Lakatos et al., 2019; Obleser & Kayser, 2019). Here, we are interested in the functional roles of cortical entrainment to the higher-order temporal structure imposed on first-order sensory stimulation, and used the term entrainment to describe the phase-locking neural responses to such hierarchical structure following literature on auditory and visual perception (Brookshire et al., 2017; Doelling & Poeppel, 2015). In our study, the consistent results of power and ITPC have provided strong evidence that neural entrainment at the structure level (2.5 Hz) is significantly correlated with the observed attentional modulation effect. However, this does not mean that the entrainment of attention is necessarily associated with neural entrainment to sensory stimulus in a broader context, as attention may also be guided by predictions based on non-isochronous temporal regularity without requiring stimulus-based oscillatory entrainment (Breska & Deouell, 2017; Morillon et al., 2016).

      On the other hand, there has been a debate about whether the neural alignment to rhythmic stimulation reflects active entrainment of endogenous oscillatory processes (i.e., induced activity) or a series of passively evoked steady-state responses (Keitel et al., 2019; Notbohm et al., 2016; Zoefel et al., 2018). The latter process is also referred to as “entrainment in a broad sense” by Obleser & Kayser (2019). Given that a presented rhythm always evokes event-related potentials, a better question might be whether the observed alignment reflects the entrainment of endogenous oscillations in addition to evoked steady-state responses. Here we attempted to tackle this issue by measuring the induced power, which emphasizes the intrinsic non-phase-locked activity, in addition to the phase-locked evoked power. Specifically, we quantified these two kinds of activities with the average of single-trial EEG power spectra and the power spectra of trial-averaged EEG signals, respectively, according to Keitel et al. (2019). In addition to the observation of evoked responses to the contextual structure, we also demonstrated an attention-related neural tracking of the higher-order temporal structure based on the induced power at 2.5 Hz (see Figure 4—figure supplement 1), suggesting that the observed attentional modulation effect is at least partially derived from the entrainment of intrinsic oscillatory brain activity. We have briefly discussed this point in the revised manuscript (page 17, line 460).

      References:

      Breska, A., & Deouell, L. Y. (2017). Neural mechanisms of rhythm-based temporal prediction: Delta phase-locking reflects temporal predictability but not rhythmic entrainment. PLOS Biology, 15(2), e2001665. https://doi.org/10.1371/journal.pbio.2001665

      Brookshire, G., Lu, J., Nusbaum, H. C., Goldin-Meadow, S., & Casasanto, D. (2017). Visual cortex entrains to sign language. Proceedings of the National Academy of Sciences, 114(24), 6352–6357. https://doi.org/10.1073/pnas.1620350114

      Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences, 112(45), E6233–E6242. https://doi.org/10.1073/pnas.1508431112

      Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency bands comodulate behavior. Proceedings of the National Academy of Sciences, 111(41), 14935–14940. https://doi.org/10.1073/pnas.1408741111

      Keitel, C., Keitel, A., Benwell, C. S. Y., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-Driven Brain Rhythms within the Alpha Band: The Attentional-Modulation Conundrum. The Journal of Neuroscience, 39(16), 3119–3129. https://doi.org/10.1523/JNEUROSCI.1633-18.2019

      Lakatos, P., Gross, J., & Thut, G. (2019). A New Unifying Account of the Roles of Neuronal Entrainment. Current Biology, 29(18), R890–R905. https://doi.org/10.1016/j.cub.2019.07.075

      Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal Prediction in lieu of Periodic Stimulation. Journal of Neuroscience, 36(8), 2342–2347. https://doi.org/10.1523/JNEUROSCI.0836-15.2016

      Notbohm, A., Kurths, J., & Herrmann, C. S. (2016). Modification of Brain Oscillations via Rhythmic Light Stimulation Provides Evidence for Entrainment but Not for Superposition of Event-Related Responses. Frontiers in Human Neuroscience, 10. https://doi.org/10.3389/fnhum.2016.00010

      Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

      Zoefel, B., ten Oever, S., & Sack, A. T. (2018). The Involvement of Endogenous Neural Oscillations in the Processing of Rhythmic Input: More Than a Regular Repetition of Evoked Neural Responses. Frontiers in Neuroscience, 12. https://doi.org/10.3389/fnins.2018.00095

      Reviewer #3 (Public Review):

      The current experiment tests whether the attentional blink is affected by higher-order regularity based on rhythmic organization of contextual features (pitch, color, or motion). The results show that this is indeed the case: the AB effect is smaller when two targets appeared in two adjacent cycles (between-cycle condition) than within the same cycle defined by the background sounds. Experiment 2 shows that this also holds for temporal regularities in the visual domain and Experiment 3 for motion. Additional EEG analysis indicated that the findings obtained can be explained by cortical entrainment to the higher-order contextual structure. Critically feature-based structure of contextual rhythms at 2.5 Hz was correlated with the strength of the attentional modulation effect.

      This is an intriguing and exciting finding. It is a clever and innovative approach to reduce the attention blink by presenting a rhythmic higher-order regularity. It is convincing that this pulling out of the AB is driven by cortical entrainment. Overall, the paper is clear, well written and provides adequate control conditions. There is a lot to like about this paper. Yet, there are particular concerns that need to be addressed. Below I outline these concerns:

      1) The most pressing concern is the behavioral data. We have to ensure that we are dealing here with an attentional blink. The way the data is presented is not the typical way this is done. Typically in AB designs one sees the T2 performance when T1 is ignored relative to when T1 has to be detected. This data is not provided. I am not sure whether this data is collected but if so the reader should see this.

      Many thanks for the suggestion. We appreciate the reviewer for his/her thoughtful comments. To demonstrate the AB effect, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a, and 2b)—a short-SOA condition where T2 was located at the second lag of T1 (i.e., SOA = 200 ms), and a long-SOA condition where T2 appeared at the 8th lag of T1 (i.e., SOA = 800 ms). In a typical AB effect, T2 performance at short lags is remarkably impaired compared with that at long lags. In our study, we consistently replicated this effect across the experiments, as reported in the Results section of Experiment 1 (page 5, line 106). Overall, the T2 detection accuracy conditioned on correct T1 response was significantly impaired in the short-SOA condition relative to that in the long-SOA condition (mean accuracy > 0.9 for all experiments), during both the context session and the baseline session. More crucially, when looking into the magnitude of the AB effect as measured by (ACClong-SOA - ACCshort-SOA)/ACClong-SOA, we still obtained a significant attentional modulation effect (for Experiment 1a, t(15) = -2.729, p = .016, Cohen’s d = 0.682; for Experiment 2a, t(15) = -4.143, p <.001, Cohen’s d = 1.036) similar to that reflected by the short-SOA condition alone, further confirming that cortical entrainment effectively influences the AB effect.
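
      The AB magnitude index defined above, (ACClong-SOA - ACCshort-SOA)/ACClong-SOA, can be written out as a short sketch. The function name and the accuracy values below are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def ab_magnitude(acc_long_soa, acc_short_soa):
    """Relative drop in T2 accuracy at the short SOA, normalised by the long-SOA accuracy."""
    return (acc_long_soa - acc_short_soa) / acc_long_soa


# Made-up accuracies for two conditions of one hypothetical participant.
within_cycle = ab_magnitude(acc_long_soa=0.9, acc_short_soa=0.6)   # larger blink
between_cycle = ab_magnitude(acc_long_soa=0.9, acc_short_soa=0.72)  # smaller blink
```

      Normalising by the long-SOA accuracy expresses the blink as a proportion of each participant's ceiling performance, so a smaller value in one condition indicates a weaker AB in that condition.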

      Although we included both the long- and short-SOA conditions in the current study, we focused on T2 performance in the short-SOA condition rather than along the whole AB curve for the following reasons. Firstly, for the long-SOA conditions, the T2 performance is at ceiling level, making it an inappropriate baseline to probe the attentional modulation effect. We focused on Lag 2 because previous research has identified a robust AB effect around the second lag (Raymond et al., 1992), which provides a reasonable and sensitive baseline to probe the potential modulation effect of the contextual auditory and visual rhythms. Note that instead of using multiple lags, we varied the length of the rhythmic cycles (i.e., a cycle of 300 ms, 400 ms, and 500 ms corresponding to a rhythm frequency of 3.3 Hz, 2.5 Hz, and 2 Hz, respectively, all within the delta band), and showed that the attentional modulation effect could be generalized to these different delta-band rhythmic contexts, regardless of the absolute positions of the targets within the rhythmic cycles.

      As to the T1 performance, the overall accuracy was very high, ranging from 0.907 to 0.972, in all of our experiments. The corresponding results have been added to the Results section of the revised manuscript (page 5, line 103). Notably, we did not find T1-T2 trade-offs in most of our experiments, except in Experiment 2a where T1 performance showed a moderate decrease in the between-cycle condition relative to that in the within-cycle condition (mean ± SE: 0.888 ± 0.026 vs. 0.933 ± 0.016, respectively; t(15) = -2.217, p = .043). However, by examining the relationship between the modulation effects (i.e., the difference between the two experimental conditions) on T1 and T2, we did not find any significant correlation (p = .403), suggesting that the better performance for T2 was not simply due to the worse performance in detecting T1.

      Finally, previous studies have shown that ignoring T1 would lead to ceiling-level T2 performance (Raymond et al., 1992). Therefore, we did not include such manipulation in the current study, as in that case, it would be almost impossible for us to detect any contextual modulation effect.

      References:

      Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18(3), 849–860. https://doi.org/10.1037/0096-1523.18.3.849

      2) Also, there is only one lag tested. To ensure that we are dealing here with a true AB, I would like to see more than one lag tested. In the ideal situation, a full AB curve should be presented that includes several lags. This should be done for at least one of the experiments. It would be informative, as we could see how cortical entrainment affects the whole AB curve.

      Many thanks for the suggestion. Please refer to our response to the point #1 for “Reviewer #3 (Public Review)”. In short, we did include two T2 lag conditions in our study (Experiments 1a, 1b, 2a and 2b), and the results replicated the typical AB effect. We have clarified this point in the revised manuscript (page 5, line 106).

      3) Also, there is no data regarding T1 performance. It is important to show that the better performance for T2 is not due to worse performance in detecting T1. So please also provide this data.

      Many thanks for the suggestion. Please refer to our response to the point #1 for “Reviewer #3 (Public Review)”. We have reported the T1 performance in the revised manuscript (page 5, line 103), and the results didn’t show obvious T1-T2 trade-offs.

      4) The authors identify the oscillatory characteristics of EEG signals in response to stimulus rhythms by examining the FFT spectral peaks, subtracting the mean power of the two nearest neighboring frequencies from the power at the stimulus frequency. I am not familiar with this procedure and would like to see some justification for using this technique.

      According to previous studies (Nozaradan et al., 2011; Lenc et al., 2018), subtracting the average amplitude of neighboring frequency bins removes unrelated background noise, such as muscle activity or eye movements. If there were no EEG oscillatory responses characteristic of the stimulus rhythms, the amplitude at a given frequency bin should be similar to the average of its neighbors, and thus no significant peaks would be observed in the subtracted spectrum.
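
      The neighbor-subtraction step described above can be sketched as follows. This is a minimal illustration on a synthetic signal, not the authors' pipeline: it assumes that "nearest neighbors" means the one bin immediately below and the one immediately above each frequency, and the sampling rate, noise level, and function name `neighbor_corrected_spectrum` are hypothetical.

```python
import numpy as np


def neighbor_corrected_spectrum(amplitude: np.ndarray) -> np.ndarray:
    """Subtract from each bin the mean of its two adjacent bins.

    The first and last bins have only one neighbor and are set to NaN.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    corrected = np.full_like(amplitude, np.nan)
    corrected[1:-1] = amplitude[1:-1] - 0.5 * (amplitude[:-2] + amplitude[2:])
    return corrected


# Synthetic 2-s signal (0.5-Hz frequency resolution): a 2.5-Hz response in noise.
fs = 250.0                      # hypothetical sampling rate in Hz
n_samples = 500                 # 2 s at 250 Hz
t = np.arange(n_samples) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 2.5 * t) + 0.5 * rng.standard_normal(n_samples)

amplitude = np.abs(np.fft.rfft(signal)) / n_samples
freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)

corrected = neighbor_corrected_spectrum(amplitude)
peak_hz = freqs[np.nanargmax(corrected)]   # frequency of the largest corrected peak
```

      Broadband noise contributes roughly equally to a bin and to its neighbors, so it largely cancels in the subtraction, whereas a narrow-band response at the stimulus frequency survives as a positive peak.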

      References:

      Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

      Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the Neuronal Entrainment to Beat and Meter. The Journal of Neuroscience, 31(28), 10234–10240. https://doi.org/10.1523/JNEUROSCI.0411-11.2011

    2. Reviewer #3 (Public Review):

      The current experiment tests whether the attentional blink is affected by higher-order regularity based on rhythmic organization of contextual features (pitch, color, or motion). The results show that this is indeed the case: the AB effect is smaller when two targets appeared in two adjacent cycles (between-cycle condition) than within the same cycle defined by the background sounds. Experiment 2 shows that this also holds for temporal regularities in the visual domain and Experiment 3 for motion. Additional EEG analysis indicated that the findings obtained can be explained by cortical entrainment to the higher-order contextual structure. Critically feature-based structure of contextual rhythms at 2.5 Hz was correlated with the strength of the attentional modulation effect.

      This is an intriguing and exciting finding. It is a clever and innovative approach to reduce the attention blink by presenting a rhythmic higher-order regularity. It is convincing that this pulling out of the AB is driven by cortical entrainment. Overall, the paper is clear, well written and provides adequate control conditions. There is a lot to like about this paper. Yet, there are particular concerns that need to be addressed. Below I outline these concerns:

      1) The most pressing concern is the behavioral data. We have to ensure that we are dealing here with an attentional blink. The way the data is presented is not the typical way this is done. Typically in AB designs one sees the T2 performance when T1 is ignored relative to when T1 has to be detected. This data is not provided. I am not sure whether this data was collected, but if so the reader should see it.

      2) Also, there is only one lag tested. To ensure that we are dealing here with a true AB, I would like to see more than one lag tested. In the ideal situation, a full AB curve should be presented that includes several lags. This should be done for at least one of the experiments. It would be informative, as we could see how cortical entrainment affects the whole AB curve.

      3) Also, there is no data regarding T1 performance. It is important to show that the better performance for T2 is not due to worse performance in detecting T1. So please also provide this data.

      4) The authors identify the oscillatory characteristics of EEG signals in response to stimulus rhythms by examining the FFT spectral peaks, subtracting the mean power of the two nearest neighboring frequencies from the power at the stimulus frequency. I am not familiar with this procedure and would like to see some justification for using this technique.

    3. Reviewer #2 (Public Review):

      In cognitive neuroscience, a large number of studies proposed that neural entrainment, i.e., synchronization of neural activity and low-frequency external rhythms, is a key mechanism for temporal attention. In psychology and especially in vision, attentional blink is the most established paradigm to study temporal attention. Nevertheless, as far as I know, few studies try to link neural entrainment in the cognitive neuroscience literature with attentional blink in the psychology literature. The current study, however, bridges this gap.

      The study provides new evidence for the dynamic attending theory using the attentional blink paradigm. Furthermore, it is shown that neural entrainment to the sensory rhythm, measured by EEG, is related to the attentional blink effect. The authors also show that event/chunk boundaries are not enough to modulate the attentional blink effect, and suggest that strict rhythmicity is required to modulate attention in time.

      In general, I enjoyed reading the manuscript and only have a few relatively minor concerns.

      1) Details about EEG analysis.

      First, each epoch runs from 600 ms before the stimulus onset to 1600 ms after the stimulus onset. Therefore, the epoch is 2200 ms in duration. However, zero-padding is needed to make the epoch duration 2000 ms (for 0.5-Hz resolution). This is confusing. Furthermore, for a more conservative analysis, I recommend also analyzing the response between 400 ms and 1600 ms, to avoid the onset response, and showing the results in a supplementary figure. The shorter duration reduces the frequency resolution but still allows a 2.5-Hz response to be seen.

      Second, "The preprocessed EEG signals were first corrected by subtracting the average activity of the entire stream for each epoch, and then averaged across trials for each condition, each participant, and each electrode." I have several concerns about this procedure.

      (A) What is the entire stream? Is it the average over time?

      (B) I suggest doing the Fourier transform first and averaging the spectrum over participants and electrodes. Averaging the EEG waveforms requires the assumption that all electrodes/participants have the same response phase, which is not necessarily true.
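
      The concern in (B) can be illustrated with a toy example. The sketch below is not taken from the manuscript: it builds two noise-free 2.5-Hz "channels" in exact anti-phase (an extreme, hypothetical case) and compares the amplitude of the averaged waveform with the average of the per-channel amplitudes.

```python
import numpy as np

fs = 250.0                      # hypothetical sampling rate in Hz
n_samples = 500                 # 2 s, giving 0.5-Hz resolution
t = np.arange(n_samples) / fs
freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
bin_2p5 = int(np.argmin(np.abs(freqs - 2.5)))   # index of the 2.5-Hz bin

# Two channels with the same 2.5-Hz response but opposite phase.
phases = np.array([0.0, np.pi])
waves = np.sin(2 * np.pi * 2.5 * t[None, :] + phases[:, None])


def amplitude_spectrum(x: np.ndarray) -> np.ndarray:
    """Single-sided amplitude spectrum along the last axis."""
    return 2.0 * np.abs(np.fft.rfft(x, axis=-1)) / x.shape[-1]


# Average the waveforms first, then transform: the response cancels.
amp_of_mean_wave = amplitude_spectrum(waves.mean(axis=0))[bin_2p5]

# Transform each channel first, then average amplitudes: the response survives.
mean_of_amps = amplitude_spectrum(waves).mean(axis=0)[bin_2p5]
```

      Real data will fall between these two extremes, but the example shows why the order of averaging and transforming matters whenever response phase differs across electrodes or participants.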

      2) The sequences are short, only containing 16 items and 4 cycles. Furthermore, the targets are presented in the 2nd or 3rd cycle. I suspect that a stronger effect may be observed if the sequences are longer, since attention may not entrain well to the external stimulus until a few cycles have passed. In the first trial of the experiment, the participant may not have a chance to realize that the task-irrelevant auditory/visual stimulus has a cyclic nature, and it is not likely that their attention will entrain to such cycles. As the experiment proceeds, they learn that the stimulus is cyclic and may allocate their attention rhythmically. Therefore, I feel that the participants do not just rely on the rhythmic information within a trial but also rely on the stimulus history. Please discuss why short sequences are used and whether it is possible to see buildup of the effect over trials or over cycles within a trial.

      3) The term "cycle" is used without definition in Results. Please define and mention that it's an abstract term and does not require the stimulus to have "cycles".

      4) Entrainment of attention is not necessarily related to neural entrainment to sensory stimulus, and there is considerable debate about whether neural entrainment to sensory stimulus should be called entrainment. Too much emphasis on terminology is of course counterproductive but a short discussion on these issues is probably necessary.

    4. Reviewer #1 (Public Review):

      The work by Wang et al. examined how task-irrelevant, high-order rhythmic context could rescue the attentional blink effect via reorganizing items into different temporal chunks, as well as the neural correlates. In a series of behavioral experiments with several controls, they demonstrated that the detection performance of T2 was higher when occurring in different chunks from T1, compared to when T1 and T2 were in the same chunk. In EEG recordings, they further revealed that the chunk-related entrainment was significantly correlated with the behavioral effect, and the alpha-band power for T2 and its coupling to the low-frequency oscillation were also related to behavioral effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

      Overall, I find the results interesting and convincing, particularly the behavioral part. The manuscript is clearly written and the methods are sound. My major concerns are about the neural part, i.e., whether the work provides new scientific insights to our understanding of dynamic attention and its neural underpinnings.

      1) A general concern is whether the observed behaviorally related neural indices, e.g., alpha-band power and cross-frequency coupling, could be simply explained in terms of the ERP response for T2. For example, when the ERP response for T2 is larger for the between-chunk condition compared to the within-chunk condition, the alpha power for T2 would also be larger for the between-chunk condition. Likewise, this might also explain the cross-frequency coupling results. The authors should do more control analyses to address this possibility, e.g., plotting the ERP response for the two conditions and regressing them out from the oscillatory index.

      2) The alpha-band increase for T2 is indeed contradictory to the well known inhibitory function of alpha-band in attention. How could a target that is better discriminated elicit stronger inhibitory response? Related to the above point, the observed enhancement in alpha-band power and its coupling to low-frequency oscillation might derive from an enhanced ERP response for T2 target.

      3) To support that it is the context-induced entrainment that leads to the modulation in AB effect, the authors could examine pre-T2 response, e.g., alpha-power, and cross-frequency coupling, as well as its relationship to behavioral performance. I think the pre-stimulus response might be more convincing to support the authors' claim.

      4) About the entrainment to rhythmic context and its relation to the behavioral modulation index. Previous studies (e.g., Ding et al.) have demonstrated the hierarchical temporal structure in speech signals, e.g., emergence of word-level entrainment introduced by language experience. Therefore, it is well expected that imposing a second-order structure on a visual stream would elicit the corresponding steady-state response. I understand that the new part and main focus here are the AB effects. The authors should add more text explaining how their findings contribute new understanding of the neural mechanisms behind this intriguing phenomenon.

    5. Evaluation Summary:

      This study by Wang et al. used a series of carefully designed behavioral experiments to convincingly demonstrate that the attentional blink (AB) could be modulated by higher-order rhythmic regularity. EEG results further support the link between the elicited neural entrainment and the AB modulation effect. They propose that the rhythmic context implements a second-order temporal structure to the first-order regularities posited in dynamic attention theory.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  3. Apr 2021
    1. Reviewer #4 (Public Review):

      In this paper, the author uses an impressive comparative dataset of 172 species to investigate the relationship between intraspecific genetic diversity and census (actual) population size. They find that even when they use phylogenetic comparative methods, the relationship between neutral diversity and population size is much weaker than predicted by theory and that selection on linked sites is unlikely to explain this difference. The paper convincingly demonstrates that the paradox of variation first pointed out by Lewontin in the 70s remains paradoxical.

      This paper is exceptionally strong in multiple ways. First, it is statistically rigorous; this is particularly impressive given that the paper uses methods and data from multiple fields (genomics, macroecology, conservation biology, macroevolution). This is the most robust estimate of the relationship between diversity and population size that has been published to date. Second, it is conceptually rigorous: the paper clearly lays out the various hypotheses that have been put forth over the years for this pattern as well as the logic behind these. The author has done a great job at synthesizing some complex debates and different types of data that are potentially relevant to resolving it. Third, it is exceptionally well-written. I sincerely enjoyed reading it. Overall, I think this is a major contribution to this field and though the paper does not resolve the challenge laid down by Lewontin, I think these analyses (and curated data/computational scripts) will inspire other researchers to dig into this question.

      I do however, have some suggestions as to how this paper could be strengthened.

      First, in phylogenetic comparative methods (PCMs) there has been a persistent confusion as to what phylogenetic signal is relevant -- when applying a phylogenetic generalized linear model with a phylogenetically structured residual structure (which the author does here), one is estimating the phylogenetic structure in the errors and not the traits themselves. The comparative analyses are well done and properly interpreted, but at some points in the text, particularly when addressing Lynch's conjecture that PCMs are irrelevant for coalescent times and in the comments/analysis on the appropriateness of Brownian motion as a model of evolution, there is some conceptual slippage, and I suggest that the author take a close look and make sure their language is consistent. Strictly speaking, the PGLM approach doesn't assume that the underlying traits are purely BM -- only that the phylogenetic component of the error model is Brownian. As such, running the node-height test on both the predictors and the response variable separately -- while interesting and informative about the phylogenetic patterns in the data (including the shift points you have observed) -- isn't really a test of the assumptions of the phylogenetic regression model. It is at least theoretically plausible (if not biologically) that both Y and X have phylogenetic structure but that the estimated lambda = 0 (if, for instance, Y and X were perfectly correlated because changes in Y were only the result of changes in X). To be clear, I am fine with the PGLM analysis you've done and with the node-height test; I just don't think that the latter justifies the former.

      One note about the ancestral character reconstruction: I think it is a fine visualization and realize you didn't put too much emphasis on it but strictly speaking the ASR's were done under a constant process model and therefore they wouldn't provide evidence for (a probably very real shift) between phyla. I think it was a good idea to run the analyses on the clade specific trees (particularly given how deep and uncertain the branches dividing the phyla are) but I just don't think you could have gotten there from the ASR.

      I am not convinced that the IUCN RedList analysis helps that much here and in my view, you might consider dropping this from the main text. This is for two reasons: 1) species may be of conservation concern both because they have low abundance in general and/or that their abundance is known to have experienced a recent decline -- distinguishing these two scenarios is impossible to do with the data at hand; and 2) there is of course a huge taxonomic bias in which species are considered; I don't think we can infer anything ecologically relevant from whether a species is listed on the RedList or not (as you suggest regarding the lynx, wolverine, and Massasauga rattlesnake) except that people care about it.

      This is not really a weakness but I find it notable that recombination map length is correlated with body size. I realize this is old news but I was left really curious as to a) why such a relationship exists; and b) whether the mechanism that generates this might help explain some of the patterns you've observed. I would be keen to read a bit more discussion on this point.

    2. Reviewer #3 (Public Review):

      This study is quite directly a follow-up study of the recent work of Corbett-Detig et al (2015) and the commentary by Coop (2016) which aimed to understand the relation between population size and diversity, and the degree to which the shape of the relation could be explained by the action of linked selection. The analysis here scales up the sample size for a large-scale focus on comparative analyses of animals, and introduces the application of phylogenetic correction to control for relatedness.

      As the most comprehensive analysis of its type to date, and with the addition of phylogenetic correction, this work's strength primarily lies in confirming the conclusions laid out in the commentary by Coop, notably that linked selection is unable to fully explain the narrowness of the diversity across species with orders of magnitude variation in population sizes. Through an explicit model-fitting of the effects of linked selection, the main conclusions are essentially that Lewontin's Paradox remains unexplained. The Introduction and discussion provide a very nice accounting of the range of possible explanations. I also appreciated the connection of the population size inferences to IUCN status.

      I wasn't so convinced that the assessment of phylogenetic inertia (Lambda>0) really provides a way to assess Lynch's argument that coalescent times are too short to have a phylogenetic effect. For reasons outlined by the author in the discussion, it could well be that any phylogenetic inertia signal is due to inertia of life history traits correlated with effective population size rather than with diversity itself. The discussion raises this important point, but I think leaves us with the difficulty of really assessing how important that phylogenetic correction really is: if diversity has no direct phylogenetic non-independence, I am a bit unsure how much we have learned through this analysis alone (i.e. what is lambda telling us), without an explicit assessment of how often divergence times may actually truly be on the same order as coalescent times.

      That said, I think it's a very open question whether diversity actually has phylogenetic independence because of short split times relative to effective population sizes. The author mentions the possible effect of large Ne on causing this to be violated; but I also wondered whether many of the small Nc species are still retaining a fair bit of ancestral polymorphism, further homogenizing diversity levels.

      Overall a number of possible explanations (such as the effect of variable selected site densities, and variable recombination) were raised, and rather quickly rejected as 'unlikely to explain the qualitative patterns'. In a number of cases these statements were fairly brief, and I wondered how likely it is that a combination of these, in aggregate, COULD explain the patterns. Looking at Figure 5B, it seems like the major effect of phylogeny (or correlated life history) is also apparent for the discrepancy between observed and predicted diversity: Chordates seem to have the largest discrepancy. With that in mind, I do wonder whether some feature of genome structure in Chordates, including a combination of the effects discussed in the paper (e.g. the effects of variable recombination rates/genome size and functional densities, variation in mutation rates, etc.), could collectively account for the paradox, even though individually the author rules them out as being able to explain the 'qualitative pattern'. Could the genome structure of chordates lead to a major difference in linked selection that's unaccounted for here?

      Mei et al (2018) (American Journal of Botany, Volume 105, Issue 1, p1-124) argued that species with larger genomes have greater 'functional space', implying a greater deleterious mutation rate in species with larger genomes. This could potentially be a factor driving those Chordates with intermediate Nc values furthest below the predicted line?

    3. Reviewer #2 (Public Review):

      This manuscript presents a thorough reanalysis of estimates of genetic heterozygosity pi, its distribution among animals, and its relationship with the census population size, here estimated from organism body mass and species range. A significant phylogenetic effect on pi is uncovered, and a formal model of linked selection is shown to be insufficient to explain the so-called Lewontin's paradox.

      My first and maybe most important comment is that the introduction, discussion and overall writing of the manuscript are really excellent. This might be the most lucid, extensive, balanced overview of Lewontin's paradox and the associated literature I've ever read.

      My second comment, somehow counterbalancing the first one, is that the major point made here, that linked selection alone cannot explain Lewontin's paradox, has been made before, e.g. by Coop (2016) and Ellegren & Galtier (2016) commenting on Corbett-Detig et al (2015). The material presented here substantiates this point further, but is perhaps not a major advance per se, so that the manuscript lies somewhere between a review and research article.

      I have a few additional, more specific comments below. I think this is a great addition to the existing literature, which clarifies and synthesizes many aspects of a complex question.

      1) Phylogenetic inertia

      I am not sure I get the point of the phylogenetic inertia analysis. It seems to be intended as a response to Lynch 2011, who, responding to a criticism by Whitney & Garland, stated that the coalescence time is not inherited across the phylogeny. That quote from Lynch is mentioned several times, and as a motivation for performing this analysis. Yet the result reported here, i.e., that pi has some phylogenetic inertia, does not seem to contradict this specific statement, for at least two reasons. First, pi might have some inertia via inertia on the mutation rate, not on coalescence time. Second, pi might have some inertia because it is in part determined by traits that have some inertia, such as body mass for instance. The text insightfully discusses these aspects (l399-407), but honestly I do not think that this analysis invalidates Lynch's (somewhat trivial) point that coalescence time is not a trait that can be inherited.

      I still agree that the analysis is worth doing and publishing, but I would suggest putting less emphasis on the Garland/Lynch controversy. Also it might be fair to mention that Leffler et al (2012) and Romiguier et al. (2014) did attempt to correct for phylogenetic inertia when correlating pi to various traits, although they did not analyse the phylogenetic effect as thoroughly as it is done here.

      2) Range effect

      I was surprised to read that species range alone has a significant effect on pi. The reason is that I suspected species range varied at a shorter time scale than coalescence time - e.g. think of what ranges were 20,000 years ago, when pi was probably, I thought, very similar to current pi; maybe worth discussing?

      3) IUCN categories

      I found the result that endangered species have a lower estimated Nc and a lower pi than non-endangered species a bit trivial, knowing that large-bodied vertebrates are typically more threatened, and of more concern, than small-bodied invertebrates. What would be more relevant to conservation biology is an analysis that controls for body size, e.g., are endangered large mammals less polymorphic than non-endangered large mammals? There is a fairly large amount of literature on this topic.

      4) The Methods section (l580-581) states that map length data are available in 41 species, but figure 5A shows a relationship with 131 data points; some clarification needed here

      5) abstract line 10: "vary two orders of magnitude", word missing

    4. Reviewer #1 (Public Review):

      The standard neutral model, which is our null model for levels of genetic variation, predicts that they should be proportional to census population sizes. In reality census population sizes across metazoan species span several orders of magnitude more than the ~3 orders spanned by levels of genetic diversity. This discrepancy is referred to as Lewontin's paradox, and to resolve it would mean to explain how basic population genetic processes lead to the modest span of genetic diversity levels that we observe. This is a central question in population genetics (which is, after all, concerned with understanding patterns of genetic variation) and is of substantial general interest.

      The manuscript addresses Lewontin's paradox through three main analyses:

      1) It derives novel estimates of census population size across metazoans, which alongside previous estimates of neutral diversity levels, enables a revised quantification of the relationship between diversity levels (\pi) and census population sizes (Nc).

      2) It quantifies the relationship between \pi and Nc controlling for phylogenetic relatedness.

      3) It revisits the question of whether this relationship can be accounted for by the effects of selection at linked loci (e.g., sweeps and background selection).

      I address each of these analyses in turn.

      Novel estimation of census population sizes in metazoans: The estimates are derived by: 1) estimating the density of individuals within their range, based on body size and a previously observed linear relationship between body size and density (Damuth 1981, 1987); 2) applying a geometric algorithm (finding the minimum alpha-shape computationally, sometimes adjusting alpha manually) to geographic occurrence data to estimate the area of the range; and 3) multiplying the two.
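
      The three estimation steps summarized above can be sketched as a short calculation. This is a minimal illustration under stated assumptions, not the author's code: it assumes the body size to density relationship is linear on a log10-log10 scale, the intercept and slope are placeholder values rather than the manuscript's fitted coefficients, the range area is taken as given instead of being computed from an alpha-shape, and all function names are hypothetical.

```python
import math


def density_from_body_mass(mass_g: float,
                           intercept: float = 4.0,
                           slope: float = -0.75) -> float:
    """Population density (individuals per km^2) from body mass in grams.

    Assumes log10(density) = intercept + slope * log10(mass); the default
    coefficients are illustrative placeholders only.
    """
    return 10.0 ** (intercept + slope * math.log10(mass_g))


def census_size(mass_g: float, range_area_km2: float, **coefficients) -> float:
    """Census size as predicted density multiplied by range area."""
    return density_from_body_mass(mass_g, **coefficients) * range_area_km2


# Hypothetical species: 10 kg body mass, range of one million km^2.
example_nc = census_size(mass_g=1e4, range_area_km2=1e6)
```

      Because the two factors are multiplied, an error in either the predicted density or the range area carries over proportionally to the census estimate, which is one reason external validation of the estimates is valuable.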

      The results are sometimes surprising. For example, Drosophila melanogaster is estimated to have a population size > 10^17 (Fig. 1); if the volume of an individual is 1 mm^3, this implies a total volume > 1 km x 1 km x 100 m. Additionally, some species classified as endangered have census estimates > 10^8 (Fig. 3). The author compares his area estimates with estimates for species in the IUCN Red List (focused on endangered species) to find that they largely correlate (although this is not quantified). I think further investigation of the quality of the census size estimates is warranted. Are there other estimates of census size or biomass that can be used for validation, e.g., for species of economic and biomedical importance (e.g., herring and Anopheles)?
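
      The volume comparison above is a simple unit conversion, which can be checked as follows. The per-individual volume of 1 mm^3 is the reviewer's stated assumption; the variable names are hypothetical.

```python
MM3_PER_M3 = 1e9                 # 1 m^3 = (1000 mm)^3 = 1e9 mm^3

census_size = 1e17               # lower bound quoted for D. melanogaster
individual_volume_mm3 = 1.0      # reviewer's assumed volume per individual

# Total volume occupied by all individuals, in cubic meters.
total_volume_m3 = census_size * individual_volume_mm3 / MM3_PER_M3

# Volume of a box measuring 1 km x 1 km x 100 m, in cubic meters.
box_volume_m3 = 1_000.0 * 1_000.0 * 100.0
```

      Both quantities come to 10^8 m^3, so the stated lower bound on the total volume follows directly from the lower bound on the census size.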

      If the proposed method proves to work well, I imagine that the estimates of census size may be of broad interest in other contexts. In the context of Lewontin's paradox, it may be interesting to quantify the difference in the relationship between \pi and Nc suggested by the new estimates vs the proxies used in previous work (e.g., Leffler et al. 2012).

      Quantifying the relationship between \pi and Nc controlling for phylogenetic relatedness: I am unclear about the motivation for this analysis. As Lynch argued (and the author describes), if TMRCAs of neutral loci within a species are smaller than the split time from another species in the sample, its genetic diversity level was shaped after the split, and it could be considered an independent sample for the relationship between \pi and Nc. There may be underlying factors shaping this relationship that are not phylogenetically independent (e.g., similar life history traits) but it is unclear why that would justify down-weighting a sample. In that sense, I am not convinced by the author's argument that finding a 'phylogenetic signal' justifies the correction. Stated differently, it is not obvious what the 'true' relationship being estimated is and why relatedness biases it. One could imagine that the 'true' relationship is the one across extant species, in which case the correction is not needed (with the possible exception of species in which TMRCAs are on the same order or greater than split times). I don't know what an alternative 'true' relationship would be.

      Moreover, I am not sure how a more precise 'quantification' of the relationship between diversity and census size serves us. Regardless of corrections, it is obvious that the null provided by the standard neutral model is off by orders of magnitude. Perhaps once we have alternative explanations for this relationship then testing them may require corrections, but presumably the corrections will depend on the explanations.

      One context in which phylogenetic considerations and quantification may be relevant is the comparison of the \pi - Nc relationship among clades. Notably, one could imagine that different population genetic processes are important in different clades (e.g., due to reproductive strategy) and a comparative analysis may highlight such differences. It is less clear whether the corrections that are applied here are the relevant ones. Separating clades makes sense in this regard, but it is unclear why to correct for non-independence within a clade. Furthermore, it seems that in order to point to different processes one would like to control for the distribution of census population sizes in comparisons between clades (to the extent possible). Otherwise, one can imagine the same process shaping the relationship in different clades, but having a non-linear (in log-log scale) functional dependence on census population size (as in the case of genetic draft studied next). In this regard, I am not sure I follow the argument attributed to Gillespie (1991) and specifically how the current analysis supports it.

      In summary, I find the ideas of clade level analyses and of using phylogenetic comparative methods (PCMs) to look at census population size (and possibly diversity levels) promising. For example, as the author alludes to in the Discussion (bottom of P. 13), PCMs may be informative about the hypothesis that species with large census sizes have a greater rate of speciation. Yet I find the current analyses difficult to interpret.

      Analysis of the effects of linked selection: The author investigates whether the effect of selection at linked sites (e.g., selective sweeps and background selection) can account for the observed relationship between diversity levels and census population size. To this end, he assumes that different species have the same sweeps and background selection parameters inferred in Drosophila melanogaster, but differ in census size and genetic map length.

      As justification for using selection parameters inferred in D. melanogaster, the author argues that this is a "generous" assumption in that the effects of linked selection in this species are on the high end. One issue with this argument is that among reasons for the strong effects in D. melanogaster is its short genetic map length. This is not a substantial caveat, given that the analysis is meant as an illustration and it can be resolved by using appropriate wording. Perhaps more troubling is that the author's estimate of the reduction in diversity level in D. melanogaster is much greater than the reduction estimated in the inference that he relies on (several orders of magnitude and less than one, respectively). This discrepancy is mentioned but should probably be addressed more substantially.

      The results of the analysis are intriguing. The effects of linked selection 'shrink' the ~13 orders of magnitude of census population sizes to ~3 orders of magnitude of diversity levels. This massive effect is largely due to the genetic draft (Gillespie 2001) and to a lesser extent to the decrease in map length with increasing census size: when the census population size becomes very large (Nc~10^9) and coalescence rates due to genetic drift decrease accordingly (~1/2Nc), coalescence rates due to sweeps, which increase owing to the smaller map lengths (and would otherwise remain constant), become dominant. In hindsight this is quite intuitive and aligns with Gillespie's original argument, but this is in hindsight, and using this argument in conjunction with data, specifically with census population size and map length estimates, is novel.
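
      The saturation argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that pairwise coalescence occurs at rate 1/(2Nc) through drift plus a constant per-generation rate through sweeps, and that diversity equals 2μ times the expected coalescence time. The function name and the values of `mu` and `sweep_rate` are hypothetical, not the manuscript's estimates, and the sketch ignores the dependence of the sweep rate on map length.

```python
def expected_diversity(n_census, mu=1e-9, sweep_rate=1e-7):
    """Neutral diversity under drift plus a constant sweep-induced coalescence rate.

    Assumptions (illustrative only): pairwise coalescence occurs at rate
    1/(2*Nc) by drift plus `sweep_rate` per generation by sweeps, and
    diversity is 2 * mu * E[pairwise coalescence time].
    """
    drift_rate = 1.0 / (2.0 * n_census)
    return 2.0 * mu / (drift_rate + sweep_rate)


# Diversity grows with Nc while drift dominates, then plateaus once the
# sweep term dominates the total coalescence rate.
for exponent in (3, 6, 9, 12, 15):
    n = 10.0 ** exponent
    print(f"Nc = 1e{exponent:<2d}  pi = {expected_diversity(n):.3e}")
```

      Under these assumptions, twelve orders of magnitude in census size map onto fewer than four orders of magnitude in diversity, because diversity cannot exceed 2μ divided by the sweep rate.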

      As the author points out, the resulting relationship between diversity levels and census population sizes does not fit the data well. Notably, predicted diversity levels are too high in the intermediate range of census population sizes. Nonetheless, the author's analysis suggests that linked selection may play a much greater role than previous studies suggested (i.e., the analyses of Corbett-Detig et al. (2015) and Coop (2016) suggest that it cannot account for more than 1 order of magnitude). Maybe the poor fit is due to the importance of other factors (e.g., bottlenecks) in species with intermediate census population sizes?

      I also wonder whether the potential role of linked selection may be clearer if the different effects are shown separately, and perhaps with less reliance on the estimates from D. melanogaster. Namely, the effects of background selection can be shown for a few different values of Udel, e.g., between 0.3-3 (this range seems plausible based on many estimates). They can be shown both accounting and not accounting for the relationship between map length and census size. Similarly, the effect of sweeps can be shown for several values of corresponding parameters, and perhaps even for different models for how the number of beneficial substitutions varies with census size (see Gillespie's work to that effect). I believe that such illustrations will be fairly intuitive and less restrictive.

    5. Evaluation Summary:

      The manuscript revisits an enduring and central question in population genetics known as Lewontin's paradox: that in contrast to the prediction of the field's null model, which suggests that levels of neutral genetic diversity should be proportional to the census population size, in reality, census population sizes span several orders of magnitude more than the approximately three orders of magnitude spanned by levels of genetic diversity. The manuscript provides a nice review of previous work as well as thought-provoking novel analyses. There are also several issues that make it difficult to interpret the new results.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #4 agreed to share their names with the authors.)

    1. Reviewer #2 (Public Review):

      It is well established in diverse sensory modalities that fluctuating excitability of cortical regions is likely reflected in ongoing alpha activity in these respective areas. However, how this oscillatory activity relates to "intensities" of neural (~evoked) responses and perception following supra-threshold stimulation is not well established. Building on and extending their own previous work in the somatosensory domain (Stephani et al., 2020), the authors make this their main goal.

      To achieve their goals the authors implement a straightforward somatosensory discrimination task while recording EEG. The study builds on very high quality data as well as analysis approaches and, along with a decent sample size, allows conclusions to be drawn with respect to the aforementioned questions. Using CCA to analyse ongoing and stimulus (single-trial) evoked responses from a (for the non-invasive researcher world) well-circumscribed brain region is a clear strength when studying the inter-relationships between these brain activity features. The displayed results of the structural equation model (Figure 4) are a great summary of the main effects and an important contribution to the field. In particular, I really appreciate the inclusion of peripheral responses, which convincingly make the case that the non-trivial relationship between stimulus and perceptual intensity on the one hand and early evoked response (N20) on the other hand indeed emerges at a brain level.

      However there are also some weaknesses that need to be mentioned:

      • The main weakness of the manuscript becomes most apparent with respect to the stated impact that "The widespread belief that a larger brain response corresponds to a stronger percept of a stimulus may need to be revisited.". I am not really sure whether many cognitive neuroscientists would actually subscribe to such a simplistic relationship between evoked responses and perception; temporal differentiation (early vs late responses) and the biasing influence of prestimulus activity patterns are becoming increasingly recognized. So rather than actually changing a dominant paradigm, this work is an (excellent) contribution to a paradigm shift that is already taking place.

      • Also it should be considered that with regards to the analysis approach using CCA, the claims are mainly restricted to BA3b: i.e. while I also think that this is a strength of the current study, one should refrain from over-interpreting the results in a very generalized manner. The authors do include some "thalamus" and "late" evoked response patterns as well, however that presentation of the results is somewhat changed now as compared to the N20 (e.g. using LMEs rather than comparison of extremes; not using SEMs). The readability of results and especially the comparison of effects would profit from a more coherent approach.

      • I have some concerns whether the relationship between large alpha power and more negative N20s could be driven by more trivial factors rather than the model explanations the authors develop in the discussion. Concretely, the question is whether phase locking of large alpha power along with >30 Hz high pass filtering could produce a similar finding as shown e.g. in Figure 2c. This is an important issue, as prestimulus alpha influences the N20 amplitudes as well as the perceptual reports.

      • It is important to emphasize that the model developed is a post-hoc one, i.e. the authors do not develop already in the discussion various alternative scenario results based on different model predictions. Therefore there is no strong evidence in support of the specific one advanced in the discussion.

    2. Reviewer #1 (Public Review):

      In this study, Stephani et al. address the question of how ongoing fluctuations in neuronal excitability, as well as stimulus strength, impact the perception of above-threshold tactile stimuli and the subsequent stimulus-evoked brain activity. Specifically, pre-stimulus alpha oscillation amplitude and the N20 component of the SEP are used as a readout of cortical excitability, while signal detection theory quantities - sensitivity and criterion - derived from participant response are used as the behavioral correlates. The authors find that 1) higher prestimulus alpha amplitude is associated with a higher criterion, i.e., participants tend to rate stimuli as "weaker" regardless of the actual intensity, while there was no effect on sensitivity; 2) larger N20 amplitude (more negative) is associated with stronger stimulus intensity; 3) conditioned on actual stimulus intensity, larger N20 amplitude is associated with a higher criterion, similar to prestim alpha; 4) the above effects are confirmed using a multi-level structural equation model while also accounting for peripheral control measures; and finally 5) that the thalamic response, as measured in very early components, has no association with perceptual response and previous findings on later SEP components (N140) are reproduced in this data. The authors offer a physiological interpretation that explains the seemingly contradictory result by accounting for the recruitment level of cortical neurons and their membrane depolarization in excitable stages.
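
      For readers less familiar with the two signal detection theory quantities mentioned above, the following is a minimal sketch of the standard equal-variance Gaussian definitions. It is not the authors' analysis code. The function name, the mapping of responses to cells (a "hit" is a "strong" response to a strong stimulus, a "false alarm" is a "strong" response to a weak stimulus), and the log-linear correction are assumptions made for illustration.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from response counts.

    Assumes the equal-variance Gaussian model. A log-linear correction
    (0.5 added to each cell) keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts: few "strong" responses overall gives a positive criterion.
print(sdt_measures(hits=60, misses=40, false_alarms=5, correct_rejections=95))
```

      With this mapping, a higher criterion means fewer "strong" responses at a given sensitivity, which corresponds to the tendency to rate stimuli as "weaker" described above.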

      Overall, I find this study to be very nicely done, well-written, and with informative figures. My expertise in signal detection theory and awareness of the SEP literature are limited, and the following comments will probably reflect that. Considering that, the introduction was very concise yet informative regarding the state of the field, and nicely motivates why suprathreshold stimulation is an interesting question to investigate, and was overall just a pleasure to read. The data and analyses seem convincing in supporting the authors' conclusions. The results are indeed puzzling (in an interesting way), and while the authors provide a nicely parsimonious explanation rooted in the underlying neurophysiology, I think this study has the potential to further motivate many lines of investigation, especially considering that the majority of works done in this field looks at the effect of ongoing neural activity on the detection of near-threshold sensory stimuli (as far as I know). I have some major concerns broadly regarding the interplay between alpha oscillation and the N20 (detailed below), the rest are mostly clarifying comments/questions that I believe may help the authors improve this paper, as well as other interesting points to consider in the discussion to relate to the broader literature.

      N20 and alpha oscillation

      My main technical concern lies in the choice of decomposition filter for SEP and alpha oscillations, and the conclusions the authors draw from that. Specifically, a CCA spatial filter is optimized here for the N20 component, which is then identically applied to isolate for alpha sources, with the logic being that this procedure extracts the alpha oscillation from the same sources (e.g., L359). I have no issues (or expertise) with using the CCA filter for the SEP, but if my understanding of the authors' intent is correct, then I don't agree with the logic that using the same filter isolates alpha as well. The prestimulus alpha oscillation can have arbitrary source configurations that are different from the SEP sources, which may hypothetically have a different association with the behavioral responses when it's optimally isolated. In other words, just because one uses the same spatial filter, it does not imply that one is isolating alpha from the same source as the SEP, but rather simply projecting down to the same subspace - looking at a shadow on the same wall, if you will. To show that they are from the same sources, alpha should be isolated independently of the SEP (using CCA, ICA, or other methods), and compared against the SEP topology. If the topology is similar, then it would strengthen the authors' current claims, but ideally the same analyses (e.g., using the 1st and 5th quintile of alpha amplitude to partition the responses) are repeated using alpha derived from this procedure. Also, have the authors considered using individualized alpha filters given that alpha frequency varies across individuals? Why or why not?

      In the same vein, both alpha and N20 amplitude relate to perceptual judgement, and to each other. I believe this is nicely accounted for in the multivariate analysis using the SEM, but the analyses that partition the behavioral responses using the 20% and 80% are done separately, which means that different behavioral trials are used to compute the effect of N20 and alpha on sensitivity and criterion. While this is not necessarily an issue given that there IS a multivariate analysis, I would like to know how many of those trials overlap between the two analyses.

      At multiple points, the authors comment that the covariation of N20 and alpha amplitude in the same direction is counterintuitive (e.g., L123-125), and it wasn't clear to me why that should be the case until much later on in the paper. My naive expectation (perhaps again being unfamiliar with the field) is that alpha amplitude SHOULD be positively correlated with SEP amplitude, due to the brain being in a general state of higher variability. It was explained later in the manuscript that lower alpha amplitude and higher SEP amplitude are associated with excitability, and hence should have the opposite directions. This could be explicitly stated earlier in the introduction, as well as the expected relationship between alpha amplitude and behavior.

      Furthermore, I have a concern with the interpretation here that's rooted in the same issue as the assumption that they are from the same sources: the authors' physiological interpretation makes sense if alpha and N20 originated from the same sources, but that is not necessarily the case. In fact, the population driving the alpha oscillation could hypothetically have a modulatory effect on the (separate) population that eventually encodes the sensory representation of the stimulus, in which case the explanation the authors provide would not be wrong per se, just not applicable. A comment on this would be appreciated in the revision.

      In addition, given how closely related the investigation of these two quantities are in this specific study, I think it would be relevant to discuss the perspective that SEPs are potentially oscillation phase resets. Even though the SEP is extracted using an entirely different filter range, it could nevertheless be possible that when averaged over many trials, small alpha residues (or other low freq components) do have a contribution in the SEP. If the authors are motivated enough, a simulation study could be done to check this, but is not necessary from my point of view if there is an adequate discussion on this point.

    3. Evaluation Summary:

      Stephani et al. address the question of how ongoing fluctuations in neuronal excitability, as well as stimulus strength, impact the perception of above-threshold tactile stimuli and the subsequent stimulus-evoked brain activity. The results are puzzling in an interesting way, and while the authors provide a nicely parsimonious explanation rooted in the underlying neurophysiology, editors and reviewers think this study has the potential to further motivate many lines of investigation. This manuscript will be of interest mainly to researchers using electrophysiological methods (EEG, MEG, ECoG etc.), as the authors have produced a very high-quality EEG data-set (including uncommon peripheral measurements).

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      In the paper by Victorino et al., the authors describe the role for transcription factor HIF1a in NK cells during MCMV infection. They clearly demonstrate that HIF1a-deficiency results in impaired viral control, with a major effect visible in the impacted expansion of MCMV-specific NK cells. The paper brings novelty to the field as the role of HIF1a has not been addressed in NK cells in the course of viral infection.

      The conclusions of the paper are mostly well supported by the data; however, there are still some aspects of the study that need clarification and extension.

      i) It remains unclear what induces HIF1a expression during MCMV infection.

      ii) The authors could speculate on the mechanisms of how HIF1a promotes repression of Bim during MCMV infection?

      iii) The lack of expression of HIF1a glycolytic genes in HIF1a-deficient NK cells may not be surprising but it is very clear and convincing and supports the idea that HIF1a promotes survival of cells by promoting glycolysis. However, the study would benefit from a formal proof of this metabolic adaptation in the context of MCMV infection.

    2. Reviewer #2 (Public Review):

      In this manuscript, the authors analyzed the role of HIF1a in NK cells in a variety of settings, including viral infection. HIF1a deficient NK cells appear to be mostly functional in terms of effector functions and ability to proliferate with only subtle differences with WT NK cells. This was also observed in HIF1a deficient Ly49H+ NK cells, yet in vivo Ly49H expansion is reduced in HIF1a KO mice. Response to IL-2 demonstrates that despite similar proliferation rate NK cell numbers were reduced, indicating to the authors an NK cell survival issue. This was confirmed by measuring Bim and Bcl2, which were respectively decreased and increased. Increased cell death of HIF1a deficient NK cells during MCMV was confirmed. Mechanistically, the authors found that cell death was autophagy independent but due to an impaired glycolytic activity. The authors concluded that in the absence of HIF1a, NK cells had increased apoptosis due to abnormal glucose metabolism. Overall, the experiments are well executed and are logical and the conclusions are supported by the data presented.

    3. Reviewer #1 (Public Review):

      The manuscript by Victorino et al. describes the role of the metabolic adaptor hypoxia inducible factor-1α (HIF1α) in NK cells during viral infection. They first showed that NK cells constitutively express HIF1α and it is upregulated by murine cytomegalovirus (MCMV) infection. Using HIF1α KO mice, they provided evidence that HIF1α is dispensable for normal NK cell development, but important for NK cell dependent virus control and morbidity, NK cell number and their expansion. Although the lack of HIF1α affects the NK cell dependent virus control, it appears that HIF1α is not required for NK cell effector functions. In spite of the fact that proliferation of NK cells in HIF1α KO was not affected, their ultimate number was reduced due to the upregulation of pro-apoptotic protein Bim coupled with increased caspase activity and impaired glucose metabolism. As the authors pointed out, the data presented in this manuscript are in sharp contrast to previous findings on the role of HIF1α in NK cell responses to tumors, suggesting the impact of tumor microenvironment.

    4. Evaluation Summary:

      By using mice lacking the hypoxia inducible factor-1α (HIF1α) in NK cells, the study unravels a previously unknown function of this transcription factor in virus control by NK cells. Mechanistically, the authors provided evidence that HIF1α supports survival of NK cells through an efficient glucose metabolism required for optimal NK cell response to viral infection.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Reviewer #3 (Public Review):

      Sorrentino et al. utilise Magnetoencephalography (MEG) and diffusion MRI tractography to investigate the mapping between the structure and function of the human brain and any constraints imposed from this coupling. Their work builds upon a growing number of studies that use functional Magnetic Resonance Imaging (fMRI) to provide evidence of structure shaping neural functioning. In this case, the authors utilise the fine temporal resolution of MEG to explore the propagation of the neural signal and investigate how this can be linked to a structural connectome derived from deterministic diffusion MRI tractography. Following critical dynamics analysis pipelines, they identified neuronal avalanches in the MEG data and showed that their spread is more likely between pairs of grey matter regions with increased structural connectivity strengths, quantified by the streamline count among them. This result provides new evidence on how the structural architecture of the human brain can influence intrinsic neural dynamics and suggests a potential mechanism, based on scale invariant properties in space and time, for similar previous findings based on the slower temporal scales of fMRI.

      The analyses presented are clear and concise. They highlight an efficient and clever way to combine MEG and diffusion data, maximising the benefits of both modalities, to explore structure-function associations. The authors have tested a number of different configurations, using multiple connectome mapping pipelines, atlases, as well as a replication sample from the Human Connectome Project and the results were robust both at the individual and the group level, which is reassuring and impressive.

      Given the short report format of the manuscript, it is understandable that some additional information and results were described very briefly or omitted altogether. However, there are a few points that, I think, if discussed (even succinctly) could improve the strength of the presented evidence and increase the manuscript's impact to the field. For example:

      Given that the foundations for all subsequent functional analyses are the time bin length and the branching parameter, it would be useful to have a couple of graphs showing their relationship, i.e. a graph showing the association between bin size and σ, for a wider range of bins (in addition to 1, 3, and 5 that are reported). Is bin size 3 the only bin size for which σ = 1 and, if not, how does this affect the rest of the results (especially the transition matrix)? A second interesting graph dealing with avalanche dynamics would be to show the avalanche size distributions for a single subject and the group, for different bin lengths, highlighting whether they are following a power law, an indicator of critical dynamics, and briefly discussing their power law exponents, α.
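
      The dependence of σ on bin size requested above could be explored along the following lines. This is a sketch only: it uses one common estimator of the branching ratio (the mean ratio of activity in the second bin to the first bin of each avalanche), which may differ from the estimator used in the manuscript, and the function names and input counts are hypothetical.

```python
def rebin(active_counts, bin_size):
    """Sum per-sample counts of active regions into bins of `bin_size` samples."""
    n_bins = len(active_counts) // bin_size
    return [sum(active_counts[i * bin_size:(i + 1) * bin_size])
            for i in range(n_bins)]


def avalanches(binned):
    """Split a binned series into runs of consecutive non-empty bins."""
    runs, current = [], []
    for count in binned:
        if count > 0:
            current.append(count)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def branching_ratio(binned):
    """Mean ratio of second-bin to first-bin activity across avalanches.

    Assumption: an avalanche lasting a single bin contributes a ratio of 0.
    """
    runs = avalanches(binned)
    if not runs:
        return float("nan")
    ratios = [(run[1] / run[0]) if len(run) > 1 else 0.0 for run in runs]
    return sum(ratios) / len(ratios)


# Hypothetical per-sample counts of supra-threshold regions.
counts = [0, 1, 2, 2, 1, 0, 0, 3, 3, 0, 1, 0, 0, 2, 4, 1, 0, 0]
for bin_size in (1, 2, 3):
    print(bin_size, branching_ratio(rebin(counts, bin_size)))
```

      Plotting the printed values against bin size for real data would show whether σ = 1 is reached at a single bin size or over a range.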

      The correlation between the structural connectivity and randomised transition matrices still seems relatively high. It'd be of interest for the authors to provide a brief interpretation of this, along with a justification for keeping the spatial structure unchanged during their randomisation routine.

      As the different size of parcels in the atlases can have an effect for both structural and functional analyses, it would be of interest to know if the authors controlled for that and how.

      Given the varying SNR that the AAL parcels will have due to their location, it could be of interest to present some information about the avalanches' spatial distribution (i.e. but not limited to a whole-brain map, where each parcel's intensity could correspond to the number of times it goes supra-threshold on average). This could highlight any issues where avalanches involve some parcels more (or less) than others due to challenges in recording and localising their activity.

      In addition to the above challenges with MEG, deterministic tractography analyses also present limitations on how accurately they can describe the underlying structural connectome. i.e. issues with crossing fibres (of varying degree among parcels due to their location), spurious tracts, and invalid, non-biologically plausible connections. A brief mention of these challenges both for MEG and DWI and how they might affect and impose limitations on the manuscript's results would be beneficial.

      Finally, values in the scatter plots in Figure 2 are probably mean centered? For visualisation purposes it might be better if they were not, as it seems a bit odd to have negative values or numbers higher than 1 for structural connectivity and transition probabilities. Also, there seems to be lots of ROI pairs with 0 structural connectivity but high transition probabilities, which might justify a brief mention in the manuscript and an interpretation.

    2. Reviewer #2 (Public Review):

      In this submission Sorrentino and co. are investigating the relationship between the structural and electrophysiological functional connectome. In particular they are asking whether the white matter structure is a large contributor to the patterns of function we see, and (importantly) whether this is or is not a source-reconstruction artefact. The relationship between structure and the emergence of these functional networks is of interest to many, it has been previously shown in fMRI and I believe a lot of modelling work to match empirical observations of the electrophysiology has been previously done.

      The paper is clear in its motivations, and I believe fairly clearly reported. The simplicity of this is definitely one of the strengths of the report. Conceptually I believe this is a plausible hypothesis and of interest and (assuming the technical methods are correct) I'd say this is an elegant approach to supporting this.

    3. Reviewer #1 (Public Review):

      Sorrentino et al explore the possible link between 'neuronal avalanches' in resting MEG signal and structural connectivity in the human brain. They estimate neuronal avalanches by applying a threshold to identify large perturbations in the source reconstructed MEG data before binarising the time-series to define 'active' and 'passive' windows in each voxel. Sequences of 'active' voxels are identified starting with any region becoming active and ending when all voxels become passive. The probability of an avalanche transitioning between any two voxels in the MEG data is compared to network structure identified from diffusion imaging in the same individuals. The authors show that brain regions with a high functional transition probability are also likely to be structurally connected. Whilst the core finding is interesting, the results are undermined by a lack of controls for confounds.
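
      The transition probability described above can be sketched as follows. This is an illustrative reading of the summary, not the authors' pipeline: the input is assumed to be already thresholded and binarised, the function name is hypothetical, and the estimator simply counts how often region j is active one time step after region i.

```python
def transition_matrix(active):
    """Estimate P(region j active at t+1 | region i active at t).

    `active` is a list of time points, each a list of 0/1 flags per region.
    Rows for regions that are never active are returned as zeros.
    """
    n_regions = len(active[0])
    counts = [[0] * n_regions for _ in range(n_regions)]
    totals = [0] * n_regions
    for now, nxt in zip(active, active[1:]):
        for i in range(n_regions):
            if now[i]:
                totals[i] += 1
                for j in range(n_regions):
                    if nxt[j]:
                        counts[i][j] += 1
    return [[counts[i][j] / totals[i] if totals[i] else 0.0
             for j in range(n_regions)]
            for i in range(n_regions)]


# Hypothetical binarised data: two short avalanches over three regions.
active = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
for row in transition_matrix(active):
    print(row)
```

      The resulting matrix is what would then be compared, entry by entry, with the structural connectivity between the same pairs of regions.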

      Strengths

      This paper utilises a straightforward and intuitive analysis approach to tackle a complex question - how does functional activity spread throughout the brain? The simple thresholding in the neuronal avalanches approach avoids a number of complex steps typically associated with electrophysiology connectivity estimation such as strong filtering and complex frequency transforms. Sorrentino et al are able to show that this simple time-domain measure is able to provide an interesting overview of functional network structure. Moreover, this method naturally works to explore network structure in transient, aperiodic signals which are often overlooked in favour of an oscillatory perspective.

      The authors consider a range of analysis pipelines to show that the core results are robust to key analysis decisions. Two different parcellations and methods for computing transition probabilities are considered and the results are shown to hold when using diffusion MR data from the HCP project.

      Weaknesses

      The authors claim that these results are unlikely to be caused or affected by linear mixing or volume conduction - however this is not clear to me based on the presented information. Specifically, if a perturbation arises in one region and is mixed by volume conduction into a second region, part of its shape will be preserved but this will be at a lower overall amplitude. Therefore, as the whole perturbation shape will be scaled down in the second mixed region, it is likely that its rising edge will reach the z-score threshold at a later time than in the original signal. In this way linear mixing by volume conduction has the potential to create spurious time-lagged transitions in this analysis. Previous literature on neuronal avalanches in MEG has included extensive control analyses and discussions on linear signal mixing for this reason (10.1523/JNEUROSCI.4286-12.2013). This point is not tackled in the analysis and not clearly discussed in the paper.
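
      The mechanism described above can be illustrated with a small sketch. It assumes a hypothetical Gaussian-shaped perturbation, a leakage factor of 0.5, and a threshold that is effectively fixed in amplitude for both regions (for example because the second region's variance is dominated by its own activity). If each region were a pure scaled copy z-scored on its own, the scaling would cancel, so the sketch shows only that a lag can arise, not that it does in the authors' data.

```python
import math


def first_crossing(signal, threshold):
    """Index of the first sample whose value exceeds `threshold`, or None."""
    for index, value in enumerate(signal):
        if value > threshold:
            return index
    return None


# Hypothetical perturbation: a Gaussian bump peaking at sample 50.
source = [6.0 * math.exp(-((t - 50) ** 2) / (2 * 10.0 ** 2)) for t in range(100)]
# The same waveform leaked into a second region at half the amplitude.
leaked = [0.5 * value for value in source]

threshold = 2.5  # assumed fixed amplitude threshold applied to both regions
print(first_crossing(source, threshold), first_crossing(leaked, threshold))
```

      The leaked copy crosses the threshold later than the source even though both waveforms peak at the same sample, which would register as a transition from the first region to the second.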

      The correlation in Figure 2 B and C is interesting but is not supported by control analyses to account for confounds. For example, ROI size could potentially lead to more apparent structural connectivity and stronger MEG signal driving an apparent correlation between the modalities. The authors' conclusions would be better supported if such effects were ruled out.

      The main results are not well developed from the available data. The group level correlations are visualised and the subject-specific correlations are briefly shown but not described in detail. It is unclear which regions and connections show the highest correlations. Similarly, there is wide between subject variability in the structure<->function correlation which ranges between 0.1 and 0.35 but the analysis does not explore whether this is reproducible, neuronal variability or driven by differences in SNR.

    4. Evaluation Summary:

      The present paper addresses the relationship between the electrophysiological and the anatomical connectomes, utilising a method to describe avalanches of activity. The editors feel that this work might be pushing the limits of MEG as a modality, since it implies more spatial precision than most would assume possible, which makes the manuscript particularly interesting to M/EEG researchers. While all reviewers agree that the paper has broad interest and the method is promising, some potential concerns have however been raised that compromise the validity of the results. Most importantly: the issue of volume conduction (proximity) driving the results as opposed to anatomical connectivity, which in the worst case could render the results trivial. Other confounds, such as the size of the parcels and their SNR, would also require major review.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

    1. Reviewer #2 (Public Review):

      Certain biological structures have evolved to attain certain forms that may enhance their function. The authors suggest that the shape of a cilium can enhance its sensory function in both quiescent fluid and shear flow, and compared the extent of this enhancement in a number of representative settings. This simple yet compelling possibility has not been explored in detail previously, and is deserving of further attention from both theoretical and experimental perspectives.

      The present work is clearly a step in the right direction, proposing a quantitative framework and systematic approach to address this problem. The authors first extended the classical study by Berg and Purcell for spherical absorbers to prolate spheroids with slender aspect ratio, and compared this with a circular patch, showing the effectiveness of a cilium as a receptor. They then incorporated shear flow, showing that the cilium again outperforms a patch. Finally, they considered the case of an actively beating cilium or a motile bundle - a case which may be important for symmetry breaking in the vertebrate node.

      However, a weakness of the current set-up is that it is highly idealised. To improve the overall impact and biological relevance of this work, more careful analysis and simulations would be needed.

    2. Reviewer #1 (Public Review):

      The authors consider the effects of the cilium geometry and motility on its performance in detecting chemicals in the surrounding fluid. They begin by presenting a classic solution of the diffusion equation in an infinite fluid domain at rest, bounded internally by a single cilium. The cilium is modeled as a cylinder of finite length with a perfectly absorbing boundary. They compare the capture rate of ambient chemicals at the cilium boundary to that of an absorbing circular patch on a reflecting wall of similar surface area. The latter is another classic solution of the diffusion equation. They find that the capture rate by the cilium exceeds the capture rate by the circular patch. Then, they solve the advection-diffusion equation around the cilium numerically, assuming perfectly absorbing boundary conditions along the cilium and reflecting boundary conditions on the wall. They apply this numerical framework to cases (i) where the cilium is at rest in an external shear flow, (ii) where the cilium is actively beating, and (iii) where a bundle of hydrodynamically-interacting cilia are either at rest or actively beating. They observe an increase in capture rate when shear flows and motility are accounted for.

    3. Evaluation Summary:

      The authors consider how the geometry and motility of cilia affect their performance in detecting chemicals in the surrounding fluid. Based on a theoretical model, the authors suggest that the distinctive elongated shape of a cilium may be coupled to its sensory function. The conjectures presented in this work are likely to be of interest to a wide readership, but whether this actually applies to real biological systems requires more careful validation.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This report examines the mechanism by which the KSHV KaposinB (KapB) protein causes disassembly of processing bodies (PBs) in HUVECs. The authors show that the oncogenic transcription factor YAP is an important component in the signaling pathway of KapB of the oncogenic herpesvirus Kaposi's Sarcoma herpesvirus, which involves the host cell GTPase RhoA, leading to disassembly of processing bodies (PBs).

    1. Reviewer #2 (Public Review):

      In this manuscript, McLeod and Gandon present a thorough mathematical modeling framework to describe the evolution of multi-drug resistance (MDR) in microbial populations. By expressing the model in terms of linkage disequilibrium, the equations take on a form that makes it easier to identify the key drivers of MDR evolution and propagation. This work helps to unify and generalize previous studies and constitutes an important advance in our understanding of microbial population dynamics.

    2. Reviewer #1 (Public Review):

      In this manuscript, McLeod and Gandon propose a framework for understanding multidrug resistance (MDR) evolution in a structured population in terms of linkage disequilibrium (LD) dynamics, and apply this framework to three concrete examples of MDR evolution. I was asked to evaluate this manuscript, as well as the authors' response to comments from previous reviewers. My expertise is in epidemiological modelling of antibiotic resistance; I am not hugely familiar with population genetics.

      Overall, I think the authors address an important and interesting question, and I think the approach has the potential to generate valuable insights. I also think the authors addressed the previous reviewers' comments well. However, I have substantial concerns about the modelling framework and the interpretation of the results. In particular: i) there are some problems with the interpretation that LD arises from variation in susceptible density; ii) presenting these results as a re-interpretation and generalisation of Lehtinen et al. 2019 is incorrect; and iii) the modelling of additive transmission costs needs further thought/explanation.

      1) Interpretation of results and re-interpretation of Lehtinen et al. 2019.

      The authors present their results as a generalisation of the effect observed in Lehtinen et al. 2019. Both models show that variation in the strength of selection for resistance between populations can give rise to LD in a model of multiple resistances. In Lehtinen et al., this variation in selection is attributed to variation in clearance rate. The authors re-interpret the effect as arising from variation in susceptible density instead. This re-interpretation is incorrect: the change in how costs of resistance are modelled (additive here, multiplicative in Lehtinen et al.) changes the evolutionary dynamics, so the two models capture different evolutionary effects. (See points 2 and 3 for further discussion of additive vs multiplicative costs).

      One way to see this is to consider a simple model of single resistance as presented in Lehtinen et al. eqn 1, in which resistance is selected for when: B_r/a_r > B_s/(a_s + tau), where "B" is the transmission rate, "a" the clearance rate and tau the treatment rate. Re-arranging for tau shows how the threshold of selection for resistance depends on the strain's properties (B and a) under different assumptions about cost. With an additive cost in transmission (i.e. B_r = B_s - c), this threshold depends on both transmission rate and clearance rate, predicting LD if populations vary in either transmissibility or duration of carriage. With an additive cost in clearance, this threshold is independent of the strain's properties, predicting no LD. These are precisely the results the authors describe lines 268-277 and Figure 3.

      However, if the costs are multiplicative, this threshold depends on clearance rate only, whether costs are modelled as part of clearance or transmission rate. This is why the model in Lehtinen et al. 2019 predicts LD when populations vary in duration of carriage, even when there is no transmission cost. The authors' re-interpretation of the effect in Lehtinen et al. as arising from variation in the density of susceptibles, contingent on an explicit transmission cost, is therefore not correct. More generally, representing one model as a generalisation of the other is misleading.
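      The rearrangement described in this point can be checked numerically. The sketch below is illustrative only: it solves B_r/a_r > B_s/(a_s + tau) for tau and evaluates the threshold under four cost assumptions. The function names and the specific multiplicative forms, B_r = B_s(1 - c) and a_r = a_s(1 + c), are assumptions made for this example and are not taken from either paper.

```python
# Threshold treatment rate above which resistance is selected, obtained by
# rearranging B_r / a_r > B_s / (a_s + tau) into tau > a_r * B_s / B_r - a_s.
def tau_threshold(B_s, a_s, B_r, a_r):
    return a_r * B_s / B_r - a_s

# Hypothetical cost parametrisations (c is the cost of resistance).
def additive_transmission(B_s, a_s, c):
    return tau_threshold(B_s, a_s, B_s - c, a_s)        # = a_s * c / (B_s - c)

def additive_clearance(B_s, a_s, c):
    return tau_threshold(B_s, a_s, B_s, a_s + c)        # = c

def multiplicative_transmission(B_s, a_s, c):
    return tau_threshold(B_s, a_s, B_s * (1 - c), a_s)  # = a_s * c / (1 - c)

def multiplicative_clearance(B_s, a_s, c):
    return tau_threshold(B_s, a_s, B_s, a_s * (1 + c))  # = a_s * c

if __name__ == "__main__":
    for B_s, a_s in [(2.0, 1.0), (4.0, 1.0), (4.0, 3.0)]:
        print(B_s, a_s,
              additive_transmission(B_s, a_s, 0.5),
              additive_clearance(B_s, a_s, 0.5),
              multiplicative_transmission(B_s, a_s, 0.5),
              multiplicative_clearance(B_s, a_s, 0.5))
```

      Under these assumptions, only the additive transmission cost makes the threshold depend on B_s; both multiplicative forms depend on a_s alone, and the additive clearance cost depends on neither.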

      I am also not sure about the authors' interpretation that the effects in the model with additive costs arise from variation in susceptible density. Variation in the density of susceptibles can also be generated by variation in the overall population density, so if I understand correctly, this interpretation would predict that LD would arise if the population density was different between populations? And that the selective pressure on single resistance would also depend on overall population density (argument starting at line 261)? I am not able to reproduce this dependence on population density in a simple model. I would instead interpret the effect the authors observe as arising because the same additive transmission cost is much more significant if the baseline transmission rate is low (e.g. with c = 1, a strain with B_s = 1 would never evolve resistance because B_r would be 0, which would not be the case for a strain with baseline transmission rate B_s = 3).

      The problem with the interpretation in terms of susceptible density is clear in the section on serotype dynamics. The main text refers to serotype-specific susceptibles (S^x) (line 303) and explains observed effects in terms of variation in S^x. In the supporting information however, the authors present a model of serotype dynamics which does not have serotype-specific susceptible classes and the pool of susceptibles is the same for all serotypes (eqn 43). While I absolutely agree this is a better model to study transient effects than introducing a serotype-specific susceptible class, I don't understand what the authors mean by serotype-specific susceptible density in the main text.

      2) The use of an additive transmission cost

      The use of an additive transmission cost requires further consideration/discussion. An additive transmission cost is difficult to interpret epidemiologically and can lead to implausible consequences. For example, if costs are high enough compared to baseline transmission rate, additive costs with no epistasis would lead to a negative transmission rate for the dually resistant strains, which does not make sense (say B_ab = 2 and B_Ab = B_aB = 0.5, then B_AB = -1).
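      The arithmetic in this example can be reproduced with a few lines. This is a minimal sketch with hypothetical function names; the multiplicative variant is included only for contrast and is an assumption of this illustration, not a formula from the manuscript.

```python
def dual_rate_additive(B_ab, B_Ab, B_aB):
    # No epistasis on the additive scale: the two single-resistance costs add.
    return B_ab - (B_ab - B_Ab) - (B_ab - B_aB)

def dual_rate_multiplicative(B_ab, B_Ab, B_aB):
    # Contrast case (assumed form): each resistance scales the baseline rate,
    # so the result stays non-negative whenever the inputs are non-negative.
    return B_ab * (B_Ab / B_ab) * (B_aB / B_ab)

if __name__ == "__main__":
    print(dual_rate_additive(2.0, 0.5, 0.5))        # negative transmission rate
    print(dual_rate_multiplicative(2.0, 0.5, 0.5))  # small but positive
```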

      3) Why is epistasis defined in terms of an additive rather than multiplicative expectation?

      I also have quite a basic question about the overall framework (eqn. 2). In the modelling framework, epistasis is the difference between the actual per capita growth rate of the dually-resistant infections and the expected growth rate, defined as the sum of the difference between the growth rates of the singly-resistant infections and the baseline rate. It was not obvious to me whether the expectation needs to be additive, or whether this is a question of definition (could the expectation be defined, for example, as a multiplicative rather than additive effect?). In particular, I was wondering about this in the context of the authors' suggestion that multiplicative costs are problematic because they give rise to epistasis - this seemed a little tautological to me because epistasis has been specifically defined as deviation from an additive expectation. I think a discussion about why epistasis is defined in terms of additive effects, and the implications for the derivation of the dynamics of D, would be very interesting and also helpful in making the paper more accessible.
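      The point about definition can be made concrete with a small numerical sketch. It follows one reading of the description above (epistasis as the dually-resistant rate minus an additive expectation built from the singly-resistant rates and the baseline) and compares it with a hypothetical multiplicative alternative. Neither function is the manuscript's eqn. 2; both names and the example values are assumptions.

```python
def additive_epistasis(r_ab, r_Ab, r_aB, r_AB):
    # Expected dual rate on the additive scale: baseline plus both single effects.
    expected = r_ab + (r_Ab - r_ab) + (r_aB - r_ab)
    return r_AB - expected

def multiplicative_epistasis(r_ab, r_Ab, r_aB, r_AB):
    # Hypothetical alternative: zero when r_AB / r_ab equals
    # (r_Ab / r_ab) * (r_aB / r_ab), written without division.
    return r_AB * r_ab - r_Ab * r_aB

if __name__ == "__main__":
    r0, c_A, c_B = 2.0, 0.25, 0.5   # purely multiplicative costs
    r_Ab, r_aB = r0 * (1 - c_A), r0 * (1 - c_B)
    r_AB = r0 * (1 - c_A) * (1 - c_B)
    print(additive_epistasis(r0, r_Ab, r_aB, r_AB))        # equals r0 * c_A * c_B
    print(multiplicative_epistasis(r0, r_Ab, r_aB, r_AB))  # zero
```

      With purely multiplicative costs, the additive measure returns r0 * c_A * c_B while the multiplicative measure returns zero, which is the sense in which the presence of "epistasis" depends on the chosen expectation.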

    3. Evaluation Summary:

      This paper addresses the important question of multidrug resistance evolution, which is of both theoretical and applied interest. The authors' efforts to carefully distinguish population and metapopulation linkage disequilibrium and to develop a framework to rigorously analyze the relationship between the two have promise, although we have noted concerns about the modeling framework used and the interpretation of the results. If these concerns can be sufficiently addressed, then this paper has the potential to represent a clear advance in our understanding of microbial population dynamics.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript Rao et al. describe an interesting relationship between KSR1 and the translation regulation of EPSTI1 (a regulator of EMT). They identified this relationship by polysome RNAseq of CRC cells in the context of KSR1 knockdown (KD), which they confirm by polysome QPCR. They then go on to show that KSR1 KD and add-back influence EPSTI1 expression at the protein but not mRNA level and impact cell viability, anchorage-independent growth, and possibly cell migration. They focus on the cell migration phenotype and show that it is associated with changes in EMT-related genes including E-cad and N-cad. Interestingly, add-back of EPSTI1 can reverse the phenotype elicited by KSR1 deletion. Overall, this story is interesting and translation regulation by KSR1 has not been described previously. However, Rao et al. do not provide a mechanism for how KSR1 regulates the translation of EPSTI1, and it is unclear whether this occurs through eIF4E, as the authors suggest.

      We agree completely that our observation that KSR1-dependent ERK regulation of EPSTI1 promotes an EMT-like phenotype raises new questions regarding how the translation of EPSTI1 mRNA is regulated. An additional intriguing question that arises from our work is how this relatively nondescript protein enhances the E- to N-cadherin switch in colon cancer cells. Multiple possibilities (e.g., altered RNA processing or ribosome heterogeneity) may mediate ERK-dependent regulation of EPSTI1 translation and induction of the cadherin switch. RNA-binding proteins affect discrete cell behaviors, including motility and invasion, by selectively regulating pre-mRNA splicing, mRNA stability, and localization. However, it is hard to imagine a general mechanism involving ERK-mediated regulation of 4E-BP1 and eIF4E, which would affect global mRNA translation, as responsible for a selective effect on EPSTI1 mRNA translation and discrete components of EMT-like behavior. Indeed, while KSR1 disruption and ERK inhibition potently suppressed EPSTI1 translation, robust inhibition of mTOR signaling had little effect on EPSTI1. Further development of the detailed cellular mechanisms and critical regulators mediating translation-dependent EMT-like behavior should now be possible.

      Reviewer #2 (Public Review):

      KSR1 functions as a critical rheostat to fine-tune MAPK signalling, and identifying modes by which its over-expression promotes tumor progression is clinically important and potentially druggable. Ras is highly mutated in CRC and unfortunately inhibitors of Ras have been challenging to develop. However, small molecules which stabilize an inactive form of the KSR are actively being developed in an attempt to repress RAS signaling. Thus, this study, which seeks to identify how KSR1 promotes oncogenic mRNA translation, is potentially highly clinically relevant, as it may identify novel druggable targets.

      In this manuscript the authors performed polysome profiling in colorectal cancer (CRC) cells and proposed that KSR1 and ERK regulate the translation of EPSTI1 mRNA. They go on to characterize the phenotypes associated with knock-down or knock-out of KSR1 in CRC, and show that their defects in invasion, anchorage-independent growth and switch to a less EMT-like phenotype are all EPSTI1-dependent.

      The authors succeeded in providing ample in vitro data that KSR1 and EPSTI1 are potential therapeutic targets in CRC. However, the data demonstrating that KSR1 and ERK regulate EPSTI1 mRNA translation are tenuous. Although the authors state that "EPSTI1 is necessary and sufficient for EMT in CRC cells", the data presented are consistent with a more restrained conclusion of a partial-EMT and not EMT per se. Finally, without an in vivo model it is difficult to glean novel insight into the mechanism by which KSR1 and/or EPSTI1 control the invasive and metastatic behaviour of cells.

      We greatly appreciate your comments and are excited about the implications of KSR1-EPSTI1 signaling in promoting the EMT-like phenotype in colon cancer cell lines. We have corrected the use of the term ‘EMT’ to ‘EMT-like phenotype’ within the text of the manuscript. We recognize the limitations of using only in vitro data to demonstrate the role of KSR1 and EPSTI1 in promoting motility and invasion in colon cancer cells. In vivo studies will be invaluable to our future efforts to determine the extent to which EPSTI1 promotes metastatic behavior in colon tumors.

      Reviewer #3 (Public Review):

      It is established that Kinase suppressor of Ras 1 (KSR1) contributes to the oncogenic actions of Ras by promoting ERK activation. However, the downstream actions of this pathway are poorly understood. Here Rao et al. demonstrate that this KSR1-dependent pathway increases translation of Epithelial-Stromal Interaction-1 (EPSTI1) mRNA and expression of EPSTI1 protein. This is significant because EPSTI1 drives aspects of EMT, including expression of ZEB1, SLUG, and N-Cadherin. The analysis is thorough and includes both loss-of-function and gain-of-function studies. Overall, the conclusions of this study are convincing and advance our understanding of cancer development.

      We appreciate the positive feedback, and we are excited about the implications of our findings on the translational regulation of EPSTI1 by KSR1.

    1. Reviewer #3 (Public Review):

      Meier et al. used electroencephalography (EEG) to test the mechanism underlying a well-known phenomenon where stress induces subjects to behave in a more habitual way during decision-making, as opposed to using a more deliberative goal-directed strategy. The authors tested two groups of human subjects who were randomly assigned to a stress manipulation or a similar control manipulation. These participants then carried out a reinforcement learning task where they had to choose between two alternative responses to a stimulus. On some blocks the value of one response would be 'devalued' such that the alternative action would be more appropriate. Participants who went through the stress manipulation were more likely to persist with an action that previously yielded a high reward outcome even when this response had been devalued - indicative of a failure in goal-directed decision-making. Critically, the authors associated responses and outcomes with stimuli that were decodable from EEG signals, making it possible to evaluate whether participants were prospectively considering the correct response or outcome prior to committing a response or receiving feedback. Meier et al. find that, over time, the stressed participants came to prospectively represent the coming response more and the outcome less, while the control group showed reduced prospective representation of the response. The degree of this change toward greater representation of responses versus outcomes across participants was also correlated with a more habit-based decision strategy in devaluation trials.

      Overall, this is a well-designed and sophisticated study that makes an important contribution to our understanding of the mechanism by which stress promotes more habit-like behavior, with broad implications for our understanding of how maladaptive behaviors might be formed in many clinical conditions. The conclusions are well supported by the data and confidence in the results is bolstered by several additional control measurements. However, I would have appreciated more effort to link this work to other related literature, as well as some more detail in some parts of the methods and additional control analyses to rule out alternative explanations for some of the main results of interest.

    2. Reviewer #2 (Public Review):

      A number of psychological states and traits have been demonstrated to render behavior under goal-directed or habitual control, stress being one of them. In this paper, using electroencephalography, the authors investigated the neural representations of stimulus, responses and outcomes in a task whose aim was to distinguish between the two types of behavioral control. By training a classifier to distinguish between neural signals related to the representations of instrumental responses and the outcomes produced by those responses, the authors found that during the last block of the experiment (after more extended training in the task), signals for outcome representations were weaker and response representations stronger in a stress-induced group compared to a control group. This is consistent with the idea that habits are performed when there is a stronger link between stimuli and responses that does not require a representation of the outcomes that follow from behavior. Although the methods of this paper are sound and the idea interesting and relevant for the current state of the art in habit research, it is not clear if the underlying theoretical contribution it should motivate is supported by the data produced by the experimental design employed by the authors.

    3. Reviewer #1 (Public Review):

      The authors used EEG-based multivariate pattern analysis and acute stress induction to assess the neural representations mediating a previously demonstrated influence of stress on the balance between goal-directed and habitual responding. They found that stress reduced neural outcome representations and enhanced response representations - results that are consistent with associative structures thought to mediate goal-directed and habitual response strategies, respectively. The study addresses an important and open question, and the combination of clinical, neural and behavioral assays is appealing. However, the interpretability, and thus impact, is threatened by an apparent lack of temporal synchrony between relevant measures, and by the potential effects of social feedback.

      Specifically, it is hard to understand how neural and behavioral devaluation differences between groups can be stress related given that they emerge at a point when differences in stress measures (e.g., cortisol) are no longer present. It seems more likely that, at the time when devaluation insensitivity became more pronounced in the stress group, this group was being released from stress, perhaps experiencing corollary fatigue or buoyancy.

      Another concern is that it is unclear whether the "Error" feedback screen was being employed during devaluation blocks. This is important, because most human psychology experiments use accuracy as the only incentive, and it appears to be a pretty effective motivator. Given that participants in the stress group had just been subjected to an aversive social stressor, they might have found the socially relevant error feedback more painful than the relatively minor response cost.

    4. Evaluation Summary:

      The authors used EEG-based multivariate pattern analysis and acute stress induction to assess the neural representations mediating a previously demonstrated influence of stress on the balance between goal-directed and habitual responding. While the results should be of interest to a wide range of neuroscientists, the temporal alignment of clinical, behavioral, and neural measures somewhat obscures the underlying causal mechanisms.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Author Response:

      Evaluation Summary:

      Since DBS of the habenula is a new treatment, these are the first data of its kind and potentially of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap. This paper is of interest to neuroscientists studying emotions and clinicians treating psychiatric disorders. Specifically the paper shows that the habenula is involved in processing of negative emotions and that it is synchronized to the prefrontal cortex in the theta band. These are important insights into the electrophysiology of emotion processing in the human brain.

      The authors are very grateful for the reviewers’ positive comments on our study. We also thank all the reviewers for the comments, which have helped to improve the manuscript.

      Reviewer #1 (Public Review):

      The study by Huang et al. reports on direct recordings (using DBS electrodes) from the human habenula in conjunction with MEG recordings in 9 patients. Participants were shown emotional pictures. The key finding was a transient increase in theta/alpha activity with negative compared to positive stimuli. Furthermore, there was a later increase in oscillatory coupling in the same band. These are important data, as there are few reports of direct recordings from the habenula together with MEG in humans performing cognitive tasks. The findings do provide novel insight into the network dynamics associated with the processing of emotional stimuli and, in particular, the role of the habenula.

      Recommendations:

      How can we be sure that the recordings from the habenula are not contaminated by volume conduction, i.e. signals from neighbouring regions? I do understand that bipolar signals were considered for the DBS electrode leads. However, high-frequency power (gamma band and up) is often associated with spiking/MUA and considered less prone to volume conduction. I propose to also investigate the high-frequency gamma band activity recorded from the bipolar DBS electrodes and relate it to the emotional faces. This will provide more certainty that the measured activity indeed stems from the habenula.

      We thank the reviewer for the comment. As the reviewer pointed out, bipolar macroelectrode recordings can detect locally generated potentials, as demonstrated in the case of recordings from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). However, considering the size of the habenula and the size of the DBS electrode contacts, we have to acknowledge that we cannot completely exclude the possibility that the recordings are contaminated by volume conduction of activities from neighbouring areas, as shown in Bertone-Cueto et al. 2019. We have now added extra information about the size of the habenula and acknowledged the potential contamination of activities from neighbouring areas through volume conduction in the ‘Limitation’:

      "Another caveat we would like to acknowledge is that the human habenula is a small region. Existing data from structural MRI scans reported combined habenula (the sum of the left and right hemispheres) volumes of ~30–36 mm³ (Savitz et al., 2011a; Savitz et al., 2011b), which means each habenula measures 2–3 mm in each dimension, which may be even smaller than the standard functional MRI voxel size (Lawson et al., 2013). The size of the habenula is also small relative to the standard DBS electrodes (as shown in Fig. 2A). The electrodes used in this study (Medtronic 3389) have an electrode diameter of 1.27 mm, a contact length of 1.5 mm, and a contact spacing of 0.5 mm. We have tried different ways to confirm the location of the electrode and to select the contacts that are within or closest to the habenula: 1.) the MRI was co-registered with a CT image (General Electric, Waukesha, WI, USA) with the Leksell stereotactic frame to obtain the coordinate values of the tip of the electrode; 2.) post-operative CT was co-registered to pre-operative T1 MRI using a two-stage linear registration in Lead-DBS software. We used bipolar signals constructed from neighbouring macroelectrode recordings, which have been shown to detect locally generated potentials from the subthalamic nucleus, especially when the macroelectrodes are inside the subthalamic nucleus (Marmor et al., 2017). Considering that not all contacts for bipolar LFP construction are in the habenula in this study, as shown in Fig. 2, we cannot exclude the possibility that the activities we measured are contaminated by activities from neighbouring areas through volume conduction. In particular, the human habenula is surrounded by thalamus and adjacent to the posterior end of the medial dorsal thalamus, so we may have captured activities from the medial dorsal thalamus.
      However, we also showed that those bipolar LFPs from contacts in the habenula tend to have a peak in the theta/alpha band in the power spectral density (PSD), whereas recordings from contacts outside the habenula tend to have an extra peak in the beta frequency band in the PSD. This supports the habenula origin of the emotional valence related changes in the theta/alpha activities reported here."

      We have also looked at gamma band oscillations and high frequency activities in the recordings. However, we didn’t observe any peak in the high frequency band in the average power spectral density, or any consistent difference in the high frequency activities induced by the emotional stimuli (Fig. S1). We suspect that high frequency activities related to MUA/spiking are very local and have very small amplitude, so they are not picked up by bipolar LFPs measured from contacts whose contact area and between-contact spacing are both large relative to the size of the habenula.

      Figure S1. (A) Power spectral density of habenula LFPs across all time periods when emotional stimuli were presented. The bold blue line and shadowed region indicate the mean ± SEM across all recorded hemispheres and the thin grey lines show measurements from individual hemispheres. (B) Time-frequency representations of the power response relative to pre-stimulus baseline for different conditions showing habenula gamma and high frequency activity are not modulated by emotional

      References:

      Savitz JB, Bonne O, Nugent AC, Vythilingam M, Bogers W, Charney DS, et al. Habenula volume in post-traumatic stress disorder measured with high-resolution MRI. Biology of Mood & Anxiety Disorders 2011a; 1(1): 7.

      Savitz JB, Nugent AC, Bogers W, Roiser JP, Bain EE, Neumeister A, et al. Habenula volume in bipolar disorder and major depressive disorder: a high-resolution magnetic resonance imaging study. Biological Psychiatry 2011b; 69(4): 336-43.

      Lawson RP, Drevets WC, Roiser JP. Defining the habenula in human neuroimaging studies. NeuroImage 2013; 64: 722-7.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. Journal of Neurophysiology 2017; 117(6): 2140-51.

      Bertone-Cueto NI, Makarova J, Mosqueira A, García-Violini D, Sánchez-Peña R, Herreras O, et al. Volume-Conducted Origin of the Field Potential at the Lateral Habenula. Frontiers in Systems Neuroscience 2019; 13:78.

      Figure 3: the alpha/theta band activity is very transient and not band-limited. Why refer to this as oscillatory? Can you exclude that the TFRs of power reflect the spectral power of ERPs rather than modulations of oscillations? I propose to also calculate the ERPs and perform the TFR of power on those. This might result in a re-interpretation of the early effects in theta/alpha band.

      We agree with the reviewer that the activity increase in the first time window, at short latency after stimulus onset, is very transient and not band-limited. This raises the question of whether it is oscillatory or a transient evoked activity. We have now looked at this initial transient activity in different ways: 1.) We quantified the ERP in LFPs locked to stimulus onset for each emotional valence condition and for each habenula. We investigated whether there was a difference in the amplitude or latency of the ERP across emotional valence conditions. As shown in the following figure, there is an ERP after stimulus onset with a positive peak at 402 ± 27 ms (neutral stimuli), 407 ± 35 ms (positive stimuli), 399 ± 30 ms (negative stimuli). The following figure (Fig. 3–figure supplement 1) will be submitted as a figure supplement related to Fig. 3. However, there was no significant difference in ERP latency or amplitude caused by different emotional valence stimuli. 2.) We have quantified the pure non-phase-locked (induced only) power spectra by calculating the time-frequency power spectrogram after subtracting the ERP (the time-domain trial average) from the time-domain neural signal on each trial (Kalcher and Pfurtscheller, 1995; Cohen and Donner, 2013). This shows very similar results to those we reported in the main manuscript, as shown in Fig. 3–figure supplement 2. These further analyses show that even though there were event-related potential changes time-locked to stimulus onset, this ERP did NOT contribute to the initial broad-band activity increase at the early time window shown in plots A-C in Figure 3. The figures of the new analyses and the following text have now been added in the main text:

      "In addition, we tested whether stimuli-related habenula LFP modulations primarily reflect a modulation of oscillations, which is not phase-locked to stimulus onset, or, alternatively, if they are attributed to evoked event-related potential (ERP). We quantified the ERP for each emotional valence condition for each habenula. There was no significant difference in ERP latency or amplitude caused by different emotional valence stimuli (Fig. 3–figure supplement 1). In addition, when only considering the non phase-locked activity by removing the ERP from the time series before frequency-time decomposition, the emotional valence effect (presented in Fig. 3–figure supplement 2) is very similar to those shown in Fig.3. These additional analyses demonstrated that the emotional valence effect in the LFP signal is more likely to be driven by non-phase-locked (induced only) activity."

      Fig. 3–figure supplement 1. Event-related potential (ERP) in habenula LFP signals in different emotional valence (neutral, positive and negative) conditions. (A) Averaged ERP waveforms across patients for different conditions. (B) Peak latency and amplitude (Mean ± SEM) of the ERP components for different conditions.

      Fig. 3–figure supplement 2. Non-phase-locked activity in different emotional valence (neutral, positive and negative) conditions (N = 18). (A) Time-frequency representation of the power changes relative to pre-stimulus baseline for three conditions. Significant clusters (p < 0.05, non-parametric permutation test) are encircled with a solid black line. (B) Time-frequency representation of the power response difference between negative and positive valence stimuli, showing significantly increased activity in the theta/alpha band (5-10 Hz) at short latency (100-500 ms) and another increase in theta activity (4-7 Hz) at long latencies (2700-3300 ms) with negative stimuli (p < 0.05, non-parametric permutation test). (C) Normalized power of the activities in the theta/alpha (5-10 Hz) and theta (4-7 Hz) bands over time. Significant differences between the negative and positive valence stimuli are marked by a shadowed bar (p < 0.05, corrected for multiple comparisons).
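The ERP-subtraction step behind Fig. 3–figure supplement 2 can be sketched as follows. This is a minimal illustration in Python with NumPy and is not the code used for the analysis; the function name `remove_evoked`, the array layout (trials × samples) and the toy data are all assumptions made for the example.

```python
import numpy as np

def remove_evoked(trials):
    """Subtract the ERP (time-domain trial average) from every trial.

    trials: array of shape (n_trials, n_samples), one channel, one condition.
    Returns the residual (non-phase-locked) signal with the same shape.
    """
    trials = np.asarray(trials, dtype=float)
    erp = trials.mean(axis=0, keepdims=True)  # phase-locked (evoked) component
    return trials - erp

# Toy data (hypothetical): the same evoked deflection in every trial,
# plus a 6 Hz oscillation whose phase differs from trial to trial.
rng = np.random.default_rng(0)
t = np.arange(0.0, 1.0, 0.001)
evoked = np.exp(-((t - 0.4) ** 2) / (2 * 0.05 ** 2))
phases = rng.uniform(0.0, 2.0 * np.pi, size=(40, 1))
trials = evoked + np.sin(2.0 * np.pi * 6.0 * t + phases)

induced = remove_evoked(trials)
```

The time-frequency decomposition would then be run on each row of `induced` and averaged across trials, so that only activity that is not phase-locked to stimulus onset contributes to the resulting power.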

      References:

      Kalcher J, Pfurtscheller G. Discrimination between phase-locked and non-phase-locked event-related EEG activity. Electroencephalography and Clinical Neurophysiology 1995; 94(5): 381-4.

      Cohen MX, Donner TH. Midfrontal conflict-related theta-band power reflects neural oscillations that predict behavior. Journal of Neurophysiology 2013; 110(12): 2752-63.

      Figure 4D: can you exclude that the frontal activity is not due to saccade artifacts? Only eye blink artifacts were reduced by the ICA approach. Trials with saccades should be identified in the MEG traces and rejected prior to further analysis.

      We understand and appreciate the reviewer’s concern about the source of the activity modulations shown in Fig. 4D. We tried to minimise eye movements and saccades during the recording by presenting all figures at the centre of the screen, scaling all presented figures to a similar size, and presenting a white cross at the centre of the screen to prepare the participants for the onset of the stimuli. Despite this, participants may still have made eye movements and saccades during the recording. We used ICA to exclude the low-frequency, large-amplitude artefacts that can be related to either eye blinks or other large eye movements. However, this may not exclude artefacts related to miniature saccades. As shown in Fig. 4D, at the sensor level, the sensors with a significant difference between the negative and positive emotional valence conditions clustered around the frontal cortex, close to the eye area. However, we think this is not dominated by saccades, for the following two reasons:

      1.) The power spectrum of the saccadic spike artifact in MEG is characterized by a broadband peak in the gamma band from roughly 30 to 120 Hz (Yuval-Greenberg et al., 2008; Keren et al., 2010). In this study the activity modulation we observed in the frontal sensors are limited to the theta/alpha frequency band, so it is different from the power spectra of the saccadic spike artefact.

      2.) The source of the saccadic spike artefacts in MEG measurements tends to be localized to the region of the extraocular muscles of both eyes (Carl et al., 2012). We used beamforming source localisation to identify the source of the activity modulation reported in Fig. 4D. This beamforming analysis identified the source to be in Brodmann areas 9 and 10 (shown in Fig. 5). This excludes the possibility that the activity modulation at the sensor level reported in Fig. 4D is due to saccades. In addition, Brodmann areas 9 and 10 have previously been associated with emotional stimulus processing (Bermpohl et al., 2006), and Brodmann area 9 in the left hemisphere has also been used as the target for repetitive transcranial magnetic stimulation (rTMS) as a treatment for drug-resistant depression (Cash et al., 2020). The source localisation results, together with previous literature on the function of the identified source area, suggest that the activity modulation we observed in the frontal cortex is very likely to be related to emotional stimulus processing.

      References:

      Yuval-Greenberg S, Tomer O, Keren AS, Nelken I, Deouell LY. Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron 2008; 58(3): 429-41.

      Keren AS, Yuval-Greenberg S, Deouell LY. Saccadic spike potentials in gamma-band EEG: characterization, detection and suppression. NeuroImage 2010; 49(3): 2248-63.

      Carl C, Acik A, Konig P, Engel AK, Hipp JF. The saccadic spike artifact in MEG. NeuroImage 2012; 59(2): 1657-67.

      Bermpohl F, Pascual-Leone A, Amedi A, Merabet LB, Fregni F, Gaab N, et al. Attentional modulation of emotional stimulus processing: an fMRI study using emotional expectancy. Human Brain Mapping 2006; 27(8): 662-77.

      Cash RFH, Weigand A, Zalesky A, Siddiqi SH, Downar J, Fitzgerald PB, et al. Using Brain Imaging to Improve Spatial Targeting of Transcranial Magnetic Stimulation for Depression. Biological Psychiatry 2020.

      The coherence modulations in Fig 5 occur quite late in time compared to the power modulations in Fig 3 and 4. When discussing the results (in e.g. the abstract) it reads as if these findings are reflecting the same process. How can the two effects reflect the same process if the timing is so different?

      As the reviewer correctly pointed out, the coherence modulations occurred quite late in time compared to the initial power modulations in the frontal cortex and the habenula (Fig. 4). There was another increase in theta band activity in the habenula even later, at around 3 seconds after stimulus onset, when the emotional figure had already disappeared. An emotional response is composed of a number of factors, two of which are the initial reactivity to an emotional stimulus and the subsequent recovery once the stimulus terminates or ceases to be relevant (Schuyler et al., 2014). We think the neural effects we observed in the three different time windows may reflect different underlying processes. We have discussed this in the ‘Discussion’:

      "These activity changes at different time windows may reflect the different neuropsychological processes underlying emotion perception including identification and appraisal of emotional material, production of affective states, and autonomic response regulation and recovery (Phillips et al., 2003a). The later effects of increased theta activities in the habenula when the stimuli disappeared were also supported by other literature showing that, there can be prolonged effects of negative stimuli in the neural structure involved in emotional processing (Haas et al., 2008; Puccetti et al., 2021). In particular, greater sustained patterns of brain activity in the medial prefrontal cortex when responding to blocks of negative facial expressions was associated with higher scores of neuroticism across participants (Haas et al., 2008). Slower amygdala recovery from negative images also predicts greater trait neuroticism, lower levels of likability of a set of social stimuli (neutral faces), and declined day-to-day psychological wellbeing (Schuyler et al., 2014; Puccetti et al., 2021)."

      References:

      Schuyler BS, Kral TR, Jacquart J, Burghy CA, Weng HY, Perlman DM, et al. Temporal dynamics of emotional responding: amygdala recovery predicts emotional traits. Social Cognitive and Affective Neuroscience 2014; 9(2): 176-81.

      Phillips ML, Drevets WC, Rauch SL, Lane R. Neurobiology of emotion perception I: The neural basis of normal emotion perception. Biological Psychiatry 2003a; 54(5): 504-14.

      Haas BW, Constable RT, Canli T. Stop the sadness: Neuroticism is associated with sustained medial prefrontal cortex response to emotional facial expressions. NeuroImage 2008; 42(1): 385-92.

      Puccetti NA, Schaefer SM, van Reekum CM, Ong AD, Almeida DM, Ryff CD, et al. Linking Amygdala Persistence to Real-World Emotional Experience and Psychological Well-Being. Journal of Neuroscience 2021: JN-RM-1637-20.

      Be explicit on the degrees of freedom in the statistical tests given that one subject was excluded from some of the tests.

      We thank the reviewers for the comment. The number of samples used for each statistical analysis is stated in the title of the figures. We have now also added the degrees of freedom in the main text when parametric statistical tests such as t-tests or ANOVAs have been used. When permutation tests (which do not have any degrees of freedom associated with them) are used, we have now added the number of samples for the permutation test.

      Reviewer #2 (Public Review):

      In this study, Huang and colleagues recorded local field potentials from the lateral habenula in patients with psychiatric disorders who recently underwent surgery for deep brain stimulation (DBS). The authors combined these invasive measurements with non-invasive whole-head MEG recordings to study functional connectivity between the habenula and cortical areas. Since the lateral habenula is believed to be involved in the processing of emotions, and negative emotions in particular, the authors investigated whether brain activity in this region is related to emotional valence. They presented pictures inducing negative and positive emotions to the patients and found that theta and alpha activity in the habenula and frontal cortex increases when patients experience negative emotions. Functional connectivity between the habenula and the cortex was likewise increased in this band. The authors conclude that theta/alpha oscillations in the habenula-cortex network are involved in the processing of negative emotions in humans.

      Because DBS of the habenula is a new treatment tested in this cohort in the framework of a clinical trial, these are the first data of its kind. Accordingly, they are of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap.

      In terms of community impact, I see the strengths of this paper in basic science rather than the clinical field. The authors demonstrate the involvement of theta oscillations in the habenula-prefrontal cortex network in emotion processing in the human brain. The potential of theta oscillations to serve as a marker in closed-loop DBS, as put forward by the authors, appears less relevant to me at this stage, given that the clinical effects and side-effects of habenula DBS are not known yet.

      We thank the reviewers for the favourable comments about the implication of our study in basic science and about the value of our study in closing a knowledge gap. We agree that further studies would be required to make conclusions about the clinical effects and side-effects of habenula DBS.

      Detailed comments:

      The group-average MEG power spectrum (Fig. 4B) suggests that negative emotions lead to a sustained theta power increase and a similar effect, though possibly masked by a visual ERP, can be seen in the habenula (Fig. 3C). Yet the statistics identify brief elevations of habenula theta power at around 3s (which is very late), a brief elevation of prefrontal power at time 0 or even before (Fig. 4C) and a brief elevation of Habenula-MEG theta coherence around 1 s. It seems possible that this lack of consistency arises from a low signal-to-noise ratio. The data contain only 27 trials per condition on average and are contaminated by artifacts caused by the extension wires.

      With regard to the nature of the activity modulation with short latency after stimulus onset, namely whether it is an ERP or an oscillation, we have now investigated this. In summary, by analysing the ERP and removing the influence of the ERP from the total power spectra, we did not observe any modulation of the ERP by stimulus emotional valence, and the modulation related to emotional valence in the pure induced (non-phase-locked) power spectra was similar to what we observed in the total power shown in Fig. 3. Therefore, we argue that the theta/alpha increase with negative emotional stimuli that we observed in both the habenula and the prefrontal cortex 0-500 ms after stimulus onset is not dominated by a visual or other ERP.

      With regard to the signal-to-noise ratio from only 27 trials per condition on average per participant: We have tried to clean the data by removing the trials with obvious artefacts, characterised by time-domain measurements exceeding 5 times the standard deviation and by increased activity across all frequency bands in the frequency domain. After removing the trials with artefacts, we have 27 trials per condition per subject on average. We agree that 27 trials per condition on average is not a high number, and increasing the number of trials would further increase the signal-to-noise ratio. However, our studies with EEG recordings and LFP recordings from externalised patients have shown that 30 trials were enough to identify a reduction in the amplitude of post-movement beta oscillations at the beginning of visuomotor adaptation in the motor cortex and STN (Tan et al., 2014a; Tan et al., 2014b). These results of motor error related modulation in the post-movement beta have been replicated by other studies from other groups. In Tan et al. 2014b, with simultaneous EEG and STN LFP measurements and a similar number of trials (around 30), we also quantified the time-course of STN-motor cortex coherence during voluntary movements. This pattern has also been replicated in a separate study from another group with around 50 trials per participant (Talakoub et al., 2016). In addition, a similar behavioural paradigm (passive figure viewing) has been used in two previous studies with LFP recordings from the STN in different patient groups (Brucke et al., 2007; Huebl et al., 2014). In both studies, a similar number of trials per condition, around 27, was used, and the authors identified meaningful activity modulation in the STN by emotional stimuli. Therefore, we think the number of trials per condition was sufficient to identify emotional valence induced differences in the LFPs in this paradigm.

      We agree that the measurement of coherence can be more susceptible to noise and can suffer from the reduced signal-to-noise ratio in MEG recordings. In Hirschmann et al. (2013), 5 minutes of resting recording and 5 minutes of movement recording from 10 PD patients were used to quantify movement-related changes in STN-cortical coherence and how this was modulated by levodopa (Hirschmann et al., 2013). Litvak et al. (2012) identified movement-related changes in the coherence between STN LFP and motor cortex using simultaneous STN LFP and MEG recordings from 17 PD patients, with 20 trials on average per participant per condition (Litvak et al., 2012). With similar methods, van Wijk et al. (2017) used recordings from 9 patients, with around 29 trials on average per hand per condition, and showed that low-beta cortico-pallidal coherence decreases during movement (van Wijk et al., 2017). So the number of trials per condition per participant used in this study is comparable to previous studies.

      The DBS extension wires do reduce the signal-to-noise ratio in the MEG recording. Therefore, the spatiotemporal Signal Space Separation (tSSS) method (Taulu and Simola, 2006) implemented in the MaxFilter software (Elekta Oy, Helsinki, Finland) was applied in this study to suppress the strong magnetic artifacts caused by the extension wires. This method has been shown to work well in de-noising the magnetic artifacts and movement artifacts in MEG data in our previous studies (Cao et al., 2019; Cao et al., 2020). In addition, the beamforming method proposed by several studies (Litvak et al., 2010; Hirschmann et al., 2011; Litvak et al., 2011) has been used in this study. In Litvak et al., 2010, the artifacts caused by DBS extension wires were described in detail, and beamforming was demonstrated to effectively suppress these artifacts and thereby enable localization of cortical sources coherent with the deep brain nucleus. We have now added more details and these references about the data cleaning and the beamforming method in the main text. With the beamforming method, we did observe the standard movement-related modulation in the beta frequency band in the motor cortex with 9 trials of finger pressing movements, shown in the following figure for one patient as an example (Figure 5–figure supplement 1). This suggests that the beamforming method did work well to suppress the artefacts and helped to localise the source with a low number of trials. The figure on movement-related modulation in the motor cortex in the MEG signals has now been added as a supplementary figure to demonstrate the effect of the beamforming.

      Figure 5–figure supplement 1. (A) Time-frequency maps of MEG activity for right hand button press at sensor level from one participant (Case 8). (B) DICS beamforming source reconstruction of the areas with movement-related oscillation changes in the range of 12-30 Hz. The peak power was located in the left M1 area, MNI coordinate [-37, -12, 43].

      References:

      Tan H, Jenkinson N, Brown P. Dynamic neural correlates of motor error monitoring and adaptation during trial-to-trial learning. Journal of Neuroscience 2014a; 34(16): 5678-88.

      Tan H, Zavala B, Pogosyan A, Ashkan K, Zrinzo L, Foltynie T, et al. Human subthalamic nucleus in movement error detection and its evaluation during visuomotor adaptation. Journal of Neuroscience 2014b; 34(50): 16744-54.

      Talakoub O, Neagu B, Udupa K, Tsang E, Chen R, Popovic MR, et al. Time-course of coherence in the human basal ganglia during voluntary movements. Scientific Reports 2016; 6: 34930.

      Brucke C, Kupsch A, Schneider GH, Hariz MI, Nuttin B, Kopp U, et al. The subthalamic region is activated during valence-related emotional processing in patients with Parkinson's disease. European Journal of Neuroscience 2007; 26(3): 767-74.

      Huebl J, Spitzer B, Brucke C, Schonecker T, Kupsch A, Alesch F, et al. Oscillatory subthalamic nucleus activity is modulated by dopamine during emotional processing in Parkinson's disease. Cortex 2014; 60: 69-81.

      Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson's disease. NeuroImage 2013; 68: 203-13.

      Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes G, Foltynie T, et al. Movement-related changes in local and long-range synchronization in Parkinson's disease revealed by simultaneous magnetoencephalography and intracranial recordings. Journal of Neuroscience 2012; 32(31): 10541-53.

      van Wijk BCM, Neumann WJ, Schneider GH, Sander TH, Litvak V, Kuhn AA. Low-beta cortico-pallidal coherence decreases during movement and correlates with overall reaction time. NeuroImage 2017; 159: 1-8.

      Taulu S, Simola J. Spatiotemporal signal space separation method for rejecting nearby interference in MEG measurements. Physics in Medicine and Biology 2006; 51(7): 1759-68.

      Cao C, Huang P, Wang T, Zhan S, Liu W, Pan Y, et al. Cortico-subthalamic Coherence in a Patient With Dystonia Induced by Chorea-Acanthocytosis: A Case Report. Frontiers in Human Neuroscience 2019; 13: 163.

      Cao C, Li D, Zhan S, Zhang C, Sun B, Litvak V. L-dopa treatment increases oscillatory power in the motor cortex of Parkinson's disease patients. NeuroImage Clinical 2020; 26: 102255.

      Litvak V, Eusebio A, Jha A, Oostenveld R, Barnes GR, Penny WD, et al. Optimized beamforming for simultaneous MEG and intracranial local field potential recordings in deep brain stimulation patients. NeuroImage 2010; 50(4): 1578-88.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson's disease. Brain 2011; 134(Pt 2): 359-74.

      Hirschmann J, Ozkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Distinct oscillatory STN-cortical loops revealed by simultaneous MEG and local field potential recordings in patients with Parkinson's disease. NeuroImage 2011; 55(3): 1159-68.

      I doubt that the correlation between habenula power and habenula-MEG coherence (Fig. 6C) is informative of emotion processing. First, power and coherence in close-by time windows are likely to be correlated irrespective of the task/stimuli. Second, if meaningful, one would expect the strongest correlation for the negative condition, as this is the only condition with an increase of theta coherence and a subsequent increase of theta power in the habenula. This, however, does not appear to be the case.

      The authors included the factors valence and arousal in their linear model and found that only valence correlated with electrophysiological effects. I suspect that arousal and valence scores are highly correlated. When fed with informative yet highly correlated variables, the significance of individual input variables becomes difficult to assess in many statistical models. Hence, I am not convinced that valence matters but arousal not.

      For the correlation shown in Fig. 6C, we used linear mixed-effect modelling (‘fitlme’ in Matlab) with the recorded subjects as random effects to investigate the correlation between the habenula power and the habenula-MEG coherence at an earlier window, while considering all trials together. Therefore the values reported in the main text and in the figure (k = 0.2434 ± 0.1031, p = 0.0226, R² = 0.104) show the within-subject correlation that is consistent across all measured subjects. The correlation is likely to be mediated by the emotional valence condition, as both high habenula-MEG coherence and high theta power in the later time window tend to occur in trials with negative emotional stimuli.

      The arousal scores are significantly different for the three valence conditions, as shown in Fig. 1B. However, the arousal scores and the valence scores are not monotonically correlated, as shown in the following figure (Fig. S2). The emotionally neutral figures have the lowest arousal values, but their valence values sit between those of the negative figures and the positive figures. We have now added the following sentence in the main text:

      "This nonlinear and non-monotonic relationship between arousal scores and the emotional valence scores allowed us to differentiate the effect of the valence from arousal."

      Table 2 in the main text shows the results of the linear mixed-effect modelling with the neural signal as the dependent variable and the valence and arousal scores as independent variables. Because of the non-linear and non-monotonic relationship between the valence and arousal scores, we think the significance of individual input variables is valid in this statistical model. We have now added a new figure (shown below, Fig. 7) with scatter plots showing the relationship between the electrophysiological signal and the arousal and emotional valence scores separately, using Spearman’s partial correlation analysis. In each scatter plot, each dot indicates the average measurement from one participant in one emotional valence condition. As shown in the following figure, the electrophysiological measurements correlated linearly with the valence scores, but not with the arousal scores. However, the statistics reported in this figure consider all the dots together, whereas the linear mixed-effect modelling takes into account the interdependency of the measurements from the same participant. The results reported in the main text using linear mixed-effect modelling are therefore statistically more valid, while the new figure illustrates the relationship.

      Figure S2. (A) Averaged valence and arousal ratings (mean ± SD) for figures of the three emotional conditions. (B) Scatter plots showing the relationship between arousal and valence scores for each emotional condition for each participant.

      Figure 7. Scatter plots showing how the early theta/alpha band power increase in the frontal cortex (A), the theta/alpha band frontal cortex-habenula coherence (B) and the theta band power increase in the habenula (C) changed with emotional valence (left column) and arousal (right column). Each dot shows the average of one participant in each categorical valence condition; these are also the source data of the multilevel modelling results presented in Table 2. The R and p values in the figure are the results of partial correlation considering all data points together.

      Page 8: "The time-varying coherence was calculated for each trial". This is confusing because coherence quantifies the stability of a phase difference over time, i.e. it is a temporal average, not defined for individual trials. It has also been used to describe the phase difference stability over trials rather than time, and I assume this is the method applied here. Typically, the greatest coherence values coincide with event-related power increases, which is why I am surprised to see maximum coherence at 1s rather than immediately post-stimulus.

      We thank the reviewer for pointing out this incorrect description. As the reviewer correctly pointed out, the method we used describes the phase difference stability over trials rather than over time. We have now clarified how coherence was calculated and added more details in the methods:

      "The time-varying cross trial coherence between each MEG sensor and the habenula LFP was first calculated for each emotional valence condition. For this, time-frequency auto- and cross-spectral densities in the theta/alpha frequency band (5-10 Hz) between the habenula LFP and each MEG channel at sensor level were calculated using the wavelet transform-based approach from -2000 to 4000 ms for each trial with 1 Hz steps using the Morlet wavelet and cycle number of 6. Cross-trial coherence spectra for each LFP-MEG channel combination was calculated for each emotional valence condition for each habenula using the function ‘ft_connectivityanalysis’ in Fieldtrip (version 20170628). Stimulus-related changes in coherence were assessed by expressing the time-resolved coherence spectra as a percentage change compared to the average value in the -2000 to -200 ms (pre-stimulus) time window for each frequency."

      In the Morlet wavelet analysis we used here, the cycle number (C) determines the temporal resolution and frequency resolution for each frequency (F). The spectral bandwidth at a given frequency F is equal to 2F/C, while the wavelet duration is equal to C/F/pi. We used a cycle number of 6. For theta band activities around 5 Hz, this gives a spectral bandwidth of 2 × 5/6 ≈ 1.7 Hz and a wavelet duration of 6/5/pi ≈ 0.38 s = 380 ms.
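The two formulas above can be checked with a few lines of Python. The function name `morlet_resolution` is an assumption made for this sketch; the formulas themselves are the ones stated in the paragraph.

```python
import math

def morlet_resolution(freq_hz, n_cycles):
    """Return (spectral bandwidth in Hz, wavelet duration in s) for a Morlet wavelet."""
    bandwidth_hz = 2.0 * freq_hz / n_cycles
    duration_s = n_cycles / freq_hz / math.pi
    return bandwidth_hz, duration_s

# Values for the lower edge of the theta/alpha band with 6 cycles.
bandwidth_5hz, duration_5hz = morlet_resolution(5.0, 6)
print(f"5 Hz: bandwidth {bandwidth_5hz:.2f} Hz, duration {duration_5hz * 1000:.0f} ms")
```

Because the duration scales with 1/F, the same cycle number gives a shorter wavelet (and a wider bandwidth) at 10 Hz than at 5 Hz.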

      As the reviewer noticed, we observed increased activities across a wide frequency band in both habenula and the prefrontal cortex within 500 ms after stimuli onset. But the increase of cross-trial coherence starts at around 300 ms. The increase of coherence in a time window without increase of power in either of the two structures indicates a phase difference stability across trials in the oscillatory activities from the two regions, and this phase difference stability across trials was not secondary to power increase.
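The cross-trial coherence and its baseline normalisation can be sketched as below. This is a simplified NumPy illustration and not the FieldTrip pipeline used in the study: the helper names, the use of magnitude (rather than squared) coherence, the wavelet truncation at three standard deviations, and the simulated signals are all assumptions.

```python
import numpy as np

def morlet_tf(signal, sfreq, freq, n_cycles=6):
    """Complex Morlet coefficients of a 1-D signal at a single frequency."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)      # temporal SD of the Gaussian
    n_half = int(round(3.0 * sigma_t * sfreq))     # truncate at +/- 3 SD
    tw = np.arange(-n_half, n_half + 1) / sfreq
    wavelet = np.exp(2j * np.pi * freq * tw) * np.exp(-tw ** 2 / (2.0 * sigma_t ** 2))
    wavelet /= np.abs(wavelet).sum()
    return np.convolve(signal, wavelet, mode="same")

def cross_trial_coherence(x_trials, y_trials, sfreq, freq, n_cycles=6):
    """Magnitude coherence across trials at every time point."""
    wx = np.array([morlet_tf(x, sfreq, freq, n_cycles) for x in x_trials])
    wy = np.array([morlet_tf(y, sfreq, freq, n_cycles) for y in y_trials])
    sxy = (wx * np.conj(wy)).mean(axis=0)          # cross-spectral density
    sxx = (np.abs(wx) ** 2).mean(axis=0)           # auto-spectral densities
    syy = (np.abs(wy) ** 2).mean(axis=0)
    return np.abs(sxy) / np.sqrt(sxx * syy)

def percent_change(coh, times, baseline=(-2.0, -0.2)):
    """Express coherence as percentage change from the pre-stimulus baseline."""
    mask = (times >= baseline[0]) & (times <= baseline[1])
    base = coh[mask].mean()
    return 100.0 * (coh - base) / base

# Simulated data (hypothetical): two noisy 6 Hz signals with a fixed phase lag.
rng = np.random.default_rng(1)
sfreq = 200.0
times = np.arange(-2.0, 4.0, 1.0 / sfreq)
n_trials = 30
phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, 1))
lfp = np.sin(2 * np.pi * 6 * times + phase) + rng.normal(0, 1, (n_trials, times.size))
meg = np.sin(2 * np.pi * 6 * times + phase + 0.8) + rng.normal(0, 1, (n_trials, times.size))

coh = cross_trial_coherence(lfp, meg, sfreq, freq=6.0)
coh_change = percent_change(coh, times)
```

Because the cross-spectrum is normalised by the auto-spectra at each time point, the measure depends on how consistent the phase difference is across trials and can change without a change in power in either signal.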

      Reviewer #3 (Public Review):

      This paper describes the oscillatory activity of the habenula using local field potentials, both within the region and, through the use of MEG, in connection to the prefrontal cortex. The characteristics of this activity were found to vary with the emotional valence but not with arousal. Shedding light on this is relevant, because the habenula is a promising target for deep brain stimulation.

      In general, because I am not much on top of the literature on the habenula, I find it difficult to judge the novelty and the impact of this study. What I can say is that I do find the paper is well-written and very clear; and the methods, although quite basic (which is not bad), are sound and rigorous.

      We thank the reviewer for the positive comments about the potential implication of our study and on the methods we used.

      On the less positive side, even though I am aware that in this type of study it is difficult to have high N, the very low N in this case makes me worry about the robustness and replicability of the results. I'm sure I have missed it and it's specified somewhere, but why is N different for the different figures? Is it because only 8 people had MEG? The number of trials also seems somewhat low. Therefore, I feel the authors perhaps need to make an effort to make up for the small number of subjects in order to add confidence to the results. I would strongly recommend bootstrapping the statistical analysis and extracting non-parametric confidence intervals instead of showing parametric standard errors whenever appropriate. When doing that, it must be taken into account that each pair of habenulae belongs to the same person; i.e. one bootstraps the subjects, not the habenulae.

      We do understand and appreciate the concern of the reviewer on the low sample numbers due to the strict recruitment criteria for this very early stage clinical trial: 9 patients for bilateral habenula LFPs, and 8 patients with good quality MEGs. Some information to justify the number of trials per condition for each participant has been provided in the reply to the Detailed Comments 1 from Reviewer 2. The sample number used in each analysis was included in the figures and in the main text.

      We have used a non-parametric cluster-based permutation approach (Maris and Oostenveld, 2007) for all the main results, as shown in Fig. 3-5. Once the clusters (time window and frequency band) with significant differences between emotional valence conditions had been identified, a parametric statistical test was applied to the average values of the clusters to show the direction of the difference. These parametric statistics are secondary to the main non-parametric permutation test.

      In addition, the DICS beamforming method was applied to localize cortical sources exhibiting stimulus-related power changes and cortical sources coherent with deep brain LFPs for each subject, for positive and negative emotional valence conditions respectively. After source analysis, source statistics over subjects were computed. Non-parametric permutation testing with or without cluster-based correction for multiple comparisons was applied to statistically quantify the differences in cortical power source or coherence source between negative and positive emotional stimuli.
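For reference, the subject-level resampling raised by the reviewer (resampling subjects rather than individual habenulae) could look like the sketch below. It is an illustration of that idea only and was not part of the analyses reported here; the function name, the percentile method and every number in the toy dictionary are hypothetical.

```python
import random
import statistics

def bootstrap_ci_by_subject(values_by_subject, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean, resampling whole subjects.

    values_by_subject maps a subject ID to the measurements from that
    subject (e.g. left and right habenula), which are kept together.
    """
    rng = random.Random(seed)
    subjects = list(values_by_subject)
    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(subjects, k=len(subjects))  # sample with replacement
        pooled = [v for s in sample for v in values_by_subject[s]]
        boot_means.append(statistics.mean(pooled))
    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical percentage power changes, two habenulae per subject.
theta_change = {
    "case1": [12.0, 9.5], "case2": [4.0, 6.5], "case3": [15.0, 11.0],
    "case4": [-2.0, 1.5], "case5": [8.0, 7.0], "case6": [3.5, 5.0],
    "case7": [10.0, 13.5], "case8": [0.5, -1.0], "case9": [6.0, 9.0],
}
grand_mean = statistics.mean(v for vals in theta_change.values() for v in vals)
ci_low, ci_high = bootstrap_ci_by_subject(theta_change)
```

Keeping both hemispheres of a subject in the same resample preserves their dependence, so the interval reflects between-subject variability rather than treating the 18 habenulae as independent samples.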

      References:

      Maris E, Oostenveld R. Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods 2007; 164(1): 177-90.

      Related to this point, the results in Figure 6 seem quite noisy, because interactions (i.e. coherence) are harder to estimate and N is low. For example, I have to make an effort of optimism to believe that Fig 6A is not just noise, and the result in Fig 6C is also a bit weak and perhaps driven by the blue point at the bottom. My read is that the authors didn't do permutation testing here, only parametric linear mixed-effects testing. I believe the authors should embed this into permutation testing to make sure that the extremes are not driving the current p-value.

      We have now quantified the coherence between frontal cortex-habenula and occipital cortex-habenula separately (please see more details in the reply to Reviewer 2, Recommendations for the authors 6). The new analysis showed that the increase in the theta/alpha band coherence around 1 s after the negative stimuli was only observed between prefrontal cortex-habenula and not between occipital cortex-habenula. This supports the argument that Fig. 6A is not just noise.

    2. Reviewer #3 (Public Review):

      This paper describes the oscillatory activity of the habenula using local field potentials, both within the region and, through the use of MEG, in connection to the prefrontal cortex. The characteristics of this activity were found to vary with the emotional valence but not with arousal. Shedding light on this is relevant, because the habenula is a promising target for deep brain stimulation.

      In general, because I am not fully on top of the literature on the habenula, I find it difficult to judge the novelty and impact of this study. What I can say is that I find the paper well-written and very clear; and the methods, although quite basic (which is not bad), are sound and rigorous.

      On the less positive side, even though I am aware that in this type of study it is difficult to have a high N, the very low N in this case makes me worry about the robustness and replicability of the results. I'm sure I have missed it and it's specified somewhere, but why is N different for the different figures? Is it because only 8 people had MEG? The number of trials also seems somewhat low. Therefore, I feel the authors need to make an effort to compensate for the small number of subjects in order to add confidence to the results. I would strongly recommend bootstrapping the statistical analysis and extracting non-parametric confidence intervals instead of showing parametric standard errors wherever appropriate. When doing that, it must be taken into account that each pair of habenulae belongs to the same person; i.e. one bootstraps the subjects, not the habenulae.
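
      The subject-level resampling recommended here could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not code from the study: the function name, the dictionary layout (one entry per subject holding that subject's left and right habenula values) and the percentile interval are all hypothetical choices.

```python
import random
import statistics


def subject_bootstrap_ci(values_by_subject, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the grand mean, resampling subjects rather than hemispheres.

    values_by_subject maps a subject ID to the list of measurements from
    that subject, e.g. [left_habenula, right_habenula].
    """
    rng = random.Random(seed)
    subjects = list(values_by_subject)
    boot_means = []
    for _ in range(n_boot):
        # Draw whole subjects with replacement so that both hemispheres
        # of a person always enter or leave the resample together.
        sample = [rng.choice(subjects) for _ in subjects]
        pooled = [v for s in sample for v in values_by_subject[s]]
        boot_means.append(statistics.fmean(pooled))
    boot_means.sort()
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

      Resampling individual habenulae instead would treat the two hemispheres of one person as independent observations and would tend to produce intervals that are too narrow.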

      Related to this point, the results in Figure 6 seem quite noisy, because interactions (i.e. coherence) are harder to estimate and N is low. For example, I have to make an effort of optimism to believe that Fig 6A is not just noise, and the result in Fig 6C is also a bit weak and perhaps driven by the blue point at the bottom. My read is that the authors didn't do permutation testing here, only parametric linear mixed-effects testing. I believe the authors should embed this into permutation testing to make sure that the extremes are not driving the current p-value.

    3. Reviewer #2 (Public Review):

      In this study, Huang and colleagues recorded local field potentials from the lateral habenula in patients with psychiatric disorders who recently underwent surgery for deep brain stimulation (DBS). The authors combined these invasive measurements with non-invasive whole-head MEG recordings to study functional connectivity between the habenula and cortical areas. Since the lateral habenula is believed to be involved in the processing of emotions, and negative emotions in particular, the authors investigated whether brain activity in this region is related to emotional valence. They presented pictures inducing negative and positive emotions to the patients and found that theta and alpha activity in the habenula and frontal cortex increases when patients experience negative emotions. Functional connectivity between the habenula and the cortex was likewise increased in this band. The authors conclude that theta/alpha oscillations in the habenula-cortex network are involved in the processing of negative emotions in humans.

      Because DBS of the habenula is a new treatment tested in this cohort in the framework of a clinical trial, these are the first data of its kind. Accordingly, they are of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap.

      In terms of community impact, I see the strengths of this paper in basic science rather than the clinical field. The authors demonstrate the involvement of theta oscillations in the habenula-prefrontal cortex network in emotion processing in the human brain. The potential of theta oscillations to serve as a marker in closed-loop DBS, as put forward by the authors, appears less relevant to me at this stage, given that the clinical effects and side-effects of habenula DBS are not known yet.

      Detailed comments:

      The group-average MEG power spectrum (Fig. 4B) suggests that negative emotions lead to a sustained theta power increase and a similar effect, though possibly masked by a visual ERP, can be seen in the habenula (Fig. 3C). Yet the statistics identify brief elevations of habenula theta power at around 3 s (which is very late), a brief elevation of prefrontal power at time 0 or even before (Fig. 4C) and a brief elevation of habenula-MEG theta coherence around 1 s. It seems possible that this lack of consistency arises from a low signal-to-noise ratio. The data contain only 27 trials per condition on average and are contaminated by artifacts caused by the extension wires.

      I doubt that the correlation between habenula power and habenula-MEG coherence (Fig. 6C) is informative of emotion processing. First, power and coherence in close-by time windows are likely to be correlated irrespective of the task/stimuli. Second, if meaningful, one would expect the strongest correlation for the negative condition, as this is the only condition with an increase of theta coherence and a subsequent increase of theta power in the habenula. This, however, does not appear to be the case.

      The authors included the factors valence and arousal in their linear model and found that only valence correlated with electrophysiological effects. I suspect that arousal and valence scores are highly correlated. When fed with informative yet highly correlated variables, the significance of individual input variables becomes difficult to assess in many statistical models. Hence, I am not convinced that valence matters but arousal does not.
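
      One simple way to check the concern about correlated predictors is to compute the correlation between the two scores and the resulting variance inflation factor (VIF). The Python sketch below is only an illustration: the function names are hypothetical, and the formula VIF = 1 / (1 - r^2) holds for the special case of exactly two predictors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """Variance inflation factor when a model contains only these two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)
```

      A VIF close to 1 indicates that the two predictors carry largely separate information, whereas large values mean that the standard errors of the individual coefficients are inflated, which makes a statement such as "valence matters but arousal does not" hard to support from the coefficients alone.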

      Page 8: "The time-varying coherence was calculated for each trial". This is confusing because coherence quantifies the stability of a phase difference over time, i.e. it is a temporal average, not defined for individual trials. It has also been used to describe the phase difference stability over trials rather than time, and I assume this is the method applied here. Typically, the greatest coherence values coincide with event-related power increases, which is why I am surprised to see maximum coherence at 1s rather than immediately post-stimulus.
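
      The across-trial reading of coherence assumed in this comment can be sketched in Python as follows. This is an assumption about what was computed, not the authors' code: the function name is hypothetical, and the inputs are taken to be complex time-frequency coefficients (for example from a wavelet transform) at one time point and one frequency, with one value per trial.

```python
def trial_coherence(x_trials, y_trials):
    """Magnitude-squared coherence across trials at one time-frequency point.

    x_trials and y_trials hold one complex spectral coefficient per trial
    for the two signals (e.g. habenula LFP and one MEG source).
    """
    n = len(x_trials)
    # Trial-averaged cross-spectrum and auto-spectra.
    cross = sum(x * y.conjugate() for x, y in zip(x_trials, y_trials)) / n
    pxx = sum(abs(x) ** 2 for x in x_trials) / n
    pyy = sum(abs(y) ** 2 for y in y_trials) / n
    return abs(cross) ** 2 / (pxx * pyy)
```

      The value is 1 when the phase difference and amplitude ratio are identical on every trial and approaches 0 when the phase difference varies randomly across trials. With few trials the estimate is biased upward, since even unrelated signals give an expected value of roughly 1/n, which is relevant for a dataset with about 27 trials per condition.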

    4. Reviewer #1 (Public Review):

      The study by Huang et al. reports on direct recordings (using DBS electrodes) from the human habenula in conjunction with MEG recordings in 9 patients. Participants were shown emotional pictures. The key finding was a transient increase in theta/alpha activity with negative compared to positive stimuli. Furthermore, there was a later increase in oscillatory coupling in the same band. These are important data, as there are few reports of direct recordings from the habenula together with MEG in humans performing cognitive tasks. The findings provide novel insight into the network dynamics associated with the processing of emotional stimuli and particularly the role of the habenula.

      Recommendations:

      How can we be sure that the recordings from the habenula are not contaminated by volume conduction, i.e. signals from neighbouring regions? I do understand that bipolar signals were considered for the DBS electrode leads. However, high-frequency power (gamma band and up) is often associated with spiking/MUA and considered less prone to volume conduction. I propose to also investigate the high-frequency gamma band activity recorded from the bipolar DBS electrodes and relate it to the emotional faces. This will provide more certainty that the measured activity indeed stems from the habenula.

      Figure 3: the alpha/theta band activity is very transient and not band-limited. Why refer to this as oscillatory? Can you exclude that the TFRs of power reflect the spectral power of ERPs rather than modulations of oscillations? I propose to also calculate the ERPs and perform the TFR of power on those. This might result in a re-interpretation of the early effects in theta/alpha band.

      Figure 4D: can you exclude that the frontal activity is due to saccade artifacts? Only eye blink artifacts were reduced by the ICA approach. Trials with saccades should be identified in the MEG traces and rejected prior to further analysis.

      The coherence modulations in Fig 5 occur quite late in time compared to the power modulations in Fig 3 and 4. When discussing the results (e.g. in the abstract) it reads as if these findings reflect the same process. How can the two effects reflect the same process if the timing is so different?

      Be explicit on the degrees of freedom in the statistical tests given that one subject was excluded from some of the tests.

    5. Evaluation Summary:

      Since DBS of the habenula is a new treatment, these are the first data of its kind and potentially of high interest to the field. Although the study mostly confirms findings from animal studies rather than bringing up completely new aspects of emotion processing, it certainly closes a knowledge gap. This paper is of interest to neuroscientists studying emotions and clinicians treating psychiatric disorders. Specifically the paper shows that the habenula is involved in processing of negative emotions and that it is synchronized to the prefrontal cortex in the theta band. These are important insights into the electrophysiology of emotion processing in the human brain.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      The authors tested HIV-1 DNA and RNA levels in two large cohorts of ART-treated HIV-1 patients to evaluate possible differences in HIV-1 reservoir cell markers between NNRTI- and PI-based ART regimens. This question is relevant since millions of people living with HIV are currently receiving HIV treatment with these agents. Their major finding is that NNRTI-based treatment is associated with reduced cell-associated HIV-1 RNA and DNA levels; this finding is not entirely novel and well in line with a number of previous observations. The strengths of the study are the large clinical cohorts for which detailed clinical and demographic data are available. The analysis of HIV-1 DNA and RNA is informative, but the assays used do not distinguish between replication-competent and defective proviral species; this is appropriately identified as a limitation of this work. The authors do not address possible immunological consequences of higher HIV DNA levels in PI-treated patients: is this associated with higher levels of inflammatory markers? In addition, it is possible that higher levels of cell-associated HIV-1 RNA may stimulate cell-intrinsic innate (type I IFN-mediated) immunity in PI-treated patients, an aspect that the authors do not address. In the absence of such additional immunological data, it is difficult to assess the true significance and importance of the described observations.

    2. Reviewer #2 (Public Review):

      This is a well-written study that will be of interest to many investigators working in the field of HIV persistence during ART. The strengths of the study include the analysis of samples from two relatively large cohorts of individuals (n= 100 and 124) and the use of multivariable models to adjust for numerous parameters. One weakness is the fact that the authors do not consider alternative models that may explain their results. The data is important but should not be overinterpreted, because it does not demonstrate that NNRTI have a better ability to suppress HIV replication. It shows that NNRTI usage is associated with lower levels of HIV persistence markers but does not provide a mechanistic explanation for that (and should not attempt to do so, at least not in the abstract). Overall, this is a well-conducted and important study, with new findings that have potential clinical implications.

    3. Reviewer #1 (Public Review):

      The authors examine measures of viral reservoir to understand how different antiviral treatment regimens impact residual virus in HIV infection. They find that NNRTI-based treatments are associated with lower viral reservoirs than PI-based regimens, suggesting they may have some advantage at reducing HIV levels long term.

    4. Evaluation Summary:

      This study addresses how antiviral treatment regimens impact persistence of an HIV reservoir in individuals who are treated for a long period. The authors examine measures of viral reservoir to understand how different antiviral treatment regimens impact residual virus in HIV infection. They find that NNRTI-based treatments are associated with lower viral reservoirs and better viral suppression than PI-based regimens, suggesting they may have some advantage at reducing HIV levels long term.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Reviewer #2 (Public Review):

      Endometrial fibrosis is the most common reason for infertility, yet its underlying mechanism remains largely unknown. Although some progress has been made in understanding the pathogenesis of endometrial fibrosis, the role of circRNAs in this process remains elusive. In this investigation, Song et al. propose a novel mechanism in which increased epithelial circPTPN12 reduces miR-21-5p, which contributes to upregulation of ΔNp63α to induce the epithelial-mesenchymal transition (EMT) of EECs (EEC-EMT). There are several interesting findings in this manuscript: 1) hundreds of miRNAs are differentially expressed between control and endometrial fibrosis; 2) miR-21-5p is mainly located in epithelial cells in normal endometrium; 3) some circRNAs are also significantly changed between control and IUA; 4) functional studies reveal that circPTPN12 is a critical ceRNA for miR-21-5p; 5) in vivo evidence from the established animal model also shows that circPTPN12-miR-21-5p participates in the EMT process. Although the authors provide comprehensive evidence to support their hypothesis, some minor concerns arose during the review of this manuscript.

    2. Reviewer #1 (Public Review):

      The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. Endometrial cells from patients with and without fibrosis were subjected to expression profiling analysis, and circPTPN12 and miR-21-5p were found to differ strongly in endometrial fibrosis, with circPTPN12 acting as an inhibitory factor for miR-21-5p. Through the use of various molecular approaches, the authors further show that miR-21-5p inhibition results in upregulation of ΔNp63α, a transcription factor that induces EMT. The role of circPTPN12 was also confirmed in vivo using a mouse model of mechanically induced endometrial fibrosis. The authors concluded that targeting the circPTPN12/miR-21-5p/∆Np63α pathway may be a therapeutic strategy for endometrial fibrosis.

      The authors clearly and convincingly show the involvement of circPTPN12/miR-21-5p/∆Np63α in EMT and its potential involvement in endometrial fibrosis. Whether or not this can be a therapeutic target is too preliminary to say at this point. First, the in vivo experiments confirm the link between circPTPN12/miR-21-5p/∆Np63α at the RNA level only (p63), and it would be more convincing to see protein data as well. The involvement of p63 in the process remains a little elusive in this paper. In addition, if the authors believe this pathway can be a real future target to treat endometrial fibrosis, they could better contextualise such a statement and specifically describe what kinds of therapeutic intervention they have in mind, such as regression or prevention of fibrosis. These should be tested in vitro and in vivo. More evidence of the involvement of circPTPN12/miR-21-5p/∆Np63α and the correlation between the three players using clinical material is also necessary.

    3. Evaluation Summary:

      The study by Song and colleagues explores the role of circRNAs in fibrosis of the endometrium. The paper is of interest for scientists working in the field of endometrial fibrosis and most likely can have implications for other endometrial disorders characterised by fibrotic tissues. The study unravels the molecular mechanism underlying the disease, and the thorough experimental part fully supports the authors' claims.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Author Response:

      Evaluation Summary:

      The paper describes an algorithm that combines epidemiological and sequence data to provide a rapid assessment of the probability of healthcare-associated infections among hospital onset SARS-CoV-2 infections, that also may be associated with outbreak events. There is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals. It is questionable to what extent the tool presented is generalisable to medical facilities outside of the specific data rich settings considered here, or if the tool is useful for prospective analyses. This study would be of interest to specialists working in hospital infection prevention, with more limited further interest.

      We thank eLife for the commentary on our work. We agree that there is a need for robust prospective evaluation of routine viral sequencing of SARS-CoV-2 for Infection Prevention and Control and of this tool specifically. Our research group is conducting such work within a multi-centre prospective study that is currently ongoing: https://clinicaltrials.gov/ct2/show/NCT04405934, https://doi.org/10.1101/2021.04.13.21255342.

      Reviewer #1 (Public Review):

      -In the present paper the authors have attempted to develop a novel statistical method and sequence reporting tool that combines epidemiological and sequence data to provide a rapid assessment of the probability of HCAI among HOCI cases (defined as first positive test >48 hours following admission) and to identify infections that could plausibly constitute outbreak events.

      -As healthcare-associated infections in hospitals present a significant health risk to both vulnerable patients and healthcare workers, improving the rapid assessment of the probability of HCAI among HOCI cases is of utmost importance in a pandemic setting.

      -The strength of the paper is that they have successfully used a large number of virus sequence data from two UK cities with selected hospitals and developed a statistical method to bring these together with classical epidemiological data, which has resulted in a sequence reporting tool (SRT) that was evaluated in relation to:

      -The IPC classification system recommended by PHE,

      -The PHE definition of healthcare-associated COVID-19 outbreaks (using a 2 SNP threshold).

      -They show the added value of combining the two systems. Obviously, this can only work prospectively in a setting like the UK, where a system like the COVID-19 Genomics (COG) UK initiative is effectively in place. They conclude that, through their retrospective application to clinical datasets, they have demonstrated that the methodology is able to provide confirmatory evidence for most PHE-defined definite and probable HCAIs and provide further information regarding indeterminate HCAIs. Therefore, the SRT may allow IPC teams to optimise their use of resources on areas with likely nosocomial acquisition events.

      -The acquisition of the extensive prospective datasets necessary to use the system requires a non-negligible investment that is possible in a setting in which sequencing routine and phylogenetic analyses can be carried out in real time. The added value of the methodology should eventually justify the investment.

      We thank the reviewer for their summary and commentary on our work. We agree that full evaluation of the use of viral sequencing for clinical practice requires health economic analysis of the associated costs relative to potential gains, and this is planned within our ongoing research program on this topic.

      Reviewer #2 (Public Review):

      Since early 2020, the SARS-CoV-2 pandemic has presented numerous challenges to healthcare facilities around the world. Given the highly transmissible nature of SARS-CoV-2 virus, and the confined nature of most hospital settings, hospital acquired infections with SARS-CoV-2 are a frequent occurrence and pose major challenges for hospital infection prevention teams. The increasing use of genomic epidemiology, facilitated by cheaper/faster genetic sequencing tools and user-friendly algorithms for data analysis, creates new opportunities for using virus sequencing to track virus spread in healthcare facilities. While opportunities are increasing, there remain two important bottlenecks to meaningful and widespread use of genomic epidemiology in well-resourced healthcare settings - 1. the turnaround time from sample collection to delivery of sequenced and analysed result; 2. a lack of training among many infection prevention personnel in interpreting genomic epidemiology output.

      The study by Stirrup et al tries to alleviate these issues through the development of an algorithm that synthesises inferences from virus genetic sequences and hospital epidemiological data to provide easy to interpret information about whether or not there is likely to be ongoing virus transmission within a medical facility. In general, these kinds of approaches are highly worthwhile and can have important translational value as they facilitate the use of powerful new technologies without necessarily requiring extensive professional training to interpret the results. Indeed, there is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals.

      In this study, the authors describe their new algorithm and apply it in two retrospective cases to evaluate its potential value to provide valuable information to infection control teams. While it seems clear that the algorithm reliably detects nosocomial transmission in situations where there are obvious hospital outbreaks, it is much less clear that it performs meaningfully in situations where nosocomial transmission is more questionable. To this end, it is not clear if the algorithm provides useful or meaningful information that would help to reduce the burden of hospital acquired SARS-CoV-2 infections. Towards the end of the discussion section, the authors mention that analyses on the utility of the algorithm in prospective use cases were ongoing from late 2020 to early 2021. These analyses will provide essential information on the value of this tool.

      While the development of these sorts of tools is important, it is unclear from this study if the tool has value in prospective use or if it would be useful in settings where virus genetic sequencing is less frequent and/or slower than the retrospective use cases considered here. Additionally, in many infection prevention scenarios the existence of an outbreak is clear but tracing the routes of transmission is the primary object of investigation. Because the algorithm does not include phylogenetic information, tracing potential transmission routes is not possible.

      We thank the Reviewer for their commentary on our work. Our ongoing prospective study on implementation of the reporting tool includes intervention phases both with a ‘rapid’ target turnaround of 48 hours from sampling and with a ‘slow’ target turnaround of 5-10 days, and this will generate data on the relative utility of viral sequencing within these timeframes. We acknowledge that the reporting tool developed does not evaluate evidence of direct transmission between case pairs, although it should also be noted that phylogenetic investigation alone cannot be used to confidently infer direct transmission linkage for SARS-CoV-2. We feel that the algorithm and report format can flag potential transmission routes to IPC teams, through the identification of close sequence matches within the hospital as a whole and highlighting of any matching previous ward locations (although the latter is not used in the probability calculations).

    2. Reviewer #2 (Public Review):

      Since early 2020, the SARS-CoV-2 pandemic has presented numerous challenges to healthcare facilities around the world. Given the highly transmissible nature of SARS-CoV-2 virus, and the confined nature of most hospital settings, hospital acquired infections with SARS-CoV-2 are a frequent occurrence and pose major challenges for hospital infection prevention teams. The increasing use of genomic epidemiology, facilitated by cheaper/faster genetic sequencing tools and user-friendly algorithms for data analysis, creates new opportunities for using virus sequencing to track virus spread in healthcare facilities. While opportunities are increasing, there remain two important bottlenecks to meaningful and widespread use of genomic epidemiology in well-resourced healthcare settings - 1. the turnaround time from sample collection to delivery of sequenced and analysed result; 2. a lack of training among many infection prevention personnel in interpreting genomic epidemiology output.

      The study by Stirrup et al tries to alleviate these issues through the development of an algorithm that synthesises inferences from virus genetic sequences and hospital epidemiological data to provide easy to interpret information about whether or not there is likely to be ongoing virus transmission within a medical facility. In general, these kinds of approaches are highly worthwhile and can have important translational value as they facilitate the use of powerful new technologies without necessarily requiring extensive professional training to interpret the results. Indeed, there is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals.

      In this study, the authors describe their new algorithm and apply it in two retrospective cases to evaluate its potential value to provide valuable information to infection control teams. While it seems clear that the algorithm reliably detects nosocomial transmission in situations where there are obvious hospital outbreaks, it is much less clear that it performs meaningfully in situations where nosocomial transmission is more questionable. To this end, it is not clear if the algorithm provides useful or meaningful information that would help to reduce the burden of hospital acquired SARS-CoV-2 infections. Towards the end of the discussion section, the authors mention that analyses on the utility of the algorithm in prospective use cases were ongoing from late 2020 to early 2021. These analyses will provide essential information on the value of this tool.

      While the development of these sorts of tools is important, it is unclear from this study if the tool has value in prospective use or if it would be useful in settings where virus genetic sequencing is less frequent and/or slower than the retrospective use cases considered here. Additionally, in many infection prevention scenarios the existence of an outbreak is clear but tracing the routes of transmission is the primary object of investigation. Because the algorithm does not include phylogenetic information, tracing potential transmission routes is not possible.

    3. Reviewer #1 (Public Review):

      -In the present paper the authors have attempted to develop a novel statistical method and sequence reporting tool that combines epidemiological and sequence data to provide a rapid assessment of the probability of HCAI among HOCI cases (defined as first positive test >48 hours following admission) and to identify infections that could plausibly constitute outbreak events.

      -As healthcare-associated infections in hospitals present a significant health risk to both vulnerable patients and healthcare workers, improving the rapid assessment of the probability of HCAI among HOCI cases is of utmost importance in a pandemic setting.

      -The strength of the paper is that they have successfully used a large number of virus sequence data from two UK cities with selected hospitals and developed a statistical method to bring these together with classical epidemiological data, which has resulted in a sequence reporting tool (SRT) that was evaluated in relation to:

      -The IPC classification system recommended by PHE,

      -The PHE definition of healthcare-associated COVID-19 outbreaks (using a 2 SNP threshold).

      -They show the added value of combining the two systems. Obviously, this can only work prospectively in a setting like the UK, where a system like the COVID-19 Genomics (COG) UK initiative is effectively in place. They conclude that, through their retrospective application to clinical datasets, they have demonstrated that the methodology is able to provide confirmatory evidence for most PHE-defined definite and probable HCAIs and provide further information regarding indeterminate HCAIs. Therefore, the SRT may allow IPC teams to optimise their use of resources on areas with likely nosocomial acquisition events.

      -The acquisition of the extensive prospective datasets necessary to use the system requires a non-negligible investment that is possible in a setting in which sequencing routine and phylogenetic analyses can be carried out in real time. The added value of the methodology should eventually justify the investment.

    4. Evaluation Summary:

      The paper describes an algorithm that combines epidemiological and sequence data to provide a rapid assessment of the probability of healthcare-associated infections among hospital onset SARS-CoV-2 infections, that also may be associated with outbreak events. There is an urgent need for tools that can synthesise multiple data streams to provide real time information to healthcare professionals. It is questionable to what extent the tool presented is generalisable to medical facilities outside of the specific data rich settings considered here, or if the tool is useful for prospective analyses. This study would be of interest to specialists working in hospital infection prevention, with more limited further interest.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Author Response:

      Reviewer #3 (Public Review):

      About 30 million years ago the ancestors of Old World primates lost the ability to produce the glycan a-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. The current study builds upon previous work by the authors showing (i) that the presence of a-gal expressing bacteria in ggta1 deficient mice led to production of antibodies capable of clearance of malaria-causing plasmodia carrying a-gal (Yilmaz et al., 2014), and (ii) that ggta1 deficiency is associated with increased resistance to sepsis via the enhancement of IgG effector function (Singh et al., 2021). Here they expand on these findings to show that ggta1 deletion in mice is associated with altered composition of the gut microbiome due to the action of IgA targeting of a-Gal expressing bacteria. In addition, they show that the absence of a-gal results in a microbiome that is less pathogenic (i.e., less likely to induce sepsis in their experimental model). Although some aspects of the work are not very novel (e.g., the fact that ggta1 is associated with a remodeled microbiome had already been shown in their previous publications) the work does provide additional insights into the pleiotropic role of ggta1 in immune function, susceptibility to sepsis, and eventual fitness advantage. The work is extremely well done and all conclusions are supported by solid data. Indeed, I felt that the authors were reading my mind every step of the way. Each time I questioned one of the conclusions the next paragraph would address that exact concern. There are, however, a few points that I think would deserve additional clarification.

      1 - I was a little surprised that they found no difference in the microbiome of F2 mice between a-gal deficient and wild-type mice. Although I understand that this might be due to antibodies received from the mother, the fact that the divergence is only seen in F3 to F5 would also be compatible with drift and not necessarily a genotype-driven phenotype. Do the microbiome differences detected in F3-F5 overlap with those observed at F0? If the original differences were controlled by host genetics - the hypothesis being tested - we would expect to see some convergence (at least at the level of specific taxa).

      We agree essentially with the comment: “… would also be compatible with drift and not necessarily a genotype-driven phenotype” and have addressed this issue by adding the following statement in the Discussion section:

      “On the basis of this observation alone (Figure 1), one cannot exclude the observed divergence in the microbiota bacterial population frequencies of wild type vs. Ggta1-deleted mice (Figure 1) from being a stochastic event. However, the observation that these changes occur via an Ig-dependent mechanism that differs in wild type vs. Ggta1-deleted mice (Figure 3) does support that loss of αGal contributes critically to shape the microbiota composition of Ggta1-deficient mice.”

      We have previously shown that homogenization of the microbiota occurs between the littermates in the F2 generation (Singh et al., 2021). Having confirmed this finding in this manuscript (Figure 1C, Figure 3-figure supplement 7A-B), we find that the effect of the genotype and Ig is seen only from the F3 generation onwards (Figure 1D-F, Figure 3). Presumably, the inability of F1 Ggta1+/- mothers to produce anti-αGal antibodies accounts for the absence of overt shaping of the F2 microbiota. In these experiments, anti-αGal antibodies can only be generated from αGal-deficient F2 Ggta1-/- mice, being vertically transferred and shaping the microbiota from F3 Ggta1-/- mice onwards. We propose that the differences in the microbiota composition of the two F3 genotypes onwards are driven by a cumulative effect of maternal anti-αGal antibodies over the offspring microbiota composition.

      2 - I was really surprised that ggta1 deficient mice lacking a functional adaptive immune system (Figure S8) were equally resistant to systemic infection with the cecal inoculum isolated from ggta1 deficient mice. In the previous work they show that the increased resistance to sepsis comes from increased effector function of IgG. If that is the case, how come mice not having an adaptive system (hence no IgG) are equally protected? Is the pathogenicity of the microbiome of ggta1 deficient mice that reduced? It seems unlikely. More generally, I would like to have seen a better discussion about how these new findings connect to their past work. In the context of increased resistance to sepsis what seems to be more important - the remodeling of the microbiome by IgA or the increased effector function of IgG?

      The data reported in our manuscript do indeed support the conclusion that shaping of the microbiota composition of Ggta1-deficient mice is associated with an overall reduction of the microbiome pathogenicity. This finding is in keeping with host-microbe commensal interactions not being hard-wired but instead oscillating from pathogenic to symbiotic (Ayres, 2016; Vonaesch et al., 2018). Our findings suggest that the loss of Ggta1 function can modify the nature of host-microbiota interactions, through a mechanism whereby the absence of host αGal and the emergence of antibodies targeting this glycan in microbes shape and reduce the microbiome pathogenicity.

      We have shown that loss of αGal can enhance resistance to bacterial sepsis via a mechanism that increases IgG effector function (Singh et al., 2021). This was demonstrated by systemically infecting Ggta1-deficient mice with a “non-shaped” microbiota inoculum, isolated from Ggta1-deficient mice lacking adaptive immunity (Rag2-/-Ggta1-/- mice). As discussed in the manuscript “the gut microbiota of Rag2-/-Ggta1-/- mice, lacking adaptive immunity, is highly enriched in pathobionts such as Proteobacteria, including Helicobacter (Singh et al., 2021)”. Under these experimental conditions, resistance to infection is IgG dependent, explaining why modulation of IgG effector function by αGal impacts on the outcome of sepsis.

      In the current manuscript we describe another survival advantage against bacterial sepsis associated with Ggta1 deletion in mice. Namely, antibodies generated by Ggta1-deficient mice can shape and reduce the microbiota pathogenicity. This was demonstrated by systemically infecting Ggta1-deficient mice lacking adaptive immunity (Rag2-/-Ggta1-/- mice) with a “shaped-microbiota” inoculum isolated from Ggta1-deficient mice. While the mechanism underlying microbiota shaping is antibody-dependent, the effector mechanism conferring resistance against the shaped microbiota acts irrespective of adaptive immunity, including IgG. This conclusion is supported by the observation that systemic infection by the shaped microbiota (isolated from Ggta1-deficient mice) failed to induce sepsis in Rag2-/-Ggta1-/- mice, which was not the case upon systemic infection with a non-shaped microbiota (isolated from Rag2-/-Ggta1-/- mice). We conclude that Ggta1 deletion in mice increases resistance to bacterial sepsis via two interrelated antibody-dependent mechanisms: i) Increased IgG effector function (Singh et al., 2021) and ii) Antibody shaping and reduction of microbiota pathogenicity (current manuscript). To what extent these two traits are related remains to be established.

      It is possible that, similarly to what was demonstrated for IgG (Singh et al., 2021), the absence of αGal from glycan structures in other Ig isotypes, including IgA, might modify their effector function. We do not yet know if this is the case; what we find in our manuscript is an altered antibody response targeting immunogenic bacteria in the microbiota of Ggta1-deficient mice. This is associated with modulation of the microbiota bacterial composition, i.e. antibody shaping of the microbiota, and with a reduction of the microbiome pathogenicity. The latter explains why the Ggta1-deficient mice do not rely on circulating antibodies to prevent the development of sepsis upon systemic infection by bacteria emanating from their own “shaped” microbiota.

    2. Reviewer #3 (Public Review):

      About 30 million years ago the ancestors of Old World primates lost the ability to produce the glycan a-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. The current study builds upon previous work by the authors showing (i) that the presence of a-gal expressing bacteria in ggta1 deficient mice led to production of antibodies capable of clearance of malaria-causing plasmodia carrying a-gal (Yilmaz et al., 2014), and (ii) that ggta1 deficiency is associated with increased resistance to sepsis via the enhancement of IgG effector function (Singh et al., 2021). Here they expand on these findings to show that ggta1 deletion in mice is associated with altered composition of the gut microbiome due to the action of IgA targeting of a-Gal expressing bacteria. In addition, they show that the absence of a-gal results in a microbiome that is less pathogenic (i.e., less likely to induce sepsis in their experimental model). Although some aspects of the work are not very novel (e.g., the fact that ggta1 is associated with a remodeled microbiome had already been shown in their previous publications) the work does provide additional insights into the pleiotropic role of ggta1 in immune function, susceptibility to sepsis, and eventual fitness advantage. The work is extremely well done and all conclusions are supported by solid data. Indeed, I felt that the authors were reading my mind every step of the way. Each time I questioned one of the conclusions the next paragraph would address that exact concern. There are, however, a few points that I think would deserve additional clarification.

      1 - I was a little surprised that they found no difference in the microbiome of F2 mice between a-gal deficient and wild-type mice. Although I understand that this might be due to antibodies received from the mother, the fact that the divergence is only seen in F3 to F5 would also be compatible with drift and not necessarily a genotype-driven phenotype. Do the microbiome differences detected in F3-F5 overlap with those observed at F0? If the original differences were controlled by host genetics - the hypothesis being tested - we would expect to see some convergence (at least at the level of specific taxa).

      2 - I was really surprised that ggta1 deficient mice lacking a functional adaptive immune system (Figure S8) were equally resistant to systemic infection with the cecal inoculum isolated from ggta1 deficient mice. In the previous work they show that the increased resistance to sepsis comes from increased effector function of IgG. If that is the case, how come mice not having an adaptive system (hence no IgG) are equally protected? Is the pathogenicity of the microbiome of ggta1 deficient mice that reduced? It seems unlikely. More generally, I would like to have seen a better discussion about how these new findings connect to their past work. In the context of increased resistance to sepsis what seems to be more important - the remodeling of the microbiome by IgA or the increased effector function of IgG?

    3. Reviewer #2 (Public Review):

      The authors aimed to examine the impact of GGTA1 deletion on host-microbial interactions using a mouse model of a primate-specific mutation. This is a very informative model system that provided interesting insights into the consequences of aGal elimination from host glycoproteins, with subsequent 'release' of immune tolerance breaks and generation of antibody responses against bacterial aGal epitopes.

      The study is well executed and the conclusions are well supported by the provided evidence. The findings are interesting for a broad audience of biologists.

      The identity of IgA-targeted bacteria in GGTA1 vs WT mice would be interesting to investigate in future studies.

    4. Reviewer #1 (Public Review):

      This work is a powerful example of thinking across silos. It combines much knowledge of innate and adaptive immunity with the primate evolution of certain antigens lost only in certain primate lineages, and tests an important idea about host-mediated, antibody-dependent shaping of gut microbiota using laboratory mice with different engineered genetic alterations. Gut microbiota are all the rage these days, but it is often forgotten that these microbial communities represent a formidable danger that is really too close (one epithelial layer away) for comfort. The authors demonstrate in laboratory mice how antibodies against non-self sugar molecules present on bacteria can shape the microbiome. Claims and conclusions seem justified by the data presented.

    5. Evaluation Summary:

      30 million years ago the ancestors of Old World primates lost the ability to produce alpha-gal due to the fixation of several loss-of-function mutations in the GGTA1 gene. The evolutionary advantage of such loss remains elusive. Here, the authors provide additional insights into the pleiotropic role of ggta1 in shaping the gut microbiota, immune function, susceptibility to sepsis, and eventual fitness advantage.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

    1. Author Response:

      Reviewer #1 (Public Review):

      Facial muscles control the execution of essential tasks like eating, drinking, breathing and (in most mammals) tactile exploration. The activity of motor neurons targeting different muscles is coordinated by premotor regions distributed throughout the brainstem. The precise identity of these cells and regions in adults is presently unclear, largely due to technical challenges. In the current work, Takatoh and colleagues develop an elegant strategy to label premotor neurons that target select muscles and register these cells on a common digital atlas. Their work confirms and also extends previous studies in neonates and provides a useful resource for the field.

      We thank Reviewer 1 for the positive evaluation.

      Reviewer #2 (Public Review):

      The authors describe a variant of retrograde monosynaptic rabies tracing from skeletal muscle. They make use of AAV2-retro-Cre to infect brainstem motoneurons projecting to muscles involved in regulation of orofacial movements (whisking, genioglossus, masseter motoneurons). The strategy that worked most efficiently and with specificity was to inject AAV2-retro-Cre intramuscularly at P17, followed 3 weeks thereafter by central injection of Cre-dependent AAVs expressing TVA and oG, and 2 weeks thereafter followed by central injection of EnvA(M21)-ΔG-RV-GFP. Five days after this final injection, experiments were terminated to analyse the distribution of premotor neurons. This allowed the authors to reconstruct and compare the distribution of premotor neurons to the whisking (lateral 7N), tongue-protruding genioglossus (12N), and jaw-closing masseter (5N) motoneurons. To do so, they used the Allen Brain Atlas as a reference for 3D reconstruction, into which they integrated all data. Notably, the authors found that for all three injection types, the highest density of neurons was found in the IRt and PCRt, but the precise peak of highest density was consistently distinct for the three different injection types. The peak for whisker premotor neurons was most caudal-ventral, for masseter premotor neurons most rostro-dorsal, and for tongue-protruding genioglossal premotor neurons in between these. The authors also make use of the strong expression of fluorescent proteins through rabies virus to analyse collateralization to other motor nuclei. Interestingly, they found cross-talk to other motor nuclei in selective patterns, supporting a model whereby some premotor neurons to one brainstem motor pool also interact with other output circuits, perhaps to coordinate orofacial behaviors. Using a split-Cre retrograde approach from motor nuclei, dual-projecting premotor neurons were identified to be located in dorsal IRt and SupV.

      This is a high-quality study making use of several methods not previously brought together in one study. Particularly interesting is the 3-way virus strategy in wild-type mice allowing visualization of premotor neurons in the adult. Second, alignment in a common reference brain is also very useful. And finally, the beginning of understanding dynamics of premotor circuit distribution between development and adult is also a value of this paper. Overall, the study is very interesting for the field.

      We thank Reviewer 2 for the positive evaluation.

      Reviewer #3 (Public Review):

      Orofacial actions show exquisite coordination among many muscles, yet the pool of motor neurons exciting each of these muscles is specific to that muscle. The coordination of activity across muscles therefore relies on circuits of premotor neurons that excite the motor neurons. Work by the authors and others has produced major progress in delineating these complex premotor circuits. Recent work using transsynaptic viral tracing has overcome limitations associated with traditional retrograde tracing methods, such as a lack of adequate specificity. However, these transsynaptic viral methods have been unsuccessful in animals older than approximately postnatal day 8 (P8). This is a problem because circuits continue to develop far beyond P8 in mice. Here, the authors overcome this limitation by introducing a novel viral transsynaptic tracing method that can be applied in adult mice.

      The authors apply their method to trace premotor circuits for whisking, licking, and jaw movements. They align their anatomical data to the Allen Mouse Brain Common Coordinate Framework and make it available with the manuscript, greatly facilitating its quantitative use by other laboratories. The authors find premotor circuits in adult mice that are almost entirely consistent with results from younger mice, with some important exceptions that they highlight and discuss. The authors quantify overlap of premotor circuits for whisking, licking and jaw movements and discuss the implications of interactions among these circuits.

      The experiments and analysis are carefully performed, and the results put into proper context. Overall, this is a straightforward and valuable contribution to our knowledge of the premotor circuits that coordinate orofacial behaviors. It will be of wide interest to neuroscientists.

      Suggestions:

      -The methods applied in neonatal mice (Takatoh et al. 2013; Stanek et al. 2014), while obviously different, are similar enough that it may be worth including discussion of any possible ways that differences between the neonatal and adult results could be due to methods, rather than age. I defer to the authors about whether such discussion is worthwhile, but readers may benefit from knowing what was considered.

      We have now added the technical considerations that may account for the differences in the tracing patterns (lines 505-517).

      -Spatial correlation in Figure 5C. To interpret this properly it's important to know the degree of smoothing. I could not find this in the relevant methods section describing the kernel density estimation or elsewhere.

      Same as the above: The cells detected in each mouse were first registered into the standard three-dimensional brain model. The (x, y, z) coordinates of each cell were then extracted, and multivariate kernel smoothing density estimation was applied (bandwidth = 1). The resulting kernel density estimate was then vectorized, and the cosine similarity between each pair of mice was calculated to form the correlogram.
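      The analysis steps described in this response can be illustrated with a minimal sketch. It is not the authors' code: the isotropic Gaussian kernel, the evaluation grid, the toy coordinates, and the names `gaussian_kde_on_grid`, `cosine_similarity` and `correlogram` are all assumptions made for illustration, and the units of the bandwidth are not stated in the response.

```python
import math
from itertools import product


def gaussian_kde_on_grid(points, grid, bandwidth=1.0):
    """Evaluate an isotropic Gaussian kernel density estimate at each grid point.

    The Gaussian kernel is an assumption; the response only states
    'multivariate kernel smoothing density estimation (bandwidth = 1)'.
    """
    dims = len(grid[0])
    norm = len(points) * (bandwidth * math.sqrt(2 * math.pi)) ** dims
    density = []
    for g in grid:
        total = 0.0
        for p in points:
            sq_dist = sum((gi - pi) ** 2 for gi, pi in zip(g, p))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        density.append(total / norm)
    return density  # already a flat ("vectorized") list, one value per grid point


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length density vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlogram(cells_by_mouse, grid, bandwidth=1.0):
    """Pairwise cosine similarity between the density vectors of all mice."""
    vectors = {
        name: gaussian_kde_on_grid(cells, grid, bandwidth)
        for name, cells in cells_by_mouse.items()
    }
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a in vectors
        for b in vectors
    }


# Hypothetical toy data: registered (x, y, z) cell coordinates for three mice.
grid = list(product(range(6), repeat=3))
cells = {
    "mouse_A": [(1.0, 1.0, 1.0), (1.5, 1.0, 1.2)],
    "mouse_B": [(1.1, 0.9, 1.0), (1.4, 1.2, 1.1)],
    "mouse_C": [(4.0, 4.0, 4.0), (4.5, 4.2, 3.8)],
}
sim = correlogram(cells, grid)
```

      In this toy example, mice A and B have nearby cells and so score a higher similarity than A and C. A larger bandwidth would blur the densities and push all pairwise similarities upward, which is why the degree of smoothing matters for interpreting the correlogram.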

    2. Reviewer #3 (Public Review):

      Orofacial actions show exquisite coordination among many muscles, yet the pool of motor neurons exciting each of these muscles is specific to that muscle. The coordination of activity across muscles therefore relies on circuits of premotor neurons that excite the motor neurons. Work by the authors and others has produced major progress in delineating these complex premotor circuits. Recent work using transsynaptic viral tracing has overcome limitations associated with traditional retrograde tracing methods, such as a lack of adequate specificity. However, these transsynaptic viral methods have been unsuccessful in animals older than approximately postnatal day 8 (P8). This is a problem because circuits continue to develop far beyond P8 in mice. Here, the authors overcome this limitation by introducing a novel viral transsynaptic tracing method that can be applied in adult mice.

      The authors apply their method to trace premotor circuits for whisking, licking, and jaw movements. They align their anatomical data to the Allen Mouse Brain Common Coordinate Framework and make it available with the manuscript, greatly facilitating its quantitative use by other laboratories. The authors find premotor circuits in adult mice that are almost entirely consistent with results from younger mice, with some important exceptions that they highlight and discuss. The authors quantify overlap of premotor circuits for whisking, licking and jaw movements and discuss the implications of interactions among these circuits.

      The experiments and analysis are carefully performed, and the results put into proper context. Overall, this is a straightforward and valuable contribution to our knowledge of the premotor circuits that coordinate orofacial behaviors. It will be of wide interest to neuroscientists.

      Suggestions:

      -The methods applied in neonatal mice (Takatoh et al. 2013; Stanek et al. 2014), while obviously different, are similar enough that it may be worth including discussion of any possible ways that differences between the neonatal and adult results could be due to methods, rather than age. I defer to the authors about whether such discussion is worthwhile, but readers may benefit from knowing what was considered.

      -Spatial correlation in Figure 5C. To interpret this properly it's important to know the degree of smoothing. I could not find this in the relevant methods section describing the kernel density estimation or elsewhere.

    3. Reviewer #2 (Public Review):

      The authors describe a variant of retrograde monosynaptic rabies tracing from skeletal muscle. They make use of AAV2-retro-Cre to infect brainstem motoneurons projecting to muscles involved in regulation of orofacial movements (whisking, genioglossus, masseter motoneurons). The strategy that worked most efficiently and with specificity was to inject AAV2-retro-Cre intramuscularly at P17, followed 3 weeks thereafter by central injection of Cre-dependent AAVs expressing TVA and oG, and 2 weeks thereafter followed by central injection of EnvA(M21)-ΔG-RV-GFP. Five days after this final injection, experiments were terminated to analyse the distribution of premotor neurons. This allowed the authors to reconstruct and compare the distribution of premotor neurons to the whisking (lateral 7N), tongue-protruding genioglossus (12N), and jaw-closing masseter (5N) motoneurons. To do so, they used the Allen Brain Atlas as a reference for 3D reconstruction, into which they integrated all data. Notably, the authors found that for all three injection types, the highest density of neurons was found in the IRt and PCRt, but the precise peak of highest density was consistently distinct for the three different injection types. The peak for whisker premotor neurons was most caudal-ventral, for masseter premotor neurons most rostro-dorsal, and for tongue-protruding genioglossal premotor neurons in between these. The authors also make use of the strong expression of fluorescent proteins through rabies virus to analyse collateralization to other motor nuclei. Interestingly, they found cross-talk to other motor nuclei in selective patterns, supporting a model whereby some premotor neurons to one brainstem motor pool also interact with other output circuits, perhaps to coordinate orofacial behaviors. Using a split-Cre retrograde approach from motor nuclei, dual-projecting premotor neurons were identified to be located in dorsal IRt and SupV.

      This is a high-quality study making use of several methods not previously brought together in one study. Particularly interesting is the 3-way virus strategy in wild-type mice allowing visualization of premotor neurons in the adult. Second, alignment in a common reference brain is also very useful. And finally, the beginning of understanding dynamics of premotor circuit distribution between development and adult is also a value of this paper. Overall, the study is very interesting for the field.

    4. Reviewer #1 (Public Review):

      Facial muscles control the execution of essential tasks like eating, drinking, breathing and (in most mammals) tactile exploration. The activity of motor neurons targeting different muscles is coordinated by premotor regions distributed throughout the brainstem. The precise identity of these cells and regions in adults is presently unclear, largely due to technical challenges. In the current work, Takatoh and colleagues develop an elegant strategy to label premotor neurons that target select muscles and register these cells on a common digital atlas. Their work confirms and also extends previous studies in neonates and provides a useful resource for the field.

    1. Author Response:

      Reviewer #3 (Public Review):

      1) The authors seem to assume a somewhat random sample throughout Washington state. They state that given a low sampling proportion they do not expect to have captured infection pairs, which seems reasonable. However, they then go on to assume that their sample is primarily comprised of samples from long, successful transmission chains. This is a reasonable assumption if there is no major difference in accessibility of samples from long transmission chains and shorter ones (for example, decreased access to healthcare). Could this impact the assumption of sampling primarily from long transmission chains? It seems from the data collected in this outbreak that this was not the case for mumps in Washington but addressing this assumption clearly (and potential ways to interrogate it) could make their methodology more applicable to other pathogen studies.

      2) There are many examples of phylogenetic analyses that have led to conclusions about pathogen sources and sinks that were later shown to be wrong because of oversampling or other sampling biases. The authors address unequal sampling between clades, but additional contextualization of the problem and how this approach is different may help strengthen the methodology presented in the paper.

      We thank the reviewer for these important points. We have attempted to address these by including an additional paragraph about different types of sampling and their impacts on phylodynamic studies.

      We agree that this is a helpful addition, and have added a new paragraph devoted to a discussion of sampling bias to the discussion on lines 458-484. This paragraph reads:

      “Sampling bias presents a persistent problem for phylodynamic studies that can complicate inference of source-sink dynamics (De Maio et al., 2015; Dudas et al., 2018; Frost et al., 2015; Kühnert et al., 2011; Lemey et al., 2020; Stack et al., 2010). Sampling bias can arise from unequal case detection or from curating a dataset that poorly represents the underlying outbreak. Washington State uses a passive surveillance system for mumps detection and case acquisition, which is known to result in underreporting. Because the WA Department of Health did not perform active mumps surveillance, it is difficult to assess whether different epidemiologic groups have different likelihoods of being sampled. Marshallese individuals are less likely to seek healthcare (Towne et al., 2020), which may have resulted in particularly high rates of underreporting in this group. If the number of cases within the Marshallese community were in fact higher than reported, this would increase the magnitude of the patterns we describe, making our estimates conservative. Given a distribution of cases, composing a dataset for analysis also requires sampling decisions. Uniform sampling regimes in which sampling probability is equal across groups have been shown to perform well for source-sink inferences (Hall et al., 2016). By selecting sequences that matched the overall attributes of the outbreak, including a near 50:50 split between Marshallese and non-Marshallese cases, we adhere to this recommendation. We then specifically employed structured coalescent approaches which have been shown to be robust to sampling differences (Dudas et al., 2018; Müller et al., 2018; Vaughan et al., 2014), rather than using other common approaches that treat sampling intensity as informative of population size (Lemey et al., 2009). Within this framework, we further explore the possibility that unequal sampling within Washington clades could skew internal node reconstruction by forcing the sampling within each Washington clade to be equal between Marshallese and non-Marshallese tips. In doing so, differences within each clade must necessarily be driven by differences in transmission dynamics, rather than sampling. By combining careful sample selection with overlapping approaches to evaluate sampling bias, we were able to mitigate concerns that our source-sink reconstructions are driven by sampling artifacts.”
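      The idea of forcing sampling within each clade to be equal between the two groups can be sketched as a random downsampling step. This is only an illustration under stated assumptions, not the authors' procedure: the response does not say how equality was enforced, so downsampling the larger group to the size of the smaller one, dropping clades that lack one of the groups, and the names `GROUPS` and `equalize_within_clades` are all hypothetical choices.

```python
import random
from collections import defaultdict

# Group labels taken from the text; everything else here is illustrative.
GROUPS = ("Marshallese", "non-Marshallese")


def equalize_within_clades(tips, seed=0):
    """Randomly downsample tips so each clade keeps equal numbers per group.

    `tips` is a list of (tip_id, clade, group) tuples. Within each clade the
    larger group is downsampled to the size of the smaller one (an assumed
    rule); a clade missing either group therefore contributes no tips.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_clade = defaultdict(lambda: defaultdict(list))
    for tip_id, clade, group in tips:
        by_clade[clade][group].append(tip_id)

    kept = []
    for clade in sorted(by_clade):
        groups = by_clade[clade]
        n = min(len(groups[g]) for g in GROUPS)
        for g in GROUPS:
            for tip_id in rng.sample(groups[g], n):
                kept.append((tip_id, clade, g))
    return kept


# Hypothetical toy dataset: two clades with unequal group representation.
tips = [
    ("t1", "clade_1", "Marshallese"),
    ("t2", "clade_1", "Marshallese"),
    ("t3", "clade_1", "Marshallese"),
    ("t4", "clade_1", "non-Marshallese"),
    ("t5", "clade_2", "Marshallese"),
    ("t6", "clade_2", "non-Marshallese"),
    ("t7", "clade_2", "non-Marshallese"),
]
balanced = equalize_within_clades(tips)
```

      Repeating the downstream analysis over several seeds would show whether conclusions depend on which particular tips happen to be retained.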

      3) The authors present compelling evidence that the mumps outbreak in Washington state was sustained by the Marshallese community, and state that mumps did not transmit efficiently among the general Washington populace. That said, there were several other mumps outbreaks in the United States in the same 2016-2017 time period. Was there something different about Washington state that prevented mumps transmission outside of the Marshallese community? Were there no other close-knit communities (universities, prisons, other cultural communities, etc.) affected? It just seems surprising that the Marshallese community was the only community sustaining transmission at a time where many different types of communities were affected across the United States.

      We thank the reviewers and editor for this comment, and agree that further contextualization would be helpful. We did not make it clear in the initial submission that in 2016/2017, the vast majority of mumps outbreaks in the US were associated with either universities or ethnic communities. We have re-organized a few paragraphs in the discussion section and added information about other 2016/2017 outbreaks. This new paragraph is on lines 499-519, and reads:

      “Our finding that most introductions sparked short transmission chains suggests that mumps did not transmit efficiently among the general Washington populace. We suspect that more diffuse contact patterns may help explain this. Mumps has historically caused outbreaks in communities with strong, interconnected contact patterns (Barskey et al., 2012; Fields et al., 2019; Nelson et al., 2013), and in dense housing environments (Snijders et al., 2012), highlighted most recently by outbreaks in US detention centers (Lo et al., 2021). In 2016, most outbreaks in the US were associated with university settings (Albertson et al., 2016; Bonwitt et al., 2017; Donahue et al., 2017; Golwalkar et al., 2018; Shah et al., 2018; Wohl et al., 2020), including a separate, smaller outbreak in Washington State associated with Greek housing (Bonwitt et al., 2017). Outside of university settings, other outbreaks in 2016 were reported within close-knit ethnic communities (Fields et al., 2019; Marx et al., 2018). We speculate that while waning immunity may promote outbreaks by increasing susceptibility among young adults, outbreaks in younger age groups may be possible in sufficiently high-contact settings. Provision of an outbreak dose of mumps-containing vaccine to high-risk groups may therefore be especially effective for limiting mumps transmission in future outbreaks. Others have reported success in using outbreak dose mumps vaccinations to reduce mumps transmission on college campuses (Cardemil et al., 2017; Shah et al., 2018) and in the US army (Arday et al., 1989; Eick et al., 2008; Green, 2006; Kelley et al., 1991), and the CDC currently recommends providing outbreak vaccine doses to individuals with increased risk due to an outbreak (Marlow et al., 2020). Future work to quantify the interplay between contact rates and vaccine-induced immunity among different age and risk groups should be used to guide updated vaccine recommendations.”

      We also amended lines 42-46 in the introduction to highlight that most other US outbreaks in 2016/2017 were university-associated:

      “Like with other recent mumps outbreaks, most Washington cases in 2016/17 were vaccinated. Unusually though, while most US outbreaks in 2016/2017 were associated with university settings (Albertson et al., 2016; Bonwitt et al., 2017; Donahue et al., 2017; Golwalkar et al., 2018; Shah et al., 2018; Wohl et al., 2020), incidence in Washington was highest among children aged 10-18 years, younger than expected given waning immunity.”

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Schrieber et al. explores whether inbreeding affects floral attractiveness to pollinators with additional factors of sex and origin in play, in male and female plants of Silene latifolia. The authors use a combination of spatial sampling, floral volatiles, flower color, and floral rewards coupled with the response of a specialized pollinator to these traits. Their results show that females are more affected by inbreeding and in general inbreeding negatively impacts the "composite nature" of floral traits. The manuscript is well written, the experiments are detailed and quite elaborate. For example, the methodology for flower color estimation is the most detailed effort in this area that I can remember. All the experiments in the manuscript show meticulous planning, with extensive data collection addressing minute details, including the statistics used. However, I do have some concerns that need to be addressed.

      Core strengths: Detailed experimental design, elaborate data collection methods, well-defined methodology that is easy to follow. There is a logical flow for the experiments, and no details are missing in most of the experiments.

      Weaknesses: A recent study has addressed some of the questions detailed in the manuscript. So, the introduction needs to be tweaked to reflect this.

      Thank you very much for bringing this excellent article to our attention! We adjusted the writing in the introduction and the discussion accordingly. Please consider that this article was first published on 15 January 2021, while our manuscript was submitted on 9 January. Hence, we were not able to account for this study in the first submission. Introduction pp 4-5, ll 48-54: “Although in a few cases inbreeding has been shown to alter single components of flower attractiveness (Ivey and Carr, 2005; Ferrari et al., 2006; Haber et al., 2019), insight into syndrome-wide effects is restricted to a single study. Kariyat et al. (2021) demonstrated that inbred Solanum carolinense L. display reduced flower size, pollen and scent production and receive fewer visits from diurnal generalists. It is necessary to broaden such integrated methodological approaches to other plant-pollinator systems (e.g., nocturnal specialist pollinators) and further floral traits (i.e., flower colour).” Discussion p 19, ll 535-542: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change.”

      Some details and controls are missing in floral scent estimation. Details such as flower age and a pesticide treatment of plants that could affect chemistry need to be better refined.

      We clarified this issue in several places in the methods section. Previous studies (and our study) on S. latifolia have shown no clear differences in the quality of floral scent between sexes. However, one study found higher total emission of VOC in males, while others found no differences. Hence, females produce no specific VOC that are used as oviposition cues but may be differentiated from males by the total amount of emitted VOC and pronounced differences in spatial flower traits. We highlight this at p 6, ll 111-116: “Silene latifolia exhibits various sexual dimorphisms with male plants producing more and smaller flowers that excrete lower volumes of nectar with higher sugar concentrations as compared to females (Gehring et al., 2004; Delph et al., 2010). The quality of floral scent exhibits no clear sex-specific patterns, while male plants have been shown to emit higher or equal total amounts of VOC as compared to females in different studies (Dötterl & Jürgens 2005, Waelti et al. 2009)”.

      Both male and female moths show pronounced behavioural responses to lilac aldehyde isomers and other VOC in the floral scent of S. latifolia (Dötterl et al., 2006). We therefore treated these VOC as typical floral scent compounds. We clarified this at p 7, ll 125-126: “A substantial fraction of floral VOC produced by S. latifolia triggers antennal and behavioural responses in male and female H. bicruris moths (Dötterl et al., 2006).” and p 9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”
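
      The per-plant Shannon diversity mentioned in the quoted methods can be illustrated with a short sketch. This is not the authors' R code: it is a minimal Python re-implementation of the standard index H = -sum(p_i ln p_i), where p_i is the share of compound i in a plant's total VOC intensity. The natural logarithm is assumed here to match the usual default of such diversity functions, and the intensity values are hypothetical.

```python
import math


def shannon_diversity(intensities):
    """Shannon index H = -sum(p_i * ln(p_i)) over compounds with non-zero intensity."""
    total = sum(intensities)
    if total <= 0:
        return 0.0  # no detected compounds: diversity is defined as zero here
    h = 0.0
    for x in intensities:
        if x > 0:
            p = x / total  # relative share of this compound in the bouquet
            h -= p * math.log(p)
    return h


# Hypothetical peak intensities for 20 compounds in two plants (illustration only).
even_plant = [1.0] * 20               # all compounds emitted equally
skewed_plant = [100.0] + [1.0] * 19   # one compound dominates the bouquet

h_even = shannon_diversity(even_plant)
h_skewed = shannon_diversity(skewed_plant)
```

      With 20 equally abundant compounds the index reaches its maximum of ln(20); a bouquet dominated by a single compound scores lower even though the same 20 compounds are present.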

      We used biological pest control agents in a preventive manner because S. latifolia is often infested by thrips and aphids under greenhouse conditions. The writing in the previous manuscript version was not clear in this regard and we changed the text at p 8, ll 157-161: “Plants received water and fertilisation (UniversolGelb 12-30-12, Everris-Headquarters, NL) when necessary for the entire experimental period and were prophylactically treated with biological pest control agents under greenhouse conditions to prevent thrips (agent Amblyseius barkeri and Amblyseius cucumeris) and aphid (agent Chrysoperla carnea) infestation (Katz Biotech GmbH, GE).”

      Indeed, flower size and scent emission can be correlated. Although the question whether differences in scent emission were based on a difference in flower size is an interesting one, it seemed less relevant to us because it is unlikely that our pollinators correct their perception of a scent for the size of a flower (see also p 19, ll 520-526). We were rather interested in whether scent emission differs between the plant treatments and thus pollinators may chemically perceive such differences. Moreover, we found it problematic to correct our models for flower size by including it as a covariate, which is the reason why we have not assessed this trait during scent collection. In this case, we would have corrected our scent responses for the effects of inbreeding, sex and population origin (i.e., the predictors we are interested in) because all of them determine the size of a flower (Figure 2 c,d). Hence, the inbreeding, sex and origin effects on flower scent would likely vanish. However, it is highly unlikely that the set of genes contributing to sex-, breeding treatment- and origin-based variation in flower size is exactly the same one that determines variation in scent emission per flower, which is basically the assumption underlying the model that includes flower size as a covariate. We critically mentioned the trade-off relationships and our reasoning to not correct for flower size at p 9, ll 208-210: “The intensities of VOC were not corrected for flower size because we wanted to capture all variation in scent emission that is relevant for the receiver i.e., the pollinator.”

      While the study is laser-focused on floral traits, as the authors are aware inbreeding affects the total phenotype of the plants including fitness and defense traits. For example, there are quite a few studies that have shown how inbreeding affects the plant defense phenotype. This could be addressed in the introduction and discussion.

      We agree that this aspect is important and therefore addressed it in further detail in the introduction at p 4 ll 34-38: “While it is well established that inbreeding can increase a plant’s susceptibility to herbivores by diminishing morphological and chemical defences (Campbell et al., 2013; Kariyat et al., 2012; Kalske et al., 2014), its effects on plant-pollinator interactions are less well understood. Inbreeding may reduce a plant’s attractiveness to pollinating insects by compromising the complex set of floral traits involved in interspecific communication.” Since other referees suggested toning down rather than expanding the discussion based on floral scent results, we stick to the general feedback relationship between herbivory and pollination, rather than relating it specifically to volatiles in the discussion at p 19, ll 535-544: “In summary, our research on S. latifolia suggests that in addition to inbreeding disrupting interactions with herbivores by changing plant leaf chemistry (Schrieber et al., 2018) it affects plant interactions with pollinators by altering flower chemistry. Our observations are in line with studies on other plant species (Ivey and Carr, 2005; Kariyat et al., 2012, 2021) and highlight that inbreeding has the potential to reset the equilibrium of species interactions by altering functional traits that have developed in a long history of co-evolution. These threats to antagonistic and symbiotic plant-insect interactions may mutually magnify in reducing plant individual fitness and altering the dynamics of natural plant populations under global change. As such, our study adds to a growing body of literature supporting the need to maintain or restore sufficient genetic diversity in plant populations during conservation programs.”

      Reviewer #2 (Public Review):

      A summary of what the authors were trying to achieve. This interesting and data-rich paper reports the results of several detailed experiments on the pollination biology of the dioecious plant Silene latifolia. The authors use multiple accessions from several European (native range) and North American (introduced range) populations of S. latifolia to generate an experimental common garden. After one generation of within-population crosses, in which each cross included either two (half-)siblings or two unrelated individuals, they compared the effects of one generation of inbreeding on multiple plant traits (height, floral size, floral scent, floral color), controlling for population origin. Thereby, they set out to test the hypothesis that inbreeding reduces plant attractiveness. Furthermore, they ask if the effect is more pronounced in female than male plants, which may be predicted from sexual selection and sex-chromosome-specific expression, and if the effect of inbreeding is larger in native European populations than in North American populations, which may have already undergone genetic purging during the bottleneck. Finally, the authors evaluate to what extent the inbreeding-related trait changes affect floral attractiveness (measured as visitation rates) in field-based bioassays.

      An account of the major strengths and weaknesses of the methods and results. The major strength of this paper is the ambitious and meticulous experimental setup and implementation that allows comparisons of the effect of multiple predictors (i.e. inbreeding treatment, plant origin, plant sex) on the intraspecific variation of floral traits. Previous work has shown direct effects of plant inbreeding on floral traits, but no previous study has taken this wholesale approach in a system where the pollination ecology is well known. In particular, very few studies, if any, have tested the effects of inbreeding on floral scent or color traits. Moreover, I particularly appreciate that the authors go the extra mile and evaluate the biological importance of the inbreeding-induced trait variation in a field bioassay. I also very much appreciate that the authors have taken into account the biological context by using a relevant vision model in the color analyses and by focusing on EAD-active compounds in the floral scent analyses.

      The results are very interesting and show that the effects of inbreeding on trait variation are both origin- and sex-dependent, but that the strongest effects were not always consistent with the hypothesis that North American plants would have undergone genetic purging during a bottleneck that would make these plants less susceptible to inbreeding effects. The authors made a large collection effort, securing seeds from eight populations from each continent, but then only used population origin and seed family origin as random factors in the models, when testing the overall effect of inbreeding on floral traits. It would have been very interesting to see an analysis that partitions the variance both in the actual traits under study and in the response to inbreeding to determine to what extent there is variation among populations within continents. Not the least, because it is increasingly clear that the ecological outcome of species interactions (mutualistic/antagonistic) in nursery pollination systems often varies among populations (cf. Thompson 2005, The geographic mosaic of coevolution), and some results suggest that this is the case also in Hadena-Silene interactions (e.g. Kephart et al. 2006, New Phytologist). Furthermore, some plants involved in nursery pollination systems show evidence of distinct canalization across populations of floral traits of importance for the interaction (e.g. Svensson et al. 2005), whereas others show unexpected and fine-grained variation in floral traits among populations (e.g. Suinyuy et al. 2015, Proceedings B, Thompson et al. 2017 Am. Nat., Friberg et al. 2019, PNAS). Hence, it is possible that the local population history and local variation in the interactions between the plants and their pollinators may be more important predictors for explaining variation in floral trait responses to inbreeding than the larger-scale continental analyses. Not the least, because North American S. latifolia probably has multiple origins, with subsequent opportunity for admixture in secondary contact.

      Yes, it is necessary to put populations from the same continent into one category, since native and invasive plant populations differ significantly in their evolutionary history (p 5, ll 74-81, http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2012.05751.x). Origin explained sufficient amounts of variation in several traits including flower number, corolla expansion, VOC diversity, lilac aldehyde A intensity, and pollinator visitation rates (see Figures 2-3; and Table 2) and some variation in the magnitude of inbreeding effects (Figure 2e, f; Figure 3). Even if we were not interested in differences among native and invasive populations, we would have to include origin as a fixed effect in our models because:

      i) populations within a distribution range are not independent samples,

      ii) origin explains sufficient variation in many responses,

      iii) origin cannot be fitted as a random factor, since it has only two levels (the minimum number of levels for a random effect is 4).

      We agree that it would be very interesting to specifically assess differences in the magnitude of breeding and sex effects among populations within origins. We now discuss this as an important future research direction at p 18, ll 500-507: “As such, the precise mechanisms underlying variation in inbreeding effects on different scent traits across population origins of S. latifolia can only be explored based on comprehensive genomic resources, which are currently not available. Future studies should also incorporate field-data on the abundance of specialist pollinators and extend the focus from variation in the magnitude of inbreeding effects among geographic origins to variation among populations within geographic origins and individuals within populations. This would allow a detailed quantification of geographic variation in inbreeding effects and elaborating on the causes and ecological consequences of such variation (Thompson, 2005; Schrieber and Lachmuth, 2017; Thompson et al., 2017)”.

      To empirically address within-origin variation of inbreeding effects with our data, we would have to i) fit correlated random intercepts and slopes for the interaction breeding-sex on the population random factor (models consume min. 22 DF); or ii) include population as a fixed effect in our models (models consume min. 67 DF). We have tried both of these approaches when preparing the revision, but unfortunately it turned out that our study is not designed to address this question. The models for both variants only partially converge (see R-script ll. 1568-1580), and even if they do this does not imply that one can draw solid inference from them. Approach i often results in multiple singular convergence warning messages implying that no variance is explained by population-specific reaction norms to the fixed effects specified in the random effects structure. Approach ii results in odd rank-deficient models (I was seriously worried about type I errors). We simply have too few replicates (5) per population-breeding treatment-sex combination for both approaches. For solid inference we would need 10 (approach i) to 40 (approach ii) replicates, i.e., 640-2600 individuals. However, our experimental design is sufficient to address the hypothesis we have raised in the introduction as well as general differences in response variables among populations. We now provide information on variance partitioning for all models that include population as a random effect in S9. As you will see, population explains lower amounts of variation in our responses than the fixed effects in 9 out of 12 models. The random effects maternal and paternal genotype (mother&father) explain more variation than the random effect population in 6 of 12 cases. Thus, these data do not make a strong case for an extensive discussion of population-based differences in floral traits and this was also not a question or hypothesis we wanted to address with our study.
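
      The sample-size figures in the reply above can be checked with simple arithmetic. The sketch below assumes a fully balanced design with eight populations per continent (the number given in the reviewer's comment), two breeding treatments and two sexes; the variable and function names are illustrative and are not taken from the authors' R script.

```python
# Design dimensions (assumed fully balanced for this illustration).
populations_per_origin = 8   # eight populations per continent, per the reviewer's summary
origins = 2                  # Europe and North America
breeding_treatments = 2      # inbred vs. outbred
sexes = 2                    # female vs. male

# Number of population x breeding treatment x sex combinations.
combinations = populations_per_origin * origins * breeding_treatments * sexes


def individuals_needed(replicates_per_combination):
    """Total plants required for a given number of replicates per combination."""
    return combinations * replicates_per_combination


current_design = individuals_needed(5)    # replicates available in the study
approach_i = individuals_needed(10)       # random-slope models
approach_ii = individuals_needed(40)      # population as a fixed effect
```

      Under these assumptions there are 64 combinations, so 10 replicates correspond to 640 plants and 40 replicates to 2,560 plants, consistent with the rounded range of 640-2600 individuals quoted in the reply.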

      I see no major weaknesses in the study, but in my detailed response, I have made a few questions and suggestions about the floral scent analyses. In short, the authors have used a technique that is not the standard method used for making quantitative floral scent analyses, and I am curious about how it was made sure that the results obtained from the static headspace sampling using PDMS adsorbents could be used as a quantitative measure. I would suggest that the authors validate the use of this method more thoroughly in the manuscript, and have detailed this comment in my response to the authors.

      Also, and this may seem like a nit-picky comment, I am not convinced that the best way to describe the traits under study is "plant attractiveness", because in the experimental bioassays, most of the traits under study that are affected by the inbreeding treatment, did not result in a reduced pollinator visitation. Most (or all) of these traits may also be involved in other plant functions and important for other interactions, so I suggest potentially using a term like "floral traits" or "(putative) signalling traits".

      We now avoid the term floral attractiveness throughout the manuscript and instead refer to “floral traits”.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions: By and large, the authors achieved the aims of this study, and drew conclusions based on these results. One interesting aspect of this work that I think could be discussed a bit deeper is the lack of congruence between the effects of inbreeding on floral traits and the variation in visitation pattern in the bioassay. In fact, the only large effect of inbreeding on a floral trait that may play a role as an explanatory factor is the reduction of emission of lilac aldehyde A in inbred female S. latifolia from North America, which corresponds to a reduced visitation rate in this group in the pollinator visitation bioassay. I have made some specific suggestions in my comments to the authors.

      We agree that this aspect required deeper discussion and revised the section at p 19, ll 520-526 accordingly. We believe that the limited spatial vision of H. bicruris in combination with our experimental setup for pollinator observations increased the relative importance of floral scent for pollinator visitation rates (suggested by referee #3).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community: I think that one important aspect of this work that may broaden the impact of this study further is the link between these experiments and our expectations from the evolution of selfing. Selfing plant species most often conform to the selfing syndrome, presenting smaller, less scented flowers than outcrossing relatives. Traditionally, the selfing syndrome is explained by natural selection against individuals that invest energy into floral signalling, when attracting pollinators is no longer crucial for reproduction. Some studies (for example Andersson, 2012, Am. J. Bot), however, have shown that only one, or a few, generations of inbreeding may reduce floral size as much as quite strong selection for reduced signalling. Here, at least for some populations and sexes, similar results are obtained in this paper regarding several traits (including floral scent), and one way to put this paper in context is by discussing the results in the light of these previous papers.

      We now address this issue at p 16, ll 417-420: “However, our findings highlight that even weak degrees of biparental inbreeding (i.e., one generation sib-mating) can result in a severe reduction of spatial flower trait and scent trait values that is detectable against the background of natural variation among multiple plant populations from a broad geographic region. This observation indirectly supports that the selfing syndrome (i.e., smaller, less scented flowers observed in selfing relative to outcrossing populations of hermaphroditic plant species) may not merely be a result of natural selection against resource investment into floral traits, but also a direct negative consequence of inbreeding (Andersson, 2012).”

      Reviewer #3 (Public Review):

      Schrieber et al. studied the effects of biparental inbreeding in the dioecious plant Silene latifolia, focusing specifically on traits important for floral attractiveness and pollinator attraction. These traits are especially important for dioecious species with separate sexes as they are obligate outcrossers. The authors find that inbreeding mostly decreases floral attractiveness, but that this effect tended to be stronger in the female flowers, which the authors suspect to result from the trade-off with larger investment in the sexual functions in the female plants. The authors then go on to couple the changes in visual and olfactory floral traits to pollinator attraction which allows them to conclude or at least speculate that differences in pollinator behavior are mostly driven by the changes in olfactory traits. The study is robust in its broad and well-balanced sampling of populations, rigorous and in large part meticulously documented experimental designs and linking of the effects on mechanisms to ecological function. The hypotheses are clearly stated and the study is able to address them mostly convincingly. However, some of the aspects of the decisions the authors made and possible caveats need to be addressed and elaborated on.

      A major caveat, in my opinion, is that while the authors find stronger effects of inbreeding on pollinator visitation rates in the plants from the North American (Na) origin, these plants were tested in an environment that was foreign to them, which could have important consequences for the results of this study. This is specifically because the main pollinator Hadena bicruris moth is completely absent from the populations in Na, and yet, was the main pollinator observed in the pollinator attraction experiment. As this pollinator is also a seed predator, the Na populations are released from the selection pressure to avoid attracting the females of this species and thus risking the loss of seeds and fitness. In fact, some of the results suggest that the release from the specialist pollinator and seed predator in Na has led to an increase in the attractiveness of the female flowers based on the higher number of flowers visited in the outcrossed females compared to outcrossed males in the plants from the Na origin and the similar, though not statistically significant, pattern in the olfactory cue. While ideally this pollinator attraction experiment should be repeated within the local range of the Na plants, this is of course not feasible. Instead I suggest the problem should be addressed in the discussion explicitly and its consequences for the interpretation of the results should be considered.

      Indeed, North American populations are tested in their “away”- habitat only and the observed plant performance and pollinator visitation rates can thus provide no direct implications for their “home”-habitat. We state this now more clearly at pp 11-12, ll 283-285. However, our design is appropriate for investigating inbreeding effects on plant-pollinator interactions in multiple plant populations in a common environment. Given the close taxonomic relationship of H. bicruris (main pollinator in Europe) and H. ectypa (main pollinator in North America), the behavioural responses of the former species to variation in the quality of its host plant was considered to overlap sufficiently with responses of the latter species as outlined at pp 11-12, ll 285-291.

      The hypothesis that North American (NA) S. latifolia evolved higher attractiveness to female Hadena moths because H. ectypa is not able to oviposit on female plants in contrast to H. bicruris is indeed a highly interesting one. However, as you have outlined correctly, our study is not designed to elaborate on questions related to adaptive evolutionary differentiation among North American and European plants. Instead of addressing this hypothesis based on our data, we thus refer to previous studies in the discussion p 17, ll 482-487: “As discussed in detail in previous studies, higher flower numbers in North American S. latifolia plants (Figure 1b) may result from changes in the selective regimes for numerous abiotic factors (Keller et al., 2009) or from the release of seed predation. As opposed to H. bicruris, H. ectypa pollinates North American S. latifolia without incurring costs for seed predation, which may result in the evolution of higher flower numbers, specifically in female plants (Elzinga and Bernasconi, 2009).”

      The incorporation of the VOC data in the actual manuscript was quite limited and I found the reasoning for picking only the three lilac aldehydes (in addition to the Shannon diversity index) for the univariate statistical tests insufficient. How much more efficient was the effect of the lilac aldehydes compared to the other 17 compounds deemed important in the previous study? While the data on this one aldehyde matches the pollinator attraction results, having one compound out of 70 (or out of 20 if only considering the ones identified important for the main pollinator) seems, perhaps, fortuitous lest there is a good reason for focusing on these particular compounds.

      We adapted the text to increase clarity but stuck to our previous choice for the analyses of VOC data.

      i) We now explain our choice of analysing lilac aldehydes with more detail p9, ll 210-218: “For targeted statistical analyses, we focused on those VOC that evidently mediate communication with H. bicruris according to Dötterl et al. (2006). We analysed the Shannon diversity per plant (calculated with R-package: vegan v.2.5-5, Oksanen et al. 2019) for 20 floral VOC in our data set that were shown to elicit electrophysiological responses in the antennae of H. bicruris (Supplementary File 1). Moreover, we analysed the intensities of three lilac aldehyde isomers, which trigger oriented flight and landing behaviour in both male and female H. bicruris most efficiently when compared to other VOC in the floral scent of S. latifolia. Furthermore, H. bicruris is able to detect the slightest differences in the concentration of these three compounds at very low dosages (Dötterl et al. 2006).”

      ii) If one analyses 20 compounds with zero-inflation models (actually two models in one) + 8 floral trait models + 2 pollinator visitation models (zi-models with two component models), one ends up with 52 models investigating complex fixed and random effect structures. To keep type-1 errors as low as possible (see also comment 2.12.b from Referee#2), we approached the more comprehensive VOC data sets with multivariate analyses or Shannon diversity.

      iii) We tested the effect of sex × origin × breeding treatment on the Shannon diversity of 20 active VOC as well as in the random forest analyses with the 20 VOC and 70 VOC dataset and transparently reported the results from all of these analyses in the manuscript. Hence, the incorporation of VOC data was not limited. However, we agree that we referred too little to these results and have now changed the text accordingly. Results section p 13 ll 351-354: “Multivariate statistical analyses of 20 H. bicruris active VOC and all 70 VOC detected in S. latifolia revealed no clear separation of floral headspace VOC patterns for any of the treatments (Figure 2-figure supplement 2). In summary, the combined effects of breeding treatment, sex and range on floral scent were rather weak.”
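
      The model count in point ii) and the type I error concern behind it can be reproduced with a few lines of arithmetic. The sketch below tallies the models as described in the reply and then, purely as an illustration, computes the family-wise error rate under two assumptions that are not stated in the source: a nominal alpha of 0.05 per model and statistical independence between models.

```python
# Tally of models if every one of the 20 active VOC were analysed separately.
voc_compounds = 20
components_per_zi_model = 2          # a zero-inflation model combines two component models
floral_trait_models = 8
visitation_zi_models = 2

total_models = (
    voc_compounds * components_per_zi_model
    + floral_trait_models
    + visitation_zi_models * components_per_zi_model
)

# Illustrative family-wise error rate: the probability of at least one false
# positive across all models. Alpha and independence are assumptions.
alpha = 0.05
family_wise_error_rate = 1 - (1 - alpha) ** total_models
```

      The tally gives the 52 models cited in the reply. Under the stated assumptions the chance of at least one spurious significant result would exceed 90%, which illustrates why summarising the VOC data with a diversity index or multivariate analyses keeps the number of tests manageable.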

      Sampling time of VOCs is reported ambiguously. Was it from 21:00 to 17:00 the next day or in fact from 9pm to 5AM (instead of 5 pm as reported)? Please be more specific in the text as this is quite important. If sampling tubes were left in place during the daytime, some of the compounds could have evaporated due to heating of the tubes in the summer. It would also be important to mention whether all of the headspace VOCs were sampled on the same day and whether there could be variation in i.e. temperature.

      Thank you very much for identifying this typo! It is from 9 pm to 5 am (p 9, l 186).

      Considering the experimental setup for the pollinator attraction observations and the pooling of the data at the block level (which I think is the right choice) it seems possible the authors were more likely to get a result where pollinator behavior matches the long-distance cue, the VOCs. Short-distance cues such as a subtle difference in flower size would perhaps not be distinguished with the current setup. I would be interested to know if the authors agree, and if so, mention this in the discussion.

      Thank you very much for this excellent suggestion! We agree and discuss this aspect in detail at p 19, ll 520-526. Indeed, one would need two different experimental setups to assess the contributions of long- and short-distance cues. Our setup (large distances among plots) is optimal for long-distance cues, while a setup for short-distance cues should have all plants in close spatial proximity. However, the latter approach would not allow us to address long-distance cues or to exclude competition/facilitation for pollinators among plants from different treatment groups.

    1. Author Response:

      Evaluation Summary:

      This manuscript will be of interest to a broad audience of immunologists especially those studying host-pathogen interactions, mucosal immunology, innate immunity and interferons. The study reveals a novel role for neutrophils in the regulation of pathological inflammation during viral infection of the genital mucosa. The main conclusions are well supported by a combination of precise technical approaches including neutrophil-specific gene targeting and antibody-mediated inhibition of selected pathways.

      We would like to thank the reviewers for taking the time to review our manuscript, and would also like to thank the editors for handling our manuscript. We are grateful for the positive response to our work and the thoughtful suggestions.

      Reviewer #1 (Public Review):

      Overall this is a well-done study, but some additional controls and experiments are required, as discussed below. The authors have done a considerable amount of work, resulting in quite a lot of negative data, and so should be commended for persistence to eventually identify the link between neutrophils and IL-18, through type I IFN signaling.

      Thank you! We appreciate the feedback and suggestions for strengthening the study.

      Major Comments:

      -A major conclusion of this manuscript is prolonged type I IFN production following vaginal HSV-2 infection, but the data presented herein did not actually demonstrate this. At 2 days post infection, IFN beta was higher (although not significantly) in HSV-2 infection, but much higher in HSV-1 infection compared to uninfected controls. At 5 days post infection the authors show mRNA data, but not protein data. If the authors are relying on prolonged type I IFN production, then they should demonstrate increased IFN beta during HSV-2 infection at multiple days after infection including 5dpi and 7dpi.

      We apologize for not including the IFN protein data and have now provided this information in new Figure 3 and Figure 3 - Supplement 3. This new addition shows measurement of secreted IFNb in vaginal lavages at 4, 5 and 7 d.p.i., as well as total IFNb levels in vaginal tissue at 7 d.p.i.

      -Does the CNS viral load or kinetics of viral entry into the CNS differ in mice depleted of neutrophils, IFNAR cKO mice, or mice treated with anti- IL-18? Do neutrophils and/or IL-18 participate at all in neuronal protection from infection?

      To maintain the focus of our study on the host factors that contribute specifically to genital disease, we have not included discussion on viral dissemination into the PNS or CNS, especially as viral invasion of the CNS seems to be an infrequent occurrence during genital herpes in humans. However, we have performed some preliminary exploration of this interesting question, and find that viral invasion of the nervous system is unaltered in the absence of neutrophils. This is in accordance with the lack of antiviral neutrophil activity we have described in the vagina after HSV-2 infection. These preliminary data are provided below as Reviewer Figure 1. We have not yet begun to investigate whether IL-18 modulates neuroprotection, but agree this is an important question to address in future studies.

      RFigure 1. Viral burden in the nervous system is similar in the presence or absence of neutrophils. Graphs show viral genomes measured by qPCR from the DRG, lower half of the spinal cord and the brainstem at the indicated days post-infection.

      -In Figure 3 the authors show that neutrophil "infection" clusters 2 and 5 express high levels of ISGs. Only 4 of these ISGs are shown in the accompanying figures. Please list which ISGs were increased in neutrophils after both HSV-2 and HSV-1 infection, perhaps in a table. Were there any ISGs specifically higher after HSV-2 infection alone, any after HSV-1 infection alone?

      Tables listing differentially expressed neutrophil ISGs during HSV-1 and HSV-2 infection have now been provided in new Figure 3 - Supplement 1, with complete lists of DEGs provided as Source Files for the same figure.

      -The authors claim that HSV-1 infection recruits non-pathogenic neutrophils compared to the pathogenic neutrophils recruited during HSV-2 infection. Can the authors please discuss if these differences in inflammation or transcriptional differences between the neutrophils in these two different infections could be due to differences in host response to these two viruses rather than differences in inflammation? Please elaborate on why HSV-1 was used as opposed to a less inflammatory strain of HSV-2. Furthermore, does HSV-1 infection induce vaginal IL-18 production in a neutrophil-dependent fashion as well?

      These are excellent questions, and we have emphasized that differences in host responses against HSV-1 and HSV-2 likely lead to distinct inflammatory milieus that differentially affect neutrophil responses in lines 374-375 and 409-419. We completely agree that differences in neutrophil responses are likely due to distinct host responses against HSV-1 and HSV-2 and apologize for not making that clear. We have previously described some of the other differences in the immunological response against these two viruses (Lee et al, JCI Insight 2020). We would suggest that differences in the host response against these two viruses would naturally result in differences in the local inflammatory milieu, which then modulates neutrophil responses. Whether the transcriptomes of neutrophils beyond the immediate site of infection (outside the vagina) are different between HSV-1 and HSV-2 is currently an open question.

      As for why we used HSV-1 instead of a less inflammatory strain of HSV-2, we had originally been interested in trying to model the distinct disease outcomes that have previously been described during HSV-1 vs HSV-2 genital herpes in humans and thought this would be a relevant comparison. We have not yet examined infection with less inflammatory HSV-2 strains, but agree that this is a great idea. We have also not yet examined neutrophil-dependent IL-18 production in the context of HSV-1.

      Reviewer #2 (Public Review):

      This manuscript will be of interest to a broad audience of immunologists especially those studying host-pathogen interactions, mucosal immunology, innate immunity and interferons. The study reveals a novel role for neutrophils in the regulation of pathological inflammation during viral infection of the genital mucosa. The main conclusions are well supported by a combination of precise technical approaches including neutrophil-specific gene targeting and antibody-mediated inhibition of selected pathways.

      In this study by Lebratti et al., the authors examined the impact of neutrophil depletion on disease progression, inflammation and viral control during a genital infection with HSV-2. They find that removal of neutrophils prior to HSV-2 infection resulted in ameliorated disease as assessed by inflammatory score measurements. Importantly, they show that neutrophil depletion had no significant impact on viral burden nor did it affect the recruitment of other immune cells, thus suggesting that the observed improvement in inflammation was a direct effect of neutrophils. The role of neutrophils in promoting inflammation appears to be specific to HSV-2 since the authors show that HSV-1 infection resulted in comparable numbers of neutrophils being recruited to the vagina yet HSV-1 infection was less inflammatory. This observation thus suggests that there might be functional differences in neutrophils in the context of HSV-2 versus HSV-1 infection that could underlie the distinct inflammatory outcomes observed in each infection. In order to uncover potential mechanisms by which neutrophils affect inflammation, the authors examined the contributions of classical neutrophil effector functions such as NETosis (by studying neutrophil-specific PAD4 deficient mice), reactive oxygen species (using mice with a global defect in NADH oxidase function) and cytokine/phagocytosis (by studying neutrophil-specific STIM-1/STIM-2 deficient mice). The data shown convincingly ruled out a contribution by the neutrophil factors examined. The authors thus performed an unbiased single cell transcriptomic analysis of vaginal tissue during HSV-1 and HSV-2 infection in search of potentially novel factors that differentially regulate inflammation in these two infections.
tSNE analysis of the data revealed the presence of three distinct clusters of neutrophils in vaginal tissue in mock-infected mice; the same three clusters remained after HSV-1 infection, but in response to HSV-2 only two of the clusters remained and showed a sustained interferon signature primarily driven by type I interferons (IFNs). In order to directly interrogate the impact of type I IFN on the regulation of inflammation, the authors blocked type I IFN signaling (using anti-IFNAR antibodies) at early or late times after infection and showed that late (day 4) IFN signaling was promoting inflammation while early (before infection) IFN was required for antiviral defense as expected. Importantly, the authors examined the impact of neutrophil-intrinsic IFN signaling on HSV-2 infection using neutrophil-specific IFNAR1 knockout mice (IFNAR1 CKO). The genetic ablation of IFNAR1 on neutrophils resulted in reduced inflammation in response to HSV-2 infection but no impact on viral titers, findings that are consistent with observations shown for neutrophil-depleted mice. The use of IFNAR1 CKO mice strongly supports the importance of type I IFN signaling on neutrophils as a direct regulator of neutrophil inflammatory activity in this model. Since type I IFNs induce the expression of multiple genes that could affect neutrophils and inflammation in various ways, the authors set out to identify specific downstream effectors responsible for the observed inflammatory phenotype. This search led them to IL-18 as a possible mediator. They showed that IL-18 levels in the vagina during HSV-2 infection were reduced in neutrophil-depleted mice, in mice with "late" IFNAR blockade and in IFNAR1 CKO mice. Furthermore, they showed that antibody-mediated neutralization of IL-18 ameliorated the inflammatory response of HSV-2 infected mice, albeit to a lesser extent than what was seen in IFNAR1 CKO.
Altogether, the study presents intriguing data to support a new role for neutrophils as regulators of inflammation during viral infection via an IFN-IL-18 axis.

      In aggregate, the data shown support the authors' main conclusions, but some of the technical approaches need clarification and in some cases further validation that they are working as intended.

      Thank you! We appreciate the enthusiasm for our work as well as the suggestions for improving our study.

      1) The use of anti-Ly6G antibodies (clone 1A8) to target neutrophil depletion in mice has been shown to be more specific than anti-Gr1 antibodies (which target both monocytes and neutrophils), thus anti-Ly6G antibodies are a good technical choice for the study. Neutrophils are notoriously difficult to deplete efficiently in vivo due at least in part to their rapid regeneration in the bone marrow. In order to sustain depletion, previous reports indicate the need for daily injection of antibodies. In the current study the authors report the use of only one intra-peritoneal injection (500 mg) of 1A8 antibodies and that this single treatment resulted in diminished neutrophil numbers in the vagina at day 5 after viral infection (Fig 1A). Data shown in figure 2B suggest that there are neutrophils present in the vagina of uninfected mice, that there is a significant increase in their numbers at day 2 and that their numbers remain fairly steady from days 2 to 5 after infection. In order to better understand the impact of antibody-mediated depletion in this model, the authors should have examined the kinetics of depletion from day 0 through 5 in the vaginal tissue after 1A8 injection as compared to the effect of antibodies in the periphery. These additional data sets would allow for a deeper understanding of neutrophil responses in the vagina as compared to what has been published in other models of infection at other mucosal sites.

      We agree and apologize for not providing this information in the original submission. Neutrophil depletion kinetics from the vagina are now shown in new Figure 1A, while depletion from the blood is shown in new Figure 1 - Supplement 1.

      2) The authors used antibody-mediated blockade as a means to interrogate the impact of type I IFNs and IL-18 in their model. The kinetics of IFNAR blockade were nicely explained and supported by data shown in supplementary figure 4. IFNAR blockade was done by intra-peritoneal delivery of antibodies at one day before infection or at day 4 after infection. When testing the role of IL-18 the authors delivered the blocking antibody intra-vaginally at 3 days post infection. The authors do not provide a rationale for changing delivery method and timing of antibody administration to target IL-18 relative to IFNAR signaling. Since the model presented argues for an upstream role for IFNAR as inducer of IL-18 it is unclear why the time point used to target IL-18 is before the time used for IFNAR.

      We thank Reviewer #2 for raising this point and apologize for not providing an explanation for the differences in antibody treatment regimens for modulating IFNAR and IL-18. As the anti-IL-18 mAb is a cytokine-neutralizing antibody, we hypothesized that administering the antibody vaginally would help to concentrate the antibody at the relevant site of cytokine production and increase the potency of neutralization. This is in contrast to systemic administration of the anti-IFNAR1 mAb, which acts to block signaling in the 'receiving' cell. We expect the anti-IFNAR1 mAb (given in much higher doses) to bind both circulating cells that are recruited to the site of infection and cells that are already at the site of infection. Similarly, we started the anti-IL-18 antibody treatment one day earlier to allow a presumably sufficient amount of antibody to accumulate in the vagina. Our rationale has been included in the revised manuscript (lines 351-353). We are pleased to report, however, that we have conducted preliminary studies in which mice were treated beginning at 4 d.p.i. rather than 3 d.p.i., and observe similar trends. These data are provided below as Reviewer Figure 3.

      RFigure 3. Mice treated with anti-IL-18 mAb starting at 4 d.p.i. exhibit reduced disease severity. Mice were infected with HSV-2 and treated ivag with 100 µg of anti-IL-18 on 4, 5 and 6 d.p.i. Mice were monitored for disease until 7 d.p.i. Data were analyzed by repeated-measures two-way ANOVA with Geisser-Greenhouse correction and Bonferroni's multiple comparisons test.

      3) An open question that remains is the potential mechanism by which IL-18 is acting as an effector cytokine of epithelial damage. As acknowledged by the authors, the rescue seen in IFNAR1 CKO mice (Fig 5C) is more dramatic than that seen when targeting IL-18 (Fig 6D). It is thus very likely that IFNAR signaling on neutrophils is affecting other pathways. It would have been greatly insightful to perform a single cell RNA seq experiment with IFNAR CKO mice as done for WT mice in Fig 3. Such an analysis might have provided a more thorough understanding of neutrophil-mediated inflammatory pathways that operate outside of classical neutrophil functions.

      We agree that the proposed scRNA-seq experiment comparing vaginal cells from IFNAR CKO and WT mice would be very interesting and insightful. Although a bit beyond the scope of the current manuscript, we are currently planning on performing these types of studies to better understand IFN-mediated regulation of inflammatory neutrophil functions.

      4) The inflammatory score scale used is nicely described in the methods and it took into consideration external signs of vaginal inflammation by visual observation. It would have been helpful to mention whether the inflammation scoring was done by individuals blinded to the experimental groups.

      This is an important point and we apologize for not making this clear. We have now provided this information in the methods section of the revised manuscript (line 778).

      5) The presence of distinct clusters of neutrophils in the scRNA-seq data analysis is a fascinating observation that might suggest more diversity in neutrophils than what is currently appreciated. In this study, the authors do not provide a list of the genes expressed in each cluster within the data shown in the paper. Although the entire data set is deposited and publicly available, having the gene lists within the paper would have been helpful to provide a deeper understanding of the current study.

      The heterogeneity of the vaginal neutrophil population after HSV infection is indeed an unexpected finding. To provide a deeper understanding of these transcriptionally distinct clusters, we have now included complete lists of DEGs between the different clusters as Source Files for Figure 3.

      Reviewer #3 (Public Review):

      This paper examines the role of neutrophils, inflammatory immune cells, in disease caused by genital herpes virus infection. The experiments describe a role for type I interferon stimulation of neutrophils later in the infection that drives inflammation. Blockade of interferon, and to a lesser degree, IL-18 ameliorated disease. This study should be of interest to immunologists and virologists.

      This study sought to examine the role of neutrophils in pathology during mucosal HSV-2 infection in a mouse model. The data presented in this manuscript suggest that late or sustained IFN-I signals act on neutrophils to drive inflammation and pathology in genital herpes infection. The authors show that while depletion of neutrophils from mice does not impact viral clearance or recruitment of other immune cells to the infected tissue, it did reduce inflammation in the mucosa and genital skin. Single cell sequencing of immune cells from the infected mucosa revealed increased expression of interferon stimulated genes (ISGs) in neutrophils and myeloid cells in HSV-2 infected mice. Treatment with anti-IFNAR antibodies or the use of neutrophil-specific IFNAR1 conditional knockout mice decreased disease and IL-18 levels. Blocking IL-18 also reduced disease, although these data show that other signals are likely to also be involved. It is interesting that viral titers and anti-viral immune responses were unaffected by IFNAR or IL-18 blockade when this treatment was started 3-4 days after infection, because data shown here (for IFN-I) and by others in published studies (for IFN-I or IL-18) have shown that loss of IFN-I or IL-18 prior to infection is detrimental.

      These data are interesting and show pathways (namely IFN-I and IL-18) that could be blocked to limit disease. While this suggests that IL-18 blockade might be an effective treatment for genital inflammation caused by HSV-2 infection, the utility of IL-18 blockade is still unclear, because the magnitude of the effect in this mouse model was less than IFNAR blockade. Additionally, further experiments, such as conditional loss of IL-18 in neutrophils, would be required to better define the role and source(s) of IL-18 that drive disease in this model.

      We thank the reviewer for the positive response and agree that additional studies would likely be necessary to fully understand the role of IL-18 during HSV-2 infection.

    1. Author Response:

      Reviewer #1 (Public Review):

      The study by Diboun et al. aims to investigate methylation profiles in Paget's disease of bone (PDB) patients. Many of the genes identified near areas of differentially methylated sites were known to be involved in osteoclast differentiation, viral infection and mechanical loading. These gene pathways are known to play a role in the pathogenesis of PDB. The strength of this study is that it is the first study to look at changes in methylation profiles in Paget's disease of bone patients. Additionally, the genes identified as having differentially methylated sites suggest that environmental factors such as host immune responses may be altered and play a role in the pathogenesis of PDB. The main weakness of this study is that the cells that were analyzed for changes in methylation sites were not osteoclasts, the cells of interest in PDB. While many of the genes identified have been shown to play a role in regulation of the skeletal system, results should be interpreted with caution until they are validated in bone tissue.

      We thank the reviewers and the editors for this thoughtful comment. Ebrahimi et al (EPIGENETICS; 2021, 16(1): 92–105) investigated correlation in methylation profiles between blood and bone tissue in 12 subjects using Illumina MethylationEPIC BeadChip array. Bone samples were taken from the exposed proximal femur after removal of the femoral head from osteoarthritis patients. After quality control, Ebrahimi et al focused the correlation analysis on 64,349 probes that fit their analysis criteria (to define the most highly correlated positions), of which 30,607 sites showed significant (FDR < 0.05) high correlation (r2 > 0.74) between bone and blood.

      An additional filter was applied to these sites to include those with at least 80% similar methylation profile between bone and blood (n = 28,549), which were reported as a supplementary table in their paper. We assessed whether CpG sites annotated to genes identified from our DMS and DMR analyses (Tables 2 and 3) showed high correlation between bone and blood as reported by Ebrahimi et al. Results showed that CpGs annotated to 8 out of the 14 genes from our DMS analysis were among the highly correlated sites between blood and bone (r2 > 0.74; FDR < 0.05; Supplementary File 6). For DMRs, out of the 10 genes reported in our study (Table 3), 6 had at least one CpG with high correlation between blood and bone (Supplementary File 6). It is important to note that, in the study by Ebrahimi et al, only 64,349 CpG sites were tested for correlation, owing to the stringent criteria adopted by the authors to identify the list of highly concordant sites. Therefore, our DMS/DMR sites that did not feature in the list are not necessarily uncorrelated. Unfortunately, these sites cannot be investigated further since Ebrahimi et al did not make their entire dataset available in the public domain. To address this point, a table has been added to the manuscript (Supplementary File 6) listing the sites with high correlation, and the text has been modified to include and discuss these results.
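
      The overlap check described above amounts to asking, for each gene of interest, whether at least one annotated CpG passes the published blood-bone concordance thresholds. The sketch below illustrates that logic only; it is not the authors' pipeline, and the record layout, probe IDs, gene names and values are hypothetical placeholders.

```python
def genes_with_concordant_cpg(cpg_records, study_genes, min_r2=0.74, max_fdr=0.05):
    # Return the study genes that have at least one annotated CpG
    # passing both the correlation and the FDR threshold.
    concordant = set()
    for record in cpg_records:
        passes = record["r2"] > min_r2 and record["fdr"] < max_fdr
        if passes and record["gene"] in study_genes:
            concordant.add(record["gene"])
    return concordant


# Hypothetical records for illustration only.
records = [
    {"probe": "cg_example_1", "gene": "GENE_A", "r2": 0.81, "fdr": 0.01},
    {"probe": "cg_example_2", "gene": "GENE_A", "r2": 0.40, "fdr": 0.30},
    {"probe": "cg_example_3", "gene": "GENE_B", "r2": 0.70, "fdr": 0.02},
    {"probe": "cg_example_4", "gene": "GENE_C", "r2": 0.90, "fdr": 0.001},
]
print(sorted(genes_with_concordant_cpg(records, {"GENE_A", "GENE_B"})))
```

      A gene missing from the result may simply have had no CpGs among the tested probes, so absence here does not imply a lack of correlation.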

      Reviewer #2 (Public Review):

      This unique study has shown that epigenetic (therefore, potentially environment-driven) factors contribute to the pathogenesis of Paget's Disease of Bone (PDB). Although PDB is not a very rare condition, its early diagnosis is problematic. The bone tissue is not easily accessible, thus many cases are not diagnosed till later in life. Thus, having diagnostic markers measured in blood, normalized to cell type count, might be of use for possible diagnostic applications.

      The PRISM trial's sample, comprising 232 cases and 260 controls from the UK, was divided in two - discovery and replication sets - based on power calculations for EWAS. Meta-analysis of data from the discovery and replication sets revealed significant differences in DNA methylation. Among gene-body regions/loci, many associated with functions related to osteoclast differentiation, mechanical loading, immune function, etc., two loci were suggested as functional through expression quantitative trait methylation (eQTM) analysis. Further, there was some value in assessing the risk of developing PDB. The AUC of 82.5%, based on the 95 discriminatory sites from the "best subset" analysis, is promising for clinical applicability. If confirmed in independent samples and further studies, chromosomal loci found in this study may offer diagnostic markers for prediction of the disease.

      We would like to draw the reviewer’s attention to the fact that the original cohort comprised 232 PDB cases and 260 controls (that is, 116 cases and 130 controls in each of the discovery and cross-validation sets). The abstract has been slightly modified to make the text clearer.

      Reviewer #3 (Public Review):

      Diboun et al used a case-control study design to identify DNA methylation sites and regions that differ between individuals with Paget's Disease of Bone (PDB) and controls. Cases were identified from an ongoing PDB clinical trial. Spouses of cases were used as controls. Candidate methylation sites were identified in a discovery set and then tested in a validation set to confirm association with PDB. Meta-analysis was used to combine effects from the discovery and validation sets. A machine learning approach was then used to prioritize candidates and build a prediction model capable of differentiating PDB cases from controls. The model was associated with a high level of accuracy (AUC >0.90) in the discovery and validation sets.

      A major strength of the study is the collection of a large population of individuals with a rare bone disease. Epigenetic features are appealing for building prediction models as they may represent interplay between genetics and environment. Using this approach, the authors built a prediction model with a high level of accuracy. The results advance our understanding of the etiology of PDB.

      Overall, the primary conclusions are generally well supported. However, there are several aspects of the paper that will require additional clarification.

      I commend the authors for using a split sample cross validation approach to maximize experimental rigor. However, this approach is distinct from a true external replication. Given that the 'training' and the 'test' sets come from the same overall population, we expect the 'replication' results to be optimistic relative to results from a true, external replication population. Given the absence of a suitable external replication population due to the unique nature of the disease, this limitation is acceptable. However, I expect the authors to discuss the potential limitations of this approach in their discussion section and I encourage the authors to refer to the 'replication' set as a 'cross-validation' set to more appropriately convey their experimental approach to the broader scientific community.

      We have referred to the replication set as “cross-validation” as suggested by the reviewer. However, the study subjects were recruited from over 27 medical centres across the United Kingdom (UK) representing most major cities. We have also added text to discuss this point.
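
      For readers less familiar with the metric at issue in this exchange, the AUC of a classifier on a held-out set is the probability that a randomly chosen case receives a higher score than a randomly chosen control. The following is a generic sketch of that definition, not the authors' analysis code; the function name and scores are hypothetical.

```python
def auc(case_scores, control_scores):
    # Fraction of case/control pairs in which the case scores higher;
    # ties contribute half a win.
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model scores on a held-out set, for illustration only.
held_out_cases = [0.9, 0.4]
held_out_controls = [0.5, 0.1]
print(auc(held_out_cases, held_out_controls))
```

      Because a split-sample held-out set is drawn from the same cohort as the training set, an AUC computed this way would be expected to be somewhat higher than one obtained in an independent external population, which is the reviewer's point.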

      The authors look for functional validation using the BIOS qTL database. This reference provides valuable information about functional role of methylation in gene expression in whole blood (eQTM). We know that eQTMs are tissue specific. Do the authors have any evidence whether the methylation plays a similar role in bone tissue?

      We agree that eQTMs tend to be tissue specific and although we were able to gather some confidence about concordance in methylation levels between blood and bone tissue samples using the Ebrahimi study, it is rather difficult to speculate about the concordance in the effect on gene expression. We therefore raise this issue in the study limitation section of the paper.

      The authors report the markers from their 'best set' for prediction have potential functional relevance. The potential clinical relevance, however, requires additional context. The data were obtained after onset of PDB. The potential for reverse causation cannot be overlooked. Do the authors have any evidence that the methylation markers precede clinical diagnosis? Appropriate temporality is an essential requisite for an effective clinical prediction model.

      We agree with the reviewer that this is an issue with most EWAS studies. The methylation changes reported in the study may exist as a consequence of the disease. We have therefore updated our discussion of study limitations to reflect the potential issue of reverse causation (page 11). We also discuss the design of future experiments in which the predictive value of our best subset could be properly validated with appropriate temporality. Specifically, we describe how individuals with a genetic predisposition and/or a family history of PDB could be monitored routinely for changes in the methylation patterns of the best subset identified in this study, in an attempt to draw possible associations with future disease onset.

    2. Reviewer #3 (Public Review):

      Diboun et al used a case-control study design to identify DNA methylation sites and regions that differ between individuals with Paget's Disease of Bone (PDB) and controls. Cases were identified from an ongoing PDB clinical trial. Spouses of cases were used as controls. Candidate methylation sites were identified in a discovery set and then tested in a validation set to confirm association with PDB. Meta-analysis was used to combine effects from the discovery and validation sets. A machine learning approach was then used to prioritize candidates and build a prediction model capable of differentiating PDB cases from controls. The model was associated with a high level of accuracy (AUC >0.90) in the discovery and validation sets.

      A major strength of the study is the collection of a large population of individuals with a rare bone disease. Epigenetic features are appealing for building prediction models as they may represent interplay between genetics and environment. Using this approach, the authors built a prediction model with a high level of accuracy. The results advance our understanding of the etiology of PDB.

      Overall, the primary conclusions are generally well supported. However, there are several aspects of the paper that will require additional clarification.

      I commend the authors for using a split sample cross validation approach to maximize experimental rigor. However, this approach is distinct from a true external replication. Given that the 'training' and the 'test' sets come from the same overall population, we expect the 'replication' results to be optimistic relative to results from a true, external replication population. Given the absence of a suitable external replication population due to the unique nature of the disease, this limitation is acceptable. However, I expect the authors to discuss the potential limitations of this approach in their discussion section and I encourage the authors to refer to the 'replication' set as a 'cross-validation' set to more appropriately convey their experimental approach to the broader scientific community.

      The authors look for functional validation using the BIOS qTL database. This reference provides valuable information about functional role of methylation in gene expression in whole blood (eQTM). We know that eQTMs are tissue specific. Do the authors have any evidence whether the methylation plays a similar role in bone tissue?

      The authors report the markers from their 'best set' for prediction have potential functional relevance. The potential clinical relevance, however, requires additional context. The data were obtained after onset of PDB. The potential for reverse causation cannot be overlooked. Do the authors have any evidence that the methylation markers precede clinical diagnosis? Appropriate temporality is an essential requisite for an effective clinical prediction model.

    3. Reviewer #2 (Public Review):

      This unique study has shown that epigenetic (therefore, potentially environment-driven) factors contribute to the pathogenesis of Paget's Disease of Bone (PDB). Although PDB is not a very rare condition, its early diagnosis is problematic. The bone tissue is not easily accessible, thus many cases are not diagnosed till later in life. Thus, having diagnostic markers measured in blood, normalized to cell type count, might be of use for possible diagnostic applications.

      The PRISM trial's sample, comprising 232 cases and 260 controls from the UK, was divided in two (discovery and replication sets) based on power calculations for EWAS. Meta-analysis of data from the discovery and replication sets revealed significant differences in DNA methylation. Among gene-body regions/loci, many of which are associated with functions related to osteoclast differentiation, mechanical loading, immune function, etc., two loci were suggested as functional through expression quantitative trait methylation (eQTM) analysis. Further, there was some value in assessing the risk of developing PDB. The AUC of 82.5%, based on the 95 discriminatory sites from the "best subset" analysis, is promising for clinical applicability. If confirmed in independent samples and further studies, chromosomal loci found in this study may offer diagnostic markers for prediction of the disease.

    4. Reviewer #1 (Public Review):

      The study by Diboun et al. aims to investigate methylation profiles in Paget's disease of bone (PDB) patients. Many of the genes identified near areas of differentially methylated sites were known to be involved in osteoclast differentiation, viral infection and mechanical loading. These gene pathways are known to play a role in the pathogenesis of PDB. The strength of this study is that it is the first study to look at changes in methylation profiles in Paget's disease of bone patients. Additionally, the genes identified as having differentially methylated sites suggest that environmental factors such as host immune responses may be altered and play a role in the pathogenesis of PDB. The main weakness of this study is that the cells that were analyzed for changes in methylation sites were not osteoclasts, the cells of interest in PDB. While many of the genes identified have been shown to play a role in regulation of the skeletal system, results should be interpreted with caution until they are validated in bone tissue.

    5. Evaluation Summary:

      Paget disease of bone (PDB) results in focal areas of disorganized bone, leading to bone deformities and fragility. There is substantial interest in finding circulating biomarkers that might be of use for possible diagnostic applications and, towards this end, these authors identified novel DNA methylation patterns in peripheral blood mononuclear cells that are able to differentiate PDB cases from controls with a high level of accuracy. This prediction model has functional relevance, as these candidate methylation sites and regions are associated with osteological and immunologic processes, and in the longer term it has clinical potential.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #2 (Public Review):

      The authors sought to understand the mechanisms determining whether the kinase RIPK3 induces apoptosis or necroptosis and the physiological significance of this dual function. They identified a new phosphorylation event on RIPK3 (S164/T165) that appears to inhibit its capacity to induce necroptosis and make it a potent inducer of apoptosis. Low levels of the chaperone HSP90/CDC37 seem to favor S164/T165 RIPK3 phosphorylation, which is suggested to be important for luteal regression by inducing apoptosis in luteal granulosa cells in the ovaries of female mice.

      The results presented expand on previous studies showing that whereas RIPK3 induces necroptosis by phosphorylating MLKL, inhibition of RIPK3 kinase activity by small molecules or by D160N mutation caused apoptosis and embryonic lethality. The authors provide experimental evidence supporting that phosphorylation on S164/T165 promotes apoptosis in vitro and in vivo; however, the mechanisms regulating this transition remain poorly understood. The data on HSP90/CDC37 is supportive but largely correlative. The authors speculate that association with this chaperone is necessary for proper folding of RIPK3 into a configuration that can only be activated by upstream necroptosis inducers, while at low HSP90/CDC37 levels RIPK3 is not correctly folded and likely auto-phosphorylates on S164/T165; however, this remains to be demonstrated. The authors propose that this process is particularly important in luteal granulosa cells and provide some evidence suggesting that RIPK3 phosphorylation on S164/T165 occurs in the ovaries of older mice. This seems counterintuitive given that corpus luteum involution occurs as part of the ovulation cycle and should therefore be especially relevant in young, sexually mature mice. Most importantly, there is no evidence that RIPK3 phosphorylation at these sites is important for female reproductive function, questioning its physiological significance. It would be important to know whether RIPK3 deficient or S165A/T166A mutant mice show any reproductive defects that would be expected by the lack of the proposed RIPK3-mediated apoptosis program in luteal granulosa cells.

      The in vivo data in the knock-in mouse models clearly show that phosphomimetic mutations (RIPK3S165D/T166E) on RIPK3 cause severe pathology in multiple organs associated with increased numbers of dying cells. However, rescue experiments, for example by crossing to caspase-8 knockout mice, to prove that the pathology is indeed induced by apoptosis are lacking. It is also interesting that heterozygous expression of the phosphomimetic mutants does not cause any pathology in vivo. The authors speculate that a threshold of expression is required for activation of this mutant, however an alternative explanation could be that the presence of the wild type protein prevents its activation, e.g. by trans-autophosphorylation on S227. Introducing a RIPK3 null allele to generate heterozygous RIPK3S165D/T166E mice that do not express wild type RIPK3 could help resolve this question, as in that case the phosphomimetic mutant will be expressed at the same level but in the absence of the wild type protein.

      Finally, most of the in vitro mechanistic studies rely on overexpression of the different mutants in cell lines. Using cells from the knock-in mice expressing the mutated proteins at endogenous levels would be a more appropriate experimental system to explore the mechanistic underpinnings such as the interaction with HSP90/CDC37.

    2. Reviewer #1 (Public Review):

      The protein kinase RIPK3 was widely known to promote a form of lytic cell death termed necroptosis. RIPK3 could also promote apoptotic cell death under certain conditions, but the mechanism by which RIPK3 promotes apoptosis and the physiological relevance of this apoptotic activity were not understood. In this study, the authors provided answers to these two questions.

      Strengths:

      The authors found that a specific phosphorylation on RIPK3 plays a critical role in the switch of RIPK3 into an apoptosis-inducing protein. The authors provided strong evidence to support their conclusion using mouse genetics and demonstrated a role for this RIPK3 activity in reproductive physiology.

      Weaknesses:

      Although the authors succeeded in finding the protein phosphorylation that controls the form of cell death mediated by RIPK3, key questions remained as to how this modification prevents RIPK3 from promoting necroptosis. Also, the authors implied that the kinase activity of RIPK3 is critical in this switch to apoptosis. However, the phenotypes of mice that lack RIPK3 kinase activity do not match that of the mice that harbor mutations that mimic this phosphorylation.

      Overall, this work should provide useful information for future studies to further examine the mechanism by which RIPK3 controls different types of cell death in normal and pathophysiology.

    3. Evaluation Summary:

      This manuscript is of potential interest to the field of cell death research in terms of understanding basic mechanisms and in the context of disease. The authors have used a broad range of methodologies and identified key phosphorylation sites on the protein kinase RIPK3 that determine whether cells undergo necroptotic or apoptotic cell death. The authors examine this phosphorylation event in the context of corpus luteum regression.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      Jenny I. Aguilar et al. present a manuscript that methodically investigates the behavioral, structural, functional, and physiological consequences of a Cys substitution at R445 in the human dopamine transporter. Parkinson's disease is a common progressive neurodegenerative disorder that affects millions of people worldwide. In most patients, the underlying cause of their disease is unknown, but some genetic forms of Parkinsonism have been identified. In this manuscript, the authors investigate the effect of a mutation in the gene that encodes the dopamine transporter that was identified in a patient with infantile Parkinsonism-Dystonia. Using a Drosophila model and an abundance of tools, the data show that the mutation produces: 1) a reduction in spontaneous motor activity, movement vigor, compromised flight initiation, and impaired coordinated movements; 2) a decrease in dopamine content and the number of tyrosine hydroxylase containing neurons in fly brain; 3) a decrease in amphetamine-induced dopamine efflux and dopamine uptake; 4) altered dopamine transporter structure leading to increased probability of open conformations on both sides of the transporter; 5) a reduction in dopamine transporter surface expression and transport capacity. Chloroquine, used as means to limit dopamine transporter lysosomal degradation, increased the ratio of mature to immature dopamine transporter and improved flight initiation. So why does a decrease in dopamine reuptake promote a dopamine-deficient Parkinson phenotype? The authors conclude that an overall reduction in dopamine transporter would deplete dopamine stores by promoting excessive extracellular dopamine. The decrease in vesicular release would be further exacerbated by DA stimulation of presynaptic dopamine-D2 receptors on dopamine axons. This rather novel counterintuitive hypothesis appears to be supported by the outcome of this investigation.

      Overall, the study may highlight the mechanism underlying a rare type of Parkinsonism that can affect children as well as adults.

    2. Reviewer #2 (Public Review):

      I have reviewed Psychomotor Impairments and Therapeutic Implications Revealed by a Mutation Associated with Infantile Parkinsonism-Dystonia by Aguilar et al. The authors first express hDAT in the dDAT loss of function background to explore in vivo effects. The comparison of hDAT rescue flies to wild type flies and the DAT mutant provides a nice control for the functionality of the hDAT transgene. A better control might have been rescue using dDAT with the same driver but this is a very minor concern since the wild type flies and the hDAT rescue look so similar. They then show that the R445C mutant decreases "movement vigor" and flight initiation. They use HPLC and immunolabeling to convincingly show deficits in both total tissue DA and a decrease in the number of detectable DA cells and use amperometry in the fly brain to quantify defects in efflux. Amperometry in the fly brain is technically impressive since few other labs have accomplished this without fouling the carbon electrode. In the second section of the paper, the authors perform a structural analysis, using LeuT to model DAT. The combination of Rosetta modeling, X-ray crystallography and EPR spectroscopy further adds to the technical strength of the paper. They show that substitution at the position in LeuT R375 analogous to DAT R445 disrupts a previously identified salt bridge and the IC vestibule. They then generate X-ray crystal structures of LeuT WT, LeuT R375A and LeuT R375D at resolutions of 2.1-2.6 Å. Their analysis confirms that substitution at LeuT-R375 disrupts salt bridge formation consistent with Rosetta modeling. They further confirm the disruption of the interaction between R375 and its partner using a variant of EPR and show that substitutions at this site bias toward open conformations.

      In the final figure of the paper they heterologously express the DAT mutants in cell culture and show that cell surface expression, transport and efflux are compromised, similar to previously published findings from another lab. Finally, they show that chloroquine can rescue some of the behavioral deficits in the fly.

      The authors present a remarkably comprehensive and technically sophisticated analysis of the structure, function and behavioral sequelae of a mutation in the DAT (hDAT R445C). The analysis is translationally relevant since the mutation was identified in a patient suffering from a rare movement disorder relevant to Parkinson's disease. The combination of behavioral and biochemical analysis in a transgenic animal with X-ray crystallography and modeling is extremely unusual and from a technical standpoint the paper is unusually strong. The insight gained from comparing the structural and functional halves of the paper is also useful. The partial pharmacologic rescue of the behavioral deficits further elevates this work.

      Concerns: It might be argued that the insights obtained from comparing the various data on modeling, structural analysis, biochemical assays and the behavior of the R445 mutant may not always be consistent with one another, making it difficult to determine the physiological relevance of each effect. This concern is balanced by the idea that we cannot know which aspects of any given mutant will or will not conform to expectations without the comprehensive analysis used here. As such, the paper provides an important example of examining a risk allele in a variety of different ways to determine which molecular deficits may be relevant to the observed phenotype and to the function of the transporter. That said, the authors should add text to acknowledge that some of the molecular defects they observe may be overshadowed by others and/or may not be as relevant to the in vivo defects in activity. For example, the idea that efflux may play a role in the R445 phenotype similar to other mutants and neuropsychiatric illness in general is provocative, but seems difficult to reconcile with the observation that relatively low levels of protein are present at the cell surface.

      The behavioral analysis is elegant and takes advantage of high-speed video recording to determine subtle defects in movement. The specificity of the defect is also interesting since grooming is not affected. However, it is difficult to determine whether the data represent a true deficit in movement versus wakefulness or overall activity of the animal. Dopamine is well known to be required for sleep in the fly and it is unclear whether the "deficits in movement vigor" are caused by the flies being "sleepy". Alternatively, higher order decision making processes rather than movement per se might be compromised. These explanations for the observed deficits would not take away from the importance of the findings. Indeed, as the authors acknowledge, the non-motor symptoms of PD are just as important as the motor symptoms. However, it seemed at times that the authors felt compelled to fit their data into a motor paradigm rather than taking a more general view on the relationship of the observed defects to other problems that accompany PD. The authors should address these issues with additional text. Additional experiments to address this issue are likely beyond the scope of the current manuscript, which is already quite lengthy.

      Minor points:

      The authors discuss a model in which loss of DAT reuptake and an increase in extracellular DA could down regulate TH. Since they use TH labeling to count DA cells, they should acknowledge the possibility that the cells are not absent in the mutant (even if they are functionally compromised) but are simply not detectable.

      It is unclear why (Brand and Perrimon, 1993) are cited on line 146.

      Typo in "Initiate" on Y axis of Fig 3B.

      State somewhere in the text or in the Fig 3 legend that HPLC was used to measure tissue concentrations of DA, to make it more obvious that amperometry was not used.

    3. Reviewer #1 (Public Review):

      The authors generated new transgenic fly lines with the human dopamine transporter (hDAT-WT) and the hDAT with the R445C mutation (hDAT-R445C). Studies in the hDAT-R445C flies show a decrease of tissue DA content and a loss of TH+ PPL1 neurons indicating an effect of the DAT mutation on dopamine neuron phenotype or cell survival rather than general DA levels per se. The motor phenotypes observed in the fly include a decrease in the time to initiate flight and in the velocity of locomotion (vigor) but not in the velocity of locomotion initiation or grooming behavior. These behaviors are consistent with the bradykinesia observed in patients. This model system could potentially be used to assay for specific modulators of the mutant to restore surface expression, TH expression and motor behavior.

      In the recombinant cell culture system (HEK Cells), the major consequence of the mutation is a decrease in cell surface expression (there is a decrease in conversion to the mature form). A change in the Km is difficult to ascertain with such a dramatic change in the cell surface expression level but looks to be dramatically decreased (higher affinity). These data differ somewhat from those reported in the study by Ng et al, 2014 where the Bmax for CFT was slightly reduced and the affinity was significantly decreased (Km was ~8 fold higher) as was the Ki for DA inhibition of CFT. It should be noted that the decrease in cell surface expression of R445C reported by Ng et al was also not as dramatic as what the same group demonstrated for the other mutation, R87L, that was compound heterozygous in this family. Differences in the transport properties between the two studies should be discussed.

      X-ray crystallography and molecular modeling provide novel insights into how the mutation (and other substitutions at this site) affects structure-function relationships of the transporter with respect to gating, uptake and efflux. This information could be used to design modulators of the transporter mutants to rescue cell surface expression or function.

      The behavioral effect of CQ on the mutant flies was on the time to flight initiation, which decreased. Locomotion was not tested.

      The value of the study is the creation of the flies for screening and the crystallography and molecular modeling studies which examined the impact of this residue on function in detail. The weakness of the study was the limited characterization of the transport properties and cell surface expression in the flies. Being able to tie together the different studies into a cohesive understanding of what happens in patients and thus what needs to be corrected in patients is an important goal of the study. Some of the key questions needed to achieve this understanding were not fully addressed.

    4. Evaluation Summary:

      Infantile parkinsonism-dystonia is a rare but devastating condition that leads to early mortality. Mutations in the dopamine transporter that decrease its transport activity or cell surface expression have been identified as potential causes of this disease. Here, Aguilar et al perform a series of experiments to examine the effect of one of the mutations, R445C, on properties of the transporter in cell culture and on motor function in newly generated transgenic flies. They also explore structure-function relationships of the mutation using X-ray crystallography of LeuT, a bacterial homolog, and molecular modeling. Lastly, they show that blocking lysosomal degradation rescues a motor deficit in the flies. Insights from the work could lead to new approaches to specifically modulate the transporter structure to restore surface expression and function of the mutant dopamine transporter in this disorder. This elegant and technically sophisticated analysis is of interest to readers in the fields of neurobiology, behavior, and movement disorders, as the work provides an excellent example of using a variety of different approaches to determine the relationship between transporter structure and activity and potentially underlying pathology in human disease.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Author Response:

      We would like to thank the reviewers for their thoughtful and thorough critique of our manuscript. In our revised preprint, we added important additional data and restructured our manuscript to reflect as many of the recommendations as possible. Additionally, we have added experiments to define the cellular mechanisms underlying observed damage following mechanical injury. The most significant additions of new data include:

      • Further experiments demonstrating block of glutamate clearance exacerbates stimulus-induced hair-cell synapse loss.
      • Analysis of neuromast disruption in lhfpl5b mutant null larvae showing mechanical displacement. Lhfpl5b mediates mechanosensitivity in lateral-line hair cells, allowing us to determine whether mechanotransduction is required for mechanical disruption of neuromasts.
      • Testing the vibratory stimulus at various frequencies to confirm the optimal frequency to induce acute, generally sub-lethal damage to lateral-line hair cells is 60 Hz.
      • Assessment of neuromast supporting cell and hair cell proliferation following mechanical overstimulation.
      • Quantitative analysis of kinocilia SEM and confocal images of hair bundles in control and stimulus exposed fish.

      Individual comments are addressed as outlined below.

      Reviewer #1:

      1) The authors use a vertically-oriented Brüel+Kjær LDS Vibrator to deliver a 60 Hz vibratory stimulus to damage lateral line hair cells. It is not made clear why this frequency was selected. Did the authors choose this frequency because they screened a number of frequencies and this is the one that did the most damage to hair cells or was it chosen for another reason? Or, do all frequencies do the same amount of damage? The authors should screen a number of frequencies and choose the stimulus that does the most damage to hair cells. This would set the field in the best direction, should members of the community attempt this new technique. It is not necessary to repeat all of the experiments, but the authors should show which frequencies are best for inducing damage.

      The frequency selected for mechanical overexposure of lateral-line organs was based on previous studies showing 60 Hz to be within the optimal upper frequency range of mechanical sensitivity of superficial posterior lateral-line neuromasts, with maximal response between 10-60 Hz, but a suboptimal frequency for hair cells of the anterior macula in the ear (Weeg and Bass 2002, Trapani et al, 2009, Levi et al, 2015). To confirm that 60 Hz was the optimal frequency to induce damage, we tested 45, 60, and 75 Hz at comparable intensities. At 75 Hz we observed no apparent damage to lateral-line neuromasts, while 45 Hz at a comparable intensity proved toxic, i.e., it was lethal to the fish. We have updated the Results and Method Details to include our rationale for choosing 60 Hz.

      2) The SEM images of the hair bundle are beautiful and do show damage to the hair bundle, but historically speaking older studies in mammals have shown that the actin core of the stereocilia is damaged. It would be critical to know if this was the case. Showing damage to the kinocilium and stereocilia splaying is a start, but readers would need to know if the actin cores are damaged. So, TEM should be used to find damage to the actin cores of stereocilia.

      Our main goal of this initial manuscript was to survey morphological and functional changes in mechanically injured lateral line organs with an emphasis on inflammation and synapse loss. We agree TEM studies showing damage to the actin core of the stereocilia will be important to determine whether mechanical damage to neuromast hair bundles fully mimics mammalian stereocilia damage, but these experiments will require significant time to perform and optimize. We have expanded our analysis of hair-bundle morphology in this study and intend to pursue deeper analysis of hair bundle damage, i.e. examination of the stereocilia actin core, in future follow-up studies.

      3) I think the use of "Noise-exposed lateral line" as a term for mechanically overstimulated lateral line hair cells is not correct and could be misleading. The lateral line senses water motion not sound as the word noise would imply. Calling the stimulus "noise" should be removed throughout.

      We have removed the term “noise” throughout the manuscript and replaced it with either “strong water current stimulus” or “mechanical overstimulation” where appropriate.

      4) Decreases in mechanotransduction are shown by dye entry. These results should be strengthened using microphonic potentials to determine the extent of damage. This experiment is not necessary but would improve the quality of the document.

      While we agree that microphonic recordings would provide further support for reduced mechanotransduction, quantitative FM1-43 uptake in zebrafish lateral line hair cells is a well-established proxy for microphonic measurements. In a previous study using the same protocol utilized in our manuscript, FM1-43 labeling intensity was shown to directly correspond with microphonic amplitude (Toro et al, 2015). Moreover, the fixable analogue of FM1-43 (FM1-43FX) gave us comparable relative measurements of uptake as live FM1-43 and provided the additional advantage of high temporal resolution and the ability to simultaneously assay entire cohorts of control and overstimulated fish (which is not possible with microphonic measurements or live FM1-43 imaging), as we could expose groups of fish briefly to the dye at determined time intervals following overstimulation, then immediately place in fixative.

      5) In figure 2, PSD labeling is not clear.

      We assume the reviewer meant PSD labeling in Figure 4 and we agree it is difficult to discern. We have changed the hair-cell label from gray to blue in the images so that the green PSD labeling is clear.

      Reviewer #2:

      1) While the findings are carefully measured and described, the effects of insult on hair cells are relatively minor, with a change in hair cell number, extent of innervation or synapses per hair cell (Figs 3 and 4) in the range of 10% reduction compared to control. One potential value of the model would be to use it to discover underlying pathways of damage or screen for potential therapeutics. However, with these modest changes it is not clear that there will be enough power to determine effects of potential interventions.

      One advantage of the zebrafish model is the ability to overstimulate large cohorts of larvae, thereby providing enough power to uncover modest but significant changes resulting from moderate damage to hair cells. While not as well suited for unbiased large-scale screens of therapeutics, our overexposure protocol provides the opportunity to determine the role of specific cellular pathways (e.g. metabolic stress, inflammation, and glutamate excitotoxicity) in hair-cell damage and synapse loss following mechanically-induced damage via genetic or pharmacological manipulation of these pathways. Additionally, as the hair cell synapses fully repair following stimulus-induced loss, the zebrafish model has the potential for identifying novel pathways for repair through transcriptomic profiling (for an example, see Mattern et al, Front. Cell Dev. Biol., 2018). Cumulatively, these future experimental directions will provide important mechanistic information that could be used toward the development of targeted therapeutic interventions.

      2) The most dramatic phenotype after shaking is a physical displacement of hair cells, described as disrupted morphology. However, it is not clear what the underlying cause of this change is. Are only posterior neuromasts damaged in this way? Is it a wounding response as animals are exposed to an air interface during shaking? It is also not clear to what extent this displacement reveals more general principles of the effects of noise on hair cells. Additional discussion of underlying causes would be welcome.

      We agree that the underlying causes of the physical displacement of posterior lateral-line neuromasts warranted further investigation and we have expanded appropriate sections of the results. To determine if excessive hair-cell activity plays a role in the displacement of neuromasts, we have exposed lhfpl5b mutants (fish that have intact hair cell function in the ear, but no mechanotransduction in hair cells of the lateral line) to mechanical overstimulation. We observed comparable disruption of neuromasts lacking mechanotransduction, supporting the conclusion that displacement of lateral-line hair cells is due to mechanical damage and does not require intact mechanotransduction. Further, when examining the adjacent supporting cells in disrupted neuromasts, we observed they are similarly displaced and elongated. We conclude that observed disruption of hair cells is a consequence of mechanical displacement of the entire neuromast organ. We have added additional discussion of this phenomenon to the Results and Discussion sections of the manuscript.

      3) Because afferent neurons innervate more than one neuromast and more than one hair cell per neuromast, measurements of innervation of neuromasts (Figure 3) or synapses per hair cell (Fig 4) cannot be assumed to be independent events. That is, changes in a single postsynaptic neuron may be reflected across multiple synapses, hair cells, and even neuromasts. This needs to be accounted for in experimental design for statistical analysis.

      We agree that changes in single postsynaptic neurons, which innervate groups of hair cells of the same polarity within a neuromast, could be reflected across multiple synapses. Additionally, it is plausible that excitotoxic events at the postsynapse, while not contributing to apparent neurite retraction, could be contributing to synapse loss across multiple innervated hair cells. We have updated the manuscript to reflect the potential contribution of postsynaptic signaling to synapse loss and added experiments pharmacologically blocking glutamate uptake.
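
      One common way to handle the non-independence raised in this comment is to summarize measurements per animal before any statistical test, so that the fish rather than the hair cell is the unit of analysis. The sketch below is illustrative only: the record layout and the name `per_fish_means` are assumptions, and it does not describe the analysis actually used in the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(records):
    """Collapse hair-cell-level measurements to one value per fish.

    `records` is an iterable of (fish_id, synapses_per_hair_cell) pairs
    (a hypothetical layout). Hair cells and neuromasts within one fish may
    share afferent neurons, so they are averaged rather than treated as
    independent observations.
    """
    by_fish = defaultdict(list)
    for fish_id, value in records:
        by_fish[fish_id].append(value)
    return {fish_id: mean(values) for fish_id, values in by_fish.items()}
```

      The per-fish means for control and exposed groups can then be compared with a standard two-sample test; a mixed-effects model with fish as a random effect is an alternative that keeps the cell-level data.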

      4) The SEM analysis provides compelling snapshots of apical damage, but could be supplemented by quantitative analysis with antibody staining or transgenic lines where kinocilia are labeled. The amount of reduced FM1-43 labeling is one of the more dramatic effects of the shaking insult, suggesting widespread disruption to mechanotransduction that could be related to this apical damage. Further examination of the recovery of mechanotransduction would be interesting.

      To supplement the SEM snapshots of severe apical damage, we have expanded the SEM image analysis with quantitative data on kinocilia morphology. We have also added confocal images of hair bundles using antibody labeling of acetylated tubulin in a transgenic line expressing β-actin-GFP in hair cells. We agree that correlative studies of mechanotransduction recovery relative to hair-bundle morphology would be interesting, and we intend to examine this question in a future follow-up study.

      5) A previous publication by Uribe et al. 2018 describes a somewhat similar shaking protocol with somewhat different results: more long-lasting changes in hair cell number, presynaptic changes in synapses, etc. It would be worth discussing potential differences across the two studies.

      We agree we did not adequately address the considerable differences between our mechanical damage protocol for the zebrafish lateral line and the damage protocol described by Uribe et al, 2018. We have provided a more direct comparison in the Results section and addressed the differences in our protocols in-depth in the Discussion section.

      Our damage protocol uses a stimulus within the known frequency range of lateral-line hair cells (60 Hz) that is applied to free-swimming larvae and evokes a behaviorally relevant response (fast start response). The damage is observable immediately following noise exposure, is specific to posterior lateral-line neuromasts, and appears to be rapidly repaired. Some features of the damage we observe—reduced mechanotransduction and hair-cell synapse loss—may correspond to mechanically induced damage of hair cell organs in other species. Notably, hair cell synapse loss in seemingly intact neuromasts is exacerbated by pharmacologically blocking synaptic glutamate clearance, supporting that the 60 Hz frequency stimulus is overstimulating neuromast hair cells directly and suggesting that the mechanism of synapse loss may be similar to inner hair cell synapse loss reported in mice following moderate noise exposures.

      By contrast, the damage protocol published by Uribe et al used ultrasonic transducers (40-kHz) to generate small, localized shock waves rather than directly stimulate neuromast hair cells. The damage they reported (delayed hair-cell death and modest synapse loss with no effect on hair-cell mechanotransduction) was not apparent until 48 hours following exposure and not specific to the lateral-line organ. Some of the features of the damage they observed (delayed onset apoptosis and hair-cell death) may correspond to damage reported in mice following blast injuries.

      Reviewer #3:

      1) As the authors point out, zebrafish hair cells can be regenerated. With that in mind, and to make the relevance for mammalian hair cell repair clear, a clear distinction between mechanisms mediated by "repair" or "regeneration" needs to be made. The authors discuss that proliferative hair cell generation can be excluded based on the short time period, but suggest that transdifferentiation might be involved. Recovery of NM hair cell number occurs within the same 2 hour period in which NM morphology and hair cell function improved, making it difficult to determine the extent to which "regeneration" contributed to the recovery. The amount of transdifferentiation has to be shown experimentally (lineage tracing?).

      We agree that the distinction between "repair" and "regeneration" needs to be made when discussing this model of mechanical damage to zebrafish hair cell organs. We have tried to clarify that most of what we observe regarding recovery (restoration of neuromast shape, mechanotransduction, afferent contacts, and synapse number) reflects mechanisms of repair following mechanical damage (and, in the case of synapse loss, overstimulation) rather than regeneration. However, one feature of damage that may reflect rapid regeneration is restoration of hair cell number following mechanical injury. To experimentally determine whether proliferation contributed to hair cell generation, we assessed the incorporation of the thymidine analog EdU during a 4 hour recovery following mechanical overexposure in a transgenic line expressing GFP in neuromast supporting cells and observed a modest but not statistically significant increase in the number of proliferating supporting cells in neuromasts exposed to strong current stimulus, suggesting recovery of lost hair cells is not primarily due to renewed proliferation.

      The number of hair cells that are lost and recover within several hours is low, i.e., typically ~1 hair cell/neuromast. We observed this consistently in all of our experiments, but the mechanisms responsible are not clear. Based on previous studies of hair cell regeneration in the lateral line, the recovery time appears too rapid to be caused by renewed proliferation, a notion that is further supported by our EdU studies. On the other hand, it is possible that a few supporting cells may undergo the initial phases of phenotypic change into hair cells during this short time period, and we speculate that such transdifferentiation may be responsible for the observed recovery. We should emphasize that this is a new observation and, at present, we do not fully understand the underlying mechanism. However, the focus of the present study is on mechanical damage, synaptic loss, and subsequent repair. We believe that it is important to report our consistent findings of low level hair cell loss and recovery, but a detailed characterization of the mechanism would require considerable effort and would best be the topic of a future study.

      2) The classification of "normal" vs "disrupted" is vague and not quantitative. The examples shown in the paper seem to be quite clear-cut, but this reviewer doubts that was the case throughout all analyzed samples. Formulate clear benchmarks and criteria for the disrupted phenotype (even when blind analysis is performed).

      We have defined measurable criteria for "normal" vs "disrupted" neuromasts that we have added to the Method Details section: “We defined exposed neuromast morphology as “normal” when hair cells appeared radially organized with a relatively uniform shape and size, with ≤7 μm difference observed when comparing the lengths from apex to base of an opposing pair of anterior/posterior hair cells. Length was measured from a fixed point at the center of the hair bundle to the basolateral end of each opposing hair cell. We defined neuromasts as “disrupted” when hair cells appeared elongated and displaced to one side, with >7 μm difference observed when comparing the lengths of an opposing pair of anterior/posterior hair cells. Generally, the apical ends of the hair cells were displaced posteriorly, with the basolateral ends oriented anteriorly.”
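
The quantitative part of the criteria quoted above can be sketched in a few lines of Python. This is only an illustration of the stated 7 μm rule, not the authors' analysis code: the function and constant names are hypothetical, and the qualitative criteria (radial organization, uniform shape and size, direction of displacement) are not captured.

```python
# Threshold taken from the quoted Method Details text; every name here is hypothetical.
LENGTH_DIFFERENCE_THRESHOLD_UM = 7.0


def classify_neuromast(anterior_length_um: float, posterior_length_um: float) -> str:
    """Classify a neuromast from one opposing anterior/posterior hair-cell pair.

    Each length is the apex-to-base distance in micrometers, measured from the
    center of the hair bundle to the basolateral end of the hair cell.
    """
    if anterior_length_um <= 0 or posterior_length_um <= 0:
        raise ValueError("hair-cell lengths must be positive")
    difference = abs(anterior_length_um - posterior_length_um)
    # A difference of <=7 um counts as "normal"; >7 um counts as "disrupted".
    if difference <= LENGTH_DIFFERENCE_THRESHOLD_UM:
        return "normal"
    return "disrupted"
```

A pair measuring 20 μm and 25 μm would be labeled "normal" under this rule, while a pair measuring 20 μm and 28 μm would be labeled "disrupted".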

      3) Sustained and periodic exposure: These two exposure protocols not only differ with respect to sustained vs periodic, they also differ in total exposure time (Fig 2B). This complicates the interpretation, especially considering the authors own finding that a pre-exposure is protective.

      To clarify—pre-exposure was not protective to hair-cell survival. Rather, in preliminary experiments, pre-exposure appeared to reduce larval mortality, and we have clarified that observation in the text of the Results and the Methods Details sections. We agree with the reviewer that comparing the two protocols based on differences in time distribution is complicated in that they also differ in total exposure time. For the purpose of clarity, we now focus on the sustained exposure in the main figures and created supplemental figures for the reduced damage still observed using periodic exposure, specifying that reduced damage may be the result of periodic time distribution of stimulus and/or less cumulative time exposed to the stimulus.

      4) The data on the mitochondrial ROS aspect seems not well integrated into the overall story.

      We agree that the ROS story was not well integrated and incomplete. We have removed the data describing mpv17-/- mutants and mitochondrial dysfunction from this manuscript. A more comprehensive report of mpv17-/- mutant mitochondrial function and morphological analysis of neuromasts following noise exposure is now described in a follow-up manuscript (“Influence of Mpv17 on hair-cell mitochondrial homeostasis, synapse integrity, and vulnerability to damage in the zebrafish lateral line”).

      5) It is surprising that the hair bundle morphology was not assessed after recovery. This is crucial. Overall, it would be good to see some quantification of the SEM data, e.g. kinocilia length and number of splayed bundles.

      We have expanded the SEM image analysis to quantitatively assess kinocilia morphology following exposure. We agree that assessment of recovery using live imaging of hair bundles paired with subsequent SEM analysis will be informative, and we intend to perform those experiments in a future study.

      6) Behavioral recovery (measured as number of "fast start" responses) was also not assessed. This is essential for determining the functional relevance of the recovery.

      We attempted to measure behavioral recovery of lateral-line function by measuring “fast-start” responses immediately and several hours after recovery, and discovered that i) strong water current provided stimulation that was too intense to reveal subtle behavioral changes following lateral-line damage and recovery, and ii) when testing larvae immediately following sustained strong current exposures, it was difficult to discern if fewer “fast-start” responses were due to lateral-line organ damage or larval fatigue. We agree that behavioral recovery is important to assay but acknowledge that assessing lateral-line mediated behavior following mechanical damage will require a more sensitive testing paradigm that stimulates the lateral-line sensory organ with a relatively gentle, calibrated water flow stimulus. We are currently performing a follow-up study to this paper using a testing paradigm developed by a postdoctoral associate in our lab that analyses subtle changes in larval orientation to water flow (rheotaxis) mediated by the lateral-line organ. Using this behavior paradigm, we will directly correlate morphological and functional recovery over time.

      7) This reviewer is not yet convinced that this damage model displays enough commonalities to mammalian noise damage to justify the ubiquitous use of the term "noise" throughout the manuscript. It would be more prudent to use a more careful term along the lines of "mechanical overstimulation-induced damage".

      We have removed the term “noise” throughout the manuscript and replaced it with either “strong water current stimulus” or “mechanical overstimulation” where appropriate.

      8) Overall, there was a lack of experimental and analysis detail in the results section. For example, how was afferent innervation quantified? Just counting GFP labeled contacts to hair cells?

      Innervation of neuromast hair cells was quantified during blinded analysis by scrolling through confocal z-stacks of each neuromast (step size 0.3 μm) containing hair cell and afferent labeling and identifying hair cells that were not directly contacted by an afferent neuron, where direct contact means no discernible space between the hair cell and the neurite. Hair cells that were identified as no longer innervated showed measurable neurite retraction; there was generally >0.5 μm distance between a retracted neurite and hair cell. We have added this information to the Methods Detail section.

      There was also inconsistency in the use of two variations of the mechanical damage protocol, the time points at which repair was assessed, and whether the damage was quantified in all neuromasts or in normal vs. disrupted neuromasts separately, making the data difficult to interpret.

      We have revised our figure legends to clearly indicate when we are assessing damage in all exposed neuromasts (pooled) to control vs. comparative analysis of normal vs. disrupted neuromasts relative to control. In addition, we now focus on the sustained exposure in the main figures, which was the exposure protocol used for the time points in which repair and recovery were assessed.

    1. Reviewer #3 (Public Review):

      Evolution is a historical phenomenon that plays out over time through the complex interaction of the stochastic processes of mutation and genetic drift and the deterministic process of natural selection. Biology has seen a vibrant debate over the last few decades over what this means for the repeatability of evolution, and to what degree evolutionary outcomes are shaped by the combination of necessity, chance, and historical contingency. This debate has led to intense empirical study of these factors in evolution. Reconstruction and examination of functional protein evolution has been one of the cleverest and most interesting systems used in this study. Here, the authors seek to examine the roles of chance, contingency, and necessity in the evolution of protein-protein interactions (PPI) between BCL-2 family proteins and their coregulators. They specifically look at the evolution of specific interaction between BCL-2 and BID and the more generalized interaction between MCL-1 and coregulators BID and NOXA. The authors reconstructed the last common ancestor protein of BCL-2 and MCL-1 and a series of intermediates along their respective lines of descent. They then used a very clever Phage Assisted Continuous Evolution (PACE) system to subject replicates from each time point to selection for different PPIs and examined variation in the resulting sequences. By looking at evolution in replicates from different time points, they were able to disentangle the effects of chance, contingency, and necessity. They found that necessity played little role in protein evolution, with little predictability between replicates of single time points and among those from multiple time points, indicating that there was no single pathway through sequence space to the selected function. They did, however, find strong and synergistic effects of chance and contingency.
They did tests to demonstrate that the effects of contingency were due to epistatic interactions that affect the viability of particular historical paths. Chance, meanwhile, had effects because multiple mutations could lead down paths to the selected function. The authors conclude that history and chance must be considered when attempting to understand protein function evolution, and that the sequences of proteins with given functions do not reflect necessary pathways or constrained endpoints, but particular and idiosyncratic histories. Moreover, they suggest that contingency may need to be considered as a fundamental aspect of the evolutionary process, along with mutation, drift, and selection.

      Altogether, this is a wonderful and interesting manuscript that makes a substantial and material contribution to our understanding of how history and chance affect evolution. It even speaks to the nature of more fine-grained protein sequence evolution relative to neutral and adaptationist theories. The amount of work and thought that went into the research is nothing less than astonishing. Every time I found myself wondering, "but did they check this...", I found that they, in fact, did in the next section. The work is solid, and the results are robust. I do not see anything that concerns me in the nitty gritty of the actual scientific work. I do, however, think that the authors should engage the work that philosophers of science have done in the last decade or so to better develop our conceptual understanding of contingency and reconsider the meaning of their findings in light of that work.

    2. Reviewer #2 (Public Review):

      The extensive description of mutational paths using high-throughput phenotyping combined with sequencing provides a rich and useful data set. However, the experimental setup has some serious limitations.

      First, the authors want to address the evolution of protein-protein interactions, but they actually do so comparing the interaction of actual and ancestral proteins with actual human BID and NOXA proteins. The analysis would have been stronger with reconstruction of ancestral sequences also for the BID and NOXA proteins, to test interaction of two proteins at the same evolutionary node. Actually, characterization of protein-protein interactions between proteins from Trichoplax, for example, suggest that the results may be different (Popgeorgiev et al., Science Advances 2021).

      Second, the specificity of the binding of NOXA to MCL-1 and not to BCL-2 seems to be an artifact due to the use of peptides instead of full-length protein during interaction assays. This is explicitly indicated in one of the reviews the authors cite in their introduction (Kale et al., 2018, p67). This review mentions a JBC paper clearly demonstrating that BCL-2/NOXA interactions do occur even in human cells: Smith AJ, Dai H, Correia C, Takahashi R, Lee SH, Schmitz I et al. Noxa/Bcl-2 protein interactions contribute to bortezomib resistance in human lymphoid cells. J Biol Chem 2011; 286: 17682-17692.

      Third, the same review also stresses that these proteins are partially membrane-bound in vivo. So testing their interactions in soluble protein bioassays is far from physiological relevance. Actually, such a warning appears already in one of the bullet points from the Kale review:

      "The majority of studies examining the interactions between BCL-2 family proteins use truncated proteins or peptides of the BH3 region at physiologically irrelevant concentrations or in the absence of membranes leading to confusion in defining the core mechanisms of the BCL-2 family proteins."

    3. Reviewer #1 (Public Review):

      This manuscript reports a novel and original approach to examine the possible mutational paths underlying directed protein evolution.

      The authors conclude from their experiments that "Necessity was almost entirely absent" (line 209). Indeed, the vast majority of states evolved in just one replicate from one starting point. But this is the problem of a half-full glass: is it half full or half empty? If I understand Fig.4F correctly, one can still detect amino acid changes that recapitulate historical substitutions, and others that revert to the historical state, so it does not seem that necessity is "almost entirely absent". Furthermore, several of the amino acid changes that were detected may not have any effect on NOXA or BID binding; they may have occurred because of mutational bias, drift or hitchhiking. If this is the case, then one cannot compare all acquired states in each trajectory and conclude about the importance of chance, as in this sentence for example: "Pairs of trajectories launched from the same starting point differed, on average, at 78% of their acquired states, indicating a strong role for chance" (line 219). There are causal mutations that arose repeatedly during PACE replicates from each starting genotype and these mutations do indeed confer the selected-for specificity in their "native" background (as is nicely shown in Figure 6A-B). So this, to me, is evidence for necessity.
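
To make the kind of statistic quoted above concrete, here is a minimal Python sketch of a mean pairwise difference between trajectories. It assumes, purely for illustration, that each trajectory is summarized as a set of acquired states and that "differing" means the fraction of states in the union of a pair that are not shared; the manuscript's exact definition may differ, and all function names and mutation labels are hypothetical.

```python
from itertools import combinations


def fraction_differing(states_a: set, states_b: set) -> float:
    """Fraction of acquired states (union of the pair) that the two trajectories do not share."""
    union = states_a | states_b
    if not union:
        return 0.0  # two trajectories with no acquired states do not differ
    shared = states_a & states_b
    return 1.0 - len(shared) / len(union)


def mean_pairwise_difference(trajectories: list) -> float:
    """Average fraction_differing over all pairs of replicate trajectories."""
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        raise ValueError("need at least two trajectories")
    return sum(fraction_differing(a, b) for a, b in pairs) / len(pairs)


# Hypothetical replicate trajectories launched from one starting genotype.
replicates = [
    {"A10V", "G25S"},
    {"A10V", "K40E"},
    {"A10V", "G25S"},
]
print(mean_pairwise_difference(replicates))
```

The reviewer's concern maps directly onto the inputs: if the sets include non-causal states that arose through mutational bias, drift or hitchhiking, the computed difference rises even when the causal mutations are shared.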

      Losing a binding property can probably occur in multiple ways, which are likely to be more numerous than the ways of gaining binding for a given protein. It would be nice to discuss this point in more detail.

      The experiments presented are limited to one protein family and to the binding properties to two different proteins. In living organisms, each protein is likely to exhibit particular properties such that it can bind or not bind to hundreds of different proteins, and not just two as tested here. So the constraints present in living organisms may be much larger than the ones present within this experimental evolution set up. Furthermore, the tested proteins probably encounter other constraints in their native environment besides affinity for other proteins, and it is yet unclear whether the variant forms obtained here via experimental evolution would be fine to replace the endogenous proteins in living organisms. It is therefore difficult to generalize from the obtained results to all types of evolutionary changes. In general, the conclusions should be toned down and focused on this particular example.

    4. Evaluation Summary:

      This manuscript, which will be of interest to students of evolution and anybody interested in protein function, uses an original, clever, high throughput, and rapid experimental protein evolution method to assess the roles and contributions of contingency, chance, and necessity in the evolution of protein-protein interactions. The authors focus on the animal BCL-2 protein family and on the evolution of their binding properties to two proteins, NOXA and BID. Using several replicates and several starting points, they found little predictability between replicates of single starting points and among those from multiple starting points, indicating that there is no single pathway through sequence space to the selected function, and that historical contingency is the primary cause of protein evolution here. The presented results convincingly illustrate the potential of this novel technology for future work in directed protein evolution.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their names with the authors.)

    1. Reviewer #4 (Public Review):

      The authors analysed flavinylation across different species. They analysed an impressive number of 31,910 prokaryotic genomes. They mined flavinylation associated gene clusters using a bioinformatic approach. They define five different protein classes responsible for transmembrane electron transfer. Moreover, they predicted and validated flavinylation of two domains with unknown functions (by ApbE). Unfortunately, the vast majority of predictions made in this study were not experimentally validated. It is therefore very difficult to judge the reliability of predictions, proposals and claims made in the manuscript.

    2. Reviewer #3 (Public Review):

      Summary

      The authors have applied a comprehensive bioinformatics analysis to 31,910 prokaryotic genomes and found evidence for extracytosolic flavin transferases ("ApbE") in approximately 50% of the genomes. Moreover, they have analyzed associated gene clusters resulting in the hypothesis that five protein classes are involved in transmembrane electron transfer. Furthermore, the authors postulate that these protein classes are subject to flavinylation by ApbEs. Although the exact biochemical role of these five classes of protein remains unknown, the authors hypothesize that they might be involved in iron assimilation and respiration, at least in some cases. In this context, the authors also identified multi-flavinylated proteins and propose that these might exert a similar role as multi-heme cytochromes, for example under iron depletion; in other words, multi-flavinylated systems might replace multi-heme cytochromes if iron is limiting.

      Strength & weaknesses

      As is evident from the summary, the basis of the article is the bioinformatic analysis of prokaryotic genomes leading to a number of interesting hypotheses with regard to transmembrane electron transport of hitherto uncharacterized protein complexes. Thus, the proposed functions of the potentially flavinylated membrane complexes will stimulate biochemical studies to characterize the suggested involvement of flavinylated protein complexes in prokaryotes. I would consider this as the main strength of the paper that it has generated multiple challenging hypotheses to follow up experimentally.

      As mentioned by the authors, about 50% of the prokaryotic genomes analyzed harbor targets for flavinylation and the FMN transferase. However, no discussion and not even a hint is provided as to what these 50% of prokaryotes have in common and what distinguishes this group from the other (50%) prokaryotes. Is it lifestyle (environment), energy production, ...?

      On the other hand, the presented study leaves many issues unmentioned, creating the (false) impression that all it takes to transport electrons across the membrane is a series of hemes and/or flavins along the way. For example, in the discussion of the very interesting hypothesis that flavinylation might replace multi-heme cytochromes under iron deficiency, discussed on page 20 (last para), the authors mention that "flavins possess two-electron transferring properties (ref. 46)" in contrast to the heme system. If this were true then the switch from heme to flavin would also imply that the electron transport itself would have to change from one-electron to two-electron transport. It is unclear that this would be compatible with all other components of the electron transport system. On the other hand, flavins can also, under certain circumstances and in certain environments, carry out one-electron transfer processes, e.g. DNA-photolyases, flavodoxins, etc. Thus, it is conceivable that the flavins operating in the suggested systems in prokaryotes also perform one-electron transport, similar to the operating mode of heme cytochromes. It is clear that we currently lack the biochemical/physical information to know what is really going on, but at least it should be discussed more thoroughly. Equally, several other aspects of the (multi-)flavinylation should be addressed:

      • What is known about the environment of the flavin(s)? - Is the flavin embedded in a protein matrix or freely accessible, in other words does it "behave" like a "free" flavin?

      • How does the binding of the flavin affect the redox potential (this is very important in order to understand the direction of electron transport).

      • In contrast to other covalent flavin attachments, the flavinylation addressed in the current work is reversible. Is anything known about the removal of flavins from the protein complexes in question?

      • Are there any enzymes that carry out de-flavinylation? If so, how are they regulated?

      • Connected to the last bullet point: Is the reversibility of flavinylation used for the overall regulation of electron transport?

      I assume that most of the questions cannot be satisfactorily answered yet, but I think these issues should at least be addressed in the discussion in order to stress the need for further in-depth biochemical studies that target the obvious complexity of these systems.

    3. Reviewer #2 (Public Review):

      Interesting bioinformatics. The strength of this article lies in the extensive search for flavinylated domains in prokaryotic genomes. This has resulted in several new ideas about the functions of these domains in transmembrane electron transport. The comparison with (multi-heme) cytochromes and thioredoxins is interesting, and needs experimental validation in future work.

      Some weaknesses: In the introduction, I miss a clear explanation about the mode of flavinylation of the FMN-binding proteins and how this relates to other covalent flavinylation systems (where an increase in redox potential of the flavin is a prominent effect of covalent binding). It is also not clearly explained whether the predicted flavinylation of the phosphate moiety of FMN is reversible.

      Results and Discussion: The electron transfer properties of flavoproteins are not well explained. Quite some flavoproteins (e.g. flavodoxins) mediate one-electron transfer processes, and this is most likely the preferred way in the discussed transmembrane electron transport systems.

      I was wondering if there is any protein structural information about this mode of flavinylation, for instance is the flavin hidden in the protein or accessible? Can the authors tell us more whether the amino acid sequence results explain in more general terms the site(s) of flavinylation?

      I would also like to know how sure the authors are that the conserved motif always represents covalent flavinylation.

      Along similar lines, regarding the reversibility of the covalent flavinylation, I am curious how sure the authors are that the flavin is always covalently bound and what would be the consequence if this is not the case. For example, might there be next to iron limitation, also flavin limitation?

      Finally, I am wondering whether more could be said about the comparison with thioredoxins and cytochromes when we look at the 50% of bacteria that do not contain the flavinylation domains.

    4. Reviewer #1 (Public Review):

      The manuscript by MeHeust reports identification of flavinylation proteins that can potentially function as cellular redox mediators related to electron transfer systems in prokaryotes.

      The work is useful and informative. The authors used a bioinformatic approach to illustrate the wide distribution of these proteins in a variety of prokaryotes. Although the exact functions of these proteins are not known, this work should inspire further investigation by researchers in the fields of redox enzymology and bioenergetics.

    5. Evaluation Summary:

      Light and coworkers provide evidence from mining 31,910 prokaryotic genomes for the widespread occurrence of extracytosolic flavinylated FMN-binding domains in bacteria. They discovered extracytosolic flavinylation of five protein classes potentially involved in transmembrane electron transfer. The study also proposes new connections between respiration and iron assimilation and identifies two novel substrates of ApbE enzymes. This work should inspire further work in the fields of redox enzymology and bioenergetics to characterize the suggested involvement of flavinylated protein complexes in prokaryotes.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      The regulation of the calcium pump SERCA by phospholamban has been studied extensively over many years as this system has become a focus of many biophysical approaches to study the interplay between protein dynamics, the biological function of calcium transport, and its regulation via protein-protein interactions, all of which are occurring within the environment of the sarcoplasmic membrane of heart muscle.

      The authors themselves have a long track record with working on this system and the specific focus here is on the detailed mechanism of how phosphorylation of phospholamban leads to a release of its inhibitory function when bound to SERCA. Much effort has been spent on this question in the past, and the field has progressed over the years by deriving increasingly detailed structural models for SERCA-phospholamban interactions. There is now a structure from crystallography showing the interaction of the phospholamban TM domain with the SERCA TM helices and there is additional data from various biophysical methods that partially describe the conformational ensemble of the extramembrane N-terminal region of phospholamban and its interaction with SERCA. Some of that insight has distinguished between phosphorylated and unphosphorylated phospholamban, but despite much data and many simulation efforts, the exact mechanism for how phosphorylation of phospholamban alters its interaction with SERCA and thereby modulates its inhibitory functions has so far not been clearly described. This is the main goal of the present work.

      There is new experimental data presented here from oriented-sample solid-state NMR experiments with the main finding of orientational shifts of the phospholamban TM helix upon binding to SERCA and upon phosphorylation. Taking advantage of this data, the main part of the study is concerned with results from computer simulations that were restrained by experimental data to develop conformational ensembles of the SERCA-phospholamban complex with and without phosphorylated phospholamban. From that, new mechanistic hypotheses are developed. While the direction of the work proposed here is promising, there are concerns about the overall approach and - as a consequence - the significance of the reported findings:

      1) A main concern is the treatment of the extramembrane portion of phospholamban, which includes the serine that is being phosphorylated to relieve the inhibitory effect. Previous studies have described a helical conformation for the N-terminal segment that may be in equilibrium with a less-ordered/less-helical structure upon binding to SERCA. It is largely still not clear, however, how exactly that part of phospholamban would interact with SERCA. The idea put forth here is that a largely disordered conformation would interact with SERCA. That may be so, but it is unclear how much of that is a direct result of experimental constraints and how much could simply be a consequence of inadequate sampling. It seems that helical conformations for the N-terminal segment of phospholamban were not considered, while there is not enough discussion of why such conformations would be ruled out based on the experimental data.

      2) The simulations are probably too short to fully explore the full conformational landscape of a (partially) disordered N-terminal phospholamban and it is unclear how much the experimental constraints are really limiting the conformational space in that region.

      3) It is not completely clear how the present work relates to the crystal structure of the SERCA-phospholamban complex. Why were the starting structures for the SERCA-phospholamban complex initially taken from the available crystal structure (at least with respect to the TM domain of phospholamban) but then subsequently refined using much lower-resolution cross-linking data before initiating the simulations? Is the crystal structure in significant disagreement with other experimental data considered here? More discussion and explanation are needed.

      4) The main focus of the analysis of the simulation results is on the impact of phosphorylated phospholamban on the conformational sampling of SERCA. That is the key step for developing new mechanistic hypotheses. However, given that the SERCA-phospholamban complex is very large and flexible, and based on the results presented, it appears that the length of the simulations may not be sufficient to fully characterize the shift in the conformational ensemble of SERCA as a function of phospholamban phosphorylation. At a minimum, some type of convergence analysis is needed to establish confidence that the differences in conformational ensembles shown most prominently in Figure 2 are indeed significant. Moreover, related to Figure 2, it is unclear whether the projection of the conformational sampling onto just two principal coordinates is sufficient for a full characterization of the conformational dynamics. It is also unclear whether the principal coordinates are the same when projecting the sampling for PLN and pPLN; if not, the comparison between the two would be further complicated.

    2. Reviewer #2 (Public Review):

      In this paper, the authors present an extensive ssNMR study on the mini-membrane protein phospholamban (PLN), which regulates the Ca2+ ATPase SERCA. PLN stabilizes the low-affinity Ca2+ state of SERCA, which can be reversed by phosphorylation or an increase in [Ca2+]. Despite extensive studies, this mechanism is still unknown: although interaction sites within the membrane have been identified, no structural changes within PLN have been detected. In the paper, the authors address this question by oriented ssNMR, an approach which is highly suited to mapping topological changes of membrane-embedded peptides and proteins. While oriented ssNMR is conceptually very appealing, it has been hampered by sample preparation restrictions preventing its widespread use on more complex samples. A breakthrough has been magnetic alignment of membrane proteins embedded in bicelles, as demonstrated here. The presented spectra represent in principle a projection of labelled transmembrane helices onto a spectroscopic plane by which re-orientations of these helices can be elegantly visualized. Based on high quality data, the authors are able to convincingly demonstrate that PLN is in a topological equilibrium, which shifts upon phosphorylation at Ser16. In complex with SERCA, phosphorylation or Ca2+ binding triggers a topological change of the whole PLN transmembrane domain, which then acts as a 'switch' on SERCA.

      All presented data are of high quality and data interpretation is convincing. The paper addresses a complex and relevant biomolecular question by very advanced methodology.

      The authors have identified a topological allostery for PLN connecting a posttranslational modification at the cytoplasmic site with signal transduction across the membrane. They argue that the underlying mechanism might be of general relevance for the regulatory role fulfilled by miniproteins.

    3. Reviewer #1 (Public Review):

      The regulation of highly dynamic interactions is of great importance for many biological processes. The authors study the regulatory interaction of the single transmembrane helix protein Phospholamban with the P-type ATPase SERCA, which is responsible for removing calcium ions from the sarcoplasm and restoring its concentrations in the sarcoplasmic reticulum. The inhibitory interaction between both proteins is relieved by phosphorylation of a single residue in the cytoplasmic domain of Phospholamban. The authors show by a combination of solid state NMR and MD simulations that phosphorylation results in an order-to-disorder transition in the cytoplasmic part, which leads to a rearrangement of electrostatic networks that is propagated into weakened hydrophobic interactions between the transmembrane parts, thus activating SERCA. Phospholamban has been studied extensively by solid state NMR, liquid state NMR or hybrid methods. For example, the phosphorylated form was studied previously, showing that it interacts differently with lipids (doi: 10.1021/bi0614028) and that Ser-16 phosphorylation alters the structural properties of the cytoplasmic domain with respect to the lipid bilayers (doi:10.1016/j.bbamem.2009.12.020). There have also been EPR and other studies, in principle showing the same effect. The current paper adds to this a new solid state method that shows additional details that could not be investigated previously. The work confirms less well determined previous models. The major new aspect is an MD simulation that provides a more detailed view than what was previously possible.

    4. Evaluation Summary:

      There are many membrane-embedded mini-proteins, which fulfill a large range of regulatory functions. One of them is phospholamban, a single transmembrane helix protein that regulates the sarcoplasmic reticulum Ca2+-ATPase by binding in the membrane. The work presented here combines new experiments with computer simulations with the aim of arriving at a more definitive answer to the long-standing mechanistic question of how exactly phosphorylation of phospholamban modulates its regulatory behavior. In this manuscript, an allosteric mechanism is presented, which could be of general importance for the whole family of these mini-proteins.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Reviewer #3 (Public Review):

      Using a combination of powerful approaches, the authors demonstrate large variability in the number of release sites at hippocampal excitatory synapses onto fast spiking interneurons in slices. High resolution studies of individual synapses showed highly variable amounts of Munc13-1 within the AZs that have the same number of release sites. The authors further revealed a synapse size-independent variability in the number of Munc13-1 clusters per AZ and in the Munc13-1 content of individual clusters. Their results support the presence of multiple independent release sites and provide insight into molecular heterogeneity of release sites.

      This is a high quality study using the most advanced techniques available to study molecular determinants of AZ organization. In addition to some technical issues, my main concern is conceptual: this work, although of very good quality overall, is rather incremental because it largely confirms several previous studies showing a large variability in the number of release sites per AZ in small central synapses, the association between Munc13 and release site properties, and variability in Munc13 content. Surprisingly, only one of these three previous studies has been cited or discussed. My second concern is that the paper could be written more clearly: multiple terms are used to refer to the same concepts, making it difficult to follow, and there is a logical fallacy in the way the results are discussed.

    2. Reviewer #2 (Public Review):

      Karlocai et al. address a prevailing concept of synapse diversity, asking whether diversity of release probability is caused by a varying number of release sites and/or by the properties of individual release sites. In other words, are there functionally uniform release sites (RS) that scale in number with the size of the AZ and thus regulate release probability (Pv), or are RS, in addition, heterogeneous in composition and function? Performing quantal analysis 2.0 by combining electrophysiology at pyramidal cell-to-parvalbumin interneuron synapses in hippocampus with quantitative anatomy of a presynaptic key transducer, Munc13, they define N, Pv and Q and compare them to the numbers of Munc13 clusters and densities. As expected from previous studies, RS numbers covary with the size of the AZ, but the amounts of Munc13-1 are highly variable at individual RSs, providing a possible additional source of Pv variability.

      Overall, the quality of the data is just superb, and the conclusions are well supported by the data, as sufficient electrophysiological experiments were performed and, importantly, also correlated with multiple, highly quantitative microscopy techniques. Only very few labs can do this at this level.

      The findings carry enough impact as they negate the hypothesis that RS are made out of predefined release sites. Also, the finding that the postsynapse, as defined by PSD95 labeling, was much less variable indicates that pre- and postsynaptic makeup do not necessarily correlate, arguing somewhat against the transsynaptic nanocolumn concept as a main organizing principle. Thus, pre- and postsynapses are only loosely linked in their composition and function.

    3. Reviewer #1 (Public Review):

      The authors address the broad question of what is responsible for the large diversity of presynaptic function at synapses arising from a single type of neuron. They use a variety of sophisticated and complementary approaches to address the functional and molecular heterogeneity of hippocampal pyramidal cell to fast-spiking interneuron synapses. The rigorous functional and molecular analysis is clearly described and compelling. The conclusions are consistent with the current view that each presynaptic active zone contains a variable number of release sites, and this variability makes a substantial contribution to the heterogeneity in postsynaptic response amplitude at unitary synaptic connections. Using state-of-the-art imaging approaches, the authors report variability in the content of Munc13-1, a core component of release sites, between release sites. Although these results and conclusions are well-supported, the functional significance of Munc13-1 variability at release sites is unclear.

    4. Evaluation Summary:

      The authors study how individual synapses can compute information by tuning the properties of the individual components that drive synaptic communication between neurons. Using cutting-edge physiology and morphology, they show that the reliability of synaptic communication depends on how many units drive it, and they suggest that individual units also vary in their quantitative molecular composition.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Reviewer #2 (Public Review):

      In this manuscript, the authors characterise a GluA4-knockout mouse with respect to changes of cerebellar cortical circuit properties and behaviours.

      They demonstrate a clear reduction in the component of mossy fibre–granule cell synaptic transmission mediated by AMPA receptors, as expected. They also show two parallel changes in granule cells that could be considered partially compensatory: tonic inhibition of granule cells is reduced and the NMDAR-mediated component of the mossy fibre input is upregulated. The overall effect of the mutation is nevertheless to reduce the efficacy of the mossy fibre input; spike emission is therefore reduced in frequency, delayed, and has less precise timing.

      Two other key synapses in the mossy fibre pathway are shown to be apparently unaffected in the knockout mouse, namely mossy fibre to Golgi cell transmission and also granule cell to Purkinje cell transmission.

      The authors then model representation in the granule cell layer and downstream learning by the Purkinje cell, focusing on a reduction of the effective coding space available in the expansion performed by the granule cell layer and the downstream reduction of learning speed in the Purkinje cell.

      In a final, behavioural, section, the authors show that locomotion is little affected but that eyelid conditioning is essentially abolished, with two different conditioned stimuli.

      Overall, the experiments, analysis and presentation are of excellent quality.

      However, the conceptual framework and broader interpretation of the work are quite ambitious and I believe that they require more nuanced presentation.

      A first and reasonably straightforward issue is the fact that the authors are, as they are well aware, working with a systemic knockout. Logically, therefore, the behavioural effects on eyeblink conditioning could reflect interference with any part of the input-output loop. Within the cerebellar circuit, the authors address this reasonably comprehensively, by confirming that mossy fibre to Golgi cell and granule cell to Purkinje cell transmission are unaffected. Nevertheless, one quickly wonders whether the activity of interneurones, climbing fibres or cerebellar nuclei might somehow be altered. The authors address possible extracerebellar effects of the knockout by showing that eyeblink conditioning is essentially abolished with two different modalities of conditioned stimulus. Again, it remains logically possible that both inputs or the common output could be altered.

      Experimentally verifying all possible stages of the behavioural input-output loop is not feasible, while the ideal experiment of a granule-cell-specific knockout would amount to redoing the whole project, which is obviously out of scope. Nevertheless, I believe the issue does require slightly more open and detailed discussion; maybe the developmental down-regulation of GluA4 in relevant tissues could be substantiated better with reference, for instance, to expression atlases of the Allen Brain Institute. Ultimately, if the locus of action is not completely certain, that should be reflected in the conclusions.

      Finally, I'm a little uncomfortable with the ambitious conclusion that learning and behaviour have been constrained by the reduced coding expansion by the granule cell layer. Although the changes observed are indeed almost certain to reduce coding expansion as defined, I feel that the failure of learning could also be understood in more prosaic terms. In particular, the inputs to the Purkinje cell may simply be too weak, too delayed or too unreliable to be an effective plasticity substrate for rapidly developing a conditioned response before the air puff. To a large extent the lower-level modifications will correlate with the higher-level coding expansion, so the concepts are more or less synonymous. Yet, it feels different to conclude that patterns can't be separated because they produced no granule cell activity (to consider a logical extreme) and to conclude that their separation is too difficult because of output similarity and saturation of learning.

      Furthermore, there are ways to view coding expansion that wouldn't necessarily align with the authors' conclusion. Specifically, the combinatorial pattern separation analysed in the original Marr paper would, I believe, increase as the ratio of mossy fibre input strength to granule cell threshold decreases. In other words, for given overlapping mossy fibre inputs, the overlap between granule cell outputs could decrease as the input/threshold ratio decreases.

      Addressing these issues experimentally is certainly unfeasible. However, it might be possible to explore correlations/overlaps between input and output patterns in the modelling. The discussion could be made a little less assertive on these issues, and the question of input delay should be addressed.

    2. Reviewer #1 (Public Review):

      This study focuses on the consequences of deleting the GluA4 subunit of AMPA receptors for cerebellar synaptic transmission and cerebellar-dependent behaviors. The manuscript is well organized and the information is clearly presented. The first aim of the study is to investigate the effect of the deletion at the level of synaptic function. This is well achieved by a combination of patch-clamp recordings from cerebellar slices and modeling. It is found that deletion of the GluA4 subunit results in a strong decrease in synaptic currents from mossy fibers (MF) to granule cells (GC) as well as in two "compensatory" changes pertaining to NMDARs and tonic inhibition. As a consequence, MF-GC transfer is strongly reduced at high frequencies but less affected at low frequencies. The second part of the work investigates the effect of the GluA4 deletion on cerebellar-dependent behaviors. GluA4 knock-out mice are found to have no deficits in locomotion but exhibit a total absence of associative learning in an eye-blink conditioning paradigm. At both the slice level and the behavioral level, the strength of this work resides in the quality of the data and the rigorous analysis. A shortcoming of the work stems from the "compensatory" changes, which complicate interpretation. However, modeling strategies that incorporate those changes are implemented, and they predict the observed alterations in GC firing pattern well, thus limiting the negative impact.

    3. Evaluation Summary:

      This work explores the cellular and behavioural effects of a genetically induced reduction of the expression of a glutamate (excitatory) receptor (GluA4), focusing on the cerebellum, a structure involved in the acquisition of arbitrary, complex motor reflexes. The authors show that synaptic transmission at the input layer to the cerebellar cortex is reduced, despite some compensation by other mechanisms, which are characterised. Locomotion is little affected while acquisition of a "conditioned eyeblink" is abolished. The authors try to link the cellular and behavioural phenomena via modelling of the cerebellar computation, although this is not definitive. The work is of high quality and of interest to cerebellar physiologists and neurocomputational modellers in particular.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Joint Public Review:

      The way homologous chromosomes identify one another and become paired is an intriguing phenomenon that has a long history of study, yet the molecular mechanism remains unclear. Recent studies have led to a phenomenological button model for homolog pairing, which hypothesizes that pairing is initiated at discrete sites along the length of each chromosome. The authors aimed to rigorously investigate this idea using biophysical modeling and live imaging. They first constructed a simple polymer model with buttons distributed along the chain that possess locus-specific interactions, and thoroughly investigated its properties via stochastic simulation in 3D. Their study confirms that homolog-specific interactions are necessary for homolog pairing. They also tested the effect of time, interaction strength, initial inter-homolog distance, and button density. The authors went on to perform live imaging of pairing dynamics at two selected loci, using the fluorescent signal from nascent mRNA at the corresponding locus. They fitted the model to the experimentally quantified pairing probability of the selected loci over a 6-hour developmental window, and used the constrained model to predict the individual pairing dynamics. The predicted inter-homolog distance post pairing agrees very well with experimental observation.

      Their study supports a button mechanism for homolog pairing, where stable pairing is initiated by reversible random encounters that are propagated chromosome-wide. This work suggests that active processes are not necessary to explain pairing and paves the way for further investigating the molecular mechanism of such a pairing phenomenon.

    2. Evaluation Summary:

      This manuscript considers an important open problem in molecular biology, that is, how distal chromosomes can recognise each other at a distance and become paired, as happens for example in homolog pairing in Drosophila. To address this question, the authors combine theoretical models and experiments, which return valuable insights. However, a final proof of the envisaged mechanisms remains to be determined.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Reviewer #3 (Public Review):

      The manuscript entitled "Biosynthesis system of Synechan, a sulfated exopolysaccharide, in the model cyanobacterium Synechocystis sp. PCC 6803" is a scientifically sound manuscript and is of interest for a broad scientific audience. It provides interesting and valuable new insights and many experiments were performed. However, there are some points which must be addressed to make the manuscript more consistent and easier to grasp.

      • Title: I would suggest changing the title, since "biosynthesis system" is not a common term.
      • Abstract: Cyanobacteria are not unique in having sulphated polysaccharides. What about carrageenans and also exopolysaccharides from Porphyridium strains (see current publications on that)? If it means that amongst bacteria the cyanobacteria are the only ones, this should be clearly stated.
      • I would avoid the phrase "may utilize the polysaccharides..." Please be more specific or delete this.
      • Line 32: Can really every bacterium produce several EPS? This should be carefully evaluated.
      • Line 34: The applications named are very broad and not specific; what are the real applications there?
      • Line 49: again "uniquely"?
      • Line 56: "the sulphated polysaccharides are used for colony and biofilm..." This sentence must be rephrased and corrected.
      • Line 84: "bubbling culture" etc. I can't find any detailed explanation of the cultivation systems, which is essential for the methods part. Please add volume, light source and principle of illumination (inside, outside, etc.). Please rephrase the sentence stating that the light was generated by fluorescent lamps; they were used for illumination.
      • Line 97: GTs cannot be screened by disruption; it is their function that is screened.
      • Figure: I would suggest using A) instead of A,
      • Table S1: What does "Importance" mean in the table? I would suggest changing that to a more specific value or piece of information.
      • Line 232: "to see the transcriptome..." This should be rephrased.
      • The description of the different EPS is a bit confusing, since it is only described that the WT contains several sugars, which are then given in table 1. The deletion strain shows a different composition. This should be explained more directly. Why is ribose given in table 1, if there is no ribose observed?

      In general, the whole manuscript needs correction of the English language to make it clearer in some aspects. Also, the structure of the manuscript might be reworked a bit, since it is confusing in some parts. In particular, the effect of the different deletions should be stated clearly and directly. The complexity of the manuscript would also be easier to grasp with some rearrangement of the results. The current complexity might come from having all supplement figures already in the manuscript, but it also comes from sometimes complex sentences, as well as from jumping between topics. But finally, this is a really valuable and interesting study.

    2. Reviewer #2 (Public Review):

      The paper does a very thorough job of identifying genes important for the production and export of a sulfated exopolysaccharide in Synechocystis, leading to a clear and well-justified model for EPS production and its regulation. The authors also make a convincing case for the importance of EPS production for the formation of floating multicellular aggregates or "blooms". However, the relationship between EPS production and bloom formation is not quantitative (some mutants show markedly reduced EPS production without any discernible effect on bloom formation), which indicates that bloom formation must involve additional factors which are not currently discussed.

    3. Reviewer #1 (Public Review):

      The authors have identified an entire set of genes for the synthesis of sulfated exopolysaccharides (EPS) in the cyanobacterial model Synechocystis 6803. They show convincingly that the respective gene products are involved in the production of these compounds and they have extensively characterized the regulation of these genes. Among the regulators they found a STAND protein. STAND proteins include animal and plant regulators of programmed cell death but were rarely characterized in bacteria. Last but not least they come up with an entirely new model for the buoyancy regulation of cyanobacteria (as light-dependent aquatic organisms it is important for cyanobacteria where they are in the water column). The authors suggest a mechanism in which EPS-entrapped cells together with extracellular gas bubbles derived from photosynthesis form multicellular complexes that float at certain depths. This would be a very important function and explain the extensive regulatory and signaling apparatus in controlling the synthesis of these sulfated EPS.

    4. Evaluation Summary:

      The authors have elucidated the biochemical and regulatory apparatus for the biosynthesis of sulfated exopolysaccharides, an entire class of molecules not previously studied in cyanobacteria. The work has broad implications for the microbiology and ecology of these organisms and also opens the possibility to use these compounds in biotechnology and modify their structures by combinatorial synthesis.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

    1. Reviewer #2 (Public Review):

      The authors aimed to address the lack of therapeutic treatments for Rett Syndrome by (a) identifying novel functional partners of MECP2 (mutations in which underlie Rett Syndrome), and (b) demonstrating the druggability of the partners using in-use drugs. The authors accomplish this by performing phylogenetic profiling across more than a thousand species to identify genes that coevolved with MECP2. Using drugs that target three of their top hit genes in RTT models, they demonstrate the potential efficacy of these drugs against RTT and validate their new molecular targets.

      Strengths:

      Overall, the manuscript is very well written and easy to follow even for people outside the field, and it provides insights into an important biological process and identifies much needed therapeutic targets. The authors reproduced various RTT phenotypes in human neural cells with reduced MECP2 expression and demonstrated the ability of the three drugs to rescue the phenotypic profiles. In doing so, the authors were able to shed light on some of the potential mechanisms of action through which these drugs operate. Given that all three drugs have approved safety profiles, with further pre-clinical investigation, these drugs could serve as potential therapeutic agents for Rett Syndrome.

      Weakness:

      The biggest weakness of the paper is the lack of a strong link between comparative phylogenetic profiling and the identification of potential therapeutic agents. The paper is currently framed as a 'comparative genomic pipeline' to identify novel drug targets, yet the authors didn't demonstrate the robustness of the pipeline using appropriate positive and negative controls. Basic network analyses weren't performed to demonstrate a wide usability of the methodology beyond RTT.

      While the authors do a good job of demonstrating the RTT phenotype-rescuing abilities of the three drugs, they don't exhaustively demonstrate how their comparative evolutionary pipeline was essential for identifying the three drugs. MECP2 forms a complex with HDACs and all three of the drugs selected here have known direct/indirect effects on HDAC activity. It is therefore plausible that the drugs are mediating their effects through HDACs, in which case the comparative genomic pipeline was not required to select these drugs.

    2. Reviewer #1 (Public Review):

      Major Comments/Concerns

      On line 101 - The use of only the longest transcript for each gene could miss important functional sections of the genome. This could create bias against genes with many isoforms and miss exons that do not happen to lie in the longest transcript. How different would the resulting conservation profiles be if all coding regions or exons of every gene were used?

      On line 106 - Does this approach create good specificity to our gene of interest rather than just broad functional similarity? For example, with this approach, are there any major neuronal function genes that have NPP very different from MeCP2? Could authors provide a more objective evaluation to baseline/null?

      Minor Comments/Concerns

      On line 132 - It seems fair to examine this set of genes first, but I am not sure this approach to filtering in particular moves us further towards finding a therapeutic for Rett. These genes could be all good potential targets, and your subset of focus are just the best ones for current validation.

      Figure 2C could be made with all 390 co-evolved genes to strengthen the argument that chr19p13.2 is an important region for MeCP2's role.

      Figure 3, 4, 5, 6 - Dynamite plots. While the stats tests are great for understanding the impact of different treatments, box plots or jittered dots would be even more clear.

    3. Evaluation Summary:

      The manuscript has the potential to be of broad interest to neuroscientists who are aiming to leverage concepts and tools of evolutionary biology to identify novel gene targets and much-needed therapeutic interventions. The follow up experiments are detailed, well thought out, and do a good job of proving the potential of the identified drugs in alleviating molecular signatures in in vitro disease models. However, the link between comparative genomic analysis and identification of specific drugs is not yet sufficiently established and doesn't convincingly demonstrate the usability of the evolutionary pipeline in identifying novel therapeutics.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Author Response:

      Reviewer #2 (Public Review):

      In their study, Lutes et al examine the fate of thymocytes expressing T cell receptors (TCR) with distinct strengths of self-reactivity, tracking them from the pre-selection double positive (DP) stage until they become mature single positive (SP) CD8+ T cells. Their data suggest that self-reactivity is an important variable in the time it takes to complete positive selection, and they propose that it thus accounts for differences in timescales among distinct TCR-bearing thymocytes to reach maturity. They make use of three MHC-I restricted T cell receptor transgenics, TG6, F5, and OT1, and follow their thymic development using in vitro and in vivo approaches, combining measures at the individual cell-level (calcium flux and migratory behaviour) with population-level positive selection outcomes in neonates and adults. By RNA-sequencing of the 3 TCR transgenics during thymic development, Lutes et al make the additional observation that cells with low self-reactivity have greater expression of ion channel genes, which also vary through stages of thymic maturation, raising the possibility that ion channels may play a role in TCR signal strength tuning.

      This is a well-written manuscript that describes a set of elegant experiments. However, in some instances there are concerns with how analyses are done (especially in the summaries of individual cell data in Fig 2 and 3), how the data is interpreted, and the conclusions from the RNA-seq with regard to the ion channel gene patterns are overstated given the absence of any functional data on their role in T cell TCR tuning. As such the abstract is currently not an accurate reflection of the study, and the discussion also focuses disproportionately on the data in the final figure, which forms the most speculative part of this paper.

      (1) As the authors themselves point out (discussion), one of the strengths of this study is the tracking of individual cells, their migratory behaviour and calcium flux frequency and duration over time. However, the single-cell experiments presented (Figure 2 and 3) do not make use of the availability of single-cell read-outs, but focus instead on averaging across populations. For instance, Figure 3a,b provides only 2 sets of examples, but there is no summary of the data providing a comparison between the two transgenics across all events imaged. In Figure 3c, the question that is being asked, which is to test for between-transgenic differences is ultimately not the question that is being answered: the comparison that is made is between signaling and non-signaling events within transgenics. However, this latter question is less interesting as it was already shown previously that thymocytes pause in their motion during Ca flux events (as do mature T cells). Moreover, the average speed of tracks is probably not the best measure here in reading out self-reactivity differences between TCR transgenic groups.

      We regret any lack of clarity in how we presented our analyses of the calcium imaging data. In the original submission, we did provide analyses of individual cells (Fig 2b, Fig 3c (Fig 2e in the revised manuscript), Suppl Tables 1 and 2, and supplemental videos S1 and S2). In the revised manuscript, we have added an additional analysis of individual cells (Figure 2—figure supplement 1a). In addition, Fig 3a and b (Fig 2c and d in the revised manuscript) provided information about the average behavior of thymocytes during signaling events by identifying numerous examples of individual signaling cells (23-37 individual cell signaling events per condition), aligning these multiple examples based on the start of their signaling events, and displaying the average changes in calcium and speed over time. Thus, these data do take advantage of the single-cell measurements by providing information about the average behavior of signaling events, which could not be inferred from bulk measurements. Regarding Fig. 3c (Fig. 2e in the revised manuscript), we agree that a more direct comparison of pausing between TCR transgenic models was needed. To address this point, we have added a new panel (Fig 2f in the revised manuscript) that uses the difference in speed between the signaling and nonsignaling portions of the same track to define a “pause index” for each cell. The difference in pause index between the transgenic models is highly significant at both 3 and 6 hours into positive selection. In the revised manuscript we have added additional text to detail more precisely how we performed the analyses, and to make it clearer that individual tracks are being analyzed. We have also included a graph of the Calcium Ratio and the Average Speed for the individual cells shown in the supplemental videos.
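
      As an illustration only, a per-track "pause index" of the kind described above could be sketched as follows. The response states only that the index uses the difference in speed between the signaling and nonsignaling portions of the same track; the sign convention (nonsignaling minus signaling), the per-time-point boolean signaling flag, the function name, and the example values are all assumptions made for this sketch and are not taken from the manuscript.

```python
from statistics import mean


def pause_index(speeds, signaling):
    """Hypothetical pause index for one cell track.

    speeds:    speed at each time point (e.g. um/min)
    signaling: True where the cell is in a calcium signaling event

    Assumed convention: mean nonsignaling speed minus mean signaling
    speed, so a larger value means a stronger pause during signaling.
    """
    sig = [s for s, flag in zip(speeds, signaling) if flag]
    non = [s for s, flag in zip(speeds, signaling) if not flag]
    if not sig or not non:
        # The index is undefined if the track lacks either portion.
        return None
    return mean(non) - mean(sig)


# Made-up example track, not real data.
speeds = [8.0, 9.0, 2.0, 1.0, 3.0, 10.0]
signaling = [False, False, True, True, True, False]
print(pause_index(speeds, signaling))  # 7.0
```

      Computing one value per track in this way keeps the comparison between transgenic models at the level of individual cells rather than population averages.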

      (2) The authors conclude from their data that the self-reactivity of thymocytes correlates with the time to complete positive selection. However, the definition of what this includes is blurry. It could be that an individual cell takes the same amount of time to complete positive selection (i.e., the duration from the upregulation of CD69 until transition to the SP stage is the same), but the initial 'search' phase for sufficient signaling events differs (e.g., because of lower availability of selecting ligands for TG6 than for OT1), in which case at the population level positive selection would appear to take longer. Given that from Fig 2/3 it appears that both the frequency of events and their duration differ along the self-reactivity spectrum, this needs to be clarified. Moreover, whether the positive selection rate and positive selection efficiency can be considered independently is not explained. It appears that the F5 transgenic in particular has very low positive selection efficiency (substantially lower %CD69+ and %CXCR4-CCR7+ cells than the OT1 and TG6), and how this relates to the duration of positive selection, or whether it is a function of ligand availability, is unclear.

      (3) While the question of time to appearance of SP thymocytes of distinct self-reactivities during neonatal development presented (Figure 5) is interesting, it is difficult to understand the stark contrast in time-scales seen here compared with their in vitro thymic slice (Figure 4) and in vivo EdU-labelling data (Figure 6), where differences in positive selection time were estimated to be ~1-2 days between TCR transgenics of high versus low affinity. This would suggest that there may be other important changes in the development of neonates to adults not being considered, such as the availability of the selecting self-antigens.

      Since Reviewer #2’s comments 2 and 3 are related, we will discuss them together. In this study, we have used 3 independent approaches (the thymic slice system, the EdU labeling study, and analyses of neonatal transgenic mice) to estimate the relative time for thymocytes bearing different TCRs to complete positive selection, and all three confirm that OT1 is the most rapid and TG6 the slowest of the 3 transgenic models examined here. However, each approach relies on different start times and different read outs, so they are not directly comparable to each other. The thymic slice system tracks a cohort of preselection thymocytes over time. However, given the 4 day limit for this system, it is not possible to reach the theoretical maximum number of CD8SP. Thus, our estimates of the delay in positive selection are based on the timing of multiple phenotypic changes (CD69 induction, chemokine receptor switch, and CD8SP appearance) in this system. The EdU study (Fig 5 in the revised manuscript) allows us to track a cohort of thymocytes that have recently completed TCRb selection and follows them over a longer time period (up to 9 days). Because the number of OT1 and F5 CD8 SP thymocytes reached a clear plateau, this allows us to estimate the average time between the burst of cell division after TCRb selection and the downregulation of CD4 (3.5 days for OT1 and 4.5 days for F5). However, at 9 days the number of TG6 thymocytes is still increasing, and thus we have only a lower estimate (>6 days) of the average time after TCRb selection to the appearance of CD8SP thymocytes with this TCR. When we track the appearance of mature CD8SP after birth (Figure 4 in the revised manuscript), we are not tracking a synchronized cohort of positively selecting cells, but rather we are measuring the amount of time it takes for single positive cells to accumulate into a population size similar to what is seen in an adult.
Thus, these experiments do not provide a direct measure of the time to complete positive selection, but rather provide an indirect measure of the number of cells that have successfully completed positive selection at the given timepoints post birth. The observation that OT-1 CD8 SP thymocytes reach their adult steady state numbers at one week whereas TG6 CD8 SP thymocytes are well below adult levels at 21 days is likely a reflection of lengthy positive selection of TG6, resulting in a much longer time to fill the adult niche for CD8SP thymocytes. We agree with the reviewer that there could be additional important differences in positive selection between neonates and adults. We explore this topic and relate our data to recently published work in the discussion (line 574) of the manuscript.

      With regard to point (2), our data suggest that the longer time for positive selection is a result of both a longer search phase and a longer progression phase. Specifically, the % of CD69+ cells (Fig 3b and Figure 3—figure supplement 2a) peaks at 24 hours for OT1 and F5, but is delayed until 48 hours for TG6, consistent with a 1-2 day delay in the “search phase” for TG6. However, if this initial search phase was the only factor contributing to delayed TG6 development then we might expect to see a 1-2 day lag in TG6 development compared to OT-1. However, as discussed above, the EdU data indicates a > 3 day lag in the appearance of TG6 CD8SP compared to OT1. Thus, there is evidence that both the search phase and the progression phase of positive selection are longer in thymocytes with low self reactivity.

      (4) The conclusion that "ion channel activity may be an important component of T cell tuning during both early and late stages of T cell development" is not supported by any data provided. The authors have shown an interesting association between levels of expression of ion channels, their self-affinity and the thymus selection stage. However, some functional data on their expression playing a role in either the strength of TCR signaling or progression through the thymus (for instance using thymic slices and the level of CD69 expression over time), would be needed to make this assertion. Moreover, from how the data is presented it is difficult to follow the conclusion that a 'preselection signature' is retained by the low but not the high self-reactivity thymocytes.

      We agree that a role for ion channel activity in T cell tuning is speculative at this point, and we have tempered our conclusions in the revised manuscript. With regard to the evidence that a preselection signature is retained by thymocytes with low self reactivity, this conclusion is based on 2 separate lines of evidence presented in Figure 6 (previously Figure 7 in the original submission) and Figure 6—figure supplement 2. To summarize: 1). We defined a “preselection” gene signature based on preselection (CD69-DP) wild type thymocytes from the ImmGen microarray data, and show that this set of genes also tends to be more highly expressed in thymocytes of low vs high self reactivity (TG6>F5>OT1) at equivalent stages of development (Fig 6d). 2). We identify a set of ion channel genes (cluster 2a from Fig 6c) that are more highly expressed in thymocytes of low vs high self reactivity (TG6>F5>OT1), and are also more highly expressed in earlier stages of positive selection for each TCR. This trend can also be seen in Figure 6—figure supplement 2c when comparing the expression of all cluster 2 ion channel genes across the wild type thymocyte subsets from ImmGen microarray data. Again, expression of this gene set peaks in the DP CD69- (preselection) population compared to other stages, including the preceding (DN4) and following (DP CD69+) stages of thymocyte development. We have edited this part of the results section in the revised manuscript to improve clarity.

    1. Reviewer #3:

      The prevalent treatment options for LSCC are limited in efficacy. Through genetic inactivation of Usp28 in a novel lung cancer mouse model, and chemical inhibition of Usp28 in induced LSCC in mice and human LSCC xenograft tumors, the authors demonstrated the specific dependency of LSCC (but not LADC) on the protein deubiquitinase Usp28. The authors also showed that loss of Usp28 by either means leads to depletion of the oncoproteins c-Myc, p63 and c-Jun in LSCC. Finally, the authors described a novel small molecule that is specific for Usp25/28 among a group of assessed deubiquitinases. Based on these results, the authors suggested chemically targeting USP28 as a potential therapeutic option for human LSCC patients.

      Strengths: The presentation of the work is clear, concise and easily readable. The data presented largely support the authors' conclusions on the role of USP28 in LSCC tumorigenesis and that inhibition of USP28 is a viable therapeutic option for LSCC treatment. The generation of the KFCU mouse model that can give rise to both LADC and LSCC concurrently is interesting and presents a valuable tool for the wider cancer community.

      Weakness: The manuscript would benefit from a deeper analysis of the relationship between FBW7 and USP28 in patient cohorts. A comparison of the activity/efficacy of FT206 to existing USP28 inhibitors would also be helpful.

    2. Reviewer #2:

      In this work, Ruiz et al. use a couple of elegant mouse genetic models - KFCU (Fbxw7 deletion and mutant Ras over-expression) and KPCU (p53 deletion and mutant Ras over-expression) - to generate both LADC and LSCC tumors. Using this system, the authors show that deletion of USP28 resulted in less LSCC but not LADC tumor formation. However, both tumor types showed an overall decrease in tumor size (in KFCU; data are not shown in KPCU). These results are the genetic proof of concept that USP28 inhibition will be particularly detrimental in the context of LSCC tumors. They further test a compound (FT206) that was previously found to target USP28 and show that indeed this compound is specific for USP28 binding among USPs and can reduce the tumor numbers and size only in LSCC tumors and not LADC in the KF model and in three separate LSCC cell line xenograft models. Altogether, they make the argument that targeting LSCC tumors with chemical inhibitors of USP28 is a promising clinical strategy for LSCC cancers. Overall this paper is interesting and the results provided in vivo are strong and nicely demonstrate an on-target effect of FT206 and its specificity in LSCC tumors. The work closely resembles a recent publication (Prieto-Garcia EMBO Mol Med 2020) describing very similar results for USP28 dependency in LSCC tumors, as well as previous findings regarding the chemical matter used in this paper (FT206).

      The major strength of this paper is that the authors use several very elegant mouse models to establish that Usp28 is a good candidate target for potential therapeutic development designated for LSCC patients. They also show the proof of concept using a compound that is described as a Usp28 inhibitor (FT206). It should be noted that much of the genetic data showing the importance of Usp28 in LSCC was previously described (Prieto-Garcia EMBO Mol Med 2020), including the potential benefit of chemical inhibition of USP28. A potential weakness is that there is no rigorous characterization of Usp28 substrate ubiquitination and degradation following FT206 treatment. This work will likely motivate the development of the USP28 inhibitor(s) for further preclinical assessment in Usp28 dependent tumors such as LSCC.

    3. Reviewer #1:

      The authors investigate a role for a candidate new inhibitor of USP28 in destabilizing c-MYC to reduce the growth of lung squamous carcinomas. They demonstrate that c-MYC levels are higher in lung squamous cell carcinomas (LSCC) versus lung adenocarcinomas (LADC), and depletion of c-MYC reduces LSCC cell growth. The deubiquitinase USP28 is known to stabilize c-MYC; the authors show that depletion of USP28 also decreases c-MYC protein levels. USP28 action opposes that of a ubiquitin complex targeted by the FBXW7 tumor suppressor. The authors create a new mouse model in which FLP recombinase initially causes deletion of FBXW7 and activation of KRAS to cause tumorigenesis with LSCC and LADC, followed by tamoxifen-dependent CRE recombinase deletion of USP28. Loss of USP28 in this model reduced numbers of LSCC but not LADC, and led to decreased expression of c-MYC and other short-lived proteins such as c-JUN and deltap63. A limitation of the data shown is that tumor number calculations are shown for a relatively small number of mice. Deletion of USP28 also did not restrict LADC growth in a second mouse model, with tumors forming based on activation of KRAS and loss of TP53. The authors then describe a compound, FT206, which they show is a specific inhibitor of USP28 among other deubiquitinases. They demonstrate that this compound reduces expression of c-MYC, c-JUN, and deltap63, but do not demonstrate this effect is directly mediated through USP28. They also show FT206 reduces growth of LSCC but not LADC in the KRAS/FBXW7 tumor model, and in human LSCC xenografts. These latter data suggest the compound FT206 may be useful as a lead compound. However, the current data are not sufficient to demonstrate that FT206 binding and biological effect are specific for USP28, as the compound may also bind and regulate other non-deubiquitinase proteins.

    4. Summary:

      This paper is of general interest to cancer biologists focusing on identifying new targets for cancer therapy, particularly in the context of squamous cell lung carcinoma. The authors demonstrate that genetic ablation of the deubiquitinase USP28 reduces the growth of lung squamous cell carcinomas but not lung adenocarcinomas in a mouse model of lung cancer, and that this restriction of growth is accompanied by loss of expression of several USP28 targets. They also describe activity of a new small molecule compound in controlling the growth of lung squamous cell carcinomas in mouse genetic and xenograft models, and reducing expression of USP28 targets. They demonstrate that USP28 is one target of the newly identified compound, but they do not establish whether it is the only and biologically relevant target of this compound.

      Reviewer #3 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #3 (Public Review):

      In this manuscript, the authors seek to resolve conflicting models for corepressor function using the elegant synthetic auxin response system. Auxin signaling is governed by a de-repression paradigm and is ideally suited to interrogate co-repressor function - in this case, the TOPLESS (TPL) co-repressor. Several contradicting models have been put forward for the mechanism of TPL-mediated gene repression, ranging from a requirement for protein oligomerization for activity, interaction with distinct partners, and even which regions of the protein are required for repressive activity. Leydon et al use the yeast-based synthetic auxin response system to interrogate these models using a single reporter locus, allowing for straightforward examination of TPL function.

    2. Reviewer #2 (Public Review):

      In this manuscript, the authors studied the specific domains of the plant A. thaliana TPL corepressor using a synthetic auxin response circuit (ARC) in the yeast S. cerevisiae that allows monitoring of the repression and auxin response of reporter expression. Two domains of the TPL corepressor that independently contribute to repression in this system were identified. Moreover, one of these domains interacts with the Med21 and Med10 Mediator subunits. The authors show that this interaction is required for TPL-mediated auxin-responsive repression in plants. Contrary to some repression models, they propose that multimerization of TPL is not required for repression mechanisms. Taken together, the work provides important information on auxin-responsive repression mechanisms involving the TPL corepressor and the Mediator complex.

      A lot of work was done to analyze the TPL domains and critical residues involved in repression using the ARC system, and TPL interaction with Mediator using yeast cytoSUS and two-hybrid assays, complemented by CoIP experiments with yeast and plant extracts. Point mutations, small deletions and Anchor Away-mediated depletion strains were used to analyze the consequences for TPL-Mediator interactions and auxin-responsive repression in the artificial system in yeast and directly in plants.

      The mechanism of how TPL-Mediator interaction is involved in auxin-responsive repression remains to be determined. No results were provided in the manuscript on the composition of Mediator upon auxin induction, and the discussion sentence that "as supported by our synthetic system, auxin-induced removal of TPL is sufficient to induce changes in the composition of the Mediator complex" is not supported by the results. In general, the transition between transcriptionally repressed and active states was not analyzed. The authors have made considerable efforts to answer the reviewers' criticism and to include a number of new experiments and approaches. However, several points and conclusions need to be further developed and specified. In particular, CoIP experiments in plant extracts lack the negative IP control needed to establish the specificity of the CoIP signal. Moreover, the relevance of ChIP experiments on a yeast plasmid remains questionable, and appropriate control regions (the chromosomal ACT1 gene body is completely inappropriate as a background for Pol II ChIP), regulatory, core promoter and transcribed regions, as well as experiments with untagged control strains, should be added. The ChIP occupancy was analyzed only in the transcriptionally repressed state and essentially on a plasmid, and no results are provided for the transition to the active state.

      Many inappropriate citations of Figures or Figure panels made the manuscript difficult to read.

    3. Reviewer #1 (Public Review):

      In this study, Leydon et al. use an elegant multi-component genetic system to address the mechanisms of repression by the Arabidopsis TOPLESS (Tpl) protein. Taking advantage of the genetic tools and knowledge of the structure of the Tpl protein, the authors determine two short alpha helical regions that act as independent repression domains. They provide evidence that the target of one of these domains is the N-terminal region of the Med21 subunit of the mediator complex. Chromatin immunoprecipitation experiments, anchor-away loss of function and co-immunoprecipitation assays indicate that Tpl-mediated repression involves formation of a promoter complex comprising the mediator complex along with several general transcription factors, but lacking RNA polymerase II. The authors also show that Tpl-Med21 interactions are involved in Tpl-mediated repression in plants.

    4. Evaluation Summary:

      In this study, Leydon et al. use an elegant multi-component genetic system to address the mechanisms of repression by the Arabidopsis TOPLESS (Tpl) protein. Taking advantage of the genetic tools and knowledge of the structure of the Tpl protein, the authors determine two short alpha helical regions that act as independent repression domains, with the target of one of these domains being the N-terminal region of the Med21 subunit of the mediator complex. Experiments are presented that indicate that Tpl-mediated repression involves formation of a promoter complex comprising the mediator complex along with several general transcription factors, but lacking RNA polymerase II. The experimental data come from both heterologous experimental systems in yeast and the native plant setting and involve diverse but complementary experimental approaches that converge towards a model for gene repression. This paper will be of interest to researchers investigating the mechanisms regulating gene expression, in particular how specific protein-protein interactions repress gene expression.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      In this manuscript, the authors aim to elucidate the evolutionary history of the paired NLRs Pik-1/Pik-2 in rice. They ask two primary questions:

      (1) When (in evolutionary history) did the paired Pik-1/Pik-2 locus arise and when was the integrated domain integrated into the locus?

      (2) Has the binding affinity of the integrated domain changed over evolutionary time?

      The authors convincingly demonstrate that the integrated domain is undergoing positive selection, that its integration is ancient (~15MYA) and that inferred ancient alleles bind modern AVR-PikD with poor affinity. The subsequent biochemistry experiments and structural analyses identify which residues are important for interactions with AVR-PikD and which allelic combinations induce autoimmunity.

      The biochemical work, while interesting in and of itself for identifying the interacting residues and interactions between domains, was less informative about the evolution of the NLR-effector interaction, and most of the work did not advance our understanding of the questions listed above. The most emphasized biochemistry finding was that of reduced binding affinity of the ancestral Pik-1 integrated domain. Specifically, the authors demonstrate that modern AVR-PikD has poor affinity with the ancient Pik-1 integrated domain. From this result the authors infer that ancestral Pik-1 likely bound a different effector. But it was not clear how the authors ruled out binding to an ancient AVR-PikD, and I was confused as to why they excluded this possibility. Perhaps the authors contend that the absence of Avr-PikD in other modern blast lineages indicates Avr-PikD is unique to modern rice-infecting M. oryzae. But this modern absence does not preclude Avr-PikD in the ancestral population. Furthermore, changes in binding over time would be the effective null hypothesis in the scenario of coevolving NLR and effector. Their finding seems consistent with expectations of coevolution, a phenomenon that has been widely reported in interactions between NLRs and effectors. The novelty in this manuscript stems from the synthesis of molecular evolution analysis with ancestral state reconstruction and testing.

      Overall this manuscript is exemplary in its integration of biochemical and evolutionary analyses to study plant-pathogen coevolution. While the findings are unsurprising, future emulation of this type of data integration will likely lead to significant insight into the coevolution of plants and their pathogens.

    2. Reviewer #2 (Public Review):

      In this study, Bialas et al. aimed at understanding the evolution of the diversity of Pik-1 immune receptors. First, using phylogenetic and selection analyses they determined that the Pik family of immune receptors is present in multiple grass species, with both Pik-1 and Pik-2 evolving before the radiation of the PACMAD and BOP clades. The authors dated the insertion of an HMA domain in a Pik-1 subclade before the radiation of the Oryzinae and detected signs of positive selection on this domain. Using a combination of ancestral sequence reconstructions and biochemistry they determined that two of the extant Pik-1 haplotypes (Pikp-1 and Pikm-1) independently evolved the ability to associate at high affinity with the AVR-PikD effector following two different evolutionary paths. The authors determined that the increased binding correlates at least in one case with the improved ability to induce cell death when co-expressed in tobacco leaves with Pik-2 and AVR-PikD.

      Main strengths:

      The study combines a large diversity of methods to comprehensively address an important question. Despite the large amount of presented data (including a large number of variant names) it was a pleasure to read this very well structured manuscript. The work conducted here by the authors on the ancestral sequence reconstruction, the chimera and the biochemical assays (on two haplotypes!) is impressive and supports a very exciting conclusion. The presentation of all the experimental replicates as supplementary figures is a model of transparency and strengthens the conclusions.

      Weaknesses:

      The conclusions reached by the authors are mostly supported by the presented data, although there are a few points that need to be clarified. The Pik-1 phylogeny (Fig 1A): From the phylogenetic tree presented in Figure 1A it seems that Pik-1 experienced a duplication before the radiation of the BOP and PACMAD clades, with varying patterns of gene retention/loss (for instance loss of both copies in Brachypodium, loss in one clade for maize) and expansion (massive in wheat for instance in the clade where the fusion with the HMA domain did not occur, not in the other). I did not find this point discussed in the manuscript, although this could have an important impact. This would support the hypothesis that the HMA integration occurred before the radiation of the PACMAD clade. A better resolved phylogeny is needed to further test this possibility. In that context, the nomenclature should restrict the Pik-1 name to the actual orthologs, changing the number of Pik-1 per species (in panel 1D for instance).

      In Figure 4C and S13 the Pikp-1 variant I-N11 seems to associate more significantly with AVR-PikD than all the other variants, including I-N2 that was selected for the swap experiments. The reason why I-N2 was selected over other options (including I-N11) should be better explained.

      The correlation between evolution of high-affinity binding to AVR-PikD and the ability to induce immune response should be tested in reconstructed ancestral Pikm-1 variants. The presented data demonstrate nicely the gain of high-affinity binding in Pikm-1, but the impact this may have on the actual immunity function was not tested. It would be important to know whether or not additional mutations were required to turn the ancestral Pik-1 into a functional Pikm-1, given that it is the basis for the model proposed in Figure 9. Alternatively, as the result of this experiment would not contradict the model even in the absence of immune abilities (it would just add one extra step from high-affinity binding to immune function), the authors could propose this second evolutionary scenario as a supplementary figure.

      The nomenclature used for the Pik variants is not consistent throughout the manuscript, please homogenize as it is not always easy to follow.

      I am not familiar with the besthr R library used for the statistical analyses of the cell death assays, and I am not an expert in biochemistry (SPR, crystal structure) and cannot properly evaluate these aspects of the work.

    3. Reviewer #1 (Public Review):

      This paper was a pleasure to read. It is a tour-de-force study that is well-written, clear, and transparent. The study recounts how the HMA domain became integrated into the Pik NLRs and how it evolved higher affinity binding to a pathogen effector. Strikingly, the authors demonstrate adaptability of distinct regions of the HMA:effector interface on two Pik NLRs, driving the convergent evolution of high-affinity binding to the effector. The study furthermore provides a framework for understanding protein evolution in the context of host-microbe interactions. The breadth and depth of the experiments that support the authors' conclusions is extraordinary in my view.

    4. Evaluation Summary:

      Convergent evolution is often observed in nature, but the molecular mechanisms allowing similar functions to independently emerge are rarely understood. This work determines how the high-affinity recognition of a pathogenic effector produced by the rice blast fungus, Avr-PikD, evolved in the immune receptor Pik-1. The integration of molecular evolution analyses with structure-function biochemical testing is novel to the field and the data quality is exceptional. In addition to advancing knowledge of host-microbe co-evolution, this work is exemplary in its transparency and the breadth of approaches utilized to understand protein evolution, and we expect that this study will provide a conceptual framework for similar studies in the future.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Reviewer #3 (Public Review):

      In this study, the authors present a high-resolution single-cell transcriptomic atlas of the pancreatic ductal tree. Using a DBA+ lectin sorting strategy, murine pancreatic duct, intrapancreatic bile duct, and pancreatobiliary cells were isolated and subjected to scRNA-seq. Computational analysis of the datasets unveiled important heterogeneity within the pancreatic ductal tree and identified unique cellular states. Furthermore, the authors compared these clusters to previously reported mouse and human pancreatic duct populations and focused on the functional properties of selected duct genes, including Spp1, Anxa3 and Geminin. Overall, the results presented here suggest distinct functional roles for subpopulations of duct cells in maintenance of duct cell identity and implication in chronic pancreatic inflammation. Finally, such detailed analysis of the pancreatic duct tree is relevant also in the context of cancer biology and might help elucidate the transition from pancreatitis to pancreatic cancer and/or different predisposition to cancer.

      The study is very well done, with careful controls and well-designed experiments.

    2. Reviewer #2 (Public Review):

      In this study the authors address the heterogeneity of the mouse ductal cell at the single cell level and conduct functional studies for selected marker genes. They isolated duct cells using the DBA lectin as a molecular surface marker. This is a noteworthy approach as it does not rely on the specificity and expression levels of reporter lines. Isolated cells contained a majority of non-duct cells that were identified by their transcriptomic profile and excluded from further analysis. The transcriptomic profiles of bona fide duct cells were then subjected to standard analyses for differentially expressed genes, activated pathways and lineage relationships. Of particular interest is the comparison of these data with human data from a recently published study that used a different sorting strategy for duct cells. As more studies at the single cell level are conducted, these types of comparisons need to become part of them in order to derive commonalities and identify deficits due to methodological or technological limitations. The study was by necessity descriptive up to this point and the authors addressed this with functional studies on SPP1 and GMNN which suggested that SPP1 is necessary for the maintenance of the ductal differentiated phenotype whereas GMNN protects cells against DNA damage during increased proliferation triggered by chronic pancreatitis.

      It is an interesting study, but there are caveats, particularly concerning the functional studies. The functional analysis of SPP1 needs to be strengthened and some findings on the analysis of GMNN clarified. There is also an overreliance on the outcome of pathway analyses and upstream regulators, which are often treated as actual findings rather than possibilities to be explored in this or future studies. The single cell RNA-Seq analysis would benefit from reducing speculation and restricting descriptions to the essential features of each cluster. Main figures for this analysis could also be simplified along the same lines.

    3. Reviewer #1 (Public Review):

      The study by Hendley et al. takes advantage of duct-specific DBA-lectin expression to purify pancreatic ductal populations that were then subjected to scRNA-seq analysis. The ability to enrich for this relatively low-abundance pancreatic cell population resulted in a more robust dataset than had been generated previously from whole pancreas analyses. The manuscript catalogs several different gene clusters that delineate heterogeneous subpopulations of three different pancreatic ductal populations in mice: mouse pancreatic ductal cells, pancreatobiliary cells, and intrapancreatic bile duct cells. Additional comparisons of the resulting data sets with published embryonic and adult datasets are a strength of the study; they allow the authors to subclassify the different ductal cell populations and facilitate the identification of potentially novel subpopulations. Pseudotime analysis also identified gene programs that led the authors to speculate on the existence of an EMT axis in pancreatic ducts. Overall, the data analysis is strong, but the authors tend to draw conclusions that are not fully supported by the presented data.

      The second half of this study focuses on three candidate proteins that were identified in the transcriptome analysis - Anxa3, SPP1 and Geminin. CRISPR-Cas9 was used to delete each gene in an immortalized human duct cell line (HPDE). Deletion of each gene resulted in increased proliferation; SPP1 mutant cells also displayed abnormal morphology. Additional functional studies of the cell lines or in mouse models suggested a role for SPP1 in maintaining the ductal phenotype and Geminin in protecting ductal cells from DNA damage, respectively. Although the provided phenotypic analyses suggest important functional roles for these proteins, follow-up studies will be required to fully understand the role of these genes in homeostatic or cancer conditions.

      Strengths:

      1) Enrichment of pancreatic ductal populations enhanced the robustness of the scRNA-Seq dataset

      2) Quality of the sequencing data and extensive computational analysis is extremely good and more comprehensive than previously published datasets

      3) Comparative analysis with existing mouse and human data sets

      4) Use of human ductal cell lines and mouse models to begin to explore the function of candidate ductal genes.

      Weaknesses:

      1) There are many suppositions based on gene expression changes that are somewhat overstated.

      2) The conclusion that there is an EMT axis in pancreatic ducts is not fully supported by the gene expression and immunofluorescence data

      3) A good rationale for choosing Anxa3, SPP1 and Geminin for additional functional analysis is not provided. In addition, it isn't clear why Anxa3 function isn't pursued further.

      4) Although extensive models (transplanted cells for SPP1 and mouse conditional KOs for Geminin) were generated, the functional analysis for each gene is preliminary; additional longer term studies will be necessary to fully understand the role of these proteins in pancreatic duct development and cancer.

    4. Evaluation Summary:

      In this study, the authors present a high-resolution single-cell transcriptomic atlas of the pancreatic ductal tree. Their analysis unveiled important heterogeneity within the pancreatic ductal tree and identified unique cellular states. Overall, the results presented here suggest distinct functional roles for subpopulations of duct cells in maintenance of duct cell identity and implication in chronic pancreatic inflammation. Finally, such detailed analysis of the pancreatic duct tree is relevant also in the context of cancer biology and might help elucidate the transition from pancreatitis to pancreatic cancer and/or different predisposition to cancer.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      The new models proposed here provide some potentially useful alternatives to estimating the generation time, serial interval, and the relative infectiousness of pre-symptomatic infections. The framing of the paper seems very focused on improving fits to the transmission pair data, however, and I think it would be more impactful to consider the implications of poor estimation of pre-symptomatic transmission and the generation time. I think this shift in focus could also help strengthen the narrative of the paper, which wavers between focusing on model fitting and the importance of implications for contact tracing.

      I was a bit lost in the application of the models to the contact tracing example. The definition of the contact elicitation window (lines 142-144), where identification of contacts would occur up to x days prior to contact symptom onset, makes sense theoretically in this model comparison setting, but it is hard to translate these findings to real-world application. Are there any implications that could be useful for informing contact elicitation strategy (e.g., for how many days after time of infection or symptom onset could contact tracing have a measurable benefit in preventing onward transmissions?)

      Lines 147-151: Given that the impact on onward transmission events is so dependent on the contact tracing assumptions, I would recommend stating the assumptions explicitly here, reporting the results in relative terms as compared to a single model, or both.

      How different are the variable infectiousness model results from parameter estimates from the original studies that reported the transmission pairs data?

      Can the authors comment on the plausibility of the infectiousness distribution in their new proposed models? While better model fitting certainly provides a measurable improvement to leveraging existing data, I'm not aware of studies that support the discontinuous assumptions about infectiousness made here.

      Assuming alpha means the same thing across the models, why is the 95% credible interval so large for the Ferretti model? In general, the model parameters should be more clearly explained for this model.

    2. Reviewer #2 (Public Review):

      In this analysis, the authors consider the impact of the duration of infectiousness of a person infected with COVID-19 prior to the appearance of clinical signs. This is an important problem, as identification of disease status often relies on self-reporting, i.e. on people experiencing clinical signs and, in the case of COVID-19 in the UK, then going on to test positive (typically with a PCR test). The greater the proportion of transmission that occurs before clinical signs appear, the less likely it is that methods based on self-reporting will be sufficient to contain epidemic spread.

      The general problem is well known, with examples of previous analyses including for livestock diseases such as foot-and-mouth disease (see for example, Haydon et al. 1997 https://doi.org/10.1093/imammb/14.1.1 and the very many papers on the 2001 FMD epidemic), and most importantly the seminal paper by Fraser et al. on the SARS-CoV-1 pandemic which laid out the problem in extensive detail https://doi.org/10.1073/pnas.0307506101. In the analyses of the current SARS-CoV-2 dynamics, the authors refer to the paper by Ferretti et al. (https://doi.org/10.1126/science.abb6936) which at this point represents the most prominent analysis of this type that is directly relevant to the current pandemic. More broadly, issues with exponential distributions and the impact that their use has on analyses of infection dynamics and epidemic behaviour have been well studied in other systems such as measles (e.g. Lloyd 2001 https://doi.org/10.1006/tpbi.2001.1525, and Conlan et al. 2009 https://doi.org/10.1098/rsif.2009.0284). It would be helpful for the paper to refer to this broader literature in order to contextualise the analysis, though this does not of course detract from the relevance to the current COVID-19 pandemic.

      In this analysis the authors show that, by choosing a pre-infectious period that explicitly excludes any probability of infection, they achieve a better fit to the distribution of serial intervals for a large number of known transmission pairs (previously analysed in the Ferretti paper). This is an entirely sensible result and a good use of a better mechanistically informed idea of the infection process (in essence, here incorporating explicitly the inevitable delay between virus entering the body, and a person becoming infectious).

      By examining the proportion of infections that would be captured by contact tracing when considering a two-day window prior to symptom onset, they show a substantially greater efficacy for contact tracing, compared to a more standard compartmental modelling approach (where the duration of each consecutive period is independently determined).
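
      To make the two-day window concrete, the following is a minimal sketch of how the fraction of transmissions falling inside a contact elicitation window could be computed by simulation. It assumes the "standard" setting in which the infector's incubation period and the time of transmission are drawn independently. The function name `fraction_captured`, the lognormal and gamma distributions, and all parameter values are illustrative assumptions, not the models or estimates from the manuscript.

```python
import random

def fraction_captured(window_days, n_pairs=20000, seed=1):
    """Simulate infector-infectee pairs under ASSUMED, illustrative distributions.

    Returns (captured, presymptomatic):
      captured       - share of transmissions occurring no earlier than
                       `window_days` before the infector's symptom onset
      presymptomatic - share of transmissions occurring before symptom onset
    """
    rng = random.Random(seed)
    captured = 0
    presymptomatic = 0
    for _ in range(n_pairs):
        # Infector's incubation period in days (hypothetical lognormal parameters).
        onset = rng.lognormvariate(1.6, 0.4)
        # Time from the infector's infection to transmission in days
        # (hypothetical gamma with shape 4 and scale 1.25, drawn independently).
        t_transmit = rng.gammavariate(4.0, 1.25)
        if t_transmit < onset:
            presymptomatic += 1
        if t_transmit >= onset - window_days:
            captured += 1
    return captured / n_pairs, presymptomatic / n_pairs

for window in (0, 2, 4):
    cap, pre = fraction_captured(window)
    print(f"window={window} d  captured={cap:.3f}  presymptomatic={pre:.3f}")
```

      Because every call reuses the same seed, widening the window can only add captured transmissions, so the printed fractions are directly comparable across window lengths. Swapping in a different infectiousness profile only requires changing how `t_transmit` is drawn.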

      While the analysis itself is detailed and thoroughly explained, I have some questions regarding the utility of the result when making the comparison to other models. As noted earlier, the fundamental problem is already well known, and the new model, while a useful application to COVID-19 and better than the poorer models, performs only marginally better than the Ferretti model. The serial interval estimates are only slightly better (figure 2), 84% of contacts are captured when considering tracing two days prior to symptoms, compared to what looks like about 80% for the alternative in figure 4, and, by the looks of the violin plots in figure 3, there is quite a bit of overlap if one considers credible intervals.

      As such, while the analysis is a solid, useful addition to the literature, it could use a better exposition on how it advances scientific insight (the fundamental issues regarding exponential distributions having been identified previously), methodologically (given the thorough analysis by Fraser et al in 2004) or in terms of impact (given the limited improvement over the Ferretti model).

    3. Reviewer #1 (Public Review):

      The authors develop a mechanistic model for inferring the infectiousness profile from data on times of symptom onset in infector-infectee pairs. The novelty of their approach lies in assuming that the infectiousness of an infected individual also depends on whether or not they have symptoms. The authors fit a data set of times of symptom onset in 191 transmission pairs to a model that assumes that infectiousness varies along the incubation period. They compare the model fit to fits from other models and find that their model of differential infectiousness explains the data better than the other models considered.

      This is a carefully constructed study, and the conclusions are well supported by the analysis carried out. My only concern is that the data used were obtained during the early stage of the pandemic (January to February 2020). As the pandemic was growing in most countries during this time, we are more likely to have observed shorter serial intervals. Similarly, as isolation of infected individuals would prevent them from transmitting further, longer serial intervals are likely to be under-represented in the data. Indeed, the longest serial interval in the data used was 5 days. It would be interesting to understand whether the conclusions about the proportion of onward transmissions averted by contact tracing and subsequent isolation still hold as the pandemic progresses, and we continue to observe longer serial intervals. If the authors are unable to find more recent data, this caveat should be clearly discussed.

    4. Evaluation Summary:

      The manuscript uses a new approach to model the infectiousness profile of COVID-19 infected individuals. The work suggests a higher proportion of pre-symptomatic infectiousness in COVID-19 than the current evidence. The findings are of great interest to public health policy makers. The methodology is of general interest to modellers working on COVID-19.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

  4. Mar 2021
    1. Reviewer #2 (Public Review):

      Summary:

      Frey et al. develop an automated decoding method, based on convolutional neural networks, for wideband neural activity recordings. This allows the entire neural signal (across all frequency bands) to be used as decoding inputs, as opposed to spike sorting or using specific LFP frequency bands. They show improved decoding accuracy relative to a standard Bayesian decoder, and then demonstrate how their method can find the frequency bands that are important for decoding a given variable. This can help researchers to determine what aspects of the neural signal relate to given variables.

      Impact:

      I think this is a tool that has the potential to be widely useful for neuroscientists as part of their data analysis pipelines. The authors have publicly available code on github and Colab notebooks that make it easy to get started using their method.

      Relation to other methods:

      This paper takes the following 3 methods used in machine learning and signal processing, and combines them in a very useful way. 1) Frequency-based representations based on spectrograms or wavelet decompositions (e.g. Golshan et al, Journal of Neuroscience Methods, 2020; Vilamala et al, 2017 IEEE international workshop on machine learning for signal processing). This is used for preprocessing the neural data; 2) Convolutional neural networks (many examples in Livezey and Glaser, Briefings in Bioinformatics, 2020). This is used to predict the decoding output; 3) Permutation feature importance, aka a shuffle analysis (https://scikit-learn.org/stable/modules/permutation_importance.html; https://compstat-lmu.github.io/iml_methods_limitations/pfi.html). This is used to determine which input features are important. I think the authors could slightly improve their discussion/referencing of the connection to the related literature.

      Overall, I think this paper is a very useful contribution, but I do have a few concerns, as described below.

      Concerns:

      1) The interpretability of the method is not validated in simulations. To trust that this method uncovers the true frequency bands that matter for decoding a variable, I feel it's important to show the method discovers the truth when it is actually known (unlike in neural data). As a simple suggestion, you could take an actual wavelet decomposition, and create a simple linear mapping from a couple of the frequency bands to an imaginary variable; then, see whether your method determines these frequencies are the important ones. Even if the model does not recover the ground truth frequency bands perfectly (e.g. if it says correlated frequency bands matter, which is often a limitation of permutation feature importance), this would be very valuable for readers to be aware of.

      2) It's unclear how much data is needed to accurately recover the frequency bands that matter for decoding, which may be an important consideration for someone wanting to use your method. This could be tested in simulations as described above, and by subsampling from your CA1 recordings to see how the relative influence plots change.

      3)

      a) It is not clear why your method leads to an increase in decoding accuracy (Fig. 1). Is this simply because of the preprocessing you are using (using the wavelet coefficients as inputs), or because of your convolutional neural network? Having a control where you provide the wavelet coefficients as inputs into a feedforward neural network would be useful, and a more meaningful comparison than the SVM. Side note - please provide more information on the SVM you are using for comparison (what is the kernel function, are you using regularization?).

      b) Relatedly, because the reason for the increase in decoding accuracy is not clear, I don't think you can make the claim that "The high accuracy and efficiency of the model suggest that our model utilizes additional information contained in the LFP as well as from sub-threshold spikes and those that were not successfully clustered." (line 122). Based on the shown evidence, it seems to me that all of the benefits vs. the Bayesian decoder could just be due to the nonlinearities of the convolutional neural network.
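
      The ground-truth check suggested in concern 1 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: independent Gaussian features stand in for per-band wavelet power, a plain least-squares model stands in for the convolutional network, and the helper names (`fit_ols`, `mse`, `permutation_importance`) and the choice of bands 1 and 4 as the "true" bands are hypothetical.

```python
import random

def fit_ols(X, y):
    """Least-squares weights from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def mse(X, y, w):
    """Mean squared error of the linear prediction X @ w against y."""
    return sum((sum(wi * xi for wi, xi in zip(w, r)) - t) ** 2
               for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, w, rng):
    """Increase in held-out error when one feature column is shuffled."""
    base = mse(X, y, w)
    scores = []
    for j in range(len(X[0])):
        col = [r[j] for r in X]
        rng.shuffle(col)
        X_perm = [r[:j] + [c] + r[j + 1:] for r, c in zip(X, col)]
        scores.append(mse(X_perm, y, w) - base)
    return scores

rng = random.Random(0)
n_samples, n_bands = 500, 6
# Stand-in for band power: independent unit-variance noise per band (assumption).
X = [[rng.gauss(0, 1) for _ in range(n_bands)] for _ in range(n_samples)]
true_w = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0]  # only bands 1 and 4 drive the variable
y = [sum(w * x for w, x in zip(true_w, r)) + rng.gauss(0, 0.1) for r in X]

# Fit on the first 400 samples, score importance on the held-out 100.
w_hat = fit_ols(X[:400], y[:400])
importance = permutation_importance(X[400:], y[400:], w_hat, rng)
ranked = sorted(range(n_bands), key=lambda j: importance[j], reverse=True)
top_two = sorted(ranked[:2])
print(top_two, [round(s, 3) for s in importance])
```

      With independent features the two planted bands should dominate the ranking. Making the stand-in bands correlated, or shrinking `n_samples`, is a simple way to probe the limitations raised in concerns 1 and 2.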

    2. Reviewer #1 (Public Review):

      In the current manuscript, Frey et al. describe a convolutional neural network capable of extracting behavioral correlates from wide-band LFP recordings or even lower-frequency imaging data. Other publications (referenced by the authors) have employed similar ideas previously, but to my knowledge, the current implementation is novel. In my opinion, the real value of this method, as the authors state in their final paragraph, is that it represents a rapid, "first-pass" analysis of large-scale electrophysiological recordings to quickly identify relevant neural features which can then become the focus of more in-depth analyses. As such, I think the analysis program described by the authors is of real value to the community, particularly as it becomes more commonplace for labs to acquire multi-site in vivo recordings. However, to maximize its utility to the community, several aspects of the analysis need clarification.

    3. Evaluation Summary:

      Frey et al. describe a convolutional neural network capable of extracting behavioral correlates from wide-band LFP recordings or even lower-frequency imaging data. The analysis program described by the authors provides a rapid "first pass" analysis using raw, unprocessed data to generate hypotheses that can be tested later with conventional in-depth analyses. This approach is of real value to the community, particularly as it becomes more commonplace for labs to acquire multi-site in vivo recordings.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      In this manuscript, the authors use high-resolution live imaging to investigate how progenitor cells travel through an embryo to a distant site for differentiation and organ formation. The test case is the movement of dorsal forerunner cells (DFCs) in the zebrafish embryo, which give rise to a transient organ called Kupffer's vesicle that functions to establish the left-right body axis. DFCs are derived from enveloping layer (EVL) cells ~5 hours post-fertilization (hpf) and then move towards the vegetal pole of the embryo. They ultimately end up in the tailbud where they differentiate into epithelial cells to form Kupffer's vesicle between 10-11 hpf. Live imaging convincingly shows that EVL cells undergo apical constriction and delaminate from the EVL layer to form DFCs. Some DFCs remain connected to the EVL via ZO-1 enriched tight junction-like apical attachments. The authors propose that spreading of the EVL layer 'drags' the underlying DFCs towards the vegetal pole via these apical attachments. Supporting this model, EVL and DFCs co-migrate with the same speed and directionality, and perturbation of an actomyosin ring network in the yolk syncytial layer (YSL) disrupts movement of both EVL and DFCs. Between 8-9 hpf DFCs detach and are uncoupled from the EVL. The authors show that E-cadherin is necessary for DFC-DFC adhesion, and additional imaging experiments show that DFCs can extend long protrusions that 'capture' detached DFCs to facilitate clustering. Taken together, these data suggest an interesting drag mechanism for guiding progenitor cell movements, however the results presented do not fully demonstrate this mechanism, and alternative mechanisms were not thoroughly tested.

    2. Reviewer #2 (Public Review):

      This work analyses the movement of the dorsal forerunner cells (DFCs) and their interaction with the extra-embryonic enveloping layer (EVL). Using high-resolution time-lapse microscopy, the authors characterize the movement of the DFCs, showing that they delaminate from the epithelium by apical constriction but remain attached to the superficial EVL. Using laser ablations, they show that the movement of the DFCs depends on their attachment to the EVL and on its vegetal displacement. However, they show that, with some frequency, some DFCs detach from the rest of the cluster, leading to some random movement or even to cells being left behind and differentiating into other cell types. Importantly, they investigate an additional mechanism to explain the movement of the detached DFCs. They show that single cells generate protrusions that connect them with the DFC cluster, forming an E-cadherin junction. This paper makes an important contribution by adding a new mode of migration during development. Most of the conclusions are supported by the experiments.

    3. Reviewer #1 (Public Review):

      Pulgar et al. describe an interesting mechanism explaining how a group of cells undergoing directed motion maintains its migratory path as a group. Here, incomplete delamination allows coordinated cell movements to be maintained amongst the DFCs. The story is self-contained, logical, well-written and just in general very nice. The mechanism described belongs to the so-called mechanical drag, which is a new type of multicellular locomotion and may be a general feature involved in many morphogenetic systems.

      The major strength of the study is the extensive use of live imaging and analysis of dynamic events. The study provides a nice cellular mechanism in the process they described. The molecular mechanism would be the only weakness of the study.

      An overall very exciting study.

    4. Evaluation Summary:

      In this study, Pulgar et al. describe an interesting phenomenon addressing organ integrity in a unique example of collective cell migration. The group focused on the migration of the dorsal forerunner cells (DFCs), which will constitute the left-right organizer of the zebrafish. The authors show that DFCs retain apical contacts stemming from incomplete delamination and drag detached DFCs to their final destination. The study opens a number of exciting new questions related to the mechanism underlying the 'safeguards' process and the mechanism of coordination between migration and regulation of attachment.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      The authors study the leaf transcriptomes of males and females in 10 species of Leucadendron and infer genes expressed significantly differently between males and females (sex-biased genes, hereafter SBGs). Most SBGs in Leucadendron leaves evolved recently, suggesting that SBG turnover (evolution and reversion) is very high, because the genus has been ancestrally dioecious for >10 My. Using species in which the genes orthologous to SBGs are not sex-biased, the authors show that SBGs have high rates of expression evolution already before becoming SBGs. This suggests that most SBGs evolved under drift and that the majority of SBGE (sex-biased gene expression) is evolving neutrally. This is confirmed by the estimated small proportion of SBGs evolving under adaptation (about 20% of SBGs have 5-fold higher expression divergence compared to polymorphism, a mark of relatively recent adaptation). Also, SBGs are more tissue specific (less pleiotropic). Finally, the percentage of SBGs is not correlated with the intensity of morphological dimorphism. All these findings go against the classical view that SBGE is driven by sex-specific selection for sexual dimorphism.

      The analyses are very cautious with well designed controls and randomizations.

      The results support well the conclusions.

      This study puts forward the role of drift in sex-biased gene expression, offering a new interpretation of this common evolutionary phenomenon.

    2. Reviewer #2 (Public Review):

      Scharmann et al. present a study of sex-biased gene expression as a function of sexual dimorphism in leaf tissue in the genus Leucadendron. Comparative studies of sex-biased expression across clades are still relatively rare, and this analysis tests some core findings of a recent paper (Harrison et al. 2015). Overall, I like the analysis and think it could be a valuable addition to the literature on sex-biased genes. This is particularly true given the difficulty of cross-species expression comparisons and the paucity of them in plants.

      However, there are some critical differences between the Harrison paper and the one here, and I think it would be helpful if the authors presented them early in the text. Specifically, Harrison et al. (2015) was primarily focused on gonad tissue, which in animals is the site of the vast majority of sex-biased genes. In contrast, the authors here focus on vegetative (leaf) tissue, which is analogous to animal somatic tissue. None of the patterns that Harrison et al. (2015) observed and reported from the gonad were evident in the somatic tissue they assessed. Also, by looking at gonadal tissue, Harrison et al. (2015) focused on the tissue that produces gametes, which are thought to be subject to some of the strongest sexual selection pressures. The fairest comparison would be flower tissue in plants, so I am unsure how much of the Harrison results would be expected to hold up in leaf samples. This doesn't mean the authors shouldn't do the analyses they present, just that they should be a little more upfront about what they might reasonably expect to find.

      There is also a conflation at times in the paper between sexual dimorphism, which the authors can quantify in their leaf samples, and sexual selection. I explain this in more detail below, but to summarize here, I think the expectations for the relationship between sex-biased gene expression and sexual selection versus sexual dimorphism are somewhat distinct.

      Finally, I am a little concerned that the low numbers of sex-biased genes, expected from leaf tissue, offer limited power for some of the tests the authors want to do. Harrison et al. (2015) had hundreds of sex-biased genes from the gonad, and this power made it possible to detect subtle patterns. The authors have a few dozen sex-biased genes, and this makes it difficult to know whether their negative results are the result of low statistical power. That they find clear associations between pre-sex-biased genes and rates of evolution is quite impressive given this low power.

    3. Reviewer #1 (Public Review):

      *A summary of what the authors were trying to achieve.

      The study takes advantage of the interesting plant genus Leucadendron to compare gene expression between males and females in species with more or less sexual dimorphism. This question was addressed in a somewhat comparable manner in only one previous paper, by Harrison et al. (2015), across six bird species. The overarching question is the role of natural selection in sexual dimorphism.

      *An account of the major strengths and weaknesses of the methods and results.

      -Besides the genus-wide comparison of whole transcriptomes across related species, which in itself makes a strong dataset, the major strength of the analysis is the phylogenetic framework that allows the authors to track the evolution of sex bias through tens of millions of years of evolutionary history. Despite ancestral dioecy in the genus, very few genes show consistent sex bias across several species, with sex bias being mostly species-specific. Two striking negative results will be of special interest to the community: 1) species with more pronounced sexual dimorphism at the morphological level do not tend to exhibit more pronounced sex-biased gene expression; 2) the few genes that do show sex-biased expression were apparently recruited among those with the highest expression variance to begin with, strongly suggesting that sexual selection has not been the main force driving their expression divergence.

      -In my view, the main limitation of the work is the use of leaf rather than reproductive tissues, making the comparison to other studies less straightforward to interpret. It is especially important that the expectations for somatic vs gonadal tissues be made a lot clearer in the text. Also, the fact that a single leaf phenotype is measured (specific leaf area) seems arbitrary: one could imagine sexual dimorphism in many other characteristics, yet they are not considered here. The text on p.324 mentions "striking convergence in aspects of morphological dimorphism across the genus", but there is no way for the reader to appreciate the extent of this convergence. Finally, it would be useful to at least make some mention of the sex-determination system in these species, since the expectations would differ if some of the sex-biased genes were linked to sex chromosomes.

      *An appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The analysis is mostly sound, but I am a bit concerned by the arbitrary threshold used to define SBGE. The text on p.305 says that "This result is extremely robust to the choice of threshold", but 1) the results are not reported so it is impossible for the readers to evaluate the basis of this assertion and 2) it is not clear whether robustness of the other results has been evaluated at all. This aspect clearly deserves more attention.
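
      The robustness check requested here can be sketched as a simple threshold sweep. This is a minimal illustration only: it assumes sex bias is called from an absolute log2 fold change between sexes, which may not match the authors' definition of SBGE, and the function names and values are hypothetical, not taken from the manuscript.

```python
def count_sex_biased(log2_fold_changes, threshold):
    """Count genes whose absolute log2 fold change meets the threshold."""
    return sum(1 for lfc in log2_fold_changes if abs(lfc) >= threshold)


def threshold_sweep(log2_fold_changes, thresholds):
    """Map each candidate threshold to the number of genes called sex-biased."""
    return {t: count_sex_biased(log2_fold_changes, t) for t in thresholds}


# Invented values for illustration only (not data from the manuscript).
toy_lfc = [0.1, -0.4, 1.2, -2.5, 0.9, 3.0]
sweep = threshold_sweep(toy_lfc, [0.5, 1.0, 2.0])
for threshold, n_genes in sweep.items():
    print(f"threshold {threshold}: {n_genes} genes called sex-biased")
```

      Reporting such counts, and rerunning the downstream tests at each threshold, would let readers judge for themselves how robust each result is.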

      *A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      This work will be of interest to the community, as rapid rates of expression evolution would generally be interpreted as the consequence of sex bias, whereas the phylogenetic analysis presented here supports the idea that the expression of genes that end up being sex-biased was intrinsically less constrained to begin with.

    4. Evaluation Summary:

      This is one of the first studies to investigate sex-biased gene expression in a broad phylogenetic context, and the first in a plant genus. The findings go against the classical view that sex-biased gene expression is driven by sex-specific selection for sexual dimorphism, and instead suggest that sex bias preferentially evolved in genes that already had the highest expression variance to begin with. It will broadly appeal to researchers interested in the evolution of sexual dimorphism.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #4 (Public Review):

      The goal of the manuscript was to add to the research on the rates of success of African American/Black PIs in their pursuit of NIH funding. The authors specifically addressed variability in funding levels of NIH Institutes and Centers (ICs). The authors were successful in identifying that award rates differ by IC. The authors report that topic choice was not associated with funding after accounting for IC assignment, given that ICs vary in their funding rates.

    2. Reviewer #3 (Public Review):

      This analysis focuses on funding success for a set of NIH R01-mechanism grant applications submitted between 2011 and 2015, with a focus only on those which had white and Black Principal Investigators (PIs). It is presented as a follow-up to the previously published paper from Hoppe and colleagues in 2019, uses the same population of applications and relies on the same analysis of application text to cluster these applications by topic. The authors set out to determine how success rates associated with the application's proposed topic may be determined by the success rates associated with the Institute or Center within the NIH to which the application had been assigned for potential funding. This is a critical and important investigation that is of high potential impact. The scholarship of the Introduction and Discussion, however, fails to convey this to the reader. There are many recent publications in the academic literature that address why a disparity of funding to AA/B investigators, and a disparity of funding of topics that are of interest to AA/B investigators, are such critical matters for the NIH to identify and redress. Similarly, the Discussion and Conclusions sections do not suggest any specific actions that may be recommended by these findings, which is an unfortunate oversight that limits the likely impact of this work.

      The significance of this work is limited by a number of methodological choices that are unexplained or have not been justified and therefore appear to be somewhat arbitrary. While it can be necessary to draw category lines in an investigation of this type, it is necessary to provide some indication of what would happen to the support for the central conclusions if other choices had been made. This includes the exclusion of multi-PI applications if the Black PI was not the contact PI, the definition of AA/B-preferred ICs as the top quartile (particularly given the distribution of success rates within this quartile), the definition of AA/B-preferred topics as the 15 word clusters that accounted for only half of the AA/B applications, and the ensuing inclusion of only 27% of the AA/B applications. Arbitrary choices to use only a subset of the data raise questions about what the conclusions would be if the entire dataset of grants assigned across all of the ICs, and on all of the topics, was used.

      A fundamental limitation to this manuscript is that the authors are relying on an indirect logic of analysis instead of simply reporting the success rates for applications with AA/B and white PIs within each IC. The primary outcome deployed in support of the central conclusion is a reduction of the regression coefficient for the contribution of PI race to award success and an elimination of the statistically significant contribution of research topic preferred by AA/B applicants to the award success once IC success was partialed out. The former analysis is interpreted in imprecise terms instead of simply reporting what magnitude of effect on the white/Black success rate gap is being described. And the latter analysis appears to show a continued significant effect of PI race on award success even when the IC success rate is included. The much more intuitive question of whether award rates for white and AA/B applicants differ within each IC has not been addressed with direct data, but the probit model outcome suggests they are still significantly different. This gives the impression that the authors have conducted an unnecessarily complex analysis and thereby missed the forest for the trees, i.e., even when accounting for IC award rates there is still a significant influence of PI race.
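
      The direct comparison asked for above amounts to a tabulation of award rates by IC and PI race. A minimal sketch follows, assuming each application is available as an (IC, PI race, funded) record; the record layout, the function name, and the toy records are hypothetical and are not NIH data.

```python
from collections import defaultdict


def award_rates_by_ic(applications):
    """Return {ic: {race: award_rate}} from (ic, race, funded) records."""
    # For each IC and race, track [number funded, number submitted].
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for ic, race, funded in applications:
        cell = counts[ic][race]
        cell[0] += 1 if funded else 0
        cell[1] += 1
    return {
        ic: {race: funded / total for race, (funded, total) in by_race.items()}
        for ic, by_race in counts.items()
    }


# Toy records, invented purely for illustration (not NIH data).
toy = [
    ("IC_A", "white", True), ("IC_A", "white", False),
    ("IC_A", "AA/B", False), ("IC_A", "AA/B", False),
    ("IC_B", "white", True), ("IC_B", "AA/B", True),
]
rates = award_rates_by_ic(toy)
for ic, by_race in sorted(rates.items()):
    print(ic, by_race)
```

      A table of this kind, with counts alongside the rates, would show the within-IC gap directly rather than through a regression coefficient.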

      The manuscript is further limited by the omission of any discussion of how and why a given grant is assigned to a particular IC (this is exacerbated by incorrect phrasing suggesting the applicant "submits an application to" a specific IC) and any discussion of the amount of the NIH budget that is assigned to a given IC and how that impacts the success rate. This is, at the least, necessary explanatory context for the investigation.

    3. Reviewer #2 (Public Review):

      The paper by Lauer et al provides further insight into the factors that might determine why R01 applications from AAB (African American/Black) principal investigators appear to fare worse than those from their white counterparts. Their work is derived from an earlier analysis published by Hoppe et al that found 3 factors determined funding success among AAB PIs. These included decision to discuss at study section, impact score, and topic choice. The latter, topic choice (community and population studies), appeared to represent more than 20% of the variability in funding gaps. This raised the question of whether there was reviewer bias at study sections. In the Lauer paper, after controlling for several of these variables, the authors found that the topic choices of AAB PIs (i.e., preferred topics) were indeed important with respect to funding, but they uncovered the fact that these topic choices occurred more frequently in ICs that had lower funding rates. Thus the authors conclude that the disparity between AAB and white investigator R01s is very dependent on topic choice, since preferred topics ultimately end up in larger ICs with lower funding percentiles.

      Overall the paper is relatively straightforward and could be important as it provides some additional data to consider. It is in fact basically a re-analysis of the Hoppe paper, but that is reasonable since that paper left many unanswered questions. Its implications, however, are less clear, and these raise additional questions of importance to the extramural scientific community as well as IC leadership.

      Overall the reader is left with the unsettling question: Can we just wish away these disparities based on IC funding rates? (Figure 1).

      1) Why would topic choice of community engagement or population studies fare worse at an Institute such as AI rather than at GM if both have relatively the same proportion of preferred topics, and both have relatively high budgets compared to other institutes? Are there one or more ICs that drive the correlations between IC funding and preferred topics or PIs?

      2) Since only 2% of all PIs are AAB, does that represent another issue of low frequency relative to the larger cohort?

      3) It would be valuable to know if community engagement or population studies in total do worse than mechanistic studies. The authors do admit that preferred topics of AABs in general fare worse (Figure 2, Panel B).

      4) Another concern is that the data are up to 2015; it has now been five years and things have changed dramatically at NIH and in society. There are now many more multiple PI applications including AABs that may not be the contact PI yet are likely to be in a preferred topic area.

      5) There is nothing in the discussion about potential resolutions to this very timely issue. In other words, we now know that the disparity in funding is such that R01 applications from AAB PIs do worse than those from white PIs because they select topics that end up at institutes with lower funding rates. Should the institutes devote a set-aside for these topic choices to balance the portfolio of the IC and level the playing field for AABs? Are there other alternative approaches?

    4. Reviewer #1 (Public Review):

      This manuscript by Lauer et al follows up on previous articles that ask whether there are funding disparities at the National Institutes of Health for African American or Black (AAB) investigators. The investigators break down the analysis by race, topic of proposal, and NIH Institute or Center (IC) to which an application was assigned. They conclude that the most important factor in determining funding is the Institute assignment, with lower funding rates related to the funding capacity of a particular Institute (e.g., National Eye Institute vs Minority Health and Health Disparities). The present study is a welcome addition to this debate since, if biases do exist, NIH needs to address them. The strengths of this manuscript are the detailed breakdown of the available data in order to evaluate for biases, the availability of data for multiple years (2011-2015) and the consideration of alternate explanations (e.g., new applications vs resubmissions; single vs multi PI, etc). A weakness is that, although the authors conclude that Institute assignment was the main determinant of funding rates, the approach for Institute assignment is not discussed. Are there possible biases in this assignment besides keyword searches? There is also the question of whether there is circular logic operating here. The Minority Health and Health Disparities received the most AAB applications but had one of the lowest funding rates. Wouldn't this Institute be expected to be one to which AAB applicants would try to direct their applications? This manuscript is sure to generate additional discussion on this topic, which is an important step in trying to address the issue of potential funding disparities. However, as the authors point out, the fact that only 2% of the applications submitted to the NIH were from AAB investigators is of concern.

    5. Evaluation Summary:

      This paper provides the basis for further discussion about the perceived inequities in NIH funding based on race. The strengths of this manuscript are the detailed breakdown of the available data in order to evaluate for biases, the availability of data for multiple years (2011-2015) and the consideration of alternate explanations (e.g. new applications vs resubmissions; single vs multi PI). With that said, given their conclusion that Institute (IC) assignment was the main determinant of funding rates, the approach for IC assignment should have been discussed. Other issues relate to the complexity of the statistical analyses and a lack of clarity on confounding issues, which stand in the way of firm conclusions.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their name with the authors.)

    1. Reviewer #2 (Public Review):

      Halliday et al. sampled plant communities and foliar fungal diseases along an elevation gradient in the Swiss Alps to test the potential relationship between environment, plant communities and diseases in the context of climate change. The authors confirmed that elevation can affect diseases through both abiotic and biotic factors, and that host community pace-of-life was the main driver of disease along the elevation gradient. The topic is important and new, the study is well-designed, and the analysis is reasonable.

    2. Reviewer #1 (Public Review):

      Halliday et al. developed a framework to disentangle the total effect of environment on disease into a direct effect and indirect effects by environment-induced change of host community and by modifying the relationships between host community and disease.

      Applying this framework, the authors studied the direct and indirect effects of elevation on plant leaf disease in the Swiss Alps. They focused on host community structures as mediators of indirect effects. Host community structures were measured by host species richness, phylogenetic diversity, and community pace of life. One important finding is that the positive effect of host community pace-of-life on disease weakened as elevation increased, suggesting an important, but less appreciated, mechanism by which elevation can indirectly influence plant disease. However, since the major findings were based on analyses with elevation rather than specific environmental variables, the study's implications for the influence of global climate change on disease are not as strong as the authors state.

      The developed framework on environmental effects on disease, the well-designed field study and the large-scale dataset would all make this paper an important contribution to the field.

      Overall, the statistical analyses were reasonable. However, accurate interpretation of some results would require more clarification of the analyses.

    3. Evaluation Summary:

      This paper provides a framework for disentangling the direct vs. indirect effects of environment on disease, which should be of broad interest across domains of ecology, epidemiology and plant biology. The authors validate this framework with a well-designed field study of plant leaf disease across a large elevational gradient. Overall, the data analyses are appropriate, but a few aspects of the interpretation could be improved.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

    1. Reviewer #2 (Public Review):

      In this manuscript, Lamers et al wanted to characterise the previously reported adaptation of SARS-CoV-2 to non-human (Vero) cells. Vero cells are commonly used by laboratories to grow experimental stocks of some viruses as these cells permit high titres of many viruses, they lack the ability to produce type I interferons (cytokines which could interfere with downstream assays), and their non-human nature means soluble factors in virus stocks are less likely to impact experiments in human cells. However, a number of reports have recently been published describing that growth of SARS-CoV-2 in Vero cells leads to loss of the SARS-CoV-2 Spike protein multibasic cleavage site (MBCS). This apparent adaptation to the Vero cell-line leads to a virus compromised in its ability to enter, and therefore replicate in, human cells, meaning that experimental results obtained in human cells using the Vero-adapted SARS-CoV-2 may not fully reflect the situation occurring with authentic SARS-CoV-2. It is therefore important for the research community to understand SARS-CoV-2 adaptation to laboratory cell-lines/conditions and to have propagation methods that are suitable for maintaining the authenticity of clinical virus isolates.

      The major finding of Lamers et al in this manuscript is that human cell-lines (e.g. Calu-3) and primary human organoid systems can be used to propagate clinical isolates of SARS-CoV-2 to high titres without the acquisition of 'laboratory adaptations'. To get to this finding, the authors carefully study the adaptation of a representative SARS-CoV-2 isolate in Vero cells, monitoring plaque size phenotypes and performing whole-genome deep sequencing to identify adaptive variants that appear in the viral Spike gene. These variants (including newly-described substitutions as well as deletions around the MBCS) are validated for their impact on viral infectivity in human and Vero cells using pseudovirus assays, fusion assays, and western blot assays, and their role in affecting the entry route of SARS-CoV-2 is dissected using pathway-specific inhibitors (such as camostat and E64D) and cell-lines with/without TMPRSS2 (an important protease for Spike cleavage). Importantly, using these assays and tools, the authors can make solid and well-reasoned arguments as to why SARS-CoV-2 adapts to Vero cells, and thus why certain culture conditions and cell substrates lead to a loss of SARS-CoV-2 genetic stability. Using similar tools, this also allows the authors to carefully study whether any adaptations occur when SARS-CoV-2 stocks are passaged in human cell substrates (such as Calu-3 or primary human organoids), and study culture conditions in Veros (such as expression of TMPRSS2) that prevent changes in SARS-CoV-2.

      The data in this manuscript are thorough and well-presented. Importantly, the conclusions are strongly supported by the data, particularly the overall take-home message that human cell substrates can be used to efficiently propagate SARS-CoV-2 isolates without introducing cell culture adaptations. However, beyond this simple message, the manuscript also provides new mechanistic insights into the reasons for such viral adaptations in the Vero cell system, and identifies previously undescribed adaptations in the MBCS region that will be valuable for other researchers to take note of. The authors also describe a methodological workflow to produce SARS-CoV-2 in human cells that highlights a buffer-exchange step to remove potentially interfering human cytokines/debris, and which will be useful for other researchers.

      Overall, the manuscript makes a clear and important contribution to the SARS-CoV-2 field and will be of interest to active researchers who are studying this virus experimentally.