  1. Nov 2020
    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors start by taking a previously published model of the plant circadian clock and implement five changes: 1) updating the network topology to reflect some recent experimental findings, 2) making a spatial model loosely based on a seedling template, 3) introducing coupling between cells based on shared levels of CCA1/LHY, 4) randomly rescaling time in each cell to induce inter-cell differences in period, and 5) including a light sensitivity that depends on the region considered.

      For a certain configuration of light sensitivities/intensities, the different periods of oscillation in each seedling region roughly match those of experiments. With a sufficiently high coupling between cells, the system can also generate spatial waves, which are also observed in the experimental system.

      With pulsed light inputs the spatial pattern is still produced. The authors then investigate the robustness to environmental noise by generating stochastic light signals and show that the global synchrony, as measured with a synchronisation index, increases with cell-to-cell coupling strength. The paper is overall well-written, and the background and details of the analysis are well presented.

      Major comments:

      For the first part of paper, the output of the model is certainly the focus. There is virtually no discussion of the inferred parameters and how much confidence the authors have in their values.

      My main issue with the paper is about the section with noisy light signals, which is included in the title and is ultimately one of the main themes of the article.

      Specifically, on line 224:

      "This decrease in cell-to-cell variation revealed an underlying spatial structure (Fig 4D, middle and right, and S13 Fig), comparable to that observed under idealized LD cycles (Fig 4B, middle and right, and S12 Fig)."

      Firstly, I don't feel these conclusions match with the data presented. Comparing figure 4D middle and right with figure 4B middle and right shows a clear and pronounced loss in spatial structure. In its current form, this statement has to change, but I believe there are at least two other major issues with this figure:

      1) The figure is clearly designed to invite a comparison between the noise-free light cycles on the left and the noisy cycles on the right. However, due to how the noisy light is simulated, the variance of the light signal increases AND the average intensity of light decreases by 50%. When comparing the left and the right, we therefore don't know whether the changes are due to differences in the average signal or differences from the stochasticity. I think the authors should simulate a noisy light signal with the same mean intensity level as the deterministic signal.

      2) The noise model for the light doesn't seem realistic. On line 484 it says:

      "We made the simplifying assumption that each cell is exposed to an independent noisy LD cycle due to their unique positions in the environment. LD cycles were input to the molecular model through the parameter L".

      In fact, this could be considered as an incredibly complex signal, because for 800 cells it means drawing 800 random light signals. The implication is that two adjacent cells receive statistically independent light signals. Depending on chance, one cell might receive tropical levels of light while its neighbour experiences a cloudy day. This affects the interpretation and conclusions from figures 4 and 5. I propose two different ways of improving the simulation of the noisy light signal:

      a) In one extreme case, all cells receive the same noisy light signal, and in the other extreme, they all receive independent signals. You could consider a mixture model of light signals, where each cell receives \lambda L_global(t) + (1-\lambda) L_individual(t), where L_global(t) is a global light signal that is shared by all cells and L_individual(t) is a light signal unique to an individual cell. The mixing parameter \lambda controls how similar the light signal is between cells.

      b) Clearly the light signal will differ depending on the region, but there will be some spatial correlation. You could also consider methods of simulating light such that neighbouring cells receive correlated signals, although this might be difficult.
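
      Suggestion (a) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the hourly time grid, the 12:12 LD cycle and the choice of uniform daytime noise on [0, 2 * mean_on] (picked so that the mean daytime intensity equals that of a deterministic signal of height mean_on) are all assumptions made for the example.

```python
import random


def noisy_ld_signal(n_hours, rng, period=24, day_length=12, mean_on=1.0):
    """Hourly light signal: 0 at night, noisy with mean `mean_on` during the day."""
    signal = []
    for t in range(n_hours):
        if t % period < day_length:
            # Uniform on [0, 2 * mean_on] has mean `mean_on`, so the average
            # daytime intensity matches a noise-free signal of that height.
            signal.append(rng.uniform(0.0, 2.0 * mean_on))
        else:
            signal.append(0.0)
    return signal


def mixed_light(n_cells, n_hours, lam, seed=0):
    """Per-cell light input: lam * L_global(t) + (1 - lam) * L_individual(t)."""
    rng = random.Random(seed)
    l_global = noisy_ld_signal(n_hours, rng)  # shared by every cell
    cells = []
    for _ in range(n_cells):
        l_individual = noisy_ld_signal(n_hours, rng)  # unique to this cell
        cells.append(
            [lam * g + (1.0 - lam) * i for g, i in zip(l_global, l_individual)]
        )
    return cells
```

      With lam = 1 every cell receives the identical global signal, and with lam = 0 the signals are statistically independent, which corresponds to the scheme described in the manuscript quote above.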

      Assuming that the problem with the mean signal is corrected, do you expect the average spatial pattern to be the same between figure 4 B and D with no coupling (J=0) (although an increase in the variance between cells)? Perhaps not (owing to nonlinearities in the system), but it would be interesting to comment.

      The different periods in the different regions of the seedling are caused by differences in light sensitivity, which the authors claim is justified from refs 12-15. An alternative hypothesis is that biochemical parameters such as degradation rates are different between regions. This is briefly alluded to in the introduction, but I think it would be interesting to discuss further. What would be the pros and cons of the two different mechanisms?

      I understand that the authors used a pre-existing model, but I must say that I find the way that light is incorporated into the model a bit confusing.

      On line 345 it says: "L(t) represents the input light signal (L = 0, lights off; L > 0, lights on) and D(t) denotes a corresponding darkness input signal (D = 1, lights off; D = 0, lights on)."

      Surely the only thing that matters biophysically is the number of photons hitting the plant? Could you explain why the model needs to have a separate "darkness signal" compared to just a single light signal?

      In the model, the light intensity changes depending on the region. It might make more sense for interpretability if instead there is an additional light-sensitivity coefficient that depends on the region, because at the moment I'm not sure what units L(t) is supposed to take.

      Minor comments

      Could you more explicitly describe a possible molecular mechanism through which the coupling acts?

      In Figure 1C it looks like different genes are coupling to different genes, so you may need to rearrange it.

      Line 103: "We found that regional differences persist even under LD cycles, but cell to-cell minimized differences between neighbor cells." Missing word.

      Line 124: "The coupling strength was set to 2 (Methods)." This is meaningless in isolation, so it would be better to briefly explain what the coupling parameter is before mentioning its value.

      Through the text, I think De Caluwe should be corrected to De Caluwé

      Typo line 493

      Code and data are not made available.

      Significance

      The authors motivate the paper by highlighting that their proposed model improves on phase-based models in that it describes underlying molecular mechanisms.

      From an experimental side, it's interesting that a model is developed and directly compared with measured spatio-temporal waves of gene expression. From a theoretical side, the authors address questions relating to oscillations, multi-scale modelling and noise robustness that also generalise to other systems. I therefore expect that both experimental and theoretical audiences will be interested in the results.

      There are many possible additions and modifications that could be made to the model, and so the model and analysis could provide a platform for future research. However, I can't comment on whether there are similar pre-existing models of the plant circadian clock that contain both a molecular description of the circadian clock as well as a spatial scale.

      REFEREE'S CROSS-COMMENTING

      Comments on Review #1:

      The time is rescaled in each cell, meaning that each cell has a unique period, but the dynamics remain deterministic and hence the peak-to-peak times will be exactly the same for each cell. I imagine this isn't completely consistent with single-cell data (if available), where peak-to-peak times are very likely to be variable due to noisy gene expression. In a future paper it would be interesting to analyse the system using stochastic differential equations.
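
      The contrast drawn here can be illustrated with a toy phase oscillator. This is a sketch under stated assumptions, not the molecular model of the manuscript: a single phase variable obeys d(theta) = (2*pi/period) dt + sigma dW, integrated with the Euler-Maruyama scheme, and the function name and parameter values are hypothetical.

```python
import math
import random


def peak_to_peak_times(period, sigma, n_cycles, dt=0.01, seed=0):
    """Euler-Maruyama simulation of d(theta) = (2*pi/period) dt + sigma dW.

    A peak is recorded each time the phase first passes the next multiple
    of 2*pi. Returns the intervals between successive peaks.
    """
    rng = random.Random(seed)
    omega = 2.0 * math.pi / period
    theta, t = 0.0, 0.0
    next_peak = 2.0 * math.pi
    peaks = []
    while len(peaks) < n_cycles + 1:
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
        theta += omega * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if theta >= next_peak:
            peaks.append(t)
            next_peak += 2.0 * math.pi
    return [b - a for a, b in zip(peaks, peaks[1:])]


# sigma = 0: a cell with a rescaled but deterministic clock, constant intervals.
deterministic = peak_to_peak_times(period=24.0, sigma=0.0, n_cycles=5)
# sigma > 0: intervals fluctuate from cycle to cycle.
stochastic = peak_to_peak_times(period=24.0, sigma=0.1, n_cycles=10)
```

      With sigma = 0 the intervals agree with the period up to the time step, whereas any sigma > 0 produces cycle-to-cycle variability of the kind expected from noisy gene expression.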

      Comments on Review #2:

      I agree on the following two points:

      1) It would add value to discuss whether the different ranking of light sensitivities by organ matches any available experimental data.

      2) As the Reviewers point out, there are many possibilities for testing the robustness of the system to light cues, including varying the length of the day. Although outside of the scope of this paper, I wonder if it's possible to find data from a light sensor measuring light intensity across an entire year? Plugging such data into the model and measuring how the amplitude and period change would be really interesting, in my opinion.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The manuscript presents an improved model of the circadian clock network that accounts for tissue-specific clock behavior, spatial differences in light sensitivity, and local coupling achieved through intercellular sharing of mRNA. In contrast to whole-plant or "phase-only" models, the authors' approach enables them to address the mechanism behind coupling and how the clock maintains regional synchrony in a noisy environment. Using 34 parameters to describe clock activity and applying the properties mentioned above, the authors demonstrate that their model can recapitulate the spatial waves in circadian gene expression observed and can simulate how the plant maintains local synchrony with regional differences in rhythms under noisy LD cycles. Spatial models that incorporate cell-type-specific sensitivities to environmental inputs and local coupling mechanisms will be most accurate for simulating clock activity under natural environments.

      We have the following major criticisms:

      1) When assigning light sensitivities in different regions of the plant, the authors assign a higher sensitivity value to the root tip (L=1.03) than they do to the other part of the root (L=0.90). We are curious why the root tip would have higher light sensitivity than the rest of the root. Is this based on experimental data (if so, please cite in this section or methods)? It seems that these L values were assigned simply to make sure they recapitulated the period differences observed in Fig. 2A. Are these values based on PhyB expression in those organs? Or perhaps based on cell density in those locations?

      2) In the discussion of the test where they set the "light inputs to be equal" in all regions to simulate the phyb-9 mutant, could the authors please clarify whether that means they set the L light sensitivity value equal in all regions? a. If they are referring to setting the L value equal to all regions, we suggest that this discussion be moved to the section about different light sensitivities instead of the local sharing of mRNA section. b. Additionally, is it possible to set the light sensitivity to zero for all parts of the plant? We think this would be more suitable to simulate the phyb-9 mutant phenotype.

      3) Based on the recent Chen et al. (2020) paper showing ELF4 long-distance movement, we think it would be of great interest for the authors to model ELF4 protein synthesis/translation as the coupling factor, in addition to the modeling using CCA1/LHY mRNA sharing. We understand you may be saving this analysis for a future modeling paper, but this addition to the paper could increase the impact of this paper.

      4) This model is able to simulate circadian rhythms under 12:12 LD cycles, which represent only two days of the year, the equinoxes. We are curious if the model can simulate rhythms under short days and long days as well. We understand this analysis may be outside the scope of this paper and may require changing the values of the 34 parameters used, but think it could be a useful addition here or in future work.

      And minor criticisms as follows

      1) In the first paragraph of the results section, it would be helpful for the authors to reference Table S1 when they mention the 34 parameters used to model oscillator function

      2) In the first paragraph of the section titled "Local flexibility persists under idealized and noisy LD cycles", it would be helpful for the authors to reference S12 Fig after the last sentence that starts "However, ELF4/LUX appeared more synchronized..."

      3) In the first paragraph of the section titled "Cell-to-cell coupling maintains global communication under noisy light-dark cycles", the authors refer to a "Table 1" but I think they mean to refer to "Table S1".

      4) In Fig. 1, panel C is described as demonstrating the cell-to-cell coupling through the "level of CCA1/LHY". This phrasing is vague and we think could be improved to the "mRNA level of CCA1/LHY".

      Significance

      This work would be broadly interesting to other researchers studying cell-to-cell signaling and coupling of circadian rhythms in plants and other species where spatial waves of gene expression have been observed (i.e., mice and humans). Additionally, the computational modeling aspect of this work was easily interpretable for someone outside this expertise. Our expertise lies in plant circadian biology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      A. Summary:

      In this modeling study, the authors devised a multicellular model to investigate how circadian clocks in different parts (organs) of plants coordinate their timing. The model uses a plausible mechanism to explain how having a different sensitivity to light leads to the different phases and periods of the circadian clock that are observed in different plant organs. The model allows for entrainment in Light-Dark (LD) cycles and then a release in always-light (LL) environments. The model disentangles numerous factors that have confounded previous experiments. In one instance, the authors assigned different light sensitivities to the different organs (e.g., root tip, hypocotyl, etc.), which unambiguously shows that this one element alone - spatially differing sensitivity to light - is sufficient for recapitulating experimentally observed differences in periods and phases between plant organs. The model also recapitulates the spatial waves of gene expression within and between organs that experimentalists reported. At the sub-tissue level, the model-produced waves have similar patterns to the experimentally observed waves. This confirmation further validates the model. Having the cells share clock mRNA, from any of the clock component genes, produced the same experimentally observed spatial dynamics. The main conclusion of the study is that regional differences (e.g., between different organs) in light sensitivities, when combined with cell-to-cell sharing of clock-gene mRNAs, enable robust, yet flexible, circadian timing under noisy environmental cycles.

      B. Specific points:

      1.Lines 125-127: "To simulate the variability observed in single cell clock rhythms, we multiplied the level of each mRNA and protein by a time scaling parameter that was randomly selected from a normal distribution." - Why not add a white (Gaussian) noise term to these equations? How does multiplying by a random variable (for rescaling time) differ from my proposal? Some explanation should be given in the text here.

      2.Does the spatial network model simplify calculations by assuming separations of timescales (e.g., for equilibration in concentrations of mRNAs that diffuse between cells)? If so, it would be good to spell these out in the beginning of the Results section (where the model is described).

      3.Lines 161-162: "....in a phase only model by local...." should be "....in a phase model only by local...."

      4.Lines 188-190: The authors observed that qualitatively similar/indistinguishable behaviors arose regardless of which elements are varied (e.g., global versus local cell-cell coupling, setting light input to be equal in all regions of the seedling, etc.). Then they claim here that "...these results show that the assumptions of local cell-to-cell coupling and differential light sensitivity between regions are the key aspects of our model that allow a match to experimental data." - I don't see how this follows from the observation that almost any of the variations in this section lead to the same behaviors (spatial waves). Show the reasoning in the text here.

      5.Pgs. 9 -10: Section on "Cell-to-cell coupling maintains global coordination under noisy light-dark cycles": The simulation results rigorously support the authors' main conclusion here, which is that local cell-to-cell coupling allows for global coordination under noisy LD cycles. But I'm missing an intuitive explanation (or just any explanation) for why this is. At the end of this section, the authors should provide some intuition or qualitative explanation for the observations that they produced using their model in this section.

      6.Lines 261-262: Replace the present tenses with past tenses.

      7.Is the main idea that cell-to-cell coupling allows for averaging of fluctuations, between organs or cells within the same organ, while allowing for coordination of the average quantities? Is this responsible for both the flexibility and robustness observed under noisy environmental cycles?

      8.Line 304: Is it really true that the mammalian circadian rhythm is centralized? Don't some parts of our bodies have different circadian clock (e.g., slight differences in phase) than some other parts of our bodies?

      Significance

      Overall assessment:

      I enthusiastically recommend this work for publication after the authors address my comments below (please see "Specific points").

      The model's main strength is that the authors could vary each ingredient separately - light sensitivity of each cell/organ, which gene's mRNA diffuses between cells, cellular noise, local versus global cell-cell coupling, etc. Afterwards, the authors could determine which of these variations produces which experimentally observed behaviors. Another strength of the model is that it can reproduce not just one, but numerous, experimentally observed behaviors that are important for understanding circadian clocks in plants. Thus, the model is grounded in experimental truth and produces experimentally observed results. Crucially, since the authors could vary every single element in the model independently of the other elements, the authors are able to provide plausible explanations for why the experiments produced the results that they did (experimentally, a number of confounding factors prevented one from pinpointing to which element produced which observation).

      Another strength of the model is that it can be extended by other researchers to investigate other aspects of plant physiology in the future (e.g., the circadian clock's influence on cell division). The authors highlight these future uses in the discussion section. Therefore, I believe that this work will be valuable to plant biologists, non-plant biologists who are interested in circadian clocks, and systems biologists in general.

      The manuscript is also well written and relatively easy to follow, even for non-plant biologists like myself.

      REFEREE'S CROSS-COMMENTING

      Comment on Reviewer #2:

      I agree with his/her major criticism #3 (ELF4 long-distance movement). I find this to be a reasonable request. Fulfilling it would increase the paper's impact.

      Comment on Reviewer #3:

      The reviewer's point (1) is a reasonable request. Regarding his/her point (2): This is also reasonable. I'd recommend his/her suggestion (a). In the end, I'd be interested to see how the authors respond to this (what function they choose to let adjacent cells be subjected to some correlated light-input intensity). I'd be happy with something simple such as <intensity> + noise, where <intensity> is a deterministic term that, for example, decreases exponentially as one moves away from some central cell. Basically, I'd let the authors decide how to implement this and accept their current implementation - no correlation in light-intensity between adjacent cells - as an extreme scenario, as this reviewer points out.
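
      The simple "<intensity> + noise" form mentioned above could look like the sketch below. It assumes a one-dimensional cell coordinate, Gaussian noise and clipping at zero; the function name, the length scale and the noise level are hypothetical choices for illustration only.

```python
import math
import random


def correlated_light(positions, centre, length_scale,
                     base=1.0, noise_sd=0.1, seed=0):
    """Light input per cell: a deterministic mean that decays exponentially
    with distance from a central cell, plus independent Gaussian noise."""
    rng = random.Random(seed)
    intensities = []
    for x in positions:
        mean = base * math.exp(-abs(x - centre) / length_scale)
        # Clip at zero because a light intensity cannot be negative.
        intensities.append(max(0.0, mean + rng.gauss(0.0, noise_sd)))
    return intensities
```

      Because the deterministic term varies smoothly with position, neighbouring cells receive similar mean intensities, while noise_sd sets how far individual cells can deviate from that shared profile.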

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes (Nr1d1, Cry1, Bmal1), and performed model selection/fitting to explain the observed mRNA distributions. They decomposed the mRNA variability into distinct sources, and showed that intrinsic noise (transcriptional bursting) dominates the variance. Therefore, looking at transcript counts may not be feasible to estimate single-cell circadian phase. However, the study is quite descriptive and ends up being a bit dissatisfying, so if the authors could improve this aspect by perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho), it would help quite a bit in this regard. The model selection/fitting itself was not really sufficient to compensate for this, as it stands.

      We thank the reviewer for appreciating the new smFISH data, the analyses performed, and the consequences regarding phase inference from single cell snapshots.

      The reviewer suggests “perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho)”, and we have thus added a new Results paragraph (lines 281-316) and two new Supp Figures 13 and 14 to directly address this point.

      Specifically, we have added a dynamic, stochastic model of the circadian clock in order to add mechanistic insight into the parameters of the preferred model M4. Concerning \rho, in the initial manuscript we suggested that the correlations of cell-specific burst sizes (described by the parameter \rho) in the preferred model M4 could result from the underlying network topology. To substantiate this claim, we have now added an analysis of a stochastic model of the clock that includes gene-gene interactions amongst the core-clock genes. The core-clock network involves variables (such as protein levels), parameters (such as mRNA/protein half-lives) and additional genes (such as Clock) that are not directly measurable in our experiments; offering a detailed mechanistic mathematical model for our data is therefore not realistic. We therefore developed a simplified mathematical model for the three measured genes to explore the underlying mechanisms that could control the parameter \rho, as the referee suggests. As a starting point, we used the circadian clock gene network topology for Nr1d1, Cry1 and Bmal1 as modelled in Relógio et al. (Relógio et al., 2011) (see new Supplementary Material). To keep the model close to the inference framework, we used oscillatory functions for the burst frequency while the transcription rate (and hence the burst size) for each gene is affected by the protein levels of the other genes in the network. Using stochastic simulations we show that, for particular configurations of feedback where the negative repression of Nr1d1 by CRY1 is high, the network can generate positive mRNA correlation between Bmal1/Cry1 mRNA and negative correlation between Nr1d1/Cry1 mRNA, as observed in our data (Figure 2C). Furthermore, using the same inference framework as for our data on the simulated mRNA distributions, the obtained \rho is positive for Bmal1/Cry1 and negative for Nr1d1/Cry1, which was also found for our data (Figure 3C). Even though the model is clearly a simplified representation of the clock, these simulations give credence to the scenario that the \rho parameter obtained from the data is a signature of the underlying network topology.

      While the emphasis of the paper is certainly on parameter inference of the single-cell RNA FISH data, we believe the addition of this dynamic model provides more mechanistic insight into the results of the model fitting and hence significantly more depth to the article.

      Specific comments:

      1.It is hard to distinguish the RNA FISH signals (Figure 1A, 2B). It is probably technically challenging as the mRNAs are of low abundance. I think it may help if they adjust the contrast for the cytoplasm stain or just delineate the cell boundaries.

      Thank you for pointing this out. We agree that our rendering of the FISH images was not optimal and have now significantly improved it (see new Figure 1A and 2B). Considering the other reviewers’ comments related to the images, we have now 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals.

      2.In Figure 2C, the authors showed gene-pair correlations with cells of all sizes. Could the authors do a size-dependent extrinsic-noise filtering (Padovan-Merhar, Dev. Cell, 2015; Hansen et al., 2018, Cell Systems) to better dissect the correlations?

      We used negative binomial distributions to directly model the number of mRNA in the cells, which is a natural choice given that the raw smFISH data are integer counts. The model incorporates cell size dependencies in a unified framework, which predicts the joint distribution of raw counts, which is why we showed raw counts in the main figure. That being said, as the referee suggests, it can be useful for exploratory purposes to see the relationship between the measured genes while regressing out the contribution of cell area, and we have now added this analysis as Supp Figure 9. On lines 156-161 we write:

      “To also estimate the correlation between genes while accounting for cell area, we regressed out the area for each gene and recalculated the correlation coefficients [37,38]. Since all genes are positively correlated with area (Fig. 2A), this processing shifted the correlations for both pairs of genes. Specifically, the correlation coefficients for the area-filtered mRNA counts decreased but remained positive for Bmal1/Cry1 and became more negative for Nr1d1/Cry1(Supp Figure 9).”
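
      The area-filtering step described in this quote can be illustrated as follows. This is a sketch that assumes a simple linear least-squares regression of each gene's counts on cell area followed by a Pearson correlation of the residuals; the exact procedure used in the manuscript is not specified here, and the function names and toy numbers are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of an ordinary least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def area_corrected_correlation(gene_a, gene_b, area):
    """Correlation between two genes after regressing out cell area."""
    return pearson(residuals(gene_a, area), residuals(gene_b, area))


# Toy data: both genes scale with area, but their deviations are opposed.
area = [1, 2, 3, 4, 5, 6]
gene_a = [3, 3, 7, 7, 11, 11]
gene_b = [2, 7, 8, 13, 14, 19]
raw = pearson(gene_a, gene_b)
corrected = area_corrected_correlation(gene_a, gene_b, area)
```

      In the toy data the shared dependence on area makes the raw correlation positive, while the area-corrected correlation is negative, mirroring the direction of the shift reported for Nr1d1/Cry1.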

      3.For fitting model M3, as the authors pointed out, there are many local minima. Is the fitting score truly sufficient to eliminate the possibility for partial synchrony especially considering that the authors didn't show how effective the Dex treatment was to synchronize the circadian phase?

      Thank you for this comment. In fact, we didn't mean to fully eliminate the possibility of imperfect synchronization, but have tried our best to address it both experimentally and with modeling.

      Experimentally, in addition to the Dex treatment, we also compared with a condition in which we entrained the cells using temperature cycles, which is a standard in the field to achieve the best synchronization. We obtained a fold change of 2.1, which was in the range of previous studies (Saini, et al, 2012) and was slightly higher than with Dex synchronisation (1.6). Given that the improvement was not high and that it was important for us to study the system under free-running conditions and not in an entrained state (i.e. phase locking, which distorts the free dynamics and noise characteristics of the oscillator), we used the Dex protocol.

      Model M3 was used as a computational approach to correct for the individual phases. In addition to the difficult optimisation landscape, the challenge with model M3 also resides in the difficulty of estimating an individual phase for each cell, as the two mRNA counts measured in each cell do not contain sufficient phase information. This could potentially be resolved by measuring more genes simultaneously, which is, however, beyond the scope of the present manuscript. We have added discussion on this to the text on lines 244-248:

      “Thus, it was apparently difficult to use model M3 to correct the individual phase for each cell, likely due to the fact that the two mRNA counts measured in each cell do not contain sufficient phase information, and that the global optimisation problem contains many local minima. This could potentially be improved by measuring more genes simultaneously.”

      We have also added a new Results section (lines 305-316) and Supp Figure 14 to show that imperfect synchrony alone cannot explain the correlation structure observed in our data. Indeed, if two genes have a similarly phased oscillation, the expression of the two genes will be positively correlated (as shown in the new Supp Figure 14). Similarly, when the oscillations are in anti-phase, negative correlations will be found. Given that Nr1d1 and Cry1 are closer in phase than Bmal1 and Cry1, one would expect that the correlation between Nr1d1 and Cry1 (once accounting for area) would be more positive than for Bmal1 and Cry1, which was not found in the data (area-corrected correlations shown in Supp Figure 9). It therefore seems unlikely that the observed correlations could be caused by imperfect synchrony alone. Together with our simulations of the gene network (described above), we therefore argue that gene-gene interactions are a more plausible mechanistic explanation of the correlations observed in our measured bivariate mRNA distributions.
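
      The phase argument in this paragraph can be checked with a small simulation. The sketch assumes an idealised extreme case: noise-free sinusoidal expression, cells whose phases are uniformly spread over the full cycle, and no dependence on cell area. The function names and the number of cells are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def snapshot_correlation(phase_lag, n_cells=5000, seed=0):
    """Correlation between two oscillating genes measured in one snapshot
    of cells whose clock phases are drawn uniformly from [0, 2*pi)."""
    rng = random.Random(seed)
    gene_x, gene_y = [], []
    for _ in range(n_cells):
        phi = rng.uniform(0.0, 2.0 * math.pi)  # phase of this cell
        gene_x.append(1.0 + math.cos(phi))
        gene_y.append(1.0 + math.cos(phi - phase_lag))
    return pearson(gene_x, gene_y)
```

      Under these assumptions the correlation approaches cos(phase_lag): it is positive for genes that are close in phase and negative for genes in anti-phase, which is the expectation that the observed Nr1d1/Cry1 and Bmal1/Cry1 correlations fail to follow.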

      4.Regarding model M4, the authors added a cell-specific noise term without specifying the contributing factors. Typically adding degrees of freedom should improve fitting and make it easier for a model to fit, why not in this case? Can the authors provide some explanations/mechanisms.

      We believe there has been a misunderstanding regarding model M4. By adding parameters, model M4 is indeed easier to fit. There is even a problem of overfitting whereby the burst frequency becomes unrealistically high and the model effectively fits a Poisson distribution to each individual cell. To avoid this, we lock the burst frequency values to the posterior mean values from model M2. After describing model M4, we write (lines 260-265):

      “When all parameters are free, we noticed that the burst frequency can become unrealistically high due to a tendency to overfit to individual cells, and we therefore locked the burst frequency to the posterior mean values from model M2. The PSIS-LOO scores overall favoured model M4 (Fig. 3B), and the predicted joint probability density shows good similarity to the observed data (Fig. 3D) (all time points shown in Supp figure 11).”

      Regarding the above comment in the reviewer’s summary on contributing factors of model M4 we added a simple dynamical model that attempts to explain at least one possible mechanism of generating correlations in cell-specific bursting parameters (see above).

      5.The authors should include the number (range) of cells analyzed in the figure legends.

      We have now added the number of cells used at each time point to the legend of Figure 1D. To respond to Reviewer #2 we have also added details on the number of smFISH replicates used at each time point. The number of cells for each replicate is shown in Supp Figures 2-5.

      Reviewer #1 (Significance (Required)):

      Overall, we felt conflicted about the manuscript. On one hand, the authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes. On the other hand, the manuscript was a bit dissatisfying/descriptive. If the authors could provide and analyze some sort of mechanisms on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho) it should help improve the manuscript.

      We thank the reviewer for the suggestion to expand on the mechanistic interpretation, which we have followed. In addition, we would like to emphasise that a similar smFISH analysis of the core circadian oscillator has never been done, and we believe our data represent a significant contribution to the field. Moreover, our quite generic probabilistic inference framework for smFISH, which uses mixture models to describe intrinsic (transcriptional bursting) and extrinsic fluctuations, is also novel, and the code provided (written using the Stan probabilistic programming language) might find wide applicability.

      Concerning the mechanistic description, as described above, we added a stochastic, dynamic model of gene expression and propose that gene-gene interactions within the core-clock network topology represent a plausible mechanism for generating correlated burst parameters between genes, which are a feature of the preferred model M4 found during inference. We additionally added an explanatory figure to argue that, given the phase relationship between genes, imperfect synchronisation alone cannot explain the correlations that we observe between the pairs of genes. Together, this analysis provides more mechanistic insight into the underlying factors controlling the gene-gene relationships in our measured bivariate mRNA distributions.

      Referees cross-commenting

      I agree with Reviewer #3 regarding expanding the discussion to include the Shah & Tyagi and Raj et al citations on buffering. However, caution should be exercised regarding ref 26, as it is quite controversial and subsequent analyses came to different conclusions (PMID: 30359620 and 30243562). The general consensus is that nuclear buffering of transcript noise (proposed in ref 26) is not a general phenomenon (ref 27 is specific to the calcium response pathway). In fact, the presence and evolution of specific pathways to buffer transcriptional noise, such as protein-protein mechanisms (Shah & Tyagi) or extended half-life proteins (Raj et al. and others), argues that transcript fluctuations are probably not buffered in general.

      Following the suggestion of Reviewer #3, we have expanded the Discussion to include the references cited (Shah & Tyagi, Raj and others).

      Previous work from our lab is also nuancing the conclusions from references 26 and 27. Specifically, buffering effects are expected to be highly gene-specific (3’UTR), and in fact we have not seen those with our unstable construct during live-cell imaging (Suter et al., 2011; Zoller et al., 2015). We have also added text in order to explicitly state that subsequent papers have nuanced the general claims in references 26 and 27. In the text we write (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38]. In the cells used here, the strong signature of transcriptional bursting and high intrinsic noise is consistent with live imaging of a Bmal1 transcriptional reporter in the same cell line under similar growth conditions, where intrinsic noise was estimated to be 4-times larger than extrinsic noise [23].”

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The authors study experimentally and computationally the dynamic transcription of circadian clock genes over time in individual cells with single molecule RNA-FISH with the aim to understand how different noise sources contribute to single cell transcription variability and basic functions of circadian clocks. The authors integrate experiments with computational modeling to understand biology.

      Major comments:

      This study has some major limitations that need to be addressed to test the model usefulness, to understand noise sources and to gain biological insights into circadian clocks.

      We thank this reviewer for the constructive feedback which enabled us to significantly strengthen the revised manuscript.

      The limitations are on the experiments, the computational implementation of the modeling and the integration of experiments with models.

      Although the experimental datasets contain several hundred cells per time point for multiple time points, only a single replicate experiment is presented. From the presented data it is not clear how reproducible these temporal patterns are, or whether differences between time points could still be resolved if multiple biological replicate experiments were analyzed. To address this point, at least three biological experiments need to be presented and analyzed for each of the genes. Plotting the SEM on the means in figure 1B is misleading because several hundred cells have been measured, which automatically makes the error small. The SEM just describes how well we can determine the mean from a distribution. Instead, the mean and standard deviation across the biological replicates need to be plotted to show how experimental variability affects the described expression pattern. This is similar to RNA-seq or RT-PCR data from multiple replicates.

      We certainly agree that demonstrating reproducibility is important. Note that our smFISH data is from three independent cell culture dishes and microscopy slides, which included independent cell synchronization. This was described in the Methods but we agree that the data presentation was not showing the individual replicas, which we have now added. In Figure 1B, we now show the mean of each replicate for each time point. While the reviewer suggested displaying the mean and standard deviation across replicates, we show all data points at each time point to make it even more transparent. The mRNA distribution of each replicate is also shown in Supp Figures 2-5, together with individual quantification of mean, coefficient of variation and number of cells.

      In addition, to further demonstrate the reproducibility of the temporal patterns we have performed an additional independent experiment on four time points. This experiment shows that the oscillatory patterns for Nr1d1 and Cry1 are clearly significant and reproducible (new Supp Figure 7). The combination of the replicates shown for the main experiment (Supp Figures 2-5) and the new replicate experiment (Supp Figure 7) shows that the oscillatory temporal patterns for the mean mRNA levels are robust and reproducible, and in fact similar to those found in bulk analyses (Ukai-Tadenuma et al., 2011; Hughes et al., 2009), which is expected.

      It is also not clear how well the cell segmentation works and how it influences the analysis. In figure 1A, show the segmentation of the cell boundary together with the membrane stain.

      Thanks to this and other reviewers’ comments, we have now significantly improved the presentation of the FISH images. We have now 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; 3) improved the contrast to make it easier to distinguish the RNA FISH signals.

      We have also added Supp Figure 1 to show that the cell segmentation we used is reliable. In fact, as we had described, we used the sum Z-stack projections of the red channel (Wu et al., 2018), which we found provides the most accurate cell segmentation. We now show in Supp Figure 1 that the obtained segmentation shows convincing agreement with the cell autofluorescence.

      The authors use the RNA mean and RNA-FISH distributions and combine this data to build and compare different models. How do you know that the given data fulfils the central limit so that a model describing the mean is an adequate approach? To test this point, the authors should show through subsampling from the data and the model that indeed their data sets have enough cells to fulfil the central limit theorem.

      This comment reflects a misunderstanding of our approach, which we now try to better explain. In our inference framework we use a negative binomial (NB) distribution (and mixtures of NBs) to model the full distribution of mRNA counts, and our approach is therefore not based exclusively on the mean of the distribution. The estimation of model parameters and comparison of models is performed using the PSIS-LOO procedure (see below). The mixture of NBs makes a few assumptions, which we had clearly stated. In fact, it captures both bursty transcription (in the limit of short bursts as is biologically plausible, which yields the NB distribution), and cell-to-cell variability (extrinsic noise) captured by the mixture. The suitability of the NB to model bursty transcription is established (Raj et al., 2006), and it is parameterized by a mean and a dispersion coefficient, such that the CV of the distribution is the inverse of the burst frequency (Zoller et al., 2015). Therefore the mean is indeed an important parameter of the model, but we do not see the relationship with the CLT. The probabilistic inference approach we used (PSIS-LOO: Pareto-Smoothed Importance Sampling Leave-One-Out, Vehtari et al. 2017, see below) is established and state-of-the-art for selecting models of the appropriate complexity, and we are not aware of a similar previous quantitative model for smFISH analysis.

      We have now added significantly more explanations both on the general approach as well as the methodological details in a fully-revised Methods section to avoid further misunderstanding.

      A strength of the manuscript is that several competing and biologically meaningful models have been generated. However, the manuscript lacks rigor in terms of how fitting and model selection are performed. It is not clear how well the models fit the data. To address this point, the authors should visually compare the model fits to the data and plot their fit errors as a function of model complexity.

      We fully agree that comparing different models using a model selection approach is a powerful methodology; in fact, it is arguably the most systematic way to approach modeling problems in quantitative biology. Model selection is an active research area and there have been significant developments recently. Here, we used a state-of-the-art and established Bayesian approach (PSIS-LOO: Pareto-Smoothed Importance Sampling Leave-One-Out, Vehtari et al. 2017), which is certainly rigorous and more objective than visual comparison. The PSIS-LOO is conceptually similar to other measures of model performance such as AIC or WAIC, and the entire field of model selection aims at establishing rigorous methods to assess the tradeoff between fit errors and model complexity. In PSIS-LOO, this is done by using Pareto-smoothed importance sampling to estimate the expected log pointwise predictive density for a new dataset using leave-one-out cross-validation. The PSIS-LOO is the currently recommended metric for measuring model performance in Bayesian analysis (Vehtari et al., 2017) and is considered superior to other approaches such as computations of Bayes factors since it is less sensitive to model priors (Gelman et al. 2013). The performance of the models as measured with PSIS-LOO is shown in Figure 3B. As already mentioned, we have added further details as to how the fitting and model selection are performed in a revised Methods section. We agree that visual comparison is useful to gain intuition and this is why we showed the bivariate distributions in Figure 3D and in Supp Figure 11.

      Regarding the comment on “fit error”, note also that we probabilistically model the full mRNA distribution for each gene. In each cell, there is a likelihood score that measures the likelihood of observing the measured mRNA count given the modelled probability distribution. As our approach is based on this likelihood, the notion of “fitting error” needs to be replaced by the log likelihood (‘fitting error’ is mathematically equivalent to a log-likelihood when the noise model is Gaussian, which is not the case here).

      Another limitation is that the models have not been validated for example by using them to make predictions. One type of prediction could be to fit the model to one biological replica and then predict the other replica (cross validation). Another prediction would be to take the distribution fitted to the experimental data and then compare the model mean to the experimental mean.

      Thank you for this comment. As explained above, we used the state-of-the-art PSIS-LOO to measure the predictive performance of the models, which approximates the result of leave-one-out cross-validation using the full data set. To further assess the predictive capabilities of the model, we have now also added a “leave-replicate-out” cross-validation, as the reviewer suggests (new Supp Figure 12). The aim of our “leave-replicate-out” cross-validation was to test how well the predictions of each model generalise to independent cells that are not in the training set. To do this, we trained each model while omitting the data from one gene on a test slide. We then calculated the likelihood score of the test slide using the parameters from the training set, and repeated this for all slides. Similarly to the PSIS-LOO, the results of the leave-replicate-out cross-validation convincingly show that model M4 has the highest predictive performance. This is now described in the updated text on lines 265-271.
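      The leave-replicate-out idea described above can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' Stan implementation: the counts are invented, only a single gene is considered, and the models (a Poisson and a negative binomial fitted by the method of moments) are stand-ins chosen to keep the example short.

```python
import math

# Hypothetical mRNA counts from three slides (invented for illustration only).
replicates = {
    "slide_1": [0, 3, 8, 21, 45, 2, 15, 6],
    "slide_2": [1, 0, 12, 30, 5, 9, 50, 4],
    "slide_3": [2, 7, 0, 18, 38, 11, 3, 26],
}


def poisson_loglik(counts, mean):
    """Log-likelihood of the counts under a Poisson with the given mean."""
    return sum(k * math.log(mean) - mean - math.lgamma(k + 1) for k in counts)


def nb_loglik(counts, mean, r):
    """Log-likelihood under a negative binomial with mean and dispersion r."""
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean)) + k * math.log(mean / (r + mean))
        for k in counts
    )


def fit_moments(counts):
    """Method-of-moments estimates (assumes variance > mean, i.e. overdispersion)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((k - mean) ** 2 for k in counts) / n
    return mean, mean ** 2 / (var - mean)


# Hold out each slide in turn, fit on the others, score the held-out slide.
scores = {"poisson": 0.0, "nb": 0.0}
for held_out, test_counts in replicates.items():
    train = [k for name, counts in replicates.items()
             if name != held_out for k in counts]
    mean, r = fit_moments(train)
    scores["poisson"] += poisson_loglik(test_counts, mean)
    scores["nb"] += nb_loglik(test_counts, mean, r)

print(scores)
```

      The model with the higher summed held-out log-likelihood generalises better to unseen slides; with overdispersed counts such as these, the negative binomial is expected to score higher than the Poisson.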

      The results from fitting and prediction should be plotted as a function of model complexity. This kind of analysis will illustrate how model complexity is supported by the data.

      As already mentioned, we used state-of-the-art algorithms to analyze prediction vs. complexity. With the above addition, we now have two methods of calculating the predictive performance of each model: the approximate leave-one-out score as measured with PSIS-LOO and the leave-replicate-out cross-validation. For each model, the PSIS-LOO score is plotted in Figure 3B and the leave-replicate-out cross-validation score is shown in Supp Figure 12.

      In the methods section on models, a biological motivation must be presented to justify the different model assumptions.

      Thank you for pointing out that the biological justification of the models needed to be expanded. In addition to the improved justifications already provided in the Results section, we have now updated the Methods section such that a biological motivation is included for each model.

      How do the models that fit the distributions describe the mean?

      As explained above, the inference is performed on the entire distributions, using a family of distributions (mixtures of NBs) which are parameterized in a biologically relevant manner (transcriptional bursting + extrinsic noise). The mean and variance of the distribution are now described on lines 585-586 in addition to Figure 3A.
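      For readers less familiar with the negative binomial, the following sketch checks its mean and variance numerically under one common bursting parameterisation. The assumption here, which may differ in detail from the manuscript's notation, is that the dispersion equals the burst frequency (in units of the mRNA lifetime) and the mean equals burst frequency times burst size; the numerical values are hypothetical.

```python
import math


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of observing k mRNAs under NB(mean, dispersion)."""
    r = dispersion
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )


burst_frequency = 2.0   # hypothetical: bursts per mRNA lifetime
burst_size = 10.0       # hypothetical: mean number of mRNAs per burst
mean = burst_frequency * burst_size

# Evaluate the distribution on a grid wide enough that the tail is negligible.
pmf = [math.exp(nb_log_pmf(k, mean, burst_frequency)) for k in range(2000)]
total = sum(pmf)
numeric_mean = sum(k * p for k, p in enumerate(pmf))
numeric_var = sum((k - numeric_mean) ** 2 * p for k, p in enumerate(pmf))

# Standard NB result: variance = mean + mean^2 / dispersion.
expected_var = mean + mean ** 2 / burst_frequency
print(numeric_mean, numeric_var, expected_var)
```

      In this parameterisation the variance exceeds the Poisson value (the mean) by mean²/dispersion, which is why lower burst frequencies give broader distributions.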

      It is necessary to list model parameters for each of the models, their description, their parameter values, their parameter uncertainty and units of each parameter.

      Thank you, this has now been added as Supplementary Tables 2-5.

      It is not clear to me how the joint probability in figures 2, 4, S2 and S4 has been used to fit the model.

      Again, the joint distributions are modeled using mixtures of NBs and the inference is performed on the entire dataset at once using a log-likelihood approach. This uses all the data at once, and it is embedded in a Bayesian model selection method. The way that the joint probability is used is now clarified in the revised Methods section and in the Results section (lines 208-214):

      “For both models M1 and M2, the likelihood of observing the data given the parameters of the model is evaluated using the model-specific NB distribution and the mRNA counts for both genes in each cell. This is performed for both Bmal1/Cry1 and Nr1d1/Cry1 pairs across all time points, and this likelihood is combined with model priors to define the posterior parameter distribution for each model (Methods). We applied Hamiltonian Monte Carlo sampling within the STAN probabilistic programming language to sample the posterior distribution and infer model parameters [40].”

      How do the models make sense in the context of the fact that human genes exist as diploids?

      This is a good point, although note that the 3T3 cells are from mice and not humans. 3T3 cells are tetraploid, and it turns out that under the justified assumption that the bursts are short (Zoller et al., 2015; Suter et al., 2011), the number of alleles rescales the burst frequency, i.e. the effective (observed) burst frequency equals the number of alleles times the burst frequency per allele, but it does not change the shape of the distributions. On lines 580-582 we have now written: “Since 3T3 cells are tetraploid, and, again assuming that the bursts are short, the inferred burst frequency for tetraploid cells will be approximately four times that of a single allele.”
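      The allele-scaling statement can be checked numerically. The sketch below assumes four independent, identical alleles, each producing negative-binomially distributed mRNA counts with the same burst size; the parameter values and helper names are hypothetical. It convolves the single-allele distribution four times and compares the result with a single negative binomial whose burst frequency is four times larger.

```python
import math


def nb_pmf(k, shape, burst_size):
    """NB probability of k mRNAs; shape = burst frequency, mean = shape * burst_size."""
    p = burst_size / (1.0 + burst_size)
    log_p = (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
             + shape * math.log(1.0 - p) + k * math.log(p))
    return math.exp(log_p)


def convolve(a, b, n):
    """First n terms of the distribution of the sum of two independent counts."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


n = 200
per_allele_frequency = 0.5  # hypothetical burst frequency of one allele
burst_size = 5.0            # hypothetical burst size shared by all alleles

one_allele = [nb_pmf(k, per_allele_frequency, burst_size) for k in range(n)]
two_alleles = convolve(one_allele, one_allele, n)
four_alleles = convolve(two_alleles, two_alleles, n)

# Single NB with four times the per-allele burst frequency.
direct = [nb_pmf(k, 4 * per_allele_frequency, burst_size) for k in range(n)]
max_diff = max(abs(x - y) for x, y in zip(four_alleles, direct))
print(max_diff)
```

      The two distributions agree to numerical precision, so under these assumptions the total count stays within the negative binomial family and only the inferred burst frequency is rescaled by the number of alleles.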

      The variance decomposition is briefly described but no results are presented to show how this is done. This should be better explained.

      The variance decomposition we used is not a new result; in fact, we used the analytical results of Bowsher, C. G. & Swain, P. S. “Identifying sources of variation and the flow of information in biochemical networks” (PNAS, 2012). The mathematical proofs of the formula we use are contained within that reference; however, we have re-written this section to make it clearer to the reader (lines 688-718).
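      As a minimal illustration of what a variance decomposition does, the sketch below applies the law of total variance to counts grouped by a single binned extrinsic variable. This is only the simplest two-term case, not the authors' formula from the revised Methods; the counts and the grouping by cell area are invented for illustration.

```python
# Hypothetical mRNA counts grouped by a binned extrinsic variable (e.g. cell area).
groups = {
    "small": [2, 4, 3, 5, 1],
    "medium": [6, 9, 7, 8, 10],
    "large": [12, 15, 11, 18, 14],
}


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n, so the decomposition is exact)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


all_counts = [v for counts in groups.values() for v in counts]
n_total = len(all_counts)
grand_mean = mean(all_counts)

# Expected within-group variance: the part not explained by the grouping.
within = sum(len(c) / n_total * variance(c) for c in groups.values())
# Variance of the group means: the part explained by the grouping.
between = sum(len(c) / n_total * (mean(c) - grand_mean) ** 2
              for c in groups.values())

total = variance(all_counts)
fraction_explained = between / total
print(within, between, total, fraction_explained)
```

      The within-group and between-group terms add up exactly to the total variance, and the ratio of the between-group term to the total gives the fraction of variance attributable to the grouping variable.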

      Minor comments:

      In figure 3A, it is not clear to me how these different plots relate to the models. It is also not clear what equations describe each model.

      The Methods section has now been improved to show the full data-generating mechanism for each model, and each model has its own section title to make it easier to find. We have also improved the legend for Figure 3 to make the relationship to each model clearer.

      The legends in figure 3 are not very informative. More details need to be presented to understand this figure.

      Thank you for pointing this out, and we have now re-written the figure legend for Figure 3 to make the figure clearer.

      Reviewer #2 (Significance (Required)):

      This is an interesting and important topic with the potential to have general implication of how to model periodic single cell gene expression data and eventually better understand circadian clocks. This study will expand on other modeling studies of circadian clocks and has the potential to advance the field (PMCID: PMC7229691). I personally have done similar analysis and experiments in another system and biological context which has demonstrated the power of this approach if implemented rigorously. I am not an expert in circadian clocks in human cells.

      We thank the reviewer for appreciating the implications for the circadian and single cell gene expression community. Note that, to our knowledge, modeling smFISH counts using mixtures of negative binomials combined with Bayesian model selection has not been done before. The approach is biologically relevant (it combines intrinsic and extrinsic fluctuations in a rigorous way) and general, and its applicability extends far beyond the circadian oscillator. Therefore, this approach for quantitative smFISH data analysis also fills an important methodological gap.

      Referees cross-commenting

      Reviewer #1:

      I agree with the assessment that model fitting and model selection was not sufficient. But I disagreed that the data is enough. Although many cells and time points are analyzed, there is no evidence of how reproducible each mRNA distribution can be measured at each time point. I think reproducibility is key and will also help with the model fitting and identification.

      Regarding the point on reproducibility, we have made the following four changes:

      1. We have added an independent 4 time-point experiment to show that the oscillatory patterns of the distributions are reproducible (Supp Figure 7).
      2. In Figure 1 we now also show the mean of each replicate for the main experiment (Figure 1B).
      3. We also show the mRNA distributions of each replicate in Supp Figures 2-5.
      4. We have added the “leave-replicate-out” cross-validation to show that the model performance of the preferred model generalises to independent slides that were not included in training (Supp Figure 12). In responding to Reviewer #1 regarding the modeling, we have now also added a simplified dynamical model of circadian clock expression to add mechanistic insight into our proposed models. Overall, we have significantly expanded the description of the model selection approaches to help readers who are less familiar with Bayesian model selection methods.

      Reviewer #3:

      Regarding the red background, my understanding is that this comes from the probe hybridization. This is maybe because the probe concentration has not been optimized or the number of probes per gene is low and the signal to noise is not so good. Or it could be autofluorescent background. In this case a different fluorophore needs to be used to avoid this problem.

      Thank you for those comments, and we agree with all reviewers that the presentation of the images needed to be improved. It turned out that in Figure 1, we had shown the cell mask in red so it is clearly not related to probe concentration or autofluorescence. We have now removed the cell mask channel from the main images, which better highlights the smFISH signals. All smFISH images for Figures 1 and 2 have been much improved, and we’ve added a new Supp Figure 1 to show the performance of our cell segmentation.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this paper Nicholas et al image mRNAs encoding the key controllers of circadian rhythms, Rev-erba, Cry and Bmal1 in single cells over time. It was shown earlier that single cells exhibit circadian rhythms using reporter genes. A large number of studies have shown that transcription is an inherently stochastic process, which raises a question as to how single cells are able to achieve their rhythms in the face of this noise. Their results show that the number of mRNAs for the three genes exhibit the expected periodicity, but this periodicity is associated with significant cell-to-cell variation. They also explore to what extent this variability derives from stochastic transcription vs other sources of variation that are extrinsic to the genes. The results are interesting and experimental and modeling results are important (however this reviewer is not able to judge the veracity of mathematics that underlie the models).

      We thank this reviewer for appreciating the importance of our work.

      Some of the concerns that arose are listed below:

      1. The images show an annoying red background. If the red is HCS cell mask, it should be removed, and RNA presented on grey scale. This will make a better presentation. The red hue also appears in fig 2 b but here it is one of the RNA. I suggest in Fig 2 one RNA can be presented in green and the other in red, while the nuclei in blue.

      Thank you for this comment. We had indeed shown the cell mask in the red channel and have now removed it. Together with the other suggestions and comments from the reviewers, we implemented the following changes: 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; 3) improved the contrast to make it easier to distinguish the RNA FISH signals. The presentation of the images is now much improved.

      2. This paper and a few others talk about the cell size contributing to the cell-to-cell variability in mRNA numbers. Where does it come from physically? One can imagine based on the cell cycle stage there could be more than two copies of the gene in a cell, which will yield more RNAs, but they say that their cells don't have much cell cycle variability. Perhaps a clearer discussion is called for rather than just being polite to other investigators.

      The referee is right that several studies observed empirically that larger cells show more mRNA molecules in smFISH experiments (Padovan-Merhar et al., 2015; Kempe et al., 2015). In Padovan-Merhar et al. (2015), the authors found that transcriptional burst size changes with cell volume and burst frequency with cell cycle. The main theory for transcription scaling with cell volume is that it maintains transcript concentration. Using cell fusion experiments, they showed that cellular size can directly and globally affect gene expression by modulating transcription. Furthermore, they proposed that the mechanism underlying the global regulation integrates both DNA content and cellular volume to produce the appropriate amount of RNA for a cell of a given size, which is consistent with a model whereby a factor limiting for transcription is sequestered to the DNA. We used these results to propose a model whereby burst size scales with area, and we found an increase in predictive performance (compare M2 with M1 in Figure 3B). While our model selection supported the inclusion of cell area, the variance decomposition showed that the fraction of variance due to cell area ranged from 4.2% for Nr1d1 to 17.6% for Bmal1. We have now expanded the introduction to discuss this in more depth (lines 73-80) as requested.

      3. References 26 and 27 are cited for 10-80% of variance due to gene extrinsic sources. These references actually deny that there is a significant transcriptional noise in most genes. Again, stronger discussion is called for.

      As mentioned in the reply to Reviewer 1, previous work from our lab is also nuancing the conclusions from references 26 and 27. Specifically, buffering effects are expected to be highly gene-specific (3’UTR), and in fact we have not seen those with our unstable construct during live-cell imaging (Suter et al., 2011; Zoller et al., 2015). We have also added text in order to explicitly state that subsequent papers have nuanced the general claims in references 26 and 27. In the text we write (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38]. In the cells used here, the strong signature of transcriptional bursting and high intrinsic noise is consistent with live imaging of a Bmal1 transcriptional reporter in the same cell line under similar growth conditions, where intrinsic noise was estimated to be 4-times larger than extrinsic noise [23].”

      4. The results raise a very important question, whether and to what extent the transcriptional noise propagates to the next step of gene regulation and are there buffering mechanisms in the cell. For example, Raj et al, Variability in gene expression underlies incomplete penetrance, Nature 2010, show that alternative pathways serve to buffer the impact of gene expression noise. Similarly, Shah and Tyagi, Barriers to transmission of transcriptional noise in a c-fos c-jun pathway, Mol Syst Biol, 2013, show that variability in mRNA is buffered at protein level and the level of protein-protein complexes. Furthermore, they show that to the extent those vary, the chromatin intrinsically buffers against the fluctuations in numbers of transcription factors. Mention of these and other studies will enrich the paper.

      We have modified the Discussion section and now discuss these papers (and a few more). We thank the reviewer for the suggestions, which will help the reader to have a broader overview of noise buffering in gene expression and indeed enrich the paper.

      Reviewer #3 (Significance (Required)):

      Significance is high. Quality is high.

      Referees Cross-Commenting

      I agree with the comments made by other reviewers particularly about references 26 and 27. The major conclusions of reference 26 were questioned by Hansen et al 2018. At the bottom of page 7 the authors are qualifying their results in the light of references 26 and 27. Perhaps now there is less of a need to do so.

      As mentioned above, we have added the following sentence citing the Hansen paper to make it clear to the reader that key conclusions of the references 26 and 27 are disputed (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38].”

      References

      Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian Data Analysis, 3rd edn. CRC Press, London.

      Hughes ME, DiTacchio L, Hayes KR, Vollmers C, Pulivarthy S, Baggs JE, Panda S, Hogenesch JB. 2009. Harmonics of circadian gene transcription in mammals. PLoS Genet 5. doi:10.1371/journal.pgen.1000442

      Kempe H, Schwabe A, Cremazy F, Verschure PJ, Bruggeman FJ. 2015. The volumes and transcript counts of single cells reveal concentration homeostasis and capture biological noise. Mol Biol Cell 26:797–804. doi:10.1091/mbc.E14-08-1296

      Padovan-Merhar O, Nair GP, Biaesch AG, Mayer A, Scarfone S, Foley SW, Wu AR, Churchman LS, Singh A, Raj A. 2015. Single Mammalian Cells Compensate for Differences in Cellular Volume and DNA Copy Number through Independent Global Transcriptional Mechanisms. Mol Cell 58:339–352. doi:10.1016/j.molcel.2015.03.005

      Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. 2006. Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4:e309. doi:10.1371/journal.pbio.0040309

      Relógio A, Westermark PO, Wallach T, Schellenberg K, Kramer A, Herzel H. 2011. Tuning the mammalian circadian clock: Robust synergy of two loops. PLoS Comput Biol 7:1–18. doi:10.1371/journal.pcbi.1002309

      Saini C, Morf J, Stratmann M, Gos P, Schibler U. 2012. Simulated body temperature rhythms reveal the phase-shifting behavior and plasticity of mammalian circadian oscillators. Genes Dev 26:567–580. doi:10.1101/gad.183251.111

      Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. 2011. Mammalian Genes Are Transcribed with Widely Different Bursting Kinetics. Science 332:472–474. doi:10.1126/science.1198817

      Ukai-Tadenuma M, Yamada RG, Xu H, Ripperger JA, Liu AC, Ueda HR. 2011. Delay in feedback repression by cryptochrome 1 Is required for circadian clock function. Cell 144:268–281. doi:10.1016/j.cell.2010.12.019

      Vehtari A, Gelman A, Gabry J. 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27:1413–1432. doi:10.1007/s11222-016-9696-4

      Wu C, Simonetti M, Rossell C, Mignardi M, Mirzazadeh R, Annaratone L, Marchiò C, Sapino A, Bienko M, Crosetto N, Nilsson M. 2018. RollFISH achieves robust quantification of single-molecule RNA biomarkers in paraffin-embedded tumor tissue samples. Commun Biol 1:1–8. doi:10.1038/s42003-018-0218-0

      Zoller B, Nicolas D, Molina N, Naef F. 2015. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol Syst Biol 11:823. doi:10.15252/msb.20156257

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this paper, Nicholas et al. image mRNAs encoding key controllers of circadian rhythms (Rev-erba, Cry and Bmal1) in single cells over time. Earlier work using reporter genes showed that single cells exhibit circadian rhythms. A large number of studies have shown that transcription is an inherently stochastic process, which raises the question of how single cells achieve their rhythms in the face of this noise. Their results show that the numbers of mRNAs for the three genes exhibit the expected periodicity, but that this periodicity is accompanied by significant cell-to-cell variation. They also explore to what extent this variability derives from stochastic transcription versus other sources of variation that are extrinsic to the genes. The results are interesting, and both the experimental and the modeling results are important (although this reviewer is not able to judge the correctness of the mathematics underlying the models).

      Some of the concerns that arose are listed below:

      1. The images show an annoying red background. If the red is the HCS cell mask, it should be removed and the RNA presented in grey scale, which would make for a better presentation. The red hue also appears in Fig 2B, but there it is one of the RNAs. I suggest that in Fig 2 one RNA be presented in green and the other in red, with the nuclei in blue.

      2. This paper and a few others talk about cell size contributing to the cell-to-cell variability in mRNA numbers. Where does this contribution come from physically? One can imagine that, depending on the cell cycle stage, there could be more than two copies of the gene in a cell, which would yield more RNAs, but the authors say that their cells don't have much cell cycle variability. Perhaps a clearer discussion is called for rather than just being polite to other investigators.

      3. References 26 and 27 are cited for 10-80% of variance being due to gene-extrinsic sources. These references actually deny that there is significant transcriptional noise in most genes. Again, a stronger discussion is called for.

      4. The results raise a very important question: whether and to what extent transcriptional noise propagates to the next step of gene regulation, and whether there are buffering mechanisms in the cell. For example, Raj et al, Variability in gene expression underlies incomplete penetrance, Nature 2010, show that alternative pathways serve to buffer the impact of gene expression noise. Similarly, Shah and Tyagi, Barriers to transmission of transcriptional noise in a c-fos c-jun pathway, Mol Syst Biol, 2013, show that variability in mRNA is buffered at the protein level and at the level of protein-protein complexes. Furthermore, they show that, to the extent those vary, the chromatin intrinsically buffers against fluctuations in the numbers of transcription factors. Mention of these and other studies will enrich the paper.

      Significance

      Significance is high. Quality is high.

      Referees Cross-Commenting

      I agree with the comments made by the other reviewers, particularly about references 26 and 27. The major conclusions of reference 26 were questioned by Hansen et al 2018. At the bottom of page 7 the authors qualify their results in the light of references 26 and 27. Perhaps there is now less need to do so.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary: The authors study, experimentally and computationally, the dynamic transcription of circadian clock genes over time in individual cells using single-molecule RNA-FISH, with the aim of understanding how different noise sources contribute to single-cell transcription variability and to the basic functions of circadian clocks. The authors integrate experiments with computational modeling to understand the biology.

      Major comments:

      This study has some major limitations that need to be addressed in order to test the usefulness of the model, to understand the noise sources and to gain biological insights into circadian clocks.

      The limitations concern the experiments, the computational implementation of the modeling and the integration of experiments with models.

      Although the experimental datasets contain several hundred cells per time point for multiple time points, only a single replicate experiment is presented. From the presented data it is not clear how reproducible these temporal patterns are, or whether differences between time points could still be resolved if multiple biological replicate experiments were analyzed. To address this point, at least three biological replicates need to be presented and analyzed for each of the genes. Plotting the SEM on the means in figure 1B is misleading: because several hundred cells have been measured, the error is automatically small. The SEM only describes how well the mean of a distribution can be determined. Instead, the mean and standard deviation across the biological replicates need to be plotted to show how experiment-to-experiment variability affects the described expression pattern. This is similar to how RNA-seq or RT-PCR data from multiple replicates are reported.
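
      The difference between the SEM of pooled cells and the spread across biological replicates can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic (three hypothetical replicates of 500 cells each, with Gaussian counts and replicate means of 20, 25 and 30 mRNAs per cell), and the helper names `sem` and `replicate_summary` are not taken from the manuscript.

```python
import math
import random
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def replicate_summary(replicates):
    """Mean and SD of the per-replicate means (between-replicate spread)."""
    means = [statistics.mean(r) for r in replicates]
    return statistics.mean(means), statistics.stdev(means)


random.seed(0)

# Hypothetical data: three replicates, 500 cells each, whose true means differ.
replicates = [
    [random.gauss(mu, 10.0) for _ in range(500)]
    for mu in (20.0, 25.0, 30.0)
]

# Pooling all cells gives a very small SEM because n is large (1500 cells).
pooled = [x for r in replicates for x in r]
pooled_sem = sem(pooled)

# The SD of the replicate means reflects experiment-to-experiment variability.
rep_mean, rep_sd = replicate_summary(replicates)

print(f"SEM of pooled cells:        {pooled_sem:.2f}")
print(f"SD across replicate means:  {rep_sd:.2f}")
```

      In this synthetic setting the pooled SEM is much smaller than the SD across replicate means, so an error bar based on the SEM alone would hide the replicate-to-replicate differences.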

      It is also not clear how well the cell segmentation works and how the segmentation influences the analysis. In figure 1A, show the segmented cell boundary together with the membrane stain.

      The authors use the RNA means and the RNA-FISH distributions and combine these data to build and compare different models. How do you know that the given data satisfy the central limit theorem, so that a model describing the mean is an adequate approach? To test this point, the authors should show, through subsampling from the data and from the model, that their data sets indeed contain enough cells for the central limit theorem to apply.
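
      One possible form of such a subsampling check is sketched below in Python. It is only an illustration under stated assumptions: the "cells" are synthetic, drawn from a skewed exponential distribution with a mean of 20, and the helper names `skewness` and `subsample_means` are hypothetical. The sketch draws repeated subsamples of increasing size and reports the skewness and spread of the resulting sample means.

```python
import random
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def subsample_means(data, n, draws, rng):
    """Means of `draws` random subsamples of size n (without replacement)."""
    return [statistics.mean(rng.sample(data, n)) for _ in range(draws)]


rng = random.Random(1)

# Hypothetical skewed mRNA counts for 2000 cells (exponential, mean 20).
cells = [rng.expovariate(1 / 20.0) for _ in range(2000)]

results = {}
for n in (5, 50, 500):
    means = subsample_means(cells, n, 1000, rng)
    results[n] = (skewness(means), statistics.stdev(means))
    print(f"n={n:4d}  skewness of means={results[n][0]:.2f}  "
          f"SD of means={results[n][1]:.2f}")
```

      If the distribution of subsample means becomes more symmetric and narrower as the subsample size approaches the number of cells actually measured, that would support treating the mean as approximately Gaussian at that sample size.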

      A strength of the manuscript is that several competing and biologically meaningful models have been generated. However, the manuscript lacks rigor in how fitting and model selection are performed. It is not clear how well the models fit the data. To address this point, the authors should visually compare the model fits to the data and plot the fit errors as a function of model complexity.

      Another limitation is that the models have not been validated, for example by using them to make predictions. One type of prediction could be to fit the model to one biological replicate and then predict the other replicate (cross-validation). Another would be to take the distribution fitted to the experimental data and then compare the model mean to the experimental mean.

      The results from fitting and prediction should be plotted as a function of model complexity. This kind of analysis will illustrate how model complexity is supported by the data.

      In the methods section on models, a biological motivation must be presented to justify the different model assumptions.

      How do the models that fit the distributions describe the mean?

      It is necessary to list, for each of the models, the parameters, their descriptions, their values, their uncertainties and their units.

      It is not clear to me how the joint probabilities in figures 2, 4, S2 and S4 have been used to fit the model.

      How do the models make sense given that human genes are diploid?

      The variance decomposition is briefly described, but no results are presented to show how it is done. This should be better explained.

      Minor comments:

      In figure 3A, it is not clear to me how the different plots relate to the models. It is also not clear which equations describe each model.

      The legends in figure 3 are not very informative. More details need to be presented to understand this figure.

      Significance

      This is an interesting and important topic with the potential to have general implications for how to model periodic single-cell gene expression data and, eventually, to better understand circadian clocks. This study will expand on other modeling studies of circadian clocks and has the potential to advance the field (PMCID: PMC7229691). I have personally done similar analyses and experiments in another system and biological context, which demonstrated the power of this approach when it is implemented rigorously. I am not an expert in circadian clocks in human cells.

      Referees Cross commenting

      Reviewer #1: I agree with the assessment that model fitting and model selection were not sufficient. But I disagree that the data are sufficient. Although many cells and time points are analyzed, there is no evidence of how reproducibly each mRNA distribution can be measured at each time point. I think reproducibility is key and will also help with the model fitting and identification.

      Reviewer #3: Regarding the red background, my understanding is that this comes from the probe hybridization. This may be because the probe concentration has not been optimized, or because the number of probes per gene is low and the signal-to-noise ratio is poor. Or it could be autofluorescent background, in which case a different fluorophore needs to be used to avoid this problem.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The authors generated and analyzed a large amount of single-cell RNA FISH data over time on circadian genes (Nr1d1, Cry1, Bmal1), and performed model selection/fitting to explain the observed mRNA distributions. They decomposed the mRNA variability into distinct sources, and showed that intrinsic noise (transcriptional bursting) dominates the variance. Therefore, it may not be feasible to estimate single-cell circadian phase from transcript counts. However, the study is quite descriptive and ends up being a bit dissatisfying, so it would help considerably if the authors could improve this aspect, perhaps by analyzing a mechanism for cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho). As it stands, the model selection/fitting itself is not really sufficient to compensate for this.

      Specific comments:

      1. It is hard to distinguish the RNA FISH signals (Figure 1A, 2B). This is probably technically challenging, as the mRNAs are of low abundance. I think it may help if the authors adjust the contrast for the cytoplasm stain or simply delineate the cell boundaries.

      2. In Figure 2C, the authors showed gene-pair correlations with cells of all sizes. Could the authors do a size-dependent extrinsic-noise filtering (Padovan-Merhar, Dev. Cell, 2015; Hansen et al., 2018, Cell Systems) to better dissect the correlations?

      3. For fitting model M3, as the authors pointed out, there are many local minima. Is the fitting score truly sufficient to eliminate the possibility of partial synchrony, especially considering that the authors didn't show how effectively the Dex treatment synchronized the circadian phase?

      4. Regarding model M4, the authors added a cell-specific noise term without specifying the contributing factors. Typically, adding degrees of freedom should improve the fit and make it easier for a model to fit the data; why not in this case? Can the authors provide some explanations/mechanisms?

      5. The authors should include the number (range) of cells analyzed in the figure legends.

      Significance

      Overall, we felt conflicted about the manuscript. On one hand, the authors generated and analyzed a large amount of single-cell RNA FISH data over time on circadian genes. On the other hand, the manuscript was a bit dissatisfying/descriptive. If the authors could provide and analyze some sort of mechanism for cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho), it should help improve the manuscript.

      Referees cross-commenting

      I agree with Reviewer #3 regarding expanding the discussion to include the Shah & Tyagi and Raj et al citations on buffering. However, caution should be exercised regarding ref 26, as it is quite controversial and subsequent analyses came to different conclusions (PMID: 30359620 and 30243562). The general consensus is that nuclear buffering of transcript noise (proposed in ref 26) is not a general phenomenon (ref 27 is specific to the calcium response pathway). In fact, the presence and evolution of specific pathways to buffer transcriptional noise, such as protein-protein mechanisms (Shah & Tyagi) or extended half-life proteins (Raj et al. and others), argues that transcript fluctuations are probably not buffered in general.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      In this study, the authors investigate the role of hedgehog signaling and lipid metabolism in the neural stem cell niche of the Drosophila larvae. They demonstrate that Hedgehog localizes to lipid droplets in glial cells and show that Hh is necessary but not sufficient for elaboration of glial membranes and normal rates of glial proliferation during development. In addition, they provide an extensive set of results in support of a model that FGF signaling functions upstream of lipid metabolism and hh in glial cells as well as a parallel ROS mediated pathway in glial cells to promote neuroblast proliferation. In general, the results provide strong support for the conclusions. Specifically, the approaches are sound, the images clearly demonstrate the phenotypes described, and the effects are quantified and tested for statistical significance.

      **Major comments:**

      1. Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      2. If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      3. What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control? In many cases, the comparison is to the same w[1118] line so it does not control for a specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets. Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      **Minor comments:**

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      Reviewer #1 (Significance (Required)):

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The study by Dong et al., investigates the role of Hedgehog in the glial niche during larval neurogenesis in Drosophila. The authors describe the expression of Hh in cortex glia and its association with lipid droplets. They show that Hh expression in cortex glia is required for cortex glial proliferation, cell autonomously, and for maintenance of the normal cell cycle in neuroblasts. They go on to use a well characterised Drosophila glioma model, activation of FGF signalling, to investigate the requirement for Hh during cortex glial overgrowth. They show that FGF-activated cortex glial overproliferation requires Hh for modulation of neuroblast cell cycle, although Hh does not regulate cortex glial proliferation in this context. Finally, they show that inhibition of lipid modification of Hh rescues the neuroblast proliferation cell cycle defect caused by FGF activation in cortex glia.

      **Major comments:**

      1. From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion of p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      2. P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      3. It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      4. Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      5. FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia? If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Dacapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      6. Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to increased cortex glia cells?

      **Minor comments:**

      - Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?
      - Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.
      - Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.
      - p.6 include data showing overexpression of Hh does not cause glial overgrowth.
      - Top of p.14 should be FigS6A-C.
      - Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      Reviewer #2 (Significance (Required)):

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.

      Dear editor

      Below is our response to the reviewers' comments and our experimental plan for addressing these concerns.

      Reviewer #1

      Major comments:

      1. Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      Our experimental data suggest that, following glial niche disruption and downregulation of glia-derived signals, NBs are stalled in M phase (we detected an increase in the percentage of pH3+ NBs). As a consequence, fewer NBs are in G1 and S phase. Therefore, when we conducted a 15-min EdU incorporation, we observed a reduction in EdU incorporation. This NB phenotype (an increase in pH3 index and a decrease in EdU index) was also observed by Speder and Brand, 2018, when they induced glial niche impairment by inhibiting the PI3K signaling pathway (discussed on P7 of this ms).

      To address whether glial-Hh knockdown reduces the ability of NBs to produce progeny, we plan to carry out two experiments:

      • We will assess the total number of neurons in the CB by assessing Elav+ neurons.

      • We will conduct two EdU pulse-chase experiments. First, we will assess the total number of EdU+ neurons produced within a 4-hr time window (neurons marked with Elav); second, we will mark the NB lineage (with either nerfin-1-GFP or pros-GFP) and quantify the number of EdU+ neurons produced per lineage during a 4-hr time window.

      Together, these experiments should allow us to assess the consequence of glial-Hh knockdown on NB proliferation.

      2. If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      The number of NBs in the larval CNS is specified at the beginning of post-embryonic neurogenesis, when quiescent NBs re-enter the cell cycle (reviewed by Homem and Knoblich, 2012). NBs undergo asymmetric division to produce one daughter NB and a GMC, which divides once to generate two neurons. With each round of NB division, the number of NBs therefore remains constant. Changes in NB cell cycle speed do not alter the overall NB number, only the number of neurons produced.

      To clarify this, we will add a schematic depicting NB asymmetric division to Figure 1.

      3. What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control?

      The EdU index is calculated as the number of EdU+ NBs normalised to the number of EdU+ NBs in the control. The number of EdU+ NBs reflects the NBs that progress through S phase within a 15-min window, relative to the control. A similar method was used in Kanai et al., 2018. This method would be invalid only if NB number varied between control and experimental data sets; however, the number of NBs in all our genetic manipulations is not significantly altered relative to the respective control. We present the quantification of some key manipulations in Reviewer_Figure 1A, B.

      As to why we normalise to the control in each of these experiments, this is because in-vitro EdU incorporation relies on Click-IT chemistry, which is inherently variable due to incubation conditions. To overcome this, we always incubate control and experimental brains in the same tube and image them with the same confocal settings, and each experiment is normalised to its control done in parallel. We have now included Table 1, which contains all the raw data from these experiments.

      In the revised manuscript, we will clarify our methodology in greater detail in the Methods section, and we are happy to include Table 1 in the supplementary data.
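
      The normalisation described above can be written out as a minimal Python sketch. It assumes that the index for each experimental brain is its EdU+ NB count divided by the mean EdU+ NB count of the control processed in parallel; the function name `edu_index` and all counts are hypothetical and are not taken from Table 1.

```python
import statistics


def edu_index(experimental_counts, control_counts):
    """Divide each EdU+ NB count by the mean count of the parallel control."""
    control_mean = statistics.mean(control_counts)
    if control_mean == 0:
        raise ValueError("control mean is zero; the index is undefined")
    return [count / control_mean for count in experimental_counts]


# Hypothetical EdU+ NB counts from brains incubated in the same tube.
control = [40, 44, 36, 40]
experiment = [30, 28, 32, 30]

index = edu_index(experiment, control)
control_index = edu_index(control, control)

print("experimental EdU index:", [round(v, 2) for v in index])
print("mean control index:", round(statistics.mean(control_index), 2))
```

      By construction the control has a mean index of 1, so values below 1 indicate fewer EdU+ NBs than in the control stained in parallel. Raw counts are still needed to judge variation between experiments.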

      In many cases, the comparison is to the same w[1118] line so it does not control for a specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets.

      We have used three different controls in our experiments, namely GAL4 or lexA > w1118, UAS-mcherryRNAi and UAS-luc. We detect no significant difference in raw EdU+ NB numbers between these controls, as demonstrated below (Reviewer_Figure 1C). In our revised manuscript, we will include the sentence “As UAS-mcherryRNAi or UAS-luc are indistinguishable from the > w1118 control, we have used GAL4 driver > w1118 as control in place of UAS-luc in our results”.

      Reviewer_Figure 1. Total NB number and EdU+ NB number quantification

      A) Hh knockdown or overexpression in glia does not significantly alter NB number compared to control.
      B) htlACT overexpression in glia does not significantly alter NB number compared to control.
      C) EdU+ NB number is not significantly different among the controls GAL4 or lexA > w1118, UAS-mcherryRNAi and UAS-luc. P-values were obtained using Student's t-test in A and B and one-way ANOVA in C.

      Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      Glial number quantification was carried out using the Fiji 3D object counter and a plug-in called “DeadEasy Larval Glia” (Forero et al., 2012), in which the detection threshold depends on the brightness of Repo staining in each experiment. We therefore present these data as fold-change, comparing control and experimental samples stained in the same tube, which also allows easy comparison between experiments. The raw data are presented in Table 2. NB number is counted manually and is therefore presented as raw counts.

      **Minor comments:**

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      We will correct this.

      Reviewer #1 (Significance (Required)):

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Major comments:**

      1. From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion of p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      With repo-GAL4>hhRNAi, the cortex glial niche enwrapping NBs is dramatically disrupted, which indirectly alters NB cell cycle progression, as indicated by an increase in pH3 index and a decrease in EdU index. From these two pieces of data, it is likely that NBs are stuck in M phase, resulting in fewer NBs in G1 and S phase that are capable of incorporating EdU within a 15-min incubation window. We will firm up these data with the experiments proposed to address the concerns of Reviewer 1, Point 1.

      That both RNAi and overexpression of Hh with repo-GAL4 cause a reduction in NB EdU index is seemingly contradictory. However, it is consistent with a previous report from Speder and Brand, 2018, which showed that glial niche impairment induced by PI3K pathway inhibition causes a similar NB phenotype (an increase in pH3 index and a decrease in EdU incorporation). Furthermore, with repo-GAL4>htlDN, which caused a similar glial niche impairment (data not shown), we observed a similar phenotype (an increase in pH3 index and a slight decrease in EdU incorporation). Therefore, we concluded that the NB cell cycle progression defects are due to a general cortex glial niche disruption rather than a direct effect of Hh inhibition on NBs. We are happy to include the repo-GAL4>htlDN data in the supplementary data if required.

      With regard to the physiological role of Hh, we can only conclude from the data at hand that Hh is required for the development of the cortex glial niche, which in turn is required to maintain NB activities. In terms of how glial niche impairment impedes NB cell cycle progression, we observed that without a proper niche chamber, NBs cluster together instead of residing in separate niches (Figure 2F-G). Therefore, it is possible that the localization of other cell types (i.e. GMCs and neurons) is also altered as a result of NB clustering, which could affect the NB cell cycle. While these questions will be interesting to explore in the future, they are beyond the scope of the current study.

      In contrast, we robustly showed that Hh signals, when overexpressed in the glial niche, were capable of making contact with NBs (Figure 7C-C’) and triggering a slow-down of NB S-phase progression. Therefore, it is fair to conclude that “high levels of glial Hh expression restricts NB cell cycle progression”.

      In the revised manuscript, we will discuss these findings in greater detail.

      2. P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      We thank the reviewer for picking this up. Our intention was to quantify the number of cortex glial cells in glial-specific htlACT, InRwt and EgfrACT manipulations. However, two reported cortex glial antibodies (PntP2 from Avet-Rochex et al., 2012 and SoxN described in Read, 2018) showed nonspecific labelling of other cell types (Reviewer_Figure 2, arrows, neurons and NBs). As an alternative, we quantified the total glial cell number (Repo+) when htlACT, InRwt or EgfrACT was overexpressed using a cortex glial driver (NP2222-GAL4). We expect that the alterations in glial cell number would be primarily attributable to the cortex glia-specific gene manipulation. We agree that we should say that “overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of glial cells”.

      In the revised manuscript, we will clarify this in the results section.

      Reviewer_Figure 2: PntP2 staining in the larval CNS.

      A-B) Representative images showing that PntP2 antibody stains cortex glial cells (marked by NP2222-GAL4>mGFP, yellow arrows), NBs (white arrows) and neurons (blue arrows). B) is a zoomed-in image of A). Scale bar = 50 μm.

      3. It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      We will correct this.

      4. Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      We agree that FGF activation causes a dramatic increase in glial cell number, and thus will cause a relative increase in the levels of hh, fasn1 and lsd2. However, for RT-qPCR, the same amount of total RNA (1 μg) was extracted from control and repo-GAL4>htlACT samples and reverse transcribed into cDNA for qPCR. Therefore, the mRNA levels described in Figure 6F are already normalized to the total amount of genetic material.

      In the literature, hh, fasn1 and lsd2 have not been reported to be direct transcriptional targets of FGF signalling. However, lipid metabolism rewiring is well known as a hallmark of glioblastoma. For example, high levels of FASN have been linked with high-grade glioblastoma (Grube et al., 2014). Furthermore, FGF signalling has also been shown to modulate lipid metabolism and alter the transcription of the Lsd-2 homologue Plin2 in a mouse model (Ye et al., 2016).

      To determine whether hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling, we would first have to find out which TFs are altered in the glia upon altered FGF signalling via cortex glia-specific RNA-seq, and then conduct DamID to identify their target genes. This would be interesting to follow up but is beyond the scope of the current study.

      We will add a section on this in the discussion section of the revised ms.

      5. FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia?

      In response to glial htlDN overexpression, we observed a significant reduction in total glial number and overall Hh expression. However, RT-qPCR showed that mRNA levels of hh, fasn1 or lsd-2 were not altered upon htlDN overexpression (Reviewer_Figure 3).

      This data will be included in the supplementary data in the revised ms.

      Reviewer_Figure 3. Glial htlDN overexpression doesn’t alter the expression of hh, fasn1 and lsd2. The mRNA levels of hh, fasn1 and lsd2 are normalized to the reference gene rpl32.
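
      As an illustration of what normalizing to a reference gene can look like in practice, the sketch below computes a relative expression value with the widely used 2^-ΔΔCt calculation. This is an assumption: the text only states that mRNA levels were normalized to rpl32, not which formula was applied. The function names and all Ct values are hypothetical, and the calculation assumes equal, near-perfect amplification efficiency for both genes.

```python
# Relative expression by the 2^-ΔΔCt calculation (assumed here; the source
# only states that levels were normalized to the reference gene rpl32).
# All Ct values below are hypothetical and for illustration only.

def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of the target gene relative to the reference gene (e.g. rpl32)."""
    return ct_target - ct_reference


def fold_change(ct_target_exp: float, ct_ref_exp: float,
                ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of the target gene in experimental vs control samples."""
    ddct = (delta_ct(ct_target_exp, ct_ref_exp)
            - delta_ct(ct_target_ctrl, ct_ref_ctrl))
    return 2.0 ** (-ddct)


# Same ΔCt in both conditions: expression is unchanged relative to rpl32,
# even though every raw Ct shifted by one cycle.
unchanged = fold_change(24.0, 18.0, 25.0, 19.0)

# Target Ct drops by one cycle relative to rpl32 in the experimental sample.
doubled = fold_change(23.0, 18.0, 24.0, 18.0)

print(unchanged, doubled)
```

      The first case shows why a uniform shift in raw Ct values (for example, from loading slightly different amounts of cDNA) does not change the normalized result.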

      Continued: If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      To address this question, we plan to assess the expression levels of hh, fasn1 and lsd-2 using glia-specific expression of an inhibitor of PI3K (delta p60), which has been shown by Speder and Brand, 2018 to cause a reduction in cortex glial number. We will also ascertain whether Decapo overexpression causes cortex glial niche impairment. If so, we will also assess the expression levels of hh, fasn1 and lsd-2 in this setting.

      6. Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to increased cortex glia cells?

      We favour the model that excess Hh in the glial compartment “spills over” to reduce the proliferation of NBs, which are autonomously regulated by Hh and are therefore sensitive to this ligand. We can add this to the discussion.

      **Minor comments:**

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      These large lipid droplets are caused by lipid droplet fusion due to the use of detergent in this experiment. When we perform antibody staining together with lipid droplet staining, PBST detergent is required for the antibody staining to work. However, this creates the artefact of large lipid droplets through lipid droplet fusion. This has previously been reported by Bailey et al., 2015, and we have explained this on p.19 of the Methods section.

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      We will include this in the revised ms.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      We will include this in the revised ms.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      We will include this in the revised ms.

      -Top of p.14 should be FigS6A-C.

      We will correct this.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      We will include this in the revised ms.

      Reviewer #2 (Significance (Required)):

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.








      Table 1. EdU+ NB numbers for each genotype described in each Figure

      | Figure | Genotype | EdU incubation time | Average EdU+ NB number | SEM | Number of samples |
      |---|---|---|---|---|---|
      | Figure 2J | repo-GAL4>w1118 | 15 min | 66.63 | 1.79 | 16 |
      | Figure 2J | repo-GAL4>UAS-hh | 15 min | 57.35 | 1.35 | 20 |
      | Figure 2K | NP2222-GAL4>w1118 | 15 min | 67.91 | 1.44 | 11 |
      | Figure 2K | NP2222-GAL4>UAS-hh | 15 min | 60.79 | 0.79 | 14 |
      | Figure 2P | dnab-GAL4>w1118 | 15 min | 70.5 | 1.44 | 12 |
      | Figure 2P | dnab-GAL4>ciACT | 15 min | 60.1 | 1.48 | 10 |
      | Figure S3C | repo-GAL4>dcr2; mcherryRi | 10 min | 57.42 | 0.63 | 12 |
      | Figure S3C | repo-GAL4>dcr2; hhRi43255 | 10 min | 48.56 | 2.65 | 9 |
      | Figure 3K | NP2222-GAL4>w1118 | The same dataset as Figure 2K | | | |
      | Figure 3K | NP2222-GAL4>UAS-hh | The same dataset as Figure 2K | | | |
      | Figure 3K | NP2222-GAL4>UAS-hh; mcherryRi | 15 min | 57.44 | 1.41 | 16 |
      | Figure 3K | NP2222-GAL4>UAS-hh; lsdRi34617 | 15 min | 63.36 | 1.34 | 14 |
      | Figure 3K | NP2222-GAL4>UAS-hh; mcherryRi | 15 min | 58.83 | 2.61 | 6 |
      | Figure 3K | NP2222-GAL4>UAS-hh; lsdRi32846 | 15 min | 64.5 | 1.2 | 14 |
      | Figure 5E | repo-GAL4>w1118 | 15 min | 71.6 | 1.28 | 15 |
      | Figure 5E | repo-GAL4>UAS-htlACT | 15 min | 56 | 1.59 | 14 |
      | Figure 5E | NP2222-GAL4>w1118 | 15 min | 70.2 | 1.58 | 10 |
      | Figure 5E | NP2222-GAL4>UAS-htlACT | 15 min | 54.75 | 1.24 | 16 |
      | Figure 6G | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 6G | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 60 | 1.24 | 7 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;hhRi43255 | 15 min | 67.17 | 1.13 | 12 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 59.29 | 1.79 | 14 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;hhRi25794 | 15 min | 68.55 | 1.68 | 11 |
      | Figure 6H | dnab-GAL4>mcherryRi | 10 min | 49.13 | 1.6 | 8 |
      | Figure 6H | dnab-GAL4>ciRi2125-R2 | 10 min | 56.54 | 1.27 | 13 |
      | Figure 6H | repo-lexA>w1118 | 15 min | 68.5 | 1.1 | 10 |
      | Figure 6H | repo-lexA>lexAop-htlACT | 15 min | 55.7 | 2.15 | 10 |
      | Figure 6H | repo-lexA>lexAop-htlACT; GFPRi | 15 min | 52 | 1.58 | 30 |
      | Figure 6H | repo-lexA>lexAop-htlACT; ciRiHMJ23860 | 15 min | 62.4 | 1.79 | 15 |
      | Figure 6H | repo-lexA>lexAop-htlACT; GFPRi | 15 min | 56.33 | 1.49 | 12 |
      | Figure 6H | repo-lexA>lexAop-htlACT; ciRi2125-R2 | 15 min | 62.86 | 1.81 | 7 |
      | Figure 6J | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 58.64 | 0.99 | 14 |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;fasn1Ri3523R2 | 15 min | 65 | 2.41 | 9 |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;mcherryRi | The same dataset as Figure 6G control of NP2222-GAL4>UAS-htlACT;hhRi25794 | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;lsd2Rikk102269 | 15 min | 68.13 | 1.08 | 8 |
      | Figure S5H | NP2222-GAL4>mcherryRi | 15 min | 66.4 | 1.71 | 10 |
      | Figure S5H | NP2222-GAL4>fasn1Ri3523R6 | 15 min | 65.5 | 1.38 | 10 |
      | Figure S5H | NP2222-GAL4>mcherryRi | 15 min | 66.4 | 1.13 | 15 |
      | Figure S5H | NP2222-GAL4>lsd2Rikk102269 | 15 min | 64.2 | 0.94 | 10 |
      | Figure S5H | NP2222-GAL4>UAS-luc | 15 min | 65 | 1.07 | 10 |
      | Figure S5H | NP2222-GAL4>UAS-lsd2 | 15 min | 64.9 | 1.51 | 10 |
      | Figure S5I | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure S5I | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 57.93 | 0.9 | 14 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;fasn1Ri3523R6 | 15 min | 63.79 | 1.25 | 14 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 50.25 | 2.52 | 8 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;lsd2Ri32846 | 15 min | 59.3 | 1.2 | 10 |
      | Figure 7B | NP2222-GAL4>mcherryRi | 15 min | 65 | 0.93 | 10 |
      | Figure 7B | NP2222-GAL4>raspRi11495R2 | 15 min | 65.13 | 1.29 | 15 |
      | Figure 7B | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 7B | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 58.33 | 1.06 | 18 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;raspRi11495R1 | 15 min | 63.95 | 1.05 | 21 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 59.04 | 1.019 | 26 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;raspRi11495R2 | 15 min | 63.07 | 0.92 | 29 |
      | Figure 7D | NP2222-GAL4>w1118 | 15 min | 69.46 | 1.02 | 13 |
      | Figure 7D | NP2222-GAL4>UAS-hh.N.EGFP | 15 min | 52.25 | 1.9 | 12 |
      | Figure 7F | repo-GAL4>UAS-hh.N.EGFP;mcherryRi | 15 min | 54.4 | 1.18 | 15 |
      | Figure 7D | repo-GAL4>UAS-hh.N.EGFP;fasn1Ri3523R2 | 15 min | 65.69 | 1.43 | 13 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-LacZ | 15 min | 59.17 | 1.18 | 12 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-Cat.A | 15 min | 64 | 1.31 | 12 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-LacZ | 15 min | 53.6 | 2.32 | 10 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-Sod.1 | 15 min | 62.7 | 1.76 | 10 |

      Table 2. Raw data on glial number

      | Figure | Genotype | Average Repo+ glial number | SEM | Number of samples |
      |---|---|---|---|---|
      | Figure 2D | repo-GAL4>dcr2; mcherryRi | 843 | 44.29 | 7 |
      | Figure 2D | repo-GAL4>dcr2; hhRi43255 | 666.5 | 46.77 | 8 |
      | Figure 4K | NP2222-GAL4>w1118 | 1165 | 20.55 | 10 |
      | Figure 4K | NP2222-GAL4>htlACT | 2325 | 107.5 | 10 |
      | Figure 4K | NP2222-GAL4>InRwt | 1189 | 85.92 | 10 |
      | Figure 4K | wrapper-GAL4>w1118 | 1305 | 51.78 | 7 |
      | Figure 4K | wrapper-GAL4>EgfrACT | 1192 | 38.16 | 12 |

      References:

      Avet-Rochex, A., Kaul, A.K., Gatt, A.P., McNeill, H., and Bateman, J.M. (2012). Concerted control of gliogenesis by InR/TOR and FGF signalling in the Drosophila post-embryonic brain. Development 139, 2763-2772.

      Bailey, A.P., Koster, G., Guillermier, C., Hirst, E.M., MacRae, J.I., Lechene, C.P., Postle, A.D., and Gould, A.P. (2015). Antioxidant Role for Lipid Droplets in a Stem Cell Niche of Drosophila. Cell 163, 340-353.

      Forero, M.G., Kato, K., and Hidalgo, A. (2012). Automatic cell counting in vivo in the larval nervous system of Drosophila. J Microsc 246, 202-212.

      Grube, S., Dunisch, P., Freitag, D., Klausnitzer, M., Sakr, Y., Walter, J., Kalff, R., and Ewald, C. (2014). Overexpression of fatty acid synthase in human gliomas correlates with the WHO tumor grade and inhibition with Orlistat reduces cell viability and triggers apoptosis. J Neurooncol 118, 277-287.

      Homem, C.C., and Knoblich, J.A. (2012). Drosophila neuroblasts: a model for stem cell biology. Development 139, 4297-4310.

      Kanai, M.I., Kim, M.J., Akiyama, T., Takemura, M., Wharton, K., O'Connor, M.B., and Nakato, H. (2018). Regulation of neuroblast proliferation by surface glia in the Drosophila larval brain. Sci Rep 8, 3730.

      Read, R.D. (2018). Pvr receptor tyrosine kinase signaling promotes post-embryonic morphogenesis, and survival of glia and neural progenitor cells in Drosophila. Development 145.

      Speder, P., and Brand, A.H. (2018). Systemic and local cues drive neural stem cell niche remodelling during neurogenesis in Drosophila. Elife 7.

      Ye, M., Lu, W., Wang, X., Wang, C., Abbruzzese, J.L., Liang, G., Li, X., and Luo, Y. (2016). FGF21-FGFR1 Coordinates Phospholipid Homeostasis, Lipid Droplet Function, and ER Stress in Obesity. Endocrinology 157, 4754-4769.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The study by Dong et al., investigates the role of Hedgehog in the glial niche during larval neurogenesis in Drosophila. The authors describe the expression of Hh in cortex glia and its association with lipid droplets. They show that Hh expression in cortex glia is required for cortex glial proliferation, cell autonomously, and for maintenance of the normal cell cycle in neuroblasts. They go on to use a well characterised Drosophila glioma model, activation of FGF signalling, to investigate the requirement for Hh during cortex glial overgrowth. They show that FGF-activated cortex glial overproliferation requires Hh for modulation of neuroblast cell cycle, although Hh does not regulate cortex glial proliferation in this context. Finally, they show that inhibition of lipid modification of Hh rescues the neuroblast proliferation cell cycle defect caused by FGF activation in cortex glia.

      Major comments:

      1. From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about the role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with the data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion on p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      2. P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      3. It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      4. Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      5. FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia? If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      6. Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to increased cortex glia cells?

      Minor comments:

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      -Top of p.14 should be FigS6A-C.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      Significance

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this study, the authors investigate the role of hedgehog signaling and lipid metabolism in the neural stem cell niche of the Drosophila larvae. They demonstrate that Hedgehog localizes to lipid droplets in glial cells and show that Hh is necessary but not sufficient for elaboration of glial membranes and normal rates of glial proliferation during development. In addition, they provide an extensive set of results in support of a model that FGF signaling functions upstream of lipid metabolism and hh in glial cells as well as a parallel ROS mediated pathway in glial cells to promote neuroblast proliferation. In general, the results provide strong support for the conclusions. Specifically, the approaches are sound, the images clearly demonstrate the phenotypes described, and the effects are quantified and tested for statistical significance.

      Major comments:

      1. Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      2. If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      3. What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control? In many cases, the comparison is to the same w[1118] line, so it does not control for specific genetic backgrounds, and yet this method may be obscuring experimental variation present between datasets. Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?
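
      To make the normalization in question concrete, the sketch below computes an EdU index as the experimental mean divided by the matching control mean, using group means reported in Table 1 of the authors' reply. It is only an illustration: the function name `edu_index` is hypothetical, and applying the ratio to group means (rather than to per-sample counts) is a simplification made here.

```python
# EdU index sketch: experimental mean divided by the matching control mean.
# Means are taken from Table 1 of the reply; the function name and the use
# of group means instead of per-sample counts are illustrative assumptions.

def edu_index(value: float, control_average: float) -> float:
    """Normalize an EdU+ NB count to the average of its own control."""
    return value / control_average


# (control mean, experimental mean) for two independent comparisons.
comparisons = {
    "Figure 2J, repo-GAL4>UAS-hh (15 min EdU)": (66.63, 57.35),
    "Figure S3C, repo-GAL4>dcr2; hhRi43255 (10 min EdU)": (57.42, 48.56),
}

indices = {
    label: edu_index(experimental, control)
    for label, (control, experimental) in comparisons.items()
}

for label, index in indices.items():
    print(f"{label}: EdU index = {index:.2f}")
```

      After normalization both controls sit at 1 by construction, although their raw means differ by roughly nine cells (66.63 vs 57.42). That between-dataset difference is the kind of variation the index no longer shows.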

      Minor comments:

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      Significance

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain has been well studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for providing valuable feedback on our original manuscript. A point-by-point response to all of these comments is provided below. [Note that figures are not added in-line because of text-only limitations.]

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The submitted manuscript entitled 'Predicting cell health phenotypes using image-based morphology profiling' (RC-2020-00394) by Way et al. presents a set of seven dyes/staining (as two separate panels) to microscopically screen cell viability. For automatic classification a training/test set of 119 CRISPR (approximately 2 sgRNAs per gene) perturbations on 3 cancer cell lines were generated (lung A549, ovarian ES2, lung HCC44). After segmentation of cell nuclei a set of morphological cell measurements were extracted from each perturbation (total 952 features). The nature of these feature spanning cell cycle and viability phenotypes, enabled the authors to define 70 different phenotype classes, which are used to model a classifier by elastic linear regression. Specific definitions (cell cycle and ROS) were partly predicted/validated in an independent existing image data set (Drug Repurposing Hub project). The data is available as web-based application/visualization and the supplementary method is well described.

      We thank the reviewer for their constructive comments and helpful feedback.

      There is one subtle point that is worth raising given this description: The images we use to measure the cell cycle and viability phenotypes (two different staining panels in the Cell Health assays) are not the same images we use to extract morphology measurements (Cell Painting assay). This lack of connection, which is based on a light wavelength limitation present in all microscopes that limits the number of stains in a single assay, prevents us from developing a method that analyzes the same cells across the three assays. This distinction will become important later in the review, and we have made specific changes in the manuscript to increase clarity.

      **Major concerns:**

      (1) The only fundamental argument of this manuscript not to apply state-of-the-art deep learning (DL) machine-learning (mentioned in McCain et al. 2018), which does not require segmentation, feature extraction, abstraction, or manual gating, is the 'interpretability' of the predictions. However, performance, precision, scalability (by modern GPUs) with DL should clearly outperform 'manual' regression models. All recent machine vision benchmarks in microscopy confirm this, and also clearly show 'real world' translational applications, e.g.

      https://www.nature.com/articles/s43018-020-0085-8,

      https://www.biorxiv.org/content/10.1101/2020.07.02.183814v1.full.pdf,

      In other words, the presented methodology is not compared to DL, and is not convincing in terms of interpretability benefits.

      (We’ve copied a similar critique from the Significance section from Reviewer #1 in order to reduce redundancy) The author/co-authors have been instrumental/pioneered with their past work on cell-based image processing (CellProfiler software), but the presented methodology is simply outdated. Therefore, a revision towards a comparison and benchmarking with DL will also not help.

      Ref (DL with MIL): https://academic.oup.com/bioinformatics/article/32/12/i52/2288769

      We agree that deep learning approaches are exciting; much of our laboratory’s work focuses on their application (see https://doi.org/10.1073/pnas.2001227117, https://doi.org/10.1038/s41592-019-0612-7; https://doi.org/10.1002/cyto.a.23863, https://doi.org/10.1109/CVPR.2018.00970), and we agree that they are likely to outperform simpler regression models trained using so-called hand-engineered features. We thank the reviewer for highlighting our failure to accurately and fully describe our rationale.

      We intentionally did not use deep learning for this problem given (a) data limitations and (b) the primary goal of the manuscript, which is to demonstrate feasibility.

      Data limitations. There is no mechanism to link the cells of the assays (Cell Health and Cell Painting) together, which greatly reduces the available sample size. In the two referenced manuscripts, which each propose an exciting approach, the dataset is much larger (~17,000 and ~1,000 images respectively). Our dataset is only 357 perturbations that can only be linked between assays at the perturbation level rather than a single-cell level. Therefore, a deep learning approach is likely to produce models that don’t generalize to other datasets. Furthermore, reviewer 3 commented in favor of the approach we presented: “Using elastic net regression models is well-suited to the problem due to the low number of observations.”

      Primary goal of the manuscript is to demonstrate feasibility. In addition, the primary goal of the manuscript is to add cell health annotations as functional readouts to perturbations. Our aim was to demonstrate feasibility of predicting cell health states, not to optimize performance. Optimizing performance would require collecting much more data, or developing new deep learning or data collection methods to account for the lack of matched single cell readouts.

      To make this rationale more clear and concise, we have made the following changes in the manuscript:

      In the first paragraph of page 3, we make some minor contextual updates (“To demonstrate proof of concept, we collected a small pilot dataset of 119 CRISPR knockout perturbations…”) and replaced “We used simple machine learning methods, which are relatively easy to interpret compared to deep learning” with:

      We used simple machine learning methods instead of a deep learning approach because of our limited sample size of 119 perturbations and the inability to increase the sample size by linking single cell measurements across assays.

      We have also amended the Conclusions section to emphasize our primary goal and note possible deep learning extensions as future directions. The Conclusions now reads:

      We have demonstrated feasibility that information in Cell Painting images can predict many different Cell Health indicators even when trained on a small dataset. The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

      (2) One aforementioned point of the methodology is cryptically/not described: Why should it be less expensive compared with other (which?) approaches (see introduction)?

      We thank the reviewer for bringing up this point. We believe that part of this confusion stems from a slight misunderstanding about how images from the three assays (two Cell Health and one Cell Painting) are collected. The Cell Health assays are two distinct panels of targeted reagents that are separately prepared as two physically distinct assays. The Cell Painting assay is already an established assay used by many labs and companies around the world to mark cell morphology in an unbiased and relatively cheap way. We are comparing the expenses between the two Cell Health assays vs. the Cell Painting assay.

      We believe that this misunderstanding likely results from our somewhat cryptic and inconsistent language when describing the Cell Health assays in the abstract and introduction. We’ve updated the third sentence of the abstract from “We developed two customized microscopy assays that use seven reagents to measure 70 specific cell health phenotypes...” to now read:

      We developed two customized microscopy assays, one using four targeted reagents and the other three targeted reagents, to collectively measure 70 specific cell health phenotypes including proliferation, apoptosis, reactive oxygen species (ROS), DNA damage, and cell cycle stage.

      For consistency, we have also updated the penultimate paragraph in the introduction to now read:

      To do this, we first developed two customized microscopy assays, which collectively report on 70 different cell health indicators via a total of seven reagents applied in two reagent panels. Collectively, we call these assays “Cell Health”.

      With these clarifications in mind, we believe that the question of comparing monetary costs is more clear. We are comparing the costs of the targeted reagents in the two Cell Health assays to the unbiased reagents in the single Cell Painting assay. We’ve also modified the last two sentences in the first paragraph of the introduction to strengthen the connection between Cell Health assays, targeted reagents, and high cost:

      Cell health is normally assessed by eye or measured by specifically targeted reagents, which either focus on a single Cell Health parameter (e.g. ATP assays) or measure multiple parameters in combination via FACS-based or image-based analyses, which involve a manual gating approach, complicated staining procedures, and significant reagent cost. These traditional approaches limit the ability to scale to large perturbation libraries such as candidate compounds in academic and pharmaceutical screening centers.

      (3) Generalizability and/or training data size is essential for any model-based classification, but not evaluated or validated in the current manuscript. The independent validation on a A549 cell line only data might be not sufficient/convincing.

      We separately address the two distinct points raised by the reviewer of 1) generalizability and 2) training data size:

      Generalizability

      We agree that any model-based classification must demonstrate generalizability. For this reason, we carefully assessed the generalizability of all 70 models in two contexts. First, we assessed model performance in a single held out test set (15% of all data). All results we report in the main text (e.g. Figure 2) report performance on this test set. We see high performance in many (but not all) models, and we observe much better model performance compared to a negative control baseline (New Supplementary Figure S5). High performance in the test set indicates that, for some cell health indicators, the models generalize well.

      Second, we also demonstrate that these models generalize to data from an entirely different experiment using a fundamentally different perturbation (CRISPR vs. drug compounds). We demonstrate generalizability to this external validation data in four different ways: 1) Validating a relatively simple model (“Number of Live Cells”) with an orthogonal viability readout from the PRISM assay (barcoding-based cell viability; updated Figure 4); 2) Demonstrating that proteasome inhibitors, which are known to produce reactive oxygen species, are predicted to do so; 3) Demonstrating that PLK inhibitors, which are known to reduce entry to G1, show a robust dose response in the "G1 Cell Count" model; and 4) Demonstrating that aurora kinase and tubulin inhibitors are predicted to induce high DNA damage (gH2AX) in G1 cells. These two drug classes are known to cause “mitotic slippage” and double stranded DNA breaks. The fourth example was added in response to a comment by reviewer 3.

      We’ve also added a series of enrichment tests, as described in the following new text:

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p < 10^-15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors were enriched for high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p < 10^-15) (Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).

      The updated methods section describing how we assess generalizability and perform the enrichment tests now states:

      Assessing generalizability of cell health models applied to Drug Repurposing Hub data

      We used our cell health webapp (https://broad.io/cell-health-app) to identify compounds with high predictions for three models with high or intermediate performance: ROS, Number of G1 cells, and Number of gH2AX spots in G1 cells. For each model, we identified classes of compounds with consistently high scores, then tested for statistical enrichment: for proteasome inhibitors in the ROS model, PLK inhibitors in the Number of G1 cells model, and aurora kinase and tubulin inhibitors in the Number of gH2AX spots in G1 cells model. We used one-sided Fisher’s exact tests to quantify differences in expected proportions between high and low model predictions. For each case, we determined high and low predictions based on the 50% quantile threshold for each model independently.
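      The enrichment test described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' code: the function names (fisher_exact_greater, enrichment) and the toy data are hypothetical, "high" is taken to mean strictly above the median prediction, and only the "greater" direction is shown. A depletion test, as for PLK inhibitors and low G1 cell counts, would sum the lower tail instead.

```python
from math import comb
from statistics import median


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    top-left count with all row and column totals held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, col1)


def enrichment(predictions, in_class):
    """Test whether class members are enriched among high predictions.

    predictions: model outputs, one per compound profile.
    in_class: booleans in the same order (True = member of the drug class).
    """
    threshold = median(predictions)  # 50% quantile, computed per model
    a = b = c = d = 0
    for score, member in zip(predictions, in_class):
        high = score > threshold
        if member and high:
            a += 1  # class member, high prediction
        elif member:
            b += 1  # class member, low prediction
        elif high:
            c += 1  # other compound, high prediction
        else:
            d += 1  # other compound, low prediction
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, fisher_exact_greater(a, b, c, d)


# Toy data: the first four profiles belong to the class of interest.
scores = [0.9, 0.8, 0.7, 0.95, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
members = [True] * 4 + [False] * 6
odds_ratio, p_value = enrichment(scores, members)
```

      An infinite odds ratio is returned when no class member falls below the threshold, which is why reported odds ratios are best read together with the underlying counts.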

      We acknowledge that prospectively making predictions and measuring Cell Health readouts directly in a new experiment would be more convincing, but we note that our existing assessment of generalizability in an external experiment is already unusual in machine learning publications. Additionally and unfortunately, collecting a second validation dataset for this manuscript is not currently feasible given experiments backlogged from COVID.

      Training data size

      We also agree that a more comprehensive analysis on training data size would be an important indicator of model limitations. Therefore, we performed a sample titration analysis in which we randomly dropped samples from the training procedure, and tracked performance of the held out test set. We add the following figure, figure legend, and results text to describe and interpret the results.

      Supplementary Figure S13: Dropping samples from training reduces test set model performance in high, mid, and low performing models. We determined model performance stratification by taking the top third, mid third, and bottom third of test set performance when using all data. We performed the sample titration analysis with 10 different random seeds and visualized the median test set performance for each model.

      We updated the results section to introduce and discuss this result:

      Lastly, we performed a sample size titration analysis in which we randomly removed an increasing number of samples from training. For the high and mid performing models, we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure S13).

      Finally, the updated methods section describing our sample titration analysis now reads:

      Machine learning robustness: Investigating the impact of sample size

      We performed an analysis in which we randomly dropped an increasing amount of samples from the training set before model training. After dropping the predefined number of samples, we retrained all 70 cell health models and assessed performance on the original holdout test set. We performed this procedure ten times with ten unique random seeds to mirror a more realistic scenario of new data collection and to reduce the impact of outlier samples on model training.

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143
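      The titration procedure can be sketched as follows. This is a minimal illustration rather than the code in the linked pull request: it assumes a single-feature least-squares model in place of the 70 cell health models, R^2 as the performance metric, and hypothetical names (fit_line, r_squared, titrate) and toy data.

```python
import random
from statistics import median


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def r_squared(model, xs, ys):
    """Coefficient of determination of a fitted line on (xs, ys)."""
    slope, intercept = model
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def titrate(train, test, n_drop_values, seeds=range(10)):
    """Median held-out R^2 after randomly dropping n training samples."""
    test_x, test_y = zip(*test)
    results = {}
    for n_drop in n_drop_values:
        scores = []
        for seed in seeds:
            rng = random.Random(seed)  # one reproducible draw per seed
            kept = rng.sample(train, len(train) - n_drop)
            xs, ys = zip(*kept)
            # The test set is never subsampled, only the training set.
            scores.append(r_squared(fit_line(xs, ys), test_x, test_y))
        results[n_drop] = median(scores)
    return results


# Toy data: a noisy linear relationship with 15% held out for testing.
rng = random.Random(0)
data = [(float(x), 2.0 * x + rng.gauss(0.0, 1.0)) for x in range(100)]
rng.shuffle(data)
train, test = data[:85], data[85:]
curve = titrate(train, test, n_drop_values=[0, 20, 40, 60])
```

      Keeping the test set fixed across all titration levels is what makes the resulting performance curve comparable from one level to the next.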

      **Minor concerns:**

      (1) Highest test performance comprises that precision is mainly driven by cell cycle/count and live status and could be probably derived from DRAQ7 (Fig. 2) and DNA granularity (Fig. 3, bottom right) and would argue for rigid feature selection across channels and features.

      We believe that clarifying the confusion between the two Cell Health assays we developed and the well-established Cell Painting assay addresses part of this concern. The DRAQ7 dye marks dead cells, and is measured in Cell Health. In other words, readouts from this reagent are what we aim to predict, not what we use for training. Indeed, DRAQ7-based phenotypes are among the top predicted models, which is a result we present in Supplementary Figure S7 - this figure uncovers which Cell Health phenotypes are more easily predicted by Cell Painting.

      The DNA granularity morphology measurements are collected from the Cell Painting assay and thus are available for training, and, as noted by the reviewer, encode a high proportion of signal in predicting the various cell health phenotypes. In our most common processing workflows for other projects, we do apply a rigid feature selection pipeline to all Cell Painting profiles before analysis, but we do not do this in this analysis since we were using a model with a sparsity-inducing penalty (elastic net).
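      For readers less familiar with the penalty mentioned above, the elastic net objective can be written as below. This uses one common parameterization (the one documented for scikit-learn's ElasticNet); the response does not state which parameterization or hyperparameter values were used, so the symbols \alpha and \rho are assumptions for illustration.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \alpha\,\rho\,\lVert \beta \rVert_1
  + \frac{\alpha\,(1-\rho)}{2}\,\lVert \beta \rVert_2^2
```

      Here X holds the Cell Painting features for n profiles and y is one Cell Health readout. The L1 term (weighted by \rho) drives many coefficients exactly to zero, which is the sense in which the model performs its own feature selection, while the L2 term stabilizes the fit when features are highly correlated.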

      To directly answer the question of how channels and feature groups influence model performance, we’ve performed a systematic experiment removing different channel, compartment, and feature groups and retraining all models with the specific group dropped. We now include the following supplementary figure:

      Supplementary Figure S12: Systematically removing classes of features has little impact on most models’ performance. We retrained all 70 cell health models after dropping features associated with specific (a) feature groups, (b) channels, and (c) compartments. Each dot is one model (predictor), and the performance difference between the original model and the retrained model after dropping features is shown on the x axis. Any positive change indicates that the models got worse after dropping the feature group. (d) Individual model differences in performance after dropping features. Each dot is one class of features removed (as in a-c).

      Additionally, we updated the results section to introduce and discuss this result:

      We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that most models were robust to dropping entire feature classes during training (Supplementary Figure S12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between readouts and individual Cell Painting features.

      And we add the following to the methods section:

      Machine learning robustness: Systematically removing feature classes

      We performed an analysis in which we systematically dropped features measured in specific compartments (Nuclei, Cells, and Cytoplasm), specific channels (RNA, Mito, ER, DNA, AGP), and specific feature groups (Texture, Radial Distribution, Neighbors, Intensity, Granularity, Correlation, Area Shape) and retrained all models. We omitted one feature class and then independently optimized all 70 cell health models as described in the Machine learning framework results section above. We repeated this procedure once per feature class.

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143
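      The feature-class removal step can be sketched as follows. This is an illustration under stated assumptions, not the code in the linked pull request: it assumes CellProfiler-style feature names such as "Cells_Intensity_MeanIntensity_DNA" (compartment first, feature group second, channel names among the remaining tokens), spells the groups without spaces, and uses hypothetical function names.

```python
COMPARTMENTS = ("Nuclei", "Cells", "Cytoplasm")
CHANNELS = ("RNA", "Mito", "ER", "DNA", "AGP")
FEATURE_GROUPS = (
    "Texture", "RadialDistribution", "Neighbors", "Intensity",
    "Granularity", "Correlation", "AreaShape",
)


def feature_classes(name):
    """Return the set of classes (compartment, group, channels) of a feature."""
    tokens = name.split("_")
    classes = set(tokens[:2])  # assumed: compartment, then feature group
    # A feature can involve more than one channel (e.g. Correlation features).
    classes.update(t for t in tokens[2:] if t in CHANNELS)
    return classes


def drop_class(feature_names, dropped):
    """Keep only the features that do not belong to the dropped class."""
    return [f for f in feature_names if dropped not in feature_classes(f)]


def ablation_plan(feature_names):
    """Map each feature class to the feature list that remains without it."""
    all_classes = COMPARTMENTS + CHANNELS + FEATURE_GROUPS
    return {c: drop_class(feature_names, c) for c in all_classes}


# Toy feature list; each plan entry would feed one retraining run.
features = [
    "Cells_Intensity_MeanIntensity_DNA",
    "Nuclei_Texture_Contrast_RNA_5_0",
    "Cytoplasm_AreaShape_Area",
    "Cells_Correlation_Correlation_DNA_ER",
]
plan = ablation_plan(features)
```

      Each entry of the plan corresponds to one retraining run of all models, so the 15 classes listed in the methods text give 15 runs in this sketch.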

      (2) Any H2AX and 'polynuclear' would probably fail in any cell line with this size of training data.

      Indeed we would expect certain cell health phenotype models to fail if they had few hits and a relatively low variance of output values. This hit rate is directly associated with the phenotypes that the CRISPR perturbations induce, which is why we intentionally selected them to span multiple gene pathways in an attempt to maximize morphology diversity (see Supplementary Table S1).

      We did indeed observe that the polynuclear model had few hits in the training data and relatively poor performance. We did not expect this result, given that DNA stains are captured in the Cell Health and Cell Painting assays. We suspect the poor performance in this model is likely because so few cells were classified as polynuclear in our gating strategy, making it perhaps an inconsistently measured readout.

      By contrast, some gH2AX models did have relatively good performance. In the conclusion, we note that increased training data size using more perturbations is likely to improve model performance:

      The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

      (3) To what refers the 'weights' of the model in Fig. 1c?

      We thank the reviewer for pointing out that we never defined this term in the Figure 1 legend. We use “weights” to refer to the coefficients from the regression model. To make this more clear, we have updated the legend to now read: “Model coefficient weights” and the text in Figure 1C to now read “model weights”.

      Reviewer #1 (Significance (Required)):

      This manuscript is not advanced in the context of latest improvements/developments of cell-based microscopic classification. Rationale in the introduction and the conclusion are not linked (interpretability, generalizability, costs). It seems to be unfinished or unformatted to this end?

      Having responded to these reviews, we believe that our primary motivation - to demonstrate proof-of-concept of predicting cell health phenotypes directly from Cell Painting data - is now much clearer throughout the manuscript. We provide below an updated introduction, which strengthens the rationale.

      Perturbing cells with specific genetic and chemical reagents in different environmental contexts impacts cells in various ways (Kitano, 2002). For example, certain perturbations impact cell health by stalling cells in specific cell cycle stages, increasing or decreasing proliferation rate, or inducing cell death via specific pathways (Markowetz, 2010; Szalai et al., 2019). Cell health is normally assessed by eye or measured by specifically targeted reagents, which either focus on a single Cell Health parameter (e.g. ATP assays) or measure multiple parameters in combination via FACS-based or image-based analyses, which involve a manual gating approach, complicated staining procedures, and significant reagent cost. These traditional approaches limit the ability to scale to large perturbation libraries such as candidate compounds in academic and pharmaceutical screening centers.

      Image-based profiling assays are increasingly being used to quantitatively study the morphological impact of chemical and genetic perturbations in various cell contexts (Caicedo et al., 2016; Scheeder et al., 2018). One unbiased assay, called Cell Painting, stains for various cellular compartments and organelles using non-specific and inexpensive reagents (Gustafsdottir et al., 2013). Cell Painting has been used to identify small-molecule mechanisms of action (MOA), study the impact of overexpressing cancer mutations, and discover new bioactive mechanisms, among many other applications (Caicedo et al., 2018; Christoforow et al., 2019; Hughes et al., 2020; Pahl and Sievers, 2019; Rohban et al., 2017; Simm et al., 2018; Wawer et al., 2014). Additionally, Cell Painting can predict mammalian toxicity levels for environmental chemicals (Nyffeler et al., 2020) and some of its derived morphology measurements are readily interpreted by cell biologists and relate to cell health (Bray et al., 2016). However, no single assay enables discovery of fine-grained cell health readouts.

      We hypothesized that we could predict many cell health readouts directly from the Cell Painting data, which is already available for hundreds of thousands of perturbations. This would enable the rapid and interpretable annotation of small molecules or genetic perturbations. To do this, we first developed two customized microscopy assays, which collectively report on 70 different cell health indicators via a total of seven reagents applied in two reagent panels. Collectively, we call these assay panels “Cell Health”.

      To demonstrate proof of concept, we collected a small pilot dataset of 119 CRISPR knockout perturbations in three different cell lines using Cell Painting and Cell Health. We used the Cell Painting morphology readouts to train 70 different regression models to predict each Cell Health indicator independently. We used simple machine learning methods instead of a deep learning approach because of our limited sample size and the inability to increase it by linking single cell measurements from both assays. We predicted certain readouts, such as the number of S phase cells, with high performance, while performance on other readouts, such as DNA damage in G2 phase cells, was low. We applied and validated these models on a separate set of existing Cell Painting images acquired from 1,571 compound perturbations measured across six different doses from the Drug Repurposing Hub project (Corsello et al., 2017). We provide all predictions in an intuitive web-based application at http://broad.io/cell-health-app, so that others can extend our work and explore cell health impacts of specific compounds.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This report from Way et al describes a method of extending a very popular screening technology called Cell Painting developed by the Carpenter Lab. The authors are contending with an important issue and as such this paper potentially will be of great interest to the community. Cell Painting provides quantitative fingerprints of cell phenotypes in response to changes in the molecular or physiological status of cells. However the molecular basis or even the candidate pathways for those changes is not always clear. Here, the authors take specific markers of cell physiology, e.g., DNA damage, ROS production, cell cycle progression etc. and relate them to Cell Painting features. The authors are trying to address the issue that running many probes of cell physiology is expensive and time consuming and that identifying proxies for these assays using much simpler Cell Painting technologies would be a useful and potentially powerful approach. The overall goal is to develop some type of regression model that can link the state of cells (the "health") to Cell Painting fingerprints.

      The authors use three separate cell lines and CRISPR knockouts delivered through lentivirus that target 59 genes to establish a range of cell physiologies that they directly measure (the "Cell Health") and then relate to similar assays performed by Cell Painting. Ultimately they aim to use Cell Painting models to predict Cell Health.

      We thank the reviewer for their succinct summary of our goals and rationale for this manuscript, and for the constructive and valuable comments herein.

      **Major Issues:**

      It appears that the phenotypes that are detected at a high enough level of significance (see Fig. 2), e.g DNA damage (gH2Ax), apoptosis (Caspase 3/7), dead cells, ROS (CellROX), etc. are probably most easily detected by simply monitoring DAPI signal in these screens. To detect many of the phenotypes, the authors have presented a fairly complex method of doing much simpler assays. The authors correctly highlight in Fig. 3 that the phenotypes they are detecting go beyond pure signals from DAPI. They report power in their models from Radial Distribution across many different components of the Cell Painting feature set.

      We agree that the two assays we’re collectively calling “Cell Health” are indeed fairly complex - we use two different panels of multiplexed stains and a series of gating strategies to measure phenotypes in various cell subpopulations. However, the fundamental message in the manuscript is that we may no longer need to perform these complex assays if we get this information from the simpler Cell Painting assay.

      We agree that our machine learning approach to predict the various cell health phenotypes uses signals beyond nucleus-based stains. However, even if we are predicting just DAPI signals, this reinforces our argument that the specific stains in the Cell Health assays (which are commonly used in targeted experiments) are not necessary to measure specifically. Instead, in certain circumstances, a scientist should just use unbiased stains to capture their biology of interest, since the stains are cheaper at scale and one has access to much more information.

      It is also worth noting that the DNA damage phenotypes in specific cell subpopulations (e.g. DNA Damage in G1 cells) would not be possible to measure with high precision without EdU co-staining.

      However these appear to give outputs that won't be that useful. It is hard to tell whether this is simply because they don't have enough images or whether their signal is confounded by using cell lines where the lentivirus CRISPR knockouts are working less efficiently.

      (Reviewer 2 introduced a similar critique below, which we now move here) A fundamental issue that the authors mention but do not address is the efficiency of the CRISPR KOs. The authors should measure the efficiency of representative guides and present these data to help support the interpretation of their models.

      We definitely agree that sample size is a limitation in this manuscript. Our primary goal with this paper was to demonstrate feasibility of the approach to predict the targeted Cell Health readouts using a simpler (and more affordable/scalable) assay in Cell Painting. The promising results we observed, especially given this sample size limitation, motivate collecting a larger dataset using more perturbations.

      Potential confounding of the signal by low-efficiency CRISPR knockouts is also an interesting topic. We do provide Supplementary Figure S8 to describe a subtle relationship that we observed regarding CRISPR infection efficiency. We also discuss this in the results as: “We observed overall better predictivity in ES2 cells, which had the highest CRISPR infection efficiency (Supplementary Figure S8), suggesting that stronger perturbations provide better information for training and that training on additional data should provide further benefit.”

      Additionally, we made a substantial effort to maximize CRISPR efficiency by independently optimizing lentivirus volumes for each sgRNA. In general, we observed that some cell lines are easier to edit with CRISPR than others, probably because of factors beyond Cas9 expression. However, we note that CRISPR is being used simply as a perturbation to elicit a variable morphology response. In other words, the type, efficacy, and even accuracy of perturbation does not matter as long as it satisfies two constraints: 1) induces a morphology response for a sufficient number of perturbations, and 2) is consistent between the two assays (Cell Health and Cell Painting). Our setup satisfies both constraints.

      However, this experiment (and data from the experiment) can be used in other contexts in which the CRISPR efficiency is extremely important. Therefore, we added three columns to Supplementary Table 1 providing the efficiency readouts for the three cell lines. (This information was already present in GitHub, but we moved it to a more obvious location in Supplementary Table 1). Code describing this change can be viewed here: https://github.com/broadinstitute/cell-health/pull/142

      In regards to the first sentence of this concern: “However these appear to give outputs that won’t be that useful” - indeed, we fully expected that many cell health readouts would be difficult to predict. In the original submission, we included the following explanation for potential sources of low performing models: “Performance differences might result from random technical variation, small sample sizes for training models, different number of cells in certain Cell Health subpopulations (e.g. mitosis or polynuclear cells), fewer cells collected in the viability panel (see methods), or the inability of Cell Painting reagents to capture certain phenotypes.”

      It seems misleading (or perhaps the explanation lacks clarity) to describe in the same paragraph the need to validate the model by applying it to new datasets, namely the Drug Repurposing Hub project, then describe gradients in cell health features across UMAP coordinates.

      We thank the reviewer for pointing out this source of confusion and for providing an opportunity to improve the clarity of this section. Our major revisions here are as follows: 1) Introduce the Drug Repurposing Hub as an external dataset for validation; 2) Validate a high performing and simple model (number of live cells) by comparing model readout predictions from the Drug Repurposing Hub Cell Painting profiles against orthogonal PRISM viability readouts (in compounds with slightly different doses); 3) Validate three additional models: enrichment of proteasome inhibitors in the ROS model, enrichment of PLK inhibitors in the G1 cell count model, and enrichment of aurora kinase and tubulin inhibitors in the Number of gH2AX spots in G1 cells model; 4) Display a global structure of Cell Health predictions in UMAP space for select models. Note that for the fourth point, we are using the UMAP gradients to observe patterns, and not to validate models.

      In order to encapsulate the updated flow, we’ve pasted below the entire Drug Repurposing Hub results/discussion section, which introduces two additional analyses and new text in response to various other reviewer comments. We feel that the updated section improves clarity and purpose.

      The updated section now reads:

      “Predictive models of cell health would be most useful if they could be trained once and successfully applied to data sets collected separately from the experiment used for training. Otherwise one could not annotate existing datasets that lack parallel Cell Health results, and Cell Health assays would have to be run alongside each new dataset. We therefore applied our trained models to a large, publicly-available Cell Painting dataset collected as part of the Drug Repurposing Hub project (Corsello et al., 2017). The data derive from A549 lung cancer cells treated with 1,571 compound perturbations measured in six doses.

      We first chose a simple, high-performing model to validate. The number of live cells model captures the number of cells that are unstained by DRAQ7. We compared model predictions to orthogonal viability readouts from a third dataset: publicly available PRISM assay readouts, which count barcoded cells after an incubation period (Yu et al., 2016). Despite measuring perturbations with slightly different doses and being fundamentally different ways to count live cells (Figure 4A), the predictions correlated with the assay readout (Spearman's Rho = 0.35, p < 10^-3; Figure 4B).

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p < 10^-15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors yielded high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p < 10^-15) (Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).

      We applied uniform manifold approximation and projection (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Lastly, we observed moderate technical artifacts in the Drug Repurposing Hub profiles, indicated by high DMSO profile dispersion in the Cell Painting UMAP space (Supplementary Figure S14C). This represents an opportunity to improve model predictions with new batch effect correction tools. Additionally, it is important to note that the expected performance of each Cell Health model can only be as good as the performance observed in the original test set (see Figure 2), and that all predictions require further experimental validation.“
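      The rank correlation used in the section above to compare predicted live-cell counts with PRISM viability can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names and toy values. It assumes the two lists are already paired by compound and matched dose, which the real comparison only achieves approximately because the doses differ slightly between datasets.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy paired values standing in for predictions and assay readouts.
predicted_live_cells = [1.0, 2.0, 3.0, 4.0, 5.0]
prism_viability = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(predicted_live_cells, prism_viability)
```

      Because only ranks enter the statistic, the two readouts do not need to share units or scale, which suits a comparison between a predicted cell count and a barcode-based viability measure.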

      Updated Figure 4:

      Figure 4: Validating Cell Health models applied to Cell Painting data from The Drug Repurposing Hub. The models were not trained using the Drug Repurposing Hub data. (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data. This view indicates that there was not a one-to-one matching between perturbation doses. (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub. The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period. (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions. Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls in the Drug Repurposing Hub set; DMSO is a negative control. We also highlight all PLK inhibitors in the dataset. (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions. (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls. (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

      Updated Supplementary Figure:

      Supplementary Figure S14: Applying Uniform Manifold Approximation and Projection (UMAP) to Drug Repurposing Hub consensus profiles of 1,571 compounds across six doses. The models were not trained using the Drug Repurposing Hub data. (a) The point color represents the output of the Cell Health model trained to predict the number of cells in G1 phase (G1 cell count). (b) The same UMAP dimensions, but colored by the output of the Cell Health model trained to predict reactive oxygen species (ROS). (c) In the UMAP space, we highlight DMSO as a negative control, and Bortezomib and MG-132 as two positive controls (proteasome inhibitors) in the Drug Repurposing Hub set. We observe moderate batch effects in the negative control DMSO profiles, based on their spread in this visualization. The color represents the predicted number of live cells. The positive controls were acquired with a very high dose and are expected to result in a very low number of predicted live cells.

      All software updates required to update these figures can be viewed at https://github.com/broadinstitute/cell-health/pull/145

      Is it surprising that cell health phenotypes and gradients therein are present in a dataset describing cell health perturbations?

      This was not surprising to us, and we thank the reviewer for asking the question. We have now added a new Supplementary Figure to present a UMAP with ground truth cell health measurements in the CRISPR dataset (pasted below). By adding the figure, we show how Cell Health predictions are expected to show gradients in UMAP space. In fact, for any lower-dimensional embedding that is able to preserve local neighborhoods of the high-dimensional space, we should expect all linear transformations of the input data (in the high-dimensional space) to vary smoothly across the lower-dimensional embedding. However, it is still informative to observe where the specific Cell Health phenotype predictions manifest in relation to global morphology structure. We add the following sentence in the Drug Repurposing Hub paragraph juxtaposed to the other UMAP gradient observations:

      We applied uniform manifold approximation (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Supplementary Figure S15: Applying a Uniform Manifold Approximation (UMAP) to the Cell Painting consensus profile data of CRISPR perturbations. UMAP coordinates visualized by (a) cell line, (b) ground truth G1 cell counts, and (c) ground truth ROS counts. (d) Visualizing the distribution of ground truth ROS compared against G1 cell count. The two outlier ES2 profiles are CRISPR knockdowns of GPX4, which is known to cause high ROS.

      We have also added the option to explore the CRISPR profile Cell Health ground truth in our shiny app https://broad.io/cell-health (screenshot pasted below)

      Modifications to the software introducing these changes can be viewed at https://github.com/broadinstitute/cell-health/pull/141.

      The actual test of the model's performance is in the paragraph below, but the data associated with the Spearman correlation is hidden in Fig. S10b. The data is not convincing by eye, and the artifactually low p value suggests that proper statistical corrections were not applied.

      We have moved the Spearman correlation figure (previously Supplementary Figure S10B) into a main figure, along with a complete restructuring of the results and discussion in the Drug Repurposing Hub section.

      We appreciate the careful observations and interpretations, and confirm the statistical test performed here is sound and the p value is correct (there is no need to account for multiple testing since there is only one test being applied, a test of correlation between two variables).

      We add this rationale to the “Comparing viability predictions to an orthogonal readout” methods section:

      We performed the non-parametric Spearman correlation test because 1) the doses were not aligned between the datasets we compared, and 2) it is possible that a strong nonlinear correlation exists between readouts from two fundamentally different ways to measure viability.
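      As an illustration of this rationale, the following minimal sketch computes a Spearman correlation as the Pearson correlation of ranks, using only the Python standard library. The variable names and toy values are hypothetical and are not taken from the PRISM or Drug Repurposing Hub data; the example only shows that a monotonic but nonlinear relationship still yields a perfect rank correlation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy readouts: monotonic but strongly nonlinear.
prism_viability = [0.1, 0.4, 0.5, 0.9, 1.0]
predicted_live_cells = [v ** 3 for v in prism_viability]

rho = spearman(prism_viability, predicted_live_cells)
linear_r = pearson(prism_viability, predicted_live_cells)
print(f"Spearman rho = {rho:.3f}, Pearson r = {linear_r:.3f}")
```

      The rank transform is what makes the statistic insensitive to the shape of the relationship, which is the property we rely on when comparing two fundamentally different viability readouts.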

      It is definitely valid to critique the scatter plot relationship to understand that the mean squared error is quite high (i.e. if two datasets had viability measurements using the two approaches, it would be wrong to assume that lower measurements in one assay automatically could be compared to lower measurements of the other assay). This level of variability would be lost if all we did was report the test statistic, which is the reason why we included the scatter plot as a figure.

      It may also be important to mention that the authors of the PRISM paper also noted high variation in their estimates (from Corsello et al https://doi.org/10.1038/s43018-019-0018-6): "At the level of individual compound dose–responses, we note that the PRISM Repurposing dataset tends to be somewhat noisier, with a higher standard error estimated from vehicle control measurements (Extended Data Fig. 5c and Extended Data Fig. 6a–c)."

      Nevertheless, we agree that the current way we report this p value is distracting and potentially misleading, depending on how the p value is interpreted. Therefore, we have updated the reporting of all p values to say that they are less than a predefined cutoff. The figure now states that p

      Fig 1A and associated methods are not sufficient information to describe the manual gating strategy and any variability found across iterations in these gates. Effort should be taken to quantify where these manual boundaries were set and why.

      We describe the manual gating strategies in much detail in the methods section “Cell Health assay: Image analysis”. However, we agree that a description of measurement variability and experimental approach requires more detail, and we agree that the manuscript would benefit from a visual example of these gates. These improvements required us to rearrange Figure 1.

      With a goal of increasing reproducibility in the cell health assay, we’ve (1) moved example images of the Cell Health assay to Figure 1A; (2) moved the existing gating strategies drawing to Supplementary Figure 1; and (3) added real data examples of the manual gating strategy as a new Supplementary Figure 2. We show all updates below:

      Updated Figure 1:

      Figure 1. Data processing and modeling approach. (a) Example images and workflow from the Cell Health assays. We apply a series of manual gating strategies (see Methods) to isolate cell subpopulations and to generate cell health readouts for each perturbation. (top) In the “Cell Cycle” panel, in each nucleus we measure Hoechst, EdU, PH3, and gH2AX. (bottom) In the “Cell Viability” panel, we capture digital phase contrast images, measure Caspase 3/7, DRAQ7, and CellROX. (b) Example Cell Painting image across five channels, plus a merged representation across channels. The image is cropped from a larger image and shows ES2 cells. Below are the steps applied in an image-based profiling pipeline, after features have been extracted from each cell’s image. (c) Modeling approach where we fit 70 different regression models using CellProfiler features derived from Cell Painting images to predict Cell Health readouts.

      Updated Supplementary Figure S1:

      Supplementary Figure S1: Illustration of the gating strategy in the Cell Health assays. We extract 70 different readouts from the Cell Health imaging assay. The assay consists of two customized reagent panels, which use measurements from seven different targeted reagents and one channel based on digital phase contrast (DPC) imaging; shown are five toy examples to demonstrate that individual cells are isolated into subpopulations by various gating strategies to define the Cell Health readouts.

      Updated Supplementary Figure S2 (Example gating strategies):

      Supplementary Figure S2: Real data of manual gating in the Cell Health assays.

      For each cell line, we apply a series of manual gating strategies defined by various stain measurements in single cells to define cell subpopulations. (a) In the cell cycle panel, we first select cells that are useful for cell cycle analysis based on nucleus roundness and Hoechst intensity measurements. We also identify polyploid and “large not round” (polynuclear) cells. (b) We then subdivide the cells used for cell cycle to G1, G2, and S cells based on total Hoechst intensity (DNA content) and EdU incorporation signal intensity. (c) We use Hoechst and PH3 nucleus intensity to define mitotic cells. The points are colored by EdU intensity in the nucleus in both (b) and (c). (d) Example gating in the viability panel. We use DRAQ7 and CellEvent (Caspase 3/7) to distinguish alive and dead cells, and categorize early or late apoptosis. See Methods for more details about how the Cell Health measurements are made.

      We’ve also added the following to the methods section:

      Additionally, we set these gates for each cell subpopulation using a set of random wells from each cell line and experiment independently. We observed that the intensity measurements used to form the gates were consistent across wells and plates, and generally formed distinct cell subpopulation clusters. After using the random wells to set the gates, we used the Harmony microscope software to apply the gates to the remaining wells and plates.

      In general, however, the need to clearly define this process further emphasizes a strength of our approach: there is great potential for inconsistencies when different humans draw gates. We aim to reduce these inconsistencies by predicting these readouts from Cell Painting images directly.

      The authors conclude that their results motivate further data acquisition and model training, and that this will improve model performance. This is only true if their lack of predictive power comes from the data volume itself, and not in larger problems of data quality, variability and the core assumptions of their method. The authors note the better predictability in ES2 cells, likely due to higher CRISPR efficiency and therefore stronger phenotypes. It is possible, as I believe the authors suggest, that the ES2 cells provide information that improves the predictive power of cells with poor infection efficiency. It is instead possible that only the ES2 cells with strong phenotypes yield predictive power, pulling the average of the dataset up. Authors could train the cell line specific datasets independently and compare relative changes in predictive performance. Otherwise, is it possible that subtle or highly complex phenotypes simply cannot be detected by this method and more data will be unlikely to improve predictability in modest perturbations.

      We thank the reviewers for raising this possibility. To explore this, we performed a cell-line holdout analysis in which we retrained (and individually reoptimized) all 70 cell health models on every combination of two cell lines and predicted readouts from the held out third cell line.

      Despite there being fewer samples in the training set in the cell line holdout test compared to the original test set (66% vs. 85%) and the fact that each model had never seen the held out cell line before, many cell health phenotypes could still be predicted. We add the following results in a new Supplementary Figure:

      Supplementary Figure S11: Results from a cell line holdout analysis. We trained and evaluated all 70 cell health models in three different scenarios using each combination of two cell lines to train, and the remaining cell line to evaluate. For example, we trained all 70 models using data from A549 and ES2 and evaluated performance in HCC44. We bin all cell health models into 14 different categories (see Supplementary Table S3 and https://github.com/broadinstitute/cell-health/6.ml-robustness for details about the categories and scores). We also provide the original test set (15% of the data, distributed evenly across all cell types) performance in the last row, as well as results after training with randomly permuted data. This cross-cell-type analysis yields worse performance overall. Nevertheless, despite the models never encountering certain cell lines, and having fewer training data points, many models still have predictive power across cell line contexts. Note that we truncated the y axis to remove extreme outliers far below -1. The raw scores are available on https://github.com/broadinstitute/cell-health.

      We’ve also performed a sample size titration analysis, which suggests that more data would indeed improve model performance. More data would also enable a deep learning approach, which is also likely to improve performance.

      Supplementary Figure S13: Dropping samples from training reduces test set model performance in high, mid, and low performing models. We determined model performance stratification by taking the top third, mid third, and bottom third of test set performance when using all data. We performed the sample titration analysis with 10 different random seeds and visualized the median test set performance for each model.

      We also update the results section to introduce and discuss this result:

      Lastly, we performed a sample size titration analysis in which we randomly removed an increasing number of samples from training. For the high and mid performing models, we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).

      And an updated methods section describing this analysis now reads:

      Machine learning robustness: Investigating the impact of sample size

      We performed an analysis in which we randomly dropped an increasing amount of samples from the training set before model training. After dropping the predefined number of samples, we retrained all 70 cell health models and assessed performance on the original holdout test set. We performed this procedure ten times with ten unique random seeds to mirror a more realistic scenario of new data collection and to reduce the impact of outlier samples on model training.
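      The subsampling step of this procedure can be sketched as follows. This is an illustration under stated assumptions rather than the code in our repository: the function name, the sample identifiers, and the drop sizes are hypothetical, and the model retraining and scoring step is left out.

```python
import random


def titration_subsets(train_ids, n_drop_values, seeds):
    """Yield (n_dropped, seed, kept_ids) for every drop size and random seed.

    For each predefined number of samples to drop, a seeded generator picks
    which training samples are kept, so every subset is reproducible.
    """
    for n_drop in n_drop_values:
        for seed in seeds:
            rng = random.Random(seed)
            kept = rng.sample(train_ids, len(train_ids) - n_drop)
            yield n_drop, seed, kept


# Hypothetical training set of 100 sample identifiers.
train_ids = [f"profile_{i}" for i in range(100)]
n_drop_values = [0, 20, 40, 60]  # hypothetical drop sizes
seeds = range(10)                # ten unique random seeds

subsets = list(titration_subsets(train_ids, n_drop_values, seeds))
for n_drop, seed, kept in subsets[:3]:
    # A real analysis would retrain each model on `kept` here and score it
    # on the original, untouched holdout test set.
    print(n_drop, seed, len(kept))
```

      Keeping the test set fixed while only the training subset changes is what lets the resulting performance curves be compared across drop sizes.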

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      Although the authors argue that the Cell Painting assay is capturing complex health phenotypes using a variety of morphological features, there is a clear overweighting of a particular few (in fact two...). It would be interesting to systematically retrain with exclusion of particular features to determine if equalizing the weight across features changes performance. These are also notably the feature groups with the fewest features-- how many individual features within these feature groups are pulling all the weight?

      We agree that an additional computational analysis including a systematic feature removal would be interesting and valuable. We’ve included this analysis as part of a new results subsection in which we assess where classification improvements are likely to come from by testing robustness of the ML models.

      Specifically, we’ve systematically removed individual features that belong to specific feature groups, channels, and compartments to determine how much their absence negatively affects model performance. The added supplementary figure is pasted below.

      Supplementary Figure S12: Systematically removing classes of features has little impact on most models’ performance. We retrained all 70 cell health models after dropping features associated with specific (a) feature groups, (b) channels, and (c) compartments. Each dot is one model (predictor), and the performance difference between the original model and the retrained model after dropping features is shown on the x axis. Any positive change indicates that the models got worse after dropping the feature group. (d) Individual model differences in performance after dropping features. Each dot is one class of features removed (as in a-c).

      We conclude that the majority of cell health models are robust to missing feature groups. Some models actually improve with a reduction in the feature space. Combined with the feature heatmap presented in Figure 3, these results tell us that a lot of the morphology signal is redundant across Cell Painting features.

      We add the following text to the results:

      We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that most models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between readouts and individual Cell Painting features.

      And the following to the methods:

      Machine learning robustness: Systematically removing feature classes

      We performed an analysis in which we systematically dropped features measured in specific compartments (Nuclei, Cells, and Cytoplasm), specific channels (RNA, Mito, ER, DNA, AGP), and specific feature groups (Texture, Radial Distribution, Neighbors, Intensity, Granularity, Correlation, Area Shape) and retrained all models. We omitted one feature class and then independently optimized all 70 cell health models as described in the Machine learning framework results section above. We repeated this procedure once per feature class.
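      A minimal sketch of the feature-dropping step is shown below. It assumes that feature names follow the CellProfiler-style pattern `Compartment_FeatureGroup_..._Channel` (for example `Cells_Intensity_MeanIntensity_DNA`), with feature groups written without spaces; the helper names and example features are hypothetical, and the retraining step is omitted.

```python
COMPARTMENTS = ("Nuclei", "Cells", "Cytoplasm")
CHANNELS = ("RNA", "Mito", "ER", "DNA", "AGP")
FEATURE_GROUPS = (
    "Texture", "RadialDistribution", "Neighbors", "Intensity",
    "Granularity", "Correlation", "AreaShape",
)


def belongs_to(feature, feature_class):
    """Return True if a feature name belongs to the given feature class."""
    parts = feature.split("_")
    if feature_class in COMPARTMENTS:
        return parts[0] == feature_class
    if feature_class in FEATURE_GROUPS:
        return len(parts) > 1 and parts[1] == feature_class
    # Otherwise treat the class as a channel: match a whole name token.
    return feature_class in parts[2:]


def drop_feature_class(features, feature_class):
    """Return the features that remain after omitting one feature class."""
    return [f for f in features if not belongs_to(f, feature_class)]


# Hypothetical example feature names.
features = [
    "Nuclei_AreaShape_Area",
    "Cells_Intensity_MeanIntensity_DNA",
    "Cytoplasm_Texture_Entropy_Mito_5_0",
    "Cells_Correlation_Correlation_DNA_ER",
]

# One retraining run per feature class: 3 compartments, 5 channels, 7 groups.
for feature_class in COMPARTMENTS + CHANNELS + FEATURE_GROUPS:
    remaining = drop_feature_class(features, feature_class)
    # A real analysis would re-optimize all models on `remaining` here.
    print(feature_class, len(remaining))
```

      Matching whole underscore-separated tokens, rather than substrings, avoids accidentally dropping features whose names merely contain a channel abbreviation such as `ER`.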

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      In summary there is a very interesting concept here, but for several possible, currently undefined reasons, the authors are reporting a very weak measurement. The authors allude to these limitations, but it would be great if the authors could address these issues and provide a stronger dataset.

      We thank the reviewers for their encouraging remarks. We believe that with the added robustness analyses and with increased clarity about the motivation behind the paper, we’ve successfully demonstrated a proof of concept for the approach to predict cell health phenotypes from Cell Painting images. We believe that we’ve provided sufficient evidence to a reader to demonstrate the benefits of the prediction approach. As well, given the additional details describing the Cell Health assay reproducibility, we believe that the paper also successfully introduces a new assay paradigm.

      Furthermore, while many of the cell health measurements are definitely weak (and unreliable), it is not fair to generalize all predictions as weak (especially given the sample size limitations).

      It is also worth noting that, under the current circumstances, separating the one dataset we have into a train/test set and validating the model in an external set is the best we could do; we do not have additional budget to run further wet lab experiments (which would also face a COVID backlog in our chemical screening group). We agree that additional datasets would benefit the field; our current data is now public, all of our future data will be public (to the extent possible), and we hope that others building on our work will make their data public too to address these questions.

      Lastly, in response to the “currently undefined reasons” comment, as well as other comments throughout, we’ve now included a new subsection in the Results/Discussion section to more directly address some of the reasons why many models may have underperformed. Specifically, and as mentioned previously in this response, we perform three distinct robustness analyses: 1) cell line holdout; 2) feature holdout; 3) sample size titration.

      Authors should include representative images of their Cell Health assay in the main figures. A full figure of all labels and examples of manual gating should be included (S1 is too limited)

      Scale bars need to be included in all images, some are missing in S1

      We thank the reviewers for this suggestion. We have since substantially updated figure 1 and supplementary figure S1. We have also added a new supplementary figure S2 as an example of the manual gating strategies, and we have updated all scale bars appropriately. We’ve attached the specific figure updates in an earlier response.

      "20x water objective in confocal mode" is not a sufficient level of detail on image acquisition parameters especially considering the lack of representative images. At the very least, NA and if appropriate pinhole size should be reported. Similarly, "9 FOV per well" is not sufficient. Pixel size and FOV area/dimensions are necessary.

      We have added these necessary details in their respective methods sections:

      We acquired all cell images using an Opera Phenix High Content Imaging Instrument (PerkinElmer) with a 20X water objective (a numerical aperture (NA) of 1.0), in confocal mode (a pinhole size of 50µm). The effective pixel size was 0.65µm/pixel. We acquired images in four channels using default excitation / emission combinations: for the blue channel (Hoechst) 405/435-480; for the green channel (Alexa 488 and CellEvent) 488/500-550; for the orange channel (Alexa 568 and CellRox Orange) 561/570-630 and for the far-red channel (Alexa 647 and DRAQ7) 640/650-760. We applied the Cell Health reagents for cell viability and for cell cycle in two separate plates.

      The legends for the different parts of Fig S10 are transposed which makes the figure quite confusing. The authors should amend or clarify the language of "guide perturbation" and "guide profile".

      Wow! We thank the reviewers for pointing out this oversight, and for their careful attention to detail. This figure is now completely different after the restructuring of the Drug Repurposing Hub results/discussion section. The legends for all figures are now correct.

      EdU is defined after it is abbreviated in methods

      We thank the reviewers for noting this. We’ve now fixed where these acronyms are defined in the methods section and removed their definitions in later sections where redundant.

      The authors should address the following image processing reproducibility concerns:

      Segmentation and feature extraction parameters are not included in the Supplementary Information. Either attach the CellProfiler pipeline or add a table with parameters and settings used for each module.

      CellProfiler and Harmony versions are missing.

      We thank the reviewers for pointing out these very important omissions. We have since rectified them in the methods section:

      We built a CellProfiler image analysis and illumination correction pipeline (version 2.2.0) to extract these image-based features (McQuin et al., 2018). We include the CellProfiler pipelines in our github repository.

      We developed and ran two distinct image analysis pipelines in Harmony software (version 4.1; PerkinElmer) for each of the Cell Health plates.

      We also add the CellProfiler pipelines to our GitHub repository. A pull request introducing this change can be viewed here: https://github.com/broadinstitute/cell-health/pull/149

      Subpopulation definition (page 14) should be defined in a way that the algorithms (pipelines) could be reproduced, e.g.: "unusually high intensity of Hoechst max" requires a stricter definition.

      These definitions are subjective by nature. Gating decisions will be different depending on the scientist performing the image analysis. We feel that the sentence: “We excluded outlier nuclei with unusually high intensity of Hoechst max” conveys this subjectivity well. One of the strengths of the proposed approach to predict cell health phenotypes directly from the Cell Painting images is the removal of gating subjectivity.

      Why is the nucleus roundness calculated in PE Harmony and not in the CellProfiler pipeline itself?

      We used the nucleus roundness measurements as calculated in PE Harmony to define the “cells selected for cell cycle” subpopulation in the first panel of the Cell Health assay. That is, this measurement was integral to the Cell Health assay itself. We believe that the addition of example gates (in supplementary figure 2) clears up this confusion.

      Reviewers:

      Jason Swedlow

      Melpi Platani

      Erin Diel

      Emil Rozbicki

      Reviewer #2 (Significance (Required)):

      Nature and Significance: This study aims to demonstrate how phenotypic studies using different markers can be combined and linked to deliver wider application and value.

      Relationship to Published Work: This study extends previous work from the same group and attempts a novel extension. The approach is a useful concept and potentially important.

      Audience: The method this paper proposes will be of interests to scientists involved with drug discovery and/or computational biology.

      Reviewer's Expertise: Cell Biology, Imaging, Imaging Informatics, Machine Learning, Computer Vision

      We would like to again express thanks to these reviewers for their careful read, very helpful comments, and encouraging remarks.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors present a novel idea on predicting various cell health readouts based on a general set of markers and cell painting assay. The cell health readouts are based on more specific markers performed in different assays measuring cell proliferation and death. The authors suggest that such an approach can reduce the number of experiments needed. The paper is well written, and the figures are clear and comprehensive.

      We thank the reviewer for their helpful comments and encouragement!

      **Major comments:**

      Some of the health readouts are based on general morphology (cell and nucleus) which can be obtained based on cell painting assay. Although some of these models perform well, it is surprising that the model of nuclear roundness did not perform very well especially for HCC44 (R-square reaching zero). This is surprising as these data can be extracted from cell painting assays. Can the author elaborate on why this is the case?

      We agree that the performance of the live cell roundness and nucleus roundness models was unexpectedly low. One would expect that these shape features, as measured by PerkinElmer Harmony software, would be easily predicted from CellProfiler readouts from the Cell Painting assay.

      The roundness property was used in Harmony versions,

      2*sqrt(π)*sqrt(Area-BorderArea/2.0)/BorderArea-0.1)

      where Area is object area in pixels and BorderArea is border area in pixels (we thank Joe Trask, Olavi Ollikainen, Hartwig Preckel, and Kaupo Palo at PerkinElmer for this information.)

      No single feature in the CellProfiler readouts measures roundness directly; instead, CellProfiler measures a combination of shape features that together could synthesize the idea of “roundness”. However, given that the elastic net approach is well-suited for this type of synthesis, it remains unclear why roundness is not predicted well.

      One possible explanation is that shape features differ the most across cell lines and are measured precisely in both assays. Precise measurements, coupled with our training strategy of using all three lines together, might lead to poor performance in predicting certain cell-line intrinsic features.

      We tested this shape result directly (and the other cell health features more generally) in a “cell line holdout” analysis, which we describe in more detail in response to the next comment. In this analysis, we trained on every combination of two cell lines and applied the trained models to the third, testing how well models generalized to cell lines not encountered in the training process. We observed that cell line intrinsic features, like shape, are predicted poorly if a model was not trained using the cell line.
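      The splitting logic of the cell line holdout analysis can be sketched as follows. This is a simplified illustration, not the repository code: the function name and the toy profile records are hypothetical, and model training and evaluation are omitted.

```python
from itertools import combinations

CELL_LINES = ("A549", "ES2", "HCC44")


def holdout_splits(profiles, cell_lines=CELL_LINES):
    """Yield (train_lines, held_out_line, train_rows, test_rows).

    Each split trains on all cell lines but one and evaluates on the
    remaining line, which the model never sees during training.
    """
    for train_lines in combinations(cell_lines, len(cell_lines) - 1):
        held_out = next(c for c in cell_lines if c not in train_lines)
        train = [p for p in profiles if p["cell_line"] in train_lines]
        test = [p for p in profiles if p["cell_line"] == held_out]
        yield train_lines, held_out, train, test


# Hypothetical toy profiles: two per cell line.
profiles = [
    {"id": i, "cell_line": line}
    for i, line in enumerate(CELL_LINES * 2)
]

splits = list(holdout_splits(profiles))
for train_lines, held_out, train, test in splits:
    # A real analysis would fit each model on `train` and score it on `test`.
    print(train_lines, "->", held_out, len(train), len(test))
```

      Because the split is made by cell line rather than by random sampling, a poor score on the held-out line reflects a failure to transfer across cell line contexts rather than ordinary overfitting.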

      Using elastic net regression models is well-suited to the problem due to the low number of observations. However, there is a significant difference between the performance of different cell lines. Does the performance of the models improve if different models were trained for every cell line? Leave one out approach can be used to accommodate the scarcity of samples.

      We thank the reviewer for this important question. We also appreciate how different certain models behaved with certain cell lines. We would like to stress that the results presented here represent a small pilot study that is not meant to optimize model performance. Instead, the motivation of the manuscript is to demonstrate proof-of-concept of the approach to predict specific cell health phenotypes directly from Cell Painting images. We believe that the current results demonstrate positive proof, which warrants an expansion of data collection and an improvement of the classification methodology.

      Nevertheless, with our current data, we can answer an important question about the feasibility of signal transfer between cell lines. Therefore, we performed an additional “cell line holdout” analysis. We believe that the cell line holdout analysis tells us that signals can be transferred across contexts, but that any leading observations must be followed up with experiments performed directly in the cell line of interest. This signal transfer is diluted compared to the original test set performance, but it is also worth noting that the models presented in Supplementary Figure 11 (pasted below) were trained on only 66% of the data in the holdout cell line analysis and 85% of the data in the original analysis.

      Supplementary Figure S11: Results from a cell line holdout analysis. We trained and evaluated all 70 cell health models in three different scenarios using each combination of two cell lines to train, and the remaining cell line to evaluate. For example, we trained all 70 models using data from A549 and ES2 and evaluated performance in HCC44. We bin all cell health models into 14 different categories (see Supplementary Table S3 and https://github.com/broadinstitute/cell-health/6.ml-robustness for details about the categories and scores). We also provide the original test set (15% of the data, distributed evenly across all cell types) performance in the last row, as well as results after training with randomly permuted data. This cross-cell-type analysis yields worse performance overall. Nevertheless, despite the models never encountering certain cell lines, and having fewer training data points, many models still have predictive power across cell line contexts. Note that we truncated the y axis to remove extreme outliers far below -1. The raw scores are available on https://github.com/broadinstitute/cell-health.

      And we add the following text to the results section:

      We performed a series of analyses to determine certain parameters and options that are likely to improve models in the future. First, we performed a “cell line holdout” analysis, in which we trained models on two of three cell lines and predicted cell health readouts on the held out cell line. We observed that certain models including those based on viability, S phase, early mitotic and death phenotypes could be moderately predicted in cell lines agnostic to training (Supplementary Figure 11). Not surprisingly, shape-based phenotypes could not be predicted in holdout cell lines, which emphasizes the limitations of transferring certain cell-line specific measurements across cell lines.

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      The authors chose to validate based on the number of live cells as it is one of the best models. However, this readout can be obtained using simple viability assays. It would be more convincing to validate on a more complex phenotype that can only be attained using imaging such as #gH2AX spots.

      It is worth noting that we do also show generalizability in the Drug Repurposing Hub for two other models: ROS and G1 cell count. We show that proteasome inhibitors significantly induce high ROS and PLK inhibitors restrict entry to G1. We have also added enrichment tests demonstrating high statistical significance for these compound mechanisms.

      While we recognize that these two examples provide anecdotal evidence, they suggest the ability and power of the approach to assign phenotypes to Cell Painting images.

      Nevertheless, we thank the reviewer for bringing up this critical point and certainly appreciate the benefit of validating a gH2AX model. Therefore, we’ve added a similar analysis in which we demonstrate generalizability of the top performing gH2Ax model: Number of gH2AX spots in G1 cells. We discuss these changes in an updated section:

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p -15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). In addition, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 × 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors yielded high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).
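
      For readers unfamiliar with how an odds ratio (OR) and an enrichment p value relate to a 2x2 table, a small sketch follows. The response does not name the test used, so the one-sided Fisher's exact test here is an assumption, and the counts are hypothetical rather than taken from the Drug Repurposing Hub analysis.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (b and c must be non-zero)."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p value: P(top-left cell >= a) with fixed margins."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    denom = comb(total, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / denom


# Hypothetical counts. Rows: compound in the class of interest (yes / no);
# columns: prediction above a chosen threshold (yes / no).
table = (8, 2, 1, 5)
print("OR =", odds_ratio(*table))
print("p  =", fisher_exact_greater(*table))
```

      An OR above 1 means the class is over-represented among high predictions; an OR below 1, as reported for PLK inhibitors and G1 cell count, means it is under-represented.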

      We also modify the abstract summarizing this result:

      For Cell Painting images from a set of 1,500+ compound perturbations across multiple doses, we validated predictions by orthogonal assay readouts, and by confirming mitotic arrest, ROS, and DNA damage phenotypes via PLK, proteasome, and aurora kinase/tubulin inhibition, respectively.

      And we add this analysis to an updated Figure 4:

      Figure 4: Validating Cell Health models applied to Cell Painting data from The Drug Repurposing Hub. The models were not trained using the Drug Repurposing Hub data. (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data. This view indicates that there was not a one-to-one matching between perturbation doses. (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub. The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period. (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions. Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls in the Drug Repurposing Hub set; DMSO is a negative control. We also highlight all PLK inhibitors in the dataset. (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions. (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls. (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

      All software updates required to update these figures can be viewed at https://github.com/broadinstitute/cell-health/pull/145

      It is also worth noting that collecting more data for this manuscript is not currently feasible given the number of projects backlogged by COVID. Given that the motivation of the project is to demonstrate the feasibility of the approach, we feel that our current training/testing machine learning framework, together with the application to Drug Repurposing Hub data, is sufficient.

      The text would benefit from expanding the discussion to include the advantages and limitations of their approach.

      We thank the reviewer for bringing up this concern, and we agree that it is worth an increased discussion about advantages and limitations of the approach. Indeed, we’ve added a full new results/discussion subsection directly testing many of the assumptions for why some models performed well and others didn’t. The new section introduces many model limitations:

      We performed a series of analyses to determine certain parameters and options that are likely to improve models in the future. First, we performed a “cell line holdout” analysis, in which we trained models on two of three cell lines and predicted cell health readouts on the held out cell line. We observed that certain models including those based on viability, S phase, early mitotic and death phenotypes could be moderately predicted in cell lines agnostic to training (Supplementary Figure 11). Not surprisingly, shape-based phenotypes could not be predicted in holdout cell lines, which emphasizes the limitations of transferring certain cell-line specific measurements across cell lines. We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that many models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between phenotypes and individual Cell Painting features. Lastly, we performed a sample size titration analysis in which we randomly removed a decreasing amount of samples from training. For the high and mid performing models we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).
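
      The feature removal analysis can be pictured as filtering the feature list before retraining. The sketch below assumes CellProfiler-style feature names of the form `Compartment_FeatureGroup_..._Channel`; both that naming pattern and the helper `drop_features` are assumptions for illustration, not the repository's implementation.

```python
def drop_features(feature_names, compartment=None, group=None, channel=None):
    """Return the feature names that do NOT match any of the given filters."""
    kept = []
    for name in feature_names:
        parts = name.split("_")
        hit = (
            (compartment is not None and parts[0] == compartment)
            or (group is not None and len(parts) > 1 and parts[1] == group)
            or (channel is not None and channel in parts[2:])
        )
        if not hit:
            kept.append(name)
    return kept


# Hypothetical feature names in the assumed naming pattern.
features = [
    "Cells_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Cytoplasm_Texture_Contrast_RNA_5_0",
    "Cells_RadialDistribution_FracAtD_Mito_1of4",
]
without_dna = drop_features(features, channel="DNA")
without_cells = drop_features(features, compartment="Cells")
print(without_dna)
print(without_cells)
```

      Retraining on each reduced list and comparing test performance with the full model gives the per-class robustness summarized in the response.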

      **Minor comments**

      Page 8: The authors visualize the predicted G1 cell count and ROS overlaid on a UMAP based on Cell Painting data from the Drug Repurposing Hub. How do these visualisations look if applied to the original CRISPR training data?

      We address this comment by adding a supplementary figure showing ground truth G1 cell count and ROS readouts.

      We applied uniform manifold approximation (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Where Supplementary Figure 15 is pasted below:

      Supplementary Figure S15: Applying a Uniform Manifold Approximation (UMAP) to the Cell Painting consensus profile data of CRISPR perturbations. UMAP coordinates visualized by (a) cell line, (b) ground truth G1 cell counts, and (c) ground truth ROS counts. (d) Visualizing the distribution of ground truth ROS compared against G1 cell count. The two outlier ES2 profiles are CRISPR knockdowns of GPX4, which is known to cause high ROS.

      We have also added the option to explore the CRISPR profile Cell Health ground truth in our shiny app https://broad.io/cell-health (screenshot pasted below)

      Modifications to the software introducing these changes can be viewed at https://github.com/broadinstitute/cell-health/pull/141.

      The second part of the last paragraph on page 8 is confusing as it is not related to the first part using the PRISM data.

      We thank the reviewer for noting this. We agree that the clarity of this section could be improved. We have now completely reworked the final section of applying the cell health models to the Drug Repurposing Hub data.

      In particular, we’ve moved the PRISM data section to come first, as the simplest model to validate, and moved these results to Figure 4. We then describe validation for three other models: ROS, G1 cell count and Number of gH2AX spots in G1 cells. We end with the UMAP discussion, which is the original second part of the last paragraph on page 8.

      The PRISM section now reads:

      We first chose a simple, high-performing model to validate. The number of live cells model captures the number of cells that are unstained by DRAQ7. We compared model predictions to orthogonal viability readouts from a third dataset: Publicly available PRISM assay readouts, which count barcoded cells after an incubation period (Yu et al., 2016). Despite measuring perturbations with slightly different doses and being fundamentally different ways to count live cells (Figure 4A), the predictions correlated with the assay readout (Spearman's Rho = 0.35, p -3; Figure 4B).
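
      Spearman's rho, the statistic reported above, is the Pearson correlation of the ranks of the two variables. The sketch below is a plain standard-library version with average ranks for ties; the toy values are hypothetical and unrelated to the PRISM data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy example: predicted live-cell score vs. an orthogonal viability readout.
predicted = [1, 2, 3, 4, 5]
readout = [2, 1, 4, 3, 5]
print(spearman_rho(predicted, readout))
```

      Because only ranks enter the calculation, the statistic is insensitive to the two assays reporting viability on different scales.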

      Reviewer #3 (Significance (Required)):

      This approach can be of wide interest as it is easy to implement, cost-effective and leads to interpretable models. It would be interesting to see if the results improve when increasing the sample size. Another aspect that can be useful to investigate in the future is whether including a separate marker that indicates infected cells only in the more detailed assays would result in better accuracies.

      We thank the reviewer for their enthusiasm and for this concluding idea. Indeed, we also feel that including a separate marker to indicate infected cells could lead to improved accuracy. We add this thought to the concluding section as a future direction. The full updated conclusion reads as follows:

      We have demonstrated feasibility that information in Cell Painting images can predict many different Cell Health indicators even when trained on a small dataset. The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The authors present a novel idea on predicting various cell health readouts based on a general set of markers and cell painting assay. The cell health readouts are based on more specific markers performed in different assays measuring cell proliferation and death. The authors suggest that such an approach can reduce the number of experiments needed. The paper is well written, and the figures are clear and comprehensive.

      Major comments:

      Some of the health readouts are based on general morphology (cell and nucleus), which can be obtained from the Cell Painting assay. Although some of these models perform well, it is surprising that the model of nuclear roundness did not perform very well, especially for HCC44 (R-square reaching zero), as these data can be extracted from Cell Painting assays. Can the authors elaborate on why this is the case?

      Using elastic net regression models is well-suited to the problem due to the low number of observations. However, there is a significant difference between the performance of different cell lines. Does the performance of the models improve if different models are trained for every cell line? A leave-one-out approach can be used to accommodate the scarcity of samples.

      The authors chose to validate based on the number of live cells as it is one of the best models. However, this readout can be obtained using simple viability assays. It would be more convincing to validate on a more complex phenotype that can only be attained using imaging such as #gH2AX spots.

      The text would benefit from expanding the discussion to include the advantages and limitations of their approach.

      Minor comments

      Page 8: The authors visualize the predicted G1 cell count and ROS overlaid on a UMAP based on Cell Painting data from the Drug Repurposing Hub. How do these visualisations look if applied to the original CRISPR training data?

      The second part of the last paragraph on page 8 is confusing as it is not related to the first part using the PRISM data.

      Significance

      This approach can be of wide interest as it is easy to implement, cost-effective and leads to interpretable models. It would be interesting to see if the results improve when increasing the sample size. Another aspect that can be useful to investigate in the future is whether including a separate marker that indicates infected cells only in the more detailed assays would result in better accuracies.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This report from Way et al describes a method of extending a very popular screening technology called Cell Painting developed by the Carpenter Lab. The authors are contending with an important issue and as such this paper potentially will be of great interest to the community. Cell Painting provides quantitative fingerprints of cell phenotypes in response to changes in the molecular or physiological status of cells. However the molecular basis or even the candidate pathways for those changes is not always clear. Here, the authors take specific markers of cell physiology, e.g., DNA damage, ROS production, cell cycle progression etc. and relate them to Cell Painting features. The authors are trying to address the issue that running many probes of cell physiology is expensive and time consuming and that identifying proxies for these assays using much simpler Cell Painting technologies would be a useful and potentially powerful approach. The overall goal is to develop some type of regression model that can link the state of cells (the "health") to Cell Painting fingerprints.

      The authors use three separate cell lines and CRISPR knockouts delivered through lentivirus that target 59 genes to establish a range of cell physiologies that they directly measure (the "Cell Health") and then relate to similar assays performed by Cell Painting. Ultimately they aim to use Cell Painting models to predict Cell Health.

      Major Issues:

      It appears that the phenotypes that are detected at a high enough level of significance (see Fig. 2), e.g DNA damage (gH2Ax), apoptosis (Caspase 3/7), dead cells, ROS (CellROX), etc. are probably most easily detected by simply monitoring DAPI signal in these screens. To detect many of the phenotypes, the authors have presented a fairly complex method of doing much simpler assays. The authors correctly highlight in Fig. 3 that the phenotypes they are detecting go beyond pure signals from DAPI. They report power in their models from Radial Distribution across many different components of the Cell Painting feature set. However these appear to give outputs that won't be that useful. It is hard to tell whether this is simply because they don't have enough images or whether their signal is confounded by using cell lines where the lentivirus CRISPR knockouts are working less efficiently.

      It seems misleading (or perhaps the explanation lacks clarity) to describe in the same paragraph the need to validate the model by applying it to new datasets, namely the Drug Repurposing Hub project, then describe gradients in cell health features across UMAP coordinates. Is it surprising that cell health phenotypes and gradients therein are present in a dataset describing cell health perturbations? The actual test of the model's performance is in the paragraph below, but the data associated with the Spearman correlation is hidden in Fig. S10b. The data is not convincing by eye, and the artifactually low p value suggests that proper statistical corrections were not applied.

      Fig 1A and associated methods are not sufficient information to describe the manual gating strategy and any variability found across iterations in these gates. Effort should be taken to quantify where these manual boundaries were set and why.

      A fundamental issue that the authors mention but do not address is the efficiency of the CRISPR KOs. The authors should measure the efficiency of representative guides and present these data to help support the interpretation of their models.

      The authors conclude that their results motivate further data acquisition and model training, and that this will improve model performance. This is only true if their lack of predictive power comes from the data volume itself, and not from larger problems of data quality, variability and the core assumptions of their method. The authors note the better predictability in ES2 cells, likely due to higher CRISPR efficiency and therefore stronger phenotypes. It is possible, as I believe the authors suggest, that the ES2 cells provide information that improves the predictive power of cells with poor infection efficiency. It is instead possible that only the ES2 cells with strong phenotypes yield predictive power, pulling the average of the dataset up. Authors could train the cell-line-specific datasets independently and compare relative changes in predictive performance. Otherwise, is it possible that subtle or highly complex phenotypes simply cannot be detected by this method, and that more data will be unlikely to improve predictability in modest perturbations?

      Although the authors argue that the Cell Painting assay is capturing complex health phenotypes using a variety of morphological features, there is a clear overweighting of a particular few (in fact two...). It would be interesting to systematically retrain with exclusion of particular features to determine if equalizing the weight across features changes performance. These are also notably the feature groups with the fewest features-- how many individual features within these feature groups are pulling all the weight?

      In summary there is a very interesting concept here, but for several possible, currently undefined reasons, the authors are reporting a very weak measurement. The authors allude to these limitations, but it would be great if the authors could address these issues and provide a stronger dataset.

      Minor issues:

      Authors should include representative images of their Cell Health assay in the main figures. A full figure of all labels and examples of manual gating should be included (S1 is too limited). Scale bars need to be included in all images; some are missing in S1.

      "20x water objective in confocal mode" is not a sufficient level of detail on image acquisition parameters especially considering the lack of representative images. At the very least, NA and if appropriate pinhole size should be reported. Similarly, "9 FOV per well" is not sufficient. Pixel size and FOV area/dimensions are necessary.

      The legends for the different parts of Fig S10 are transposed, which makes the figure quite confusing. The authors should amend or clarify the language of "guide perturbation" and "guide profile".

      EdU is defined after it is abbreviated in the methods.

      The authors should address the following image processing reproducibility concerns:

      Segmentation and feature extraction parameters are not included in the Supplementary Information. Either attach the CellProfiler pipeline or add a table with parameters and settings used for each module.

      CellProfiler and Harmony versions are missing.

      Subpopulation definition (page 14) should be defined in a way that the algorithms (pipelines) could be reproduced, e.g.: "unusually high intensity of Hoechst max" requires a stricter definition.

      Why is the nucleus roundness calculated in PE Harmony and not in the CellProfiler pipeline itself?

      Reviewers: Jason Swedlow, Melpi Platani, Erin Diel, Emil Rozbicki

      Significance

      Nature and Significance: This study aims to demonstrate how phenotypic studies using different markers can be combined and linked to deliver wider application and value.

      Relationship to Published Work: This study extends previous work from the same group and attempts a novel extension. The approach is a useful concept and potentially important.

      Audience: The method this paper proposes will be of interest to scientists involved in drug discovery and/or computational biology.

      Reviewer's Expertise: Cell Biology, Imaging, Imaging Informatics, Machine Learning, Computer Vision

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The submitted manuscript entitled 'Predicting cell health phenotypes using image-based morphology profiling' (RC-2020-00394) by Way et al. presents a set of seven dyes/stainings (as two separate panels) to microscopically screen cell viability. For automatic classification, a training/test set of 119 CRISPR perturbations (approximately 2 sgRNAs per gene) on 3 cancer cell lines was generated (lung A549, ovarian ES2, lung HCC44). After segmentation of cell nuclei, a set of morphological cell measurements was extracted from each perturbation (952 features in total). The nature of these features, spanning cell cycle and viability phenotypes, enabled the authors to define 70 different phenotype classes, which are used to model a classifier by elastic linear regression. Specific definitions (cell cycle and ROS) were partly predicted/validated in an independent existing image data set (Drug Repurposing Hub project). The data is available as a web-based application/visualization and the supplementary method is well described.

      Major concerns:

      (1) The only fundamental argument in this manuscript for not applying state-of-the-art deep learning (DL) (mentioned in McCain et al. 2018), which does not require segmentation, feature extraction, abstraction, or manual gating, is the 'interpretability' of the predictions. However, DL should clearly outperform 'manual' regression models in performance, precision, and scalability (on modern GPUs). All recent machine vision benchmarks in microscopy confirm this, and also clearly show 'real world' translational applications, e.g.:

      https://www.nature.com/articles/s43018-020-0085-8

      https://www.biorxiv.org/content/10.1101/2020.07.02.183814v1.full.pdf

      In other words, the presented methodology is not compared to DL, and is not convincing in terms of interpretability benefits.

      (2) One aforementioned point of the methodology is described cryptically, if at all: why should it be less expensive compared with other (which?) approaches (see introduction)?

      (3) Generalizability and/or training data size is essential for any model-based classification, but is not evaluated or validated in the current manuscript. The independent validation on data from the A549 cell line only might not be sufficient/convincing.

      Minor concerns:

      (1) The highest test performance suggests that precision is mainly driven by cell cycle/count and live status, which could probably be derived from DRAQ7 (Fig. 2) and DNA granularity (Fig. 3, bottom right); this would argue for rigid feature selection across channels and features.

      (2) Any H2AX and 'polynuclear' model would probably fail in any cell line with this size of training data.

      (3) What do the 'weights' of the model in Fig. 1c refer to?

      Significance

      This manuscript is not advanced in the context of latest improvements/developments of cell-based microscopic classification. Rationale in the introduction and the conclusion are not linked (interpretability, generalizability, costs). It seems to be unfinished or unformatted to this end?

      The author/co-authors have been instrumental/pioneered with their past work on cell-based image processing (CellProfiler software), but the presented methodology is simply outdated. Therefore, a revision towards a comparison and benchmarking with DL will also not help.

      Ref (DL with MIL): https://academic.oup.com/bioinformatics/article/32/12/i52/2288769

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We are grateful to the referees for investing valuable time in reviewing our work, and for recognising the importance and utility. We thank them for their insightful and constructive comments that have helped us significantly improve the manuscript.

      Below, we provide a point-by-point response to all specific questions raised.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In order to improve SARS-CoV-2 diagnostics, Reijns et al. developed a multiplexed RT-qPCR protocol that allows simultaneous detection of two viral genes, one housekeeping gene as well as an external gene as an extraction control. Compared to running parallel assays to detect genes individually, the turnaround time is much shorter and reagents are saved. Furthermore, the presented data suggest that the assay is more sensitive than commercial kits. The authors also propose the detection of the human housekeeping gene as a measure of sample quality control. In principle, this work has potential but the manuscript itself needs a better structure.

      **Major concerns:**

      The authors have used the Takara RT-qPCR kit for their study. Did the authors try other commercial kits?

      We have not assessed other commercial kits as the Takara reagent performed well, and has been easy to source. We expect that other one-step kits could be used if the need arose.

      When we initiated this work in March 2020, we selected the Takara One Step PrimeScript™ III RT-PCR Kit based on 1) the practical advantages of a one-step reaction mix, 2) published evidence of its successful use in SARS-CoV-2 detection (see below), 3) availability in sufficient quantities for testing at scale, and 4) affordability.

      (Published evidence: One of the first descriptions of an assay to detect SARS-CoV-2 [1] employed the Takara One Step PrimeScript™ III RT-PCR Kit, and this kit was later shown by others to perform as well as or better than Qiagen Quantifast Multiplex RT-PCR +R mastermix, ThermoFisher TaqPath 1-Step RT-qPCR MasterMix and ThermoFisher Taqman Fast Virus 1-step mastermix, when used to detect SARS-CoV-2 RNA from nose and throat swabs with N1, N2 or N gene assays [2].)

      Can the authors elaborate on the supply chain of the Takara kit?

      We have not had problems securing the Takara kit in sufficient quantities and in a timely fashion, and did so through the company’s Scotland and NE England representative. The managing director of Takara Bio Europe provided the following statement, as a clarification of the supply chain:

      “Takara Bio Inc. has worked on significantly increasing the production of one-step RT-qPCR reagents to cover worldwide needs for SARS-CoV-2 detection. The production of this kit is based in China under ISO13485 certification and the European stock is based in, and distributed, from Paris. Throughout this pandemic, Takara Bio Europe has supplied millions of reactions around Europe to COVID-19 testing labs, without encountering any shortages or significant shipping delays.”

      Could it cover population testing in case of shortages of other commercial kits?

      Yes, it could. The Takara kit is available in 4,000 and 20,000 reaction pack sizes and therefore could well be a useful option in case of shortages of other commercial kits. Indeed, one motivation for developing the multiplex assay was to ensure diagnostic testing resilience in the face of reagent shortages.

      For better comparison, is it possible to give information on which primers the commercial kits are based on?

      We contacted both ThermoFisher and Abbott to ask for more information on the primers and probes included in the TaqPath COVID‐19 Combo Kit (detects N, ORF1ab and S gene) and Abbott RealTime SARS‐CoV‐2 assay (detects RdRp and N gene). Unfortunately, we were informed that this information is proprietary. For clarity, we have included the following in the Materials and Methods section:

      “Primers and probes included in the TaqPath COVID-19 Combo Kit (Thermo Fisher Scientific, Cat. No. A47814) detect SARS-CoV-2 ORF1ab, N and S gene; those in the Abbott RealTime SARS-CoV-2 assay (Cat. No. 09N77-090) detect RdRp and N gene. Further details are not available, as this information is proprietary.”

      Also, explain better the primers used in this study. For example, the N1 and N2 primers are directed against different regions of the SARS-CoV-2 N gene.

      We thank the reviewer for encouraging us to better explain the primers we use for our own assays, and now provide more detailed information in a new Fig 1.

      The result section needs a better structure as the first two pages do not refer to any of the main figures. For example, in which figure or table can the reader find the data that are discussed in lines 83 to 87?

      We have now substantially re-structured the entire Results section, and include the data that was discussed in lines 83 to 87 of the original manuscript, in Fig 1D of the revised manuscript.

      Table S1, instead of current Table 1, could be moved to main figures as it contains the important finding that the multiplexed assay may be more sensitive than the commercial one.

      As suggested, we have moved Table S1 to the main display items (now Table 1), and moved the original Table 1 to the supplementary items (now Table S3).

      The authors identified some samples that scored negative in commercial assays but positive in their new assay. This is important, however, the possibility of detecting false positives should be strengthened in a "Discussion" section.

      We thank the reviewer for highlighting this, and now discuss the issue of detecting false positives in more detail in the Discussion section of the revised manuscript:

      “RT-qPCR tests are molecular tests with high intrinsic accuracy, however false positive and false negative results can occur. The use of multiplex assays that detect multiple SARS-CoV-2 targets, such as those reported here, reduces the chance of both. Off-target reactivity is one possible cause of false positives, and although some have reported high false positive rates for the E gene assay [20, 22], this does not match our experience. In two patients, our N1E-RP and N2E-RP assays detected virus, albeit weakly, whereas commercial assays did not. As multiple SARS-CoV-2 targets were positive, these are likely true positive results and not due to off-target reactivity. False positives can also occur due to lab issues such as sample mislabelling, data entry errors, reagent contamination with target nucleic acids or contamination of primary specimens. However high standards of quality control at all stages of testing, and effective mitigation strategies should quickly identify problems. Additionally, sample re-test with an independent assay and/or patient re-sampling should also be effective measures to counter false positives, particularly in low pre-test probability situations such as mass screening.”
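
      The remark about low pre-test probability can be made concrete with Bayes' rule. The sketch below computes the positive predictive value (PPV); the sensitivity, specificity and prevalence values are hypothetical, chosen only for illustration, and are not estimates for any assay discussed here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(infected | positive test), from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical assay characteristics, for illustration only.
SENSITIVITY = 0.95
SPECIFICITY = 0.99

for prevalence in (0.01, 0.20):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"pre-test probability {prevalence:.0%}: PPV = {ppv:.2f}")
```

      With the specificity held fixed, the share of positives that are true positives falls sharply as prevalence drops, which is why re-testing matters most in mass screening.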

      Figures 1 to 3 have different panels which seem to be redundant. For example, Fig 1 A and B, Fig 2 B and C, Fig 3 C and D.

      These panels did contain the same data, plotted to convey slightly different information. However, we agree that this introduces a level of redundancy. For enhanced clarity, in the revised manuscript, we have removed most of these panels altogether, or moved them to supplementary figures.

      Figure 1: Give a rationale for comparing before and after extraction. This heavily depends on the extraction method and not on the detection itself. In addition, IVT RNA does not reflect the complexity of a clinical specimen. This is rather confusing and deviates from the important findings.

      As part of the validation procedure it was important for us to show that the entire workflow, including the extraction procedure, was robust for use in clinical diagnostics. In this context, comparing pre- and post-extraction RT-qPCR results for both IVT RNA and viral samples provided us with an opportunity to test extraction efficiency. However, we agree that for the purpose of this manuscript, the inclusion of these data in (the former) Fig 1 detracted from the main message. In the revised manuscript we have therefore moved the data comparing Cq values before and after extraction to a new Fig S1, and briefly state the rationale behind this in the main text and figure legend.

      It was not our intention to imply that IVT RNA in any way reflects the complexity of a clinical specimen. We include these data as part of the step-by-step validation of our assays. Firstly, we show high sensitivity using IVT RNA; secondly, we show that a similar sensitivity is achieved on viral positive controls; and thirdly, we show that our assays perform equally well to widely used commercial assays on clinical samples.

      Figure 3: Were any of the negative samples/patients tested with an undetectable housekeeping gene, re-test positively?

      None of our patient samples had undetectable levels of RPP30. We note that all NTS samples were collected by healthcare professionals and in this context such findings will likely be rare. However this may not be the case when dealing with samples obtained by self-swabbing as the reviewer highlights in a comment below.

      Did adding this housekeeping gene as a control actually improve the detection of any patient samples? If the authors want to convince the readership of this quality control, experimental evidence should be provided.

      Fig 3C and D seem to contain this information somewhat, as here, the values were normalized and the CT values for the E and N gene decreased. Nevertheless there is no real explanation of this figure provided in the Result section at all. While this figure has potential, the authors have to keep in mind that the number of cells in a swab can be affected by many biological factors, including age, sample timing, inflammation of the respiratory tract, etc. In addition, viral genomes can exist intra- as well as extracellular, in the form of free virus. So even in the absence of human cells/detectable housekeeping genomes, viral RNA can be or should be present in a sample in case of infection. This explains (probably) why a correlation between detectable housekeeping gene and viral RNA is absent (Fig 3A and B?). This entire Fig 3 just needs a better explanation. The provided text does not describe any results and should go into a "Discussion" section.

      We thank the reviewer for highlighting the need to explain Fig 3 more clearly and that a key question is whether there is a correlation between the levels of the housekeeping control and viral RNA. Prompted by this question, we reanalysed our data and now show that there is a strong and statistically significant positive correlation between Cq values for RPP30 and SARS-CoV-2 targets (see below, and new Fig 4C). This shows that there is a lower probability of detecting SARS-CoV-2 RNA in samples that contain fewer human cells. This likely implies that for samples with high RPP30 Cq values, a proportion of virus positive samples will be missed, contributing to the high false negative rates that have been reported [3-5].

      Providing additional experimentation would require systematic re-contacting and re-testing of cases, and this is beyond our current research framework. While outside the scope of the current study, we hope that our manuscript will encourage others to perform the necessary large-scale experiments. Nonetheless, with this correlation alone, we believe that RPP30 provides useful information of benefit to clinical diagnostics (also see our response to Reviewer 2), and in the revised manuscript we outline how it might be best utilised (Discussion, Table S6).

      To provide a better explanation of Figure 3 (now Fig 4), we have included the following in the Results section:

      “A statistically significant linear correlation between Cq values for each of the viral probes (E, N1, and N2) and the Cq values for the RPP30 sample quality probe (p 40; Fig 4D and Fig S4A). Theoretically, using this approach, even a strong positive sample (SARS-CoV-2 Cq value of 28.2) of good quality (RPP30 Cq value of 20.3) may have given a false negative test result (SARS-CoV-2 Cq value of 40) if it had contained the same low amount of human material as the reference sample (RPP30 Cq value of 32.1; viral Cq: 32.1-20.3+28.2=40). Conversely, normalising samples to an optimal quality sample (RPP30 Cq 20.1/20.3) gives an indication of what viral Cq values may have been if all samples had contained a similar (more optimal) amount of material (Fig 4E, Fig S4B). This highlights the possibility that a proportion of apparent SARS-CoV-2 negative samples are in fact false negatives as a result of insufficient material in the swab fluid.”
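      The arithmetic in the quoted passage can be sketched in a few lines of Python. This is only an illustration, assuming the normalisation shifts each viral Cq by the difference between a reference RPP30 Cq and the sample's own RPP30 Cq, as the worked example in the quote suggests. The function name `normalise_viral_cq` is hypothetical and does not come from the manuscript.

```python
def normalise_viral_cq(viral_cq, sample_rpp30_cq, reference_rpp30_cq):
    """Estimate the viral Cq a sample might have shown at the reference RPP30 level.

    Assumption: a simple additive shift on the Cq scale (delta-Cq style).
    """
    return viral_cq + (reference_rpp30_cq - sample_rpp30_cq)


# Worked example from the text: a strong positive (viral Cq 28.2) of good
# quality (RPP30 Cq 20.3), rescaled to a poor reference sample (RPP30 Cq 32.1).
as_if_poor = normalise_viral_cq(28.2, 20.3, 32.1)

# The reverse direction: rescaling that poor-quality value back to the
# good-quality reference recovers the original viral Cq.
as_if_good = normalise_viral_cq(as_if_poor, 32.1, 20.3)

print(round(as_if_poor, 1), round(as_if_good, 1))
```

      Under this assumption, the first call reproduces the 32.1-20.3+28.2 calculation given in the quote, and the second shows that normalising to an optimal-quality reference simply applies the same shift in the opposite direction.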

      Self-swabbing is surely a potential source of variability and false-negatives, but many publications have shown the suitability of saliva testing. This should also be discussed and would probably negate the need for such a quality control.

      We agree with the reviewer that self-swabbing will be more prone to variability. Therefore, the RPP30 control will have particular value here, lowering the associated risk of false negatives. While NTS sampling remains a major modality for testing for the foreseeable future, saliva is certainly a potential alternative strategy, one that may benefit from lower sample variability.

      We now include the reviewer’s point on this in the Discussion:

      “Testing saliva, as an alternative to NTS sampling, could also be beneficial as a modality that may have less sample-to-sample variability [7].”

      Which assay works better, the N1E-RP or the N2E-RP assay? A final conclusion is missing here.

      Although we could not detect substantial differences between these two assays during our validation process, others have reported a marginally higher sensitivity of the N1 over the N2 assay [6]. We would therefore recommend the use of the N1E-RP assay for first line testing, with the N2E-RP assay available as a second line test of equivalent sensitivity in case of inconclusive initial test results. We comment on this in the revised manuscript:

      “Although we did not detect substantial differences between our two assays, others have reported higher sensitivity of the N1 over the N2 assay [19]. We therefore recommend the use of the N1E-RP assay for primary testing, and the N2E-RP assay could be employed if initial results are inconclusive.”

      Reviewer #1 (Significance (Required)): Naturally, in this pandemic, this topic is important as sensitive and affordable methods to detect SARS-CoV-2 infections are in need. This Reviewer agrees that multiplexing could be an elegant approach to fill this need.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In this manuscript, Reijns and colleagues describe an approach to detect the causative agent of COVID-19, the beta coronavirus SARS-CoV-2, using an inexpensive in-house multiplex RT-qPCR. Concomitantly, viral E, N and RdRP (probe P2) as well as human RPP30 and a herpesvirus nucleic acid are also detected in order to monitor both the sample quality and the sample preparation. Reijns et al. performed testing on a large number of samples and used the data to describe the strengths and limitations of the assay. The data are sound and give a very good impression of the 4-plex PCR capabilities. The manuscript reads fluently and I consider it linguistically very good. However, I still have a few comments and remarks that would strengthen the manuscript:

      **Major issues:** In the first section of the results section, many primer / probe conditions are given that make the reading flow difficult. Instead of using (data not shown) it would be helpful to use a table or a graphic to illustrate the various approaches.

      We thank the reviewer for suggesting the use of graphics to explain our different approaches. To aid the reader, we now include a diagram in the new Fig 1 that shows the positions of primers and probes used in our work (A), and illustrate the various 4-plex assays (B, C).

      In general, I suggest to replace Ct by Cq, since the IVT standards are a quantification method.

      As suggested by the reviewer we now use Cq instead of Ct throughout our manuscript, following MIQE guidelines [7].

      There has already been a shift away from the initial E and RdRP gene-based assays because of the published sensitivity issues, the use of degenerate bases, and the detection of unspecific nucleic acids (for the E gene). In particular, it has been shown that the Sarbeco-E assay yields false positive results (Toptan et al. 2020 (https://doi.org/10.3390/ijms21124396), Konrad et al. 2020 (https://doi.org/10.2807/1560-7917.ES.2020.25.9.2000173)), so that many laboratories no longer consider E-gene-based results for borderline samples. In this manuscript, the authors should comment on why they still use the results from the E gene / RdRP and describe their experience.

      We thank the reviewer for highlighting issues with the RdRp and E gene primer/probe pairs. In the process of our work we had also become aware that the RdRp-P2 assay suffers from low sensitivity, as has now been widely reported. However, although the E gene assay also detects SARS-CoV, we were not aware of potential problems with high rates of false positives as described by Toptan et al (2020) and Konrad et al (2020). It did come to our attention that early on in the pandemic some oligonucleotide producers reported problems with contamination of primers with SARS-CoV-2 template RNA synthesised in the same facilities, and we were careful to avoid these providers. In our work, we did not experience any problems with apparent false positive detection of the E gene: it was never detected in any of our negative controls, and out of 84 patients that tested negative with the commercial TaqPath assay we did not find any that were positive for E gene only when using our N1E-RP and N2E-RP assays. In this context, it is also important to emphasise that a positive diagnosis is given only when both viral targets are detected (Table S6), which is one of the strengths of our multiplex assays.

      As suggested by Reviewer 1, we now discuss the issue of false positives in more detail in the revised manuscript. We also comment on high false positive rates observed by others for the E gene assay, citing the two studies, but state that this does not match our own experience:

      “Off-target reactivity is one possible cause of false positives, and although some have reported high false positive rates for the E gene assay [20, 23], this does not match our experience.”

      In this manuscript, it should be indicated that the SARS-CoV-2 specific Probe P2 (according to Corman et al. 2020) was used. The reason for lower sensitivity due to nucleotide ambiguity and mismatch has to be explained in more detail. In addition to Corman et al. 2020 (see reference 2), Toptan et al 2020 (https://doi.org/10.3390/ijms21124396) might serve as helpful literature.

      In tables describing primers and probes, and the new Fig 1, we indicate that we used RdRp probe P2. In addition, we now also specifically state in the legend of Fig 1 that this probe only detects SARS-CoV-2 and that the primers used in the RdRp-P2 assay (as originally designed by Corman et al) contain nucleotide ambiguities and a mismatch. Finally, in the main text we explain:

      “Overall, we find RdRp detection to be at least 20-fold less sensitive than for E gene, N1 and N2 under our assay conditions; consistent with reports by others [19]. This may be due to a mismatch in the reverse primer employed in the RdRp (P2) assay, as originally designed [14].”

      With regard to the marginally positive samples that were not consistent in all assays, were the PCR products analyzed using high-resolution PAA gels and, if possible, sequenced? The sequencing approach (Sanger or NGS) offers the final characterization of the PCR products (especially for pan-genotypic primers such as E-Sarbeco). The samples declared as "inconclusive" could be further characterized in this way.

      Unfortunately, it has not been possible for us to carry out additional analyses for such (now historical) samples. Given the high prevalence of SARS-CoV-2 and the low sequence variability at primer/probe binding sites (new Table 2 and S5), inconclusive or marginally positive samples most likely reflect low viral load and/or low sample quality. Nevertheless, we now highlight the utility of further characterising such samples in the revised manuscript:

      “However, differentiating between samples with low viral loads and false positives is challenging. Analysis of such samples by Sanger sequencing of PCR products, or nanopore sequencing of RNA present could provide useful information. Further clinical evaluation and repeat sampling of the patient involved may also be a beneficial route to a secure clinical diagnosis.”

      The normalization in figure 3 should be also explained in the main text. Especially, why this approach was used for normalization.

      In the Results section we now describe the normalisation as follows:

      “A statistically significant linear correlation between Cq values for each of the viral probes (E, N1, and N2) and the Cq values for the RPP30 sample quality probe (p 40; Fig 4D and Fig S4A). Theoretically, using this approach, even a strong positive sample (SARS-CoV-2 Cq value of 28.2) of good quality (RPP30 Cq value of 20.3) may have given a false negative test result (SARS-CoV-2 Cq value of 40) if it had contained the same low amount of human material as the reference sample (RPP30 Cq value of 32.1; viral Cq: 32.1-20.3+28.2=40). Conversely, normalising samples to an optimal quality sample (RPP30 Cq 20.1/20.3) gives an indication of what viral Cq values may have been if all samples had contained a similar (more optimal) amount of material (Fig 4E, Fig S4B). This highlights the possibility that a proportion of apparent SARS-CoV-2 negative samples are in fact false negatives as a result of insufficient material in the swab fluid.”

      Nonetheless, it looks like the normalized values will cluster much more strongly than the actual values. The authors should comment on this phenomenon. It appears that the higher Cq values (less virus) are subject to a strong correction factor more often than high values. Are there any statistically relevant tendencies towards this phenomenon? For everyday clinical practice, does this mean that low sample Cqs (mostly) only reflect the quality of the sample, but not the viral load?

      We thank the reviewer for highlighting the stronger clustering of Cq values after normalisation, and for encouraging us to explore this further. We now show that there is a statistically significant linear correlation between RPP30 and SARS-CoV-2 Cq values (Fig 4C). This would indeed imply that a substantial proportion of the variability in SARS-CoV-2 Cq values seen in clinical practice is due to sample quality rather than different viral loads. However, outliers from the linear correlation when comparing samples from many different patients are to be expected (as seen in Fig 4C), because viral load is known to vary, with time of sampling relative to onset of symptoms one important contributing factor. In a research context, expressing viral load relative to a human control may be beneficial to differentiate between sample quality and absolute quantities of (intra/extracellular) viral RNA.

      In the revised manuscript we state:

      “Notably, the SARS-CoV-2 Cq values clustered more strongly after normalisation (Fig 4D, E; Fig S4). This reduced variability not only shows that the amount of human material present in NTS samples impacts on assay sensitivity, but also suggests that variability in viral load is not as great as implied by RT-qPCR data without normalisation.”

      Finally, it remains somewhat unclear to what extent the Cq values of RPP30 should influence routine diagnostics. The authors discuss that a fixed cutoff value would be a possibility to sort out poor swab samples, but if a Cq value is available it would also make sense to generate a kind of quality score that can display the significance of a test. It would be helpful if the authors could comment on this or other possibilities.

      We agree that it would be beneficial for routine diagnostics to derive such a measure. However, at this stage we do not have sufficient data to generate a robust quality score based on the RPP30 Cq values. Nonetheless, we believe RPP30 Cq values have immediate utility for routine diagnostics, and could help improve validity of test results going forward:

      1. Samples with undetectable RPP30 should trigger repeat sample collection, and not be given a false negative test result;
      2. Samples with high viral Cq values and/or for which only one of two viral targets are detected can be better interpreted in the context of the amount of human material as measured by RPP30 Cq;
      3. Ongoing monitoring of swab quality allows rapid identification of potential technical issues with swabbing;
      4. Normalisation of viral Cq values using RPP30 Cq values might be helpful in a research context to derive a more meaningful measure of viral loads, by removing one source of variability;
      5. Collection of such data on an ongoing basis would ultimately allow this to be translated into a quality score that could be used as part of diagnostics algorithms.

      In the revised manuscript we now discuss this as follows:

      “Absence of RPP30 signal (undetected or Cq >40) clearly indicates that absence of viral detection cannot be interpreted as a negative test result and that a repeat test is required (Table S6). However, utilising RPP30 Cq values when interpreting an apparent SARS-CoV-2 negative sample requires further consideration: what should the RPP30 Cq limit be for which to order a repeat test? One option would be to simply set an arbitrary cut-off, e.g. one could decide to re-test any samples with RPP30 Cq >30, or with Cq values above the 95th centile (Cq ~ 31 for our 108 samples). To determine robust cut-off limits, collection of RPP30 data for a much larger number of patient samples would be desirable. This would allow development of diagnostic algorithms that could incorporate a sample quality score based on the level of RPP30 detected. Nonetheless, RPP30 data, even as it stands, are useful for the interpretation of cases for which only one of the SARS-CoV-2 targets is (weakly) positive, with samples with high RPP30 Cq values interpreted with particular caution. In such cases, repeat testing of the same sample (with an independent assay of equal or better sensitivity) would be advisable, and repeat patient specimen collection and testing might also be considered (see Table S6 for guidance).”
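      The interpretation rules described in this response can be sketched as a small decision function. This is an illustrative sketch only: Table S6 is not reproduced here, so the branch order, the returned labels and the function name `interpret_sample` are assumptions. The thresholds (detection limit of Cq 40, example quality cut-off of RPP30 Cq 30) are the ones mentioned in the quoted text, and the cut-off is explicitly described there as arbitrary.

```python
def interpret_sample(viral_cqs, rpp30_cq, detect_limit=40.0, quality_cutoff=30.0):
    """Return a coarse interpretation for one swab sample.

    viral_cqs: Cq values for the two viral targets (None if undetected).
    rpp30_cq:  Cq value for the RPP30 sample quality control (None if undetected).
    Labels and branch order are hypothetical, not taken from Table S6.
    """
    detected = [cq is not None and cq <= detect_limit for cq in viral_cqs]
    rpp30_detected = rpp30_cq is not None and rpp30_cq <= detect_limit

    if all(detected):
        # A positive call requires both viral targets.
        return "positive"
    if any(detected):
        # Only one target detected: repeat testing is advisable.
        return "inconclusive: repeat test"
    if not rpp30_detected:
        # No human material detected: absence of virus is not interpretable.
        return "invalid: repeat sample collection"
    if rpp30_cq > quality_cutoff:
        # Example arbitrary cut-off from the text (RPP30 Cq > 30).
        return "negative, low sample quality: consider re-test"
    return "negative"


print(interpret_sample([None, None], 31.0))
```

      A validated diagnostic algorithm would need cut-offs derived from a much larger set of RPP30 measurements, as the quoted passage notes.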

      Over the past few months, more and more virus subtypes have formed through the manifestation of point mutations (and amino acid substitutions). The authors should therefore definitely comment on the current strains as to whether all primers / probes are able to detect the virus variants circulating worldwide without loss of sensitivity.

      We thank the reviewer for this suggestion and now include a table providing information on mismatches in primer and probe binding sites (see Table 2 and Table S5 of the revised manuscript). This shows that only a small proportion of 97,782 strains for which high quality genome sequencing is available have changes in primer/probe binding sites. In addition, the use of two different primer/probe sets in our multiplex assays provides a further safeguard against failure to detect strains with such changes.

      Along this line, which virus strains were used for the cultivation as described in line 131? Is sequence data available? If so, it would provide helpful information to characterize the viral strain.

      We have added strain information, accession codes for genome sequences and information on primer/probe binding for both control strains (hCoV-19/England/02/2020 and BetaCoV/Munich/ChVir984/2020) we used in our work (see Materials and Methods of the revised manuscript).

      Line 206ff: In my opinion, this section belongs more to the discussion part than to material and methods that describe the technical implementation.

      We agree and have now moved this section to the Discussion. Furthermore, we’ve made additional changes to better highlight the potential for further improvements to our assays, and SARS-CoV-2 RT-qPCR assays in general.

      Is there a loss of sensitivity compared to the single PCRs? This data is very important and useful for other users. They should therefore be included explicitly in the manuscript (supplements).

      We set out to develop multiplex PCR assays to allow more efficient and cost-effective testing. In the early stages of this process we performed small pilot experiments with positive control IVT RNA and individual primer/probe pairs that are widely used and well-established to sensitively detect SARS-CoV-2 RNA. With the exception of the RdRp primers/probe, we found all to perform well, with the ability to detect 10 copies of RNA. However, we did not perform a side-by-side comparison of uni- and multiplex PCRs, and to improve the structure and flow of the Results section, as requested by Reviewer 1, we have now removed all mention of the single PCR assays.

      Altogether, the key message of our work is that the N1E-RP and N2E-RP assays are able to detect between 1 and 3 copies of SARS-CoV-2 RNA and show equivalent performance to commercially available multiplex assays.

      **Minor issues:** Line 15 ff.: Source is missing, is this WHO-data?

      The estimated number of infections and fatalities at the time of writing of the original manuscript was based on data from the online interactive dashboard hosted by Johns Hopkins University. At the suggestion of Reviewer 3, we have removed precise numbers from the revised manuscript to make the introduction less time-dependent. Nonetheless, we now include a reference to the JHU online resource, as well as the weekly epidemiological updates from the WHO (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports) for readers interested in the latest figures.

      Fig S3: How was the digital droplet PCR carried out? A brief description should be included in the legend text.

      We purchased these samples from QCMD, an independent International External Quality Assessment & Proficiency Testing (EQA/PT) organisation. QCMD performed the digital droplet PCR, before distribution under the QCMD 2020 Coronavirus Outbreak Preparedness EQA Pilot Scheme, and provided us with details, which have now been added to the Materials and Methods section:

      “Quantification of control samples was carried out by QCMD prior to distribution within the EQA scheme, using droplet digital PCR (ddPCR) with E-gene primers and probe [13, 14] on the Bio-Rad droplet digital PCR platform. A serial dilution of inactivated SARS-CoV-2 (strain BetaCoV/Munich/ChVir984/2020; GenBank Accession MT270112, [32]) was prepared and each dilution replicate tested 4 times using both RT-qPCR and ddPCR assays. Regression analysis was used to assess the linearity across the dilution series, and the analytical measurement range established for both assays, comparing results of each by Bland-Altman difference plot.”

      In addition, we provide more details with the relevant table (new Table S3) and in the legend of the associated figure (new Fig 2) we state: “See Materials and Methods for details”.

      Figure 1a: PCR efficiencies are missing.

      We have now added PCR efficiencies to all relevant graphs.
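      For readers unfamiliar with how a PCR efficiency is usually obtained from a dilution series, a minimal sketch follows. The response does not state how the efficiencies were calculated, so this assumes the standard approach: fit a line to Cq against log10(copy number) and convert the slope with E = 10^(-1/slope) - 1. The function name `pcr_efficiency` and the data points are hypothetical.

```python
import math


def pcr_efficiency(log10_copies, cqs):
    """Fit Cq = slope * log10(copies) + intercept by least squares.

    Returns (slope, efficiency), where efficiency = 10 ** (-1 / slope) - 1,
    so 1.0 corresponds to perfect doubling per cycle (100 %).
    """
    n = len(cqs)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cqs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, cqs))
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    slope = sxy / sxx
    return slope, 10 ** (-1.0 / slope) - 1.0


# Hypothetical, idealised 10-fold dilution series with perfect doubling:
# each 10-fold increase in template lowers Cq by 1 / log10(2) cycles.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [40.0 - x / math.log10(2) for x in xs]

slope, efficiency = pcr_efficiency(xs, ys)
print(f"slope = {slope:.3f}, efficiency = {efficiency:.1%}")
```

      Real calibration data would scatter around the fitted line, and the fit quality would normally be reported alongside the efficiency.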

      Line 145: MS2 appears, but without explaining the context. This should be improved here with additional information (this does not appear until line 154).

      At first mention of MS2 in the main text, we now state:

      “Internal controls were included to provide confirmation of successful nucleic acid extraction and absence of PCR inhibitors, with lysis buffer spiked with both MS2 (an RNA bacteriophage that infects Escherichia coli) and PhHV (a DNA virus that infects seals), detected by the TaqPath and N1E-RP/N2E-RP assays respectively.”

      Page 15, H20 instead of H20, reaction mix instead of Reaction mix.

      In the supplementary protocol, we have changed “H2O” to “H2O” and “Reaction mix” to “reaction mix”.

      Reviewer #2 (Significance (Required)): The novel coronavirus SARS-CoV-2 is the causative agent of the acute respiratory disease COVID-19, which has become a global concern due to its rapid spread and high death rate. While some patients have no symptoms at all but are still able to spread the virus, others have severe symptoms, often with fatal outcome. The gold standard in SARS-CoV-2 detection is the RT-qPCR approach; however, the high-cost commercial kits are available only in limited amounts. The scarcity of resources is still a highly important issue, especially given the incredibly rapidly increasing number of cases worldwide. Thus, the manuscript is of significance for the field and timely. In particular, diagnostic laboratories in low-income countries that are involved in managing the pandemic, but also researchers, will benefit from this manuscript and save resources.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): In this study Reijns et al developed a multiplex RT-PCR assay (alternative primer-probe sets and enzyme mixes) to detect SARS‐CoV‐2 and internal controls. The authors conclude that their assay performed equally well as established commercial kits. The authors also demonstrated that nose‐and‐throat swab samples have considerable variability in patient material content (>1,000‐fold variability). High variability is expected, but it is still important to substantiate this notion with numbers. Overall, I like the study and find it methodologically sound. Sample numbers in the tests are in most cases good. I have very few objections and hope to see the manuscript published soon.

      **SPECIFIC COMMENTS:** 1. "The COVID‐19 pandemic originated in Wuhan (China) in December 2019 and at the time of writing has infected more than 13.1 million people worldwide, resulting in well over 0.57 million COVID‐19‐related deaths..." I suggest a more timeless start to the introduction, not pointing out the exact number of infections and deaths, since these numbers quickly become obsolete. The reader will know the severity of the pandemic and the importance of methodological development without statement of exact numbers. This comment reflects my personal opinion and it is completely up to the authors to choose how to phrase this section.

      We agree with the reviewer that a more timeless start to the introduction makes more sense. Therefore, as suggested, we have changed this section of the manuscript, which now reads as follows:

      “The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2 [1], originated in Wuhan (China) in December 2019 and rapidly spread across the globe, resulting in substantial mortality [2, 3] and widespread economic damage. Until a vaccine becomes available, public health strategies centred on reducing the rate of transmission are crucial to mitigating the epidemic, for which effective and affordable testing strategies to enable widespread population surveillance are essential.”

      2. Tables listing primers and probes should include the amplicon (PCR product) length for each primer-probe pair. Product length is an important consideration for fragmented RNA samples, such as heat-inactivated or longer-term stored samples. It should not be left to the reader to find out the amplicon lengths.

      To provide the reader with this information, the revised manuscript now includes the following:

      1. As suggested, the amplicon length for each primer pair is added to all tables that list primers and probes (PCR products: RdRp – 100 bp; E – 113 bp; N1 – 72 bp; N2 – 67 bp);
      2. A diagram in a new Fig 1 indicates the positions of all primers and probes on the SARS-CoV-2 genome along with amplicon length;
      3. A supplementary SnapGene file with primers and probes on the SARS-CoV-2 (Wuhan-Hu-1) genome allows readers to look at further details in the context of the viral genome.

      3. Line 131: "To confirm sensitivity using total viral RNA, nucleic acids isolated from cultured SARS‐CoV‐2 were also used to make a dilution series (10^‐1 to 10^‐6)." I lack a methodological description of how viral nucleic acid was quantified. It is not entirely trivial to separate viral RNA from RNA contributed from the cells used for the in vitro expansion of the virus.

      We apologise for the lack of clarity on this in our original manuscript. The purpose of this experiment was not to measure a defined number of RNA molecules, but to ensure that there was no inhibition of viral target amplification in a more complex sample by demonstrating linearity over a range of dilutions. The cultured SARS-CoV-2 positive control was provided by Prof Rory Gunson (Clinical Lead West of Scotland Specialist Virology Centre, NHS Greater Glasgow and Clyde) as inactivated supernatant from virus (strain hCoV-19/England/02/2020) propagated in cell culture. We then isolated RNA from a dilution series of this supernatant, using the methods described in our manuscript, but did not determine the precise concentration. The RT-qPCR data for this series shows a good fit and amplification efficiency, similar to what was found for the IVT RNA, and QCMD virus calibration curves (new Fig 2). The known copy number of the QCMD virus (as determined by ddPCR) allowed us to calculate that the concentration of the virus in the supernatant provided to us was between 0.7 and 2.2 × 10^5 copies/ml, with viral RNA detected down to between 0.7 and 3 copies with our N1E-RP and N2E-RP assays. We have substantially restructured the results section, and hope to have made the way we used the different viral controls clearer in the revised version of the manuscript.

      4. Line 150: "All positive and negative controls gave the expected results (Table S4)" I don't like the exact formulation, since it is not clear to the reader what the "expected results" are, including the "expected" quantitative results (Ct).

      We agree that the use of “expected results” does not provide the reader with sufficient information. We have therefore changed this to:

      “Results for controls were as anticipated (Table S4), with signal absent (undetermined) for SARS-CoV-2 and RPP30 targets for the negative controls, and Cq values for the SARS-CoV-2 RNA positive control (50 copies) similar to those obtained previously (Fig 2A).”

      In addition, we now provide more information on the precise nature of the negative and positive controls with Table S4:

      “-ve (extr), negative control with viral transport medium after RNA isolation (does not contain SARS-CoV-2 or human material; does contain PhHV);

      -ve, negative control containing water only (should not contain any RNA)

      +ve, positive control with in vitro transcribed RNA (50 copies; contains SARS-CoV-2 target RNA, does not contain human or PhHV nucleic acids)”

      Reviewer #3 (Significance (Required)): This study provides an alternative multiplex RT-PCR assay to detect SARS-CoV-2 infection. I find the results important and useful for the research and medical community.

      Rebuttal references

      1. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020;382(8):727-33. Epub 2020/01/25. doi: 10.1056/NEJMoa2001017. PubMed PMID: 31978945; PubMed Central PMCID: PMC7092803.
      2. Brown JR, O’Sullivan D, Pereira RP, Whale AS, Busby E, Huggett J, et al. Comparison of SARS-CoV2 N gene real-time RT-PCR targets and commercially available mastermixes. bioRxiv. 2020:2020.04.17.047118. doi: 10.1101/2020.04.17.047118.
      3. Arevalo-Rodriguez I, Buitrago-Garcia D, Simancas-Racines D, Zambrano-Achig P, del Campo R, Ciapponi A, et al. False-negative results of initial RT-PCR assays for COVID-19: A systematic review. medRxiv. 2020:2020.04.16.20066787. doi: 10.1101/2020.04.16.20066787.
      4. Watson J, Whiting PF, Brush JE. Interpreting a covid-19 test result. BMJ. 2020;369:m1808. Epub 2020/05/14. doi: 10.1136/bmj.m1808. PubMed PMID: 32398230.
      5. Woloshin S, Patel N, Kesselheim AS. False Negative Tests for SARS-CoV-2 Infection - Challenges and Implications. N Engl J Med. 2020;383(6):e38. Epub 2020/06/06. doi: 10.1056/NEJMp2015897. PubMed PMID: 32502334.
      6. Vogels CBF, Brito AF, Wyllie AL, Fauver JR, Ott IM, Kalinich CC, et al. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 RT-qPCR primer-probe sets. Nat Microbiol. 2020. Epub 2020/07/12. doi: 10.1038/s41564-020-0761-6. PubMed PMID: 32651556.
      7. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009;55(4):611-22. Epub 2009/02/28. doi: 10.1373/clinchem.2008.112797. PubMed PMID: 19246619.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study Reijns et al developed a multiplex RT-PCR assay (alternative primer-probe sets and enzyme mixes) to detect SARS‐CoV‐2 and internal controls. The authors conclude that their assay performed equally well as established commercial kits. The authors also demonstrated that nose‐and‐throat swab samples have considerable variability in patient material content (>1,000‐fold variability). High variability is expected, but it is still important to substantiate this notion with numbers. Overall, I like the study and find it methodologically sound. Sample numbers in the tests are in most cases good. I have very few objections and hope to see the manuscript published soon.

      SPECIFIC COMMENTS:

      1."The COVID‐19 pandemic originated in Wuhan (China) in December 2019 and at the time of writing has infected more than 13.1 million people worldwide, resulting in well over 0.57 million COVID‐19‐related deaths..." I suggest a more timeless starting of the introduction, not pointing out exact number of infections and deaths since these numbers quickly become obsolete. The reader will know the severity of the pandemic and the importance of methodological development without statement of exact numbers. This comment reflects my personal opinion and it is completely up to the authors to choose how to phrase this section.

      2.Tables listing primers and probes should include the amplicon (PCR product) length for each primer-probe pair. Product length is an important consideration for fragmented RNA samples, such as for example heat-inactivated or longer-term stored samples. It should not be put on the reader to find out the amplicon lengths.

      3.Line 131: "To confirm sensitivity using total viral RNA, nucleic acids isolated from cultured SARS‐CoV‐2 were also used to make a dilution series (10^‐1 to 10^‐6)." I lack a methodological description of how viral nucleic acid was quantified. It is not entirely trivial to separate viral RNA from RNA contributed by the cells used for the in vitro expansion of the virus.

      4.Line 150: "All positive and negative controls gave the expected results (Table S4)" I don't like the exact formulation since it is not clear for the reader what are the "expected results", including the "expected" quantitative results (Ct).

      Significance

      This study provides an alternative multiplex RT-PCR assay to detect SARS-CoV-2 infection. I find the results important and useful for the research and medical community.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Reijns and colleagues describe an approach to detect the causative agent of COVID-19, the beta coronavirus SARS-CoV-2, using an inexpensive in-house multiplex RT-qPCR. Concomitantly, viral E, N and RdRP (probe P2) as well as human RPP30 and a herpesvirus nucleic acid are also detected in order to monitor both the sample quality and the sample preparation. Reijns et al. performed testing on a large number of samples and used the data to describe the strengths and limitations of the assay. The data are sound and give a very good impression of the 4-plex PCR capabilities. I read the manuscript fluently and consider it linguistically very good. However, I still have a few comments and remarks that would strengthen the manuscript:

      Major issues:

      In the first part of the results section, many primer/probe conditions are given, which disrupts the reading flow. Instead of using (data not shown) it would be helpful to use a table or a graphic to illustrate the various approaches. In general, I suggest replacing Ct with Cq, since the IVT standards are a quantification method.

      There has already been a move away from the initial E and RdRP gene-based assays because of the published sensitivity issues, the use of degenerate bases, and the detection of unspecific nucleic acids (for the E gene). In particular, it has been shown that the Sarbeco-E assay yields false positive results (Toptan et al. 2020 (https://doi.org/10.3390/ijms21124396), Konrad et al. 2020 (https://doi.org/10.2807/1560-7917.ES.2020.25.9.2000173)), so that many laboratories no longer consider E-gene-based results for borderline samples. In this manuscript, the authors should comment on why they still use the results from the E gene / RdRP and describe their experience.

      In this manuscript, it should be indicated that the SARS-CoV-2 specific Probe P2 (according to Corman et al. 2020) was used. The reason for lower sensitivity due to nucleotide ambiguity and mismatch has to be explained in more detail. In addition to Corman et al. 2020 (see reference 2), Toptan et al. 2020 (https://doi.org/10.3390/ijms21124396) might serve as helpful literature. With regard to the marginally positive samples that were not consistent in all assays, were the PCR products analyzed using high-resolution PAA gels and, if possible, sequenced? The sequencing approach (Sanger or NGS) offers the final characterization of the PCR products (especially for pan-genotypic primers such as E-Sarbeco). The samples declared as "inconclusive" could be further characterized in this way.

      The normalization in Figure 3 should also be explained in the main text, especially why this approach was used for normalization. Nonetheless, it looks like the normalized values will cluster much more strongly than those corresponding to the actual values. The authors should comment on this phenomenon. It appears that the higher Cq values (less virus) are subject to a strong correction factor more often than high values. Are there any statistically relevant tendencies towards this phenomenon? For everyday clinical practice, does this mean that low sample Cqs (mostly) only reflect the quality of the sample, but not the viral load? Finally, it remains somewhat unclear to what extent the Cq values of RPP30 should have an influence on routine diagnostics. The authors discuss that a fixed cutoff value would be a possibility to sort out poor swab samples, but if a Cq value is available it would also make sense to generate a kind of quality score that can display the significance of a test. It would be helpful if the authors could comment on this or other possibilities.
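      To make the question about the correction factor concrete, here is a minimal sketch of one possible normalization, assumed here purely for illustration and not confirmed as the authors' method: each viral Cq is shifted by the difference between that sample's RPP30 Cq and the cohort median RPP30 Cq. The function names and all Cq values are hypothetical.

```python
from statistics import median


def normalise_viral_cq(viral_cq, rpp30_cq):
    """Shift each viral Cq by the sample's RPP30 deviation from the median.

    Assumed scheme (not taken from the manuscript): a sample with less
    human material (higher RPP30 Cq) has its viral Cq lowered by the
    same number of cycles, and vice versa.
    """
    reference = median(rpp30_cq)
    return [v - (r - reference) for v, r in zip(viral_cq, rpp30_cq)]


def fold_difference(delta_cq, efficiency=1.0):
    """Fold change in template implied by a Cq difference.

    With efficiency 1.0 (perfect doubling) this is 2 ** delta_cq.
    """
    return (1.0 + efficiency) ** delta_cq


# Hypothetical samples: viral Cq and matching RPP30 Cq values.
viral = [30.0, 25.0, 35.0]
rpp30 = [28.0, 24.0, 34.0]

normalised = normalise_viral_cq(viral, rpp30)
print(normalised)
print(fold_difference(10))
```

      Under this assumed scheme, samples whose raw viral Cq differences mostly track differences in sample material end up close together after normalization, which is one way the stronger clustering noted above could arise. A 10-cycle spread in RPP30 Cq corresponds to roughly a 1,000-fold difference in human material at perfect efficiency.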

      Over the past few months, more and more virus subtypes have formed through the manifestation of point mutations (and amino acid substitutions). The authors should therefore definitely comment on the current strains as to whether all primers / probes are able to detect the virus variants circulating worldwide without loss of sensitivity. Along this line, which virus strains were used for the cultivation as described in line 131? Is sequence data available? If so, it would provide helpful information to characterize the viral strain.

      Line 206ff: In my opinion, this section belongs more to the discussion part than to material and methods that describe the technical implementation.

      Is there a loss of sensitivity compared to the single PCRs? This data is very important and useful for other users. They should therefore be included explicitly in the manuscript (supplements).

      Minor issues:

      Line 15 ff.: Source is missing, is this WHO-data?

      Fig S3: How was the digital droplet PCR carried out? A brief description should be included in the legend text.

      Figure 1a: PCR efficiencies are missing.

      Line 145: MS2 appears, but without explaining the context. This should be improved here with additional information (this does not appear until line 154).

      Page 15: H2O instead of H20, reaction mix instead of Reaction mix.

      Significance

      The novel coronavirus SARS-CoV-2 is the causative agent of the acute respiratory disease COVID-19, which has become a global concern due to its rapid spread and high death rate. While some patients have no symptoms at all, but are still able to spread the virus, others have severe symptoms, often with fatal outcome. The gold standard in SARS-CoV-2 detection is the RT-qPCR approach; however, the high-cost commercial kits are available in limited amounts only. The scarcity of resources is still a highly important issue, especially in view of the incredibly rapidly increasing number of cases worldwide. Thus, the manuscript is of significance for the field and timely. In particular, diagnostic laboratories in low-income countries that are involved in managing the pandemic, but also researchers, will benefit from this manuscript and save resources.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In order to improve SARS-CoV-2 diagnostics, Reijns et al. developed a multiplexed RT-qPCR protocol that allows simultaneous detection of two viral genes, one housekeeping gene as well as an external gene as an extraction control. Compared to running parallel assays to detect genes individually, the turnaround time is much shorter and reagents are saved. Furthermore, the presented data suggest that the assay is more sensitive than commercial kits. The authors also propose the detection of the human housekeeping gene as a measure of sample quality control. In principle, this work has potential but the manuscript itself needs a better structure.

      Major concerns:

      The authors have used the Takara RT-qPCR kit for their study. Did the authors try other commercial kits? Can the authors elaborate on the supply chain of the Takara kit? Could it cover population testing in case of shortages of other commercial kits?

      For better comparison, is it possible to give information on which primers the commercial kits are based on? Also, explain better the primers used in this study. For example, the N1 and N2 primers are directed against different regions of the SARS-CoV-2 N gene.

      The result section needs a better structure as the first two pages do not refer to any of the main figures. For example, in which figure or table can the reader find the data that are discussed in lines 83 to 87?

      Table S1, instead of current Table 1, could be moved to main figures as it contains the important finding that the multiplexed assay may be more sensitive than the commercial one. The authors identified some samples that scored negative in commercial assays but positive in their new assay. This is important, however, the possibility of detecting false positives should be strengthened in a "Discussion" section.

      Figures 1 to 3 have different panels which seem to be redundant. For example, Fig 1 A and B, Fig 2 B and C, Fig 3 C and D.

      Figure 1: Give a rationale for comparing before and after extraction. This heavily depends on the extraction method and not on the detection itself. In addition, IVT RNA does not reflect the complexity of a clinical specimen. This is rather confusing and deviates from the important findings.

      Figure 3: Did any of the negative samples/patients with an undetectable housekeeping gene re-test positive? Did adding this housekeeping gene as a control actually improve the detection of any patient samples? If the authors want to convince the readership of this quality control, experimental evidence should be provided.

      Fig 3C and D seem to contain this information somewhat, as here, the values were normalized and the CT values for the E and N gene decreased. Nevertheless there is no real explanation of this figure provided in the Result section at all. While this figure has potential, the authors have to keep in mind that the number of cells in a swab can be affected by many biological factors, including age, sample timing, inflammation of the respiratory tract, etc. In addition, viral genomes can exist intra- as well as extracellular, in the form of free virus. So even in the absence of human cells/detectable housekeeping genomes, viral RNA can be or should be present in a sample in case of infection. This explains (probably) why a correlation between detectable housekeeping gene and viral RNA is absent (Fig 3A and B?). This entire Fig 3 just needs a better explanation. The provided text does not describe any results and should go into a "Discussion" section.

      Self-swabbing is surely a potential source of variability and false-negatives, but many publications have shown the suitability of saliva testing. This should also be discussed and would probably negate the need for such a quality control.

      Which assay works better, the N1E-RP or the N2E-RP assay? A final conclusion is missing here.

      Significance

      Naturally, in this pandemic, this topic is important as sensitive and affordable methods to detect SARS-CoV-2 infections are in need. This Reviewer agrees that multiplexing could be an elegant approach to fill this need.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their feedback and constructive comments on our work. We provide here a point-by-point response to the comments of Reviewers #1, #2 and #3 (text in grey and italic).

      Responses written in plain text correspond to Reviewer comments that have been addressed in the revised version of the manuscript provided at this stage of the review process (referred to as “revised version I” below).

      Responses written in bold text correspond to comments that need further experiments. The list of experiments we intend to perform to address these comments is provided in a separate document (Revision plan). The results of these additional experiments will be included in a later revised version of the manuscript, referred to as “revised version II” below.

      Reviewer #1

      The manuscript addresses an important topic, the posttranscriptional maturation of ribosomes. This topic is inherently interesting because we normally think of ribosome biogenesis as a sequential series of steps that automatically proceeds and cannot be "accelerated" in physiological conditions, but only "delayed" in the presence of genetic mutations. In short, the manuscript proposes that RIOK2 phosphorylation by the action of RSK, below the Ras/MAPK pathway promotes the synthesis of the human small ribosomal subunit.

      I honestly admit that I have some difficulties in reviewing this manuscript. The quality of the presented data is, in general, good. However, overall I find the whole manuscript preliminary and I am not much convinced of the conclusions. Several aspects are superficially analyzed. In short, I think that most of the conclusions are not fully supported by the data because shortcuts are present. All the aspects that I found wrong are listed below.

      Biological issue

      1. The authors claim that the effects of the inhibition of the maturation of ribosomes by acting on a pathway upstream of RIOK2 are limited to the 40S subunit. This is far from being a trivial point, for the following reason. RIOK2 is known to affect the maturation of 40S ribosomes. Hence, the fact that using an upstream inhibitor of the MAPK pathway such as PD does not inhibit 60S processing in reality would argue against a biologically relevant control in ribosome maturation (of the MAPK pathway). Have the authors considered this? In a way, also, given the fact that the mutants confirm a role in 18S final maturation, it is a bit complex to put all the data in a clear biological context.

      We agree that we put more emphasis on the effects on the pre-40S pathway than on the pre-60S pathway in the original manuscript but we did not claim that the effects of PD or LJH inhibitors of the MAPK pathway are restricted to the 40S subunit. We described that the effect of PD or LJH on the 32S was less severe than on the 30S, and we did mention variations of the 12S intermediate. These changes are in the same range of amplitude as the changes in the 21S and 18S-E intermediates in the small subunit pathway. The Northern blot data concerning the pre-60S pathway were placed in the supplementary material of the original manuscript, which may have left the reader with an impression of lesser emphasis. We rephrased this part in the present revised version I of the manuscript (Page 6, Line 26) and we now show the pre-40S and pre-60S intermediates on the same figures (Figures 1A and 1C).

      In addition, we will probe more exhaustively the intermediates of the pre-60S pathway in the revised version II of the manuscript as described in the revision plan. These data will be complemented with metabolic labeling experiments to provide a more dynamic analysis of the pre-rRNA processing defects resulting from inactivation of the MAPK pathway. Furthermore, as requested by Reviewer #2 (see below), we will quantify more accurately these data.

      A number of specific issues will be concisely described.

      Manuscript very well written. Data do not always support the strong conclusions. Low magnitude of the observed effects.

      In the introduction the authors make a general claim that ribosome biogenesis is one of the most energetically demanding cellular activities. This statement has lingered in the literature for 15 years but in reality it has never been formally proved for mammalian cells, and certainly not for HEK293 cells. The original statement, to my knowledge, can be traced to some obscure statement referring to the yeast case and then repeated as a truth. In conclusion, besides being a very banal observation, it should be referenced.

      We agree with this comment of Reviewer #1. The original statement was proposed by Jonathan R. Warner (Warner, 1999, TiBS and references therein) and data from the Bähler group also supported this statement (Marguerat et al., 2012, Cell). However, these data were indeed referring to yeast (S. cerevisiae and S. pombe). In the present revised version I of the manuscript, we introduced the reference of a review providing quantitative data on ribosome biogenesis in human cells (Lewis & Tollervey, 2000, Science) and we modified the problematic sentence as follows: “Growing human cells produce around 7500 ribosomal subunits per minute (Lewis and Tollervey 2000), which represents a significant expenditure of energy.” (Page 4, Line 1).

      Growth factors, energy status are not cues but are proteins or metabolites (introduction).

      We agree with this comment of Reviewer #1. We changed the text accordingly in the revised version I of the manuscript (Page 4, Line 8).

      Authors write about mTOR without making statements on mTORC1/2. This is very obsolete. Also I am not sure that the choice of Geyer et al., 1982, and subsequent papers makes much sense. At the very minimum TOP mRNA concepts and mTORC1 must be defined.

      We provide more details on the mTOR pathway in the revised version I of the manuscript according to Reviewer #1’s suggestions (Page 4, Line 13 and Page 5, Line 3).

      The authors claim that their work fills a major gap between known functions of MAPK and cytoplasmic translation. I would not be so sure about it.

      Our original sentence stated that “our work fills a major gap between currently known functions of MAPK signaling in Pol I transcription and cytoplasmic translation”. Indeed, although MAPK signaling was known to regulate Pol I transcription and cytoplasmic translation, the impact of the pathway on the post-transcriptional steps of ribosome synthesis, namely pre-ribosome assembly and maturation, has been very little investigated and remains poorly understood. Our data provides the first example of a detailed mechanism of regulation of the maturation of pre-ribosomal particles by the MAPK pathway. Reviewers #2 and #3 seem to agree with this point:

      Reviewer #2: “However, there is a lacking mechanistic connection of signaling pathways to pre-rRNA processing and maturation steps of ribosome biogenesis. The authors set out to provide a specific example of a direct target of MAPK signaling, RSK that regulates pre-rRNA maturation through the phosphorylation of a ribosome assembly factor (RIOK2), offering for the first time providing mechanistic insight into MAPK regulation of pre-rRNA maturation.”

      Reviewer #3: “With these provisos, the work is technically good and will be of considerable interest to the field. The post-transcriptional regulation of ribosome synthesis is increasingly recognized a significant topic.”

      Results. Authors start with a major mistake, i.e. that PMA selectively stimulates the MAPK pathway. Perhaps it stimulates, certainly it does not do it selectively.

      We agree with this comment of Reviewer #1. We removed the term “selectively” in the problematic sentence (Page 6, Line 8).

      RIOK2 phosphosites are first found by bioinformatics analysis. It should be noted that the predicted phosphosite (S483) is found only in a limited set of datasets from MS databases. The actual importance of this site would not emerge from unbiased studies. Also, there are many other phosphosites that were not analyzed in this study.

      We agree with Reviewer #1 that phosphorylation of S483 of RIOK2 has been detected in a limited number of mass spectrometry datasets, but these datasets have been reported in high impact journals (Nature Methods, Mol Cell Proteomics, Science), attesting to the quality of these studies.

      As mentioned by Reviewer #1, there are several other phosphosites within RIOK2 that were not analyzed in our study. We provided the list of these phosphosites in Supplementary Table S1 of the original manuscript. Besides T481 and S483, none of the other sites belong to consensus motifs recognized by ERK or RSK at medium and high stringency. They are therefore less relevant to our study. We only analyzed phosphorylation at S483 because: (i) our mass spectrometry analysis revealed that S483 is the only phosphosite in RIOK2 whose level increases upon MAPK activation but not in the presence of the MAPK inhibitor PD184352 (Figure 2B); (ii) our in vitro kinase assay showed that the phosphorylation level of RIOK2 by RSK is residual when S483 is replaced by a non-phosphorylatable alanine (Figure 3D); (iii) our data presented in Figure 2C further show that mutation of T481 to an alanine does not prevent RIOK2 phosphorylation on RxRxxS/T motifs upon stimulation of the MAPK pathway.

      We clarified this point in the relevant part of the result section of the revised version I of the manuscript (Page 7, Lines 16 and 24, Page 8, Line 17 and Page 9, Line 5).

      Throughout the paper the authors use the word strongly, significantly, but the actual effects seem in general quite marginal.

      We agree with Reviewer #1 that some of the phenotypes described in the manuscript are modest, in particular the phenotypes resulting from the S483A mutation of RIOK2, which is not unusual for a point mutation. We rephrased several sentences throughout the manuscript to soften the formulation in the description and interpretation of the data and in the conclusions.

      Discussion. The authors claim that they provide solid evidence on MAPK signalling to ribosome maturation. At the very best this is circumstantial evidence for the 40S maturation.

      We rephrased the sentence accordingly (Page 16, Line 5): “Our study provides evidence that MAPK signaling applies another level of coordination during ribosome biogenesis, by directly regulating pre-40S particle assembly and maturation.”

      Figure 1.

      Unclear why LJH should increase P-ERK.

      A negative feedback loop has been described in the MAPK pathway whereby RSK activation partially inhibits ERK phosphorylation (Saha et al., 2012, Horm Metab Res; Dufresne et al., 2001, MCB; Schneider et al., 2011, Neurochem; Re Nett et al., 2018, EMBO Rep). Inactivation of RSK with LJH alleviates this inhibition, which results in increased phosphorylation levels of ERK.

      We added this information in the revised version of the manuscript along with the corresponding references (Page 6, Line 17).

      General lack of quantitation (sd, replicates, bars). Experiment done only on a single cell line in a single experimental setup.

      As also requested by Reviewer #2 (Major comment 1.), we applied in the revised version I of the manuscript RAMP quantifications to all Northern blot data. We included error bars corresponding to biological replicates.

      Furthermore, in order to validate the impact of the MAPK pathway on pre-ribosome assembly and maturation, we plan to perform the same experiments using PD inhibitors in different cell lines and we will provide a figure with accurate RAMP quantifications, error bars and statistical significance, in the revised version II of the manuscript (see revision plan).

      Very different effects on 21S by LJH, PMA and siRNA for RIOK2. Overall the message given by the authors is to me mysterious.

      We assume that the reviewer wanted to point out the difference between PMA, PMA+LJH and shRNA for RSK since we did not perform RNAi targeting RIOK2. We agree with this comment. We believe that this difference is likely due to experimental setups that are different between both experiments. In the experiment using inhibitors, we assessed short-term effects of RSK inhibition after acute stimulation of the MAPK pathway (starved cells stimulated with PMA), while in the experiment using shRSK, we monitored long term effects of RSK depletion in serum-growing cells in which other signaling pathways are also active. Prolonged RSK depletion is likely to induce pleiotropic cellular effects, which would interfere with ribosome biogenesis both directly and indirectly. These differences probably explain the variable effects on the 21S intermediate. However, in both experiments we do observe an accumulation of the early 30S intermediate, consistent with the phenotype observed when ERK is inactivated (PD inhibitor), therefore indicating that RSK regulates some post-transcriptional stages of ribosome biogenesis.

      To make our results clearer we have withdrawn the experiments using shRSK to avoid the risk of showing indirect effects due to the prolonged absence of RSK. Instead, we included RAMP analyses with error bars from 2 biological replicates using PD and LJH inhibitors (Figure 1B).

      Figure 2.

      Several red flags. For instance in 2C the levels of RIOK2-HA loaded are clearly less than the ones of the other genotypes, hence the conclusion on P-RIOK2 is not convincing.

      Our aim in this experiment was to compare the impact of PMA treatment on the phosphorylation levels of different RIOK2 mutants (T481A, S483A, double mutant). For a given mutant, the levels of RIOK2 loaded in the two conditions (i.e. not stimulated and PMA stimulated) are very similar and we therefore assume that our conclusions are valid.

      We nevertheless plan to repeat these experiments and quantify the data for the revised version II of the manuscript.

      Staining with anti-P RIOK2 lacks controls: how can we be sure that the signal is due to the phosphate? Phosphatase treatment?

      We fully agree with Reviewer #1 and we did perform an experiment showing that the phosphorylation signal disappears following treatment of the protein extracts with λ-phosphatase. We did not show these data in the original version of the manuscript because of space limitations. We added these data in the supplementary material of the revised version I of the manuscript (Supplementary Figure S2B) and amended the text accordingly (Page 7, Line 24).

      Why FBS does not lead to ERK staining in HEK293? There are plenty of growth factors in FBS that should lead to ERK phosphorylation. I do not understand this experiment.

      We agree with this comment. Addition of serum to starved cells does lead to ERK and RSK phosphorylation but with much lower efficiency compared to stimulation by EGF and PMA. ERK phosphorylation is barely visible on the exposure shown in Figure 2D but RSK phosphorylation is clearly observed, although the signal is much weaker compared to EGF and PMA treatments. It is common to observe a stronger response with purified PMA and EGF (see Carrière et al., 2011, JBC; Ray et al., 2013, Oncogene). There are indeed several growth factors in the serum, but the most abundant (Insulin, IGF1, TGF) are present at ng/ml concentration, while EGF is used at 25 µg/ml in Figure 2D. Moreover, they are not very strong activators of the Ras/MAPK pathway, and it is also possible that after 20 min of FBS treatment the phosphorylation is in the decreasing phase.

      In the present revised version I of the manuscript, we included a set of western blots from another experiment showing the same results but of better quality to make the effects more visible (Fig. 2D). We also provided quantifications of phosphorylation of RIOK2 and associated statistical analyses (Fig. 2E).

      Figure 3. In vitro phosphorylation, if I understood, it relies on a truncated version of RIOK2. Why? Is the folding of the full length protein not permissive to in vitro phosphorylation?

      We did not test phosphorylation of the full length RIOK2 protein in vitro because RIOK2 has been reported to auto-phosphorylate (Zemp I. et al., 2009, JCB) and we were concerned that this auto-phosphorylation activity of RIOK2 in addition to RSK phosphorylation may render this experiment inconclusive.

      HA-RSK3 is less?

      It was reported that RSK3 is insoluble when over-expressed (Zhao et al., 1996, JBC), which explains the lower levels of protein recovered in our soluble extract. The information was present in the legend of Figure but we transferred it to the main text of the result section in the present revised version I of the manuscript (Page 10, Line 3).

      Figure 4. Immunofluorescence is low mag, difficult to understand.

      We agree with Reviewer #1. We modified the FISH experiment figure to show cells with a higher magnification and we provided more details in the text (Page 12, Lines 20-25) to facilitate the understanding of the data.

      I really like the experiments with RIOK2 mutants, however I wonder what about protein levels after the knock-in? Given the 18S phenotype overlap between the phenotype of the RIOK2 loss of function with the S483A, testing protein level becomes of the utmost importance.

      We checked RIOK2 protein levels and observed that the mutations do not decrease the level of RIOK2. On the contrary, the mutations slightly increase RIOK2 levels. Therefore, we are pretty confident that the phenotypes resulting from expression of RIOK2 mutants do not result from defects in the global accumulation of the protein. These data have been added to Figure 4C of the revised version I of the manuscript and we amended the text accordingly (Page 12, Line 5).

      Figure 5. Low quality IFL.

      Our aim in preparing this figure was to show many cells in the different images to show that the effect of our mutation was homogeneous at the level of cell populations. The drawback is that cells are small and look blurred. We improved the quality of the figure in this revised version I of the manuscript with new images from the same experiment, showing fewer cells at a higher magnification.

      Hard to think that histogram quantitation of nuclear versus cytoplasmic staining are reliable in the absence of fractionation, better quantitation, experiment done in other cell lines and so on.

      We provide in this revised version I of the manuscript a supplementary figure explaining the procedure we used to quantify the fluorescence data (Supplementary Fig. S7).

      Furthermore, to confirm this result using other experimental conditions and cell lines, we will transfect HEK293 and HeLa cells with plasmids expressing GFP-tagged RIOK2 WT or the S483A mutant and we will compare the kinetics of nuclear import of both proteins upon inhibition of pre-40S particle export by leptomycin B using fluorescence microscopy and GFP quantifications. Second, we will transfect HeLa cells with plasmids expressing HA-tagged RIOK2 WT or S483A and perform fractionation assays to monitor their presence in both cytoplasmic and nuclear compartments. We will include these data in the revised version II of the manuscript.

      However, very beautiful Fig. 5E perhaps the best of the paper shows also mobility shift driven by S483, thus supporting posttranslational modifications.

      We thank Reviewer #1 for this comment. We added the note on the evidence of RIOK2 post-translational modification in the result section (Page 14, Line 9).

      Fig. 6. IFL studies are really impossible to interpret.

      We improved the quality of the figure with new images from the same experiment, showing fewer cells at a higher magnification. NOB1 IF data and quantifications have been transferred to the supplemental material (Supplemental Fig. S4A and S4B) to clarify the figure. In addition, we provided more explanations on the principle of this experiment and expected results in the text (Page 15, Line 9).

      The effects on RIOK2 release (this figure) and 18S maturation (Fig. 5) are very clear and of great quality.

      We thank Reviewer #1 for this comment.

      Overall conclusions. The manuscript tends to overinflate the meaning of several experiments. What to me is very clear and interesting is that the authors provide clear evidence that S483A mutants have a defect in 40S maturation. Whether this is due to MAPK signalling is only circumstantial. I would suggest building on the strong findings and eliminating ambiguous data.

      We do not fully agree with this comment of Reviewer #1. If mutation S483A were simply a partial loss of function mutation, this would not be of strong interest for the subject of this manuscript. It would just indicate that S483 is important for RIOK2 function independently of its phosphorylation status. Our data show that the impact of S483 mutation on pre-rRNA processing and other phenotypes is different depending on whether the serine is converted to an alanine (phosphorylation mutant) or to an aspartic acid (phospho-mimetic mutation). These data are a strong indication that what matters is not simply the serine residue by itself but its phosphorylation status.

      Reviewer #1 (Significance (Required)):

      The paper deals with an important topic, namely whether a regulation of ribosome maturation exists, and how it is mechanistically regulated. In this context, the analysis of the ERK pathway is highly needed, considering that most works deal with effects of the PI3K-mTOR pathway, and the parallel, yet important, RAS-ERK pathway is less understood.

      As a final note, we should consider that S6K downstream of mTOR, and ribosomal S6K, downstream of ERK have been considered to share some substrates.

      We introduced this information in the revised version of the manuscript (Page 19, Line 20). A related comment has been raised by Reviewer #3 (see below, Caveat #2).

      The manuscript is interesting, but several statements given by the authors are rather superficial. An example, listed in the previous section, relates to the linguistic usage of mTOR kinase, instead of detailing whether we are dealing with mTORc1 or mTORc2.

      We agree with this comment of Reviewer #1. Given that the main focus of this manuscript is the regulation by the MAPK pathway, we had chosen to put less emphasis on mTOR in the introduction. However, we added more precise information on mTOR in the present revised version I of the manuscript to address this comment (Page 4, Line 13 and Page 5, Line 3).

      A second gross mistake is the definition of PMA as a stimulator of the ERK pathway. If this is certainly true, this is historically not correct as seminal papers by the group of Parker define this drug as a stimulator of conventional PKC kinases. In short, this paper is a step back in knowledge from the perspective of the literature context.

      We are a bit confused by this comment because seminal papers from the Parker group clearly state that PMA activates the MAPK pathway via PKC (Adams and Parker, 1991, FEBS Lett.; Ways et al., 1992, JBC; Whelan et al., 1999, Cell Growth Differ.). We agree, as mentioned earlier by Reviewer #1, that PMA is not specific to MAPK, a comment that has been addressed above.

      All people interested in the crosstalk between ribosome maturation and signaling pathways will certainly read this manuscript.

      My expertise is within the ribosome biology and signalling field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      There have been mechanistic connections of various signaling pathways to the regulation of ribosome biogenesis steps including rDNA transcription by RNA polymerase I and III, ribosomal protein transcription, and differential mRNA translation efficiency. However, a mechanistic connection of signaling pathways to the pre-rRNA processing and maturation steps of ribosome biogenesis is lacking. The authors set out to provide a specific example of a direct target of MAPK signaling, RSK, that regulates pre-rRNA maturation through the phosphorylation of a ribosome assembly factor (RIOK2), providing for the first time mechanistic insight into MAPK regulation of pre-rRNA maturation.

      The authors observe slight pre-rRNA processing defects upon the use of RSK inhibitors and RSK depletion. They identified several candidate ribosome assembly and modification factors containing the canonical RSK substrate motif, including the RIOK2 kinase. Phosphorylation at this motif was verified to be specifically phosphorylated by RSK1 and 2 isoforms in cells and in an in-vitro kinase assay. The authors produced RIOK2 knock-in eHAP1 cell lines expressing non-phosphorylatable or phosphomimetic versions of RIOK2, observing slowed cellular proliferation, decreases in global translation, slight pre-rRNA processing abnormalities, but not changes in overall mature 18S rRNA levels. More specifically, the authors defined the inability of RIOK2 to be phosphorylated leads to defects in RIOK2 dissociation from the pre-40S ribosomal subunit in an in-vitro assay, and inability for it to be recycled for reuse in pre-ribosome export from the nucleus to the cytoplasm by immunofluorescence.

      Overall, the authors provide an interesting mechanism of MAPK regulation of a ribosome assembly factor RIOK2. However, they fail to provide the necessary reproducibility, controls, quantification, and consistent results between experiments to support their hypotheses.

      Major Comments:

      1. The northern blots reported throughout the manuscript are lacking proper reproducibility and quantification. First, the northern blots are lacking a loading control, which is necessary to report fold changes that are being measured across treatments. Please include a proper loading control (i.e. 7SL or U6 RNAs). Additionally, more rigorous analysis of the pre-rRNA precursor levels through ratio analysis of multiple precursors (RAMP) (Wang et al 2014) can be completed to provide a clearer depiction on which precursor(s) are accumulating. It is unclear for the Figure 1 northern blots if there were replicates completed and what the error bars represent in Figure 1B. Please report replicates, so that statistical analysis can be completed on the differences in precursor relative abundance. This need is emphasized by the small changes observed in pre-rRNA levels (less than 2 fold) between conditions.

      As mentioned above (Reviewer #1), we applied in the revised version I of the manuscript RAMP quantifications to all Northern blot data. These quantifications are shown as separate panels in the figures of the revised manuscript.

      Furthermore, we are planning to repeat the Northern blot experiments of Figure 1 to obtain biological replicates in other cell lines. We will probe the membranes to detect the 7SL RNA as a loading control in all these experiments. We will perform RAMP analyses on all these Northern blot experiments to provide more accurate quantifications of the pre-rRNA levels in the different conditions. These data will be included in the revised version II of the manuscript.
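      The RAMP quantification discussed in this exchange can be illustrated with a minimal sketch. It assumes that RAMP amounts to taking the ratio of band intensities for each pair of precursors within a lane and reporting the log2 change of that ratio relative to the control lane. The function name `ramp_log2_ratios` and the band intensities below are hypothetical and are not taken from the manuscript.

```python
import math
from itertools import combinations


def ramp_log2_ratios(treated, control):
    """Return log2(treated ratio / control ratio) for every precursor pair.

    `treated` and `control` map precursor names to band intensities
    (arbitrary units) measured in one lane each.
    """
    changes = {}
    for a, b in combinations(sorted(treated), 2):
        ratio_treated = treated[a] / treated[b]
        ratio_control = control[a] / control[b]
        changes[f"{a}/{b}"] = math.log2(ratio_treated / ratio_control)
    return changes


# Hypothetical intensities, chosen only to illustrate the calculation.
control = {"45S": 100.0, "30S": 50.0, "21S": 40.0}
treated = {"45S": 100.0, "30S": 100.0, "21S": 40.0}

for pair, value in ramp_log2_ratios(treated, control).items():
    print(f"{pair}: {value:+.2f}")
```

      In this toy example the 30S band doubles relative to 45S, so the 30S/45S ratio changes by +1 on the log2 scale, while 21S/45S is unchanged. Because each ratio is computed within a single lane, it does not depend on how much total RNA was loaded in that lane.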

      2. The western blots reported throughout the manuscript are lacking proper reproducibility and quantification. For example, the western blots validating RSK1 and RSK2 depletion in Figure 1C lack a proper loading control. Additionally, it is unclear if there are replicates completed and there is lack of statistical analysis to determine if the changes are significant. Please include loading controls, replicates, and quantification of the western blots throughout the manuscript.

      We have included actin levels as loading controls in several figures (Figures 2D, 3A, 3C, 3E, 4C) of the revised version I of the manuscript. We also added phosphorylated Rps6 at Ser235/36 to monitor RSK activity in Figures 1A, 2D, 3A.

      We provided quantifications and associated statistical analyses of phosphorylation of RIOK2 presented in Figures 3A and 3C of the revised version I of the manuscript. We also included quantifications of the in vitro phosphorylation assays presented in Figures 3F and 3G.

      We are nevertheless planning to repeat and quantify more accurately the western blot experiments presented in Figures 2A, 2C and 3E of the revised version I of the manuscript. These data will be included in the revised version II of the manuscript.

      3. Please report the full bioinformatic analysis of the RSK substrate motif search among human AMFs including other AMFs found in this search. A sorted list format would be valuable for the reader to understand other potential RSK substrates involved in ribosome biogenesis.

      We understand the request of Reviewer #2. Providing the full list of AMFs identified in our bioinformatic screen would be valuable for the reader, mostly because it would make clearer that RSK seems to be regulating multiple stages of the pre-ribosome maturation pathway, therefore that RSK inhibition induces pleiotropic defects in ribosome synthesis. However, we are currently working on a more global study of the impact of MAPK regulation on the post-transcriptional steps of ribosome synthesis that we would like to publish in a near future.

      4. The authors report that RSK inhibition/depletion leads to accumulation of the 30S pre-rRNA, yet mutation of its target site on RIOK2 or RIOK2 depletion leads to an accumulation of the 18S-E pre-rRNA. Additionally, the phosphomimic mutation of RIOK2 leads to an accumulation of 30S, the opposite of the expected result. Please elaborate on this discrepancy in processing defects observed across experiments.

      In contrast to RIOK2, which is specifically involved in the late, cytoplasmic stages of the maturation of the pre-40S particles, RSK regulates ribosome biogenesis at multiple levels. Upon activation of the MAPK pathway, RSK activates Pol I transcription in the nucleoli and promotes translation of mRNAs encoding ribosomal proteins and AMFs. In addition, our bioinformatic screen identified several AMFs at different stages of the maturation pathway of both ribosomal subunits as potential targets of RSK. These considerations imply that RSK inhibition is expected to impact ribosome biogenesis at multiple levels (Pol I transcription, availability of RPs and AMFs, export of the pre-ribosomal particles, probably several maturation steps) whereas RIOK2 inactivation more specifically delays 18S-E processing in the cytoplasm. In terms of processing, RSK inhibition induces a significant accumulation of the 30S intermediate. This is further evidence that RSK regulates pre-rRNA processing at several stages. This phenotype might result, as recently described in yeast (Yerlikaya et al., 2016, MCB), from an inhibition of RPS6 phosphorylation which affects its early incorporation into pre-ribosomes, although this has not been demonstrated in human cells. This 30S precursor accumulation affects production of the downstream intermediates and we strongly believe that this precludes accumulation of 18S-E even if the activity of RIOK2 is affected. Given the broad implication of RSK at different stages of ribosome biogenesis, it is biologically relevant to observe that inactivation of RSK does not result in the same processing defects as inactivation of RIOK2.

      We nevertheless tried to make this point clearer in the present revised version I of the manuscript. We added in the supplementary material a diagram (Supplementary Fig. S1C) showing all the known and hypothetical targets of ERK and RSK in ribosome synthesis to provide the readers with a global view of the function of RSK in this process, and we refer to this figure in the introduction and results. In the introduction, we also place more emphasis on the multiple aspects of the regulation of ribosome synthesis by ERK and RSK (Page 4, Line 18).

      Concerning the phospho-mimetic mutant, it does slightly accumulate the 45S and 30S intermediates, contrary to the non-phosphorylatable mutant, but this is not totally unexpected. RIOK2 is incorporated into pre-ribosomes in the nucleus, at a stage that remains unclear, and constitutive RIOK2 phosphorylation may interfere with this recruitment and affect processing at an earlier stage. This point has been addressed in the discussion of the revised version I of the manuscript (Page 18, Line 7).

      Are there similar results for RSK depletion/inhibition and RIOK2 release from the pre-40S and inability to import into the nucleus? If so, this could provide phenotypic consistency between these two proteins in the proposed pathway to further support the hypothesis.

      We performed the same experiments as reported in Figure 6C to try to demonstrate a cytoplasmic retention of RIOK2 after leptomycin B treatment upon ERK inhibition (PD treatment). We also performed IF and cell fractionation experiments upon PD treatment. In all cases, we failed to observe the expected result. We strongly believe that we are facing here the same problem as described above for the previous comment of Reviewer #2. ERK and thus RSK inhibition leads to accumulation of the early, nucleolar 30S intermediate, indicating that the processing pathway is significantly blocked at an early stage preceding formation of the pre-40S particles in which RIOK2 is recruited. This early blockage most likely explains why we do not see the same phenotypes. We discussed this comment in the discussion section of the revised version I of the manuscript (Page 18, Line 19).

      5. Mature levels of 18S rRNA are not altered in the RIOK2 mutant cell lines. This could be due to compensation in these mutant cell lines since RIOK2 is essential.

      We agree with Reviewer #2 that compensation mechanisms may operate to restore mature 18S rRNA levels despite RIOK2 mutation. On the other hand, although RIOK2 is indeed essential, we may expect that the point mutation of S483 only partially affects RIOK2 function and delays the maturation of pre-40S particles but not to a sufficient extent to impact the mature 18S rRNA levels. This has been observed by others (Montellese et al., 2017, NAR; Srivastava et al., 2010, MCB).

      We added this point in the discussion section of the revised version I of the manuscript (Page 19, Line 9).

      Please report the mature 18S rRNA levels upon shRNA depletion and RSK inhibitors to provide insight into if this pathway significantly alters mature 18S rRNAs as a mechanism for the altered translation and proliferation observed.

      We will probe the levels of the mature 18S and 28S rRNAs in these experiments and the results will be included in Figure 1 of the revised version II of the manuscript.

      Minor Comments:

      1. Figure 1A lower: The authors use an RSK inhibitor LJH685, that does not inhibit RSK phosphorylation S380. Therefore, another verification of RSK inhibition must be used besides RSK-pS380 abundance as for PD184352 inhibition. Please validate the usage of this RSK inhibitor in the experiments by inclusion of quantification of a direct downstream substrate of RSK, such as YB1-pS102 quantification.

      We agree with Reviewer #2. We have probed the membrane with anti-RPS6 and anti-phospho-RPS6 antibodies to show the effect of LJH treatment on RPS6 phosphorylation. These data have been added to Figure 1A in the revised version I of the manuscript and the text has been updated (Page 6, Line 16).

      2. Page 7, Lines 8-12: The authors state that RSK knockdown led to increases in the 45S, while the LJH685 treatment led to no changes in 45S levels due to differences in growth conditions. Please elaborate more on how growth conditions would alter 45S pre-rRNA levels. It would be expected that stimulation of the MAPK pathway would increase pre-rRNA transcription compared to steady state growth conditions. However, pre-rRNA processing northern blots are only measuring steady state levels of the precursors. Thus, an rDNA transcription assay would need to be completed to evaluate these differences.

      We do observe that PMA treatment of starved cells induces an increase in 45S precursor levels, consistent with an increase in transcription but we agree that northern blot experiments measure the steady-state levels of the intermediates.

      To address this comment, we propose to perform short pulse labelings with ortho-phosphate to assess synthesis of the 45S precursor independently of its processing in the different conditions. These data will be included in the revised version II of the manuscript.

      3. Figure 2C: Please quantify these results to properly evaluate the role of these two phosphorylation sites in MAPK signaling.

      We will repeat these experiments and quantify the results in the new version of Figure 2C.

      4. Please include the RIOK2 pS483 antibody generation methodology used in this study.

      We added this information in the Materials and Methods section of the revised version I of the manuscript (Page 21, Line 22).

      5. In vitro kinase assay methods: Is the recombinant RSK1 the human version of the protein? Please clarify in methods.

      Human recombinant RSK1 has been purchased from SignalChem. The information has been added in the revised version I of the manuscript (Page 30, Line 5).

      6. Figure 4B: Please include statistical analysis of the puromycin incorporation assay.

      We performed a statistical analysis of this assay out of 3 replicates. This analysis has been included in the present revised version I of the manuscript (Figure 4B).

      7. Page 13, Line 18: Please explain why RIOK2 co-IP with NOB1 is important.

      We added this explanation in the result section of the revised version I of the manuscript (Page 14, Line 3).

      8. In vitro dissociation assay: There is no control for pulldown of entire pre-40S particles and not just NOB1 protein. Thus, it is unclear if RIOK2 is dissociating from NOB1 or entire pre-40S particles. Please reference previous literature of the methodology of this experiment if applicable. Additionally, please include controls, such as western blotting of ribosomal proteins or northern blotting of rRNA in the pulldown fraction used.

      We agree with Reviewer #2. We have probed the membranes with antibodies detecting LTV1 and ribosomal protein RPS7 to show that the entire pre-40S particle is indeed pulled down. These additional data have been added in Figure 6A of the revised version I of the manuscript and the text has been amended accordingly (Page 14, Line 20).

      9. Page 16, Lines 10-12: The authors state "RSK facilitates the release of RIOK2 and other AMFs", however the only other AMF in this study was NOB1. Please reword appropriately that most likely facilitates release of RIOK2 and other AMFs in a RIOK2 dependent or independent manner if it also phosphorylates other AMFs which possess the motif.

      We agree with Reviewer #2 and we changed the text accordingly (Page 16, Line 11) but we did not introduce the hypothesis that RIOK2 may target directly other AMFs of late pre-40S particles which possess the motif because our in silico screen did not identify consensus RXRXXS/T motifs in any of these factors.
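      As an illustration of what a consensus-motif screen of this kind involves, the following is a minimal sketch that scans a protein sequence for the RXRXXS/T pattern named above. It is not the authors' in silico pipeline, which is not described in this exchange. The function name `find_rsk_motifs` and the example sequence are hypothetical.

```python
import re

# RXRXXS/T as a regular expression: Arg, any residue, Arg, any two residues,
# then Ser or Thr. The lookahead allows overlapping motifs to be reported.
RSK_MOTIF = re.compile(r"(?=(R.R..[ST]))")


def find_rsk_motifs(sequence):
    """Return (1-based position of the candidate S/T, matched 6-mer) tuples."""
    hits = []
    for match in RSK_MOTIF.finditer(sequence.upper()):
        # The S/T is the sixth residue of the motif, so its 1-based
        # position is the 0-based match start plus 6.
        hits.append((match.start() + 6, match.group(1)))
    return hits


# Hypothetical peptide containing two overlapping matches.
print(find_rsk_motifs("RARARSAT"))
```

      A match only marks a candidate site. As the exchange above makes clear, whether a given residue is actually phosphorylated by RSK has to be established experimentally.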

      Reviewer #2 (Significance (Required)):

      This manuscript is significant due to the lack of mechanistic connection of cellular signaling pathways to pre-rRNA processing. There have been, for the most part, no mechanistic connection of signaling pathways to pre-rRNA processing regulation and none for direct targets of MAPK signaling (Reviewed in Gaviraghi et al 2019). They provide the groundwork for analysis of MAPK signaling in regulation of an assembly factor and inclusion of their motif analysis could provide RSK signaling targets' regulation of specific steps of ribosome biogenesis that remain to be elucidated.

      Although the research delves into a specific mechanism, its audience could be far reaching as it is in the ribosome biogenesis field and MAPK signaling, which have broad implications in cancer and developmental diseases.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors report that inhibition of MAPK signaling via RSK is associated with modest alterations in the relative abundance of human pre-rRNA species, that are most marked for 30S but also visible for 21S - although not clearly shown for 18S-E.

      RIOK2 has two closely spaced sites predicted as RSK targets, one of which was confirmed to be MAPK sensitive and shown to be an RSK substrate in vitro. Substitution of Ser483 with Ala was associated with reduced growth and 18S-E accumulation, consistent with impaired NOB1 cleavage activity. RIOK2-S483A also showed greater pre-ribosome association in vivo and, consistent with this, more stable association in vitro and increased cytoplasmic residence. These effects are clear, although the data do not directly demonstrate their linkage to loss of RSK phosphorylation.

      The mutations were apparently generated directly in the genome of haploid cells, potentially raising concerns that the introduction of a deleterious mutation might have been accompanied by compensatory mutations elsewhere. However, three cell lines gave similar results, mitigating this concern.

      Specific comments:

      1. To help the reader, the authors should directly discuss why they think the data on MAPK inhibition did not reveal a clearer pre-18S cleavage phenotype, as would have been expected for loss of RIOK2 activity.

      This comment is similar to major comment #4 of Reviewer #2.

      Please refer to the above response.

      2. Fig. S3: The degree of RSK depletion with the siRNAs appears very modest, as are the effects on RIOK2-P. Moreover, the double depletion is not clearly better than single depletions. These data should probably be supported by quantitation or withdrawn.

      We agree with Reviewer #3 that the effects shown in this figure are modest but we originally chose to show these data because they further supported the role of RSK in RIOK2 phosphorylation at S483 in complement to Figure 3.

      We have withdrawn this figure from the present revised version I of the manuscript.

      3. Fig. 5D: For 18S-E recovery with RIOK2, is the ratio adjusted for the increase in 18S-E abundance in the mutant - ie is recovery increased when adjusted for the increased pre-rRNA abundance?

      In these experiments, the tagged versions of RIOK2 WT and S483A have been expressed ectopically from plasmids in cells expressing the endogenous wild-type protein. RIOK2 S483A does not behave as a dominant negative mutant in these conditions and does not induce 18S-E accumulation, as shown in the northern blot analysis of the 18S-E levels in the cell lysates (lower panel). This information is indicated in the revised version I of the manuscript (Page 13, Line 26).

      Reviewer #3 (Significance (Required)):

      Overall, the analyses on the phenotype of RIOK2-S483A, and the demonstration that this site is an RSK target, appear convincing.

      Caveats are

      1) the phenotype seen on inhibition of RSK, would not have implicated RIOK2 as the obvious candidate for the factor responsible for the observed processing defects;

      We agree with this comment, which has also been raised by Reviewer #2 (Major comment 4.). We provide several lines of evidence in the manuscript that RSK phosphorylates RIOK2 on S483 in vivo and in vitro (Figure 3). However, as explained above in response to Reviewer #2, we cannot correlate the in vivo phenotypes resulting from RSK or RIOK2 inactivation for biological reasons. As mentioned in the introduction, RSK regulates multiple substrates at different stages of ribosome biogenesis (Translation of RPs and AMFs, Pol I transcription, pre-ribosome maturation and export), whereas RIOK2 is specifically implicated in the cytoplasmic maturation of pre-40S particles. Inactivation of RSK is therefore expected to induce pleiotropic defects in ribosome biogenesis, and in particular early defects (Reduced Pol I transcription, 30S precursor accumulation) that preclude observation of the expected phenotype linked to RIOK2 inactivation, i.e. 18S-E accumulation.

      We nevertheless tried to clarify this point as described in the response to Reviewer #2, major comment 4.

      2) the RIOK2-S483A phenotype is not demonstrated to be RSK dependent. This raises the possibility that, although RSK can phosphorylate S483, the effects of the mutation are not due to the loss of this modification.

      As mentioned by Reviewer #3, our data show that RSK can phosphorylate RIOK2 S483 in vitro and in vivo (Figure 3). We believe that Figure 4C strongly suggests that the accumulation of the 18S-E in cells expressing the RIOK2 S483A mutant is due to the loss of S483 phosphorylation, since mutation of S483 to an aspartic acid (S483D), generally considered as a mutation mimicking a phosphorylated serine, does not affect 18S-E maturation. However, although our manuscript provides many lines of evidence identifying RSK as the kinase responsible for RIOK2 phosphorylation at S483, we cannot formally exclude that other AGC kinases involved in growth and proliferation, such as S6K or Akt, may also be involved redundantly or alternatively. Our data presented in Figure 3A, showing that treatment of cells with the RSK inhibitor LJH decreases RIOK2 phosphorylation at S483, support a specific role of RSK.

      We developed this point in the discussion section (Page 18, from Line 25).

      With these provisos, the work is technically good and will be of considerable interest to the field. The post-transcriptional regulation of ribosome synthesis is increasingly recognized as a significant topic.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      There have been mechanistic connections of various signaling pathways to the regulation of ribosome biogenesis steps including rDNA transcription by RNA polymerase I and III, ribosomal protein transcription, and differential mRNA translation efficiency. However, a mechanistic connection of signaling pathways to the pre-rRNA processing and maturation steps of ribosome biogenesis is lacking. The authors set out to provide a specific example of a direct target of MAPK signaling, RSK, that regulates pre-rRNA maturation through the phosphorylation of a ribosome assembly factor (RIOK2), providing for the first time mechanistic insight into MAPK regulation of pre-rRNA maturation.

      The authors observe slight pre-rRNA processing defects upon the use of RSK inhibitors and RSK depletion. They identified several candidate ribosome assembly and modification factors containing the canonical RSK substrate motif, including the RIOK2 kinase. Phosphorylation at this motif was verified to be specifically phosphorylated by RSK1 and 2 isoforms in cells and in an in-vitro kinase assay. The authors produced RIOK2 knock-in eHAP1 cell lines expressing non-phosphorylatable or phosphomimetic versions of RIOK2, observing slowed cellular proliferation, decreases in global translation, slight pre-rRNA processing abnormalities, but not changes in overall mature 18S rRNA levels. More specifically, the authors defined the inability of RIOK2 to be phosphorylated leads to defects in RIOK2 dissociation from the pre-40S ribosomal subunit in an in-vitro assay, and inability for it to be recycled for reuse in pre-ribosome export from the nucleus to the cytoplasm by immunofluorescence.

      Overall, the authors provide an interesting mechanism of MAPK regulation of a ribosome assembly factor RIOK2. However, they fail to provide the necessary reproducibility, controls, quantification, and consistent results between experiments to support their hypotheses.

      Major Comments:

      1. The northern blots reported throughout the manuscript lack proper reproducibility and quantification. First, the northern blots lack a loading control, which is necessary to report the fold changes measured across treatments. Please include a proper loading control (e.g. 7SL or U6 RNA). Additionally, a more rigorous analysis of pre-rRNA precursor levels through ratio analysis of multiple precursors (RAMP) (Wang et al 2014) could provide a clearer picture of which precursor(s) are accumulating. It is unclear whether replicates were completed for the Figure 1 northern blots and what the error bars represent in Figure 1B. Please report replicates, so that statistical analysis can be completed on the differences in precursor relative abundance. This need is emphasized by the small changes observed in pre-rRNA levels (less than 2 fold) between conditions.

      2. The western blots reported throughout the manuscript lack proper reproducibility and quantification. For example, the western blots validating RSK1 and RSK2 depletion in Figure 1C lack a proper loading control. Additionally, it is unclear whether replicates were completed, and there is no statistical analysis to determine whether the changes are significant. Please include loading controls, replicates, and quantification of the western blots throughout the manuscript.

      3. Please report the full bioinformatic analysis of the RSK substrate motif search among human AMFs, including other AMFs found in this search. A sorted list format would be valuable for the reader to understand other potential RSK substrates involved in ribosome biogenesis.

      4. The authors report that RSK inhibition/depletion leads to accumulation of the 30S pre-rRNA, yet mutation of its target site on RIOK2 or RIOK2 depletion leads to an accumulation of the 18S-E pre-rRNA. Additionally, the phosphomimetic mutation of RIOK2 leads to an accumulation of 30S, the opposite of the expected result. Please elaborate on this discrepancy in processing defects observed across experiments. Does RSK depletion/inhibition similarly impair RIOK2 release from the pre-40S and its import into the nucleus? If so, this could provide phenotypic consistency between these two proteins in the proposed pathway to further support the hypothesis.

      5. Mature levels of 18S rRNA are not altered in the RIOK2 mutant cell lines. This could be due to compensation in these mutant cell lines since RIOK2 is essential. Please report the mature 18S rRNA levels upon shRNA depletion and RSK inhibitor treatment to provide insight into whether this pathway significantly alters mature 18S rRNA levels as a mechanism for the altered translation and proliferation observed.
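
      The normalization requested in Major Comment 1 can be sketched as follows. This is a minimal illustration only: the band intensities are hypothetical numbers (arbitrary units), 7SL is assumed as the loading control, and the function names are invented for this example. The pairwise log2 ratios are in the spirit of RAMP but are not the published RAMP implementation.

```python
import math
from itertools import combinations


def normalize_to_loading(bands, loading_key="7SL"):
    """Divide each precursor band by the loading-control band of the same lane."""
    loading = bands[loading_key]
    return {name: value / loading for name, value in bands.items() if name != loading_key}


def fold_change(treated, control):
    """Fold change of each normalized precursor relative to the control lane."""
    return {name: treated[name] / control[name] for name in control}


def pairwise_log2_ratios(bands):
    """log2 ratio for every pair of precursors within one lane."""
    return {
        f"{a}/{b}": math.log2(bands[a] / bands[b])
        for a, b in combinations(sorted(bands), 2)
    }


# Hypothetical band intensities; NOT data from the manuscript.
control = {"30S": 100.0, "21S": 80.0, "18S-E": 50.0, "7SL": 200.0}
treated = {"30S": 180.0, "21S": 96.0, "18S-E": 60.0, "7SL": 240.0}

control_norm = normalize_to_loading(control)
treated_norm = normalize_to_loading(treated)
fc = fold_change(treated_norm, control_norm)
ratios_control = pairwise_log2_ratios(control_norm)
ratios_treated = pairwise_log2_ratios(treated_norm)

for name, value in fc.items():
    print(f"{name}: {value:.2f}-fold after loading-control normalization")
```

      In this made-up example the raw 30S signal rises 1.8-fold, but the treated lane was also loaded more heavily, so the normalized change is 1.5-fold while 21S and 18S-E are unchanged. Changes of this size are why replicates and statistics matter.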

      Minor Comments:

      1. Figure 1A lower: The authors use an RSK inhibitor, LJH685, that does not inhibit RSK phosphorylation at S380. Therefore, a verification of RSK inhibition other than RSK-pS380 abundance, as used for PD184352 inhibition, is needed. Please validate the usage of this RSK inhibitor in the experiments by including quantification of a direct downstream substrate of RSK, such as YB1-pS102.

      2. Page 7, Lines 8-12: The authors state that RSK knockdown led to increases in the 45S, while the LJH685 treatment led to no changes in 45S levels due to differences in growth conditions. Please elaborate more on how growth conditions would alter 45S pre-rRNA levels. It would be expected that stimulation of the MAPK pathway would increase pre-rRNA transcription compared to steady state growth conditions. However, pre-rRNA processing northern blots are only measuring steady state levels of the precursors. Thus, an rDNA transcription assay would need to be completed to evaluate these differences.

      3. Figure 2C: Please quantify these results to properly evaluate the role of these two phosphorylation sites in MAPK signaling.

      4. Please include the RIOK2 pS483 antibody generation methodology used in this study.

      5. In vitro kinase assay methods: Is the recombinant RSK1 the human version of the protein? Please clarify in methods.

      6. Figure 4B: Please include statistical analysis of the puromycin incorporation assay.

      7. Page 13, Line 18: Please explain why RIOK2 co-IP with NOB1 is important.

      8. In vitro dissociation assay: There is no control for pulldown of entire pre-40S particles and not just NOB1 protein. Thus, it is unclear if RIOK2 is dissociating from NOB1 or entire pre-40S particles. Please reference previous literature of the methodology of this experiment if applicable. Additionally, please include controls, such as western blotting of ribosomal proteins or northern blotting of rRNA in the pulldown fraction used.

      9. Page 16, Lines 10-12: The authors state "RSK facilitates the release of RIOK2 and other AMFs", however the only other AMF in this study was NOB1. Please reword to state that RSK most likely facilitates the release of RIOK2, and of other AMFs in a RIOK2-dependent or -independent manner if it also phosphorylates other AMFs that possess the motif.

      Significance:

      This manuscript is significant due to the lack of mechanistic connection of cellular signaling pathways to pre-rRNA processing. There has been, for the most part, no mechanistic connection of signaling pathways to pre-rRNA processing regulation and none for direct targets of MAPK signaling (Reviewed in Gaviraghi et al 2019). The authors provide the groundwork for analysis of MAPK signaling in regulation of an assembly factor, and inclusion of their motif analysis could point to RSK targets that regulate specific steps of ribosome biogenesis that remain to be elucidated.

      Although the research delves into a specific mechanism, its audience could be far reaching as it spans the ribosome biogenesis and MAPK signaling fields, which have broad implications in cancer and developmental diseases.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The authors report that inhibition of MAPK signaling via RSK is associated with modest alterations in the relative abundance of human pre-rRNA species, that are most marked for 30S but also visible for 21S - although not clearly shown for 18S-E.

      RIOK2 has two closely spaced sites predicted as RSK targets, one of which was confirmed to be MAPK sensitive and shown to be an RSK substrate in vitro. Substitution of Ser483 with Ala was associated with reduced growth and 18S-E accumulation, consistent with impaired NOB1 cleavage activity. RIOK2-S483A also showed greater pre-ribosome association in vivo and, consistent with this, more stable association in vitro and increased cytoplasmic residence. These effects are clear, although the data do not directly demonstrate their linkage to loss of RSK phosphorylation.

      The mutations were apparently generated directly in the genome of haploid cells, potentially raising concerns that the introduction of a deleterious mutation might have been accompanied by compensatory mutations elsewhere. However, three cell lines gave similar results, mitigating this concern.

      Specific comments:

      1. To help the reader, the authors should directly discuss why they think the data on MAPK inhibition did not reveal a clearer pre-18S cleavage phenotype, as would have been expected for loss of RIOK2 activity.

      2. Fig. S3: The degree of RSK depletion with the siRNAs appears very modest, as are the effects on RIOK2-P. Moreover, the double depletion is not clearly better than single depletions. These data should probably be supported by quantitation or withdrawn.

      3. Fig. 5D: For 18S-E recovery with RIOK2, is the ratio adjusted for the increase in 18S-E abundance in the mutant, i.e. is recovery increased when adjusted for the increased pre-rRNA abundance?

      Significance

      Overall, the analyses on the phenotype of RIOK2-S483A, and the demonstration that this site is an RSK target, appear convincing.

      Caveats are

      1) the phenotype seen on inhibition of RSK would not have implicated RIOK2 as the obvious candidate for the factor responsible for the observed processing defects;

      2) the RIOK2-S483A phenotype is not demonstrated to be RSK dependent. This raises the possibility that, although RSK can phosphorylate S483, the effects of the mutation are not due to the loss of this modification.

      With these provisos, the work is technically good and will be of considerable interest to the field. The post-transcriptional regulation of ribosome synthesis is increasingly recognized as a significant topic.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript addresses an important topic, the posttranscriptional maturation of ribosomes. This topic is inherently interesting because we normally think of ribosome biogenesis as a sequential series of steps that automatically proceeds and cannot be "accelerated" in physiological conditions, but only "delayed" in the presence of genetic mutations. In short, the manuscript proposes that RIOK2 phosphorylation by the action of RSK, below the Ras/MAPK pathway promotes the synthesis of the human small ribosomal subunit.

      I honestly admit that I have some difficulties in reviewing this manuscript. The quality of the presented data is, in general, good. However, overall I find the whole manuscript preliminary and I am not much convinced of the conclusions. Several aspects are superficially analyzed. In short, I think that most of the conclusions are not fully supported by the data because shortcuts are present. All the aspects that I found wrong are listed below.

      Biological issue

      1. The authors claim that the effects of the inhibition of the maturation of ribosomes by acting on a pathway upstream of RIOK2 are limited to the 40S subunit. This is far from being a trivial point, for the following reason. RIOK2 is known to affect the maturation of 40S ribosomes. Hence, the fact that using an upstream inhibitor of the MAPK pathway such as PD does not inhibit 60S processing in reality would argue against a biologically relevant control of ribosome maturation by the MAPK pathway. Have the authors considered this? In a way, also, given the fact that the mutants confirm a role in 18S final maturation, it is a bit complex to put all the data in a clear biological context.

      A number of specific issues will be concisely described.

      Manuscript very well written. Data do not always support the strong conclusions. Low magnitude of the observed effects.

      In the introduction the authors make a general claim that ribosome biogenesis is one of the most energetically demanding cellular activities. This statement has lingered in the literature for 15 years, but in reality it has never been formally proved for mammalian cells, and certainly not for HEK293 cells. The original statement, to my knowledge, can be traced to some obscure statement referring to the yeast case and then repeated as a truth. In conclusion, besides being a very banal observation, it should be referenced.

      Growth factors, energy status are not cues but are proteins or metabolites (introduction). Authors write about mTOR without making statements on mTORC1/2. This is very obsolete. Also I am not sure that the choice of Geyer et al., 1982, and subsequent papers makes much sense. At the very minimum TOP mRNA concepts and mTORC1 must be defined.

      The authors claim that their work fills a major gap between known functions of MAPK and cytoplasmic translation. I would not be so sure about it.

      Results. Authors start with a major mistake, i.e. that PMA selectively stimulates the MAPK pathway. Perhaps it stimulates, certainly it does not do it selectively.

      RIOK2 phosphosites are first found by bioinformatics analysis. It should be noted that the predicted phosphosite (S483) is found only in a limited set of datasets from MS databases. The actual importance of this site would not emerge from unbiased studies. Also, there are many other phosphosites that were not analyzed in this study.

      Throughout the paper the authors use the word strongly, significantly, but the actual effects seem in general quite marginal.

      Discussion. The authors claim that they provide solid evidence on MAPK signalling to ribosome maturation. At the very best this is circumstantial evidence for the 40S maturation.

      Figure 1. Unclear why LJH should increase P-ERK. General lack of quantitation (sd, replicates, bars). Experiment done only on a single cell line in a single experimental setup. Very different effects on 21S by LJH,PMA and siRNA for RIOK2. Overall the message given by the authors is to me mysterious.

      Figure 2. Several red flags. For instance, in 2C the loaded levels of RIOK2-HA are clearly lower than those of the other genotypes, hence the conclusion on P-RIOK2 is not convincing. Staining with anti-P RIOK2 lacks controls: how can we be sure that the signal is due to the phosphate? Phosphatase treatment? Why does FBS not lead to ERK staining in HEK293? There are plenty of growth factors in FBS that should lead to ERK phosphorylation. I do not understand this experiment.

      Figure 3. In vitro phosphorylation, if I understood, it relies on a truncated version of RIOK2. Why? Is the folding of the full length protein not permissive to in vitro phosphorylation? HA-RSK3 is less?

      Figure 4. Immunofluorescence is low mag, difficult to understand. I really like the experiments with RIOK2 mutants; however, I wonder about protein levels after the knock-in. Given the overlap between the 18S phenotype of the RIOK2 loss of function and that of S483A, testing protein levels becomes of the utmost importance.

      Figure 5. Low quality IFL. Hard to think that histogram quantitation of nuclear versus cytoplasmic staining are reliable in the absence of fractionation, better quantitation, experiment done in other cell lines and so on. However, very beautiful Fig. 5E perhaps the best of the paper shows also mobility shift driven by S483, thus supporting posttranslational modifications.

      Fig. 6. IFL studies are really impossible to interpret. The effects on RIOK2 release (this figure) and 18S maturation (Fig. 5) are very clear and of great quality. Overall conclusions. The manuscript tends to overinflate the meaning of several experiments. What to me is very clear and interesting is that the authors provide clear evidence that S483A mutants have a defect in 40S maturation. Whether this is due to MAPK signalling is only circumstantial. I would suggest building on the strong findings and eliminating ambiguous data.

      Significance

      The paper deals with an important topic, namely whether a regulation of ribosome maturation exists, and how it is mechanistically regulated. In this context, the analysis of the ERK pathway is highly needed, considering that most works deal with effects of the PI3K-mTOR pathway, and the parallel, yet important RAS-ERK pathway is less understood. As a final note, we should consider that S6K downstream of mTOR, and ribosomal S6K, downstream of ERK, have been considered to share some substrates.

      The manuscript is interesting, but several statements given by the authors are rather superficial. An example, listed in the previous section, relates to the linguistic usage of mTOR kinase, instead of detailing whether we are dealing with mTORC1 or mTORC2. A second gross mistake is the definition of PMA as a stimulator of the ERK pathway. Although this is certainly true, it is historically not correct, as seminal papers by the group of Parker define this drug as a stimulator of conventional PKC kinases. In short, this paper is a step back in knowledge from the perspective of the literature context.

      All people interested in the crosstalk between ribosome maturation and signaling pathways will certainly read this manuscript.

      My expertise is within the ribosome biology and signalling field.

  2. Oct 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Please note that the authors have provided a formatted PDF version of this rebuttal, including additional figures and references, via the Open Science Framework: https://osf.io/5acqp/

      Reviewer #1

      This is an interesting and thorough study characterising human iPSCs with heterozygous or homozygous mutations in the PI3K pathway that lead to its hyper-activation. They prove that the increased stemness results from enhanced autocrine responsiveness to the TGF signalling pathway. The main conclusions are well supported by the presented data. Cutting edge tools and bioinformatic analysis are adequately applied. I have only one important point:

      1) Western blot based validation of TGF pathway activation in wt and mutant iPSCs will be helpful to strengthen the results based on bioinformatic data.

      AUTHORS’ RESPONSE: We thank the Reviewer for the positive evaluation of our work.

      Functional validation of the signalling hypothesis is indeed important, and we did in fact already present supportive data. Current evidence suggests that SMAD2 is the main transcription factor mediating actions of the TGFb/NODAL pathway in an early developmental context [1,2], and we have shown increased phosphorylation of SMAD2 (S465/S467) in PIK3CAH1047R/H1047R iPSCs using RPPA in the two datasets shown in Fig.2.

      We have attempted to demonstrate increased NODAL protein directly in PIK3CAH1047R/H1047R cells, but have been unsuccessful due to poor signal on immunoblotting. We thus opted for functional testing of our hypothesis using the experiment presented in Fig. 5, wherein TGFb (a surrogate for NODAL) is removed from the culture medium. Human iPSCs depend strictly on TGFb/NODAL for maintenance of NANOG expression and thus pluripotency [3,4]. Upon exclusion of TGFb/NODAL from the culture medium of normal human iPSCs, the early responses (prior to overt differentiation) are expected to be: (A) decreased NODAL expression, due to well-established autoregulation [2], then (B) a decrease in NANOG and ultimately POU5F1 (OCT3/4) mRNA levels (see also Introduction, lines 80-90). The evidence in Fig. 5 that PIK3CAH1047R/H1047R fail to exhibit these responses upon exogenous TGFb/NODAL removal supports the notion that these cells autonomously sustain TGFb/NODAL signalling.

      For improved clarity, we have also added the following information to the revised manuscript:

      lines 202-205: “This is consistent with strong NODAL mRNA upregulation and increased pSMAD2 (S465/S467) in PIK3CAH1047R/H1047R iPSCs in the current study (Dataset S2 and RPPA data in Fig. 2, respectively), and with prior evidence of activation of the NODAL/TGFb pathway in homozygous PIK3CAH1047R iPSCs.”

      Reviewer #2

      In this manuscript, Madsen et al have investigated the role of heterozygous versus homozygous PIK3CAH1047R gain-of-function mutation at maintaining stemness of induced pluripotent stem cells (iPSCs). The authors have performed high-depth RNAseq, proteomic, and RPPA analyses to show that biallelic PIK3CA alterations induce stronger activation of the PI3K signaling axis, compared to monoallelic mutations. The authors claim that a higher PI3K signaling dose activates the NODAL/TGF-b pathway, which in turn supports stemness in an autocrine fashion. These are important findings, however, the manuscript and its conclusions can be improved.

      AUTHORS’ RESPONSE: We thank the Reviewer for acknowledging the importance of the work and for their constructive suggestions for improvements.

      The authors have described the role of PIK3CAH-1047R gain-of-function mutation in cancer and overgrowth syndromes. However, cancer associated somatic mutations in PIK3CA are mostly heterozygous. Similarly, PIK3CA related overgrowth syndromes (PROS) are caused by post-zygotic mosaic PIK3CA activating mutation. As such, the relevance of homozygous PIK3CA alterations to these pathological conditions is unclear. The authors should elaborate on the biological implications of their findings.

      AUTHORS’ RESPONSE: We disagree with the Reviewer’s comment which implies that homozygous PIK3CA mutations are not relevant to many cancers. In our previous work [5], we provided evidence that many human cancers harbour multiple PIK3CA mutant alleles. Specifically, among cancers with a unique PIK3CA mutation, approximately 50% exhibit multiple copies according to allele copy number analysis. We further demonstrated that a substantial proportion of cancers have multiple different PIK3CA variants or additional oncogenic ‘hits’ within the pathway. These findings have been supported by other recent high-profile papers [6–8]. Such multiple alterations increase activity of the PI3K pathway beyond the level seen with heterozygosity alone [5,6]. This substantial body of literature renders our PIK3CAH1047R iPSC model system highly relevant for studying disease-relevant, dose-dependent oncogenic PIK3CA activation.

      The Reviewer is correct, however, that PROS is caused by postzygotic heterozygous PIK3CA mutations almost exclusively. Observations in homozygous cells are therefore not directly relevant to the pathogenesis of PROS. On the other hand, the heterozygous cells are closely relevant, being human, carefully matched with isogenic controls, and unperturbed by further manipulations such as artificial immortalisation. Our prior studies demonstrated no clear phenotypes in heterozygous cells in the iPSC differentiation paradigm, despite the rock-solid causal nature of heterozygous mutations in PROS. This negative finding, surprising given the dramatic PROS phenotypes, is very important in understanding how best to create disease-relevant PROS models. One intent of the current study was to increase the sensitivity of our transcriptomic analysis, and to combine this with proteomic studies to determine if heterozygous cells really do not exhibit a phenotype. We now show that there are indeed faint echoes in heterozygous cells of the dramatic changes in homozygous cells. We believe that the human growth phenotype is a summative consequence of such small differences in growth behaviours sustained over months and years, highlighting how subtle differences in signalling can lead to dramatic human growth consequences across the lifecourse. Similar observations were also recently made following systematic analyses of oncogenic RAS mutations [9]. We thus contend that the new information we present about heterozygous PIK3CAH1047R cells, while much less “showy” than the cancer-relevant behaviours of homozygous cells, is very important for understanding the PROS phenotype and its experimental modelling. To emphasise this point, we have added the following statements to the abstract and discussion, respectively.

      • lines 56-57: “This work illustrates the importance of allele dosage and expression when artificial systems are used to model human genetic disease caused by activating PIK3CA mutations.”
      • lines 104-106: “We discuss the implications of our findings for understanding and modelling developmental disorders and cancers driven by genetic PI3K activation.”
      • lines 333-340: “Finally, our observations are important for future studies seeking to model human PIK3CA-related diseases. The modest changes observed in heterozygous PIK3CAH1047R cells, in sharp contrast to the radical transcriptional alterations in homozygous cells, emphasise the importance of careful allele dose titration when artificial overexpression systems are used to model disorders caused by genetic PIK3CA activation. Our findings in heterozygous cells are also a reminder that very small effect sizes in cellular systems may summate and result in major human phenotypes over a life course. That such minor changes are found in a cellular study of a rare and severe disorder emphasises the challenges of modelling much more subtle disease susceptibility conferred by GWAS-detected genetic associations, where cellular effect sizes are likely to be smaller still.”

        The role of biallelic PIK3CA mutation is reminiscent of compound mutations in PIK3CA which have also been shown to increase PI3K signaling output. However, double PIK3CA mutations confer enhanced sensitivity to PI3K inhibition (Toska et al. Science 2019). Could the authors kindly speculate on this discrepancy.

      AUTHORS’ RESPONSE: We emphasise first that PIK3CAH1047R/H1047R cells do respond to BYL719 at the signalling level, as demonstrated previously [5] and in the manuscript (revised Figure S5; see also additional Western blot below). Our point is that the cells have undergone a switch to self-sustained stemness. That is, while PIK3CA activation was the driver of the initial change in cell state, the induced stemness phenotype is no longer reversed by removal of that trigger, with our data suggesting that this is now driven by self-sustained TGFb/NODAL signalling. This is in line with the role of this pathway in the maintenance of the pluripotent state. We speculate that this may be important in a cancer context where surviving stem cells may permit cancer persistence after toxic therapies, even if short term growth of tumours is reduced by agents such as PI3K inhibitors.

      Our data are not directly comparable to prior cellular data, for example in Vasan et al. [6], due to: (a) use of a different cell model system and (b) assessment of different functional responses. We would also sound some methodological notes of caution regarding some of the prior studies alluded to, as potentially confounding differences in growth rate in the cells studied were not corrected for. It is well-established that IC50 and Emax values depend on cell division rates, and failure to correct for this can result in artefactual correlations between genotype and drug sensitivity (see, e.g., Hafner et al. Nature Methods 2016: “Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs” [10]).
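
      The growth-rate confounder can be illustrated with a minimal sketch. It assumes two hypothetical cell lines in which a drug halves the growth rate equally, and it uses the normalized growth rate inhibition (GR) value, GR = 2^(log2(x_treated/x0) / log2(x_control/x0)) - 1. All cell counts are invented for illustration and are not data from any study.

```python
import math


def relative_viability(x_treated, x_control):
    """Conventional end-point metric: treated count over untreated count."""
    return x_treated / x_control


def gr_value(x0, x_control, x_treated):
    """Normalized growth rate inhibition: ratio of treated to control growth rates."""
    return 2 ** (math.log2(x_treated / x0) / math.log2(x_control / x0)) - 1


# Hypothetical counts: both lines start at x0, and the drug halves the
# number of doublings in each line during the assay window.
x0 = 1000.0
fast = {"control": x0 * 2 ** 3.0, "treated": x0 * 2 ** 1.5}  # 3 doublings untreated
slow = {"control": x0 * 2 ** 1.0, "treated": x0 * 2 ** 0.5}  # 1 doubling untreated

viability_fast = relative_viability(fast["treated"], fast["control"])
viability_slow = relative_viability(slow["treated"], slow["control"])
gr_fast = gr_value(x0, fast["control"], fast["treated"])
gr_slow = gr_value(x0, slow["control"], slow["treated"])

print(f"relative viability: fast={viability_fast:.2f}, slow={viability_slow:.2f}")
print(f"GR value:           fast={gr_fast:.2f}, slow={gr_slow:.2f}")
```

      Under these assumptions the fast-dividing line looks about twice as sensitive by relative viability, although the per-division effect of the drug is identical, which the GR value reflects.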

      Similarly, the p110 alpha specific inhibitor, alpelisib, is highly effective against PIK3CA-mutant ER+ breast cancer and PROS. As such, the clinical relevance of the insensitivity of homozygous PIK3CA mutation to PI3K inhibitors is unclear.

      AUTHORS’ RESPONSE: Efficacy of Alpelisib in PROS is currently supported only by unregistered observational studies, but is nevertheless striking. It is not relevant to our findings in homozygous cells, as the Reviewer has previously observed, however.

      As for cancer, in a randomised phase 3 trial that compared Alpelisib/BYL719 with fulvestrant to fulvestrant alone, the overall response (irrespective of PIK3CA mutant status) was indeed greater with the combination treatment (26.6 % vs 12.8 %), with a hazard ratio of 0.65 (95% CI, 0.5 to 0.85) in patients with PIK3CA-mutant cancers versus a hazard ratio of 0.85 (95% CI, 0.58 to 1.25) in those without a PIK3CA mutation [11]. This trial demonstrated the utility of additional PIK3CA mutant-centric stratification, but a substantial proportion of patients with PIK3CA-mutant tumours (>50%) did not benefit from the BYL719 and fulvestrant combination [11]. However, these observations are not directly relevant to this manuscript and are instead included in a separate manuscript focused on PI3K signalling and stemness in human breast cancers (preprint [12]).

      Figure 2: The authors have performed RPPA analysis in the presence of 100 nM BYL719. Alpelisib is commonly used at 1 uM concentration for in-vitro experiments, and has a cMax of ~5 uM. We suggest the authors perform western blot analysis to confirm the results of RPPA.

      AUTHORS’ RESPONSE: We carefully chose the optimal concentration of BYL719 to preserve inhibitor selectivity, and to avoid undue toxicity and confounding off-target effects, rather than copying the dose “commonly used”. The Cmax is not relevant to our use of BYL719 in the current study as a precise tool compound. We refer the Reviewer to the known pharmacological characteristics of this compound [13,14]. According to available evidence, it is only a selective PI3Kα inhibitor at concentrations up to 250 nM (Table below adapted from Ref. [13]; for formatted version, please see PDF version: https://osf.io/ecmhr/)

      Enzyme          In vitro IC50 for NVP-BYL719 (nM)
      PI3Kα           4.6 +/- 0.4
      PI3Kα-H1047R    4.8 +/- 0.4
      PI3Kβ           1156 +/- 77
      PI3Kδ           290 +/- 180
      PI3Kγ           250 +/- 140
      PI4Kβ           571 +/- 42
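
      To make the selectivity argument concrete, the in vitro IC50 values listed above can be converted into an approximate fractional inhibition at a given concentration. The sketch below assumes a simple one-site model with a Hill slope of 1, inhibition = c / (c + IC50). That model is an assumption made here for illustration, and enzymatic IC50 values do not translate directly into cellular potency.

```python
# In vitro IC50 values (nM) for NVP-BYL719, as listed in the table above
# (beta, delta and gamma isoforms written with Greek letters).
IC50_NM = {
    "PI3Kα": 4.6,
    "PI3Kα-H1047R": 4.8,
    "PI3Kβ": 1156.0,
    "PI3Kδ": 290.0,
    "PI3Kγ": 250.0,
    "PI4Kβ": 571.0,
}


def fractional_inhibition(conc_nm, ic50_nm):
    """Assumed one-site model with Hill slope 1: c / (c + IC50)."""
    return conc_nm / (conc_nm + ic50_nm)


def inhibition_table(concs_nm):
    """Fractional inhibition of every enzyme at each concentration."""
    return {
        conc: {enz: fractional_inhibition(conc, ic50) for enz, ic50 in IC50_NM.items()}
        for conc in concs_nm
    }


table = inhibition_table([100.0, 250.0, 1000.0])
for conc, row in table.items():
    cells = ", ".join(f"{enz}: {frac:.0%}" for enz, frac in row.items())
    print(f"{conc:>6.0f} nM -> {cells}")
```

      Under these assumptions, at 100 nM the α isoform is about 96% inhibited while the β, δ and γ isoforms are all below 30%; at 1 µM the γ isoform reaches 80%, so isoform selectivity is largely lost.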

      We have previously demonstrated (Fig. 2C in Ref. [5]) that 100 nM BYL719 is sufficient to restore pAKT (S473) levels in both heterozygous and homozygous PIK3CAH1047R to levels observed in WT cells. This is consistent with the RPPA data reported in the current work (Fig. 2B). Of note, while 500 nM BYL719 completely ablates pAKT irrespective of genotype, we previously noted substantial toxicity [5], precluding use of this or higher doses of BYL719 in our model system. This is in line with a recent Nature Cell Biology study by Yilmaz et al. ([15]) which demonstrated the essential growth-promoting role of the PI3K pathway in human pluripotent stem cells; Yilmaz et al. also demonstrate that compared to somatic cells (fibroblasts), human pluripotent stem cells suffer dramatic effects on growth/survival in response to Torin1/rapamycin [15], overall suggesting that this cell type is exquisitely sensitive to inhibition of the PI3K/AKT/mTOR pathway.

      In the present study we have also confirmed that 250 nM BYL719, used for Fig. 5 experiments, has worked as expected at the level of pAKT (S473) as shown in the below Western blot (see also revised Fig. S5; please access PDF version to view Western blot: https://osf.io/ecmhr/)

      Figures 3 and 4: The authors should expand their RNAseq analysis to demonstrate enrichment of stemness and TGFb signaling in homozygous mutant cells compared to heterozygous cells.

      AUTHORS’ RESPONSE: We thank the Reviewer for this suggestion. The unsupervised MDS plot (Fig. 1A) clearly demonstrates the overlap between wild-type and heterozygous cells, strongly suggesting functional concordance and consistent differences to homozygous counterparts. Indeed, the below count table illustrates that the majority of differentially expressed genes in homozygous versus wild-type cells are also differentially expressed in homozygous versus heterozygous cells, including the direction of the change (please access the PDF version for formatted table: https://osf.io/ecmhr/)

      Comparison              Differentially expressed gene count
      HOMvsWT                 5644
      HOMvsHET                5764
      HOMvsWT AND HOMvsHET    4825 (2300 upregulated; 2525 downregulated; 1 discordant)
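
      The size of the overlap in the count table can be expressed as fractions. The short sketch below uses only the three counts reported above; the variable names are illustrative.

```python
# Counts taken from the table above.
hom_vs_wt = 5644   # differentially expressed genes, HOM vs WT
hom_vs_het = 5764  # differentially expressed genes, HOM vs HET
shared = 4825      # genes found in both comparisons

frac_of_hom_vs_wt = shared / hom_vs_wt
frac_of_hom_vs_het = shared / hom_vs_het
# Jaccard index: shared genes over the union of both gene lists.
jaccard = shared / (hom_vs_wt + hom_vs_het - shared)

print(f"shared / HOMvsWT:  {frac_of_hom_vs_wt:.1%}")
print(f"shared / HOMvsHET: {frac_of_hom_vs_het:.1%}")
print(f"Jaccard index:     {jaccard:.3f}")
```

      About 85% of the HOMvsWT genes and about 84% of the HOMvsHET genes are shared between the two comparisons.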

      We have now performed additional fast gene set enrichment analyses (fgsea; shown below - please access PDF version to view figure: https://osf.io/ecmhr/) using the R package fgsea ([16]) and 14 of the Broad Institute’s 50 Hallmark Gene Set Collection [17], including manual addition of the PLURINET signature [18]. The 14 gene sets were chosen based on their relevance to answering the Reviewer’s question as well as their connection to PI3K signalling. Fold changes for all expressed genes were included in the analyses, without further thresholding in order to minimise bias.

      The results for homozygous vs wild-type comparisons are concordant with our upstream regulator analyses using IPA; as expected, TGFb signalling and PI3K signalling are among the top positively enriched gene sets (NES > 1) in the comparison between homozygous and heterozygous cells. Unsurprisingly, however, the strength of the enrichments is lower when comparing the two PIK3CAH1047R genotypes.

      We are not convinced that including these surplus data will add value to the manuscript and its main message; however, we will leave this decision to the discretion of the Editor (please also refer to our response to the subsequent question from Reviewer 2). Moreover, these data will remain visible in the publicly available rebuttal document.

      The authors should confirm the results of pathway analysis in vitro to show that homozygous PIK3CA mutation confers increased stemness compared to heterozygous mutation.

      AUTHORS’ RESPONSE: This was a key finding in our previous publication [5]. The aim of the current study was to interrogate this phenomenon further through high-depth transcriptomic/signalling analyses.

      Figure 5: Kindly provide direct evidence demonstrating that increased PIK3CA signaling output induces NODAL expression in this experimental setting.

      AUTHORS’ RESPONSE: We have consistently demonstrated increased NODAL mRNA expression (RNAseq data, Fig. S4 and Ref. [5]). Unfortunately, we have been unsuccessful in attempts to obtain good quality immunoblots for NODAL protein in PIK3CAH1047R/H1047R cells (as noted in response to Reviewer 1). We note, in fact, that such documentation of NODAL protein levels, while not unprecedented, is fairly rare.

      Also, please normalize gene expression data to WT cells so it is easy to visualize the changes in NODAL and NANOG expression in homozygous and heterozygous mutants compared to WT iPSCs.

      AUTHORS’ RESPONSE: It is arithmetically more precise to normalise to the highest expression (i.e. that of PIK3CAH1047R/H1047R cells) – thereby avoiding artificial inflation of fold-changes when normalising to very low levels of expression. Ultimately, the relative levels calculated – and the increased expression of NODAL in PIK3CAH1047R/H1047R cells – are visually identical. Only the entirely arbitrary units change. Thus we do not deem normalisation to WT to be necessary or to add value to the analysis.
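
      The arithmetic behind this point can be illustrated with a short Python sketch. The expression values are hypothetical arbitrary units chosen for the example, not measurements from the study, and the helper name `normalise` is introduced here only for illustration.

```python
import math


def normalise(expression, reference):
    """Divide every value by the value of the chosen reference sample."""
    ref = expression[reference]
    return {sample: value / ref for sample, value in expression.items()}


# Hypothetical relative expression values (arbitrary units).
expression = {"WT": 0.5, "HET": 1.0, "HOM": 40.0}

to_wt = normalise(expression, "WT")    # WT set to 1
to_hom = normalise(expression, "HOM")  # HOM set to 1

# The ratio between any two samples does not depend on the reference.
assert math.isclose(to_wt["HOM"] / to_wt["WT"],
                    to_hom["HOM"] / to_hom["WT"])

# A small absolute error in a low reference value rescales every
# fold change when WT is the denominator ...
noisy = dict(expression, WT=0.25)
fold_clean = normalise(expression, "WT")["HOM"]
fold_noisy = normalise(noisy, "WT")["HOM"]

# ... whereas with HOM as the reference only the WT value itself moves.
hom_clean = normalise(expression, "HOM")
hom_noisy = normalise(noisy, "HOM")
print(fold_clean, fold_noisy, hom_clean["HET"], hom_noisy["HET"])
```

      Under these assumed numbers, halving the low WT value doubles the reported fold change for every other sample on the WT scale, while on the HOM scale the HET and HOM values are unaffected.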

      Kindly quantify Fig. S5.

      AUTHORS’ RESPONSE: These brightfield micrographs were taken as part of routine practice to monitor cell health during maintenance and experimentation, and are suboptimal for direct quantitation due to uneven illumination background and lack of whole-well imaging. Nevertheless, we have now undertaken quantification as the Reviewer suggests, using individual images taken during independent experimental replicates. The results have been added to Fig. S5 and support our assertion that 250 nM BYL719 had a growth inhibitory effect in homozygous PIK3CAH1047R iPSCs. All raw images and associated data have been uploaded to the Open Science Framework (https://osf.io/hbf7x/). The following short method section detailing the image analysis algorithm has also been included in the revised supplementary material:

      “Colony size quantitation from light micrographs

      Routine cell culture light micrographs were acquired on an EVOS FL digital inverted microscope (AMF4300, Thermo Fisher Scientific) using the 4X or 10X objective (final magnification 40X and 100X, respectively). For quantitation, 4X images were used for colony segmentation with Definiens Developer XD software. Background was detected using a contrast threshold; for this each pixel was compared to those in the surrounding 24 pixels (i.e. a 5x5 pixel box), and pixels with low contrast (between -50 and +50) were classified as background. Remaining pixels were classified as colonies, and any holes (pixels that were not initially classified as being part of the colony due to low contrast) were filled. Edges of the resulting colonies were smoothened by shrinking and then growing the colonies by 2 pixels. Finally, colonies less than 2000 pixels in size were reclassified as background. The area of the resulting colonies could then be measured and averaged over each field of view.”
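
      The thresholding and size-filtering steps of the method quoted above can be sketched in plain Python as follows. This is a simplified illustration and not the Definiens Developer XD rule set: it assumes that "contrast" means the pixel value minus the mean of its neighbours in the 5x5 box, it uses 4-connectivity to group pixels into colonies, and it leaves out the hole-filling and edge-smoothing steps. The function names and the synthetic test image are hypothetical.

```python
from collections import deque


def local_contrast(img, radius=2):
    """Pixel value minus the mean of its neighbours in a 5x5 box."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, n = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if (yy, xx) != (y, x):
                        total += img[yy][xx]
                        n += 1
            out[y][x] = img[y][x] - total / n if n else 0.0
    return out


def colony_areas(img, low=-50, high=50, min_size=2000):
    """Return the pixel area of each colony of at least min_size pixels."""
    h, w = len(img), len(img[0])
    contrast = local_contrast(img)
    # Low-contrast pixels are background; everything else is colony.
    colony = [[not (low <= contrast[y][x] <= high) for x in range(w)]
              for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if not colony[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected colony pixels.
            queue, size = deque([(y, x)]), 0
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and colony[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:  # smaller objects count as background
                areas.append(size)
    return areas


# Synthetic 20x20 field: flat background with one textured 10x10 "colony".
field = [[128] * 20 for _ in range(20)]
for row in range(5, 15):
    for col in range(5, 15):
        field[row][col] = 255 if (row + col) % 2 == 0 else 0

areas = colony_areas(field, min_size=50)
mean_area = sum(areas) / len(areas) if areas else 0.0
print(areas, mean_area)
```

      The size threshold is lowered for the small synthetic field; with the 2000-pixel default from the quoted method, the toy colony would be reclassified as background.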

      Reviewer #3

      In this manuscript by Madsen et al., a comparison of the transcriptome and proteome in heterozygous and homozygous PIK3CAH1047R human pluripotent stem cell mutants is presented. The authors demonstrate marked alterations in expression at both the protein and RNA level of homozygous mutants compared to wildtype, while heterozygous lines exhibit only minor changes. Multiple analytical approaches are employed to investigate network alterations, leading the authors to suggest a TGFβ-mediated rewiring of key pluripotent genes to induce a state of sustained stemness. Madsen et al. conclude with a set of experiments to functionally implicate NODAL/TGFβ autocrine signalling in PIK3CAH1047R dose-dependent stemness. The key conclusions are not convincing. While the unbiased omics approach sets up this study well, the study suffers from a lack of convincing functional assays (cell biological assays) to test their model and tease apart a phenotype for the het cells. More robust functional experiments are required to support the finding that NODAL/TGFβ signalling mediates the self-sustained stemness, particularly because this is the major novel finding distinguished from the authors' previous work.

      AUTHORS’ RESPONSE: We thank the Reviewer for their detailed critique. Our perspective on the robustness and novelty of our findings diverges from that of the Reviewer, however, as we elaborate below.

      While the authors present a comprehensive omics investigation into alterations between wild type, homozygous, and heterozygous mutants, the critical functional experiments are lacking. In Figure 5, the authors seek to support the role of TGFβ in mediating stemness in the homozygous mutants; however, they are not able to directly deplete TGFβ due to technical limitations of the culture conditions. Consequently, the experiments are primarily built on the use of NODAL withdrawal and stimulation. The data presented thus implicate NODAL in the stemness phenotype, but it's not obvious TGFβ is substantially involved, particularly considering the inhibitor subsequently employed also inhibits NODAL type 1 receptors.

      AUTHORS’ RESPONSE: NODAL and TGFb activate shared signalling pathways downstream from their respective receptors, and indeed they (as well as Activin) can be used interchangeably in stem cell culture, which is common practice [19–21]. Commercially available Essential 8/TeSR-E8 is supplemented with TGFb not NODAL; therefore the factor we have removed is TGFb, prior to any controlled introduction of NODAL (based on strong upregulation of its mRNA in PIK3CAH1047R/H1047R). Any residual TGFb-like ligands will be contributed by Matrigel as outlined in the text (lines 247-251). It is well-established that “NODAL/TGFb signalling” denotes signalling through SMAD2/3/4 (as opposed to BMP signalling through SMAD1/5/8), and this is how we use the term throughout the manuscript. Accordingly, it is functional activation of the “NODAL/TGFb signalling pathway” that we investigate (see also response to Reviewer 1, p.1).

      In summary, we seek not to make a distinct point about TGFb, but rather refer to NODAL/TGFb signalling as a matter of biochemical correctness. For clarity, we now replace mentions of “TGFb signalling” with “NODAL/TGFb signalling” throughout the revised manuscript. We have also revised the legend for Figure 3 to make this clearer.

      Furthermore, there is a paucity of readouts for stemness. For example, a more convincing narrative would include additional expression markers of the core pluripotency network (e.g. OCT4, SOX2, etc.) as well as functional readouts (e.g. NODAL withdrawal and assessment of differentiation) after NODAL stimulation/depletion and comparing across genotypes. Overall, the primary conclusions of this work are not well evidenced by the presented data and the authors should consider additional functional experiments or reframing the narrative.

      AUTHORS’ RESPONSE: We chose the current strategy because we wanted to capture the earliest changes after depletion of NODAL/TGFb signalling, prior to any signalling rewiring triggered by differentiation. In fact, we believe that a strength of this study is our observation of differences in critical stemness markers in spite of the short time course. To aid non-expert readers we offered a primer on stemness genes and rationale for the markers chosen in the existing introduction (lines 80-90).

      We have further assessed additional stemness and differentiation marker genes in two independent homozygous PIK3CAH1047R cell lines using a high-throughput pluripotent stem cell scorecard (Fig. S4). This replicates the effect on cell marker genes documented by RT-qPCR in Fig. 5, while also showing additional reductions in genes that were upregulated in homozygous PIK3CAH1047R cells (MYC, GDF3, FGF4) and which have previously been shown to be highly expressed in pluripotent stem cells (we have now added this additional clarification to the legend of Fig. S4) [22]. Despite the short-term treatment, these data also show that no other treatment but SB431542 is capable of triggering expression of early neuroectoderm markers (CDH9, MAP2 and PAPLN) [23], prior to overt morphological changes in the cultures (Fig. S5; higher resolution images are also available via The Open Science Framework: https://osf.io/hbf7x/). Neuroectodermal gene expression is expected upon inhibition of TGFb signalling in human pluripotent stem cells [24,25].

      A key conclusion of this study is that there is a dose-dependent stemness phenotype. As this is not explicitly defined, to this reader, it would imply a graded response between wild type, heterozygotes, and homozygotes in the phenotypic and molecular characteristics. However, as is noted particularly in the omics components of the manuscript, there is in fact "near-binary" alteration in the assayed characteristics. Again, this should be qualified more explicitly, but it is more consistent with the data, which suggests the heterozygotes behave very similarly to the wild types, while homozygotes have substantial alterations. I would suggest the authors consider renaming their descriptions, changing "near-binary" and "dose-dependent" to something like "dose-threshold." This suggests after X threshold of oncogenic PI3K signalling, substantial alterations occur; under this threshold (e.g. hets), changes are marginal. In the event however that there may be a more "dose-dependent" effect, I would expect the transcriptomic and proteomic changes observed in the heterozygous cell lines should be seen in the homozygous cell lines (where they are likely greater in magnitude, in addition to other changes).

      AUTHORS’ RESPONSE: This appears to us to be largely a matter of semantics. In talking of “dose dependency” we were certainly not implying a graded effect (as the Reviewer points out, our findings are far from this, suggesting a sharp threshold of dose which triggers widespread changes), and indeed nothing in these words strictly suggests this interpretation. Nevertheless, we are sensitive to the Reviewer’s interpretation of the term, and mindful that this might be shared by other readers. On the other hand, talking of a “near-binary” effect seems to us to be an accurate description of our findings. We have edited the manuscript to minimise ambiguity with the following changes:

      • line 49 “dose” replaced with “strength”: “We demonstrate signalling rewiring as a function of oncogenic PI3K signalling strength, and provide experimental evidence that self-sustained stemness is causally related to enhanced autocrine NODAL/TGFb
      • line 102: “This work provides in-depth characterisation of the near-binary PI3K signalling effects seen in hPSCs ….”
      • lines 195, 198, 317: inserted “allele dose-dependent”

      We would also like to take issue with the case that the Reviewer seems to be making that a more graded change in gene expression across heterozygotes and homozygotes is to be expected. As mentioned in the manuscript (lines 206-210), there is evidence for NODAL/TGFb pathway activation in heterozygous cells. Nevertheless, given the known temporal, context- and dose-dependent effects of this pathway [1,2,26,27] and, importantly, the widely described biological properties of developmental systems (featuring positive feedback loops, bistability and hysteresis; see Ref. [28,29]), we have no reason to expect that transcriptomic and proteomic changes observed in homozygous cell lines will be reproduced in heterozygous cell lines.

      The manuscript would benefit from more direct comparisons between the heterozygotes and homozygotes.

      AUTHORS’ RESPONSE: Please refer to the additional data provided in response to a similar question by Reviewer 2.

      Further to the above point, as the marginal phenotype observed in heterozygotes is a critical point in this paper, the authors would benefit from including heterozygote lines in the functional experiments presented in Fig 5. Inclusion of the hets in these experiments would instill confidence in this reader that the marginal molecular alterations characterized at the proteomic and transcriptomic level is reflected in the lack of functional stemness-sustaining behaviour.

      AUTHORS’ RESPONSE: The lack of stemness-sustaining behaviour in the heterozygous clones was demonstrated across multiple different experiments in our previous work, and further functional studies of early differentiation in these cells seemed a poor use of resource and very unlikely to give useful insights. Given the major disease phenotype associated with the same genetic change (PROS), the relative lack of phenotype in heterozygous cells was surprising and holds obvious implications for disease modelling (see also response to Reviewer 2, pp.2-3), and for how model systems are “calibrated” against human developmental disease. The aim of the current work was to:

        • Determine whether increasing the depth of signalling and transcriptomic analyses would unmask small but important changes in heterozygous mutants that might have been missed in prior studies (i.e. we actively aimed to increase the power of the study for identification of subtle changes); and
        • Characterise in greater depth the signalling and transcriptional changes underpinning the robust threshold effect observed for self-sustained stemness driven by PIK3CAH1047R/H1047R.

      We would further observe that PROS does not feature obvious qualitative errors in tissue specification, but rather excessive growth of more or less normally differentiated tissues. We conceptualise this as reflecting a small incremental growth advantage in normally differentiated tissues of certain lineages that summates to create a major disease phenotype over months and years.

      Thus, without the functional and mechanistic experiments alluded to above, the claims/ conclusions are speculative. In particular, the cancer narrative is irrelevant to the study. Considering both the lack of conclusive differentiation experiments or relevant breast cancer experiments, the discussion on differentiation therapy for breast cancer should be removed.

      AUTHORS’ RESPONSE: The reference to cancer links to a computational study of human breast cancers where we specifically looked at the relationship between strength of PI3K signalling and ‘stemness’ [12], both measured using established transcriptional indices. We have included the bioRxiv reference in our revised manuscript (see l.337). While there is an element of speculation in this cancer observation, we do feel it is important and grounded in this and the bioRxiv study, and would prefer to maintain it. However, if the editors take a different view it can be removed.

      Reproducibility is a concern for this study. The authors should perform more replicates on their experiments (focusing on technical replicates of the lines employed to discern technical vs biological variability). A challenge in reading this manuscript is understanding which replicates were used for which experiments, and whether they are technical or biological (i.e. different lines). While some of the figure legends note this information, it would be helpful to provide clarity throughout the text. In addition, it should be noted that some experiments (e.g. the RPPA analysis in Fig 2B and Fig S3B) show substantial variability between replicates, but because it appears only a single technical replicate from two different cell lines was used, it is impossible to distinguish whether the variability is of a biological or technical nature. The authors would do well to focus on collecting more technical replicates of fewer biological replicates, and then expand to include more biological replicates if initial biological variation is observed.

      AUTHORS’ RESPONSE: We strenuously disagree with the Reviewer on this point. Throughout this manuscript, we have been transparent and thorough in reporting how experiments were performed, including the number of both biological and technical replicates. Representative examples include:

      Legend to Figure 2A (RPPA dataset in growth-replete conditions): “The data are based on 10 wild-type cultures (3 clones), 5 PIK3CAWT/H1047R cultures (3 clones) and 7 PIK3CAH1047R/H1047R cultures (2 clones) as indicated.”

      Legend to Figure 5: “The data are from two independent experiments, with each treatment applied to triplicate cultures of three wild-type and two homozygous iPSC clones.”

      Specifically to address the RPPA studies, and as is clear from the Figure 2 legend, we initially performed RPPA analyses in growth factor-replete conditions with extensive technical and biological replication, arguing against the Reviewer’s point. To aid interpretation, we opted for summarising this large dataset in Venn diagrams (following extensive limma-based statistical analysis, including correction for multiple comparisons and sample interdependence as advised in Ref. [30]). If the Reviewer deems it valuable, we could include a heatmap overview as shown below:

      [To view figure, please access PDF version of this rebuttal on https://osf.io/ecmhr/]

      We took the view that the above representation, while comprehensive, is not particularly informative to the reader. All individual data points for both total and phosphoproteins – with and without normalisation – are plotted as part of separate barplots in the accompanying RNotebook (https://osf.io/d9tca/). These clearly demonstrate that the technical and biological variability in canonical PI3K signalling responses at the level of AKT and immediately downstream of AKT is very low. The same applies to the increased phosphorylation of SMAD2 (S465/S467) in PIK3CAH1047R iPSCs. We include two examples below, and would be happy to include the link to the above RNotebook in the respective Figure legend if the Reviewer deems this helpful.

      [To view figure, please access PDF version of this rebuttal on https://osf.io/ecmhr/]

      The interpretation of the second RPPA experiment (Fig. 2B) in growth factor-depleted conditions is focused entirely on these responses due to their consistency across both datasets (further supported by low-throughput signalling analyses in the previous PNAS publication).

      We have made all raw data and guided analysis scripts for the above RPPA dataset publicly available, and the same is true for all original data as highlighted in the Materials & Methods section. Thus we strongly believe that readers have the opportunity to assess our work and reproduce our analyses/conclusions fully should they wish to do so.

      Finally, we noted in the initial PNAS paper describing these models that we derived and worked with up to 10 independent homozygous PIK3CAH1047R clones, as well as with 3 and 4 independent heterozygous and wild-type clones, respectively. This exceeds the common use of 2 clones (if at all mentioned) in many similar studies in the stem cell literature (e.g. Ref. [31–34]). In our view, derivation of more than two independent clones is crucial for reproducibility in gene editing studies given substantial variability arising from genetic drift [35,36]. We have consistently shown the phenotypic robustness of our mutant clones across the two studies; note, for example, the low technical and biological variability in both heterozygous and homozygous mutants in the transcriptomic data in Fig. 1A. As noted in the manuscript, the high-depth RNAseq data analysis was performed in different clones and independently of the RNAseq reported in Ref. [5], yet yields highly similar results and confirms transcriptional rewiring of PIK3CAH1047R/H1047R iPSCs.

      Throughout the text, the authors frequently reference their previous study in PNAS and often the lines of what is novel in this paper vs. reproduction of previous findings is blurred. The authors would benefit from reducing the frequency of referencing their previous study and focusing on emphasizing the novelty of the present findings.

      AUTHORS’ RESPONSE: We have carefully reviewed all instances of citation of our previous study in the manuscript and have reduced their numbers to improve focus on the current findings as suggested. As noted above, however, the current study builds closely upon the findings of the previous work, and referring to these to put the current work in context is important. Indeed, this is reflected in some of the reviewers’ collective comments and questions which are answered by the prior study. Having reviewed these citations, we note that except for 2 citations in the Introduction and 3 more in the Discussion, all remaining citations are in the context of linking new and old data, which we believe is important for clarity as suggested by the reviewers. However, if the editors take a different view we can minimise this and reduce the number of citations.

      Without functional assays to complement and test their models, this manuscript is not a significant advance.

      AUTHORS’ RESPONSE: While we take the Reviewer’s point that further studies could have strengthened robustness of the evidence supporting a mediating role of NODAL/TGFb signalling in PI3K-driven stemness, we think this assertion is far too sweeping, and neglects numerous facets of the study of use and interest to several fields (as agreed by the other reviewers). To recapitulate some key points of interest/use of this study:

      • Using a carefully derived PIK3CAH1047R iPSC model system and pharmacologically relevant doses of a recently approved PI3Ka-selective inhibitor, we demonstrate that the efficacy of the latter can depend on the strength of PI3K pathway activation and phenotype under investigation – despite expected downregulation of PI3K signalling by Alpelisib, the stemness phenotype is not reversed.
      • We link this to self-sustained TGFb signalling in cells with strong PI3K activation by homozygous PIK3CAH1047R. The link between the two pathways and the underlying rewiring are likely to be relevant in other contexts, as observed recently in a breast epithelial model system [37]. Given the similarity between human pluripotent stem cells and cancer cells, our findings are of wider relevance.
      • Aberrant PI3K activation has been associated with numerous pathologies, so it is important for the field to have well-characterised model systems with endogenous expression of one of the most common PIK3CA mutations. Our thorough characterisation of PIK3CAH1047R iPSCs validates one such model.
      • To our knowledge, this is the first study to provide a comprehensive and integrated characterisation of isoform-specific PI3K signalling and transcriptomic changes in human pluripotent stem cells. This is important because current knowledge of PI3K signalling in human PSCs is largely based on extrapolation of findings from mouse embryonic stem cells, with many previous studies relying on high concentrations of the non-specific pan-PI3K inhibitor LY294002 (the use of which has been discouraged by the PI3K signalling community [38]).

        I believe the narrative was written for pluripotent stem cell biologists but without robust functional and quantitative cell biological assays to test their models, I don't anticipate stem cell biologists will be very interested.

      AUTHORS’ RESPONSE: The Reviewer is incorrect in his/her assertion about the target audience. PI3K signalling plays a key role in numerous disease and physiological processes as well as in development, and is of broad interest to cancer biologists, geneticists, rare disease biologists, biochemists, cell signallers, and endocrinologists among many others. Indeed we started with a primary focus on disease modelling (cancer, PROS) rather than stem cell biology, but because our findings are significant for the role of PI3K in stem cell biology as well as for these diseases, we aimed to make the findings accessible to many of these readers. We refer the Reviewer to our previous response with regards to the significance of this work.

      **Minor Comments:**

      Consider adding gridlines to the MDS plots for clarity of read

      AUTHORS’ RESPONSE: This is a matter of taste, and as we honestly cannot see how it would enhance appreciation of the very clear clustering, we have decided to leave the plot in its current form.

      In Fig S2, some of the in-figure labelling is incorrect

      AUTHORS’ RESPONSE: We thank the Reviewer for spotting this. We believe the labelling error has now been corrected, and we have further tried to streamline the plot headings, but please do let us know if there is something else which we may have missed.

      In Fig S1C, the authors note poor correlation in the heterozygotes between this and a previous study. It would be helpful to qualify this discrepancy, as it is potentially concerning.

      AUTHORS’ RESPONSE: The sensitivity to detect differential gene expression is high for large fold changes (as seen in PIK3CAH1047R/H1047R mutants) in transcriptomic studies, but declines rapidly for fold changes in expression lines 126-131: “The magnitudes of gene expression changes in PIK3CAH1047R/H1047R cells correlated strongly with our previous findings (Spearman’s rho = 0.74, p WT/H1047R iPSCs (Fig. S1C), as expected given the smaller number and lower magnitude of observed gene expression changes in heterozygous cells, and the lower depth of previous transcriptomic studies.”

      Line 208, the authors state that the small p-value for the homozygotes is suggestive of a dose-dependent effect. This is not the case; it simply suggests a greater probability of the effect being non-random.

      AUTHORS’ RESPONSE: The Reviewer is formally correct, and we apologise for the imprecision of our language. Nevertheless, biological effect size is pertinent to the p value determined, and so our statement, while requiring an inductive leap from the reader, is not wholly invalid. To tidy this up and improve precision we have reworded as follows:

      lines 215-217: “This is in keeping with the much lower effect size in heterozygous cells, and consistent with a critical role for the TGFbeta pathway in mediating the allele dose-dependent effect of PIK3CAH1047R in human iPSCs.”
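
      The relationship between effect size, sample size and p-value discussed in this exchange can be illustrated with a generic two-sided z-test. The Python sketch below is not the enrichment statistic used in the manuscript; the function name and all numbers are hypothetical and serve only to show that a p-value reflects both the size of an effect and the amount of data behind it.

```python
import math


def two_sided_p(effect, sd, n):
    """Two-sided p-value of a z-test for a mean `effect` estimated from
    n observations with known standard deviation sd."""
    z = effect / (sd / math.sqrt(n))
    # 2 * (1 - Phi(|z|)) expressed with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Same sample size, different effect sizes: the larger effect gives
# the smaller p-value.
p_small_effect = two_sided_p(effect=0.2, sd=1.0, n=25)
p_large_effect = two_sided_p(effect=1.0, sd=1.0, n=25)

# Same effect size, different sample sizes: the p-value also shrinks,
# so a small p-value alone does not establish a large effect.
p_few = two_sided_p(effect=0.2, sd=1.0, n=25)
p_many = two_sided_p(effect=0.2, sd=1.0, n=2500)

print(p_small_effect, p_large_effect, p_few, p_many)
```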

      What does the height in Fig 4B correspond to? It would perhaps be of value to scale nodes based on the significance value.

      AUTHORS’ RESPONSE: Fig. 4B illustrates hierarchical clustering of the module eigengenes; the height corresponds to similarity of gene expression. We clarify this in the revised manuscript.

      References

      1 Lee, K. L. et al. (2011) Graded Nodal/Activin signaling titrates conversion of quantitative phospho-Smad2 levels into qualitative embryonic stem cell fate decisions. PLoS Genet. 7.

      2 Hill, C. S. (2018) Spatial and temporal control of NODAL signaling. Curr. Opin. Cell Biol. 51, 50–57.

      3 Xu, R. H. et al. (2008) NANOG is a Direct Target of TGFβ/Activin-Mediated SMAD Signaling in Human ESCs. Cell Stem Cell 3, 196–206.

      4 Vallier, L. et al. (2009) Activin/Nodal signalling maintains pluripotency by controlling Nanog expression. Development 136, 1339–49.

      5 Madsen, R. R. et al. (2019) Oncogenic PIK3CA promotes cellular stemness in an allele dose-dependent manner. Proc. Natl. Acad. Sci. 116, 8380–8389.

      6 Vasan, N. et al. (2019) Double PIK3CA mutations in cis increase oncogenicity and sensitivity to PI3Kα inhibitors. Science 366, 714–723.

      7 Saito, Y. et al. (2020) Landscape and function of multiple mutations within individual oncogenes. Nature 582, 95–99.

      8 Gorelick, A. N. et al. (2020) Phase and context shape the function of composite oncogenic mutations. Nature.

      9 Gillies, T. et al. (2020) Oncogenic mutant RAS signaling activity is rescaled by the ERK/MAPK pathway 1–19.

      10 Hafner, M. et al. (2016) Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs. Nat. Methods 13, 521–527.

      11 André, F. et al. (2019) Alpelisib for PIK3CA-mutated, hormone receptor-positive advanced breast cancer. N. Engl. J. Med. 380, 1929–1940.

      12 Madsen, R. R. et al. (2020) Relationship between stemness and transcriptionally-inferred PI3K activity in human breast cancer. bioRxiv 2020.07.09.195974.

      13 Fritsch, C. et al. (2014) Characterization of the novel and specific PI3Ka inhibitor NVP-BYL719 and development of the patient stratification strategy for clinical trials. Mol. Cancer Ther. 13, 1117–1129.

      14 Furet, P. et al. (2013) Discovery of NVP-BYL719 a potent and selective phosphatidylinositol-3 kinase alpha inhibitor selected for clinical evaluation. Bioorganic Med. Chem. Lett.

      15 Yilmaz, A. et al. (2018) Defining essential genes for human pluripotent stem cells by CRISPR–Cas9 screening in haploid cells. Nat. Cell Biol. 20, 610–619.

      16 Sergushichev, A. A. (2016) An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv 060012.

      17 Liberzon, A. et al. (2015) The Molecular Signatures Database Hallmark Gene Set Collection. Cell Syst. 1, 417–425.

      18 Müller, F. J. et al. (2008) Regulatory networks define phenotypic classes of human stem cell lines. Nature 455, 401–405.

      19 James, D. et al. (2005) TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development 132, 1273–82.

      20 Vallier, L. et al. (2005) Activin/Nodal and FGF pathways cooperate to maintain pluripotency of human embryonic stem cells. J. Cell Sci. 118, 4495–4509.

      21 Chen, G. et al. (2011) Chemically defined conditions for human iPSC derivation and culture. Nat. Methods 8, 424–429.

      22 Adewumi, O. et al. (2007) Characterization of human embryonic stem cell lines by the International Stem Cell Initiative. Nat. Biotechnol. 25, 803–816.

      23 Tsankov, A. M. et al. (2015) A qPCR ScoreCard quantifies the differentiation potential of human pluripotent stem cells. Nat. Biotechnol. 33, 1–15.

      24 Smith, J. R. et al. (2008) Inhibition of Activin/Nodal signaling promotes specification of human embryonic stem cells into neuroectoderm. Dev. Biol. 313, 107–117.

      25 Vallier, L. et al. (2004) Nodal inhibits differentiation of human embryonic stem cells along the neuroectodermal default pathway. Dev. Biol. 275, 403–421.

      26 Sorre, B. et al. (2014) Encoding of temporal signals by the TGF-β Pathway and implications for embryonic patterning. Dev. Cell 30, 334–342.

      27 David, C. J. and Massagué, J. (2018) Contextual determinants of TGFβ action in development, immunity and cancer. Nat. Rev. Mol. Cell Biol. 19, 1–17.

      28 Alon, U. (2007) Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.

      29 Sonnen, K. F. and Aulehla, A. (2014) Dynamic signal encoding-From cells to organisms. Semin. Cell Dev. Biol. 34, 91–98.

      30 Germain, P. L. and Testa, G. (2017) Taming Human Genetic Variability: Transcriptomic Meta-Analysis Guides the Experimental Design and Interpretation of iPSC-Based Disease Modeling. Stem Cell Reports 8, 1784–1796.

      31 Wang, L. et al. (2017) GCN5 Regulates FGF Signaling and Activates Selective MYC Target Genes during Early Embryoid Body Differentiation. Stem Cell Reports 10, 287–299.

      32 Zeng, H. et al. (2016) An Isogenic Human ESC Platform for Functional Evaluation of Genome-wide-Association-Study-Identified Diabetes Genes and Drug Discovery. Cell Stem Cell 0, 1660–1669.

      33 Ho, L. et al. (2015) ELABELA Is an Endogenous Growth Factor that Sustains hESC Self-Renewal via the PI3K/AKT Pathway. Cell Stem Cell 17, 435–447.

      34 Roudnicky, F. et al. (2019) Modeling the effects of severe metabolic disease by genome editing of HPSC-derived endothelial cells reveals an inflammatory phenotype. Int. J. Mol. Sci. 20, 1–10.

      35 Veres, A. et al. (2014) Low incidence of Off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell 15, 27–30.

      36 Ben-David, U. et al. (2018) Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330.

      37 Katsuno, Y. et al. (2019) Chronic TGF-b exposure drives stabilized EMT, tumor stemness, and cancer drug resistance with vulnerability to bitopic mTOR inhibition. Sci. Signal. 12, eaau8544.

      38 Manning, B. D. and Toker, A. (2017) AKT/PKB Signaling: Navigating the Network. Cell 169, 381–405.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript by Madsen et al., a comparison of the transcriptome and proteome in heterozygous and homozygous PIK3CAH1047R human pluripotent stem cells mutants is presented. The authors demonstrate marked alterations in expression at both the protein and RNA level of homozygous mutants compared to wildtype, while heterozygous lines exhibit only minor changes. Multiple analytical approaches are employed to investigate network alterations, leading the authors to suggest a TGFβ-mediated rewiring of key pluripotent genes to induce a state of sustained stemness. Madsen et al. conclude with a set of experiments to functionally implicate NODAL/TGFβ autocrine signalling in PIK3CAH1047R dose-dependent stemness.

      Major Comments:

      1. The key conclusions are not convincing. While the unbiased omics approach sets up this study well, the study suffers from a lack of convincing functional assays (cell biological assays) to test the authors' model and tease apart a phenotype for the het cells. More robust functional experiments are required to support the finding that NODAL/TGFβ signalling mediates the self-sustained stemness, particularly because this is the major novel finding distinguished from the authors' previous work.

      • While the authors present a comprehensive omics investigation into alterations between wild type, homozygous, and heterozygous mutants, the critical functional experiments are lacking. In Figure 5, the authors seek to support the role of TGFβ in mediating stemness in the homozygous mutants; however, they are not able to directly deplete TGFβ due to technical limitations of the culture conditions. Consequently, the experiments are primarily built on the use of NODAL withdrawal and stimulation. The data presented thus implicate NODAL in the stemness phenotype, but it's not obvious that TGFβ is substantially involved, particularly considering that the inhibitor subsequently employed also inhibits NODAL type 1 receptors. Furthermore, there is a paucity of readouts for stemness. For example, a more convincing narrative would include additional expression markers of the core pluripotency network (e.g. OCT4, SOX2, etc.) as well as functional readouts (e.g. NODAL withdrawal and assessment of differentiation) after NODAL stimulation/depletion and comparing across genotypes. Overall, the primary conclusions of this work are not well evidenced by the presented data, and the authors should consider additional functional experiments or reframing the narrative.

      • A key conclusion of this study is that there is a dose-dependent stemness phenotype. As this is not explicitly defined, to this reader, it would imply a graded response between wild type, heterozygotes, and homozygotes in the phenotypic and molecular characteristics. However, as is noted particularly in the omics components of the manuscript, there is in fact a "near-binary" alteration in the assayed characteristics. Again, this should be qualified more explicitly, but it is more consistent with the data, which suggest that the heterozygotes behave very similarly to the wild types, while homozygotes have substantial alterations. I would suggest the authors consider renaming their descriptions, replacing "near-binary" and "dose-dependent" with something like "dose-threshold." This suggests that after X threshold of oncogenic PI3K signalling, substantial alterations occur; under this threshold (e.g. hets), changes are marginal. If, however, there is a more "dose-dependent" effect, I would expect the transcriptomic and proteomic changes observed in the heterozygous cell lines to be seen in the homozygous cell lines as well (where they are likely greater in magnitude, in addition to other changes). The manuscript would benefit from more direct comparisons between the heterozygotes and homozygotes.

      • Further to the above point, as the marginal phenotype observed in heterozygotes is a critical point in this paper, the authors would benefit from including heterozygote lines in the functional experiments presented in Fig 5. Inclusion of the hets in these experiments would instill confidence in this reader that the marginal molecular alterations characterized at the proteomic and transcriptomic level are reflected in the lack of functional stemness-sustaining behaviour.

      2. Thus, without the functional and mechanistic experiments alluded to above, the claims/conclusions are speculative. In particular, the cancer narrative is irrelevant to the study. Considering the lack of both conclusive differentiation experiments and relevant breast cancer experiments, the discussion on differentiation therapy for breast cancer should be removed.

      3. Reproducibility is a concern for this study. The authors should perform more replicates on their experiments (focusing on technical replicates of the lines employed to discern technical vs biological variability). A challenge in reading this manuscript is understanding which replicates were used for which experiments, and whether they are technical or biological (i.e. different lines). While some of the figure legends note this information, it would be helpful to provide clarity throughout the text. In addition, it should be noted that some experiments (e.g. the RPPA analysis in Fig 2B and Fig S3B) show substantial variability between replicates, but because it appears only a single technical replicate from two different cell lines was used, it is impossible to distinguish whether the variability is of a biological or technical nature. The authors would do well to focus on collecting more technical replicates of fewer biological replicates, and then expand to include more biological replicates if initial biological variation is observed.

      Minor Comments:

      • Consider adding gridlines to the MDS plots for readability.
      • In Fig S2, some of the in-figure labelling is incorrect.
      • In Fig S1C, the authors note poor correlation in the heterozygotes between this and a previous study. It would be helpful to qualify this discrepancy, as it is potentially concerning.
      • Line 208, the authors state that the small p-value for the homozygotes is suggestive of a dose-dependent effect. This is not the case; it simply suggests a greater probability of the effect being non-random.
      • What does the height in Fig 4B correspond to? It would perhaps be of value to scale nodes based on the significance value.

      Significance

      Nature and significance of the advance:

      • Throughout the text, the authors frequently reference their previous study in PNAS, and the line between what is novel in this paper and what reproduces previous findings is often blurred. The authors would benefit from reducing the frequency of referencing their previous study and focusing on emphasizing the novelty of the present findings.

      • Without functional assays to complement and test their models, this manuscript is not a significant advance.

      State what audience might be interested in and influenced by the reported findings.

      • I believe the narrative was written for pluripotent stem cell biologists but without robust functional and quantitative cell biological assays to test their models, I don't anticipate stem cell biologists will be very interested.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      • Stem cell biology, cancer biology, systems biology, mTORC1 signalling

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      As below.

      Significance

      In this manuscript, Madsen et al. have investigated the role of heterozygous versus homozygous PIK3CAH1047R gain-of-function mutation in maintaining stemness of induced pluripotent stem cells (iPSCs). The authors have performed high-depth RNAseq, proteomic, and RPPA analyses to show that biallelic PIK3CA alterations induce stronger activation of the PI3K signaling axis, compared to monoallelic mutations. The authors claim that a higher PI3K signaling dose activates the NODAL/TGF-b pathway, which in turn supports stemness in an autocrine fashion. These are important findings; however, the manuscript and its conclusions can be improved.

      The authors have described the role of PIK3CAH1047R gain-of-function mutation in cancer and overgrowth syndromes. However, cancer-associated somatic mutations in PIK3CA are mostly heterozygous. Similarly, PIK3CA-related overgrowth syndromes (PROS) are caused by post-zygotic mosaic PIK3CA activating mutations. As such, the relevance of homozygous PIK3CA alterations to these pathological conditions is unclear. The authors should elaborate on the biological implications of their findings.

      The role of biallelic PIK3CA mutation is reminiscent of compound mutations in PIK3CA, which have also been shown to increase PI3K signaling output. However, double PIK3CA mutations confer enhanced sensitivity to PI3K inhibition (Toska et al. Science 2019). Could the authors kindly speculate on this discrepancy? Similarly, the p110 alpha-specific inhibitor alpelisib is highly effective against PIK3CA-mutant ER+ breast cancer and PROS. As such, the clinical relevance of the insensitivity of homozygous PIK3CA mutation to PI3K inhibitors is unclear.

      Figure 2: The authors have performed RPPA analysis in the presence of 100 nM BYL719. Alpelisib is commonly used at 1 µM concentration for in vitro experiments, and has a Cmax of ~5 µM. We suggest the authors perform western blot analysis to confirm the results of RPPA.

      Figures 3 and 4: The authors should expand their RNAseq analysis to demonstrate enrichment of stemness and TGFb signaling in homozygous mutant cells compared to heterozygous cells.

      The authors should confirm the results of pathway analysis in vitro to show that homozygous PIK3CA mutation confers increased stemness compared to heterozygous mutation.

      Figure 5: Kindly provide direct evidence demonstrating that increased PIK3CA signaling output induces NODAL expression in this experimental setting. Also, please normalize gene expression data to WT cells so it is easy to visualize the changes in NODAL and NANOG expression in homozygous and heterozygous mutants compared to WT iPSCs.

      Kindly quantify Fig. S5.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is an interesting and thorough study characterising human iPSCs with heterozygous or homozygous mutations in the PI3K pathway that lead to its hyperactivation. They prove that the increased stemness results from enhanced autocrine responsiveness to the TGF signalling pathway.

      The main conclusions are well supported by the presented data. Cutting-edge tools and bioinformatic analysis are adequately applied. I have only one important point:

      Major comment:

      1) Western blot-based validation of TGF pathway activation in WT and mutant iPSCs would be helpful to strengthen the results based on bioinformatic data.

      Significance

      Important work for studies on signalling, cancer mutations, modelling cancer in stem cells, pluripotency regulation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to the reviewers’ comments:

      We thank both reviewers for their critical evaluation and excellent suggestions to improve the manuscript. We are making all the changes suggested by both reviewers and performing the experiments needed to address all the concerns, specifically those from Reviewer #1. Please find below our response to the reviewers’ comments:

      Reviewer #1:

      This is an interesting study from the Rahaman group that identifies cardiolipin (CL) as a potential binding target for Drp6 recruitment to the nuclear membrane in Tetrahymena (that has a unique nuclear remodeling program). In addition, they identify a residue, I553 in the DTD region, which they claim is a key residue involved in specific CL interactions. While the experiments themselves are technically sound, and are well performed and controlled, I don't find the major conclusion that I553 is involved in direct CL interactions justified or well rationalized. By their own admission (in the discussion), the conservative mutation I553M may perturb local folding and may indirectly affect CL interactions. There is no test of DTD folding with and without the I553M mutation, nor are there other mutations (e.g. I553A and in the vicinity) tested. CD experiments in the absence and presence of CL-containing membranes will likely yield information on the impact of the I553 mutations, while DLS experiments would inform on the hydrodynamic properties (overall 3D fold) of the DTD and the impact of these mutations. CL interactions generally involve a combination of electrostatic and hydrophobic forces. Where do the electrostatic interactions come from? Why would an Isoleucine to Methionine mutation affect the hydrophobic component, even if I553 is the key hydrophobic residue?

      Response:

      We thank the reviewer for the comments that the experiments are sound and well performed with appropriate controls. While we agree that the exact mechanism by which I553 provides specificity to cardiolipin binding is not addressed in the present manuscript, our study clearly demonstrates that the isoleucine at position 553 plays an important role in determining cardiolipin specificity and nuclear recruitment. As pointed out by the reviewer, it is possible that changing isoleucine to methionine may affect the local conformation. However, there is no major conformational change in the DTD due to this mutation. This conclusion is based on the clear loss of nuclear localization and cardiolipin interaction for the mutant without other properties being affected. The in vitro floatation assay clearly establishes that the effect arises directly from inhibiting the interaction specifically with cardiolipin-containing membranes. It should be further noted that the same domain, the DTD, interacts with two other lipids (PS and PA) and the mutant retains interaction with them, arguing that the conformation of this domain is not significantly changed by the I to M mutation. Consistent with these results, the I553M mutant could be targeted to the nuclear membrane as a complex with wildtype Drp6, further confirming that I553 could form the correct self-assembled structure with wildtype protein required for association with the nuclear membrane. This is further substantiated by comparing all the known biochemical properties, including GTPase activity, membrane binding via the other two lipids, and formation of helical spirals and ring structures. Hence it is clear that I553 provides specificity for binding cardiolipin and for recruitment to the nuclear membrane. We will further test whether there is any local conformational change due to the I to M mutation by fluorescence quenching experiments, and the results will be incorporated in the revised manuscript.

      Regarding overall folding of the mutant, this is an excellent suggestion by the reviewer. We are planning to perform CD experiments on the I553M mutant and wildtype proteins to compare whether there is any change in overall folding due to the mutation. This result will be incorporated in the revised manuscript.

      The reviewer is right to point out that both electrostatic and hydrophobic interactions are important for interaction with cardiolipin. Electrostatic interaction is important for all phospholipids when interacting with protein and is expected to come from other amino acid residues that are positively charged. Electrostatic interaction may contribute to the affinity of the interaction by providing additional binding energy. But considering that this type of interaction is universal to all phospholipids, it cannot give specificity for a particular lipid and hence would not discriminate among different phospholipids.

      Regarding the effect on the hydrophobic component, the reviewer is correct that both are strongly hydrophobic amino acids and that the loss of I553M interaction with cardiolipin may not be due to a change in hydrophobicity.

      To address whether the loss of cardiolipin interaction is due to the absence of isoleucine rather than being specific to methionine, the suggestion from the reviewer to replace I553 with A (alanine) is an excellent one. We are doing the experiments and we anticipate incorporating these results in our revised manuscript.

      Reviewer #1 (Significance (Required)):

      The addressed phenomenon is restricted to Tetrahymena and may not have far reaching implications. Regardless, the identification of CL as a binding target for Drp6 at the nuclear membrane of this organism is in itself significant. The conclusion that I553 is the key CL binding residue is however not warranted. Additional experiments are needed to dissect how this residue impacts CL interactions and examine whether the observed effect is direct or indirect.

      Response:

      We thank the reviewer for appreciating the significance of this work. We agree that our data are Tetrahymena specific. However, we believe that the study is relevant for all proteins whose association with target membranes depends on cardiolipin, including many cardiolipin-interacting DRPs (such as DRPs involved in biogenesis and maintenance of mitochondria).

      We really appreciate the reviewer's excellent suggestions. Based on these, we are performing the following experiments.

      1. CD experiments to assess overall folding of I553M and wildtype protein.
      2. Fluorescence quenching of the tryptophan residue (at amino acid position 548) in the vicinity of I553 to compare the conformation of the mutant with that of wildtype protein.
      3. Evaluation of I553A in nuclear localization and cardiolipin binding.

      We anticipate that these results will further confirm whether I553 is the key CL binding residue and whether the effect is direct.

      The writing is not clear in some parts and may require a round of language editing. There are no issues with reproducibility.

      Response

      We thank the reviewer for pointing out the need for language editing. We will edit the language wherever we find it appropriate. We would highly appreciate it if the reviewer could indicate the portions that need special attention.

      Reviewer #2:

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Dynamin is a GTPase superfamily protein involved in membrane fusion and division. This paper focused on Drp6, one of the eight dynamin superfamily proteins of Tetrahymena, and analyzed its nuclear envelope localization mechanism by a combination of in vivo cytogenetical analysis and in vitro biochemical analysis for the various mutant Drp6 proteins. Results showed that a specific amino acid residue (isoleucine at the 553rd) in the membrane binding domain of Drp6 was required for its nuclear membrane localization, but this residue is not required for ER/endosome localization and GTPase activity. Furthermore, in vitro floating analysis using centrifugation indicated that Drp6 specifically bound to the cardiolipin at the 553rd isoleucine residue and this binding was required for Drp6's nuclear membrane localization. Finally, removal of cardiolipin from the conjugating cells using inhibitor treatment showed that cardiolipin was required for the new macronucleus formation (including the expansion of macronuclear envelope) through the function of Drp6. Based on these results, authors concluded that cardiolipin targets Drp6 to the nuclear membrane in Tetrahymena.

      Major comments:

      The experimental data presented in this paper are reasonable and the results are solid, and therefore I think the deduced conclusions are convincing. However, to improve this paper, I have several minor comments to be revised before publication.

      Minor comments:

      1. In the previous paper, it has been shown that GFP-Drp6 is localized in the inner nuclear membrane of both macronucleus and micronucleus. In this paper, however, this point is not clearly stated and is not shown in the figures --- I could not understand such localization pattern of GFP-Drp6 in Fig. 1C and Fig. 3b and the statements in the text. I suggest adding such statements somewhere in Introduction or Result section. Also, add adequate references to the corresponding statements in the text.
      2. Related to the comment 1, I suggest replacing Fig. 1C (images of fixed cells) with Fig. S1B (images of live cells) because nuclear localization of GFP-Drp6 are much clearer in Fig. S1B (live cell) than Fig. 1C (fixed cell), and because fixation may cause artificial redistribution of the proteins. Please add arrows in those figures to point out the position of micronucleus in those figures if necessary.
      3. Similarly, I suggest replacing images of Fig. 5B (fixed cells) with those of Fig. S3 (live cells).
      4. page 7, line 224: GFP-Nup3 is used as a marker protein of the nuclear pore complex (NPC). However, there is no description of how GFP-Nup3 is obtained or made. Add description how this DNA plasmid was obtained or generated.
      5. Related to the comment 4, "Nup3" is first discovered in Malone et al., Eukaryotic Cells, 2009, but also soon after discovered as the name of "MicNup98B" in Iwamoto et al., Curr Biol, 2009 and used in several papers including Iwamoto et al., Genes Cells, 2010; JCS, 2015; JCS 2017; and more. Because Nup3 is the Tetrahymena paralogs of human Nup98 and the name of "Nup98" is well established to call these homologs in various eukaryotes, I suggest adding the name of "MacNup98B" after the word of "Nup3" for reader's better understanding. I also suggest adding appropriate references to refer to this protein as follows: Add Malone et al. 2009 for "Nup3" and Iwamoto et al., 2009 for "MacNup98B."
      6. page 9, line 295: I wonder if "Fig. 3b" may be a mistake of "Fig. 5C." If so, please correct this.
      7. page 10, the second paragraph (lines 311-322): This paragraph discussed the possible involvement of Drp6 in the nuclear envelope expansion of the post-zygotic nucleus. It may be interesting to point out that large-scale nuclear envelope reorganization including the formation of the redundant nuclear envelope and the type-switching of the NPC (from the MIC-type NPC to the MAC-type one) has been reported at this developmental stage (Iwamoto et al., JCS 2015). For example, the peculiar shaped nuclear envelope with the redundant/overlapping nuclear envelope structure can be seen and the MAC-type NPCs rapidly assembles to the expanding nuclear envelope. It may be interesting to point out that cardiolipin and Drp6 may be involved in these phenomena. But it is too speculative and therefore consider adding such a discussion as an option.
      8. page 13, line 412: Is the word "GFP-drp6-I553M" written in italics intended for the gene for the GFP-drp6-I553M protein? If so, protein may be acceptable here. Make sure there are no problems with italicized characters. Also, check if the lowercase letter "d" in "drp6" is OK because large letters are used in other cases.
      9. page 20, figure 1: I recommend switching the positions of HDyn1 and Drp6 in Figure 1a to keep the order in Figure 1b.
      10. page 21, line 671: Add the word "Tetrahymena" before "Drp 6" to pair with the word "human dynamin 1".
      11. page 23, line 729: Remove "and."
      12. page 23, lines 729 and 731: Unify the expression of "cardiolipin" and "Cardiolipin"
      13. page 23, line 732: Add "or" before "10% Phosphatidylserin."
      14. page 24, Figure 3a: Please mark the position of I553M in the figure if possible. Alternatively, indicate the range of amino acid residues after the words "red" and "green" in the figure legend.

      Response:

      We thank the reviewer for the excellent comments that “the experimental data presented in this paper are reasonable and the results are solid, and therefore I think the deduced conclusions are convincing.” We also thank the reviewer for the minor comments, which are thorough and very insightful. They will improve the manuscript substantially. We will incorporate all the changes in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      The corresponding author and his colleagues have reported that Tetrahymena Drp6 is localized to the outer nuclear membrane of both macronucleus and micronucleus of Tetrahymena (Elde et al., 2005) and that Drp6 is required for the formation of new macronuclei during nuclear differentiation (Rahaman et al., 2008). Therefore, these parts are not novel.

      The novelty of this study is as follows:

      (1) The discovery of a specific amino acid residue (isoleucine at the 553rd) of Drp6 that is required for its nuclear membrane localization.

      (2) the discovery of a lipid molecule, cardiolipin, as a critical partner for Drp6's nuclear membrane targeting.

      (3) Discovery of involvement of cardiolipin in the new macronucleus formation (the expansion of macronuclear envelope) through the function of Drp6.

      I think their findings are highly novel and will provide new insight into a field of cell biology. Especially, their findings will contribute to understanding how specific proteins targeted to the specific intracellular membranes. In addition, their methods (such as floatation assay) for analyzing the interaction between the protein of interest and lipid/liposomes will become an important tool.

      Response:

      We are very happy to note that the reviewer has pointed out the significance of the present study. We fully agree with the reviewer and appreciate the thorough analysis and excellent conclusion.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Dynamin is a GTPase superfamily protein involved in membrane fusion and division. This paper focused on Drp6, one of the eight dynamin superfamily proteins of Tetrahymena, and analyzed its nuclear envelope localization mechanism by a combination of in vivo cytogenetical analysis and in vitro biochemical analysis for the various mutant Drp6 proteins. Results showed that a specific amino acid residue (isoleucine at the 553rd) in the membrane binding domain of Drp6 was required for its nuclear membrane localization, but this residue is not required for ER/endosome localization and GTPase activity. Furthermore, in vitro floating analysis using centrifugation indicated that Drp6 specifically bound to the cardiolipin at the 553rd isoleucine residue and this binding was required for Drp6's nuclear membrane localization. Finally, removal of cardiolipin from the conjugating cells using inhibitor treatment showed that cardiolipin was required for the new macronucleus formation (including the expansion of macronuclear envelope) through the function of Drp6. Based on these results, authors concluded that cardiolipin targets Drp6 to the nuclear membrane in Tetrahymena.

      Major comments:

      The experimental data presented in this paper are reasonable and the results are solid, and therefore I think the deduced conclusions are convincing. However, to improve this paper, I have several minor comments to be revised before publication.

      Minor comments:

      1. In the previous paper, it has been shown that GFP-Drp6 is localized in the inner nuclear membrane of both macronucleus and micronucleus. In this paper, however, this point is not clearly stated and is not shown in the figures --- I could not understand such localization pattern of GFP-Drp6 in Fig. 1C and Fig. 3b and the statements in the text. I suggest adding such statements somewhere in Introduction or Result section. Also, add adequate references to the corresponding statements in the text.
      2. Related to the comment 1, I suggest replacing Fig. 1C (images of fixed cells) with Fig. S1B (images of live cells) because nuclear localization of GFP-Drp6 are much clearer in Fig. S1B (live cell) than Fig. 1C (fixed cell), and because fixation may cause artificial redistribution of the proteins. Please add arrows in those figures to point out the position of micronucleus in those figures if necessary.
      3. Similarly, I suggest replacing images of Fig. 5B (fixed cells) with those of Fig. S3 (live cells).
      4. page 7, line 224: GFP-Nup3 is used as a marker protein of the nuclear pore complex (NPC). However, there is no description of how GFP-Nup3 is obtained or made. Add description how this DNA plasmid was obtained or generated.
      5. Related to the comment 4, "Nup3" is first discovered in Malone et al., Eukaryotic Cells, 2009, but also soon after discovered as the name of "MicNup98B" in Iwamoto et al., Curr Biol, 2009 and used in several papers including Iwamoto et al., Genes Cells, 2010; JCS, 2015; JCS 2017; and more. Because Nup3 is the Tetrahymena paralogs of human Nup98 and the name of "Nup98" is well established to call these homologs in various eukaryotes, I suggest adding the name of "MacNup98B" after the word of "Nup3" for reader's better understanding. I also suggest adding appropriate references to refer to this protein as follows: Add Malone et al. 2009 for "Nup3" and Iwamoto et al., 2009 for "MacNup98B."
      6. page 9, line 295: I wonder if "Fig. 3b" may be a mistake of "Fig. 5C." If so, please correct this.
      7. page 10, the second paragraph (lines 311-322): This paragraph discusses the possible involvement of Drp6 in the nuclear envelope expansion of the post-zygotic nucleus. It may be interesting to point out that large-scale nuclear envelope reorganization, including the formation of the redundant nuclear envelope and the type-switching of the NPC (from the MIC-type NPC to the MAC-type one), has been reported at this developmental stage (Iwamoto et al., JCS 2015). For example, a peculiarly shaped nuclear envelope with a redundant/overlapping nuclear envelope structure can be seen, and the MAC-type NPCs rapidly assemble on the expanding nuclear envelope. It may be interesting to point out that cardiolipin and Drp6 may be involved in these phenomena. However, this is quite speculative, so consider adding such a discussion as an option.
      8. page 13, line 412: Is the word "GFP-drp6-I553M" written in italics intended for the gene for the GFP-drp6-I553M protein? If so, protein may be acceptable here. Make sure there are no problems with italicized characters. Also, check if the lowercase letter "d" in "drp6" is OK because large letters are used in other cases.
      9. page 20, figure 1: I recommend switching the positions of HDyn1 and Drp6 in Figure 1a to keep the order in Figure 1b. 
      10. page 21, line 671: Add the word "Tetrahymena" before "Drp 6" to pair with the word "human dynamin 1".
      11. page 23, line 729: Remove "and."
      12. page 23, lines 729 and 731: Unify the expression of "cardiolipin" and "Cardiolipin"
      13. page 23, line 732: Add "or" before "10% Phosphatidylserin."
      14. page 24, Figure 3a: Please mark the position of I553M in the figure if possible. Alternatively, indicate the range of amino acid residues after the words "red" and "green" in the figure legend. 

      Significance

      The corresponding author and his colleagues have reported that Tetrahymena Drp6 is localized to the outer nuclear membrane of both macronucleus and micronucleus of Tetrahymena (Elde et al., 2005) and that Drp6 is required for the formation of new macronuclei during nuclear differentiation (Rahaman et al., 2008). Therefore, these parts are not novel.

      The novelty of this study is as follows: (1) The discovery of a specific amino acid residue (isoleucine at the 553rd) of Drp6 that is required for its nuclear membrane localization. (2) the discovery of a lipid molecule, cardiolipin, as a critical partner for Drp6's nuclear membrane targeting. (3) Discovery of involvement of cardiolipin in the new macronucleus formation (the expansion of macronuclear envelope) through the function of Drp6.

      I think their findings are highly novel and will provide new insight into the field of cell biology. In particular, their findings will contribute to understanding how specific proteins are targeted to specific intracellular membranes. In addition, their methods (such as the floatation assay) for analyzing the interaction between the protein of interest and lipids/liposomes will become an important tool.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is an interesting study from the Rahaman group that identifies cardiolipin (CL) as a potential binding target for Drp6 recruitment to the nuclear membrane in Tetrahymena (that has a unique nuclear remodeling program). In addition, they identify a residue, I553 in the DTD region, which they claim is a key residue involved in specific CL interactions. While the experiments themselves are technically sound, and are well performed and controlled, I don't find the major conclusion that I553 is involved in direct CL interactions justified or well rationalized. By their own admission (in the discussion), the conservative mutation I553M may perturb local folding and may indirectly affect CL interactions. There is no test of DTD folding with and without the I553M mutation, nor are there other mutations (e.g. I553A and in the vicinity) tested. CL interactions generally involve a combination of electrostatic and hydrophobic forces. Where do the electrostatic interactions come from? Why would an Isoleucine to Methionine mutation affect the hydrophobic component, even if I553 is the key hydrophobic residue? Additional experiments are therefore essential to identify the actual residues involved in specific CL interactions. CD experiments in the absence and presence of CL-containing membranes will likely yield information on the impact of the I553 mutations, while DLS experiments would inform on the hydrodynamic properties (overall 3D fold) of the DTD and the impact of these mutations.

      The writing is not clear in some parts and may require a round of language editing. There are no issues with reproducibility.

      Significance

      The addressed phenomenon is restricted to Tetrahymena and may not have far reaching implications. Regardless, the identification of CL as a binding target for Drp6 at the nuclear membrane of this organism is in itself significant. The conclusion that I553 is the key CL binding residue is however not warranted. Additional experiments are needed to dissect how this residue impacts CL interactions and examine whether the observed effect is direct or indirect.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      This is a fascinating and beautifully written article about the possible evolutionary relationship between two major protein superfamilies - the P-loop NTPases and the Rossmanns. Both are ancient and highly diverse superfamilies, containing a significant proportion of all extant domain sequences and were probably amongst the earliest enzyme superfamilies to emerge in evolution. No major evolutionary classification of proteins, such as SCOP, reports evolutionary relationships between them.

      Both share the same structural architecture of a beta-alpha-beta 3-layer sandwich and have an intriguing number of other shared structural features including the location of the binding site for phospho-ligands. However, whilst both bind phosphorylated ribonucleosides, the mode of binding differs and also the manner in which these compounds are exploited. Furthermore, there are differences in the topologies of the folds possibly suggesting distinct evolutionary trajectories. The Rossmanns appear to be more structurally conserved, whilst the P-Loops vary more in their topologies and possibly represent less stable arrangements of beta-sheets and alpha-helices.

      The authors have brought together several strands of evidence to explore possible evolutionary relationships. Detailed structural analyses allow the authors to explicitly detail the significant shared structural features. For example, similarities in the mode of binding the phosphate moiety in the ligand. The structural features are well described and there are appropriate illustrations visualising key differences and similarities.

      The shared features of the phosphate binding site likely emerged and were favoured early in evolution, as supported by other analyses reported by Longo et al. However, as the authors point out there are other compelling similarities including the equivalent location of this site in the first beta-loop-alpha element in both superfamilies, which is not a necessary constraint of phosphate binding and the authors support this by giving examples of phosphate binding at the tip of alpha-4. In addition, they provide evidence supporting the common involvement of beta-2 which contains the conserved Asp in the Rossmanns common ancestor. The Walker-B Asp in the P-loops is also at the tip of the beta-strand adjacent to beta-1, as in the Rossmanns - although this is an inserted strand relative to the Rossmann topology. The authors propose feasible evolutionary scenarios for how the P-Loops and Rossmanns may have diverged to acquire additional secondary structure elements extending the common beta-PBL-alpha-beta-Asp feature present in both superfamilies.

      Further compelling evidence is given by detection of a bridging protein - Tubulin - linking the two superfamilies. This has the distinct Rossmann topology but binds GTP in the P-loop NTPase mode. Furthermore, the GTP is hydrolysed by water activated by a ligated metal dication. Final support is given by reporting common sequence themes between the P-loop enzyme HPr kinase/phosphatase and some Rossmann proteins. The authors present further interesting and detailed analyses of similarities between the proteins sharing this unusual theme.

      The evidence provided by the authors for the shared beta-PBL-alpha-beta-Asp fragment seems very strong to me and has been presented in an interesting and informative way. Of course, it is not possible to know the subsequent evolutionary trajectories but the scenarios presented seem plausible.

      We thank the reviewer for their encouraging remarks on our manuscript.

      **I only have minor comments** 1) SCOP2 provides information on links between superfamilies based on rare sequence or structural features. Have the authors checked this resource for any details on beta-PBL-alpha-beta-ASP fragment? Or perhaps consulted with Alexey Murzin about this feature?

      The classification of Rossmann and P-Loop proteins in SCOP2 is consistent with the ECOD classification scheme. For further confirmation, we wrote to Alexey Murzin, who replied that Rossmanns and P-Loops are annotated as two separate evolutionary lineages, termed “hyperfamilies” in SCOP2. He found our new evidence compelling, but noted that, given the current criteria for shared ancestry, P-loops and Rossmanns are separate lineages.

      2) I was rather confused by the way in which EC annotations were collected for the two superfamilies ie via Pfam – wouldn’t it be better to use SUPERFAMILY as the domain structures would map directly to these sequence relatives. I’m also surprised that they only took the common EC from a Pfam family since the aim of this analysis was to identify how many different enzyme functions the two superfamilies supported. Pfam does not classify by function and so inevitably groups functionally diverse relatives. However, to get the full range of enzyme functions supported by these superfamilies I would have thought all non-redundant EC functions across these constituent Pfam families should be counted. Perhaps I have misunderstood.

      We have updated the analysis to make use of the SUPERFAMILY database and, as per your suggestion, we now count all non-redundant EC numbers. Although the EC number counts have somewhat changed, the major point – that these are exceptionally diverse evolutionary lineages – has not.
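      The difference between the two counting schemes discussed in this exchange can be sketched as follows. This is only an illustration under one reading of the comment (that "the common EC" means the single most frequent EC number per family); the family names and EC annotations below are invented placeholders, not data from the manuscript, Pfam, or SUPERFAMILY.

```python
from collections import Counter

# Hypothetical per-family EC annotations (placeholder values for illustration only).
annotations = {
    "family_A": ["1.1.1.1", "1.1.1.1", "1.1.1.2"],
    "family_B": ["2.7.1.1", "2.7.1.1", "1.1.1.1"],
    "family_C": ["3.6.1.3", "3.6.1.3", "3.6.5.2", "2.7.1.1"],
}

# Scheme 1: keep only the most frequent EC number of each family.
one_per_family = {Counter(ecs).most_common(1)[0][0] for ecs in annotations.values()}

# Scheme 2: count every non-redundant EC number across all families.
all_non_redundant = {ec for ecs in annotations.values() for ec in ecs}

print(len(one_per_family), len(all_non_redundant))
```

      In this toy input the first scheme yields fewer distinct EC numbers than the second, which is the undercounting concern raised in the comment.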

      3) The authors refer to a set of previously curated ‘themes’ and allude to a methodology that will be reported in a forthcoming manuscript. The idea of identifying rare themes and then using them to locate very distant homologues is appealing. However, I think some details should be provided here. For example, some brief details on the technology for detecting the themes and thresholds on significance. How rare are they and how conserved do these fragments need to be between superfamilies to join their curated list? Furthermore, how many of these curated themes are similar to the one reported in their article and do they get crosslinks to other superfamilies based on closely related themes? ie how unique is this theme to the P-loop and Rossmanns and are there closely related themes linking these two superfamilies to other superfamilies? I would imagine it is quite a distinct theme but I would have liked to see a few more details on this to reassure that there are no closely related themes.

      We have updated the manuscript to include a more detailed description of the methods used to detect bridging themes shared between the Rossmann and P-Loop evolutionary lineages. In addition, we now include a supplemental table (Table S2) with all of the initial hits from the theme analysis.

      4) The authors have built model structures to allow them to estimate ligand location in proteins with no structural characterisation. It would be helpful if they reported the degree of sequence similarity between the query and template proteins and also the model quality.

      We have updated this section to include more details. In addition, we have identified a structure from the same T-group to serve as our ligand donor. The updated ligand donor is more closely related to 1ko7 than the previous ligand donor, though the positioning of the ligand is effectively unchanged. We note that the global sequence identity to both the previous and new ligand donor is low (less than 30% sequence identity). However, the phosphate binding loops align well in both sequence and structure, as is detailed in the revised Methods section.


      The study by Longo et al. was devoted to the evolutionary history of P-loop NTPases and Rossmann fold proteins. Although not related in sequence, the two protein families share some structural features that imply that they could have diverged from a common ancestor. Using bioinformatic analyses, the study under review identified some bridge proteins (of the tubulin family) that share themes of both P-loops and Rossmanns, offering possible support for the common ancestry. A minimum ancestral peptide structure is proposed based on the analysis and its possible diversification trajectory is hypothesized. Even though the divergence scenario is clearly outlined, the authors do not over-interpret the observations and admit that convergence could still explain the scenario. The methodology and results are sufficiently described and conclusions are explained in detail. Although it would be really interesting to design an experimental study to support the conclusion (and I suppose that the authors will do that), that is clearly outside the scope of this bioinformatic study.

      Obtaining experimental evidence for our hypothesis is far from trivial. Modern proteins, including the bridging ones identified here, may not be amenable to exchange due to differing contexts (epistasis). Still, we agree that highlighting experimental directions is a good idea. We have updated the sections From an ancestral seed to intact domains and Conclusion to include a brief discussion of experiments that may help test our hypotheses about the evolution of these protein lineages.

      I would not propose any major changes to the manuscript as I think that the message is very clear. **Minor comments:** (1) In the results section, the text is very clear but tends to be repetitive in places. I think the manuscript would be more easily readable if it were more to the point in some sections.

      We have edited the manuscript to remove cases of unnecessary repetition in the results section and throughout.

      (2) There are probably a few typos or unclear sentences, e.g. pg 5, mid-page, "The core, most common topology...); pg 12, three lines from the bottom "(where this element in canonical", probably should be "is canonical"; pg 11, mid page "the mode of binding of the catalytic dication of tubuling (often Ca2+)" - all the structures listed in Table S1 list Mg2+, so "often" is a bit misleading.

      We have corrected the unclear sentences and typos noted above, as well as a few others.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The study by Longo et al. was devoted to the evolutionary history of P-loop NTPases and Rossmann fold proteins. Although not related in sequence, the two protein families share some structural features that imply that they could have diverged from a common ancestor. Using bioinformatic analyses, the study under review identified some bridge proteins (of the tubulin family) that share themes of both P-loops and Rossmanns, offering possible support for the common ancestry. A minimum ancestral peptide structure is proposed based on the analysis and its possible diversification trajectory is hypothesized.

      Even though the divergence scenario is clearly outlined, the authors do not over-interpret the observations and admit that convergence could still explain the scenario. The methodology and results are sufficiently described and conclusions are explained in detail. Although it would be really interesting to design an experimental study to support the conclusion (and I suppose that the authors will do that), that is clearly outside the scope of this bioinformatic study.

      I would not propose any major changes to the manuscript as I think that the message is very clear.

      Minor comments:

      (1) In the results section, the text is very clear but tends to be repetitive in places. I think the manuscript would be more easily readable if it were more to the point in some sections.

      (2) There are probably a few typos or unclear sentences, e.g. pg 5, mid-page, "The core, most common topology...); pg 12, three lines from the bottom "(where this element in canonical", probably should be "is canonical"; pg 11, mid page "the mode of binding of the catalytic dication of tubuling (often Ca2+)" - all the structures listed in Table S1 list Mg2+, so "often" is a bit misleading.

      Significance

      I think this is a very interesting analysis of the evolutionary history of the P-loop and Rossmann fold family which are considered among the most ancient and abundant protein folds. That makes them of high interest also for origins of protein structure. The results are not firmly conclusive (because of the limits of such analyses), making the outcomes of the study partly hypothetical. I think it would be very interesting to outline suggestions for future experiments that could test the hypothesis to be more valuable to a broader audience.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is a fascinating and beautifully written article about the possible evolutionary relationship between two major protein superfamilies - the P-loop NTPases and the Rossmanns. Both are ancient and highly diverse superfamilies, containing a significant proportion of all extant domain sequences and were probably amongst the earliest enzyme superfamilies to emerge in evolution. No major evolutionary classification of proteins, such as SCOP, reports evolutionary relationships between them.

      Both share the same structural architecture of a beta-alpha-beta 3-layer sandwich and have an intriguing number of other shared structural features including the location of the binding site for phospho-ligands. However, whilst both bind phosphorylated ribonucleosides, the mode of binding differs and also the manner in which these compounds are exploited. Furthermore, there are differences in the topologies of the folds possibly suggesting distinct evolutionary trajectories. The Rossmanns appear to be more structurally conserved, whilst the P-Loops vary more in their topologies and possibly represent less stable arrangements of beta-sheets and alpha-helices.

      The authors have brought together several strands of evidence to explore possible evolutionary relationships. Detailed structural analyses allow the authors to explicitly detail the significant shared structural features. For example, similarities in the mode of binding the phosphate moiety in the ligand. The structural features are well described and there are appropriate illustrations visualising key differences and similarities.

      The shared features of the phosphate binding site likely emerged and were favoured early in evolution, as supported by other analyses reported by Longo et al. However, as the authors point out there are other compelling similarities including the equivalent location of this site in the first beta-loop-alpha element in both superfamilies, which is not a necessary constraint of phosphate binding and the authors support this by giving examples of phosphate binding at the tip of alpha-4. In addition, they provide evidence supporting the common involvement of beta-2 which contains the conserved Asp in the Rossmanns common ancestor. The Walker-B Asp in the P-loops is also at the tip of the beta-strand adjacent to beta-1, as in the Rossmanns - although this is an inserted strand relative to the Rossmann topology. The authors propose feasible evolutionary scenarios for how the P-Loops and Rossmanns may have diverged to acquire additional secondary structure elements extending the common beta-PBL-alpha-beta-Asp feature present in both superfamilies.

      Further compelling evidence is given by detection of a bridging protein - Tubulin - linking the two superfamilies. This has the distinct Rossmann topology but binds GTP in the P-loop NTPase mode. Furthermore, the GTP is hydrolysed by water activated by a ligated metal dication. Final support is given by reporting common sequence themes between the P-loop enzyme HPr kinase/phosphatase and some Rossmann proteins. The authors present further interesting and detailed analyses of similarities between the proteins sharing this unusual theme.

      The evidence provided by the authors for the shared beta-PBL-alpha-beta-Asp fragment seems very strong to me and has been presented in an interesting and informative way. Of course, it is not possible to know the subsequent evolutionary trajectories but the scenarios presented seem plausible.

      I only have minor comments

      1)SCOP2 provides information on links between superfamilies based on rare sequence or structural features. Have the authors checked this resource for any details on beta-PBL-alpha-beta-ASP fragment? Or perhaps consulted with Alexey Murzin about this feature?

      2)I was rather confused by the way in which EC annotations were collected for the two superfamilies ie via Pfam - wouldn't it be better to use SUPERFAMILY as the domain structures would map directly to these sequence relatives. I'm also surprised that they only took the common EC from a Pfam family since the aim of this analysis was to identify how many different enzyme functions the two superfamilies supported. Pfam does not classify by function and so inevitably groups functionally diverse relatives. However, to get the full range of enzyme functions supported by these superfamilies I would have thought all non-redundant EC functions across these constituent Pfam families should be counted. Perhaps I have misunderstood.

      3)The authors refer to a set of previously curated 'themes' and allude to a methodology that will be reported in a forthcoming manuscript. The idea of identifying rare themes and then using them to locate very distant homologues is appealing. However, I think some details should be provided here. For example, some brief details on the technology for detecting the themes and thresholds on significance. How rare are they and how conserved do these fragments need to be between superfamilies to join their curated list? Furthermore, how many of these curated themes are similar to the one reported in their article and do they get crosslinks to other superfamilies based on closely related themes? ie how unique is this theme to the P-loop and Rossmanns and are there closely related themes linking these two superfamilies to other superfamilies? I would imagine it is quite a distinct theme but I would have liked to see a few more details on this to reassure that there are no closely related themes.

      4)The authors have built model structures to allow them to estimate ligand location in proteins with no structural characterisation. It would be helpful if they reported the degree of sequence similarity between the query and template proteins and also the model quality.

      Significance

      This article presents compelling new evidence on the evolutionary relationship between two major, ancient enzyme superfamilies. As far as I'm aware these insights are novel and the detection of the bridging protein relative and the common 'theme', i.e. beta-PBL-alpha-beta-Asp fragment, is a new discovery.

      This work makes an important contribution to understanding the evolution of two major enzyme superfamilies and the insights can guide future evolutionary studies and protein design studies.

      The audience will be structural and evolutionary biologists, both experimental and computational.

      My expertise is in protein evolution and protein structure analyses and I have published a number of reviews and articles analysing and discussing Rossmann-like superfamilies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      RESPONSE TO REVIEWER #1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Ishihara et al. investigate and compare microtubule polymerization/depolymerization dynamics inside vs. at the periphery of microtubule asters in a cell-free Xenopus egg extract system. By tracking EB comets, which localize to growing microtubule ends, they find that the microtubule growth rates and EB comet lifetimes (interpreted as an indicator of microtubule catastrophe rates) are similar between the two spatially-distinct microtubule populations. However, using a tubulin-intensity-difference image analysis, the authors are also able to measure local microtubule depolymerization rates, and they find a significant difference in depolymerization rates of the two populations. Specifically, the authors report that the microtubule depolymerization rates measured within asters are faster than those measured at the periphery.

      **Specific comments:**

      Figure 2.

      In the text, the authors report: "The depolymerization rate was 36.3 ± 7.9 μm/min (mean, std) in the aster interior, compared to 29.2 ± 8.9 μm/min (mean, std) at the aster periphery." This difference is certainly not two-fold (as stated in the abstract). It would also be useful to mark the mean rates on the graph in 2B.

      We removed the words ‘almost two-fold’ in the abstract. In the revision, we will mark the mean rates on Fig. 2B (using vertical lines).
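      As a quick check of the arithmetic behind this point, the fold difference can be computed directly from the two mean rates quoted above. This is a minimal sketch; the variable names are illustrative, and only the means (not the standard deviations) are used.

```python
# Mean depolymerization rates quoted in the text (µm/min).
interior_rate = 36.3
periphery_rate = 29.2

# Fold difference between the aster interior and the aster periphery.
fold_difference = interior_rate / periphery_rate
print(f"interior / periphery = {fold_difference:.2f}")
```

      The ratio comes out at roughly 1.24, well short of two-fold.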

      The bimodal shape of the depolymerization rate distributions in 2B is very interesting. This definitely warrants further investigation. At the minimum, the depolymerization rates should be determined at 50 µm intervals, as done for other parameters in Figure 1. Could it be that there are two coexisting populations of microtubules at the same location? Or is there a clear spatial compartmentalization of the two that is not obvious here because the distance interval used for the measurements is too large? This is a very important distinction for the claims of the paper.

      We understand the reviewer’s concern. There are some technical limitations that make the depolymerization measurement more challenging. While we use widefield imaging of EB1-GFP comets to obtain polymerization rates from a field of view spanning 500 microns, we may only use TIRF imaging for depolymerization measurements. In this method, we are limited to observing microtubules very close to the cover slip in a small field of view of 80x80 microns at 500 ms time intervals (movies span 1-2 minutes). One would need to move the TIRF field every 1-2 minutes at 50 micron intervals, but the aster periphery would be changing during this time, so the exact location of the measurement is hard to define. Thus, we opted to image the two spatial extremes: interior (close to the MTOCs) and the very periphery (where MT density is still sparse.)

      Perhaps the largest limitation of this approach is the choice of peripheral regions based on the apparent sparsity of MTs in the TIRF field of view. Indeed, when we examine the depolymerization rate distributions for individual movies separately (see figure below, periphery #1-3 are three individual movies), we observe that some movies have rates as low as 20 µm/min, while others have higher values with a center around 36 µm/min. The mean depolymerization rates for the interior also vary, from 34.8 to 43.2 µm/min (interior #1-3 are three individual movies). In general, the spread of depolymerization rate within a field of view as well as across different fields of view is much larger than for polymerization. It is possible that this is partly explained by the lack of precise definition of interior vs. periphery in this TIRF-based measurement approach.

      Our data still supports the spatial regulation of depolymerization rate. However, there is no clear evidence for a bimodal distribution of depolymerization rate in any given field of view (80x80 micron square region). To clarify this point, we have removed the language “bimodal” in the main text. In the revisions, we will provide this figure as a supplement.

      We thank reviewers #1 and #2 for the critical feedback that allowed us to clarify this issue of apparent bimodality of the depolymerization rates.

      The authors make a point here that the distribution of measured polymerization rates is fairly narrow. This appears to be in contrast with Figure 1B, where polymerization rates take on a wide range of values. How do the two distributions of polymerization rates obtained by these two methods compare?

      To address this point, we directly compare the standard deviation of the polymerization rate measurements. For Fig. 1B EB1 tracking measurements, std ranges from 7.7-10.5 µm/min for a given spatial bin (as stated in Fig. 1B legend), while for Fig. 2A TIRF measurements std is 4.0 (periphery) and 4.5 µm/min (interior) as stated in the main text. Given that the mean values of polymerization rates are similar, this suggests that the TIRF measurements are less noisy. This further highlights the relative pros and cons of the two measurement methods. To discuss these issues, we have added a new paragraph in the discussion section.

      Figure 3.

      The laser ablation figure and movies are beautiful, but don't seem to add support to the story. Importantly, the authors do not confirm any spatial variability in depolymerization rate with these experiments. As a matter of fact, although the laser ablation experiments are only performed in the aster interior, the measured depolymerization rates appear to be just as consistent with the periphery rates in Figure 2 as they are with the interior rates in Figure 2. (They span quite a large range of values with the average right in the middle between what was measured for the two areas in Figure 2).

      Indeed, the values obtained with laser ablation are quite variable, even compared to the physiological depolymerization rate measured via TIRF microscopy. This perhaps reflects the variability of biology as well as the nature of the laser ablation which measures depolymerization rate at the level of microtubule populations. We hope our paper will increase interest in this rarely measured parameter, and perhaps invention of new probes to measure it more accurately and conveniently.

      Given the variability of our measurements, we conclude that the depolymerization rates obtained with the TIRF-based approach and the laser ablation-based approach are indistinguishable. We agree with the reviewer that the data does NOT argue that laser ablation results are more consistent with the interior TIRF measurements than peripheral TIRF measurements.

      To clarify this point, we remove the following clause “, which was comparable to the modal value of the depolymerization rates in the aster interior (Fig. 2).”

      We change the concluding sentence of our laser ablation paragraph from

      “Overall, these observations suggest that depolymerization dynamics are similar for plus ends following a natural catastrophe vs. ablation in the aster interior.”

      to

      “Overall, these observations confirm that depolymerization rates are variable, and we find no statistical distinction of rates between plus ends following a natural catastrophe vs. ablation.”

      Although the authors report they don't see any correlation between the distance and depolymerization rate, they should still plot the rate as a function of initial cut positions (Figures 3D, 3E).

      To address this concern, we plan to provide a supplemental figure in the revision. Please see the preliminary figure below. Due to technical limitations of the laser ablation system (field of view at 60x magnification), we only have measurements that span 15-100 microns from the center.

      From the single decaying inward wave the authors conclude that microtubules depolymerize fully to their minus ends which are distributed throughout the aster. Can the possibility that depolymerization is stopped by microtubule lattice defects/islands be excluded by these observations?

      The existence of microtubule lattice defects is a recent development in the field, and much remains unknown. If we assume that defects are structurally unstable, we predict that an episode of depolymerization will continue even when it reaches a defect. If defects are stable and lead to instantaneous rescue of plus ends, we cannot distinguish the defects from minus ends. In this latter scenario, the interpretation of the decaying inward wave requires caution.

      What are the effects of the local increase in tubulin concentration due to the subunit release by depolymerization? What about the release of other lattice-binding MAPs (stabilizers)?

      We are interested in these questions as well. Soluble GDP-bound tubulin, released by depolymerization, is thought to exchange its nucleotide to GTP without need of a GEF, and no GEF is known. The dissociation rate of GDP is ~0.1 [1/sec], for a half-life of ~5 sec (Brylawski and Caplow, 1983, J. of Biol. Chem.), so we believe the tubulin subunits are recycled relatively quickly. It is not entirely obvious whether this necessarily results in a significant increase in ‘soluble’ tubulin concentration given tubulin diffusive transport. We hypothesize the main effect of stabilizing MAPs is on the depolymerization rate as discussed in our model in Fig. 5.
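As a quick check of the arithmetic behind the half-life, a first-order process with rate constant k has a half-life of ln 2 / k. The sketch below uses only the ~0.1 per second rate quoted above; the function name is illustrative and not taken from the manuscript.

```python
import math


def half_life(rate_per_s: float) -> float:
    """Half-life (s) of a first-order process with rate constant k (1/s)."""
    return math.log(2) / rate_per_s


# GDP dissociation rate quoted in the response (~0.1 per second).
print(f"{half_life(0.1):.1f} s")  # prints "6.9 s"
```

This gives roughly 7 s, the same order as the ~5 s quoted, so the conclusion that subunits are recycled within seconds is unchanged.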

      Figure 4.

      Is the local depletion of tubulin/EB1 thought to be only within the narrow annulus at ~100 um distance, or is it not measurable on the inside due to the polymer signal? Can the two be separated? Such a sharp transition within a discrete annular region doesn't speak to the relative effects on the inside vs. the outside of the aster?!

      Yes, we also believe the soluble tubulin levels are even lower in the more inner regions of the aster. However, polymerized tubulin accounts for a large part of the fluorescence intensity in these inner regions, and our method does not faithfully reflect the soluble fraction. It will be important for future studies to employ specific methods that may unequivocally distinguish polymer vs. soluble tubulin concentrations (see below).

      More importantly, the local depletion of either tubulin or EB1 is not a good representation of a depletion of a MAP component that associates with the microtubule lattice. Both tubulin and EB1 bind preferably to microtubule ends, not lattice. Thus showing a profile of slight local tubulin and/or EB depletion does not seem to be relevant for the proposed model. Rather, overall microtubule polymer mass/density as a function of distance may be more relevant?

      Reviewer #1 makes a valid point that tubulin and EB1 are specifically incorporated to plus ends and not to the entire lattice as we assume for the MAPs in our theoretical model. To address this issue, we analyzed the fluorescence intensity of images obtained for a MAP that associates with the MT lattice, Tau-mCherry (Mooney et al. 2017). This quantification shows a depletion pattern similar to tubulin and EB1. Thus, we believe the local depletion is a general feature. For the revision, we plan to incorporate this Tau-mCherry data in Fig. 4.

      Figure 5.

      The toy model is intuitive and clear, but not sufficient without any experimental investigation. An attempt to quantify the actual distributions of at least one or a few selected proposed MAPs is needed. Is the depletion strongest where microtubule density is highest? What is the ratio of a MAP intensity to microtubule polymer density as a function of distance? How does that relate to local depolymerization rates? What are other testable model predictions that can show support for the proposed mechanism?

      We understand that our proposal is rather speculative, and the goal of this manuscript was to propose a hypothesis that may inspire others working on the assembly of intracellular organelles. Although Tau is not an endogenous component of the egg extract system, we believe that our new quantification of Tau-mCherry depletion adds more credibility to our general proposal.

      Microtubule density is roughly uniform within the interior of the aster according to our current understanding (Ishihara et al. 2016 eLife). So the MAP:MT ratio is relatively uniform throughout the aster except at the very periphery where there are very few MTs assembled (i.e. “depletion is weakest where MT density is lowest.”)

      In the future, we may perform (1) FCS measurements of candidate MAPs to directly measure the concentration profile of each candidate MAP in soluble form and (2) depletion/addback experiments to show which MAP most affects the depolymerization rate. Although these experiments are appealing, they require the generation of new molecular reagents as well as calibration of a highly specialized optical method. Therefore, we decided to limit this paper to the unusual observation of the variation in depolymerization rate and to speculate on the underlying mechanism.

      Also, the table is insufficiently described. Are any or all of these MAPs known to be specific regulators of microtubule depolymerization rates, but not other dynamics parameters?

      There are a large number of MAPs in Xenopus eggs, as there are in all cells, and the degree to which their effects on microtubules have been characterized is variable. To address this comment, we include in the revised manuscript a list of known MAPs that are present in Xenopus egg extract, along with their estimated concentrations from a published proteomic study. We annotate each MAP as to whether it increases or decreases microtubule stability, acknowledging that these data are very incomplete, that in some cases there is disagreement in the literature, and that we are combining pure protein and whole cell analysis. This table illustrates the challenge of associating dynamics regulation with any one MAP, since the behavior of microtubules is regulated by all these factors operating in parallel. That said, certain MAPs stand out as candidate depolymerization regulators that have been little studied for effects on dynamics, for example, MAP7.

      In the revision, we propose adding this expanded table as a supplementary table in addition to Table 1.

      | Protein Description | Gene Symbol | Est. Conc. (nM) | MT polymerization/nucleation/rescue? | MT depolymerization/catastrophe? | Lead reference |
      | --- | --- | --- | --- | --- | --- |
      | Microtubule-associated protein RP/EB family member 1 | MAPRE1 | 1800 | Increase | Decrease | PMID: 18364701 |
      | Stathmin | STMN1 | 1600 | Decrease | Increase | PMID: 11792540 |
      | MAP4 | MAP4 | 960 | Increase | Decrease | PMID: 7962090 |
      | Echinoderm microtubule-associated protein-like 2 | EML2 | 580 | Decrease | Increase | PMID: 11694528 |
      | EML4 protein | EML4 | 500 | Increase | Decrease | PMID: 17196341 |
      | Disks large-associated protein 5 | DLGAP5 | 380 | Increase | Decrease | PMID: 16631580 |
      | Cytoskeleton-associated protein 5 | CKAP5 | 300 | Increase | Increase | PMID: 23666085 |
      | Kinesin-like protein KIF2C | KIF2C | 200 | Decrease | Increase | PMID: 12620232 |
      | CAP-Gly domain-containing linker protein 1 | CLIP1 | 190 | na | na | |
      | Cytoskeleton-associated protein 4 | CKAP4 | 160 | Increase | Decrease | PMID: 9799226 |
      | Echinoderm microtubule-associated protein-like 1 | EML1 | 140 | na | na | |
      | Ensconsin | MAP7 | 91 | na | Decrease | PMID: 31391261 |
      | Targeting protein for Xklp2 | TPX2 | 91 | Increase | Decrease | PMID: 26414402 |
      | Microtubule-associated protein 1B | MAP1B | 85 | Increase | Decrease | PMID: 7664878 |
      | MAP1S | MAP1S | 66 | Decrease | Decrease | PMID: 25300793 |
      | Hyaluronan mediated motility receptor | HMMR | 61 | na | na | |
      | MAP7 domain-containing protein 1 | MAP7D1 | 47 | na | na | |
      | Cytoskeleton-associated protein 2 | CKAP2 | 46 | Increase | Decrease | PMID: 15504249 |
      | Microtubule-associated tumor suppressor 1 | MTUS1 | 43 | na | na | |
      | Kinesin-like protein KIF2A | KIF2A | 37 | Decrease | Increase | PMID: 29980677 |
      | CLIP-associating protein 1 | CLASP1 | 30 | Decrease | Decrease | PMID: 29937387 |
      | Microtubule-associated protein RP/EB family member 3 | MAPRE3 | 21 | Increase | Decrease | PMID: 20850319 |
      | MAP7 domain containing 2 protein variant 2 (Fragment) | MAP7D2 | 8 | na | na | |
      | CAP-Gly domain-containing linker protein 4 | CLIP4 | 2 | na | na | |

      Minor comments:

      Figure 1.

      typo in the figure legend: "interior (distance>300 μm) vs. periphery (50 μm…

      There appears to be a clear dip in EB1 density at 100 um (Figure 1C). What could be the cause of that?

      Thank you for catching the typo. We corrected this to “periphery (distance>300 µm) vs. interior (50 µm…

      Figure 2.

      Note that the distances used in Figure 2. to define 'interior' and 'periphery' are completely different than those in Figure 1. (Interior in Figure 1 is defined to be between 50 and 280 um from the MTOC, and exterior larger than 300 um. However, in Figure 2. interior is defined as less than 100 um, and exterior as larger than 200 um.) Given that the asters are actively growing, it would be good to clearly explain how these intervals were defined in each case.

      For both experiments, we had clearly stated the definitions of interior and periphery, either in the figure legends or in the methods section. We have added a new paragraph explaining why we could not choose exactly the same quantitative definitions for these two methods (please also see our reply to Reviewer #2 comment 1).

      In the periphery movie, there are several notable examples of apparent minus-end depolymerization and treadmilling. The authors state these are very rare - perhaps a quantification would be useful here?

      Thank you for pointing this out. We modified the sentence to reflect the outward depolymerization events in the periphery. “We observed few outward-moving depolymerization events (…

      Reviewer #1 (Significance (Required)):

      The observation of distinct depolymerization rates within vs. at the periphery of microtubule asters is novel and interesting. However, the manuscript in its current form is rather preliminary. The observation can be significantly strengthened by additional experiments/analysis that would characterize the effect in more detail. Even more importantly, the authors propose a highly speculative (although compelling) mechanism, but make no attempt to test it in any way. This is a major deficiency of the current manuscript that should be addressed prior to publication.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2 that our comments are both overlapping and complementary. I also find Reviewer #2's comments fair and reasonable and see no need for further adjustments.

      RESPONSE TO REVIEWER #2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      SUMMARY

      This paper reports measurements of microtubule dynamics in interphase asters nucleated in Xenopus egg extracts. Dynamics are measured using two methods. First tracking of GFP tagged EB1 protein forming comets at the tips of growing microtubules, as used in other studies, which can only measure growth rates. Second using a recently developed automated tracking based on subtractive difference images of fluorescently labelled microtubules, which can measure both growth and shrinkage rates. The main and novel observation of this paper, using difference image tracking, is that the MT shrinkage rate is ~2 fold faster in the interior of the aster compared with the periphery of the aster, whilst rates of MT polymerisation and catastrophe vary only slightly, if at all. The authors speculate that this might be due to a reduced MAP concentration and occupancy in the aster interior. They also discuss the role of a depletion-dependent increased shrinkage rate as a feedback mechanism to maintain a low MT polymer density in the aster interior.

      MAJOR COMMENTS

      The movies are startling in their beauty and clarity and the key conclusion that the shrinkage rate is significantly faster in the interior compared to the periphery of the aster is convincing.

      The observation that the net MT plus end growth rate is ~10% faster at the periphery compared to the interior of the aster is only supported by the EB1 tip tracking method. The difference imaging method shows no significant difference in rates. The authors need to discuss this discrepancy between the established and new methods of analysis. It is insufficient to state that the growth rates obtained by the two methods are "consistent".

      This comment prompts a comparison of the two methods (EB1 tracking vs. TIRF difference imaging). On one hand, EB1 tracking is more sensitive in detecting plus ends and allows large-N observations, so it is more likely to show statistical significance. On the other hand, the EB1 tracking method is noisier (higher standard deviation) than the TIRF-based measurements (see our response to Reviewer #1). In TIRF difference imaging, the exact location of the periphery (relative to the center as well as to the overall microtubule density profile) is hard to evaluate.

      What is consistent between the two methods is the approximate mean value of polymerization rates. The 10% faster polymerization velocity is only suggested by the EB1 tracking method, calling for caution/further investigation. However, the potential relatively small difference in polymerization rate is not the main point of this paper.

      We deleted the sentence in the results section for the TIRF method: “These values of polymerization rates are consistent with EB1 comet tracking (Fig. 1).” We have added a new paragraph discussing the discrepancies between the methods in reporting polymerization rate.

      The discussion proposing MAP depletion-dependent increased shrinkage rate as a feedback mechanism to limit MT polymer density is reasonable.

      The model and discussion of the role of MAPs might be criticised as highly speculative and unsupported by any experimental data. The authors do acknowledge this. Whether the ratio of data to speculative interpretation is appropriate will be an editorial decision for whichever journal ultimately hosts this.

      Thank you. This is exactly the kind of comment that we wanted to hear from an initiative like Review Commons. It helps us gauge how our work is received and decide which journal to submit our work to.

      In particular since the aster forms by growth from the nucleating bead, early in its formation the final interior MTs must have first formed the peripheral MTs and could therefore enter fresh media and bind MAPs. The authors show by calculation that as the aster expands, these MTs and MAPs become isolated from mixing with the external media. This isolation would then suggest that any MAPs released by dissociation or MT depolymerisation must remain in the interior, and are therefore available to rebind to newly formed MTs. So, it is unclear why the MAPs should be depleted in the interior compared to the periphery, unless expansion of the aster is slowed in which case additional MAPs could diffuse into the stationary periphery from the surrounding media. The kinetics of MT growth, MAP binding and aster expansion would then also be expected to have an effect on the outcome beyond a simple "depletion" of the internal MAP concentration.

      We use the term “depletion” to mean a significant decrease of MAP from the cytoplasm. As outlined in our toy model, more MTs lead to more MAP binding and depletion of soluble MAPs. Note that the total local abundance of MAP is constant unless there is significant diffusive transport of MAP from one region to another. We argue this transport is ineffective for the large length scale of interphase asters.
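The mass-balance argument can be illustrated with a simple binding equilibrium. The sketch below is not the toy model of Fig. 5. It assumes a single hypothetical MAP species that binds lattice sites with dissociation constant `kd`, a local total MAP concentration that is conserved (no diffusive exchange between regions), and made-up concentrations chosen only to show the trend.

```python
import math


def soluble_map(total_map: float, lattice_sites: float, kd: float) -> float:
    """Soluble MAP concentration at binding equilibrium.

    All inputs share one concentration unit (e.g. nM). The bound amount B
    solves (total_map - B) * (lattice_sites - B) = kd * B, with total MAP
    conserved locally.
    """
    b = total_map + lattice_sites + kd
    bound = (b - math.sqrt(b * b - 4.0 * total_map * lattice_sites)) / 2.0
    return total_map - bound


# Hypothetical numbers: 100 nM total MAP, kd = 500 nM, increasing polymer.
for sites in (0.0, 100.0, 1000.0, 5000.0):
    print(f"sites = {sites:6.0f} nM -> soluble MAP = {soluble_map(100.0, sites, 500.0):.1f} nM")
```

Under these assumptions the soluble pool falls as the density of lattice sites rises, even though the total local MAP abundance stays constant.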

      It is also not clear how the authors' preferred model would account for the suggestion of bimodal shrinkage rates. It is not clear if this is a simplification (binning things into external and internal) applied for the purposes of discussion.

      Please see our comment to Reviewer #1. We now believe there is no evidence for bimodality of depolymerization rates. The spread of the data reflects the variability of depolymerization rates in a given field of view as well as the variability across multiple fields of view.

      MINOR COMMENTS

      Line 71

      Authors reference Gardner et al 2011, when discussing depolymerisation as a zero order process, as showing a free tubulin dimer concentration effect on shrinkage rates. However, the results in Gardner refer to the off rate during MT polymerisation, and measurements of rapid small scale events during overall growth phases and would be applicable to GTP-heterodimers, whereas the extended shrinkage events measured in this paper would presumably apply to post-catastrophe GDP-heterodimer dissociation and may not be comparable. The reference should be omitted or a further explanation given.

      Thanks, good point. We wanted to cite Gardner et al (2011) to make the point that classic assembly models may not always hold, but the reviewer is correct, that paper only looked at concentration dependence of depolymerization at growing ends. The text was changed to:

      “This assumption has been questioned for growing ends (Gardner 2011), but not for shrinking ends to our knowledge.”

      Line 89

      States "density of plus ends is approximately homogenous within interphase asters"

      However, in results section it is stated Line 111 that "the plus end density is lower at the periphery compared to the aster center".

      Please clarify

      The plus end density is approximately homogeneous from the center to the periphery of the aster. The density drops only in the most peripheral region, where there are few microtubules.

      Line 135

      The distances given for the interior and periphery appear to be mixed up.

      Thank you, we corrected this.

      Line 277

      "approximately consistent with our Peclet number estimate". 50µm gives a Pe value of 2.8. The Peclet number "significance" is earlier given in terms of "Pe>>1" (Line 255). Please clarify what range of experimental values is required for the argument to hold.

      Our statement was unclear. We modified the sentence in the following way to clarify our point: “The half-width of the depleted zone extended ~50 microns beyond the growing aster periphery, which is smaller than the typical aster radius. This analysis indicated that soluble protein levels may vary between subregions of growing asters due to subunit consumption.”

      Line 404

      needs details of the GFP-EB1 and fluorescent tubulin used in this experiment.

      The detailed concentrations are described for each method in the subsequent sections. To avoid confusion, we removed the sentence in line 404, which omitted details.

      The tubulin depletion measurements detect a 4% reduction in tubulin concentration in the interior versus the exterior, and the same for eGFP-EB1 (Fig.4B). This observation provides important support for the depletion proposal. But the experiments apparently lack a control for potential reduction of fluorescence excitation intensity with depth in these deep specimens (equivalent to the inner filter effect in spectroscopy). Is there a component whose apparent concentration (fluorescence emission intensity) does not decrease by 4% in the interior of the aster?

      Indeed, fluorescence intensity measurements require special attention. Our samples are made by squashing 4 µl of extract under an 18 mm x 18 mm coverslip, and the resulting thickness is 10 microns, which we believe is too small a distance to result in an inner filter effect.

      In response to Reviewer #2’s request for an example of a component whose fluorescence intensity is uniform, we provide the intensity profile of the inert 10kDa Dextran labeled with Alexa568. This serves as a control for the reviewer’s specific concern with our method. We will incorporate this as a supplementary figure in the revision.
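The sample thickness quoted above can be checked from the geometry alone. This minimal sketch assumes the extract spreads uniformly under the whole coverslip with no loss at the edges; the function name is illustrative.

```python
def sample_thickness_um(volume_ul: float, side_mm: float) -> float:
    """Thickness (µm) of a liquid film of given volume under a square coverslip."""
    volume_mm3 = volume_ul            # 1 µl equals 1 mm^3
    area_mm2 = side_mm * side_mm
    return volume_mm3 / area_mm2 * 1000.0  # convert mm to µm


# 4 µl under an 18 mm x 18 mm coverslip, as stated in the response.
print(f"{sample_thickness_um(4.0, 18.0):.1f} µm")  # prints "12.3 µm"
```

The result, about 12 µm, agrees with the stated thickness of roughly 10 microns.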

      There is no direct discussion of the relative lifetime of MTs in the interior compared to the exterior of the aster. Catastrophe rates and growth rates are essentially invariant, I think this implies that MT lifetimes are essentially the same in the interior versus the exterior? Please confirm and estimate the lifetime. This could exclude a maturation process whereby one set of MAPs got replaced by another over time?

      Indeed, MT lifetime is a function of four rates: polymerization, depolymerization, catastrophe, and rescue. The figure below shows the MT lifetime as a function of depolymerization rate, assuming the other parameters are fixed at the values we found in our previous report (Ishihara et al. 2016). In regions with a fast depolymerization rate of 40 µm/min, the microtubule lifetime is 0.98 min. As the depolymerization rate decreases to 30 and 25 µm/min, the lifetime increases to 1.5 and 2.4 min. This implies that the microtubules at the aster periphery are longer lived than those in the interior.
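The dependence of lifetime on depolymerization rate can be sketched with the standard two-state dynamic instability model. In the bounded regime, a microtubule nucleated at zero length has a mean lifetime of (v_g + v_s) / (v_s f_cat − v_g f_res), which follows from balancing total growth against total shrinkage. The growth velocity, catastrophe frequency and rescue frequency below are assumed values that do not appear in this response; only the three depolymerization rates are taken from the text.

```python
def mean_lifetime(v_grow: float, v_shrink: float, f_cat: float, f_res: float) -> float:
    """Mean lifetime (min) of a microtubule nucleated at zero length.

    Two-state dynamic instability in the bounded regime.
    Velocities in µm/min, frequencies in 1/min.
    """
    drift = v_shrink * f_cat - v_grow * f_res
    if drift <= 0:
        raise ValueError("unbounded regime: mean lifetime diverges")
    return (v_grow + v_shrink) / drift


# ASSUMED parameters (not stated in the response above).
V_GROW, F_CAT, F_RES = 30.0, 3.3, 2.0

# Depolymerization rates quoted in the response.
for v_shrink in (40.0, 30.0, 25.0):
    lifetime = mean_lifetime(V_GROW, v_shrink, F_CAT, F_RES)
    print(f"v_shrink = {v_shrink:.0f} µm/min -> lifetime = {lifetime:.2f} min")
```

With these assumed parameters the sketch lands close to the 0.98, 1.5 and 2.4 min quoted above, and it shows why lifetime rises steeply as the depolymerization rate approaches the boundary of the bounded regime.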

      Association and dissociation rate constants have not been measured for most MAPs, but in general we expect binding and unbinding to be fast compared to the MT lifetime of ~1 minute. Most MAPs bind in the low micromolar or high nM regime, which implies dissociation on a timescale of seconds or less. MAP4 and MAP7 were both shown to bind and dissociate rapidly in living cells (PMID: 16714020, PMID: 11719555).

      Reviewer #2 (Significance (Required)):

      This paper is significant as it is the first observation of spatial variation in MT shrinkage rates in an aster. It proposes the broad shape of an underlying mechanism (depletion of stabilising MAPs in the aster interior) and presents sound quantitative arguments, but the experiments do not directly test this mechanism. Aster formation in Xenopus egg extracts is widely used as a model system, and if indeed the spatial variation turns out to be due to spatial depletion of components then this will become a landmark paper. The paper may promote wider use of this method of automated analysis and encourage study of shrinkage rate mechanisms in other systems.

      REFEREES CROSS COMMENTING

      In my opinion the comments of reviewer #1 are fair and reasonable and overlap with and complement my own. In my opinion there is zero conflict requiring adjustment.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      SUMMARY

      This paper reports measurements of microtubule dynamics in interphase asters nucleated in Xenopus egg extracts. Dynamics are measured using two methods. First tracking of GFP tagged EB1 protein forming comets at the tips of growing microtubules, as used in other studies, which can only measure growth rates. Second using a recently developed automated tracking based on subtractive difference images of fluorescently labelled microtubules, which can measure both growth and shrinkage rates. The main and novel observation of this paper, using difference image tracking, is that the MT shrinkage rate is ~2 fold faster in the interior of the aster compared with the periphery of the aster, whilst rates of MT polymerisation and catastrophe vary only slightly, if at all. The authors speculate that this might be due to a reduced MAP concentration and occupancy in the aster interior. They also discuss the role of a depletion-dependent increased shrinkage rate as a feedback mechanism to maintain a low MT polymer density in the aster interior.

      MAJOR COMMENTS

      The movies are startling in their beauty and clarity and the key conclusion that the shrinkage rate is significantly faster in the interior compared to the periphery of the aster is convincing.

      The observation that the net MT plus end growth rate is ~10% faster at the periphery compared to the interior of the aster is only supported by the EB1 tip tracking method. The difference imaging method shows no significant difference in rates. The authors need to discuss this discrepancy between the established and new methods of analysis. It is insufficient to state that the growth rates obtained by the two methods are "consistent".

      The discussion proposing MAP depletion-dependent increased shrinkage rate as a feedback mechanism to limit MT polymer density is reasonable.

      The model and discussion of the role of MAPs might be criticised as highly speculative and unsupported by any experimental data. The authors do acknowledge this. Whether the ratio of data to speculative interpretation is appropriate will be an editorial decision for whichever journal ultimately hosts this.

      In particular since the aster forms by growth from the nucleating bead, early in its formation the final interior MTs must have first formed the peripheral MTs and could therefore enter fresh media and bind MAPs. The authors show by calculation that as the aster expands, these MTs and MAPs become isolated from mixing with the external media. This isolation would then suggest that any MAPs released by dissociation or MT depolymerisation must remain in the interior, and are therefore available to rebind to newly formed MTs. So, it is unclear why the MAPs should be depleted in the interior compared to the periphery, unless expansion of the aster is slowed in which case additional MAPs could diffuse into the stationary periphery from the surrounding media. The kinetics of MT growth, MAP binding and aster expansion would then also be expected to have an effect on the outcome beyond a simple "depletion" of the internal MAP concentration.

      It is also not clear how the authors' preferred model would account for the suggestion of bimodal shrinkage rates. It is not clear if this is a simplification (binning things into external and internal) applied for the purposes of discussion.

      MINOR COMMENTS

      Line 71 Authors reference Gardner et al 2011, when discussing depolymerisation as a zero order process, as showing a free tubulin dimer concentration effect on shrinkage rates. However, the results in Gardner refer to the off rate during MT polymerisation, and measurements of rapid small scale events during overall growth phases and would be applicable to GTP-heterodimers, whereas the extended shrinkage events measured in this paper would presumably apply to post-catastrophe GDP-heterodimer dissociation and may not be comparable. The reference should be omitted or a further explanation given.

      Line 89 States "density of plus ends is approximately homogenous within interphase asters" However, in results section it is stated Line 111 that "the plus end density is lower at the periphery compared to the aster center". Please clarify

      Line 135 The distances given for the interior and periphery appear to be mixed up.

      Line 277 "approximately consistent with our Peclet number estimate". 50µm gives a Pe value of 2.8. The Peclet number "significance" is earlier given in terms of "Pe>>1" (Line 255). Please clarify what range of experimental values is required for the argument to hold.

      Line 404 needs details of the GFP-EB1 and fluorescent tubulin used in this experiment.

      The tubulin depletion measurements detect a 4% reduction in tubulin concentration in the interior versus the exterior, and the same for eGFP-EB1 (Fig.4B). This observation provides important support for the depletion proposal. But the experiments apparently lack a control for potential reduction of fluorescence excitation intensity with depth in these deep specimens (equivalent to the inner filter effect in spectroscopy). Is there a component whose apparent concentration (fluorescence emission intensity) does not decrease by 4% in the interior of the aster?

      There is no direct discussion of the relative lifetime of MTs in the interior compared to the exterior of the aster. Catastrophe rates and growth rates are essentially invariant, I think this implies that MT lifetimes are essentially the same in the interior versus the exterior? Please confirm and estimate the lifetime. This could exclude a maturation process whereby one set of MAPs got replaced by another over time?

      Significance

      This paper is significant as it is the first observation of spatial variation in MT shrinkage rates in an aster. It proposes the broad shape of an underlying mechanism (depletion of stabilising MAPs in the aster interior) and presents sound quantitative arguments, but the experiments do not directly test this mechanism. Aster formation in Xenopus egg extracts is widely used as a model system, and if indeed the spatial variation turns out to be due to spatial depletion of components then this will become a landmark paper. The paper may promote wider use of this method of automated analysis and encourage study of shrinkage rate mechanisms in other systems.

      REFEREES CROSS COMMENTING

      In my opinion the comments of reviewer #1 are fair and reasonable and overlap with and complement my own. In my opinion there is zero conflict requiring adjustment.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Ishihara et al. investigate and compare microtubule polymerization/depolymerization dynamics inside vs. at the periphery of microtubule asters in a cell-free Xenopus egg extract system. By tracking EB comets, which localize to growing microtubule ends, they find that the microtubule growth rates and EB comet lifetimes (interpreted as an indicator of microtubule catastrophe rates) are similar between the two spatially-distinct microtubule populations. However, using a tubulin-intensity-difference image analysis, the authors are also able to measure local microtubule depolymerization rates, and they find a significant difference in depolymerization rates of the two populations. Specifically, the authors report that the microtubule depolymerization rates measured within asters are faster than those measured at the periphery.

      Specific comments:

      Figure 2. In the text, the authors report: "The depolymerization rate was 36.3 ± 7.9 μm/min (mean, std) in the aster interior, compared to 29.2 ± 8.9 μm/min (mean, std) at the aster periphery." This difference is certainly not two-fold (as stated in the abstract). It would also be useful to mark the mean rates on the graph in 2B.
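      The arithmetic behind this comment can be sketched as follows. This is a minimal illustration using only the two (mean, std) pairs quoted above; the function names are hypothetical, and the standardized difference assumes equal group sizes, which the review does not state.

```python
from math import sqrt

# (mean, std) in um/min, as quoted from the manuscript text above.
INTERIOR = (36.3, 7.9)
PERIPHERY = (29.2, 8.9)


def fold_difference(a_mean: float, b_mean: float) -> float:
    """Ratio of two mean rates."""
    return a_mean / b_mean


def standardized_difference(a, b) -> float:
    """Difference of means divided by the root-mean-square of the two SDs.

    Assumption: both groups have the same number of measurements.
    """
    (a_mean, a_std), (b_mean, b_std) = a, b
    pooled = sqrt((a_std**2 + b_std**2) / 2)
    return (a_mean - b_mean) / pooled


fold = fold_difference(INTERIOR[0], PERIPHERY[0])
effect = standardized_difference(INTERIOR, PERIPHERY)
print(f"fold difference: {fold:.2f}")
print(f"standardized difference: {effect:.2f}")
```

      Under these quoted values the ratio of means is well below two, and the difference between the means is smaller than one standard deviation of either distribution.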

      The bimodal shape of the depolymerization rate distributions in 2B is very interesting. This definitely warrants further investigation. At a minimum, the depolymerization rates should be determined at 50 μm intervals, as done for other parameters in Figure 1. Could it be that there are two coexisting populations of microtubules at the same location? Or is there a clear spatial compartmentalization of the two that is not obvious here because the distance interval used for the measurements is too large? This is a very important distinction for the claims of the paper.

      The authors make a point here that the distribution of measured polymerization rates is fairly narrow. This appears to be in contrast with Figure 1B, where polymerization rates take on a wide range of values. How do the two distributions of polymerization rates obtained by these two methods compare?

      Figure 3. The laser ablation figure and movies are beautiful, but don't seem to add support to the story. Importantly, the authors do not confirm any spatial variability in depolymerization rate with these experiments. As a matter of fact, although the laser ablation experiments are only performed in the aster interior, the measured depolymerization rates appear to be just as consistent with the periphery rates in Figure 2 as they are with the interior rates in Figure 2. (They span quite a large range of values, with the average right in the middle between what was measured for the two areas in Figure 2.)

      Although the authors report they don't see any correlation between the distance and depolymerization rate, they should still plot the rate as a function of initial cut positions (Figures 3D, 3E).

      From the single decaying inward wave the authors conclude that microtubules depolymerize fully to their minus ends which are distributed throughout the aster. Can the possibility that depolymerization is stopped by microtubule lattice defects/islands be excluded by these observations?

      What are the effects of the local increase in tubulin concentration due to the subunit release by depolymerization? What about the release of other lattice-binding MAPs (stabilizers)?

      Figure 4. Is the local depletion of tubulin/EB1 thought to be only within the narrow annulus at ~100 μm distance, or is it not measurable on the inside due to the polymer signal? Can the two be separated? Such a sharp transition within a discrete annular region doesn't speak to the relative effects on the inside vs. the outside of the aster.

      More importantly, the local depletion of either tubulin or EB1 is not a good representation of a depletion of a MAP component that associates with the microtubule lattice. Both tubulin and EB1 bind preferably to microtubule ends, not lattice. Thus showing a profile of slight local tubulin and/or EB depletion does not seem to be relevant for the proposed model. Rather, overall microtubule polymer mass/density as a function of distance may be more relevant?

      Figure 5. The toy model is intuitive and clear, but not sufficient without any experimental investigation. An attempt to quantify the actual distributions of at least one or a few selected proposed MAPs is needed. Is the depletion strongest where microtubule density is highest? What is the ratio of a MAP intensity to microtubule polymer density as a function of distance? How does that relate to local depolymerization rates? What are other testable model predictions that can show support for the proposed mechanism?

      Also, the table is insufficiently described. Are any or all of these MAPs known to be specific regulators of microtubule depolymerization rates, but not other dynamics parameters?

      Minor comments:

      Figure 1. Typo in the figure legend: "interior (distance>300 μm) vs. periphery (50 μm<distance<280 μm)". There appears to be a clear dip in EB1 density at 100 μm (Figure 1C). What could be the cause of that?

      Figure 2. Note that the distances used in Figure 2 to define 'interior' and 'periphery' are completely different from those in Figure 1. (Interior in Figure 1 is defined to be between 50 and 280 μm from the MTOC, and exterior larger than 300 μm. However, in Figure 2 interior is defined as less than 100 μm, and exterior as larger than 200 μm.) Given that the asters are actively growing, it would be good to clearly explain how these intervals were defined in each case.

      In the periphery movie, there are several notable examples of apparent minus-end depolymerization and treadmilling. The authors state these are very rare - perhaps a quantification would be useful here?

      Significance

      The observation of distinct depolymerization rates within vs. at the periphery of microtubule asters is novel and interesting. However, the manuscript in its current form is rather preliminary. The observation can be significantly strengthened by additional experiments/analysis that would characterize the effect in more detail. Even more importantly, the authors propose a highly speculative (although compelling) mechanism, but make no attempt to test it in any way. This is a major deficiency of the current manuscript that should be addressed prior to publication.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2 that our comments are both overlapping and complementary. I also find Reviewer #2's comments fair and reasonable and see no need for further adjustments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We would like to express our appreciation for both the Editors’ and Reviewers’ efforts as essential contributions to the peer review process. We highly value the Reviewers’ constructive critique of our manuscript #RC-2020_00434R entitled “A drug repurposing screen identifies hepatitis C antivirals as inhibitors of the SARS-CoV2 main protease.”

      We appreciate the Reviewers’ thoughtful consideration of our work and feel their critiques and recommendations have significantly improved our manuscript. Taken together, we believe the additional data, clarification of data presentation, and revised discussion address the heart of the Reviewers’ previous concerns. Thus we feel the work is ready for reconsideration and will be an impactful addition to the literature appropriate for publication. Below we provide a breakdown and a point by point response to previous review critiques.

      Thank you for your attention. We look forward to your response.

      Best Wishes,

      Brian Kraemer, PhD ▪ Associate Director for Research Geriatric Research Education and Clinical Center ▪ Veterans Affairs Puget Sound Health Care System ▪ Research Professor ▪ Departments of Medicine, Psychiatry and Behavioral Sciences, and Pathology ▪ University of Washington ▪ 1660 South Columbian Way ▪ Seattle, WA 98108 ▪ Phone 206-277-1071 ▪ www.kraemerlab.uw.edu

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Baker et al. report the screening of a collection of ~6,070 drugs for their inhibitory activity against the enzymatic activity of the SARS-CoV-2 Mpro protein in vitro using two peptide substrates. 50 compounds with activity against Mpro were identified and tested for their dose-dependent effect in the same assay. Several hits were identified, among which are approved drugs that target the HCV protease.

      Indeed, there is an urgent need for effective drugs for SARS-CoV-2 infection, and high throughput screenings can discover novel candidates. However, the novelty of this work is quite limited, as former screens have been published with the same target using the same substrates. Moreover, as discussed below the translational impact of the hits discussed is also quite limited, particularly in the absence of antiviral data. Lastly, there are several overstatements in the write up and it will require major editing.

      **Major comments:**

      1. Were there any positive controls previously shown to potently inhibit the SARS-CoV-2 Mpro included in the screen (e.g. ebselen)? How did these perform in this assay?

      When first designing our protease assay, we did use ebselen as the initial control. Ebselen showed low potency in all our assays and was not considered as a positive control subsequently. It should be noted that ebselen failed to work against multiple substrates. It is possible that our buffer conditions prevented ebselen activity. See data plotted below. After identifying boceprevir as a potent inhibitor, it was used in all subsequent assays as a positive control.

      It will be helpful if the authors would provide info re the 50 hits from prior screens conducted with this library of compounds - how promiscuous are they across screens? How toxic in cell based assays?

      We have updated the table to provide additional useful information as well as a footnote explaining statuses. The compounds in the Broad repurposing library are generally non-toxic and information about them can be found here: https://clue.io/repurposing

      The translational potential of the findings appears to be limited. The calculated IC50s for these drugs in the Mpro assay are very high (10-1000 fold higher) relative to their IC50 in an enzymatic assay involving the HCV protease (Boceprevir: IC50 = 0.95 μM vs. 0.084 μM in HCV; Ciluprevir: IC50 = 20.77 μM vs. 0.0087 in HCV; Telaprevir: IC50 = 15.25 μM vs. 0.050 μM in HCV) (https://aac.asm.org/content/aac/57/12/6236.full.pdf). In the absence of antiviral data, the main statement of the manuscript that "the work presented here supports the rapid evaluation of previous HCV NS3/4A inhibitors for repurposing as a COVID-19 therapy." is thus an overstatement. Even if there is some activity, it is likely to be limited, as with the HIV protease inhibitors, so its chances of eliciting a meaningful clinical effect are low. Moreover, when used in monotherapy, some of these protease inhibitors have a very low genetic barrier to resistance.

      We have reworked the discussion to incorporate these concerns and limitations of our results.
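      For reference, the fold differences implied by the IC50 values quoted in the reviewer's comment above can be computed directly. This is a minimal sketch: the dictionary and function names are illustrative, the numbers are copied from the comment as written, and the ciluprevir HCV value (given without a unit in the comment) is assumed here to be in μM like the others.

```python
# IC50 values in uM as quoted in the reviewer's comment: (Mpro assay, HCV protease assay).
# Assumption: the ciluprevir HCV value, quoted without a unit, is also in uM.
IC50_UM = {
    "boceprevir": (0.95, 0.084),
    "ciluprevir": (20.77, 0.0087),
    "telaprevir": (15.25, 0.050),
}


def fold_higher(mpro_ic50: float, hcv_ic50: float) -> float:
    """How many times higher the Mpro IC50 is than the HCV protease IC50."""
    return mpro_ic50 / hcv_ic50


fold_by_drug = {name: fold_higher(*pair) for name, pair in IC50_UM.items()}

for name, fold in fold_by_drug.items():
    print(f"{name}: {fold:.1f}-fold higher IC50 against Mpro")
```

      With the values as quoted, boceprevir and telaprevir fall inside the stated 10-1000 fold range, while the ciluprevir ratio comes out above it, provided the unit assumption holds.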

      There are additional inaccuracies or overstatements - e.g. line 61 "Probably the most successful approved antivirals are protease inhibitors such as atazanavir for HIV-1 and simeprevir for hepatitis C. [reviewed in 10 and 11]."

      We have reworded this statement: (Page 4, Lines 61-62)

      “There is precedence for targeting the protease, as this approach has been successful in treating both HIV-1 and hepatitis C (10,11).”

      The manuscript requires editing - e.g. structure of sentences, commas, spacing (including in the abstract) etc.

      The manuscript has been re-proofed throughout (see tracked changes version of manuscript)

      What is the take home message? The statement "Taken together this work suggests previous large-scale commercial drug development initiatives targeting hepatitis C NS3/4A viral protease should be revisited because some previous lead compounds may be more potent against SARS-CoV-2 Mpro than Boceprevir and suitable for rapid repurposing." is unclear.

      The take home message of the manuscript is that HCV-targeting protease inhibitors have potential in blocking the SARS-CoV-2 protease and a more thorough analysis of the space is needed. As the reviewer pointed out, the identified hits boceprevir and narlaprevir are less potent when targeting the SARS-CoV-2 protease as compared to the HCV protease. However, we believe this work does show the potential for screening HCV-targeting protease inhibitors that may not have made it to the clinic. For instance, boceprevir or narlaprevir analogs may be even more potent against Mpro. Further, we believe that these compounds would benefit from further optimization through medicinal chemistry.

      We have expanded the discussion to incorporate issues brought up here and in point 3.

      Reviewer #1 (Significance (Required)):

      Limited. As discussed above

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The SARS-CoV-2 pandemic is causing a serious health crisis globally. There are currently no specific medicines or vaccines to contain this virus. To address this issue, the authors developed an efficient fluorescent Mpro assay system and screened ~6070 previously used drugs in this article. Several compounds with activity against SARS-CoV-2 Mpro in vitro were found. Most hits are hepatitis C NS3/4A protease inhibitors with fair IC50 values. In addition, the authors found that most compounds identified in in silico screens lack activity against Mpro in kinetic protease assays.

      These research results are well supported and reproducible. But there are two minor questions I present below:

      1. In your Mpro assay optimization process you said substrate MCA-AVLQSGFR-K(Dnp)-K-NH2 had drastically lower rates of Mpro catalyzed hydrolysis and was not considered further in your assay development. And in your Fig. 1 I saw extremely low RFU changes. But several nice inhibitors were screened using this substrate, as reported in April. Can you explain this result?

      The substrates used in our assay appear to be much more efficiently cleaved, at least with our buffer conditions and the Mpro concentrations tested. Variables including recombinant Mpro purity and activity, differences in assay buffer, and reader sensitivity may all play a role, but our best guess is that the substrate identified by Marcin Drag’s group (https://doi.org/10.1101/2020.04.29.068890) is more readily cleaved by Mpro. Although screening with other reported substrates is feasible given previous results, we believe Ac-Abu-Tle-Leu-Gln-AFC to be superior for use in high throughput screening because of its superior cleavage kinetics, yielding an improved signal to background ratio for HTS.

      To exclude inhibitors possibly acting as aggregators, a detergent-based control should be run at the same time as the IC50 measurements.

      Compound aggregation is a concern, and our assays were all run with detergent in the buffer. Our buffer composition was 20 mM Tris pH 7.8, 150 mM NaCl, 1 mM EDTA, 1 mM DTT, 0.05% Triton X-100.

      Reviewer #2 (Significance (Required)):

      Nice work, but the significance of this article is now diminishing. Most of the screened hits have been reported in the last several months. Some inhibitor complex structures have been published or released on the Protein Data Bank. The novelty is missing. I suggest the authors add more results and resubmit.

      **Referees Cross-commenting**

      I agree with the other two reviewers' comments. The significance of this work is diminishing, but it still has some interest. I think it can be published in a lower-impact journal if the authors address our suggestions.

      We concur with both reviewers that demonstration of antiviral activity would strengthen the impact of the manuscript. However, this work remains outside of the scope of feasibility at our institution. We believe that our screen and hit identification can stand on their own until further translational work can be completed.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this report, Baker et al. show that four inhibitors of hepatitis C virus (HCV) NS3/4 protease (ciluprevir, boceprevir, narlaprevir and telaprevir) are also effective inhibitors of the SARS-CoV-2 main protease (Mpro) in enzymatic assays, with lower IC50 values for narlaprevir and boceprevir (around 1 µM in their assay conditions). HCV NS3/4 inhibitors were identified after screening a library of >6,000 compounds of the Broad Institute, including approved drugs. Screening was done with fluorometric proteolytic assays.

      Experiments have been apparently well-done and results are sound. The manuscript needs editing.

      Reviewer #3 (Significance (Required)):

      Experiments have been apparently well-done and results are sound. However, this is a limited study since there are no data obtained in cell culture and a comparison of IC50 values of the selected drugs against HCV and SARS-CoV-2 proteases is missing. It is difficult to infer whether the drugs would be as effective against SARS-CoV-2 as against HCV and, if not, by how much the doses would need to increase in order to have a therapeutic effect.

      The manuscript needs editing (see below) and the Discussion is poor. The results reported by authors are not new, and a discussion of the effects of HCV inhibitors on SARS-CoV-2 replication, based on previous publications is necessary to provide the appropriate context for the study.

      Here are some references on Covid-19 and HCV inhibitors that, in my opinion, should be considered for discussion and proper citation. As correctly pointed out by Baker and co-workers, docking studies should be considered with caution, though.

      We appreciate the feedback and have now reworked and expanded the discussion to incorporate reviewer #1 and #3 comments and suggestions.

      1: Ghahremanpour MM, Tirado-Rives J, Deshmukh M, Ippolito JA, Zhang CH, de Vaca IC, Liosi ME, Anderson KS, Jorgensen WL. Identification of 14 Known Drugs as Inhibitors of the Main Protease of SARS-CoV-2. bioRxiv [Preprint]. 2020 Aug 28:2020.08.28.271957. doi: 10.1101/2020.08.28.271957. PMID: 32869018; PMCID: PMC7457600.

      2: Sacco MD, Ma C, Lagarias P, Gao A, Townsend JA, Meng X, Dube P, Zhang X, Hu Y, Kitamura N, Hurst B, Tarbet B, Marty MT, Kolocouris A, Xiang Y, Chen Y, Wang J. Structure and inhibition of the SARS-CoV-2 main protease reveals strategy for developing dual inhibitors against Mpro and cathepsin L. bioRxiv [Preprint]. 2020 Jul 27:2020.07.27.223727. doi: 10.1101/2020.07.27.223727. PMID: 32766590; PMCID: PMC7402059.

      3: Ma C, Sacco MD, Hurst B, Townsend JA, Hu Y, Szeto T, Zhang X, Tarbet B, Marty MT, Chen Y, Wang J. Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease. Cell Res. 2020 Aug;30(8):678-692. doi: 10.1038/s41422-020-0356-z. Epub 2020 Jun 15. PMID: 32541865; PMCID: PMC7294525.

      4: Ke YY, Peng TT, Yeh TK, Huang WZ, Chang SE, Wu SH, Hung HC, Hsu TA, Lee SJ, Song JS, Lin WH, Chiang TJ, Lin JH, Sytwu HK, Chen CT. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomed J. 2020 May 15:S2319-4170(20)30049-4. doi: 10.1016/j.bj.2020.05.001. Epub ahead of print. PMID: 32426387; PMCID: PMC7227517.

      5: Elzupir AO. Inhibition of SARS-CoV-2 main protease 3CLpro by means of α-ketoamide and pyridone-containing pharmaceuticals using in silico molecular docking. J Mol Struct. 2020 Dec 15;1222:128878. doi: 10.1016/j.molstruc.2020.128878. Epub 2020 Jul 10. PMID: 32834113; PMCID: PMC7347502.

      Additional computational studies:

      1: Hosseini FS, Amanlou M. Anti-HCV and anti-malaria agent, potential candidates to repurpose for coronavirus infection: Virtual screening, molecular docking, and molecular dynamics simulation study. Life Sci. 2020 Aug 8;258:118205. doi:10.1016/j.lfs.2020.118205. Epub ahead of print. PMID: 32777300; PMCID:PMC7413873.

      2: Hakmi M, Bouricha EM, Kandoussi I, Harti JE, Ibrahimi A. Repurposing of known anti-virals as potential inhibitors for SARS-CoV-2 main protease using molecular docking analysis. Bioinformation. 2020 Apr 30;16(4):301-306. doi:10.6026/97320630016301. PMID: 32773989; PMCID: PMC7392094.

      3: Chtita S, Belhassan A, Aouidate A, Belaidi S, Bouachrine M, Lakhlifi T. Discovery of Potent SARS-CoV-2 Inhibitors from Approved Antiviral Drugs via Docking Screening. Comb Chem High Throughput Screen. 2020 Jul 30. doi:10.2174/1386207323999200730205447. Epub ahead of print. PMID: 32748740.

      4: Alamri MA, Tahir Ul Qamar M, Mirza MU, Bhadane R, Alqahtani SM, Muneer I, Froeyen M, Salo-Ahen OMH. Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CLpro. J Biomol Struct Dyn. 2020 Jun 24:1-13. doi:10.1080/07391102.2020.1782768. Epub ahead of print. PMID: 32579061; PMCID:PMC7332866.

      5: Bafna K, Krug RM, Montelione GT. Structural Similarity of SARS-CoV2 Mpro and HCV NS3/4A Proteases Suggests New Approaches for Identifying Existing Drugs Useful as COVID-19 Therapeutics. ChemRxiv [Preprint]. 2020 Apr 21. doi: 10.26434/chemrxiv.12153615. PMID: 32511291; PMCID: PMC7263768.

      6: Eleftheriou P, Amanatidou D, Petrou A, Geronikaki A. In Silico Evaluation of the Effectivity of Approved Protease Inhibitors against the Main Protease of the Novel SARS-CoV-2 Virus. Molecules. 2020 May 29;25(11):2529. doi:10.3390/molecules25112529. PMID: 32485894; PMCID: PMC7321236.

      7: Wang J. Fast Identification of Possible Drug Treatment of Coronavirus Disease-19 (COVID-19) through Computational Drug Repurposing Study. J Chem Inf Model. 2020 Jun 22;60(6):3277-3286. doi: 10.1021/acs.jcim.0c00179. Epub 2020 May 4. PMID: 32315171; PMCID: PMC7197972.

      8: Chen YW, Yiu CB, Wong KY. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL pro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Res. 2020 Feb 21;9:129. doi: 10.12688/f1000research.22457.2. PMID: 32194944; PMCID: PMC7062204.

      Minor comments:

      We appreciate the time that the reviewer has taken to address grammatical changes and have addressed each throughout the manuscript with tracked changes.

      p.2, line 26: > appears as an attractive

      Manuscript edited

      p.2, line 27: > we show that the existing

      Manuscript edited

      p.2, line 33: > separate numbers and units, eg. 1.10 µM (this is a persisting error that should be corrected throughout the whole ms)

      Manuscript edited

      p.4, line 44: SARS virus should be referred to as SARS-CoV-1 throughout the whole manuscript. MERS-CoV is the name of the virus causing MERS

      Manuscript edited

      p.4, lines 61-62: > the selection of the specific compounds seems to be arbitrary... why atazanavir and not darunavir or other? The sentence should be rewritten.

      Rewritten as: “There is precedence for targeting the protease, as this approach has been successful in treating both HIV-1 and hepatitis C.”

      p.6, line 100: Citing Fig. 2B before completing the description of Fig. 1 is distracting. Authors should think of a better way to describe their results.

      This was a mistake and should have cited Fig 1B. Thank you for catching this.

      p.7, line 116: It is not clear what "10m-20,810" means

      This has been clarified to state: “ΔRFU at 10 minutes = 20,810 relative fluorescence units”

      p.7, lines 125-126: These sentences belong to an introduction, not appropriate in results section.

      We have removed these sentences.

      Figure 2. Part A is not necessary in results (ok for introduction). Black and purple dots in part B is not a good choice since they are difficult to distinguish, maybe orange and black is better.

      We have removed panel A, expanded the size of panel B and changed the color.

      Table 1: Status should be explained in a footnote (i.e the distinction between launched, P2/P3, phase 2, preclinical is not clear).

      The one compound indicated in P2/P3 development is now Phase 3 and the table has been updated. We have added a footnote:

      *Launched = compound approved for humans, though may only be approved for veterinary use in some countries

      Discussion. I think that subheadings are not necessary.

      Subheadings have been removed from the discussion.

      **Referees cross-commenting** I agree with reviewer no. 1 on the limited interest of the study. However, it could be published in a specialized lower-impact journal after addressing issues raised by reviewers 2 and 3 (likely to be completed in less than a month)

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In this report, Baker et al. show that four inhibitors of hepatitis C virus (HCV) NS3/4 protease (ciluprevir, boceprevir, narlaprevir and telaprevir) are also effective inhibitors of the SARS-CoV-2 main protease (Mpro) in enzymatic assays, with lower IC50 values for narlaprevir and boceprevir (around 1 µM in their assay conditions). HCV NS3/4 inhibitors were identified after screening a library of >6,000 compounds of the Broad Institute, including approved drugs. Screening was done with fluorometric proteolytic assays.

      Experiments have been apparently well-done and results are sound. The manuscript needs editing.

      Significance

      Experiments have been apparently well-done and results are sound. However, this is a limited study since there are no data obtained in cell culture and a comparison of IC50 values of the selected drugs against HCV and SARS-CoV-2 proteases is missing. It is difficult to infer whether the drugs would be as effective against SARS-CoV-2 as against HCV and, if not, by how much the doses would need to increase in order to have a therapeutic effect. The manuscript needs editing (see below) and the Discussion is poor. The results reported by authors are not new, and a discussion of the effects of HCV inhibitors on SARS-CoV-2 replication, based on previous publications, is necessary to provide the appropriate context for the study. Here are some references on Covid-19 and HCV inhibitors that, in my opinion, should be considered for discussion and proper citation. As correctly pointed out by Baker and co-workers, docking studies should be considered with caution, though.

      1: Ghahremanpour MM, Tirado-Rives J, Deshmukh M, Ippolito JA, Zhang CH, de Vaca IC, Liosi ME, Anderson KS, Jorgensen WL. Identification of 14 Known Drugs as Inhibitors of the Main Protease of SARS-CoV-2. bioRxiv [Preprint]. 2020 Aug 28:2020.08.28.271957. doi: 10.1101/2020.08.28.271957. PMID: 32869018; PMCID: PMC7457600.

      2: Sacco MD, Ma C, Lagarias P, Gao A, Townsend JA, Meng X, Dube P, Zhang X, Hu Y, Kitamura N, Hurst B, Tarbet B, Marty MT, Kolocouris A, Xiang Y, Chen Y, Wang J. Structure and inhibition of the SARS-CoV-2 main protease reveals strategy for developing dual inhibitors against Mpro and cathepsin L. bioRxiv [Preprint]. 2020 Jul 27:2020.07.27.223727. doi: 10.1101/2020.07.27.223727. PMID: 32766590; PMCID: PMC7402059.

      3: Ma C, Sacco MD, Hurst B, Townsend JA, Hu Y, Szeto T, Zhang X, Tarbet B, Marty MT, Chen Y, Wang J. Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease. Cell Res. 2020 Aug;30(8):678-692. doi: 10.1038/s41422-020-0356-z. Epub 2020 Jun 15. PMID: 32541865; PMCID: PMC7294525.

      4: Ke YY, Peng TT, Yeh TK, Huang WZ, Chang SE, Wu SH, Hung HC, Hsu TA, Lee SJ, Song JS, Lin WH, Chiang TJ, Lin JH, Sytwu HK, Chen CT. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomed J. 2020 May 15:S2319-4170(20)30049-4. doi: 10.1016/j.bj.2020.05.001. Epub ahead of print. PMID: 32426387; PMCID: PMC7227517.

      5: Elzupir AO. Inhibition of SARS-CoV-2 main protease 3CLpro by means of α-ketoamide and pyridone-containing pharmaceuticals using in silico molecular docking. J Mol Struct. 2020 Dec 15;1222:128878. doi: 10.1016/j.molstruc.2020.128878. Epub 2020 Jul 10. PMID: 32834113; PMCID: PMC7347502.

      Additional computational studies:

      1: Hosseini FS, Amanlou M. Anti-HCV and anti-malaria agent, potential candidates to repurpose for coronavirus infection: Virtual screening, molecular docking, and molecular dynamics simulation study. Life Sci. 2020 Aug 8;258:118205. doi:10.1016/j.lfs.2020.118205. Epub ahead of print. PMID: 32777300; PMCID:PMC7413873.

      2: Hakmi M, Bouricha EM, Kandoussi I, Harti JE, Ibrahimi A. Repurposing of known anti-virals as potential inhibitors for SARS-CoV-2 main protease using molecular docking analysis. Bioinformation. 2020 Apr 30;16(4):301-306. doi:10.6026/97320630016301. PMID: 32773989; PMCID: PMC7392094.

      3: Chtita S, Belhassan A, Aouidate A, Belaidi S, Bouachrine M, Lakhlifi T. Discovery of Potent SARS-CoV-2 Inhibitors from Approved Antiviral Drugs via Docking Screening. Comb Chem High Throughput Screen. 2020 Jul 30. doi:10.2174/1386207323999200730205447. Epub ahead of print. PMID: 32748740.

      4: Alamri MA, Tahir Ul Qamar M, Mirza MU, Bhadane R, Alqahtani SM, Muneer I, Froeyen M, Salo-Ahen OMH. Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CLpro. J Biomol Struct Dyn. 2020 Jun 24:1-13. doi:10.1080/07391102.2020.1782768. Epub ahead of print. PMID: 32579061; PMCID:PMC7332866.

      5: Bafna K, Krug RM, Montelione GT. Structural Similarity of SARS-CoV2 Mpro and HCV NS3/4A Proteases Suggests New Approaches for Identifying Existing Drugs Useful as COVID-19 Therapeutics. ChemRxiv [Preprint]. 2020 Apr 21. doi: 10.26434/chemrxiv.12153615. PMID: 32511291; PMCID: PMC7263768.

      6: Eleftheriou P, Amanatidou D, Petrou A, Geronikaki A. In Silico Evaluation of the Effectivity of Approved Protease Inhibitors against the Main Protease of the Novel SARS-CoV-2 Virus. Molecules. 2020 May 29;25(11):2529. doi:10.3390/molecules25112529. PMID: 32485894; PMCID: PMC7321236.

      7: Wang J. Fast Identification of Possible Drug Treatment of Coronavirus Disease-19 (COVID-19) through Computational Drug Repurposing Study. J Chem Inf Model. 2020 Jun 22;60(6):3277-3286. doi: 10.1021/acs.jcim.0c00179. Epub 2020 May 4. PMID: 32315171; PMCID: PMC7197972.

      8: Chen YW, Yiu CB, Wong KY. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL pro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Res. 2020 Feb 21;9:129. doi: 10.12688/f1000research.22457.2. PMID: 32194944; PMCID: PMC7062204.

      Minor comments:

      p.2, line 26: > appears as an attractive

      p.2, line 27: > we show that the existing

      p.2, line 33: > separate numbers and units, eg. 1.10 µM (this is a persisting error that should be corrected throughout the whole ms)

      p.4, line 44: SARS virus should be referred to as SARS-CoV-1 throughout the whole manuscript. MERS-CoV is the name of the virus causing MERS

      p.4, lines 61-62: > the selection of the specific compounds seems to be arbitrary... why atazanavir and not darunavir or other? The sentence should be rewritten.

      p.6, line 100: Citing Fig. 2B before completing the description of Fig. 1 is distracting. Authors should think of a better way to describe their results.

      p.7, line 116: It is not clear what "10m-20,810" means

      p.7, lines 125-126: These sentences belong to an introduction, not appropriate in results section.

      Figure 2. Part A is not necessary in results (ok for introduction). Black and purple dots in part B is not a good choice since they are difficult to distinguish, maybe orange and black is better.

      Table 1: Status should be explained in a footnote (i.e the distinction between launched, P2/P3, phase 2, preclinical is not clear).

      Discussion. I think that subheadings are not necessary.

      Referees cross-commenting

      I agree with reviewer no. 1 on the limited interest of the study. However, it could be published in a specialized lower-impact journal after addressing issues raised by reviewers 2 and 3 (likely to be completed in less than a month)

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The SARS-CoV-2 pandemic is causing a serious global health crisis. There are currently no specific medicines or vaccines to contain this virus. To address this issue, the authors developed an efficient fluorescent Mpro assay system and screened ~6070 previously used drugs in this article. Several compounds with activity against SARS-CoV-2 Mpro in vitro were found. Most hits are hepatitis C NS3/4A protease inhibitors with fair IC50 values. In addition, the authors found that most compounds identified in in silico screens lack activity against Mpro in kinetic protease assays.

      These results are well supported and reproducible, but I have two minor questions:

      1. In your Mpro assay optimization, you state that the substrate MCA-AVLQSGFR-K(Dnp)-K-NH2 had drastically lower rates of Mpro-catalyzed hydrolysis and was not considered further in your assay development, and in Fig. 1 I saw extremely low RFU changes. However, several good inhibitors were screened using this substrate, as reported in April. Can you explain this result?

      2. To exclude inhibitors that may act as aggregators, a detergent-based control should be run alongside the IC50 measurements.

      Significance

      Nice work, but the significance of this article is now diminishing. Most of the screened hits have been reported in the last several months, and some inhibitor complex structures have been published or released in the Protein Data Bank. The novelty is missing. I suggest the authors add more results and resubmit.

      Referees Cross-commenting

      I agree with the other two reviewers' comments. The significance of this work is diminishing, but it is still of some interest. I think it can be published in a lower-impact journal if the authors address our suggestions.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Baker et al. report the screening of a collection of ~6,070 drugs for their inhibitory activity against the enzymatic activity of the SARS-CoV-2 Mpro protein in vitro using two peptide substrates. 50 compounds with activity against Mpro were identified and tested for their dose-dependent effect in the same assay. Several hits were identified, among which are approved drugs that target the HCV protease. Indeed, there is an urgent need for effective drugs for SARS-CoV-2 infection, and high-throughput screens can discover novel candidates. However, the novelty of this work is quite limited, as former screens have been published with the same target using the same substrates. Moreover, as discussed below, the translational impact of the hits discussed is also quite limited, particularly in the absence of antiviral data. Lastly, there are several overstatements in the write-up, and it will require major editing.

      Major comments:

      1. Were there any positive controls previously shown to potently inhibit the SARS-CoV-2 Mpro included in the screen (e.g. ebselen)? How did these perform in this assay?

      2. It would be helpful if the authors provided information on the 50 hits from prior screens conducted with this library of compounds: how promiscuous are they across screens? How toxic are they in cell-based assays?

      3. The translational potential of the findings appears to be limited. The calculated IC50s for these drugs in the Mpro assay are very high (10-1000 fold higher) relative to their IC50s in an enzymatic assay involving the HCV protease: Boceprevir (IC50 = 0.95 μM vs. 0.084 μM in HCV), Ciluprevir (IC50 = 20.77 μM vs. 0.0087 in HCV), and Telaprevir (IC50 = 15.25 μM vs. 0.050 μM in HCV) (https://aac.asm.org/content/aac/57/12/6236.full.pdf). In the absence of antiviral data, the main statement of the manuscript, that "the work presented here supports the rapid evaluation of previous HCV NS3/4A inhibitors for repurposing as a COVID-19 therapy", is thus an overstatement. Even if there is some activity, it is likely to be limited, as with the HIV protease inhibitors, so its chance of eliciting a meaningful clinical effect is low. Moreover, when used in monotherapy, some of these protease inhibitors have a very low genetic barrier to resistance.

      4. There are additional inaccurate statements or overstatements, e.g. line 61: "Probably the most successful approved antivirals are protease inhibitors such as atazanavir for HIV-1 and simeprevir for hepatitis C. [reviewed in 10 and 11]."

      5. The manuscript requires editing, e.g. structure of sentences, commas, spacing (including in the abstract) etc.

      6. What is the take-home message? The statement "Taken together this work suggests previous large-scale commercial drug development initiatives targeting hepatitis C NS3/4A viral protease should be revisited because some previous lead compounds may be more potent against SARS-CoV-2 Mpro than Boceprevir and suitable for rapid repurposing." is unclear.
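
      The fold differences referred to in comment 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the IC50 values exactly as quoted in that comment and assumes the Ciluprevir HCV value (0.0087, given without a unit) is also in μM. The dictionary and function names are hypothetical.

```python
# IC50 values as quoted in comment 3: (SARS-CoV-2 Mpro assay, HCV protease assay).
# Assumption: all values are in micromolar, including the Ciluprevir HCV value,
# which is quoted without a unit.
ic50_um = {
    "Boceprevir": (0.95, 0.084),
    "Ciluprevir": (20.77, 0.0087),
    "Telaprevir": (15.25, 0.050),
}


def fold_difference(mpro_ic50, hcv_ic50):
    """Return how many times higher the Mpro IC50 is than the HCV IC50."""
    return mpro_ic50 / hcv_ic50


fold = {drug: fold_difference(*values) for drug, values in ic50_um.items()}

for drug, value in fold.items():
    print(f"{drug}: about {value:.0f}-fold higher IC50 against Mpro")
```

      Under that unit assumption, the ratios range from roughly 11-fold (Boceprevir) to roughly 2,400-fold (Ciluprevir).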

      Significance

      Limited. As discussed above

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewers and Revision Plan

      We thank all three reviewers for their time and their comments on our manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Here Ryan et al. have used localization analysis following induced rapid relocalization of endogenous proteins to investigate the composition and recruitment hierarchy of a clathrin-TACC3-based spindle complex that is important for microtubule organization and stability.

      The authors generate different HeLa cell lines, each with one of four complex members (TACC3, CLTA, chTOG and GTSE1) endogenously tagged with FKBP-GFP via Cas9-mediated editing. This tag allows rapid recruitment to the mitochondria upon rapamycin addition ("knocksideways"). They ultimately quantify each of the 4 components' localization to the spindle following knocksideways of each component using fluorescently-tagged transfected constructs. The authors' interpretation of the results of this analysis are summarized in the last model figure, in which a core MT-binding complex of clathrin and TACC3 recruit the ancillary components GTSE1 and chTOG. In addition, the authors investigate the contribution of individual clathrin-binding LIDL motifs in GTSE1 to the recruitment of clathrin and GTSE1 to spindles. Their findings here largely agree with and confirm a recent report regarding the contribution of these motifs to GTSE1 recruitment to the spindle. They further analyzed GTSE1 fragments for interphase and mitotic microtubule localization, and identified a second region of GTSE1 required (but not sufficient) for spindle localization. Finally, the authors report that PIK3C2A is not part of this complex, contradicting (correcting) a previously published study.

      **Major comments:**

      1. The chTOG-FKBP-GFP cell line the authors generate has only a small fraction of chTOG tagged, and thus should not be used for any conclusions about protein localization dependency on chTOG. Because they were unable to construct a HeLa cell line with all copies tagged, the authors expect that the homozygous knock-in of chTOG-FKBP-GFP is lethal, and thus their experience is appropriate to report. However, the authors should not use this cell line alone to make statements about chTOG dependency. They would have to use similar localization analysis, but after another method to disrupt chTOG (as a second-best approach), such as RNAi. In fact, they have reported this in a previous publication (Booth et al 2011). However, the result was different. There, loss of chTOG resulted in reduced clathrin on spindles, suggesting it may stabilize or help recruit the complex. Alternatively, they could remove their chTOG data, but this would compromise the "comprehensive" nature of the work.

      The referee is correct. The point here is to show the results we had using this approach for all four proteins under study. For this reason, we do not want to remove this data and prefer to show our results “warts-and-all”. We feel that the shortcomings of our approach are honestly presented and discussed in the manuscript. While only a fraction of chTOG was tagged, we should expect some co-removal after its induced mislocalization. Since we saw no change, we concluded that chTOG is auxiliary.

      The “second best” approach suggested (RNAi of chTOG) is problematic for two reasons. First, chTOG RNAi results in gross changes to spindle structure (multipolar spindles) and it is difficult to pick apart differences in protein partner localization that result from loss of chTOG from those resulting from changes in spindle structure. Second, the paper is about induced mislocalization as a method for determining protein complexes once a normal spindle has formed. So, removing chTOG prior to mitosis is not comparable. If we get the same or different result, does it confirm or conflict with the data we have? Nonetheless, given the discrepancy with our earlier work, we should investigate this further.

      To address this concern, we will stain endogenous clathrin, TACC3 and GTSE1 following chTOG RNAi and measure their relative levels at the spindle.

      Making the chTOG-FKBP-GFP cell line was difficult. As described in the paper, we only recovered heterozygous clones despite repeated attempts. Since submission, we have been made aware of a HCT116 chTOG-FKBP-GFP cell line that is reported to be homozygously tagged (Cherry et al. 2019 doi: 10.1002/glia.23628).

      A note about this cell line has been added to the paper (Results section, final sentence of 1st paragraph).

      2. The authors initially analyze complex member localization after knocksideways experiments by antibody staining, which has the advantage of analyzing endogenous proteins (versus the later transfected fluorescent constructs). Setting aside potential artefacts from fixation, this would seem to be a better method for controlled analysis to take advantage of their setup (short of generating stable cell lines with second proteins endogenously tagged in a second color - a huge undertaking). The authors conclude that antibody specificity problems confounded their analysis and explained unusual results. However, I think it is worth investing a little more effort to sort this out, rather than bringing doubt to the whole data set. Verifying and then using another antibody for chTOG localization would be informative. Of course, the negative control should not be their chTOG-FKBP-GFP line, as it does not relocalize most of chTOG.

      In the case of GTSE1, an alternative explanation to antibody specificity issues would be that the GTSE1-FKBP-GFP cell line is not in fact homozygously tagged. Given the low expression levels on the western provided, and the detection of GTSE1 on the spindle in the induced GTSE1-FKBP-GFP cell line (but not TACC3-FKBP-GFP), it seems plausible that an untagged copy remains. If there are multiple copies of GTSE1 in HeLa cells, one untagged copy could represent a small fraction of total GTSE1. This should thus be ruled out. GTSE1 clones should be analyzed with more protein extracts loaded - dilutions of the extracts can determine the sensitivity of the blot to lower protein levels. In addition, sequencing of genomic DNA can reveal a small percentage with different reads.

      We used a two-pronged approach for assessing relocalization of protein partners (staining vs transfected constructs). The staining approach is superior since endogenous proteins are examined, but it is limited by antibody specificity. The transfection approach overcomes this limitation but is in turn limited by effects of overexpression and tagging. Together the two approaches allow us, and anyone employing this method, to get a picture of protein complexes. We didn’t want to create the impression that one or other approach is confounded, but the referee is correct that this analysis would benefit from further work.

      Specifically, to address these concerns:

      • We will verify and use alternative chTOG antibodies to try to improve this dataset.
      • We will test the possibility that an untagged allele of GTSE1 remains. We will use western blotting and a summary of our genomic analysis will be added to the paper.

      3. There is a lot of data contained in the small graphs summarizing quantification of localization in Figs 3 and 4. They would be more accessible to the reader if they were larger and/or an "example" of the chart with labels was present explaining it (essentially what is in the figure legends). Furthermore, as far as I can see, no statistical test is applied to these data. This is needed. How do the authors determine whether there is an "effect"?

      Our aim was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern which tells us that we went too far towards “data visualization”.

      To address this point, we will rework these figures.

      **Minor issues:**

      1. The GTSE1 constructs used for mutation and localization analysis are 720 amino acids long. A recent study analyzing similar mutations uses a 739 amino acid construct (Rondelet et al 2020). The latter is the predominant transcript in NCBI and Ensembl databases. It appears the construct used by the authors omits the first 19 a.a. I do not think using the truncated transcript affects the conclusions of the manuscript, but it could generate confusion when identifying residues based on the a.a. numbers of mutant constructs (Fig 6). This should be clarified.

      We were aware of the longer transcript but were using the 720 residue form since it is the canonical sequence in Uniprot (https://www.uniprot.org/uniprot/Q9NYZ3). We did not know that the 739 form is the predominant transcript. We agree this is unlikely to affect our work but that the numbering may cause confusion.

      We have added a note to the Methods (Molecular Biology section) to accurately describe what we and Rondelet et al. have used.

      2. The labeling of constructs in Fig 6C/D is confusing, and appears shifted by eye in places. Please relabel this more clearly.

      Apologies for the error.

      We have relabeled Figure 6C,D and also made a similar alteration to Figure 5C.

      The recommended new experimental data (analysis of complex member levels on spindles after full perturbation of spindle chTOG; new chTOG antibody stainings in the FKBP lines; reanalysis of GTSE1 DNA/protein in the GTSE1-FKBP line) should only require a new antibody/siRNA, plus a few weeks' time to repeat the analyses already in the paper with new reagents.

      Reviewer #1 (Significance (Required)):

      While multiple individual components of this complex have been previously characterized, the structure and nature of the complex formation and its recruitment to microtubules/spindles remains a complex problem that has yet to be solved.

      Overall this study represents a comprehensive localization-dependency analysis of the Clathrin-TACC3 based spindle complex using a consistent methodology. Although several of the conclusions of the findings echo previous reports, some of the previous literature is contradictory within itself as well as with the conclusions here. Analyzing all components with a single, rapid-perturbation technique thus has great value to present a clear data set, given that the experimental setup conditions and analysis are solid (a goal to which the majority of comments refer).

      Beyond the complex localization/recruitment analysis, two novel findings of this study that emerge are:

      a) GTSE1 contains a second, separate protein region, distinct from the clathrin-binding motifs, that is required for its localization to the spindle and is most likely a microtubule-interaction site. This suggests that GTSE1 recruitment to the spindle is more complex than previously reported.

      b) PIK3C2A, which has been reported previously to be a stabilizing member of this complex, is in fact not a member, does not localize to spindles, and does not display a mitotic defect after loss. This is an important conclusion, as it would correct the literature and avoid future confusion.

      --

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this paper, the authors investigate the nature of interactions between members of the TACC3-chTOG-clathrin-GTSE1 complex on the mitotic spindle. By using a series of HeLa cell lines that they have created by CRISPR/Cas9 editing to enable spatial manipulation (knocksideways) of TACC3, chTOG, clathrin or GTSE1, they show that on spindle microtubules TACC3 and clathrin represent core complex members whereas chTOG and GTSE1 bind to them respectively but not to each other. Additionally, the authors find that the protein PIK3C2A, which has been implicated in this complex previously, is in fact not a component of this complex in mitotic cells. The main advance of the paper in my opinion is the endogenous tagging of the proteins for knocksideways experiments, since former experiments depended on RNAi silencing and expression of tagged proteins from plasmids, which introduced issues of protein silencing efficiency and plasmid overexpression problems. This approach seems to alleviate these problems, except in the case of chTOG, for which homozygous tagging seems to be lethal.

      **Major comments:**

      I find the key conclusions regarding the localization of the components of the complex convincing. There are some issues regarding the specificity of antibodies in immunostaining experiments (Fig 3) and the influence of mCherry-TACC3 expression on distorted localization of the complex prior to knocksideways. However, I think the general conclusion about which complex components (clathrin and TACC3) influence the localization of the other proteins in the complex (chTOG and GTSE1) stands. One thing that I miss from the paper is data on the consequences for spindle shape and morphology after knocksideways. I have noticed in images in both Figure 3 and Figure 4 that in some cases the distribution of the signal seems to influence the spindle morphology quite a bit. Also, in Figure 3 I have noticed what seems to me quite a large variation in spindle size in the tubulin signal in both untreated and rapamycin-treated cells. Since the authors have many of these images already, I believe it would be realistic, not costly and of additional value for the paper to provide more data on the consequences of the knocksideways experiments. Changes in spindle size, tubulin intensity and DNA/kinetochore misalignment upon knocksideways would help readers appreciate the findings of the paper, all the more so since the authors on more than one occasion find their motivation in the field of cancer research and the relation of spindle stability to it. Some data connecting to this motivation would be of value. Experiments seem reproducible.

      The focus of the paper is on using the knocksideways methodology to understand a protein complex during mitosis, rather than looking at its function. We are not keen to do new experiments that are not part of the central message of the paper. However, the Reviewer is correct that we do already have a dataset that can be mined in the manner described.

      To address this point, we will analyze spindle size parameters and also the intensity of tubulin. Our analysis will be limited to the short timeframe of our experiments, but it should reveal or refute any changes in spindle structure that may result from loss of complex members.

      **Minor comments:**

      I have some problems with the clarity of Figures 3 and 4. In Figure 3, the plots on the right are a bit small and not easy to read; some reorganization of the figure might be beneficial. In Figure 4, the plots on the right are also too small to be clear. Also, the number of cells (n) is missing, and I can't count the individual arrows because of the size of the graphs.

      Our aim was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern which tells us that we went too far towards “data visualization”.

      To address this point, we will rework these figures.

      Reviewer #2 (Significance (Required)):

      I find that the biggest significance of the paper is in the creation of new tools (cell lines) to study the localization of the proteins TACC3, chTOG, clathrin and GTSE1. Cell lines where endogenous proteins can be delocalized rapidly will be of value for scientists working not only on mitosis but also, in the case of clathrin, on vesicle formation and trafficking, or, in the case of GTSE1, on p53-dependent apoptosis. In the field of mitosis, it will surely help and speed up research on the role of these proteins in spindle assembly and stability.

      Field of expertise: mitotic spindle

      --

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This paper analyses the chTOG/TACC3/clathrin/GTSE1 complex that crosslinks and stabilises microtubule bundles in the mitotic spindle. The authors have developed an elegant knocksideways approach to specifically analyse the effects of removing individual components of the complex from the spindle and study the effect this has on the other interactors. They report, based on these assays, that the core of the complex is formed by TACC3 and clathrin while GTSE1 and chTOG are auxiliary interactors. They also refute previous evidence that this complex also incorporates PIK3C2A. Overall, this is an interesting study that distinguishes itself predominantly by its methodology. However, some of the reported results need more thorough analysis to allow convincing conclusions.

      **Major comments:**

      1) The knocksideways method is the main highlight of this paper. Unlike previous studies by the PI, this time endogenous genes are tagged, which is a key advance and allows much better interpretation of the results. I am not sure why the authors have chosen HeLa cells as their model here, given the messed up genome of these cells. A non-transformed cell line would have been preferable, but as a proof of principle study, I think HeLa are acceptable, and I wouldn't expect the authors to repeat all the experiments in another system.

      Figures 1, 2 and S1 describe and validate this approach in some detail, but this will require some more work.

      The authors state that gene targeting was validated using a combination of PCR, sequencing, Western blotting, but show only the results for westerns. PCR analysis that demonstrates homozygous or heterozygous gene targeting should be shown here.

      Another issue is the penetrance of the phenotypes induced by Rapamycin. The authors show nice data of the system working in individual cells but do not give us an idea if this happens in all cells. The localisation of the individual tagged genes should be quantified (ideally with line plots) in 50 randomly chosen mitotic cells with 3 repeats before and after rapamycin treatment. Moreover, the analysis of mitotic duration (Figure S1D) should be extended to include a plus Rapamycin cohort and this should be moved to the main Figure.

      If the system works only in a small proportion of cells, this should be clearly stated. I don't think this would prevent publication, but it is an important piece of information that is missing.

      The Reviewer raises two issues here.

      • PCR analysis should be shown. This issue was also partly raised by Reviewer 1. A summary of our PCR analysis was actually included in Table 1, since the analysis we did is pretty unwieldy. We agree though that presenting our evidence for homozygosity of the cell lines would be useful. To address this point, we will add more detail of the PCR and sequencing work done to validate these cell lines.
      • Does knocksideways happen in all cells? The answer to this depends on the transient expression of MitoTrap and sufficient application of rapamycin. We agree that this will be a useful piece of information to add to the manuscript. A related issue is whether knocksideways of complex members affects mitotic progression. We have established through other experiments that rapamycin application to wild-type cells alters mitotic progression, although application of Rapalog does not have this effect. Our plan to address these points is 1) to analyze the efficacy of knocksideways that readers can expect to achieve using these, or similar cells, and 2) analyze mitotic duration in rapalog-treated cells expressing a rapalog sensitive MitoTrap.

      2) Apart from a simple quantification of mitotic duration, I believe a more detailed mitotic phenotype analysis for each knocksideways gene, especially the homozygous targeted clones, should be included. This can involve more high-resolution live cell imaging of mitotic progression with SiR-DNA and GFP-tubulin, using the dark mitotrap.

      We don’t agree that such an analysis should be included. The focus of this paper is on using the knocksideways methodology to understand a protein complex during mitosis, and not looking at its function. There are several papers on the mitotic phenotypes of these genes probed using RNAi in different cellular systems (examples for chTOG: 10.1101/gad.245603; TACC3/clathrin: 10.1038/emboj.2011.15, 10.1242/jcs.075911, 10.1083/jcb.200911091, 10.1083/jcb.200911120; GTSE1: 10.1083/jcb.201606081). Moreover, our 2013 paper used knocksideways (with RNAi and overexpression) and has a detailed analysis of mitotic progression, microtubule stability, checkpoint activity and kinetochore motions (Cheeseman et al., 2013 doi: 10.1242/jcs.124834).

      New experiments that are not part of the central message of the paper and are unlikely to give new insight are not the best use of our revision efforts for this paper (especially during the pandemic). Having said this, Reviewer 2’s suggestion to use our existing dataset to investigate mitotic phenotypes, will largely answer Reviewer 3’s request.

      We will analyze spindle size parameters and also the intensity of tubulin. Our analysis will be limited to the short timeframe of our experiments, but it should reveal or refute any changes in spindle structure that result from the loss of complex members.

      3) Overall, the quantitative analysis in Figures 3, 4 and 7 is not good enough and sometimes doesn't fully support the conclusions. In Figures 3 and 4, a convoluted way of demonstrating the change in localisation is shown and this panel is so small that it is almost impossible to read. Also, there is no statistical analysis, and the sample size seems very small. At least 25 cells should be analysed here in 3 repeats. I would suggest unifying the quantification in the MS and using the line plots shown in Figures 5 and 6 to compare each protein before and after rapamycin addition. This is much easier to read and more convincing. The images of the cell panels can be moved to a supplement as they contain very little information. This would generate space to expand the size and depth of the quantitative analysis. Instead of ANOVA tests, I would recommend using a simple t-test comparing each condition to its relevant control since this is the only relevant comparison in the experiment. Statistical significance should be calculated for each experiment with sufficient sample size. It would also be better to show the individual data points from the three repeats in different colours so that the reproducibility between repeats can be judged.

      This type of statistical analysis should be uniformly done throughout the MS and also extended to Figure 7.

      The referee raises several issues here with our data presentation and statistical analysis.

      • Our aim in Figures 3 and 4 was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern about these figures, which tells us that we went too far towards “data visualization”. To address this point, we will rework Figures 3 and 4 to provide clearer data presentation.
      • The Reviewer’s comments about statistical analysis, however, are not sound. First, it is incorrect to state that simple t-tests can be applied (this is a form of p-hacking). Correction for multiple testing must be done on these datasets. Second, the reviewer arbitrarily states numbers for cells and experimental repeats without considering the effect size or, it seems, understanding the structure of the data that we have collected. Sample sizes are small but they are taken from many independent replicates. Third, and related to the previous point, the fixed and live cell data are structured differently, which means that a uniform data presentation is not possible. The live data has a paired design and each cell is an independent replicate (with replicates done over several trials). The fixed data is unpaired and we have taken measures from several experiments (independent replicates). The point about applying statistical tests to the data is also made by Reviewer 1 and we will use appropriate tests (NHST or estimation statistics) as we re-work the figures.
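
      As an illustration of why uncorrected per-condition tests can mislead, the following is a minimal sketch of one common correction for multiple testing, the Holm-Bonferroni step-down procedure. It is not necessarily the correction the authors will apply; the function name and the p-values are hypothetical and chosen only to show the effect.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four condition-vs-control comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = holm_adjust(raw)

alpha = 0.05
significant_raw = sum(p < alpha for p in raw)
significant_adj = sum(p < alpha for p in adj)
print(adj, significant_raw, significant_adj)
```

      With these made-up values, three comparisons fall below 0.05 before correction but only one does afterwards, which is the kind of inflation that correction for multiple testing guards against.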

      Reviewer #3 (Significance (Required)):

      In my opinion, the most interesting aspect of the MS is the methodology. Based on this, publication is justified and will be of interest to a wider audience. That is why a more detailed analysis of the penetrance of this manipulation across the cell population will be critical.

      The application of this method to analyse the composition of the TACC3/Clathrin complex on the spindle is the main biological advance, and the novel information is rather limited but not unimportant.

      Overall, if these results can be properly quantified I would recommend publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This paper analyses the chTOG/TACC3/clathrin/GTSE1 complex that crosslinks and stabilises microtubule bundles in the mitotic spindle. The authors have developed an elegant knocksideways approach to specifically analyse the effects of removing individual components of the complex from the spindle and study the effect this has on the other interactors. They report, based on these assays, that the core of the complex is formed by TACC3 and clathrin while GTSE1 and chTOG are auxiliary interactors. They also refute previous evidence that this complex also incorporates PIK3C2A. Overall, this is an interesting study that distinguishes itself predominantly by its methodology. However, some of the reported results need more thorough analysis to allow convincing conclusions.

      Major comments:

      1) The knocksideways method is the main highlight of this paper. Unlike previous studies by the PI, this time endogenous genes are tagged, which is a key advance and allows much better interpretation of the results. I am not sure why the authors have chosen HeLa cells as their model here, given the messed up genome of these cells. A non-transformed cell line would have been preferable, but as a proof of principle study, I think HeLa are acceptable, and I wouldn't expect the authors to repeat all the experiments in another system. Figures 1, 2 and S1 describe and validate this approach in some detail, but this will require some more work. The authors state that gene targeting was validated using a combination of PCR, sequencing and Western blotting, but show only the results for Westerns. PCR analysis that demonstrates homozygous or heterozygous gene targeting should be shown here. Another issue is the penetrance of the phenotypes induced by rapamycin. The authors show nice data of the system working in individual cells but do not give us an idea if this happens in all cells. The localisation of the individual tagged genes should be quantified (ideally with line plots) in 50 randomly chosen mitotic cells with 3 repeats before and after rapamycin treatment. Moreover, the analysis of mitotic duration (Figure S1D) should be extended to include a plus-rapamycin cohort and this should be moved into the main figure. If the system works only in a small proportion of cells, this should be clearly stated. I don't think this would prevent publication, but it is an important piece of information that is missing.

      2) Apart from a simple quantification of mitotic duration, I believe a more detailed mitotic phenotype analysis for each knocksideways gene, especially the homozygous targeted clones, should be included. This can involve more high-resolution live cell imaging of mitotic progression with SiR-DNA and GFP-tubulin, using the dark mitotrap.

      3) Overall, the quantitative analysis in Figures 3, 4 and 7 is not good enough and sometimes doesn't fully support the conclusions. In Figures 3 and 4 a convoluted way of demonstrating the change in localisation is shown, and this panel is so small that it is almost impossible to read. Also, there is no statistical analysis, and the sample size seems very small. At least 25 cells should be analysed here in 3 repeats. I would suggest unifying the quantification in the MS and using the line plots shown in Figures 5 and 6 to compare each protein before and after rapamycin addition. This is much easier to read and more convincing. The images of the cells can be moved to a supplement as they contain very little information. This would generate space to expand the size and depth of the quantitative analysis. Instead of ANOVA tests, I would recommend using a simple t-test comparing each condition to its relevant control, since this is the only relevant comparison in the experiment. Statistical significance should be calculated for each experiment with sufficient sample size. It would also be better to show the individual data points from the three repeats in different colours so that the reproducibility between repeats can be judged. This type of statistical analysis should be done uniformly throughout the MS and also extended to Figure 7.
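
      The condition-versus-control comparison recommended in point 3 can be sketched as follows. This is only an illustration, not the authors' or the referee's analysis: the function name `welch_t` and the numbers are placeholders invented for the example, and Welch's unequal-variance form of the t-test is an assumption about which variant would be used. Only the Python standard library is needed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treated):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(control), len(treated)
    # Squared standard error of each group mean (sample variance / n).
    v1 = variance(control) / n1
    v2 = variance(treated) / n2
    t = (mean(treated) - mean(control)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Placeholder spindle-enrichment ratios, NOT data from the manuscript.
control = [1.0, 1.2, 0.9, 1.1]
rapamycin = [0.5, 0.6, 0.4, 0.7]

t_stat, dof = welch_t(control, rapamycin)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      A p-value would then be read from a t distribution with `dof` degrees of freedom, for example via a statistics package. Running the comparison once per experimental repeat would also give the per-repeat view that the referee asks for.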

      Significance

      In my opinion, the most interesting aspect of the MS is the methodology. Based on this, publication is justified and will be of interest to a wider audience. That is why a more detailed analysis of the penetrance of this manipulation across the cell population will be critical. The application of this method to analyse the composition of the TACC3/Clathrin complex on the spindle is the main biological advance, and the novel information is rather limited but not unimportant. Overall, if these results can be properly quantified I would recommend publication.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, the authors investigate the nature of interactions between members of the TACC3-chTOG-clathrin-GTSE1 complex on the mitotic spindle. By using a series of HeLa cell lines that they have created by CRISPR/Cas9 editing to enable spatial manipulation (knocksideways) of TACC3, chTOG, clathrin or GTSE1, they show that on spindle microtubules TACC3 and clathrin represent core complex members, whereas chTOG and GTSE1 bind to them respectively but not to each other. Additionally, the authors find that the protein PIK3C2A, which has been implicated in this complex previously, is in fact not a component of this complex in mitotic cells. The main advance of the paper in my opinion is the endogenous tagging of the proteins for knocksideways experiments, since former experiments depended on RNAi silencing and expression of tagged proteins from plasmids, which introduced issues of silencing efficiency and plasmid overexpression. This approach seems to alleviate these problems, except in the case of chTOG, whose homozygous variant seems to be lethal.

      Major comments:

      I find the key conclusions regarding the localization of the components of the complex convincing. There are some issues regarding the specificity of antibodies in immunostaining experiments (Fig 3) and the influence of mCherry-TACC3 expression on distorted localization of the complex prior to knocksideways. However, I think the general conclusion about which complex components (clathrin and TACC3) influence the localization of the other proteins in the complex (chTOG and GTSE1) stands. One thing that I miss from the paper is data on the consequences for spindle shape and morphology after knocksideways. I have noticed on images in both Figure 3 and Figure 4 that in some cases the distribution of the signal seems to influence the spindle morphology quite a bit. Also, in Figure 3 I have noticed what seems to me quite a big variation in spindle size in the tubulin signal in both untreated and rapamycin-treated cells. Since the authors have many of these images already, I believe it would be realistic, not costly and of additional value for the paper to provide more data on the consequences of the knocksideways experiments. Data on changes in spindle size and tubulin intensity and on DNA/kinetochore misalignment upon knocksideways would help readers appreciate the findings of the paper. More so since the authors on more than one occasion find their motivation in the field of cancer research and its relation to spindle stability. Some data connecting to this motivation would be of value. Experiments seem reproducible.

      Minor comments:

      I have some problems with the clarity of Figures 3 and 4. In Figure 3, the plots on the right are a bit small and not easy to read. Some reorganization of the figure might be beneficial. In Figure 4, the plots on the right are also too small to be clear. Also, I miss the number of cells (n); I can't see the number of individual arrows because of the size of the graphs.

      Significance

      I find that the biggest significance of the paper is in the creation of new tools (cell lines) to study the localization of the proteins TACC3, chTOG, clathrin and GTSE1. Cell lines where endogenous proteins can be delocalized rapidly will be of value for scientists working not only on mitosis but also, in the case of clathrin, on vesicle formation and trafficking, or, in the case of GTSE1, on p53-dependent apoptosis. In the field of mitosis it will surely help and speed up research concerning the role of these proteins in spindle assembly and stability.

      Field of expertise: mitotic spindle

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Here Ryan et al. have used localization analysis following induced rapid relocalization of endogenous proteins to investigate the composition and recruitment hierarchy of a clathrin-TACC3-based spindle complex that is important for microtubule organization and stability. The authors generate different HeLa cell lines, each with one of four complex members (TACC3, CLTA, chTOG and GTSE1) endogenously tagged with FKBP-GFP via Cas9-mediated editing. This tag allows rapid recruitment to the mitochondria upon rapamycin addition ("knocksideways"). They ultimately quantify each of the 4 components' localization to the spindle following knocksideways of each component using fluorescently-tagged transfected constructs. The authors' interpretation of the results of this analysis is summarized in the last model figure, in which a core MT-binding complex of clathrin and TACC3 recruits the ancillary components GTSE1 and chTOG. In addition, the authors investigate the contribution of individual clathrin-binding LIDL motifs in GTSE1 to the recruitment of clathrin and GTSE1 to spindles. Their findings here largely agree with and confirm a recent report regarding the contribution of these motifs to GTSE1 recruitment to the spindle. They further analyzed GTSE1 fragments for interphase and mitotic microtubule localization, and identified a second region of GTSE1 required (but not sufficient) for spindle localization. Finally, the authors report that PIK3C2A is not part of this complex, contradicting (correcting) a previously published study.

      Major comments:

      1. The chTOG-FKBP-GFP cell line the authors generate has only a small fraction of chTOG tagged, and thus should not be used for any conclusions about protein localization dependency on chTOG. Because they were unable to construct a HeLa cell line with all copies tagged, the authors expect that the homozygous knock-in of chTOG-FKBP-GFP is lethal, and thus their experience is appropriate to report. However, the authors should not use this cell line alone to make statements about chTOG dependency. They would have to use similar localization analysis, but after another method to disrupt chTOG (as a second-best approach), such as RNAi. In fact, they have reported this in a previous publication (Booth et al 2011). However, the result was different. There, loss of chTOG resulted in reduced clathrin on spindles, suggesting it may stabilize or help recruit the complex. Alternatively, they could remove their chTOG data, but this would compromise the "comprehensive" nature of the work.

      2. The authors initially analyze complex member localization after knocksideways experiments by antibody staining, which has the advantage of analyzing endogenous proteins (versus the later transfected fluorescent constructs). Setting aside potential artefacts from fixation, this would seem to be a better method for controlled analysis to take advantage of their setup (short of generating stable cell lines with second proteins endogenously tagged in a second color - a huge undertaking). The authors conclude that antibody specificity problems confounded their analysis and explained unusual results. However, I think it is worth investing a little more effort to sort this out, rather than bringing doubt to the whole data set. Verifying and then using another antibody for chTOG localization would be informative. Of course, the negative control should not be their chTOG-FKBP-GFP line, as it does not relocalize most of chTOG.

      In the case of GTSE1, an alternative explanation to antibody specificity issues would be that the GTSE1-FKBP-GFP cell line is not in fact homozygously tagged. Given the low expression levels on the western provided, and the detection of GTSE1 on the spindle in the induced GTSE1-FKBP-GFP cell line (but not TACC3-FKBP-GFP), it seems plausible that an untagged copy remains. If there are multiple copies of GTSE1 in HeLa cells, one untagged copy could represent a small fraction of total GTSE1. This should thus be ruled out. GTSE1 clones should be analyzed with more protein extracts loaded - dilutions of the extracts can determine the sensitivity of the blot to lower protein levels. In addition, sequencing of genomic DNA can reveal a small percentage with different reads.

      3. There is a lot of data contained in the small graphs summarizing quantification of localization in Figs 3 and 4. They would be more accessible to the reader if they were larger and/or an "example" of the chart with labels was present explaining it (essentially what is in the figure legends). Furthermore, there is no statistical test applied to this data that I can see. This is needed. How do the authors determine whether there is an "effect"?

      Minor issues:

      1. The GTSE1 constructs used for mutation and localization analysis are 720 amino acids long. A recent study analyzing similar mutations uses a 739 amino acid construct (Rondelet et al 2020). The latter is the predominant transcript in NCBI and Ensembl databases. It appears the construct used by the authors omits the first 19 a.a. I do not think using the truncated transcript affects the conclusions of the manuscript, but it could generate confusion when identifying residues based on the a.a. numbers of mutant constructs (Fig 6). This should be clarified.

      2. The labeling of constructs in Fig 6C/D is confusing, and appears shifted by eye in places. Please relabel this more clearly.

      The recommended new experimental data (analysis of complex member levels on spindles after full perturbation of spindle chTOG; new chTOG antibody stainings in the FKBP lines; reanalysis of GTSE1 DNA/protein in the GTSE1-FKBP line) should only require a new antibody/siRNA, plus a few weeks' time to repeat the analyses already in the paper with new reagents.

      Significance

      While multiple individual components of this complex have been previously characterized, the structure and nature of the complex formation and its recruitment to microtubules/spindles remains a complex problem that has yet to be solved.

      Overall this study represents a comprehensive localization-dependency analysis of the Clathrin-TACC3 based spindle complex using a consistent methodology. Although several of the conclusions of the findings echo previous reports, some of the previous literature is contradictory within itself as well as with the conclusions here. Analyzing all components with a single, rapid-perturbation technique thus has great value to present a clear data set, given that the experimental setup conditions and analysis are solid (a goal to which the majority of comments refer).

      Beyond the complex localization/recruitment analysis, two novel findings of this study that emerge are:

      a) GTSE1 contains a second, separate protein region, distinct from the clathrin-binding motifs, that is required for its localization to the spindle and is most likely a microtubule-interaction site. This suggests that GTSE1 recruitment to the spindle is more complex than previously reported.

      b) PIK3C2A, which has been reported previously to be a stabilizing member of this complex, is in fact not a member, nor does it localize to spindles or display a mitotic defect after loss. This is an important conclusion to make, as it would correct the literature and avoid future confusion.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We are grateful for the insightful, constructive and very positive reviews provided by the three reviewers. Please find responses to each of the reviewer comments below.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors study proteins localised to the apical end of the highly polarised parasites causing toxoplasmosis and malaria. They find new proteins using BioID and examine the localisation of these along with recently identified proteins in the two different parasites. The key question they address is whether there is a conservation of the apical components in these distantly related parasites as well as in some even more distantly related organisms. This is an important question as the apical part comprises many proteins essential for invasion of host cells and shows a unique structure that defines the apicomplexans as a group. The apical structure can be highly elaborate, such as in T. gondii, or less elaborate, as in P. falciparum. The authors now show that there is a large conservation between the species in the protein makeup of the apical end. The experiments are well performed, displayed and discussed and there is no doubt about the validity of the presented results. The text is eloquently written, if at times a bit wordy.

      My only main suggestion would be to possibly add data on gene disruption of the two candidates (0310700 and 1216300) that are not detected in blood stage parasites but in the insect stages. A deletion of these should be technically straightforward and would show whether the proteins are important to the parasite. Likely not all of the now many proteins are essential for the parasites but these are good candidates to rapidly investigate. But showing a functional impact might convince editors at certain journals.

      Authors’ response: The central aim of this study was to ask if the molecular composition of the conoid complex is conserved across Apicomplexa. Functional dissection of proteins is part of an exciting set of subsequent questions and studies that will now follow by us and others. However, careful and thorough phenotyping of gene disruptions is not trivial work, would be most informative to perform in both Toxoplasma and Plasmodium, and is therefore beyond the scope of this project. Regarding the two proteins suggested by this reviewer for follow-up work and the question of ‘essentiality’, that the proteins have not been lost during parasite selection through evolution is clear evidence of their relevance to the biology of Plasmodium.

      Other suggestions in chronological order (line numbers would have helped)

      title: maybe write 'conoid complex proteome'

      Authors’ response: while we initially thought that this change would be suitable, given that the subsequent part of the title is ‘reveals a cryptic conoid feature’ we think it is clearer and more logical to leave this title in its original form. The conoid complex includes the apical polar rings, and these are not considered to be cryptic or previously unrecognised, only the conoid. While our study confirms that there is conservation across all proteome components of the conoid complex, this is secondary to the primary question of this study.

      abstract: not sure about the use of the words instrument and substructures

      Authors’ response: we believe that the use of ‘instrument’ is an appropriate analogy of a tool and not different from the use of ‘machine’ and ‘machinery’ that is widely used in molecular and cellular biology. Similarly, ‘substructure’ acknowledges that within recognised structures, such as the conoid, there is further specific organisation such as the conoid base or apex.

      page 2 last lines: is tubulin monomeric or polymerized?

      Authors’ response: to specify the polymerized state of tubulin as mentioned here the text has been changed to ‘the presence of tubulin polymers’.

      page 3 name protein talked about in 9th line

      Authors’ response: we have now named this protein (RNG2) as suggested.

      third paragraph: mention previous proteomics studies e.g. from Ke Hu (mentioned later in discussion)

      Authors’ response: We feel that it is more appropriate to leave the discussion of the Hu et al (2006) proteomics study, along with various subsequent approaches used in pursuit of discovering conoid-associated proteins, to the discussion as currently occurs. In the introduction we seek to efficiently inform the reader of the current state of knowledge that makes the value and nature of the questions that we have asked in this study apparent. But we do give full credit and evaluation of previous studies in the discussion which we think is the most appropriate place for this.

      first paragraph or results could go into introduction

      Authors’ response: The first paragraph of the Results contains specific detail of just one aspect of this study, the use of hyperLOPIT. This is relevant to the new analysis that we have made of the hyperLOPIT data in this study. We, therefore, believe that it is most appropriately presented here in the Results in association with the new analyses we described. Our aim is that the Introduction is succinct and serves the entire study.

      page 4: add reference after BioID

      Authors’ response: reference added as suggested

      page 5: add definitions of the conoid; what technique was used to report YFP-SAS6?

      Authors’ response: It is unclear what this reviewer is requesting with respect to definitions of the conoid on this page. Nevertheless, we have now included a thorough definition of the conoid based on the original electron microscopy studies (fourth paragraph of the Introduction).

      With respect to the technique used to report on YFP-tagged SAS6 in the de Leon et al 2013 study, we now include fuller description of this previous study as follows:

      ‘The fluorescence imaging used in the de Leon et al study was limited to lower resolution widefield microscopy. Immuno-TEM was also used, however, contrary to their conclusions, did show YFP presence throughout transverse and oblique sections of the conoid consistent with our detection of SAS6L throughout the conoid body.’

      page 7: 'showed similar localisation' instead of 'phenocopied'?; add reference after ookinete stage; add expression levels from PlasmoDB to the Table 1 data at least for merozoites, ookinetes and sporozoites or add separate table for the 9 proteins in supplement

      Authors’ response: ‘phenocopied’ replaced, as suggested. Reference added after ookinete stage, as suggested.

      As requested, we have compiled available expression data for the Plasmodium proteins throughout the different zoite stages and will include these data as supplemental material in our subsequent revision.

      Discussion: Maybe discuss that the conoid complex is a cytoskeletal structure and that the other cytoskeletons (actin, microtubules, subpellicular network) also differ between the species investigated in their composition and overall architecture

      Authors’ response: These are reasonable suggested analogies and we will introduce them in the subsequent revision.

      page 9: at least two proteins could be deleted as they seem to not confer any growth defect on blood stages (see main comment)

      Authors’ response: This reviewer has not linked this comment to a specific statement on page 9. However, we are cautious not to equate a lack of observed growth defects in experimental scenarios with unimportant or irrelevant proteins. Maintenance, through natural selection and evolution, of the proteins of a structure indicates that they are selectively advantageous and of functional relevance. The two proteins in question are not expressed in the blood stage, so one wouldn’t expect their deletion to have consequences in this stage.

      Apart from classic TEM images also Cryo EM data is available for apex of merozoite and sporozoite. Worth to discuss?

      Authors’ response: Following this reviewer’s subsequent suggestion (below), we are now preparing, for the subsequent revision, a schematic of each of the zoite stages of Plasmodium, and these draw on Cryo EM tomography data.

      Add and discuss the recent work from Curr Biol and EMBO J of the Yuan lab on ookinete formation?

      Authors’ response: These two reports are excellent studies of the polarised development of the cell pellicle during ookinete formation and control of gliding initiation, but don’t specifically relate to the conoid complex structures that are the subject of our study. We, therefore, do not see a logical place to include discussion of these works.

      Reviewer #2 (Significance (Required)):

      The paper provides a conceptual advance over previous data as it clearly shows a high level of conservation of the protein components of the conoid complex. It could introduce a new terminology for these important apical structures of apicomplexan parasites and provides a good basis to dissect the molecular functions.

      Authors’ response: We appreciate this reviewer recognising this opportune point in time to more clearly define the terminology applied to these apical structures so that they can be more clearly and easily compared between taxa. We will use the suggested schematic figure (see comment below) that is now in preparation as a basis and guide for a refined nomenclature based on precedent in the literature.

      As it stands all scientists investigating Plasmodium and Toxoplasma invasion of host cells will be highly interested in this study, most scientists researching apicomplexan organisms should be and some evolutionary scientists will be interested in this study.

      Key papers in the field are the discovery of the Toxoplasma conoid as a highly twisted microtubule-like structure (Hu et al., JCB 2002; doi: 10.1083/jcb.200112086), the first description of an apical proteome (Hu et al., PLoS Path 2006; doi: 10.1371/journal.ppat.0020013), the description of a tilted arrangement of the rings in Plasmodium versus Toxoplasma (Kudryashev et al., Cell Microbiol 2012; doi: 10.1111/j.1462-5822.2012.01836.x) and the discovery of apically located proteins that are essential for conoid formation (Tosetti et al., eLife 2020; doi: 10.7554/eLife.56635), to name a few.

      If intended for a broader audience, a cartoon of a conoid complex across the different species investigated and discussed here would help for visual guidance highlighting the similarities and differences

      Authors’ response: This is a good suggestion and we are presently preparing a schematic of all stages studied and supporting this with electron microscopy.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work, Koreny et al. characterized the localization of a new collection of conoid proteins in Toxoplasma gondii as well as in several different stages of Plasmodium berghei. The authors discovered that these proteins are located in several distinct substructures in Plasmodium and are expressed in a stage-specific manner. The data are of high quality, well‐organized, and well presented. The paper is well written. The introduction, in particular, was a pleasure to read. This reviewer (Ke Hu) does not have any new experiments to suggest.

      However, while the authors present LOPIT+BIOID as a powerful approach to identify conoid proteins, implying that it is more reliable than previously published approaches (see below), the manuscript includes no data to show what the false positive or false negative rate is with the current approach, nor any estimate of how many conoid proteins were missed entirely.

      Authors’ response: In our validation of putative conoid-associated proteins identified by the hyperLOPIT+BioID approach we reporter-tagged 18 proteins to resolve their cellular location by microscopy. All 18 were verified as being located at the site of the conoid. So, by this measure there were no false positives. The veracity of the hyperLOPIT data was also confirmed across other cell compartments in our report where 62 proteins were reporter-tagged, from which there were no false positive assignments of cell location (Barylyuk et al., 2020, Cell Host & Microbe, in press: doi:10.1016/j.chom.2020.09.011; bioRxiv: https://doi.org/10.1101/2020.04.23.057125).

      Estimating false negatives is more difficult, but we know that these would occur as for any mass spectrometry-based detection technique. However, we have not claimed to have been exhaustive, nor was this required to answer our central question: are there conserved conoid-associated proteins throughout Apicomplexa? To address this question, we required a good sample of proteins, and the methods that we have employed provided this.
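
      As a reading aid for the "no false positives among 18" statement, the sketch below computes the one-sided exact binomial upper bound on a rate when zero events are observed in n trials. It is an illustrative calculation added here, not part of the authors' response; the function name `zero_event_upper_bound` and the choice of a 95% confidence level are assumptions made for the example.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """One-sided exact binomial upper bound on a rate when 0 events occur in n trials.

    With zero events, the bound p solves (1 - p) ** n = 1 - confidence.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


# 18 validated proteins with 0 false positives, as stated in the response above.
bound = zero_event_upper_bound(18)
print(f"upper bound on false positive rate: {bound:.3f}")
```

      Under these assumptions, 0 false positives out of 18 is compatible with a true false positive rate of up to roughly 15%, which gives a sense of the precision that a validation sample of this size can offer.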

      Page 7: "Previous identification of conoid complex proteins used methods including subcellular enrichment, correlation of mRNA expression, and proximity tagging (BioID) (Hu et al. 2006; Long, Anthony, et al. 2017; Long, Brown, et al. 2017). Amongst these datasets many components have been identified, although often with a high false positive rate. We have found the hyperLOPIT strategy to be a powerful approach for enriching in proteins specific to the apex of the cell, and BioID has further refined identification of proteins specific to the conoid complex region."

      The authors should state whether the candidate proteins were chosen in an unbiased way or not.

      Authors’ response: Candidate proteins selected for validation by microscopy were not biased for any known likelihood of being associated with the conoid, other than our proteomics data that we were seeking to test. However, we did give preference to proteins with the following traits: 1) proteins with strong corresponding gene knockout fitness phenotypes from published studies, 2) proteins with some evidence of conserved functional domains, and 3) genes with orthologues found in Plasmodium spp. and other apicomplexans. These traits were chosen with future functional studies in mind, where proteins might be more informative of conoid-related functions and relevance in other apicomplexans. All validated proteins, however, were otherwise uncharacterised and, therefore, were not knowingly biased for more likely conoid-association over others discovered by our proteomics approach. We now include the following statement.

      “All proteins selected for validation were previously uncharacterised and with no a priori reason to be identified as conoid-associated other than our proteomics data.”

      If so, how many proteins were localized to the conoid and how many were not?

      Authors’ response: as stated above, we observed no false positives from the sample of 18 protein locations verified by microscopy.

      Related to this, the majority (14 out of 20) of the conoid proteins identified by LOPIT+BIOID in this paper were previously identified as conoid candidate proteins in Hu et al's 2006 paper, based on the number of peptides retrieved from the conoid enriched vs depleted fractions. Those data (see below) have been available from ToxoDB for many years and should be acknowledged.

      Accession# - conoid enriched : conoid depleted (from Hu et al. 2006)

      222350 - 2:0

      274120 - 3:0

      291880 - 1:0

      301420 - 3:1

      246720 - 4:0

      258090 - 10:0

      266630 - 8:1

      208340 - 4:2

      253600 - 1:0

      306350 - not found

      250840 - 1:0

      292120 - not found

      219070 - not found

      274160 - not found

      320030 - 7:1

      227000 - 10:0

      278780 - not found

      284620 - not found

      295420 - 6:0

      297180 - 4:0
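
      The list above can be tallied with a short script. The peptide counts are transcribed from the reviewer's list (conoid-enriched : conoid-depleted); the variable names are illustrative, and the "at least 2-fold" threshold is taken from the criterion quoted in the authors' response that follows.

```python
# Peptide counts (conoid-enriched, conoid-depleted) transcribed from the
# reviewer's list; None marks accessions reported as "not found".
counts = {
    "222350": (2, 0), "274120": (3, 0), "291880": (1, 0), "301420": (3, 1),
    "246720": (4, 0), "258090": (10, 0), "266630": (8, 1), "208340": (4, 2),
    "253600": (1, 0), "306350": None, "250840": (1, 0), "292120": None,
    "219070": None, "274160": None, "320030": (7, 1), "227000": (10, 0),
    "278780": None, "284620": None, "295420": (6, 0), "297180": (4, 0),
}

# Accessions detected at all in the 2006 dataset.
found = {acc: c for acc, c in counts.items() if c is not None}
# Detected only in the conoid-enriched fraction.
exclusive = [acc for acc, (enr, dep) in found.items() if dep == 0]
# Detected in both fractions, with at least 2-fold enrichment.
enriched_2x = [acc for acc, (enr, dep) in found.items()
               if dep > 0 and enr >= 2 * dep]

print(f"found: {len(found)} of {len(counts)}")
print(f"exclusive to enriched fraction: {len(exclusive)}")
print(f"enriched at least 2-fold: {len(enriched_2x)}")
```

      The tally reproduces the reviewer's "14 out of 20" and shows that all 14 detected accessions were either exclusive to the conoid-enriched fraction or enriched at least 2-fold.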

      Authors’ response: Proteomic methods and mass spectrometry have experienced revolutionary advances since this 2006 study was conducted. These include improvements in both sensitivity and quantitation accuracy. The Hu et al 2006 study provided an exciting first step towards conoid protein discovery. However, by their original estimation, at least 35% of their putative conoid-specific proteins were identifiable as false positives (e.g. ribosomal proteins) and this estimate could not account for the majority of uncharacterised proteins whose potential for false positive attribution to the conoid was untested. From almost 300 proteins, this study only validated four as associated with the conoid. The further proteins listed above were not validated as conoid proteins in the Hu et al study and, therefore, could not be distinguished from the many false positives reported in their work. In our Table 1, we have acknowledged the Hu et al study for the select proteins that they established as conoid proteins in their study.

      To further assess the utility of this 2006 conoid-enriched proteome we sorted the Hu et al detected proteins on our full hyperLOPIT assignments. Of the proteins that were reported by Hu et al as either exclusive to the conoid-enriched fraction or enriched by at least 2-fold over the conoid-depleted fraction, 15% were assigned to the apical 1 and 2 clusters (representing the relevant compartments to the conoid complex). Thus, according to the hyperLOPIT data these represent the true positives found in this study and 13 of these proteins were independently validated as conoid-associated by us. Significantly, however, 85% of the conoid-exclusive and conoid-enriched proteins from Hu et al (2006) were allocated to a non-apical location with 99% probability by hyperLOPIT, and, during our validation of 62 assignments we verified the alternative location of eight of these. False positives, therefore, greatly outnumbered true positives in this earlier dataset. This high rate of false positives in subcellular isolation proteomics is typical of the challenges that this method faces, and this was the rationale for and strength of the alternative hyperLOPIT approach. Given the overall relatively low level of conoid specificity in the earlier work we do not think that there is value in making specific protein-by-protein reference to it.

      Reviewer #3 (Significance (Required)):

      see above

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      This manuscript details the further use of hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT), which the group has previously published using T. gondii tachyzoites, combining it with BioID and super-resolution microscopy to uncover new proteins that form part of the structurally known but functionally elusive conoid. The authors conclusively identified new proteins that localise to the conoid structure in T. gondii and also, excitingly, showed not only that this structure is found in all invasive forms of Plasmodium (using the P. berghei model) but also that its molecular make-up differs in the blood-stage merozoites, which have a slightly reduced number of proteins (or possibly as yet unknown alternatives) compared to ookinete and sporozoite conoid structures. This study is scientifically sound and the conclusions reached are well supported by the results presented.

      **Major Comments:** No major comments

      **Minor Comments:**

      1) While both the introduction and discussion are well written and detailed, they could both be a little more concise.

      Authors’ response: We take this as a style recommendation, but we note that the other reviewers commented on the text’s “eloquence” and that the introduction in particular was a “pleasure to read”. We take these comments as votes of confidence in the current form.

      2) Selection of the 5 new genes in Tg to be tagged (top pg 5): the selection criteria for these 5 were not clear.

      Authors’ response: Please see the same query, and response with modified text, made by Reviewer #3.

      This also leads to the second part of this question: some genes appear to be missing from Table 1 and Table S1, specifically those found in both SAS6L and RNG2 BioID. It was mentioned that 25 were identified in both SAS6L and RNG2 BioID. Table 1 (which lists 23) makes no mention of 223790, 281650, 224700, and 293540, although they are in Table S1 (assuming these 4 were not selected for tagging in this study). Conversely, 216080 (AKMT) and 234250 (CIP1), which Table 1 marks as identified in both SAS6L and RNG2 BioID, are absent from Table S1 (which lists 25). Does this mean there are actually 27, or was the indication of identification in both SAS6L and RNG2 BioID for 216080 (AKMT) and 234250 (CIP1) in Table 1 a mistake?

      Authors’ response: This reviewer has overlooked that Table 1 reports on all currently known conoid associated proteins, including those not detected in the hyperLOPIT data but reported in the literature, whereas Table S1 is exclusively those proteins detected and assigned as ‘apical’ by hyperLOPIT. The reported BioID-detection for each protein is then made within this framework. Thus, the proteins that occur in only one or the other table do so because they don’t satisfy these two sets of criteria. We have rechecked the numbers reported in the text and they are correct.

      3) Table 1: Fitness scores are given for Pf orthologues, but there is no mention of fitness in Pb (the model used) from the PlasmoGEM screens. Considering that the authors use the Pb model, it would be of interest to add this to the table.

      Authors’ response: The Plasmodium berghei PlasmoGEM gene disruption screens were much more limited in number than those for P. falciparum. Consequently, fitness scores were available for only two of the Plasmodium orthologues for which we have location data. We therefore thought it was of limited utility to include these data in Table 1, and these data are in the public domain should a reader seek them.

      4) Figure 2: The images for localisation with SAS6L for 291880 and 258090 appear to be missing.

      Authors’ response: Initially we did not make the separate transgenic cell lines for each protein with both the SAS6L and RNG2 markers. This was because one marker was usually sufficient to resolve the relative location of the protein of interest. However, given this reviewer’s comment and the potential for some extra information to be recovered by using both markers, we have now generated all cell lines necessary for this analysis. We are presently completing the imaging of these new cell lines and these data will be included in the subsequent revision.

      5) Figure 3: It is unclear why both SAS6L and RNG2 are not used for all localisations shown (this could be clarified in the text).

      Authors’ response: See previous comment.

      6) Figure 5: It is a shame that only 7 of the 9 Plasmodium orthologues were included in the super-resolution imaging, as only 2 more are needed for the complete set.

      Authors’ response: Ideally, we would have achieved this, but the restrictions imposed by the COVID-19 disruption to laboratory access and activities ultimately limited these analyses slightly. However, to answer the central question of whether there is conservation of the Toxoplasma conoid proteome in Plasmodium, it was not necessary to perform super-resolution imaging for all of these proteins. The major outcome of this study, therefore, is not affected by this.

      7) Figure 6: As with Figure 5, it would be better if more were included in the super-resolution images of the sporozoite stage.

      Authors’ response: Same response as above. Generation of sporozoites requires passage through the mosquito vector so this is even more resource-intensive than generation of ookinetes that can be differentiated in vitro from mouse-derived parasites. Again, the answers to the central questions posed by this study do not require these further, high resolution, data.

      8) Figure 7: This would be improved by including super-resolution images for at least a selection (or even all 6), possibly even with free merozoites.

      Authors’ response: We did apply 3D-SIM imaging to fixed merozoites; however, unlike for ookinetes and sporozoites, the images of fixed material were inferior to the live-cell GFP imaging that we have included. This likely reflects the poorer fixation properties of Plasmodium merozoites, a challenge widely experienced by Plasmodium researchers. We do not have access to a 3D-SIM microscope within a containment laboratory necessary for handling viable parasites and therefore could not attempt to image live material with this instrument. Again, the answers to the central questions posed by this study do not require these further high-resolution data.

      9) As numerous new proteins are identified in 2 different parasites, and the composition of the conoid differs at different stages, it would be beneficial to have some sort of schematic model of the apical complex in Tg and Pb indicating where each new protein localises.

      Authors’ response: In response to the suggestion of this reviewer and reviewer #2, we are now preparing schematic models of the apices of all of the relevant organism stages.

      Reviewer #4 (Significance (Required)):

      The authors have combined expert mass spectrometry and super-resolution microscopy to identify new components of the conoid in Tg and added to the knowledge that will help to uncover the function of the structure. But perhaps the most significant finding is the conclusive identification of the conoid in all 3 invasive stages of the Plasmodium parasite. Until now it was widely accepted that the conoid was missing in Plasmodium, and to uncover multiple proteins that appear to make up this structure in Plasmodium is highly significant and clearly of interest to the apicomplexan field. Furthermore, the suggestion that the conoid differs in molecular make-up within Plasmodium depending on stage is very intriguing and clearly of interest. This paper expertly combined cutting-edge proteomics and microscopy to identify the conoid in Plasmodium. This manuscript would have a broad readership in parasitology, proteomics, and cell biology.

      Our expertise is largely in molecular parasitology and microscopy.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript details the further use of hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT), which the group has previously published using T. gondii tachyzoites, combining it with BioID and super-resolution microscopy to uncover new proteins that form part of the structurally known but functionally elusive conoid. The authors conclusively identified new proteins that localise to the conoid structure in T. gondii and also, excitingly, showed not only that this structure is found in all invasive forms of Plasmodium (using the P. berghei model) but also that its molecular make-up differs in the blood-stage merozoites, which have a slightly reduced number of proteins (or possibly as yet unknown alternatives) compared to ookinete and sporozoite conoid structures. This study is scientifically sound and the conclusions reached are well supported by the results presented.

      Major Comments: No major comments

      Minor Comments:

      1) While both the introduction and discussion are well written and detailed, they could both be a little more concise.

      2) Selection of the 5 new genes in Tg to be tagged (top pg 5): the selection criteria for these 5 were not clear. This also leads to the second part of this question: some genes appear to be missing from Table 1 and Table S1, specifically those found in both SAS6L and RNG2 BioID. It was mentioned that 25 were identified in both SAS6L and RNG2 BioID. Table 1 (which lists 23) makes no mention of 223790, 281650, 224700, and 293540, although they are in Table S1 (assuming these 4 were not selected for tagging in this study). Conversely, 216080 (AKMT) and 234250 (CIP1), which Table 1 marks as identified in both SAS6L and RNG2 BioID, are absent from Table S1 (which lists 25). Does this mean there are actually 27, or was the indication of identification in both SAS6L and RNG2 BioID for 216080 (AKMT) and 234250 (CIP1) in Table 1 a mistake?

      3) Table 1: Fitness scores are given for Pf orthologues, but there is no mention of fitness in Pb (the model used) from the PlasmoGEM screens. Considering that the authors use the Pb model, it would be of interest to add this to the table.

      4) Figure 2: The images for localisation with SAS6L for 291880 and 258090 appear to be missing.

      5) Figure 3: It is unclear why both SAS6L and RNG2 are not used for all localisations shown (this could be clarified in the text).

      6) Figure 5: It is a shame that only 7 of the 9 Plasmodium orthologues were included in the super-resolution imaging, as only 2 more are needed for the complete set.

      7) Figure 6: As with Figure 5, it would be better if more were included in the super-resolution images of the sporozoite stage.

      8) Figure 7: This would be improved by including super-resolution images for at least a selection (or even all 6), possibly even with free merozoites.

      9) As numerous new proteins are identified in 2 different parasites, and the composition of the conoid differs at different stages, it would be beneficial to have some sort of schematic model of the apical complex in Tg and Pb indicating where each new protein localises.

      Significance

      The authors have combined expert mass spectrometry and super-resolution microscopy to identify new components of the conoid in Tg and added to the knowledge that will help to uncover the function of the structure. But perhaps the most significant finding is the conclusive identification of the conoid in all 3 invasive stages of the Plasmodium parasite. Until now it was widely accepted that the conoid was missing in Plasmodium, and to uncover multiple proteins that appear to make up this structure in Plasmodium is highly significant and clearly of interest to the apicomplexan field. Furthermore, the suggestion that the conoid differs in molecular make-up within Plasmodium depending on stage is very intriguing and clearly of interest. This paper expertly combined cutting-edge proteomics and microscopy to identify the conoid in Plasmodium. This manuscript would have a broad readership in parasitology, proteomics, and cell biology.

      Our expertise is largely in molecular parasitology and microscopy.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this work, Koreny et al. characterized the localization of a new collection of conoid proteins in Toxoplasma gondii as well as in several different stages of Plasmodium berghei. The authors discovered that these proteins are located in several distinct substructures in Plasmodium and are expressed in a stage-specific manner. The data are of high quality, well‐organized, and well presented. The paper is well written. The introduction, in particular, was a pleasure to read. This reviewer (Ke Hu) does not have any new experiments to suggest.

      However, while the authors present LOPIT+BIOID as a powerful approach to identify conoid proteins, implying that it is more reliable than previously published approaches (see below), the manuscript includes no data to show what the false positive or false negative rate is with the current approach, nor any estimate of how many conoid proteins were missed entirely.

      Page 7: "Previous identification of conoid complex proteins used methods including subcellular enrichment, correlation of mRNA expression, and proximity tagging (BioID) (Hu et al. 2006; Long, Anthony, et al. 2017; Long, Brown, et al. 2017). Amongst these datasets many components have been identified, although often with a high false positive rate. We have found the hyperLOPIT strategy to be a powerful approach for enriching in proteins specific to the apex of the cell, and BioID has further refined identification of proteins specific to the conoid complex region."

      The authors should state whether the candidate proteins were chosen in an unbiased way or not. If so, how many proteins were localized to the conoid and how many were not? Related to this, the majority (14 out of 20) of the conoid proteins identified by LOPIT+BIOID in this paper were previously identified as conoid candidate proteins in Hu et al's 2006 paper, based on the number of peptides retrieved from the conoid enriched vs depleted fractions. Those data (see below) have been available from ToxoDB for many years and should be acknowledged.

      Accession# - conoid enriched : conoid depleted (from Hu et al. 2006)

      222350 - 2:0

      274120 - 3:0

      291880 - 1:0

      301420 - 3:1

      246720 - 4:0

      258090 - 10:0

      266630 - 8:1

      208340 - 4:2

      253600 - 1:0

      306350 - not found

      250840 - 1:0

      292120 - not found

      219070 - not found

      274160 - not found

      320030 - 7:1

      227000 - 10:0

      278780 - not found

      284620 - not found

      295420 - 6:0

      297180 - 4:0
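
      The "14 out of 20" figure can be checked directly against the list above. The following is a minimal Python sketch that only re-tallies the counts as transcribed here; the dictionary and function names are illustrative, and `None` is used as an assumed encoding for "not found".

```python
# Peptide counts transcribed from the list above:
# accession -> (conoid-enriched, conoid-depleted), or None for "not found".
hu_2006_counts = {
    "222350": (2, 0), "274120": (3, 0), "291880": (1, 0), "301420": (3, 1),
    "246720": (4, 0), "258090": (10, 0), "266630": (8, 1), "208340": (4, 2),
    "253600": (1, 0), "306350": None, "250840": (1, 0), "292120": None,
    "219070": None, "274160": None, "320030": (7, 1), "227000": (10, 0),
    "278780": None, "284620": None, "295420": (6, 0), "297180": (4, 0),
}


def tally(counts):
    """Return (total listed, number detected, number seen only in the enriched fraction)."""
    found = {acc: c for acc, c in counts.items() if c is not None}
    exclusive = [acc for acc, (enriched, depleted) in found.items() if depleted == 0]
    return len(counts), len(found), len(exclusive)


total, n_found, n_exclusive = tally(hu_2006_counts)
print(f"{n_found} of {total} detected; {n_exclusive} only in the conoid-enriched fraction")
```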

      Significance

      see above

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The authors study proteins localised to the apical end of the highly polarised parasites causing toxoplasmosis and malaria. They find new proteins using BioID and examine the localisation of these along with recently identified proteins in the two different parasites. The key question they address is whether the apical components are conserved in these distantly related parasites as well as in some even more distantly related organisms. This is an important question, as the apical part comprises many proteins essential for invasion of host cells and shows a unique structure that defines the apicomplexans as a group. The apical structure can be highly elaborate, as in T. gondii, or less elaborate, as in P. falciparum. The authors now show that there is a large degree of conservation between the species in the protein make-up of the apical end. The experiments are well performed, displayed and discussed, and there is no doubt about the validity of the presented results. The text is eloquently written, if at times a bit wordy. My only main suggestion would be to possibly add data on gene disruption of the two candidates (0310700 and 1216300) that are not detected in blood stage parasites but are detected in the insect stages. A deletion of these should be technically straightforward and would show whether the proteins are important to the parasite. Likely not all of the now many proteins are essential for the parasites, but these are good candidates to rapidly investigate. Showing a functional impact might also convince editors at certain journals.

      Other suggestions in chronological order (line numbers would have helped)

      title: maybe write 'conoid complex proteome'

      abstract: not sure about the use of the words instrument and substructures

      page 2 last lines: is tubulin monomeric or polymerized?

      page 3 name protein talked about in 9th line

      third paragraph: mention previous proteomics studies e.g. from Ke Hu (mentioned later in discussion)

      first paragraph of results could go into introduction

      page 4: add reference after BioID

      page 5: add definitions of the conoid; what technique was used to report YFP-SAS6?

      page 7: 'showed similar localisation' instead of 'phenocopied'?; add reference after ookinete stage; add expression levels from PlasmoDB to the Table 1 data at least for merozoites, ookinetes and sporozoites or add separate table for the 9 proteins in supplement

      Discussion: Maybe discuss that the conoid complex is a cytoskeletal structure and that the other cytoskeletons (actin, microtubules, subpellicular network) also differ between the species investigated in their composition and overall architecture

      page 9: at least two proteins could be deleted as they seem to not confer any growth defect on blood stages (see main comment)

      Apart from classic TEM images, cryo-EM data are also available for the apex of the merozoite and sporozoite. Worth discussing?

      Add and discuss the recent work from Curr Biol and EMBO J of the Yuan lab on ookinete formation?

      Significance

      The paper provides a conceptual advance over previous data as it clearly shows a high level of conservation of the protein components of the conoid complex. It could introduce a new terminology for these important apical structures of apicomplexan parasites and provides a good basis to dissect the molecular functions. As it stands, all scientists investigating Plasmodium and Toxoplasma invasion of host cells will be highly interested in this study, most scientists researching apicomplexan organisms should be, and some evolutionary scientists will be as well.

      Key papers in the field are the discovery of the Toxoplasma conoid as a highly twisted microtubule-like structure (Hu et al., JCB 2002; doi: 10.1083/jcb.200112086), the first description of an apical proteome (Hu et al., PLoS Path 2006; 10.1371/journal.ppat.0020013), the description of a tilted arrangement of the rings in Plasmodium versus Toxoplasma (Kudryashev et al., Cell Microbiol 2012; doi: 10.1111/j.1462-5822.2012.01836.x) and the discovery of apically located proteins that are essential for conoid formation (Tosetti et al., eLife 2020; 10.7554/eLife.56635), to name a few.

      If intended for a broader audience, a cartoon of a conoid complex across the different species investigated and discussed here would help for visual guidance highlighting the similarities and differences

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Point-by-point response to reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors constructed a virtually complete fitness landscape of the P1 extension region (4-base-paired helix) in the group I intron from Tetrahymena thermophila, using a kanamycin resistance reporter to evaluate the fold-change in fitness, which is related to self-splicing activity. This was a clever choice of system because it was known from earlier work that the P1 extension adopts two different conformations during self-splicing. The fitness of each variant was determined from the number of reads acquired from the sequencing data sets and analyzed through an extensive computational pipeline. The strength of the paper is that this machine learning approach can be used to calculate how individual variants contribute to the fitness landscape and assess the directions of epistasis across a large number of identified genotypes.

      We thank the reviewer for highlighting one of the key strengths of our manuscript, the fact that our analytical approach, using SHAP values, enables contributions of individual variants to be assessed in a genotype-specific manner. This approach provides for a sound, robust, and principled way of describing and understanding the fitness impact of one mutation in the context of (potentially many) others.

      The authors argue that machine learning more successfully models subtle effects that arise from interactions between RNA residues, and that the power to analyze deep mutational sequencing experiments can better rationalize fitness constraints arising from multiple conformational states.

      We do indeed argue that machine learning is likely to play an increasing role in making sense of deep mutational scanning data. These scans provide high-resolution information on how fitness maps onto genotype, but the molecular underpinnings of this relationship often remain obscure. It is these “hidden” underpinnings, including the effects of specific mutations on RNA/protein folding, structures, and dynamics, that machine learning approaches can help elucidate.

      The results are mostly consistent with previous studies even though the authors collected the data in a more advanced and complicated way. They are also able to rationalize complex phenotypes - for example, the observed fitness defects are more prevalent under an unfavorable growth condition (30ºC), because the lower temperature hinders conformational exchange. Although such cold sensitive effects are well known in RNA, it is gratifying that this can be captured in the fitness landscape.

      Finding temperature-related fitness effects that are consistent with impaired conformational exchange was also gratifying for us and we thank the reviewer for highlighting this finding.

      The results would be more convincing if the authors directly measure the self-splicing activity of a few key variants, such as the C2C21 mutant, to determine whether these mutations alter the self-splicing mechanism of the Tte-119(C20A) master sequence in the way that they infer from their model. In interpreting their results, they may want to consider misfolding of the intron core (coupled to base pairing of P1) and reverse self-splicing. Reversibility in the hairpin ribozyme, for example, turned out to be the key for understanding the effects of certain mutations.

      We appreciate that measurements of splicing activity for individual genotypes would complement and further strengthen our study. We will therefore aim to construct strains for a few key genotypes and assay self-splicing activity using RT-qPCR – an approach we previously used successfully to monitor splicing kinetics of self-splicing introns in yeast mitochondria (see Rudan et al. 2018 eLife 7:e35330). Specifically, we will quantify the fraction of spliced and unspliced transcripts using primers that span the exon-exon and the 3’ exon-intron junction, respectively (the 5’ intron-exon junction is genotypically diverse and would require genotype-specific primers). This will be done under non-selective (-kan) conditions, where the relative fraction of spliced and unspliced transcripts is a function of intrinsic splicing ability and not confounded by selection. We aim to include the master sequence, C2C21, G3C20 and its mirror genotype C3G20, U3 (which restores perfect complementarity in the master sequence), and G5 (inferred from the high-throughput experiment to make a strong negative contribution to fitness).
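
      To illustrate the kind of quantity such an assay yields, here is a minimal Python sketch of a spliced-fraction estimate from two Ct values. It is not the analysis planned for the manuscript: the function name and parameters are hypothetical, and it assumes that transcript abundance is proportional to `efficiency ** (-Ct)` with the same amplification efficiency for both primer pairs.

```python
def spliced_fraction(ct_spliced, ct_unspliced, efficiency=2.0):
    """Estimate the fraction of transcripts that are spliced.

    Assumptions (illustrative only): abundance is proportional to
    efficiency ** (-Ct), and both primer pairs (exon-exon junction and
    3' exon-intron junction) amplify with the same efficiency.
    """
    spliced = efficiency ** (-ct_spliced)
    unspliced = efficiency ** (-ct_unspliced)
    return spliced / (spliced + unspliced)


# Equal Ct values imply equal amounts of spliced and unspliced transcript.
print(spliced_fraction(22.0, 22.0))
# A Ct one cycle lower for the spliced amplicon implies twice as much spliced product.
print(spliced_fraction(21.0, 22.0))
```

      In practice primer efficiencies differ and would need to be calibrated, so the absolute fractions from such a calculation are best compared between genotypes rather than read as exact values.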

      In interpreting our results, we will consider different mechanisms of splicing failure, such as kinetic problems (slow dissociation of P1ex), misfolding of the intron core, reverse self-splicing, and the use of cryptic splice sites, which has previously been documented (see e.g. Woodson & Cech 1991 Biochemistry 30:2042-2050). We note, however, that a precise mechanistic dissection of the splicing defects of individual variants is not the purpose of this manuscript, and we therefore do not aim to establish genotype-specific defects in great molecular detail.

      Related to the point above, interesting conclusions regarding the relationships between base identity and epistasis that arise from metastability should be strengthened with additional examples. For example, the authors can explain why a reverse base-pairing variant (C3G20) exhibits negative epistasis but is not similar to that of the G3C20 construct. This would ideally use the data from the screen but also be validated by checking the self-splicing activity of a few individuals at low and high temperature.

      In measuring splicing activity and its link to fitness for a subset of key variants (see point #4), we will include at least one mirror example such as C3G20/G3C20. In addition, we will highlight additional examples of this mirror asymmetry based on the results from our high-throughput screen.

      They should validate the screen by showing that kanamycin resistance does indeed correlate strictly with self-splicing activity, and not some other feature such as RNA turnover. (It would also not be a bad idea to check this in the cell, which can be done by primer extension or Northern blotting.)

      This question (i.e. whether altered RNA stability rather than splicing efficiency explains differential KNT production and ultimately fitness) has previously been addressed by Guo & Cech (2002) when introducing the knt+intron reporter system. These authors found no difference in mRNA stability in constructs that displayed differential kanamycin resistance. To shore up this conclusion further, we will measure fitness (via colony counts, growth rate or more directly through competitive fitness assays) of the key variants for which we determine splicing activity (see point #4) and then correlate splicing and fitness.

      The benefit of the machine learning model is that it can extract signals that may be hard to detect otherwise. The downside is that it doesn't produce a physical model, as far as I am aware. The parameters are themselves not meaningful - except to the degree that trends in the fitness estimates can be explained after the fact. This is something that should ideally be explained more directly in the manuscript.

      The reviewer raises an interesting point, that indeed deserves further discussion/explanation. The reviewer is right that, at first sight, high-resolution fitness landscapes like ours do not directly produce a physical (structural) model of the molecule under investigation. They connect genotype and fitness, but the molecular intermediate – a biophysical structure – is not explicit. However, over the last few years, it has become apparent that deep mutational scanning experiments can – both in principle and in practice – yield information that can be leveraged to infer such a physical model. In short, covariation in fitness between residues in a protein or bases in an RNA can be used as inputs for constraint-based modelling of physical interactions. Notably, Schmiedel & Lehner (2019, Nature Genetics 51: 1177-1186) recently demonstrated that deep mutational scanning data can be used in this manner to reconstruct secondary and tertiary protein structure with high accuracy. In principle, the same approach can be used to reconstruct RNA structures. This will require more extensive, molecule-wide fitness data, but our study points towards just this future, even for data collected from structural ensembles.

      When we stated in the original manuscript that deconvolution of the fitness landscape might help to reverse engineer structures, this ability to interpolate between genotype and fitness to reveal hidden biophysical/structural relationships is what we refer to. We will revise the manuscript to make this connection more explicit.

      The authors claim that by evaluating a large number of sequences at two conditions, they can capture variants with intermediate phenotypes (Fig. 1). This is not necessarily true. If the original screen allows only the most active variants to survive on kan+ medium, then the signature of intermediate phenotypes may not be encoded in the original data, and thus not retrievable even with sophisticated algorithms, which may also be prone to overfitting. At what limit of stringency will the screen fail to yield information about intermediate fitness? How deeply must one sequence to recover this information, especially if noisy or degraded? Some discussion of these effects would be helpful.

      The capacity of any high-throughput sequencing-based DMS experiment to resolve intermediate phenotypes does indeed depend on a number of things. The reviewer highlights two of these: First, in screens where the phenotype is not binary (dead/alive) but fitness can be measured on a continuous scale, can we – and do we – capture phenotypes with intermediate fitness? What if only the fittest/most active variants survive? This is, ultimately, an empirical question, and one we can answer quite definitively: we do observe a large range of intermediate phenotypes, which – in our study – correspond to intermediate fold-change values. For each genotype, we can provide confidence limits and assess statistical significance. Table S1 provides this information. Our capacity to resolve these intermediate phenotypes is mainly based on three things. One is adequate sequencing depth, as highlighted by the reviewer. The second is the number of biological replicates (N=6) we analyse, which allows us to differentiate biological variability from noise for a large number of genotypes. This is an important aspect of DMS experiments that has often been overlooked (i.e. there are many other studies where only a single replicate is analysed and biological heterogeneity is not taken into account). With six replicates in hand, we can directly estimate variability (as done e.g. in our DESeq2 analysis) and quantify uncertainty so as to guard against overfitting. In our view, this is arguably more important than sequencing depth in deriving appropriate fitness estimates. Finally, we can resolve intermediate phenotypes because we keep the time lag between initial exposure to kanamycin and assaying genotype frequencies relatively short (overnight growth, see Methods). Our experiment is effectively a multi-genotype competition experiment, and we provide a snapshot across the genotype pool at a given time. 
      If we had measured after several days of culture, genotypes with greater relative fitness would have spread further through the population, at the cost of less fit genotypes, many of which would likely have been eliminated. We kept measurement lag relatively short on purpose so that we could see a clear differential response to kanamycin while still being able to catch more than just a handful of the very fittest genotypes.
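
      The role of replicates described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the DESeq2 analysis used in the study: the function names, the pseudocount, and the normalisation by total library size are all assumptions made for the example.

```python
import math
import statistics


def log2_fold_changes(sel_counts, unsel_counts, sel_totals, unsel_totals, pseudocount=0.5):
    """Per-replicate log2 fold change in genotype frequency (+kan vs -kan).

    Each argument is a list with one entry per biological replicate.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    lfcs = []
    for s, u, s_tot, u_tot in zip(sel_counts, unsel_counts, sel_totals, unsel_totals):
        sel_freq = (s + pseudocount) / s_tot
        unsel_freq = (u + pseudocount) / u_tot
        lfcs.append(math.log2(sel_freq / unsel_freq))
    return lfcs


def summarise(lfcs):
    """Mean log2 fold change and its standard error across replicates."""
    mean = statistics.mean(lfcs)
    sem = statistics.stdev(lfcs) / math.sqrt(len(lfcs))
    return mean, sem


# Hypothetical read counts for one genotype across six replicates.
lfcs = log2_fold_changes(
    sel_counts=[210, 190, 205, 220, 198, 202],
    unsel_counts=[100, 105, 98, 110, 95, 101],
    sel_totals=[1_000_000] * 6,
    unsel_totals=[1_000_000] * 6,
)
print(summarise(lfcs))
```

      With a single replicate the standard error is undefined, which is the point made in the response: only replication lets biological variability be separated from sampling noise.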

      In light of the above, it will be apparent that there are no simple answers to the reviewer’s questions about required sequencing depth, levels of stringency, etc. The ability to assign differential fitness across a large population of genotypes hinges on multiple interrelated considerations (sequencing depth, complexity of the final & starting pool, number of replicates). In revising the manuscript, we will highlight some of the key considerations just discussed, bearing in mind that the manuscript cannot possibly discuss all possible pitfalls and requirements of deep mutational scanning experiments in great detail.

      Lastly, the evolvability of RNA is fascinating and there is much to learn. However, the authors don't discuss the implications of their findings for molecular evolution although they throw the term around. It would be exciting if there is a trend in the fitness landscape that could help explain the trajectory of RNA evolution in nature.

      We agree with the reviewer that it would be exciting to link deep mutational scanning results more closely with observable patterns of RNA evolution. This is true both in relation to evolution of P1ex/group I introns specifically and evolution of dynamic RNA structures more generally. Regarding the latter, we note that selection against excess stability has previously been inferred for 5’ UTRs (see e.g. Gu et al. 2010 PLoS Comp Biol 6: e1000664), although our case is slightly different in that a helix still needs to form but be sufficiently unstable to enable swift dissociation. We also note that riboswitches might make for an excellent subject to study asymmetric constraint and selection against excess stability as they involve formation of competing helices (including participation of some but not all nucleotides in more than one helix), their structure/function is well understood, and many examples are known, providing opportunities for evolutionary analysis. We consider this outside the scope of the current study. We will, however, seek to analyse patterns of evolution in P1ex to establish whether they correspond in a meaningful way to the fitness trends we observe in the laboratory. To do so, we will analyse the distribution and evolutionary history of variants across orthologous introns in different Tetrahymena species/strains, with a focus on P1ex, P10 and the surrounding sequence. Fortunately for us, the 23S ribosomal RNA gene in which the intron is embedded has been used as a phylogenetic marker so that intron/exon sequence information is available for a reasonable number of species/strains (see Doerder 2018 J Eukaryot Microbiol 66:182-208). We will generate an alignment of these sequences and ask, for example, whether N2-N5 are subject to different constraints than N18-N21 mirroring our experimental findings. 
We have previously successfully quantified patterns of variation surrounding self-splicing introns in yeast mitochondria (Repar & Warnecke 2017 Genetics 205:1641-1648). Note here that extending this analysis beyond Tetrahymena is problematic. Specifically, the intron is absent from close relatives of Tetrahymena (Doerder 2018 J Eukaryot Microbiol 66:182-208) and P1-proximal structures of distant relatives are quite variable. In addition, we are looking at intronic regions that are not only adjacent to but also directly interact with exonic sequence. The exonic context in which the intron is embedded therefore matters but will be quite different for more distant group I introns. We therefore think that aligning and comparing distant orthologs has limited merit.

      The authors use the abbreviation DMS for deep mutational scanning; the RNA structure field uses the reagent dimethylsulfate that is also abbreviated DMS. They may want to choose a different acronym or just avoid an acronym altogether.

      We appreciate this point about false-friend acronyms. We will either find a different acronym or avoid it altogether.

      Reviewer #1 (Significance (Required)):

      As the importance of RNA structure for gene expression becomes more widely appreciated, interest in understanding the evolution of RNA structures is also increasing. Compared with the molecular evolution of proteins, evolution and fitness in RNA is far less understood, although the authors appropriately point to a number of recent studies on this topic. The main advance here is to use machine learning methods to analyze the results of a large genotypic screen, with the goal of more accurately capturing the fitness effects of sequences at varied distances from the parental sequence. The specific conclusions reached here such as the importance of metastability or the prominence of cold sensitive effects are not revolutionary, but the authors illustrate how such phenomena can be investigated more systematically and in more depth.

      We thank the reviewer for highlighting that our analytical approach showcases how deep mutational scanning data can be analysed in an unbiased and systematic manner to better understand the relationship between genotype, molecular phenotype (e.g. structure), and fitness. The reviewer also rightly points to specific results we obtain regarding temperature-related effects and metastability of P1ex/P10. However, we believe that the most important contribution of this work is a more general one, namely our proof-of-principle demonstration that deep mutational scanning data can capture multiple conformational states simultaneously, and that these states can be deconvoluted from a single fitness landscape to attribute the fitness impact of individual mutations to specific RNA conformations. To our knowledge this had not been explicitly demonstrated before and our work provides an important cornerstone for future studies looking to interpret mutational effects in either RNAs or proteins in the light of dynamic structures.

      In light of comments by reviewer #2 below, it is worth reiterating the proof-of-principle nature of this study. Many of the specific results we obtain (e.g. importance of avoiding excess stability in P1ex) are not revolutionary. Indeed, we would be worried if they were. We chose to investigate P1ex because substantial prior work exists that has furnished us with solid positive controls. This independent prior validation allows us to both have great confidence in the data we generate and demonstrate cogently that the two conformational states at the beginning and end of the splicing reaction are captured in the data.

      Finally, we believe our work, in covering a virtually complete genotype space, using multiple replicates to quantify uncertainty in fitness estimates, and using SHAP scores to interpret variant effects in genotype-specific context, sets a new high bar for this type of study and will provide valuable reference data and analytical recipes for future analyses.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Soo et al probes the effect of mutations on the fitness of the Tetrahymena Group I self-splicing intron. They used high-throughput sequencing to simultaneously identify the effect of every possible sequence in a 4-bp helix. The approach is sound and the conclusions are generally supported. However, the analysis seems overly complicated given the dataset. Both the analysis and the accompanying writing make it difficult to understand what seems to be a fairly clear conclusion - that the relative stabilities of two alternative RNA helices are important for splicing.

      We thank the reviewer for testifying to the validity of our approach and the soundness of our conclusions. Regarding the complexity of the analysis, the reviewer is right in that – for the conclusion that the relative stabilities of two alternative helices are important for fitness – a simpler analysis would have sufficed. However, as elaborated in response to point #11 above, our objective here is not merely to draw specific conclusions about the relative stabilities of P1ex and P10, but more general: a) to demonstrate that a single fitness landscape can be deconvoluted to implicate multiple conformations in fitness defects and b) to provide a basic but powerful recipe for doing so in an unbiased, systematic manner using machine learning.

      We will strive to make the writing clearer so that readers can follow this reasoning and appreciate our analytical choices.

      Major Comments

      The authors state that this method can identify the impact of transient conformational states. However, the two conformational states in this study are not transient - in fact they are associated with two distinct chemical steps of splicing and are quite stable. It may be that the effect of important transient states would be observed, but this study does not demonstrate that.

      We used the word “transient” to describe two alternative RNA structures formed during the life cycle of the intron. Both states (characterized by P1ex and P10 formation) are transient in as much as they disappear as splicing proceeds. In retrospect, we agree with the reviewer that this usage is too loose (given how the term is generally used in the literature) and might evoke the wrong connotations. We will therefore revise the manuscript to eliminate references to P1ex and P10 as transient states, but rather describe them as alternative conformations. Of course, the general point remains true: that deep mutational scanning data should in principle capture all fitness-relevant structural states even if these are transient (in the strict sense of the word).

      "Fitness" ends up being on an arbitrary scale, which impairs some analysis. A similar high-throughput sequencing pipeline could have been used to directly monitor splicing of every mutant, though at this point that is outside the scope of this study. Even with the arbitrary units, it would be clearer if more time were spent comparing fitness to base-pair stability on an individual basis, rather than the broad analyses. (See minor comments for details.)

      The reviewer is right in saying that a high-throughput pipeline could have been designed to monitor splicing of each genotype directly (rather than assaying fitness of the cell population that represents a particular genotype). We chose not to do so. One reason for this is that monitoring splicing directly would have necessitated design of a more complicated assay. This is because, to monitor splicing efficiency, one would have to monitor both pre-mRNA and mRNA for different genotypes. The former is straightforward (using primers that span the exon-intron junction) but the latter is not: successful splicing removes the genotype-specific information from the mRNA (that information being solely encoded in the intron). This is a solvable problem in principle. One might, for example, introduce barcodes of sufficient complexity in the mRNA that can be linked back to the intron genotype, but doing so would have introduced a further source of error and complicated analysis. We therefore opted for monitoring genotypic fitness by sequencing the plasmids from which the RNAs originate. This does mean that our measurements of fitness are not coupled to a specific molecular phenotype (such as splicing efficiency) – we presume (but are not entirely sure) this is what the reviewer refers to when talking about fitness being on an “arbitrary scale”. However, fitness derived in this manner has the advantage of providing information that does not start from a mechanistic preconception. We ask how a variant affects survival and reproduction of the cell without presuming a specific mechanism, and the results can therefore capture any mechanism, including those that we did not consider initially. The challenge then becomes to tease out possibly multiple mechanisms from unbiased data.

      We will tackle the reviewer’s final comment, regarding analysis of base-pair stability, below in response to one of the minor comments (point #20).

      Minor Comments

      The sentence in the abstract beginning "Using an in vivo report system..." is very difficult to comprehend. This is due both to the length of the sentence and the word usage. The final sentence of the abstract is similarly difficult. In general, the writing overemphasizes complexity at the cost of clarity.

      We will revise the entire manuscript to make the writing both clearer and more concise.

      Analysis of results in terms of "epistasis" obscures what could be a straightforward observation. This is the same as saying that mutants are not independent, or that their energetic costs are not additive. This follows obviously from the observation that the nucleotides being mutated are base-paired.

      Making explicit reference to “epistasis” is a considered choice. Framing results in terms of epistasis might be less familiar to readers grounded in RNA or protein biophysics/biochemistry, but is very much at the heart of thinking about the genotype-phenotype relationship from an evolutionary perspective, where global descriptions of epistasis are commonplace and usually provide the starting point for thinking about genotype-phenotype relationships, evolution and evolvability. So what seems unnecessarily obscure when seen through the lens of one field, is natural when considered in the context of another. Importantly, it is also the central approach adopted by many if not most prior deep mutational scanning studies (see e.g. Hayden et al. 2011; Pressman et al. 2019; Zhang et al. 2009; Li et al. 2016; Puchta et al. 2016; Domingo et al. 2018; Li and Zhang 2018; Weinreich et al. 2013; Lalić and Elena 2015; Bendixsen et al. 2017 as cited on page 3 of the manuscript) so we think this framing is helpful to compare our results to prior work.

      We expect that the readership will include many researchers interested in mapping genotype-phenotype-fitness relationships who will expect to see global analyses and descriptors of the type we present. We will, however, revise the manuscript to ensure that our description of the findings remains accessible to readers from other fields.

      More specifically, we note that the non-independence of mutations (i.e. the existence of epistasis) might follow trivially from the fact that P1ex is a base-paired helix. The magnitude and direction (“sign”) of epistasis, however, are not trivial. In fact, as we describe, contrary to prior DMS on RNA helices, we find a lot of positive epistasis, reflecting, as we argue, selection against excess stability of P1ex to allow subsequent formation of P10.
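
      To make the notion of the sign of epistasis concrete, the following sketch computes pairwise epistasis as the deviation of a double mutant from the expectation under independent (multiplicative) single-mutant effects, on a log2 scale. This is one common textbook definition and is offered as an assumption for illustration; it is not necessarily the measure used in the manuscript, and the function name and fitness values are hypothetical.

```python
import math


def pairwise_epistasis(w_ref, w_a, w_b, w_ab):
    """Log-scale epistasis between two mutations A and B.

    All arguments are positive fitness values: reference genotype, the two
    single mutants, and the double mutant. Under a multiplicative null model
    the log effects of A and B add, so the expected double-mutant effect is
    their sum. A positive result means the double mutant is fitter than
    expected (positive epistasis); a negative result means it is less fit.
    """
    expected = math.log2(w_a / w_ref) + math.log2(w_b / w_ref)
    observed = math.log2(w_ab / w_ref)
    return observed - expected


# Hypothetical compensatory pair: each single mutant disrupts a base pair,
# while the double mutant restores pairing and with it most of the fitness.
compensatory = pairwise_epistasis(w_ref=1.0, w_a=0.5, w_b=0.5, w_ab=0.9)
```

      Under this definition, non-independence alone only says that the value differs from zero; whether it is positive or negative, and by how much, is the additional information.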

      The novel information is the sensitivity of fitness to base pairing. This is best shown in an analysis like Figure 3A (see below), not broad measures of epistasis.

      Please see responses to points #11, #12, and #16 above for an elaboration of what we consider to be the main merits of this study and why providing broad measures of epistasis is a sensible choice.

      Figure 1C isn't necessary for the reader to understand the process.

      We are happy to follow editorial guidance as to whether this panel is superfluous and should be removed or is worth including.

      It is unclear what figure 2C is showing. It appears that the replicates are similar to each other, that 30 deg C and 37 deg C are also similar, but that +/- Kan are different. This probably doesn't need a figure in the main text.

      This figure does indeed capture what the reviewer describes: genotype pools in +/-kan are least similar to each other, while 30/37°C are similar but distinct in the +kan condition and effectively indistinguishable in the -kan condition, in line with expectations. We agree with the reviewer that this information per se is something that would typically be found in a supplementary figure. However, we would advocate for retention of this panel in the main manuscript in this instance because of the way in which it was derived: using the Bray-Curtis dissimilarity index. To our knowledge, this is the first time that Bray-Curtis dissimilarity has been used to quantify, in a principled way, the similarity between genotype pools. Borrowed from the ecology literature, the index captures both richness (number of different species/genotypes in the ecosystem/genotype pool) and relative abundance to provide an integrated measure of genotype diversity. We believe that this measure will be useful for future studies and rather than relegating the figure to the supplement, we would aim to briefly highlight its methodological novelty.
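
      For readers unfamiliar with the index, a minimal sketch of the standard Bray-Curtis dissimilarity applied to two genotype pools is given here. The dictionary-of-counts input and the function name are assumptions for this example, and whether raw or normalised counts were used in the manuscript is not specified in this response.

```python
def bray_curtis(pool_a, pool_b):
    """Bray-Curtis dissimilarity between two genotype pools.

    Each pool maps a genotype (e.g. a sequence string) to a non-negative
    abundance. The value is sum(|a - b|) / sum(a + b) over all genotypes:
    0 for identical pools, 1 for pools that share no genotypes.
    """
    genotypes = set(pool_a) | set(pool_b)
    difference = sum(abs(pool_a.get(g, 0) - pool_b.get(g, 0)) for g in genotypes)
    total = sum(pool_a.get(g, 0) + pool_b.get(g, 0) for g in genotypes)
    if total == 0:
        raise ValueError("both pools are empty")
    return difference / total


# Made-up pools: same genotypes, shifted abundances.
pool_plus_kan = {"ACGU": 6, "GGCC": 4}
pool_minus_kan = {"ACGU": 4, "GGCC": 6}
dissimilarity = bray_curtis(pool_plus_kan, pool_minus_kan)
```

      Because absent genotypes count as zero abundance, the index responds both to genotypes that are lost from a pool and to shifts in the abundance of those that remain.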

      Figure 3A could be the most informative part of the manuscript. However, predicted minimum free energy should be on the x-axis as the independent variable. The expectation then is that you would see a peak in fitness at some free energy, with fitness falling off both with increased and decreased stability. Furthermore, there should be more analysis along these lines. The authors should calculate helical stability for both P1ex and P10 for every mutant and compare with fitness. Mutations which affect both could also be separated out. Figure 4C comes the closest to this but views it only in terms of GC pairs; there is no reason not to quantify the energetic effects given that predictions of stability for helices is quite good. Deviations from a model invoking only helical stabilities would indicate another factor is involved (alternative base-pairing or tertiary structure, for example).

      We agree with the reviewer that the axes in Figure 3A should be flipped and we will do so in the revised manuscript. We also agree that, when it comes to helical stability of P1ex, the simple expectation would be to see a peak at a certain stability with drop-offs either side, as intimated by Figure 4C. We further agree with the reviewer that Figure 4C is rather indirect and can be made more quantitative by considering helical stability across all genotypes directly. To this end, we will use one of the many tools available that allow prediction of helical stability from primary sequence (e.g. the efn2 function in RNAstructure, as used by Torgerson et al 2018 RNA, see point #24 below) and replace Figure 4C with a more quantitative fitness landscape based on these computations. To provide added confidence in the computations of helical stabilities from primary sequence in the context of our structure, we will also calculate helical stabilities from molecular dynamics simulations for the subset of genotypes we considered previously (Figure 4E/F) and see how inferred stabilities compare.

      There appears to be a missing verb in the legend for figure 3A, second sentence.

      We will fix this error.

      Figure S5 appears to be redundant with Figure 1.

      At first glance, Figure S5 does indeed appear redundant with Figure 1 but it is not. Figure S5 shows the relevant sequence of the group I intron and bordering exons in its native context, i.e. when embedded in the 23S ribosomal RNA gene of Tetrahymena thermophila, whereas Figure 1 shows the genotype of the mutant intron embedded in knt. The sequences are different. We will revise the legend to Figure S5 to make this clearer.

      Figure S6 is a better analysis than what appears in the main text, and could be expanded to all base pairs.

      We will expand Figure S6 to include all base pairs as suggested. We disagree that this is a better analysis compared to what appears in the main text. Rather, it provides a complementary, hypothesis-driven view whereas the analysis in the main text is more systematic and unbiased in approach.

      Reviewer #2 (Significance (Required)):

      This manuscript largely focuses on the technical approach. The shift in analytic strategy described above would increase the conceptual impact. The conclusions are consistent with and fit in with recent uses of high-throughput sequencing to study RNA systems. For example Pitt & Ferré-D'Amaré, Science (2010) and Kobari et al, NAR (2015) describe fitness landscapes of the ligase and HDV ribozymes, respectively. Torgerson et al RNA (2018) make similar measurements on the glycine riboswitch, including a treatment of relative helix stability for two mutually exclusive conformations. The overall results are of interest to researchers in the field of noncoding RNA.

      We thank the reviewer for highlighting the paper by Torgerson et al, of which – embarrassingly – we were not aware. We will make reference to this paper in a revised manuscript and highlight that riboswitches might be a good model system to further explore asymmetric constraint and selection against excess stability in an evolutionary context (also see our response to point #9 above).

      As highlighted earlier, we think the main conceptual impact of our work lies not in the description of helical stabilities. Rather, it lies in a) providing a rigorous proof-of-principle that deep mutational scanning can capture multiple conformational states simultaneously, and b) that, using an unbiased machine learning approach, these states can be deconvoluted from a single fitness landscape to attribute the fitness impact of individual mutations to specific RNA conformations. A shift in analytical strategy to “cut to the chase” and narrowly focus on helical stability would be misguided in this context, as we seek to provide not only insights into the data at hand but also lay out a sound and general recipe for analysing similar datasets in the future.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The manuscript by Soo et al probes the effect of mutations on the fitness of the Tetrahymena Group I self-splicing intron. They used high-throughput sequencing to simultaneously identify the effect of every possible sequence in a 4-bp helix. The approach is sound and the conclusions are generally supported. However, the analysis seems overly complicated given the dataset. Both the analysis and the accompanying writing make it difficult to understand what seems to be a fairly clear conclusion - that the relative stabilities of two alternative RNA helices are important for splicing.

      Major Comments

      1. The authors state that this method can identify the impact of transient conformational states. However, the two conformational states in this study are not transient - in fact they are associated with two distinct chemical steps of splicing and are quite stable. It may be that the effect of important transient states would be observed, but this study does not demonstrate that.

      2. "Fitness" ends up being on an arbitrary scale, which impairs some analysis. A similar high-throughput sequencing pipeline could have been used to directly monitor splicing of every mutant, though at this point that is outside the scope of this study. Even with the arbitrary units, it would be clearer if more time were spent comparing fitness to base-pair stability on an individual basis, rather than the broad analyses. (See minor comments for details.)

      Minor Comments

      1. The sentence in the abstract beginning "Using an in vivo report system..." is very difficult to comprehend. This is due both to the length of the sentence and the word usage. The final sentence of the abstract is similarly difficult. In general, the writing overemphasizes complexity at the cost of clarity.

      2. Analysis of results in terms of "epistasis" obscures what could be a straightforward observation. This is the same as saying that mutants are not independent, or that their energetic costs are not additive. This follows obviously from the observation that the nucleotides being mutated are base-paired. The novel information is the sensitivity of fitness to base pairing. This is best shown in an analysis like Figure 3A (see below), not broad measures of epistasis.

      3. Figure 1C isn't necessary for the reader to understand the process.

      4. It is unclear what figure 2C is showing. It appears that the replicates are similar to each other, that 30 deg C and 37 deg C are also similar, but that +/- Kan are different. This probably doesn't need a figure in the main text.

      5. Figure 3A could be the most informative part of the manuscript. However, predicted minimum free energy should be on the x-axis as the independent variable. The expectation then is that you would see a peak in fitness at some free energy, with fitness falling off both with increased and decreased stability. Furthermore, there should be more analysis along these lines. The authors should calculate helical stability for both P1ex and P10 for every mutant and compare with fitness. Mutations which affect both could also be separated out. Figure 4C comes the closest to this but views it only in terms of GC pairs; there is no reason not to quantify the energetic effects given that predictions of stability for helices is quite good. Deviations from a model invoking only helical stabilities would indicate another factor is involved (alternative base-pairing or tertiary structure, for example).

      6. There appears to be a missing verb in the legend for figure 3A, second sentence.

      7. Figure S5 appears to be redundant with Figure 1.

      8. Figure S6 is a better analysis than what appears in the main text, and could be expanded to all base pairs.

      Significance

      This manuscript largely focuses on the technical approach. The shift in analytic strategy described above would increase the conceptual impact. The conclusions are consistent with and fit in with recent uses of high-throughput sequencing to study RNA systems. For example Pitt & Ferré-D'Amaré, Science (2010) and Kobari et al, NAR (2015) describe fitness landscapes of the ligase and HDV ribozymes, respectively. Torgerson et al RNA (2018) make similar measurements on the glycine riboswitch, including a treatment of relative helix stability for two mutually exclusive conformations. The overall results are of interest to researchers in the field of noncoding RNA.

      Our expertise is in RNA biochemistry and biophysics. We are not qualified to evaluate the details of several of the computational pipelines described.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors constructed a virtually complete fitness landscape of the P1 extension region (4-base-paired helix) in the group I intron from Tetrahymena thermophila, using a kanamycin resistance reporter to evaluate the fold-change in fitness, which is related to self-splicing activity. This was a clever choice of system because it was known from earlier work that the P1 extension adopts two different conformations during self-splicing. The fitness of each variant was determined from the number of reads acquired from the sequencing data sets and analyzed through an extensive computational pipeline.

      The strength of the paper is that this machine learning approach can be used to calculate how individual variants contribute to the fitness landscape and assess the directions of epistasis across a large number of identified genotypes. The authors argue that machine learning more successfully models subtle effects that arise from interactions between RNA residues, and that the power to analyze deep mutational sequencing experiments can better rationalize fitness constraints arising from multiple conformational states. The results are mostly consistent with previous studies even though the authors collected the data in a more advanced and complicated way. They are also able to rationalize complex phenotypes - for example, the observed fitness defects are more prevalent under an unfavorable growth condition (30°C), because the lower temperature hinders conformational exchange. Although such cold sensitive effects are well known in RNA, it is gratifying that this can be captured in the fitness landscape.

      Despite these strengths, there are several weaknesses that should ideally be addressed before publication.

      1. The results would be more convincing if the authors directly measure the self-splicing activity of a few key variants, such as the C2C21 mutant, to determine whether these mutations alter the self-splicing mechanism of the Tte-119(C20A) master sequence in the way that they infer from their model. In interpreting their results, they may want to consider misfolding of the intron core (coupled to base pairing of P1) and reverse self-splicing. Reversibility in the hairpin ribozyme, for example, turned out to be the key for understanding the effects of certain mutations.

      2. Related to the point above, interesting conclusions regarding the relationships between base identity and epistasis that arise from metastability should be strengthened with additional examples. For example, the authors can explain why a reverse base-pairing variant (C3G20) exhibits negative epistasis but is not similar to that of the G3C20 construct. This would ideally use the data from the screen but also be validated by checking the self-splicing activity of a few individuals at low and high temperature.

      3. They should validate the screen by showing that kanamycin resistance does indeed correlate strictly with self-splicing activity, and not some other feature such as RNA turnover. (It would also not be a bad idea to check this in the cell, which can be done by primer extension or Northern blotting.)

      4. The benefit of the machine learning model is that it can extract signals that may be hard to detect otherwise. The downside is that it doesn't produce a physical model, as far as I am aware. The parameters are themselves not meaningful - except to the degree that trends in the fitness estimates can be explained after the fact. This is something that should ideally be explained more directly in the manuscript.

      5. The authors claim that by evaluating a large number of sequences at two conditions, they can capture variants with intermediate phenotypes (Fig. 1). This is not necessarily true. If the original screen allows only the most active variants to survive on kan+ medium, then the signature of intermediate phenotypes may not be encoded in the original data, and thus not retrievable even with sophisticated algorithms, which may also be prone to overfitting. At what limit of stringency will the screen fail to yield information about intermediate fitness? How deeply must one sequence to recover this information, especially if noisy or degraded? Some discussion of these effects would be helpful.

      6. Lastly, the evolvability of RNA is fascinating and there is much to learn. However, the authors don't discuss the implications of their findings for molecular evolution although they throw the term around. It would be exciting if there is a trend in the fitness landscape that could help explain the trajectory of RNA evolution in nature.

      7. The authors use the abbreviation DMS for deep mutational scanning; the RNA structure field uses the reagent dimethylsulfate that is also abbreviated DMS. They may want to choose a different acronym or just avoid an acronym altogether.

      Significance

      As the importance of RNA structure for gene expression becomes more widely appreciated, interest in understanding the evolution of RNA structures is also increasing. Compared with the molecular evolution of proteins, evolution and fitness in RNA is far less understood, although the authors appropriately point to a number of recent studies on this topic. The main advance here is to use machine learning methods to analyze the results of a large genotypic screen, with the goal of more accurately capturing the fitness effects of sequences at varied distances from the parental sequence. The specific conclusions reached here such as the importance of metastability or the prominence of cold sensitive effects are not revolutionary, but the authors illustrate how such phenomena can be investigated more systematically and in more depth.

  3. Sep 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Dear reviewers,

      Thank you very much for your constructive and helpful remarks and suggestions!

      We marked the changes in the manuscript in yellow.

      Our replies to the specific points:

      Reviewer #1 In the Introduction the authors need to cite earlier work in Chlamydomonas which first showed that binding of specific proteins to the psbA 5'UTR is correlated with increased translation in the light (Danon et al. 1991).

      As suggested, we added the reference to the introduction.

      Reviewer #1 The paper could be improved by testing for protein binding to the footprint region in high vs low light. An obvious candidate is HCF173.

      We agree that HCF173 is an obvious candidate, although its interaction could be mediated via additional proteins. Alice Barkan’s group has demonstrated that in maize HCF173 binds to the same region upstream of the translation initiation region (McDermott et al., 2019) where we detected a footprint (Supplemental Figure S11A-D). Furthermore, McDermott et al. showed that the binding sequence is conserved. We would like to analyze this question in more detail, but we currently have no approach available in the lab to specifically isolate psbA mRNA with its bound proteins for this analysis and therefore have to postpone the answer to this question to future studies.

      Reviewer #2: **Important changes to make before full submission:** 1) It is becoming clear that the translation efficiency (TE) is often not a calculation of translational output from specific mRNAs but in fact is better to be described as ribosome association. There can be many reasons for increased ribosome association including ribosome stalling and increased translational engagement. It would be good for the authors to add a simple Western blot to demonstrate directly increased protein output from psbA during high light as compared to low light treatments. This figure could be added to Figure S1.

      We want to stress that we have chosen a condition that is well known to increase psbA translation in higher plants as shown in the literature with different methods (e.g. Chotewutmontri and Barkan, 2018; Schuster et al., 2020). The protein encoded by psbA, the D1 subunit of photosystem II, has an increased turnover in high light, i.e. a higher amount of D1 has to be produced to compensate for the increased degradation of photodamaged D1 (Mulo et al., 2012; Li et al., 2018).

      Although there is a lot of evidence in the literature for good correlation of translation efficiency as determined by ribosome profiling and protein synthesis, the reviewer raised a valid concern. Ribosome pausing or even ribosome stalling could also cause increased ribosome binding and thereby increased amounts of ribosome footprints. Therefore, we analyzed ribosome pausing in selected genes including psbA and rbcL. The pattern of ribosome pausing was very similar in low and high light (new Supplemental Figure 14), which rules out any ribosome stalling at specific sites or drastic changes in ribosome pausing. To analyze if there is increased ribosome pausing, we determined the fraction of footprints at pause sites compared to the total number of footprints. We used two different pause scores as cutoffs to determine pause sites. To include as many pausing events as possible, we used a pause score of 1, i.e. everything higher than the mean ribosome density per nucleotide of the corresponding coding region (Gawronski et al., 2018). This fraction was unaltered in low and high light (new Supplemental Figure 14). With a more stringent pause score of 20 (20 times higher ribosome density than the mean), an increase of ribosome pausing in high light was detected for psbA, whereas we did not find differences between high and low light for rbcL and psaA. However, this increase in pausing at the psbA mRNA is insufficient to explain the increase in the total amounts of ribosome footprints. Additional pause scores were tested; the value for the psbA fraction with a pause score of 20, included in Supplemental Figure S14, showed the largest difference.
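
      The pause-site fraction described above can be illustrated with a minimal sketch. It assumes a per-nucleotide list of footprint counts for one coding region; the function name `pause_site_fraction` and the toy coverage values are hypothetical and are not taken from the manuscript or its analysis scripts.

```python
def pause_site_fraction(coverage, pause_score_cutoff):
    """Fraction of footprints located at pause sites of one coding region.

    coverage: per-nucleotide footprint counts (hypothetical input format).
    pause_score_cutoff: a position counts as a pause site when its density
    divided by the mean density of the region exceeds this cutoff.
    """
    total = sum(coverage)
    if total == 0:
        return 0.0  # no footprints, so no pausing can be measured
    mean_density = total / len(coverage)
    # Sum the footprints at positions whose pause score exceeds the cutoff.
    at_pause_sites = sum(c for c in coverage if c / mean_density > pause_score_cutoff)
    return at_pause_sites / total


# Toy example: one strongly covered position in a five-nucleotide region.
toy_coverage = [1, 1, 1, 1, 16]
lenient = pause_site_fraction(toy_coverage, 1)    # cutoff as for pause score 1
stringent = pause_site_fraction(toy_coverage, 20)  # cutoff as for pause score 20
```

      Comparing this fraction between low and high light for the same gene, at both cutoffs, corresponds to the comparison described in the reply; normalization of read depth between libraries is deliberately left out of the sketch.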

      Reviewer #2: **Strongly suggested additions to the manuscript to improve its significance before publication** 1) Identifying the RNA-binding protein(s) (likely HCF173, which may be in a complex with other proteins) that interact with the 5' UTR of psbA in a high light-dependent manner would increase the significance of this study. Finding that this protein binds to other plastid transcripts with weak Shine-Dalgarno sequences would also be a nice addition to this study.

      See comment to reviewer 1. McDermott et al. (2019) describe HCF173 as relatively specific for psbA. Therefore, we do not assume that other genes with weak Shine-Dalgarno sequences are regulated via HCF173 but via different proteins using a similar molecular mechanism to influence the mRNA secondary structure at the translation initiation region.

      Reviewer #2: **Strongly suggested additions to the manuscript to improve its significance before publication** 2) Mutational analysis of the RBP binding site, and also changing the secondary structure around the start codon based on the new structure maps, to show the effects of these various changes on protein output would really provide important new findings on how important the RBP binding is compared with the RNA secondary structure changes for regulating protein output from psbA. It could also allow the demonstration of the dependence or independence of these two features in regulating translation from chloroplast mRNAs.

      We agree with the reviewer that this would be a very interesting study. Unfortunately, it requires a larger collection of lines with mutated psbA sequences. Plastid transformation in Arabidopsis thaliana is still technically demanding and time consuming. Even in the case of Nicotiana tabacum, for which plastid transformation is well established, such a project would likely need several years. We therefore think that such a study is beyond the scope of the current manuscript.

      Reviewer #3 1. In this paper, the authors mention that DMS can modify four nucleotides under alkaline conditions. Because the chloroplast is slightly alkaline, the authors use DMS reactivity from 4 nucleotides to model RNA secondary structure. Kevin Weeks's paper shows that, in cell-free conditions, DMS has very limited ability to modify single-stranded G and U compared to A and C (Anthony M. Mustoe et al., 2019, PNAS 116: 24574, fig. 1B). In Lars B. Scharff's paper, which is cited by the authors, it is also mentioned that A and C are more reliable for modelling RNA secondary structure. The authors might need to calculate the correlation between the DMS data and known RNA structure using G/U or all four nucleotides to show that DMS reactivity from G and U is also reliable to be used. Also, in Fig. S3B, the reproducibility of G/U between replicates is not as good as A/C. I don't think G and U can be used to predict RSS.

      We agree with the reviewer that DMS reactivities at G/U are less reliable than those at A/C. This was shown by Mustoe et al. (2019) and by us for chloroplast rRNAs (Gawronski et al., 2020, Plants). We included a correlation of the known 16S rRNA secondary structure and the DMS reactivities at the different nucleotides (Supplemental Figure S5A) that demonstrates that the DMS reactivities at G/U actually contain information about rRNA secondary structure. This analysis demonstrated again that the reactivities at G/U are less reliable than at A/C. Therefore, we added an analysis of the more reliable A/C for comparison with the results for all four nucleotides (Figure 1D-F, 3C-F).

      Reviewer #3 2. Is the 5'UTR the only region which has RSS changes? If not, how do RSS changes in other regions contribute to translation?

      Translation initiation in plastids is mainly influenced by the secondary structure of the translation initiation region, especially at the cis-elements required for the recognition of the start codon. In addition, we have analyzed different other regions, e.g. the coding regions, the coding regions without the sequences next to the start codon, the end of the coding region, and the complete 5’ UTR (Supplemental Figure S14). We added a more detailed analysis of the changes of secondary structure of the coding region of those genes we focus on (Supplemental Figure S16). This shows that the secondary structure changes of the complete coding region correlate negatively with translation efficiency (see also Supplemental Figure S14G). A similar observation was made in E. coli and explained to be caused by differences in translation initiation, which are mainly influenced by the secondary structure of the translation initiation region (Mustoe et al., 2018).

      Reviewer #3 3. In Fig. 2A and 2B, the DMS reactivities seem very similar under low light and high light. Why did the authors obtain significantly different RNA secondary structures? Are the parameters for low light and high light the same when modelling RNA structure?

      The parameters for the RNA secondary structure predictions in Figure 2 are not identical (see Figure legend). For all structure predictions, the DMS reactivities were used as constraints, but only for the high light structure was the sequence of the RNA binding protein’s footprint forced to be single-stranded. These structure predictions are included to illustrate the mRNA structures in the presence and absence of an RNA binding protein. These structures are based on the observation that the two halves of the stem loop structure have different DMS reactivities in response to high light. The sequence including the protein footprint has lower DMS reactivities in both low and high light. This is in agreement with both a double-stranded sequence as well as a protein-bound sequence. In contrast, the other half of the stem loop, the sequence including the cis-elements of the translation initiation region, has increased DMS reactivities in high light, indicating that it is single-stranded. This suggests that there is protein binding in high light preventing the formation of the inhibitory stem loop.

      Reviewer #3 4. In Fig. S12, the correlation between HL and LL in Ribo-seq and RNA-seq is high, which means no significant changes upon light change. In this paper, psbA should have a translation change under high light conditions. I suggest the authors label the dot representing psbA.

      Thank you very much for this suggestion! We marked psbA in the correlation plots (Supplemental Figure 12). The changes in the transcript levels are really minor, whereas for some genes the translation efficiency changes (see Figure 4 and Supplemental Figure S13).

      Reviewer #3 5. I suggest using plants at the same stage for DMS-MaPseq and SHAPE probing.

      The different plant material was chosen because of the different requirements during probing. In this context, we would like to point out that observing the same changes in the translation initiation region in response to high light in different developmental stages is a stronger confirmation than observing the same response at the same developmental stage. This indicates that the response is not specific for a developmental stage.

      Reviewer #3 6. In Huang's paper (Jianyan Huang et al., 2019, Cell Reports 29: 4186-4199), there are many differentially expressed genes under high light for 0.5 hr. However, in the RNA-seq data here, the correlation between high light and low light conditions is very high (Fig. S12). Why? Also, it would be nice if the authors could label several DEGs whose expression changes under high light treatment in Fig. S12.

      Supplemental Figure S12 contains only plastid-encoded RNAs, whereas Huang et al. (2019) focused on nuclear-encoded mRNAs. We clarified the figure legend of Supplemental Figure S12 by adding “of the plastid-encoded genes”. The values for the individual genes can be seen in Supplemental Figure S13.

      Reviewer #3 7. For the MNase footprint method, is the as-SD region the only region that shows enrichment under high light conditions? Besides, please provide the detailed method of the MNase footprint. Does it work for RNA footprinting?

      The methods used are described under “Ribosome profiling (Ribo-seq)” and “Processing of Ribo-seq and RNA-seq reads” in Material and Methods. The approach was very similar to the one used for ribosome profiling, with the difference that smaller read lengths were also included in the analysis (18-40 nt instead of 28-40 nt). We did this because many plastid RNA binding proteins have footprints that are smaller than a ribosomal footprint. The described footprint is the only one detected near the translation initiation region of psbA. Binding of HCF173 was detected by the Barkan group in the same region using a RIP-seq analysis combined with RNase I digestion (McDermott et al., 2019), which confirms that our approach is working. We added a reference to the method section in the results part to clarify which approach was chosen.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      RNA can fold into secondary and tertiary structures through base-pairing. RNA structure plays a crucial role in gene function and regulation, including transcription, processing, translation and decay. Plants acclimate to fluctuating light conditions to optimize photosynthesis and minimize photodamage. Translational regulation is known to be one strategy of this acclimation. It has been reported that translation of psbA, encoding the D1 reaction center protein of Photosystem II, is increased under high light conditions. The light-controlled psbA translation has been intensively studied and was suggested to be related to redox/thiol signals, the ATP status, and certain proteins. In this manuscript, Gawroński et al. explored the possible link between RNA secondary structure and translational efficiency. They adopted DMS-MaPseq and SHAPE-seq methods to profile the RNA secondary structure in the 5'UTR of psbA under low light and high light conditions. The results showed that the DMS and SHAPE reactivities of the Shine-Dalgarno (SD) sequence, start codon and as-SD region are higher under high light than under the low light control, indicating that the psbA translation initiation region becomes more single-stranded and accessible under high light. MNase digestion and DMS reactivity analysis suggested that protein binding might cause the change in RNA secondary structure of the psbA translation initiation region. In addition, the authors probed the RNA secondary structure of the translation initiation region of rbcL, which encodes the large subunit of Rubisco, and found no change in the RNA structure of rbcL, although the translation of rbcL is also increased under high light. To address why RNA structure changes are related to high light-dependent translational activation of psbA but not rbcL, plastome-wide translational efficiency and RNA structure were analyzed. The results showed a significant correlation between RNA secondary structure changes and translational efficiency changes in chloroplast-encoded mRNAs with weak SDs (such as psbA), but not in those with strong SDs (such as rbcL).

      The light-dependent translational activation of psbA is critical for maintaining photosynthetic homeostasis. Also, the molecular mechanism of RSS's impact on translation is still elusive. The topic of this study is very important. However, this study only describes the phenomenon of RNA secondary structure changes in the translation initiation region and does not give further evidence to validate the effect of these changes on the translational activation of psbA under high light. Besides, the evidence that protein binding causes the RNA structure changes is weak and unclear. In addition, there is much room for improvement in this work:

      1. In this paper, the authors mention that DMS can modify four nucleotides under alkaline conditions. Because the chloroplast is slightly alkaline, the authors use DMS reactivity from 4 nucleotides to model RNA secondary structure. Kevin Weeks's paper shows that, in cell-free conditions, DMS has very limited ability to modify single-stranded G and U compared to A and C (Anthony M. Mustoe et al., 2019, PNAS 116: 24574, fig. 1B). In Lars B. Scharff's paper, which is cited by the authors, it is also mentioned that A and C are more reliable for modelling RNA secondary structure. The authors might need to calculate the correlation between the DMS data and known RNA structure using G/U or all four nucleotides to show that DMS reactivity from G and U is also reliable to be used. Also, in Fig. S3B, the reproducibility of G/U between replicates is not as good as A/C. I don't think G and U can be used to predict RSS.

      2. Is the 5'UTR the only region which has RSS changes? If not, how do RSS changes in other regions contribute to translation?

      3. In Fig. 2A and 2B, the DMS reactivities seem very similar under low light and high light. Why did the authors obtain significantly different RNA secondary structures? Are the parameters for low light and high light the same when modelling RNA structure?

      4. In Fig. S12, the correlation between HL and LL in Ribo-seq and RNA-seq is high, which means no significant changes upon light change. In this paper, psbA should have a translation change under high light conditions. I suggest the authors label the dot representing psbA.

      5. I suggest using plants at the same stage for DMS-MaPseq and SHAPE probing.

      6. In Huang's paper (Jianyan Huang et al., 2019, Cell Reports 29: 4186-4199), there are many differentially expressed genes under high light for 0.5 hr. However, in the RNA-seq data here, the correlation between high light and low light conditions is very high (Fig. S12). Why? Also, it would be nice if the authors could label several DEGs whose expression changes under high light treatment in Fig. S12.

      7. For the MNase footprint method, is the as-SD region the only region that shows enrichment under high light conditions? Besides, please provide the detailed method of the MNase footprint. Does it work for RNA footprinting?

      Significance

      see above

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This study uses multiple high-throughput sequencing approaches to probe the secondary structure of the chloroplastic psbA mRNA during low and high light treatments. The authors are able to demonstrate a shift in secondary structure around the start codon of this mRNA in response to the high light treatment as compared to low light conditions. This structural shift is also accompanied by an RBP binding event that may also be involved in regulating the translation from this mRNA in response to high light. I think this study is very interesting and timely. However, I think determining the relative contributions of the secondary structure and RBP binding changes to potential increases in protein output from this mRNA in response to high light would improve this manuscript. I also think directly looking at protein levels through a straightforward Western blot to show increased psbA protein in response to high light treatment is an important addition to this study. I outline my few suggested experimental additions for this manuscript below.

      Important changes to make before full submission:

      1) It is becoming clear that the translation efficiency (TE) is often not a calculation of translational output from specific mRNAs but in fact is better to be described as ribosome association. There can be many reasons for increased ribosome association including ribosome stalling and increased translational engagement. It would be good for the authors to add a simple Western blot to demonstrate directly increased protein output from psbA during high light as compared to low light treatments. This figure could be added to Figure S1.

      Strongly suggested additions to the manuscript to improve its significance before publication

      1) Identifying the RNA-binding protein(s) (likely HCF173, which may be in a complex with other proteins) that interact with the 5' UTR of psbA in a high light-dependent manner would increase the significance of this study. Finding that this protein binds to other plastid transcripts with weak Shine-Dalgarno sequences would also be a nice addition to this study.

      2) Mutational analysis of the RBP binding site, and also changing the secondary structure around the start codon based on the new structure maps, to show the effects of these various changes on protein output would really provide important new findings on how important the RBP binding is compared with the RNA secondary structure changes for regulating protein output from psbA. It could also allow the demonstration of the dependence or independence of these two features in regulating translation from chloroplast mRNAs.

      Significance

      This study definitely focuses on a research topic that is currently of interest and highly timely.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript addresses the regulation of chloroplast translation, an important topic in chloroplast biology. The authors show that specific changes in the secondary structure of the 5'UTR of the psbA mRNA involving the Shine-Dalgarno sequence and the AUG initiation codon can be correlated with changes in translational efficiency during a low light to high light shift. Based on indirect evidence they propose that this may be caused by binding of specific proteins to this region. They also show that this correlation appears to be valid to some extent for other mRNAs with a weak SD sequence. The technical quality of this manuscript is excellent and the manuscript is clearly written.

      Additional remarks

      In the Introduction the authors need to cite earlier work in Chlamydomonas which first showed that binding of specific proteins to the psbA 5'UTR is correlated with increased translation in the light (Danon et al. 1991). The paper could be improved by testing for protein binding to the footprint region in high vs low light. An obvious candidate is HCF173.

      Significance

      This work provides valuable new insights into the molecular mechanisms involving the psbA 5'UTR in the initiation of chloroplast translation.

      This work will be of interest to a wide audience interested in the mechanisms of translational regulation.

      My expertise is in chloroplast biogenesis and in assembly and regulation of the photosynthetic apparatus

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Molenaars et al. describe a protocol to extract and quantify a wide range of polar and apolar metabolites from the same C. elegans sample using methanol-chloroform based phase separation. The authors assess the method across different input amounts, in comparison to a 1-phase extraction method and through metabolic perturbations using RNAi against several metabolic enzymes. Finally, they provide a metabolomics analysis of metabolite variation across several C. elegans strains. The data are of overall high quality and presented in a clearly written manuscript.

      We really appreciate the positive words from the reviewer.

      To help assess the value of the method relative to other approaches, several controls are suggested below:

      1. Fig. 1: Metabolite abundance in the polar phase should be compared to 1-phase extraction methods (analogous to Fig. 2I, which compares metabolites in the apolar phase to 1-phase extraction)

      We acknowledge the apparent asymmetry in the text; comparing our two-phase method to a single phase lipidomics method indeed suggests a similar comparison for metabolomics. However, our established polar metabolomics method has always been based on this exact two-phase extraction. The current method exclusively asks whether it is possible to integrate our dedicated lipidomics platform into our established two-phase polar metabolomics method, by utilizing the apolar phase that is usually discarded. This way, the method enables comprehensive metabolomics/lipidomics screening while limiting the need of culturing twice the amount of material.

      Our manuscript does not necessarily ask the more fundamental question of the advantages of a one-phase vs two-phase extraction for polar metabolites. Interestingly, the one-phase vs two-phase metabolomics methods have been compared previously, and the authors of that study showed that the two-phase method achieved broader metabolite coverage, satisfactory extraction reproducibility, acceptable recovery and safety (DOI: 10.1038/srep38885). This is most probably because the cHILIC column is sensitive to contamination, so excluding lipids from the samples is beneficial for measuring polar metabolites. We hence believe that developing a single-phase polar method would be superfluous for the purpose of this study.

      2. Are polar metabolites also detected in the apolar phase? Can the less hydrophobic lipids missing from the apolar phase be detected in the polar phase?

      This is an interesting question that mostly relates to the lyso-lipids that are not detected in the lipid phase of our two-phase extraction. The first point to make is that sample solvents that are used at the final stage of extraction are not compatible between methods. In other words, the solvent we normally use for the lipids phase (xxx) cannot be injected on the cHILIC column. So, in a practical sense, we would not be able to measure these compounds, even if they would technically be dissolved in the other layer. However, we tried a few different alternative approaches to get more information on this point:

      We have attempted to integrate the lyso-lipids in the cHILIC measurements, in the polar layer, using the polar sample solvents. This was unsuccessful; no reproducible peaks, not even the internal standards, were measured. We will include a note on these results in our manuscript.

      We have, albeit for a different sample matrix, attempted to dissolve both layers of the two-phase extraction in the cHILIC sample solvents. While we cannot guarantee this for all metabolites, it appears that most polar metabolites are exclusively found in the polar layer. We were not able to integrate even a single peak from any of the sugars, amino acids, nucleotides, etc. in the apolar layer dissolved in polar solvents.

      We have reconstituted both the polar and apolar layer of our two-phase extraction in 50:50 methanol:chloroform and analyzed them on the lipidomics platform. We did find that some of the lipid internal standards partition to the polar phase, especially LPG (and to a lesser extent LPE and LPA), compared to for instance PE, SM, PG and PC, which all end up in the apolar phase. We will include these data in the revised manuscript as a supplemental figure, as it demonstrates that the lyso-lipids are poorly measured in the two-phase extraction. This is also why in the text we advise using the dedicated one-phase extraction when interested primarily in these species.

      3. Fig. 3l-n: The authors claim that extracting metabolites from the polar and apolar phases of the same sample leads to better cross-correlation than if metabolites are extracted from different samples using methods optimized for the respective metabolite classes. To provide experimental evidence, metabolite abundance should be compared directly when metabolites are extracted from the same or from different samples using suitable methods.

      We agree with this point. We will amend the text to not overstate these advantages.

      Reviewer #1 (Significance (Required)):

      The methodological and conceptual advancement of the present study is rather incremental. The authors essentially use the classical chloroform/methanol/water phase separation protocols developed by Bligh & Dyer and Folch, which have been used extensively for lipid extraction for many decades now. However, the effort to carefully measure the metabolites contained in the aqueous phase is laudable. For method validation, the authors use well-understood perturbations that yield predictable results. Overall, I consider the study more appropriate for a publication as a methods protocol, which could be of interest to the metabolomics community, rather than as a research paper.

      We agree; our goal was indeed to create and share a method, and we will make sure to emphasize this in our cover letter.

      While the extraction method we use is not novel per se and based on classical extraction procedures, it is important to underscore that we are only now able to use these extractions in combination with high-resolution mass spectrometry. This opens new opportunities for basic discovery. The efficiency we achieve by using both phases of the two-phase procedure makes our method highly attractive for hypothesis generation, especially in sample sets where limited amounts of material are available.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors provide a detailed description of a method to analyse both polar as well as lipophilic metabolites from the same nematode sample. This provides significant advantages over methods using individual samples. Moreover, by using internal standards they establish an extremely good correlation of individual metabolites. This paper is of immediate importance for the worm community and beyond.

      We are very grateful to receive this positive response from the reviewer and for highlighting the advantages of our described method also beyond the worm community.

      **Major comments:**

      none

      **Minor comments:**

      The correction process using internal standards could be described in a bit more detail.

      In our revised manuscript, we will describe the internal standard use and corrections in more detail in the text. In summary: internal standards are selected for specific metabolites based on their Pearson correlation and %CV. Subsequently, metabolite peak areas are divided by the area of the appropriate internal standard. This corrects for any loss of sample during sample prep, for instance during the isolation of the two layers.
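
      The correction summarized above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy peak areas are hypothetical, and the exact way Pearson correlation and %CV are combined to choose a standard is not specified in this reply, so the sketch only computes both quantities and performs the division.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long series of peak areas."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def normalize_to_standard(metabolite_areas, standard_areas):
    """Divide each metabolite peak area by the internal standard area of the same sample."""
    return [m / s for m, s in zip(metabolite_areas, standard_areas)]


# Toy replicate injections in which sample loss scales both signals equally.
metabolite = [100.0, 80.0, 120.0]
standard = [50.0, 40.0, 60.0]

r = pearson_r(metabolite, standard)
ratios = normalize_to_standard(metabolite, standard)
cv_before = percent_cv(metabolite)
cv_after = percent_cv(ratios)
```

      In this toy case the standard tracks the metabolite perfectly, so the ratio is constant across replicates and the %CV drops to zero after correction; real data would show a partial reduction.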

      Jenni Watts has written a nice WormBook chapter on lipids which may be cited in addition to reference 17, since it covers many of the metabolites and related enzymes contained in this manuscript.

      We will include a reference to this WormBook chapter reviewing fat regulation in C. elegans in our paper; thank you for the suggestion.

      Reviewer #2 (Significance (Required)):

      see above

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript is well written and well considered. However, there is room for further improvement:

      We thank the reviewer for the positive response and for the suggestions raised.

      1) Author need to write exactly how many metabolites not just >, semi-quantitative analysis of >100 polar (metabolomics) and >1000 apolar (lipidomics) metabolites in C. elegans, for example they did with other papers in Table 1

      We understand that this might appear vague. The notation was a compromise, based on the following considerations:

      1. The maximum number of reported metabolites can differ from the number of analyzed metabolites in a specific experiment or even a specific sample. For instance, our method is perfectly capable of measuring creatine metabolism (we have standards for these metabolites and they can be reliably measured); however, we have not yet been able to detect these metabolites in C. elegans. Some mutants also lose abundance of a certain metabolite to the point of it not being reliably measurable, which means they are filtered out in the bioinformatics.
      2. Since the initial draft of our manuscript we have been able, and will continue to be able, to add new metabolites to our analysis, as we perform a full scan over the range of m/z 50-1200. Because of this, we felt it more accurate to state that we can measure >100 metabolites, instead of a specific number.

      2) Authors also need to clarify on number of samples in the result section while describing the statistical analysis.

      We understand this point raised by the reviewer and will specify not only the number of samples, but also that they are indeed biological replicates. This will be included in the figure legends.

      Reviewer #3 (Significance (Required)):

      This might be interesting paper for the research community who work with C.elegans (metabolism or in general)

      Thank you. We are in fact utilizing this double extraction for other non-worm samples such as mouse and human tissues, and we believe this could also benefit the research community beyond the model organism C. elegans.

      The authors must deposit the raw data and make it available for the public, so they could also benefit from this good work.

      It is our full intention to share our data in a convenient and standardized way through for instance the MetaboLights database (https://www.ebi.ac.uk/metabolights/). We agree and changes will be implemented as suggested.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The authors present a method for extraction of both lipid and polar metabolites from the model organism C. elegans. This extraction method is based on the well-established Bligh and Dyer method, with a slight modification to retain and utilize both the organic and non-polar fractions for LCMS analysis. They applied and tested this method against a monophasic extraction utilizing the same solvent system. They report that there is a loss of metabolites in the non-polar fraction to the polar fraction (of more polar metabolites) and small differences between the monophasic and biphasic extractions. They also expanded on the linearity of the extraction efficiency by increasing the number of worms. Further they applied the single extraction method to both knockdown mutants of C. elegans and Recombinant Inbred Lines derived from N2 and the natural isolate CB4856 to determine whether this method would still be able to differentiate the metabolome between the genetically different C. elegans populations.

      We thank the reviewer for their comments and suggestions.

      **Major comments:**

      *Are the key conclusions convincing?*

      As a whole the conclusions are convincing and valid.

      We appreciate that the reviewer considers our work convincing and valid.

      *Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?*

      The use of the adjective "robust" is, to an extent, erroneous. As defined, a robust method implies that the method is capable of withstanding small (deliberate or not) changes or variations. In this case the robustness of the method was not assessed and not clear how replication was carried out.

      We have in fact performed analysis on both biological replicates and repeated injections of pooled samples to determine robustness. We will clarify the biological replicates in the text and will place the pooled QC samples in the main text with additional explanation and relevant statistics such as % coefficient of variance (%CV) between them. For clarity, we plotted %CV of all polar as well as apolar metabolites. For polar metabolites 97% of the metabolites had a %CV lower than 30. For apolar metabolites 86% of the metabolites had a %CV lower than 30.
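
      As an illustration of the statistic quoted above, a minimal sketch of a %CV calculation over repeated QC injections, together with the share of metabolites below a 30% threshold. The function names and peak areas are hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption rather than a detail stated in the response.

```python
import numpy as np

def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

def fraction_below(cvs, threshold=30.0):
    """Share of metabolites whose %CV is below the threshold."""
    cvs = np.asarray(cvs, dtype=float)
    return float(np.mean(cvs < threshold))

# Hypothetical normalized peak areas from three injections of a pooled QC sample.
qc_areas = {
    "metabolite_a": [100.0, 102.0, 98.0],   # tight replicate agreement
    "metabolite_b": [50.0, 100.0, 150.0],   # poor replicate agreement
}

cvs = {name: percent_cv(areas) for name, areas in qc_areas.items()}
share = fraction_below(list(cvs.values()))
print(cvs, share)
```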

      *Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.*

      Reproducibility would need to be assessed/quantified to establish how robust the method is. Even though linearity with an increase in the number of worms is a good indication, it does not satisfactorily establish the robustness of the method. The use of replicates to assess the agreement between measurements (e.g. Bland-Altman plots), linearity as well as coefficients of variation (included in the sup material but not clear in the body of the manuscript) would characterize the methods best. The isolation of each variance originating from instrumental (pooled quality controls), biological (biological replication) and sample preparation (multiple extractions from the same biological source) is critical.

      We have these data and will elaborate on this in our revised manuscript. We will discuss the quality control samples more prominently in the main body of the manuscript, and show one or more figures that specifically address both analytical and biological variance (see rebuttal figure 2). In summary, we assessed this variance using (a) a repeated injection of a pooled QC sample, and (b) biological replicates prepared individually. The latter condition in particular, in which we assess biological variance, is representative of the actual method application. The %CV under these conditions is ≤20% for the majority of metabolites, which is why we consider our method robust.

      *Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      The suggested experiments are in fact just further analysis of the already collected data. There would be no need for further experiments; however, it is not clear whether pooled QCs or reference materials were used, or how many replicates there were per experimental design.

      All the data are available. These analyses will be included in the revision.

      *Are the data and the methods presented in such a way that they can be reproduced?*

      The methods are very well described. My only comment is to address how the replicates were grown/created and how many per strain/group. If the replicate measurements were done on the same samples (repeated injections), I believe that would weaken the findings (if not invalidate them altogether), however if these were biological replicates from independent starting populations the findings are valid and convincing.

      We performed bona fide biological replicates. We will explicitly mention this in the paper together with the other descriptions of our validation protocols.

      *Are the experiments adequately replicated and statistical analysis adequate?*

      As per my above comments.

      **Minor comments:**

      *Specific experimental issues that are easily addressable.*

      It is not clear how the sample preparation process was carried out (randomization, run order, QCs etc). As per the widely accepted guidelines from Broadhurst, D., Goodacre, R., Reinke, S.N. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). https://doi.org/10.1007/s11306-018-1367-3.

      We will provide details on the analysis itself in a table. In summary: Samples were measured in a random order, with blanks and QC samples throughout the run.

      *Are prior studies referenced appropriately?*

      A major reference that has applied this extraction method before in the same model organism is missing:

      Castro, C., Sar, F., Shaw, W.R. et al. A metabolomic strategy defines the regulation of lipid content and global metabolism by Δ9 desaturases in Caenorhabditis elegans. BMC Genomics 13, 36 (2012). https://doi.org/10.1186/1471-2164-13-36

      We will include this paper in our references. We would like to note though that this method requires not just an LC system to analyze lipids, but also GC with additional derivatization steps. Our method achieves comprehensive lipidomics using a single technique and no additional derivatization.

      Further a recent publication that goes beyond the work described by the authors using similar approach: MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses. Ernesto S. Nakayasu, Carrie D. Nicora, Amy C. Sims, Kristin E. Burnum-Johnson, Young-Mo Kim, Jennifer E. Kyle, Melissa M. Matzke, Anil K. Shukla, Rosalie K. Chu, Athena A. Schepmoes, Jon M. Jacobs, Ralph S. Baric, Bobbie-Jo Webb-Robertson, Richard D. Smith, Thomas O. Metz mSystems May 2016, 1 (3) e00043-16; DOI: 10.1128/mSystems.00043-16

      We will also include this paper, reporting 51 polar metabolites and 84 lipid species, in our references. While we recognize that they also make use of both phases and the protein pellet, we think our method is much more practical in several key ways:

      Our metabolomics platform provides twice as many species and our lipids platform exceeds their analytical capabilities 10-fold. This means a far better coverage of differences within metabolite and lipid classes, allowing for far more intricate patterns to be detected. We show this for instance in our plots comparing carbon chain length to degree of saturation (Fig 4 and S2 in original manuscript); a comparison that is only possible with the data density that our method offers. The MPLEx metabolomics method also requires the use of a GC system and derivatization steps, while our method does not, making it much more user-friendly and requiring only a single analytical system.

      *Are the text and figures clear and accurate?*

      Yes

      *Do you have suggestions that would help the authors improve the presentation of their data and conclusions?*

      The figures, overall are of exceptional quality.

      As per current scientific consensus, Box plots should also be overlaid with the actual datapoints (which was aptly done for the bar charts and other plots).

      The supplementary data even though comprehensive is hard to understand. A "readme" file detailing what data each file contains would improve readability and comply with FAIR principles.

      We agree that a readme file would make the supplemental data more understandable. We will provide such a file. For the box plots we will show the actual data points in our revised manuscript.

      Reviewer #4 (Significance (Required)):

      Even though the approach is not novel and has long been used in Natural Products Chemistry and in other organisms, it's highly significant to set an extraction method standard for the field of C. elegans metabolomics (including myself doing metabolomics and natural products chemistry with LCMS and NMR). However, this manuscript does not cover the technical aspects of the method with sufficient depth to hallmark this method as the standard for the field. Further information is needed to fill the missing gaps (as highlighted by the authors). Ratios between solvent and biological material amounts, reproducibility, recovery rates (even though buried in the supplementary files) and metabolite coverage are still missing.

      As a side note, the disparity between the monophasic and biphasic extractions could be overcome by a sequential extraction of the same sample, with no incurred cost on performance (and removing the much-dreaded pipetting uncertainty near the line between solvents). The second aspect of the manuscript, which initially was a welcoming idea (and important), became >50% of the manuscript creating a disconnect between the information set by the abstract and introduction and the results/conclusion. The work is extremely relevant in both sections of the manuscript, but the technical aspect is still lacking details and/or analysis.

      Strongly suggested: explicit compliance with the minimum reporting standards as per the Metabolomics Standards Initiative (MSI) and deposition of the data to a metabolomics repository (i.e. Metabolights or Metabolomics Workbench). These are internationally accepted requirements for metabolomics publications.

      We are aware that the extraction itself is an analytical chemistry staple. However, it is precisely in this fact that we find novelty. It should be noted that both of the other papers mentioned by the reviewers that have attempted to integrate lipidomics and metabolomics have had to resort to labor intensive (as well as possibly expensive and destructive) derivatization steps and a separate analysis on GC. Our method does not have these requirements. It is indeed a single and very common extraction, after which each dried phase is reconstituted and immediately injected. But this simplicity is not a concession, as our metabolome coverage is easily more comprehensive than the other mentioned methods. We therefore feel that this simplicity should not discount our currently presented method, but be considered an additional advantage.

      Sequential extractions may be an option to consider. However, we feel like they are less user friendly and unneeded. Because we use internal standards, it is never an issue to pipet slightly more or less of any particular sample; making it easy to avoid the line between solvents.

      We will explicitly clarify where we already comply with the standards (such as the analysis of biological replicates and repeated injection of a QC sample) and are confident we can add figures and further information such as deposition of our data to comply with the rest.

      REFEREES CROSS-COMMENTING

      Completely agree with reviewer #1's comments; they are on point and I completely missed them. They are relevant and should be addressed.

      Reviewer #2 points out work worth acknowledging; the internal standard work was quite thorough and well designed.

      Reviewer #3's comments and mine overlap nicely: both call for further description of samples/replication and deposition of data in a metabolomics repository.

      Further work is required to make this a good publication and a standard for the field. Without this extra work addressing the reviewers' comments, I feel this work could be to a certain degree misleading and/or incomplete, which puts its publication potential in question.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #4

      Evidence, reproducibility and clarity

      Summary:

      The authors present a method for extraction of both lipid and polar metabolites from the model organism C. elegans. This extraction method is based on the well-established Bligh and Dyer method, with a slight modification to retain and utilize both the organic and non-polar fractions for LCMS analysis. They applied and tested this method against a monophasic extraction utilizing the same solvent system. They report that there is a loss of metabolites in the non-polar fraction to the polar fraction (of more polar metabolites) and small differences between the monophasic and biphasic extractions. They also expanded on the linearity of the extraction efficiency by increasing the number of worms. Further they applied the single extraction method to both knockdown mutants of C. elegans and Recombinant Inbred Lines derived from N2 and the natural isolate CB4856 to determine whether this method would still be able to differentiate the metabolome between the genetically different C. elegans populations.

      Major comments:

      Are the key conclusions convincing?

      As a whole the conclusions are convincing and valid.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The use of the adjective "robust" is, to an extent, erroneous. As defined, a robust method implies that the method is capable of withstanding small (deliberate or not) changes or variations. In this case the robustness of the method was not assessed and not clear how replication was carried out.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Reproducibility would need to be assessed/quantified to establish how robust the method is. Even though linearity with an increase in the number of worms is a good indication, it does not satisfactorily establish the robustness of the method. The use of replicates to assess the agreement between measurements (e.g. Bland-Altman plots), linearity as well as coefficients of variation (included in the sup material but not clear in the body of the manuscript) would characterize the methods best. The isolation of each variance originating from instrumental (pooled quality controls), biological (biological replication) and sample preparation (multiple extractions from the same biological source) is critical.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The suggested experiments are in fact just further analysis of the already collected data. There would be no need for further experiments; however, it is not clear whether pooled QCs or reference materials were used, or how many replicates there were per experimental design.

      Are the data and the methods presented in such a way that they can be reproduced?

      The methods are very well described. My only comment is to address how the replicates were grown/created and how many per strain/group. If the replicate measurements were done on the same samples (repeated injections), I believe that would weaken the findings (if not invalidate them altogether), however if these were biological replicates from independent starting populations the findings are valid and convincing.

      Are the experiments adequately replicated and statistical analysis adequate?

      As per my above comments.

      Minor comments:

      Specific experimental issues that are easily addressable.

      It is not clear how the sample preparation process was carried out (randomization, run order, QCs etc). As per the guidelines widely accepted from -

      Broadhurst, D., Goodacre, R., Reinke, S.N. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). https://doi.org/10.1007/s11306-018-1367-3.

      Are prior studies referenced appropriately?

      A major reference that has applied this extraction method before in the same model organism is missing:

      Castro, C., Sar, F., Shaw, W.R. et al. A metabolomic strategy defines the regulation of lipid content and global metabolism by Δ9 desaturases in Caenorhabditis elegans. BMC Genomics 13, 36 (2012). https://doi.org/10.1186/1471-2164-13-36

      Further a recent publication that goes beyond the work described by the authors using similar approach:

      MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses Ernesto S. Nakayasu, Carrie D. Nicora, Amy C. Sims, Kristin E. Burnum-Johnson, Young-Mo Kim, Jennifer E. Kyle, Melissa M. Matzke, Anil K. Shukla, Rosalie K. Chu, Athena A. Schepmoes, Jon M. Jacobs, Ralph S. Baric, Bobbie-Jo Webb-Robertson, Richard D. Smith, Thomas O. Metz mSystems May 2016, 1 (3) e00043-16; DOI: 10.1128/mSystems.00043-16

      Are the text and figures clear and accurate?

      Yes

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      The figures, overall are of exceptional quality. As per current scientific consensus, Box plots should also be overlaid with the actual datapoints (which was aptly done for the bar charts and other plots). The supplementary data even though comprehensive is hard to understand. A "readme" file detailing what data each file contains would improve readability and comply with FAIR principles.

      Significance

      Even though the approach is not novel and has long been used in Natural Products Chemistry and in other organisms, it's highly significant to set an extraction method standard for the field of C. elegans metabolomics (including myself doing metabolomics and natural products chemistry with LCMS and NMR). However, this manuscript does not cover the technical aspects of the method with sufficient depth to hallmark this method as the standard for the field. Further information is needed to fill the missing gaps (as highlighted by the authors). Ratios between solvent and biological material amounts, reproducibility, recovery rates (even though buried in the supplementary files) and metabolite coverage are still missing.

      As a side note, the disparity between the monophasic and biphasic extractions could be overcome by a sequential extraction of the same sample, with no incurred cost on performance (and removing the much-dreaded pipetting uncertainty near the line between solvents).

      The second aspect of the manuscript, which initially was a welcoming idea (and important), became >50% of the manuscript creating a disconnect between the information set by the abstract and introduction and the results/conclusion. The work is extremely relevant in both sections of the manuscript, but the technical aspect is still lacking details and/or analysis.

      Strongly suggested: explicit compliance with the minimum reporting standards as per the Metabolomics Standards Initiative (MSI) and deposition of the data to a metabolomics repository (i.e. Metabolights or Metabolomics Workbench). These are internationally accepted requirements for metabolomics publications.

      REFEREES CROSS-COMMENTING

      Completely agree with reviewer #1's comments; they are on point and I completely missed them. They are relevant and should be addressed.

      Reviewer #2 points out work worth acknowledging; the internal standard work was quite thorough and well designed.

      Reviewer #3's comments and mine overlap nicely: both call for further description of samples/replication and deposition of data in a metabolomics repository.

      Further work is required to make this a good publication and a standard for the field. Without this extra work addressing the reviewers' comments, I feel this work could be to a certain degree misleading and/or incomplete, which puts its publication potential in question.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript is well written and considered. However, there is room for further improvements:

      1) Author need to write exactly how many metabolites not just >, semi-quantitative analysis of >100 polar (metabolomics) and >1000 apolar (lipidomics) metabolites in C. elegans, for example they did with other papers in Table 1

      2) Authors also need to clarify on number of samples in the result section while describing the statistical analysis.

      Significance

      This might be interesting paper for the research community who work with C.elegans (metabolism or in general)

      The authors must deposit the raw data and make it available for the public, so they could also benefit from this good work.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The authors provide a detailed description of a method to analyse both polar as well as lipophilic metabolites from the same nematode sample. This provides significant advantages over methods using individual samples. Moreover and by using internal standards they establish an extremely good correlation of individual metabolites. This paper is of immediate importance for the worms community and beyond.

      Major comments: none

      Minor comments:

      The correction process using internal standards could be described a bit more detailed.

      Jenni Watts has written a nice Worm Book chapter on lipids which may be cited in addition to reference 17, since it covers many of the metabolites and related enzymes contained in this manuscript

      Significance

      see above

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Molenaars et al. describe a protocol to extract and quantify a wide range of polar and apolar metabolites from the same C. elegans sample using methanol-chloroform based phase separation. The authors assess the method across different input amounts, in comparison to a 1-phase extraction method and through metabolic perturbations using RNAi against several metabolic enzymes. Finally, they provide a metabolomics analysis of metabolite variation across several C. elegans strains. The data are of overall high quality and presented in a clearly written manuscript.

      To help assess the value of the method relative to other approaches, several controls are suggested below:

      1. Fig. 1: Metabolite abundance in the polar phase should be compared to 1-phase extraction methods (analogous to Fig. 2I, which compares metabolites in the apolar phase to 1-phase extraction)

      2. Are polar metabolites also detected in the apolar phase? Can the less hydrophobic lipids missing from the apolar phase be detected in the polar phase?

      3. Fig. 3l-n: The authors claim that extracting metabolites from the polar and apolar phases of the same sample leads to better cross-correlation than if metabolites are extracted from different samples using methods optimized for the respective metabolite classes. To provide experimental evidence, metabolite abundance should be compared directly when metabolites are extracted from the same or from different samples using suitable methods.

      Significance

      The methodological and conceptual advancement of the present study is rather incremental. The authors essentially use the classical chloroform/methanol/water phase separation protocols developed by Bligh & Dyer and Folch, which have been used extensively for lipid extraction for many decades now. However, the effort to carefully measure the metabolites contained in the aqueous phase is laudable. For method validation, the authors use well-understood perturbations that yield predictable results. Overall, I consider the study more appropriate for a publication as a methods protocol, which could be of interest to the metabolomics community, rather than as a research paper.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their feedback and encouragement. We have now fully revised the manuscript to address all comments. Our specific responses are provided below and we have highlighted changes in the text. The major additions are:

      • analysis of simulated time-courses with lower temporal resolution
      • analysis of ex vivo PER2::LUCIFERASE SCN recordings
      • analysis of simulated time-courses with Poisson distributions of noise
      • plotted summary statistics for several figures
      • mathematical formula and explanation in the Methods

      Overall, these revisions have strengthened our findings and improved the manuscript, particularly in demonstrating that the issues with the chi-square periodogram are not specific to sampling interval or data type.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Tackenberg & Hughey investigate the reliability of a popular period estimation algorithm, the chi-square periodogram. They find a bias in the estimation, and through careful investigation identify the cause. This is a well executed and well presented study.

      **Comments:**

      In Figs 2+3 the authors show that the discontinuity in periodogram coincides with the number of complete cycles, K. However, in Fig 2C there are several other positions where K abruptly changes, but little effect on the chi-squared statistic is observed. Can the authors offer an explanation as to why the magnitude of the discontinuities differ?

      We have taken a closer look at how each component of the chi-square statistic calculation changes at points where K decreases, and have found that discontinuities do always occur at these points. In addition to the obvious effect of the K * N term on the sudden decreases, we found that the sum of squares of the column means alone (the primary component of the numerator) also changes abruptly at each transition point of K. As a result, the discontinuity magnitude is likely roughly proportional to the amplitude of the chi-square statistic at that point.
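
      For readers following this point, a minimal sketch of the statistic, assuming the commonly cited Sokolove-Bushell form Q_P = K·N·Σ_h (M_h − M)² / Σ_i (X_i − M)², in which the N samples are folded into K complete rows of P columns and M_h are the column means. The function name and the synthetic sine wave are illustrative only and are not taken from the manuscript or its software.

```python
import numpy as np

def chi_square_statistic(x, period):
    """Q_P for a test period given in samples (assumed Sokolove-Bushell form)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n // period                              # number of complete cycles, K
    folded = x[: k * period].reshape(k, period)  # K rows, P columns
    column_means = folded.mean(axis=0)           # M_h
    grand_mean = x.mean()                        # M
    numerator = k * n * np.sum((column_means - grand_mean) ** 2)
    denominator = np.sum((x - grand_mean) ** 2)
    return numerator / denominator

# Synthetic rhythm: 10 cycles of a sine wave with a true period of 24 samples.
t = np.arange(240)
x = np.sin(2 * np.pi * t / 24)

q_true = chi_square_statistic(x, 24)   # folding at the true period
q_off = chi_square_statistic(x, 20)    # folding at a wrong period

# K drops in integer steps as the test period grows, which is where
# the K * N factor changes abruptly.
complete_cycles = {p: x.size // p for p in (24, 26, 27)}
print(q_true, q_off, complete_cycles)
```

      In this toy case K falls from 10 to 9 to 8 between test periods of 24, 26 and 27 samples, so the K * N factor steps down at those points regardless of how the column means behave.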

      An important claim is that the discontinuity is observed in multiple software implementations. However, the plots of Supplementary Fig 1C,D are presented too small to evaluate this claim.

      In Supplemental Fig. 1C-D, the critical information is the shape of the periodogram and the presence of a discontinuity, so we believe the plot sizes are appropriate.

      It may be of interest to apply the algorithms to a single-cell experimental data set which are qualitatively different (e.g., oscillation shape, damping).

      We have created a new supplemental figure (Supplemental Fig. 8) by applying the strategy and visualization used in Fig. 6 to SCN PER2::LUC recordings instead of wheel-running data, and have updated the text accordingly.

      Reviewer #1 (Significance (Required)):

      It has been previously shown that the chi-square periodogram algorithm has performance shortcomings for the analysis of circadian data (e.g. Zielinski et al., 2004). However, this study demonstrates exactly why, giving more conclusive evidence to support the conclusion that it should be avoided. This will be useful to many in the mammalian circadian community. It should be noted however that other algorithms are already favoured by other clock communities (e.g. plant), even if a rigorous understanding of the biases were lacking.

      The methods developed here will be valuable for future comparisons of circadian algorithms. Of particular importance will be comparing algorithms for analysis of single-cell rhythms or non-stationary rhythms.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Chi-squared periodograms (CSP) are routinely used in circadian biology. In particular, this test has been used to determine circadian period in behavioral data (e.g. actigraphy) in mammals, flies and other species. This paper suggests that CSP, in some circumstances (e.g. where there are discontinuities), could be improved by changing the algorithm. The authors propose different steps to do this (e.g. using their greedy CSP code) and/or using alternative tests such as Lomb-Scargle.

      The authors use simulated data to demonstrate their findings, and whilst I can see the benefits of this, it would be useful to benchmark the algorithms on actual real-world circadian data (e.g. actograms from mouse or fly experiments). Although these types of data may not be publicly available, they are highly likely to be available from multiple labs in the circadian field. In particular, fly datasets will be abundant in many clock labs. This would aid the utility of the paper's findings for the field.

      Fig. 6 is entirely based on real-world circadian data (mouse wheel-running activity), as is the newly added Supplemental Fig. 8.

      Reviewer #2 (Significance (Required)):

      The paper is helpful for the circadian field when dealing with datasets that may contain discontinuities.

      It appears that the paper will be primarily useful for behavioral data, rather than, for example, transcriptomic time courses, since these tend to be much shorter and less sample intensive. Thus, it would be useful for circadian (and other) researchers analysing activity data in particular.

      My expertise is in circadian rhythms, both behavioural and molecular (e.g. sequencing) level analyses. Thus, I would be a possible end-user for the algorithms in this paper.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors identify a serious flaw in a popular method, the Chi-squared periodogram (CSP), for period estimation in circadian rhythms. They systematically get to the source of the problem -- a discontinuity in the test statistic. This flaw leads to a bias in the period estimate. They present two modifications to the CSP, one of which they prefer. Nevertheless, they show that other more flexible methods such as the Lomb-Scargle periodogram work well without this discontinuity (bias) issue.

      **Major Comments:**

      1. One thing the authors do not include is time-series lengths of non-integer days. Would it not be an interesting suggestion to choose a non-integer length time course, which is not a multiple of the periods of interest, and still continue using CSP as is? This is also rather counter-intuitive.

      Figs. 3A and 6 and the newly added Supplemental Fig. 8 use time-series lengths that are a non-integer number of (24-h) days.

      2. I suppose the authors use a sampling resolution of 6 min with wheel-running activity in mind. But it would be worth it in the interest of completeness to also consider a lower resolution. There is nothing in this study that ties it to the specific application, is there?

      Although a sampling resolution of 6 minutes is not specific to wheel-running activity, we have added an analysis identical to that of Fig. 5 but with a resolution of 20 minutes (Supplemental Fig. 5). Additionally, the PER2::LUC SCN recordings analyzed in Supplemental Fig. 8 have a sampling resolution of 20 minutes.

      3. The authors discuss only the mean absolute error in the text, but isn't the direction (sign) of the error also of interest? As far as I can see in Fig 5, conservative CSP overestimates and greedy CSP generally underestimates periods.

      We discuss both the error (references to Fig. 5A) and absolute error (references to Fig. 5B) in the text. We feel the interpretation suggested by the reviewer may be too reliant on the results of 3-day simulations, as the apparent underestimation by the greedy CSP appears far less substantial in simulations of 6 and 12 days.

      **Minor Comments:**

      1. I would like to see the formulae for the ratio of variances and p-values to be clear about how the authors computed the CSP. They describe it in words already, but I think some mathematics is warranted here.

      We have added the formula for the standard chi-square periodogram to the Methods section.
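
      The formula itself is not reproduced in this response. For orientation only, the form usually attributed to Sokolove and Bushell is sketched below; taking this to be the "standard" version is an assumption, and the symbols are chosen here for illustration.

```latex
Q_P = \frac{K N \sum_{h=1}^{P} \left(\bar{X}_h - \bar{X}\right)^2}
           {\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2}
```

      Here $P$ is the test period in samples, $K$ the number of complete cycles of length $P$, $N$ the number of observations, $\bar{X}_h$ the mean of column $h$ after folding the series into rows of length $P$, and $\bar{X}$ the overall mean. Under the null hypothesis of no rhythm, $Q_P$ is compared against a chi-square distribution with $P - 1$ degrees of freedom.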

      2. It is nice to see the raw data in the plots. But I would like to see a plot of the summary statistics (mean and variance/st. dev) for each of the scatter plots to judge the size of the bias. It is not easy to do this with the Excel sheet.

      We have overlaid a black circle representing the median and a vertical black line representing the 5th-95th percentile range onto Fig. 5 and Supplemental Figs. 3-7.
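
      For readers who want to compute the same kind of summary from the per-simulation values, a minimal Python sketch follows. The function name `summarize` and the example values are hypothetical; only the choice of statistics (median and 5th-95th percentile range) comes from the response above.

```python
import numpy as np


def summarize(values):
    """Median and 5th-95th percentile range of a set of period errors."""
    values = np.asarray(values, dtype=float)
    low, high = np.percentile(values, [5, 95])
    return {"median": float(np.median(values)), "p5": float(low), "p95": float(high)}


# Hypothetical errors (in hours) for one simulated condition.
errors = np.linspace(-0.5, 0.5, 101)
print(summarize(errors))
```

      A median with a percentile range is less sensitive to a few extreme period estimates than a mean with a standard deviation, which may be why it was preferred to the statistics the reviewer suggested.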

      Reviewer #3 (Significance (Required)):

      The authors present a sobering perspective on the chi-squared periodogram, which is still very popular among empirical biologists. They plainly show using artificial data that it is better to avoid the CSP when possible, although they suggest improvements to the CSP. The authors provide an R package to perform the analysis.

      There has been previous work that has highlighted other limitations of the CSP. This might be considered one more nail in the coffin of the CSP.

      I think this paper would be of interest to both computational biologists and wet-lab biologists, but I think it ought to have a greater influence on the latter as the former already resort to more sophisticated approaches.

      My expertise is in Computational and Theoretical biology.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors identify a serious flaw in a popular method, the Chi-squared periodogram (CSP), for period estimation in circadian rhythms. They systematically get to the source of the problem -- a discontinuity in the test statistic. This flaw leads to a bias in the period estimate. They present two modifications to the CSP, one of which they prefer. Nevertheless, they show that other more flexible methods such as the Lomb-Scargle periodogram work well without this discontinuity (bias) issue.

      Major Comments:

      1. One thing the authors do not include is time-series lengths of non-integer days. Would it not be an interesting suggestion to choose a non-integer length time course, which is not a multiple of the periods of interest, and still continue using CSP as is? This is also rather counter-intuitive.

      2. I suppose the authors use a sampling resolution of 6 min with wheel-running activity in mind. But it would be worth it in the interest of completeness to also consider a lower resolution. There is nothing in this study that ties it to the specific application, is there?

      3. The authors discuss only the mean absolute error in the text, but isn't the direction (sign) of the error also of interest? As far as I can see in Fig 5, conservative CSP overestimates and greedy CSP generally underestimates periods.

      Minor Comments:

      1. I would like to see the formulae for the ratio of variances and p-values to be clear about how the authors computed the CSP. They describe it in words already, but I think some mathematics is warranted here.

      2. It is nice to see the raw data in the plots. But I would like to see a plot of the summary statistics (mean and variance/st. dev) for each of the scatter plots to judge the size of the bias. It is not easy to do this with the Excel sheet.

      Significance

      The authors present a sobering perspective on the chi-squared periodogram, which is still very popular among empirical biologists. They plainly show using artificial data that it is better to avoid the CSP when possible, although they suggest improvements to the CSP. The authors provide an R package to perform the analysis.

      There has been previous work that has highlighted other limitations of the CSP. This might be considered one more nail in the coffin of the CSP.

      I think this paper would be of interest to both computational biologists and wet-lab biologists, but I think it ought to have a greater influence on the latter as the former already resort to more sophisticated approaches.

      My expertise is in Computational and Theoretical biology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Chi-squared periodograms (CSP) are routinely used in circadian biology. In particular, this test has been used to determine circadian period in behavioral data (e.g. actigraphy) in mammals, flies and other species. This paper suggests that CSP, in some circumstances (e.g. where there are discontinuities), could be improved by changing the algorithm. The authors propose different steps to do this (e.g. using their greedy CSP code) and/or using alternative tests such as Lomb-Scargle.

      The authors use simulated data to demonstrate their findings, and whilst I can see the benefits of this, it would be useful to benchmark the algorithms on actual real-world circadian data (e.g. actograms from mouse or fly experiments). Although these types of data may not be publicly available, they are highly likely to be available from multiple labs in the circadian field. In particular, fly datasets will be abundant in many clock labs. This would aid the utility of the paper's findings for the field.

      Significance

      The paper is helpful for the circadian field when dealing with datasets that may contain discontinuities.

      It appears that the paper will be primarily useful for behavioral data, rather than, for example, transcriptomic time courses, since these tend to be much shorter and less sample intensive. Thus, it would be useful for circadian (and other) researchers analysing activity data in particular.

      My expertise is in circadian rhythms, both behavioural and molecular (e.g. sequencing) level analyses. Thus, I would be a possible end-user for the algorithms in this paper.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Tackenberg & Hughey investigate the reliability of a popular period estimation algorithm, the chi-square periodogram. They find a bias in the estimation, and through careful investigation identify the cause. This is a well executed and well presented study.

      Comments:

      In Figs 2+3 the authors show that the discontinuity in periodogram coincides with the number of complete cycles, K. However, in Fig 2C there are several other positions where K abruptly changes, but little effect on the chi-squared statistic is observed. Can the authors offer an explanation as to why the magnitude of the discontinuities differ?

      An important claim is that the discontinuity is observed in multiple software implementations. However, the plots of Supplementary Fig 1C,D are presented too small to evaluate this claim.

      It may be of interest to apply the algorithms to a single-cell experimental data set which are qualitatively different (e.g., oscillation shape, damping).

      Significance

      It has been previously shown that the chi-square periodogram algorithm has performance shortcomings for the analysis of circadian data (e.g. Zielinski et al., 2004). However, this study demonstrates exactly why, giving more conclusive evidence to support the conclusion that it should be avoided. This will be useful to many in the mammalian circadian community. It should be noted, however, that other algorithms are already favoured by other clock communities (e.g. plant), even if a rigorous understanding of the biases was lacking.

      The methods developed here will be valuable for future comparisons of circadian algorithms. Of particular importance will be comparing algorithms for analysis of single-cell rhythms or non-stationary rhythms.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their comments and suggestions. Our responses to them are listed below. We are hopeful that they will be satisfied with our responses and the changes we made in the revised version of the manuscript.

      REVIEWER #1


      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this manuscript, Ameen and colleagues report the results of a multidimensional proteomic analysis which combined quantitative proteomics, phosphoproteomics and N-terminomics in an effort to identify neuronal proteins displaying altered abundance or modifications by proteolysis and/or phosphorylation following an excitotoxic insult. Excitotoxicity is known to initiate by over-activation of ionotropic glutamate receptors which allows an increase in intracellular Ca2+ , ultimately leading to activation of proteases. The analysis revealed that glutamate treatment for up to 240 min did not significantly affect the abundance of neuronal proteins but caused dramatic changes in the phosphorylation state of many neuronal proteins. Based upon the phosphopeptides and neo-N-peptides, which contain the neo-N-terminal amino acid residue generated through proteolytic cleavage of intact neuronal proteins during excitotoxicity, the authors identified the proteins that undergo phosphorylation, dephosphorylation and/or enhanced proteolytic processing in excitotoxic neurons. By combining different software packages, they found that these modified proteins form complex interactions that affect signaling pathways regulating survival, synaptogenesis, axonal guidance and mRNA processing. These data suggest that perturbations in the aforementioned pathways mediate excitotoxic neuronal death. Then, the authors showed by Western blot analysis that CRMP2, a crucial regulator of axonal guidance signaling, exhibited enhanced truncation and reduced phosphorylation at specific sites upon glutamate treatment. These events may contribute to injury to dendrites and synapses associated with excitotoxic neuronal death. Furthermore, the authors showed that calpains are responsible for the proteolytic processing and cathepsins for enhanced degradation of proteins during excitotoxicity. 
      Blockade of the calpain-mediated cleavage site of the tyrosine kinase Src during excitotoxicity confers neuroprotection in an in vivo model of neurotoxicity. In that regard, over twenty protein kinases are predicted to be activated in excitotoxic neurons. Collectively, this study contributes to the construction of an atlas of phosphorylation and proteolytic processing events that occur during excitotoxicity and as such they can be targeted for therapeutic purposes.

      **Comments**

      Comment: The identification of potential calpain cleavage sites in neuronal proteins modified during excitotoxicity is an interesting finding of the study. However, the atlas presented appears to miss components such as Kinase D-interacting substrate of 220 kDa (Kidins220), also known as ankyrin repeat-rich membrane spanning (ARMS), a protein recently shown to be cleaved by calpain during excitotoxicity (López-Menéndez et al, 2019, Cell Death and Disease 10, 535).

      Response: The calpain cleavage site of neuronal ARMS/KIDINS220 was mapped to the peptide bond between Asn-1669 and Arg-1670 (Gamir-Morralla, et al. (2015) Cell Death & Disease 6, e1939). The cleavage is expected to generate two truncated fragments: one of ~185 kDa and another of ~10 kDa at the N-terminal and C-terminal sides, respectively, of the cleavage site. Our TAILS analysis failed to detect the 10 kDa fragment which contains the neo-N-terminus generated by calpain cleavage. Here are the possible explanations:

      The neo-N-terminus of the 10 kDa C-terminal fragment is unlikely to be observed in our experiment as the TAILS method relies on the production of peptides by trypsin. The 10 kDa fragment has Arginine as the first amino acid, which means that the N-terminal peptide released and isolated by the TAILS method would be a single amino acid. In their publication, Gamir-Morralla, et al. showed that the total levels of both intact and degraded ARMS/Kidins220 decreased as a result of ischemic cerebral stroke, suggesting degradation rather than proteolytic processing to generate stable truncated fragments as the final outcome of calpain cleavage of ARMS/Kidins220 (Figure 2b of the publication by Gamir-Morralla, et al.). The TAILS method predominantly detects proteolytic processing whereas degradation can be more difficult to capture. Degradation often results in peptides containing fewer than 5-6 amino acids that are difficult to align with a single protein, or in transient peptides that may not be detectable in neurons at 240 min after glutamate treatment. Overall, it is possible that the Kidins220 fragment was generated but went undetected by the TAILS approach.


      Comment: The CRMP2 antibody (Cell Signalling, 35672) used for western blots (figure 5D, also figure S11) and immunofluorescence (figure 5E) is problematic. Copied from https://www.cellsignal.com/products/primary-antibodies/crmp-2-d8l6v-rabbit-mab/35672: Monoclonal antibody is produced by immunizing animals with a synthetic peptide corresponding to residues surrounding Ile546 of human CRMP-2 protein. The truncated CRMP2 (figure 5D) studied in the whole section (residues 1-516 or 1-517, ~57 kDa) cannot be recognized by this monoclonal antibody. The detected band with the red letters in figure 5D might represent another cleavage product. In any case, asking Cell Signalling for more information about the exact immunogen might help, but since it's monoclonal and derived from residues surrounding Ile546 it's very hard to include residues before aa516 and the unique epitope recognition upstream of aa516. The whole result section and discussion has to be reconsidered. Alternatively another antibody can be used to repeat those experiments in order to support the hypothesis. Time and resources are very familiar to authors since they have to repeat their previous work with a new antibody. Finally, there are no "western blot" and "immunofluorescence" methods for CRMP2.

      Response: We would like to apologise for incorrectly listing the catalogue number of the anti-CRMP2 antibody purchased from Cell Signalling technology. Rather than the rabbit monoclonal anti-CRMP2 antibody (Cell Signalling, Cat#: 35672), we used the polyclonal anti-CRMP2 antibody (Cell Signalling, Cat#9393) to perform all the Western blot and immunofluorescence analysis in this paper. The e-mail confirming the purchase of this antibody is appended. According to the vendor, the antibody was raised by immunizing rabbits with a synthetic peptide derived from the human CRMP2 sequence. We decided to order this antibody because Zhang, et al. (Sci Rep. 2016; 6: 37050) reported that it could detect the truncated CRMP2 fragments generated by calpain cleavage in primary cortical neurons in vitro in response to axonal damage.

      The procedures of Western blot and immunofluorescence detailing the correct CRMP2 antibody descriptions are added in the revised version of the submitted manuscript.


      Comment: The truncated DCLK1 bands detected in figure S8B cannot be attributed to the proteolytic processing of DCLK1 at the sites described: T311↓S312, S312↓S313 and N315↓G316 (predicted M.W. of the (C-terminal) products: 48.7-49.1 kDa (figure S8A), which is too close to be well separated with conventional PAGE). The number and the separation of the bands suggest other cleavage sites.

      Response: We agree with the reviewer’s comment that conventional SDS-PAGE cannot differentiate the proteolytic products generated by cleavage at the three sites identified by TAILS. Furthermore, the TAILS method cannot detect all peptides generated from a protein during proteolysis. Therefore, validating our results with a Western blot experiment may reveal unidentified peptides in certain cases. We have now added the following statement in the revised manuscript to reflect the presence of other cleavage sites: “Besides detecting the 50-56 kDa truncated fragments, the antibody also cross-reacted with several truncated fragments of ~37-45 kDa. These findings suggest that DCLK1 underwent proteolytic processing at multiple other sites in addition to the three cleavage sites identified by our TAILS analysis.”

      Comment: Could the striking observation that almost all proteolytic processing during excitotoxicity is catalyzed by calpains and/or cathepsins have derived (partially) from unspecific targets of calpeptin such as a subset of tyrosine phosphatases (Schoenwaelder and Burridge, 1999: approx. 1h treatment of fibroblasts with approx. 10x less concentration) or other(s)?

      Response: Schoenwaelder and Burridge (1999, JBC 274:14359) reported that calpeptin exhibits both protease inhibitor and protease inhibitor-independent activities in fibroblasts. Besides inhibiting calpains and cathepsins, they demonstrated that calpeptin could selectively inhibit a subset of membrane-bound tyrosine phosphatases. Since the TAILS method monitored the protease inhibitor activity of calpeptin, the proteolytic processing events mitigated by calpeptin in neurons during excitotoxicity are likely attributable to its protease inhibitor activity. Additionally, Schoenwaelder and Burridge reported this unconventional protease inhibitor-independent activity of calpeptin in fibroblasts. Since the protein tyrosine kinases expressed in neurons and fibroblasts are different, it is unclear if calpeptin can also exert such activity in neurons.

      Comment: Describing the final part of figure 4C the authors suggest that "Liver kinase B1 homolog (LKB1), CaM kinase kinase β (CaMKKβ) and transforming growth factor‐β‐activating kinase 1 (TAK1) are the known upstream kinases directly phosphorylating T172 of AMPKα to activate AMPK (Herrero-Martin et al., 2009; Woods et al., 2005; Woods et al., 2003). Our findings therefore predict activation of these kinases during excitotoxicity (Figure 4C)." The first question arising here is whether these three kinases are the only ones known to phosphorylate AMPKα. Even if this is true, it is highly speculative to suggest that the findings of the present study predict the activation of these kinases during excitotoxicity, without providing the necessary experimental data, since the increased phosphorylation of AMPK may be an indirect effect of the reduced function of a phosphatase. Thus the proposed model does not hold.

      Response: Agree. We have therefore revised our interpretation of the results to reflect this possibility. The revised sentence on page 13 reads “Liver kinase B1 homolog (LKB1), CaM kinase kinase β (CaMKKβ) and transforming growth factor‐β‐activating kinase 1 (TAK1) are the known upstream kinases directly phosphorylating T172 of AMPKα to activate AMPK (Herrero-Martin et al., 2009; Woods et al., 2005; Woods et al., 2003), while a member of the metal-dependent protein phosphatase (PPM) family could dephosphorylate T172 of AMPK in cells (Garcia-Haro et al., 2010). Our findings therefore predict activation of these kinases and/or inactivation of the PPM family phosphatase in neurons during excitotoxicity (Figure 4C).”

      Additionally, we also deleted the schematic diagram depicting the possibility of activation of LKB1, CaMKKβ and TAK1 in Figure 4 of the revised manuscript.

      **Minor points**

      Minor Comment: Highlights could present the key points of the study in a more straightforward manner.

      Response: Agree. We have edited the highlights in our revised manuscript to make them more straightforward.


      Minor comment: Figure 4A is too complicated. Proteins considered as hubs of signaling pathways in neurons should be somehow highlighted to distinguish them.

      Response: Agree. We have now highlighted the signalling hubs by shading them in green in the revised figure. As we merged figures 2 and 4 of the original manuscript, these signalling hubs are presented in Figure 2B of the revised manuscript.

      Minor Comment: The analysis of proteins with enhanced truncation and reduced phosphorylation such as CRMP2 and DCLK1 is fragmented. In addition, the authors should mention the criteria based on which these proteins were selected for further analysis.

      Response: IPA analysis revealed synaptogenesis and axonal guidance as the top-ranked perturbed canonical signalling pathways governed by neuronal proteins undergoing significantly increased proteolytic processing and altered phosphorylation. As CRMP2 and DCLK1 are the key players in these pathways, they were chosen for further biochemical analysis to validate the TAILS results. To address this point, we added a few statements in the sections describing results of biochemical analysis of CRMP2 and DCLK1 in the revised manuscript. The additional sentences on page 13 now read “IPA analysis of the significantly modified neuronal proteins identified in our study predicted perturbation of signalling pathways governing axonal guidance and synaptogenesis in neurons during excitotoxicity (Figure S7). Since CRMP2 (also referred to as DPYSL2) is a key player in neuronal axonal guidance and synaptogenesis (Evsyukova et al., 2013) and it underwent significant changes in phosphorylation state and proteolytic processing (Figures 5A and S7), it was chosen for validation of our proteomic results.” The additional sentences on page 15 read “Similar to CRMP2, DCLK1 is also a key player in regulation of axonal guidance and synaptogenesis (Evsyukova et al., 2013). Since our TAILS results revealed significant proteolytic processing of DCLK1 (Figure S8A), it was chosen for validation of our proteomic results.”

      Minor comment: The potential therapeutic relevance of phosphorylation and proteolytic processing events that occur during excitotoxicity can be further explored.

      Response: Thanks for the suggestion. We have added a paragraph describing the additional evidence for protein kinase inhibitors and cell-permeable inhibitors blocking calpain cleavage of specific neuronal proteins as potential neuroprotectants to reduce brain damage induced by ischemic stroke. The additional sentences near the end of the Discussion section (page 25) now read: “Since CRMP2 is a key player in axonal guidance and synaptogenesis, revealed by our proteomic analysis as the most perturbed cellular processes in excitotoxicity, blockade of its cleavage to form the truncated CRMP fragment is another potential neuroprotective strategy. Indeed, a cell-permeable Tat-CRMP2 peptide encompassing residues 491-508 close to the identified cleavage sites of CRMP2 could block calpain-mediated cleavage of neuronal CRMP2 and protect neurons against excitotoxic cell death (Yang et al., 2016).”

      The additional paragraph at the end of the Discussion section (page 25) now reads: “Besides the neuronal proteins undergoing enhanced proteolytic processing during excitotoxicity, protein kinases predicted by our phosphoproteomic results to be activated during excitotoxicity are also targets for the development of neuroprotective drugs. For example, our results demonstrated significant activation of neuronal AMPK during excitotoxicity, suggesting that aberrant activation of AMPK can contribute to neuronal death. Of relevance, small-molecule AMPK inhibitors could protect against neuronal death induced by ischemia in vitro, and brain damage induced by ischemic stroke in vivo. Likewise, inhibitors of Src and other Src-family kinases were known to protect against neuronal loss in vivo in a rat model of traumatic brain injury (Liu et al., 2008a; Liu et al., 2017). Future investigation of the role of the excitotoxicity-activated protein kinases in excitotoxic neuronal death will reveal if small-molecule inhibitors of these kinases are potential neuroprotective drug candidates.”

      Minor comment: I am sorry but I could not find Figure 8, which is supposed to show the "In vivo model of NMDA neurotoxicity" (please, see page 30).

      Response: Our apology for the mistake. This should be Figure 6 of the revised manuscript.

      Minor comment: Introduction: O'Collins et al., 2006; Savitz and Fisher, 2007; both references are missing.

      Response: This was an oversight on our part and the references have been added to the revised manuscript.

      Minor comment: Figure S1A-B: vehicle treatment time course is needed.

      Response: All neurons were cultured in neurobasal media for seven days. The control neurons were incubated in culture media while we started treating the other neurons with glutamate for MTT and LDH assay. The additional paragraph describing the design of the cell viability/death assays on page 32 reads “Primary cortical neurons were incubated for 480 min with and without the addition of 100 μM of glutamate. The control neurons were incubated for 480 min in culture medium. For neurons treated with glutamate for 30 min, 60 min, 120 min and 240 min, they were pre-incubated in culture medium for 450 min, 420 min, 360 min and 240 min, respectively prior to the addition of glutamate to induce excitotoxicity. For neurons treated with glutamate for 480 min, they were treated with glutamate just after seven days of culture in neurobasal media.”
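
      The timing design quoted above amounts to holding the total incubation at 480 min and subtracting the glutamate exposure. A minimal Python sketch of that arithmetic follows; the function name `preincubation_minutes` is hypothetical, and the only inputs are the times stated in the response.

```python
TOTAL_MIN = 480  # total incubation time stated in the response


def preincubation_minutes(treatment_min, total_min=TOTAL_MIN):
    """Minutes in culture medium before glutamate is added."""
    if not 0 <= treatment_min <= total_min:
        raise ValueError("treatment time must lie between 0 and total_min")
    return total_min - treatment_min


# Glutamate exposure times used in the viability assays.
schedule = {t: preincubation_minutes(t) for t in (30, 60, 120, 240, 480)}
print(schedule)
```

      Under this design every well, including the control, spends the same 480 min in the assay, so differences in viability are not confounded with time in culture.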

      Minor comment: Figure 5E: Control close-up is missing.

      Response: A close-up view of the control neurons is now provided in Figure 4E of the revised manuscript.

      Minor comment: "Moreover, the number of CRMP2-containing dendritic blebs in neurons at 240 min of glutamate treatment was significantly higher than that in neurons at 30 min of treatment (inset of Figure 5E)." Such a statistic is not shown in the graph.

      Response: The statistical analysis results are now added to the revised manuscript in Figure 5E.

      Minor comment: "Consistent with this prediction, our bioinformatic analysis revealed that the identified cleavage sites in most of the significantly degraded neuronal proteins during excitotoxicity are mapped within functional domains with well-defined three-dimensional structures (Figures 6A)." Authors might mean figure S12A?

      Response: Correct. Our apology for the mislabelling. This has been corrected to “S12A” in the revised manuscript.

      Minor comment: "Neuronal Src was identified by the three criteria of our bioinformatic analysis to be cleaved by calpains to form a stable truncated protein fragment during excitotoxicity (Figures 6A and Table S6)." Authors might mean figure 6D?

      Response: Correct. Our apology for the mislabelling. Since we merged Figures 2 and 4 of the original manuscript, this has been corrected to read “(Figure 5D)” on page 18 of the revised manuscript.

      Minor comment: Figure 2B: Clusters 1, 3, 4 and 6 do not follow treatment trends homogeneously in all time points. For example in cluster 1 there is a phosphopeptide following the pattern 1, 0, -1 and another one following the pattern 0, 1, -1, which is actually a very different pattern even if the end value is stable (-1). The first example could belong to cluster 6 as well, while the second example to cluster 5. Please elaborate on the rationale behind the categorization. Is there any other clustering method that can be used without making the categorization more complicated?

      Response: Since we merged Figures 2 and 4 of the original manuscript, this comment relates to the right panel of Figure 2A of the revised manuscript. The rationale behind the categorization of the phosphopeptides into six clusters was based upon the patterns of changes of their abundance (i.e. average of log-2 normalized z-score of phosphopeptide intensity) in three sample groups. We calculated the number of permutations where the number of sample groups in the set (n) = 3 (i.e. Control neurons, neurons of 30 min glutamate treatment and neurons of 240 min glutamate treatment) and the number of sample groups in each permutation (r) = 3 (i.e. all three sample groups should be present in each permutation). Hence the number of permutations is 6. The six clusters refer to the six possible permutations of the patterns of abundance changes of the identified phosphopeptides rather than the end results.
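
      The count of six follows from n! / (n - r)! with n = r = 3. The short Python sketch below enumerates the orderings, assuming each cluster corresponds to one ranking of the three sample groups by abundance; the group labels are illustrative, and the index printed here is not the cluster numbering used in the figure, which the response does not specify.

```python
from itertools import permutations
from math import factorial

groups = ("control", "30 min glutamate", "240 min glutamate")

# Number of ordered arrangements of r groups drawn from n: n! / (n - r)!
n = r = len(groups)
n_permutations = factorial(n) // factorial(n - r)

# Each ordering ranks the three groups from lowest to highest abundance.
orderings = list(permutations(groups))
for i, order in enumerate(orderings, start=1):
    print(i, " < ".join(order))
```

      Because the clusters are defined by rank order alone, two phosphopeptides with the same ordering but different magnitudes of change fall into the same cluster.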

      Minor comment: A problem of the manuscript is its length and lack of coherence. Apart from presenting the data from the proteomics, phosphoproteomics and N-terminomics analyses, the authors focus on several different proteins to perform validation experiments and further characterize the biological significance of their modification. Because these proteins do not fall on the same pathway, the authors end up presenting several independent stories that complicate the reader. Response: We agree that proteins that do not operate in the same signalling pathway were chosen for further biochemical analysis. Their choice was justified because they are key players in the most perturbed canonical signalling pathways identified by bioinformatic analysis with the IPA software. We agree that this may complicate the reader. However, it also helps to illustrate that excitotoxic neuronal death is a complicated cell death process caused by dysregulation of multiple neuronal proteins which regulate different cellular processes.

      Minor comment: Moreover, it is necessary for the authors to restructure their introduction, and avoid over-representing previous research on nerinetide, which is not used anywhere in the manuscript. Instead, the introduction must be more focused to better capture the necessity and essence of the present study. Response: We agree. Based on the reviewer’s comments, we decided to restructure the introduction by shortening the description of the results of Nerinetide research. Please refer to the track changes of the revised manuscript for the changes.

      Minor comment: Taking into account figures 1 and S2 I understand that the authors combined samples of neuronal cell cultures (treated or not with Glu) with samples from mouse brains (that have undergone ischemic stroke/TBI or sham operation). If this is the case, why did the authors do that? How did they combine the different samples? And why is this not mentioned anywhere in the main text? Response: For a data-independent acquisition (DIA) based mass spectrometry experiment, it is essential that we first generate a library of identifiable peptides using a standard data-dependent acquisition (DDA) approach. For the DIA experiment to work, the identified peptides have to be in that library. Excitotoxicity is a major mechanism of neuronal loss caused by ischemic stroke and traumatic brain injury. We therefore included the brains of sham-operated mice and the brains of mice suffering ischemic stroke and traumatic brain injury to construct the spectral libraries, which is why the library contains pooled samples from the representative samples. Pre-fractionation of the pooled peptides was also performed to increase the number of identifiable peptides and generate a deeper library.

      Once that library was generated, all samples were analysed individually as separate DIA experiments. The DIA approach then makes use of the generated library for identification and quantitation. This methodology allows deeper identification and fewer missing values. These statements were added to the Methods section of the revised manuscript (page 33).

      Minor comment: Regarding figure 5D, the authors write in the main text "Consistent with our phosphoproteomic results, the truncated fragment CRMP2 fragments could not cross-react with the anti-pT509 CRMP2 antibody (Figure 5D)" In the upper blot the truncated CRMP2 fragment runs well below the 70 kDa marker. However, in the middle panel, where we see the blot with the phospho specific antibody, the respective area of the blot has been cropped, so we cannot see whether the truncated fragment cross-reacts with the phospho specific antibody. Response: The western blots in Figure 5D of the revised manuscript are now less cropped and clearly demonstrate that the phospho specific antibody does not cross-react with the truncated fragment. Please refer to the revised Figure 5 for the updated western blot images.

      Minor comment: It is strange that only 1 and 13 proteins showed significant changes in abundance at 30 and 240min respectively. Especially after 240min of glutamate treatment one could expect that many proteins should change in their levels, since the neurons are almost diminished by cell death at that point. How could the authors explain this phenomenon? Additionally, in their previous publication, they showed that much more proteins change significantly in abundance following glutamate treatment (at 30min and 240min).

      Response: Even though our global spectral libraries contain over 49,000 identifiable peptides derived from 6524 proteins, only 1696 quantifiable proteins were identified in the DIA mass spectrometry analysis (Figure 1) because we used stringent criteria for their identification: (i) false discovery rate of We agree with the reviewer that many more proteins are expected to change their abundance at 240 min as significant cell death was detected. However, if we had used less stringent false discovery rates for their identification and quantification, included proteins with just one unique identified peptide and lowered the threshold of abundance fold changes, many more proteins with significantly changed abundance would have been detected. We preferred to use these stringent criteria to ensure high confidence in our identification of neuronal proteins undergoing significant changes during excitotoxicity.

      In agreement with the low number of neuronal proteins exhibiting significant changes in abundance reported in this manuscript, our previously published study (Hoque et al. (2019) Cell Death & Disease) detected only 26 neuronal proteins undergoing changes in abundance. Hence, we disagree with the reviewer that our previous publication reported many more proteins undergoing changes in abundance in excitotoxicity.

      Reviewer #1 (Significance (Required)): Comment on significance: The manuscript delivers a large amount of data, regarding changes in the proteome, the activation of specific kinases, phosphatases, as well as the molecular pathways that are activated at distinct time points of excitotoxicity. This information could be used in future studies to validate and develop potential therapeutic strategies that could protect against neuronal loss in various neurological disorders. Response: We are excited that Reviewer #1 felt that this large amount of generated data will be useful for subsequent studies to validate and develop novel therapeutic strategies.

      Comment on significance: The same group has very recently published a work very similar to the particular manuscript (Hoque et al. Cell Death and Disease, 2019). In their previous publication, the authors cover a large part of their current objectives. They performed again a proteomic and phosphoproteomic analysis of mouse primary cortical neurons treated with glutamate for distinct time points, in their aim to identify changes in expression and phosphorylation state of neuronal proteins upon excitotoxicity. Apart from the N-terminome, which they investigate in their current manuscript, the proteomic and phospho proteomic analysis are very similar. As such, and because of the fact that the current manuscript is very extensive, the authors should consider to minimize it, and include only their novel findings (changes in the N-terminome, the involvement of specific kinases that contribute to excitotoxic neuronal death, the regulatory mechanism of CRMP2, etc).

      Response: Since the coverage of phosphoproteins undergoing changes in neurons during excitotoxicity identified in the current study is much higher than that of phosphoproteins identified in our previously published study, we prefer to retain the description of the phosphoproteomic findings in this manuscript. Nonetheless, we agree that the manuscript needs to be shortened. Our suggestions to shorten the manuscript are listed below:

      1. Move the description and results of global proteomic analysis to supplementary information. Since we made the same observation that only a small number of neuronal proteins undergo significant changes in abundance during excitotoxicity in our previously published study, moving the global proteomic analysis results away from the main text will not adversely impact the quality of the presentation.
      2. Move the description of how we classified the identified N-terminal peptides as products of degradation or of proteolytic processing to the supplementary information.

      Comment on significance: The authors should describe in a simpler way the proteomic and bioinformatics analyses they are using in the manuscript. It is difficult to understand the methodology used if you are not an expert in proteomics and bioinformatics. My suggestion is to revise their text and make it simpler and more concise.

      Response: We agree with this criticism. As we are not allowed to make a major revision of the manuscript at this stage, the revised manuscript contains only minor revisions that address all of the comments and suggestions provided by the two reviewers. Further changes will be added in the next revised version. Our suggestions to further restructure the manuscript are listed below:

      Figure S5 depicting the rationale for classification of N-terminal peptides as products of degradation and those of proteolytic processing will be moved to the main text. The description of the rationale in the main text will be revised to help readers who are not experts in proteomics to better understand the rationale. A diagram depicting the workflow of our TAILS method will be added as a supplementary figure. For bioinformatic analysis of the proteomic results, we will provide in the supplementary information the definition of the following terms relevant to Ingenuity Pathway Analysis and PhosphoPath analysis of the perturbed biological processes and signalling pathways: (a) Canonical Signalling Pathways, (b) Cellular Processes and (c) Interaction Networks. A short description of how their identification benefits the mapping of the neurotoxic signalling networks in neurons will be provided in the supplementary information.

      REVIEWER #2


      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Comment: In this article, Ameen and collaborators identify the modified proteins during neuronal excitotoxicity by using an in vitro model in which mouse primary cortical neurons are treated for 30 and 240 min with 100 µM glutamate. They use different approaches: quantitative label-free global and phosphoproteomic methods and a quantitative N-terminomic procedure called Terminal Amine Isotopic Labeling of Substrates (TAILS). Results show that 240 min of glutamate treatment has minimal impact on protein abundance (13 neuronal proteins show significant changes) but modifies the phosphorylation state and proteolysis of nearly 900 proteins. A significant proportion of these proteins are involved in signalling pathways related to cell survival, synaptogenesis and axonal guidance.

      The paper is globally well written and experiments are convincing. The methodology and the analysis are well described and well explained. The text and each figure are clear and accurate. However, I have just one comment that needs answers and/or clarifications. Thanks for your work. Response: We appreciate the compliment provided by this reviewer on our submitted manuscript.

      **Minor comment:**

      Minor comment: Primary neurons are used at DIV7 and it has been shown that at DIV7 the percentage of astrocytes is relatively low; however, astrocytes play a key role in glutamate recapture and release. It would be relevant to know the percentage of glial cells in the authors' culture model and how astrocytes are involved in glutamate recapture and in excitotoxicity.

      Response: The compositions of the DIV7 cultures are: 94.1+/- 1.1 % neurons, 4.9%+/-1.1% astrocytes, and *

      Reviewer #2 (Significance (Required)):

      Comment on significance: Excitotoxicity is a cell death process involved in many neurological disorders. However, nowadays, there are no existent FDA-approved pharmacological agents targeted to protect against excitotoxicity leading to neuronal death. A better comprehension of excitotoxicity is required to improve prevention, therapy and reparation following the disease.

      With this work, the authors highlighted modified proteins in excitotoxic neurons. Interestingly, a few of these proteins are involved in cell survival, mRNA processing or axonal guidance. This atlas of phosphorylation and proteolytic processing events during excitotoxicity permits the identification of new therapeutic targets such as calpain-mediated cleavage of Src kinase. This atlas will interest many teams working on neurological disorders such as Alzheimer disease, Parkinson disease or stroke. It will help to better characterize the cellular/molecular events involved in neuronal loss and to find new therapeutic targets. Response: In response to this comment and a similar comment by Reviewer 1, we expanded the discussion to include the potential therapeutic values of our findings.

      Comment on significance: My field of expertise: Stroke, cell death, excitotoxicity, signalling pathways and molecular targets, autophagy. I don't have sufficient expertise to evaluate proteomic analysis.

      Response: No response is needed.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this article, Ameen and collaborators identify the modified proteins during neuronal excitotoxicity by using an in vitro model in which mouse primary cortical neurons are treated for 30 and 240 min with 100 µM glutamate. They use different approaches: quantitative label-free global and phosphoproteomic methods and a quantitative N-terminomic procedure called Terminal Amine Isotopic Labeling of Substrates (TAILS). Results show that 240 min of glutamate treatment has minimal impact on protein abundance (13 neuronal proteins show significant changes) but modifies the phosphorylation state and proteolysis of nearly 900 proteins. A significant proportion of these proteins are involved in signalling pathways related to cell survival, synaptogenesis and axonal guidance.

      The paper is globally well written and experiments are convincing. The methodology and the analysis are well described and well explained. The text and each figure are clear and accurate. However, I have just one comment that needs answers and/or clarifications. Thanks for your work.

      Minor comment:

      Primary neurons are used at DIV7 and it has been shown that at DIV7 the percentage of astrocytes is relatively low; however, astrocytes play a key role in glutamate recapture and release. It would be relevant to know the percentage of glial cells in the authors' culture model and how astrocytes are involved in glutamate recapture and in excitotoxicity.

      Significance

      Excitotoxicity is a cell death process involved in many neurological disorders. However, nowadays, there are no existent FDA-approved pharmacological agents targeted to protect against excitotoxicity leading to neuronal death. A better comprehension of excitotoxicity is required to improve prevention, therapy and reparation following the disease.

      With this work, the authors highlighted modified proteins in excitotoxic neurons. Interestingly, a few of these proteins are involved in cell survival, mRNA processing or axonal guidance. This atlas of phosphorylation and proteolytic processing events during excitotoxicity permits the identification of new therapeutic targets such as calpain-mediated cleavage of Src kinase. This atlas will interest many teams working on neurological disorders such as Alzheimer disease, Parkinson disease or stroke. It will help to better characterize the cellular/molecular events involved in neuronal loss and to find new therapeutic targets.

      My field of expertise: Stroke, cell death, excitotoxicity, signalling pathways and molecular targets, autophagy. I don't have sufficient expertise to evaluate proteomic analysis.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Ameen and colleagues report the results of a multidimensional proteomic analysis which combined quantitative proteomics, phosphoproteomics and N-terminomics in an effort to identify neuronal proteins displaying altered abundance or modifications by proteolysis and/or phosphorylation following an excitotoxic insult. Excitotoxicity is known to initiate by over-activation of ionotropic glutamate receptors which allows an increase in intracellular Ca2+ , ultimately leading to activation of proteases. The analysis revealed that glutamate treatment for up to 240 min did not significantly affect the abundance of neuronal proteins but caused dramatic changes in the phosphorylation state of many neuronal proteins. Based upon the phosphopeptides and neo-N-peptides, which contain the neo-N-terminal amino acid residue generated through proteolytic cleavage of intact neuronal proteins during excitotoxicity, the authors identified the proteins that undergo phosphorylation, dephosphorylation and/or enhanced proteolytic processing in excitotoxic neurons. By combining different software packages, they found that these modified proteins form complex interactions that affect signaling pathways regulating survival, synaptogenesis, axonal guidance and mRNA processing. These data suggest that perturbations in the aforementioned pathways mediate excitotoxic neuronal death. Then, the authors showed by Western blot analysis that CRMP2, a crucial regulator of axonal guidance signaling, exhibited enhanced truncation and reduced phosphorylation at specific sites upon glutamate treatment. These events may contribute to injury to dendrites and synapses associated with excitotoxic neuronal death. Furthermore, the authors showed that calpains are responsible for the proteolytic processing and cathepsins for enhanced degradation of proteins during excitotoxicity. 
      Blockage of calpain-mediated cleavage site of the tyrosine kinase Src during excitotoxicity confers neuroprotection in an in vivo model of neurotoxicity. In that regard, over twenty protein kinases are predicted to be activated in excitotoxic neurons. Collectively, this study contributes to the construction of an atlas of phosphorylation and proteolytic processing events that occur during excitotoxicity and as such they can be targeted for therapeutic purposes.

      Comments

      The identification of potential calpain cleavage sites in neuronal proteins modified during excitotoxicity is an interesting finding of the study. However, the atlas presented appears to miss components such as Kinase D-interacting substrate of 220 kDa (Kidins220), also known as ankyrin repeat-rich membrane spanning (ARMS), a protein recently shown to be cleaved by calpain during excitotoxicity (López-Menéndez et al, 2019, Cell Death and Disease 10, 535).

      The CRMP2 antibody (Cell Signalling, 35672) used for western blots (figure 5D, also figure S11) and immunofluorescence (figure 5E) is problematic. Copied from https://www.cellsignal.com/products/primary-antibodies/crmp-2-d8l6v-rabbit-mab/35672: Monoclonal antibody is produced by immunizing animals with a synthetic peptide corresponding to residues surrounding Ile546 of human CRMP-2 protein. The truncated CRMP2 (figure 5D) studied in the whole section (residues 1-516 or 1-517, ~57kDa) cannot be recognized by this monoclonal antibody. The detected band with the red letters in figure 5D might represent another cleavage product. In any case, asking Cell Signalling for more information about the exact immunogen might help, but since it's monoclonal and derived from residues surrounding Ile546 it's very hard to include residues before aa516 and the unique epitope recognition upstream of aa516. The whole result section and discussion has to be reconsidered. Alternatively another antibody can be used to repeat those experiments in order to support the hypothesis. Time and resources are very familiar to authors since they have to repeat their previous work with a new antibody. Finally, there are no "western blot" and "immunofluorescence" methods for CRMP2.

      The truncated DCLK1 bands detected in figure S8B cannot be attributed to the proteolytic processing of DCLK1 at the sites described: T311↓S312, S312↓S313 and N315↓G316 (predicted M.W. of the (C-terminal) products: 48.7-49.1kDa (figure S8A) which is very close to be well-separated with conventional PAGE). The number and the separation of the bands suggest other cleavage sites.

      Could the striking observation that almost all proteolytic processing during excitotoxicity is catalyzed by calpains and/or cathepsins have derived (partially) from unspecific targets of calpeptin such as a subset of tyrosine phosphatases (Schoenwaelder and Burridge, 1999: approx. 1h treatment of fibroblasts with approx. 10x less concentration) or other(s)?

      Describing the final part of figure 4C the authors suggest that "Liver kinase B1 homolog (LKB1), CaM kinase kinase β (CaMKKβ) and transforming growth factor‐β‐activating kinase 1 (TAK1) are the known upstream kinases directly phosphorylating T172 of AMPKα to activate AMPK (Herrero-Martin et al., 2009; Woods et al., 2005; Woods et al., 2003). Our findings therefore predict activation of these kinases during excitotoxicity (Figure 4C)." The first question arising here is whether these three kinases are the only ones known to phosphorylate AMPKα. Even if this is true, it is highly speculative to suggest that the findings of the present study predict the activation of these kinases during excitotoxicity, without providing the necessary experimental data, since the increased phosphorylation of AMPK may be an indirect effect of the reduced function of a phosphatase. Thus the proposed model does not hold.

      Minor points

      Highlights could present the key points of the study in a more straightforward manner.

      Figure 4A is too complicated. Proteins considered as hubs of signaling pathways in neurons should be somehow highlighted to distinguish them.

      The analysis of proteins with enhanced truncation and reduced phosphorylation such as CRMP2 and DCLK1 is fragmented. In addition, the authors should mention the criteria based on which these proteins were selected for further analysis.

      The potential therapeutic relevance of phosphorylation and proteolytic processing events that occur during excitotoxicity can be further explored.

      I am sorry but I could not find Figure 8, which is supposed to show the "In vivo model of NMDA neurotoxicity" (please, see page 30).

      Introduction: O'Collins et al., 2006; Savitz and Fisher, 2007; both references are missing.

      Figure S1A-B: vehicle treatment time course is needed.

      Figure 5E: Control close-up is missing.

      "Moreover, the number of CRMP2-containing dendritic blebs in neurons at 240 min of glutamate treatment was significantly higher than that in neurons at 30 min of treatment (inset of Figure 5E)." Such a statistic is not shown in the graph.

      "Consistent with this prediction, our bioinformatic analysis revealed that the identified cleavage sites in most of the significantly degraded neuronal proteins during excitotoxicity are mapped within functional domains with well-defined three-dimensional structures (Figures 6A)." Authors might mean figure S12A?

      "Neuronal Src was identified by the three criteria of our bioinformatic analysis to be cleaved by calpains to form a stable truncated protein fragment during excitotoxicity (Figures 6A and Table S6)." Authors might mean figure 6D?

      Figure 2B: Clusters 1, 3, 4 and 6 do not follow treatment trends homogenously in all time points. For example in cluster 1 there is a phosphopeptide following the pattern 1, 0, -1 and another one following the pattern 0, 1, -1, which is actually a very different pattern even if the end value is stable (-1). The first example could belong to the cluster 6 as well, while the second example to cluster 5. Please elaborate on the rationale behind the categorization. Is there any other clustering method that can be used without making the categorization more complicated?

      A problem of the manuscript is its length and lack of coherence. Apart from presenting the data from the proteomics, phosphoproteomics and N-terminomics analyses, the authors focus on several different proteins to perform validation experiments and further characterize the biological significance of their modification. Because these proteins do not fall on the same pathway, the authors end up presenting several independent stories that complicate the reader.

      Moreover, it is necessary for the authors to restructure their introduction, and avoid over-representing previous research on nerinetide, which is not used anywhere in the manuscript. Instead, the introduction must be more focused to better capture the necessity and essence of the present study.

      Taking into account figures 1 and S2 I understand that the authors combined samples of neuronal cell cultures (treated or not with Glu) with samples from mouse brains (that have undergone ischemic stroke/TBI or sham operation). If this is the case, why did the authors do that? How did they combine the different samples? And why is this not mentioned anywhere in the main text?

      Regarding figure 5D, the authors write in the main text "Consistent with our phosphoproteomic results, the truncated fragment CRMP2 fragments could not cross-react with the anti-pT509 CRMP2 antibody (Figure 5D)" In the upper blot the truncated CRMP2 fragment runs well below the 70 kDa marker. However, in the middle panel, where we see the blot with the phospho specific antibody, the respective area of the blot has been cropped, so we cannot see whether the truncated fragment cross-reacts with the phospho specific antibody.

      It is strange that only 1 and 13 proteins showed significant changes in abundance at 30 and 240min respectively. Especially after 240min of glutamate treatment one could expect that many proteins should change in their levels, since the neurons are almost diminished by cell death at that point. How could the authors explain this phenomenon? Additionally, in their previous publication, they showed that much more proteins change significantly in abundance following glutamate treatment (at 30min and 240min).

      Significance

      The manuscript delivers a large amount of data, regarding changes in the proteome, the activation of specific kinases, phosphatases, as well as the molecular pathways that are activated at distinct time points of excitotoxicity. This information could be used in future studies to validate and develop potential therapeutic strategies that could protect against neuronal loss in various neurological disorders.

      The same group has very recently published a work very similar to the particular manuscript (Hoque et al. Cell Death and Disease, 2019). In their previous publication, the authors cover a large part of their current objectives. They performed again a proteomic and phosphoproteomic analysis of mouse primary cortical neurons treated with glutamate for distinct time points, in their aim to identify changes in expression and phosphorylation state of neuronal proteins upon excitotoxicity. Apart from the N-terminome, which they investigate in their current manuscript, the proteomic and phospho proteomic analysis are very similar. As such, and because of the fact that the current manuscript is very extensive, the authors should consider to minimize it, and include only their novel findings (changes in the N-terminome, the involvement of specific kinases that contribute to excitotoxic neuronal death, the regulatory mechanism of CRMP2, etc).

      The authors should describe in a simpler way the proteomic and bioinformatics analyses they are using in the manuscript. It is difficult to understand the methodology used if you are not an expert in proteomics and bioinformatics. My suggestion is to revise their text and make it simpler and more concise.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes two advances. First is the technical development for a protein targeting system called PInT that brings a target protein close to (~320 bp) a DNA sequence of interest. The idea is that localisation of the target protein allows one to distinguish its effects on the DNA sequence either in cis (when targeted) or in trans (when not targeted but expressed at the same level). Since targeting is conveyed by simply adding the small molecule ABA to the experiment, it is easy to compare the two situations. This is a clever idea and it is substantiated by data showing that the components of PInT do not affect triplet repeat instability or gene expression of GFP, into whose gene the PInT system is placed. Moreover, targeting is shown to enable enzymatic activity in the targeted region. Using the DNA methylase DNMT1, there are local increases in DNA methylation. Similarly, targeting the histone deacetylase HDAC5 results in local decreases in histone H3 acetylation.

      We thank the reviewer for a thoughtful and helpful review.

      What is not clear from these experiments, however, is whether the targeted proteins can interact normally with partner proteins to form functional complexes. One necessary control is to add ChIP for at least one interacting protein each for DNMT1 and for HDAC5 and show that targeting permits normal protein-protein interactions. This experiment is straightforward as specific interacting proteins are known and good antibodies to precipitate those proteins are available.

      This is a good suggestion and we plan to do this experiment in our 59B-Y-HDAC5 and 89B-Y-DNMT1 lines with and without ABA using interacting proteins. The exact interacting protein to be used will depend on the availability and quality of antibodies, which we will test. We will start with UHRF1 and HDAC3 for PYL-Dnmt1 and PYL-HDAC5, respectively.

      Overall, PInT would likely be useful for many groups studying the effects of chromatin modifiers on a DNA sequence of interest.

      The second advance is conceptual and is focused more specifically on triplet repeat expansions. The manuscript describes experiments that measure genetic instability of long CAG-CTG repeats with and without protein targeting. The results show that allele size distributions are not significantly affected by targeting either DNMT1 or HDAC5. One curious outcome that is not discussed is contraction frequency in the HDAC5 experiment. Zero contractions are reported compared to 10-20% contractions in the other two experiments. Authors need to provide an explanation.

      Lack of contractions in this experiment is likely due to the lower number of repeats in this line (59 vs 89/91). It is known that longer repeats display higher frequency of contractions, and contractions are rarely seen in short repeats (Larson et al Neurobiology of Disease 2015, Gomes-Pereira et al PLOS Genet 2007, Morales et al HMG 2020). Albeit, the threshold may be different in our HEK293-derived cells. Of note, we had a clone of 89B-Y-HDAC5 that did not express the expected amount of GFP for unknown reasons and we did not use it here. However, small pool PCRs using this line with 89 repeats showed that contractions were indeed present. Although we cannot rule out that the reason for the contractions is the unknown mutation(s), it suggests that the difference is due to the size of the expansion. We have added a comment in the methods section.

      It reads: “We have noted that cell lines with repeats that are mildly expanded (e.g., 59 CAGs) have fewer contractions than longer ones. This is consistent with several studies in the context of DM1 and HD [82], albeit the size threshold for seeing more contractions may be shorter in HEK293-derived cells than in mice.”

      The major issue with this set of experiments is that there is no positive control where instability is shown to be clearly manipulated. A knockdown of FAN1 would be the most likely avenue to pursue for identifying a positive control. This is straightforward to perform since successful FAN1 knockdowns have been described in the literature.

      We agree that a positive control to show that the model behaves as expected is necessary. We will add the experiments proposed by the reviewer in the revised version of the manuscript.

      The manuscript also looks at effects on gene expression measured by GFP fluorescence intensity. The potential significance is to see if disease-causing genes with expanded triplet repeats can be silenced by targeting chromatin-modifying enzymes. In the examples tested here, the answer seems to be no. Expression of DNMT1 or HDAC5 reduce fluorescence even in the absence of targeting. Upon targeting, there is a small further decrease, but the expanded triplet repeat resists this further decrease. Domain analysis of HDAC5 indicates that protein-protein interactions, not deacetylase activity, are important for silencing. The key interaction may be with HDAC3, since small molecule inhibition of HDAC3 relieved repeat length-dependent silencing by HDAC5. It was very curious that targeting HDAC3 actually increased expression, instead of silencing. The explanation for this observation was inadequate.

      We have added the following paragraph to the discussion to address this.

      It reads: “We found that targeting of PYL-HDAC3 increases gene expression slightly, independently of repeat size and in the presence of an inhibitor of its catalytic activity. Although this appears counterintuitive, several studies suggest that this is not unexpected. Specifically, HDAC3 has an essential role in gene expression during mouse development that is independent of its catalytic activity [73]. Moreover, HDAC3 binds more readily to genes that are highly expressed in both human and yeast cells [74,75]. The mechanism or function of HDACs binding to highly expressed genes are currently unknown.”

      The claim on page 16 final paragraph that the manuscript 'settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability' is not supported by the data. Most of the results are negative so it is premature to claim the question is 'settled'.

      We have rephrased all the conclusions about this in the text, emphasizing that we find no evidence of a role in cis, rather than stating that there is no role in cis.

      Overall, with appropriate modifications described here, these experiments would be of interest with regards to potential therapies of triplet repeat expansion diseases, where silencing the expanded gene is the goal.

      **Minor concerns**

      P 4, last line. 59 bp should read 59 repeats

      This is now fixed.

      P 5, line 2. 38 bp of what?

      This is now amended. It reads: “The CAG/CTG repeats affect splicing of the reporter in a length-dependent manner, with longer repeats leading to more robust insertion of an alternative CAG exon that includes 38 nucleotides downstream of the CAG, creating a frameshift [30].”

      P 10, first paragraph. DNA methylation levels rise from ~10% to ~20% with DNMT1 targeting. Is there a good precedent in the literature that the magnitude of this increase can be expected to be biologically meaningful?

      To our knowledge, this is the first time that DNMT1 has been used for targeted epigenome editing. This is therefore the first evidence that targeting DNMT1 leads to silencing of a reporter construct. Nevertheless, the reviewer’s question stands: is an increase in DNA methylation of 10 to 20% biologically relevant? The answer is yes: changes of 10-20% are known to have a functional impact on gene expression in various settings (for example, see the recent study in developing oocytes by Li et al Nature 2018). Furthermore, there is evidence that DNMT1 has weak de novo activity (Li et al Nature 2018, Wang et al Nat Genet 2020), consistent with a small increase in CpG methylation upon targeting. We now acknowledge in the discussion that one reason for the lack of effect upon targeting may be that the changes in CpG methylation are not dramatic enough. We also point out more clearly that changes of 10 to 20% are correlated with changes in repeat instability (Dion et al HMG 2008). We have amended the text to reflect this.

      The results section now reads: “To do so, we performed bisulfite sequencing after targeting PYL-DNMT1 for 30 days. This led to changes of 10 to 20% in the levels of CpG methylation, a modest increase (Fig. 3C), which is in line with the weak de novo methyltransferase activity of DNMT1 (for example see [39,40]). Similar changes in levels of CpG methylation in Dnmt1 heterozygous ovaries and testes were seen to correlate with changes in repeat instability in vivo [31].”

      The discussion now states: “It should be pointed out that there remains the possibility that DNMT1 targeting did not lead to large enough changes in CpG methylation to affect repeat instability.”

      P12 first paragraph. Text describing Fig 5 is confusing. First, GFP expression is referred to in terms of fold decrease, but subsequently in percent. Second, the ABA-induced silencing looks to reduce expression from about 0.6 to 0.5 of control. I presume this is where the claim of 16% comes from but it was not clear.

      Indeed, this is what we mean.

      We now state: “In 16B-Y-DNMT1 cells, ABA treatment decreased GFP expression by 2.2-fold compared to DMSO treatment alone. Surprisingly, ABA-induced silencing was 1.8 fold compared to DMSO alone, or 16% less efficient in 89B-Y-DNMT1 than in 16B-Y-DNMT1 cells.”

      P 15 paragraph 2. Where does the P value of 0.78 come from? Fig 7B shows no corresponding value.

      The P-value in figure 7B has now been corrected.

      Reviewer #1 (Significance (Required)):

      See above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      We still do not know whether epigenetics contributes to repeat instability and/or transcriptional activity in unstable CAG/CTG repeat associated pathologies. The aim of this manuscript is to examine whether induced binding of DNMT1 (CpG methylation) or HDAC5 (histone H3 acetylation) modulates CAG/CTG repeat instability and/or gene silencing upon expansion. For this the authors developed a highly sophisticated reporter system (PInT) that allows recruitment of a specific chromatin modifying enzyme (DNMT1/HDAC5) to a GFP reporter near a CAG/CTG expansion, in the course of transcription (Dox-inducible promoter). This is to determine whether the CTGs, when lengthened and transcribed, become unstable or impede gene activity via epigenetic modifications.

      We appreciate the reviewer highlighting the importance of the question that we address here and the usefulness of PInT.

      **Findings:**

      1. Binding of DNMT1 to the reporter results in a modest increase (~10%) in local DNA methylation, with no change in repeat instability.

      3. Targeting HDAC5 to the reporter results in local reduction in histone H3 acetylation, with no effect on repeat stability.

      4. DNMT1/HDAC5 binding reduces GFP intensity differentially, in normal but not expanded alleles.

      5. The N-terminal domain of HDAC5, when mutated, abolishes the reduction in GFP expression levels.

      6. RGFP966 abolishes the allele-specific effect of HDAC5, resulting in a general decrease in GFP expression regardless of repeat tract size

      7. CTG expanded alleles abolish the reduction in GFP repression by HDAC5 via HDAC3 activity

      **Conclusions:**

      Based on the results using the PInT reporter assay, the authors claim that:

      1. HDAC5 and DNMT1 do not affect repeat instability in cis

      2. Expanded CAG/CTGs reduce the efficiency of gene silencing by targeting DNMT1/HDAC5 to the locus

      3. Gene silencing that is mediated by HDAC5 recruitment can be abolished by inhibition of HDAC3 activity

      Unfortunately, none of the claims in this manuscript are convincing.

      We note that in the comments below the reviewer does not include a reason why he/she does not find the claims convincing. We therefore cannot address this criticism.

      **General Comments:**

      The major drawback of the PInT experimental approach is that it ignores the importance of the flanking regions and the genomic organization of the endogenous locus. This is a major concern as it makes the conclusions irrelevant to the related loci. In the case of myotonic dystrophy type 1, for example, the reporter should reside within a CpG island, should be positioned immediately next to CTCF binding site(s), and should be transcribed bi-directionally.

      HDAC3 and DNMT1 were found to have effects on repeat instability both at reporters, which do not harbour flanking sequences from disease loci, and indeed at endogenous loci in vivo (Dion et al HMG 2008, Debacker et al PLoS Biol 2012, Suelves et al Sci Rep 2017, Williams et al PNAS 2020). This highlights the fact that cis elements from disease loci are not required for chromatin modifiers to affect repeat instability.

      The reviewer is suggesting a very interesting set of experiments where specific sequences may be added to our reporter and tested for their influence on gene expression and on repeat instability. PInT is ideally suited for this and we have now added a paragraph highlighting this in the discussion. We have also highlighted that the current study aims to isolate the repeats from their cis-elements, specifically to side-step potential locus-specific effects and to look for chromatin modifiers that would be useful for epigenome editing at as many loci as possible.

      Furthermore, only large expansions (at least several hundred copies) can trigger heterochromatin at the DM1 locus. None of these features are recapitulated by the PInT reporter assay, making it difficult to draw any conclusion regarding the role of these chromatin modifying enzymes to the locus.

      This is true for DM1 but not for other disease loci. For example, we have shown that there are changes in the flanking chromatin marks at the SCA1 locus of a mouse model with 145 repeats (Dion et al HMG 2008); DNA methylation is also affected near a SCA7 transgene with 92 CAG repeats (Libby et al PLoS Genet 2008); and transgenes containing CAG repeats (without the flanking sequences) lead to silencing regardless of where the transgene is integrated in the genome (Saveliev et al Nature 2003). Moreover, HDAC5 had effects on repeat expansion in a cell-based shuttle system containing as few as 22 CAG repeats (Gannon et al NAR 2012), again suggesting that chromatin modifiers affect repeat instability across a wide range of repeat sizes. We have reviewed this in Dion and Wilson TiG 2009.

      In fact, the authors state in their Discussion that "targeting a chromatin modifying peptide to different loci can have very different effects"!

      This is indeed the case and the reason why we sought to control for locus-specific effects using an exogenous reporter.

      To better substantiate their conclusions the authors must set up an improved model system that takes into account the flanking regions and the 3D genomic organization of the locus (TADs). The preferable approach would be to insert a reporter cassette by homologous recombination into the differentially methylated/acetylated regions near the repeats, and compare between normal vs. expanded alleles.

      We would like to point out that we have recently published a study where we looked at 3D chromatin folding at the DM1 and HD loci and at the GFP transgene used here. We did not find any evidence for changes in TADs that would underlie changes in repeat instability at these loci (Ruiz Buendia et al Sci Advances 2020). We therefore do not think that it would be important to further manipulate 3D genomic organization in this context.

      To be clear, we are not denying that cis elements are likely to have an effect; there is plenty of evidence supporting this. Rather, we are using a reporter assay to disentangle the potential locus-specific (or cis-element-specific) effects from those of trans-acting factors. In short, we focus on the trans-acting factors rather than on the cis-elements suggested by the reviewer.

      We believe that the addition of the following paragraph highlights the goal of our study and also brings in the idea that cis-acting elements can be studied using PInT.

      It now reads:

      “We designed PInT specifically to isolate expanded repeat tracts from other potential locus-specific cis elements. This is helpful to identify factors that would affect instability and/or gene expression across several diseases. Moreover, both HDAC3 and DNMT1 were found to impact repeat instability at different loci, including at reporter genes [31,33,36,37,45]. These observations highlight that cis-acting elements from disease loci are not required by chromatin modifiers to affect repeat instability. A potential application of PInT includes cloning in specific cis elements, including CTCF binding sites and CpG islands, next to the repeat tract and evaluating their effects on instability with or without targeting. In fact, PInT can be used to clone any sequence of interest near the targeting site and can be applied to a wide array of applications, beyond the study of expanded CAG/CTG repeats.”

      My impression was that there is a lot of data but none of it makes sense.

      The focus of the manuscript is not entirely clear: it starts with monitoring the effect of epigenetics on repeat instability and gene activity, then it shifts to the mechanism by which HDAC5 functions, and ends with the allele-specific effect of HDAC5 on gene expression. I lost my train of thought.

      We have now improved the transitions in this new version of this manuscript. Specifically, at the core of this manuscript is the development of PInT, which is highly versatile and allowed us to study multiple aspects of expanded CAG/CTG repeat biology. We hope that it is now clearer.

      **Other concerns:**

      (1) The modest increase in methylation levels following DNMT1 recruitment (10%, reaching a total of 20% at the most) prevents drawing any conclusions regarding the effect of methylation on stability or expression.

      As mentioned in the response to reviewer 1 above, although changes of 10% to 20% in CpG methylation are associated with changes in gene expression in a variety of settings, we now point out that one reason for the lack of effect in cis may be that the de novo activity of DNMT1 is too weak to produce an effect.

      (2) The effect of protein targeting on GFP levels should be better defined at the RNA/protein level. Does it act by blocking transcription? alternative splicing? or alters steady state levels?

      Although the exact mechanism remains unclear, determining it goes beyond the scope of this study. All of these possibilities remain open, as we point out in the discussion.

      (3) Fig 5: the scale is different for A vs. B and C. Also, better to compare the effect of targeting on equal sized expansions (either 91, 89 or 58 repeats).

      We have fixed the scale on the figures.

      Unfortunately, it is not possible to have the same repeat sizes for all the cell lines because by their very nature, repeats are unstable. We have added a note relating to this in the methods.

      It reads: “Notably, it is not possible to obtain several stable lines with the exact same repeat size as they are, by their nature, highly unstable. This is why we have lines with different repeat sizes. Furthermore, the sizes can change over time and upon thawing.”

      (4) Add asterisks for significance in all figures.

      This has now been done.

      (5) Figure 6: show raw data rather than normalized.

      We have now added representative flow cytometry profiles for each construct as a new supplementary figure (S5).

      (6) Figure 7: there is a notable difference in GFP expression levels in untreated wild type control (16 CAG repeats) between A vs. B. Why?

      Fig. 7A shows PYL targeting only, whereas Fig. 7B shows GFP expression upon PYL-HDAC5 targeting. The values for PYL-HDAC5 targeting are lower because targeting PYL-HDAC5, unlike targeting PYL alone, silences the reporter.

      (7) Avoid redundancy. No need to show schematic representations so many times.

      We believe that the schematics make it clearer for the reader.

      Reviewer #2 (Significance (Required)):

      REFEREES CROSS-COMMENTING

      I totally agree with Reviewer #1 that the PInT targeting system is a potent experimental tool to study the function of specific chromatin binding proteins. However, the significance of the flanking regions is discounted.

      We hope it is now clear that we are not discounting the potential significance of flanking regions and that rather we have designed the system to avoid their potentially complicating effects.

      The fact that the recruitment of HDAC5 has resulted in a significant reduction in acetylated histones provides evidence that "the targeted proteins can interact normally with partner proteins to form functional complexes". Still, I agree that the activity of DNMT1 needs to be better established, considering the minor increase in DNA methylation levels.

      We will be using ChIP against interacting proteins of DNMT1 and HDAC5 to address this issue.

      The request for a positive control for repeat instability is totally correct.

      We will be adding this in the revised manuscript.

      It is difficult to discuss the missing effect of HDAC5 on contractions or the unexpected effect of HDAC3 on gene silencing bearing in mind the limits of the experimental system.

      There is no expectation for the effect of HDAC5 on contractions as this has not been studied in any system yet. However, we believe that the absence of contractions is due not to HDAC5 per se but rather to the shorter repeat size of this line (see comment to reviewer 1 above). We have now addressed the “unexpected effect” of HDAC3 by citing a number of studies that found a similar, evolutionarily conserved effect (see comment to Reviewer 1 above).

      I also agree that the statement "this manuscript settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability" is not supported by the data.

      We have now rephrased our conclusions. In this particular case, we changed ‘settled’ to ‘addressed’. We have also rephrased this in the results headings.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      We still do not know whether epigenetics contributes to repeat instability and/or transcriptional activity in unstable CAG/CTG repeat associated pathologies. The aim of this manuscript is to examine whether induced binding of DNMT1 (CpG methylation) or HDAC5 (histone H3 acetylation) modulates CAG/CTG repeat instability and/or gene silencing upon expansion. For this the authors developed a highly sophisticated reporter system (PInT) that allows recruitment of a specific chromatin modifying enzyme (DNMT1/HDAC5) to a GFP reporter near a CAG/CTG expansion, in the course of transcription (Dox-inducible promoter). This is to determine whether the CTGs, when lengthened and transcribed, become unstable or impede gene activity via epigenetic modifications.

      Findings:

      1. Binding of DNMT1 to the reporter results in a modest increase (~10%) in local DNA methylation, with no change in repeat instability.

      3. Targeting HDAC5 to the reporter results in local reduction in histone H3 acetylation, with no effect on repeat stability.

      4. DNMT1/HDAC5 binding reduces GFP intensity differentially, in normal but not expanded alleles.

      5. The N-terminal domain of HDAC5, when mutated, abolishes the reduction in GFP expression levels.

      6. RGFP966 abolishes the allele-specific effect of HDAC5, resulting in a general decrease in GFP expression regardless of repeat tract size

      7. CTG expanded alleles abolish the reduction in GFP repression by HDAC5 via HDAC3 activity

      Conclusions:

      Based on the results using the PInT reporter assay, the authors claim that:

      1. HDAC5 and DNMT1 do not affect repeat instability in cis

      2. Expanded CAG/CTGs reduce the efficiency of gene silencing by targeting DNMT1/HDAC5 to the locus

      3. Gene silencing that is mediated by HDAC5 recruitment can be abolished by inhibition of HDAC3 activity

      Unfortunately, none of the claims in this manuscript are convincing.

      General Comments:

      The major drawback of the PInT experimental approach is that it ignores the importance of the flanking regions and the genomic organization of the endogenous locus. This is a major concern as it makes the conclusions irrelevant to the related loci. In the case of myotonic dystrophy type 1, for example, the reporter should reside within a CpG island, should be positioned immediately next to CTCF binding site(s), and should be transcribed bi-directionally. Furthermore, only large expansions (at least several hundred copies) can trigger heterochromatin at the DM1 locus. None of these features are recapitulated by the PInT reporter assay, making it difficult to draw any conclusion regarding the role of these chromatin modifying enzymes to the locus. In fact, the authors state in their Discussion that "targeting a chromatin modifying peptide to different loci can have very different effects"! To better substantiate their conclusions the authors must set up an improved model system that takes into account the flanking regions and the 3D genomic organization of the locus (TADs). The preferable approach would be to insert a reporter cassette by homologous recombination into the differentially methylated/acetylated regions near the repeats, and compare between normal vs. expanded alleles.

      My impression was that there is a lot of data but none of it makes sense.

      The focus of the manuscript is not entirely clear: it starts with monitoring the effect of epigenetics on repeat instability and gene activity, then it shifts to the mechanism by which HDAC5 functions, and ends with the allele-specific effect of HDAC5 on gene expression. I lost my train of thought.

      Other concerns:

      (1) The modest increase in methylation levels following DNMT1 recruitment (10%, reaching a total of 20% at the most) prevents drawing any conclusions regarding the effect of methylation on stability or expression.

      (2) The effect of protein targeting on GFP levels should be better defined at the RNA/protein level. Does it act by blocking transcription? alternative splicing? or alters steady state levels?

      (3) Fig 5: the scale is different for A vs. B and C. Also, better to compare the effect of targeting on equal sized expansions (either 91, 89 or 58 repeats).

      (4) Add asterisks for significance in all figures.

      (5) Figure 6: show raw data rather than normalized.

      (6) Figure 7: there is a notable difference in GFP expression levels in untreated wild type control (16 CAG repeats) between A vs. B. Why?

      (7) Avoid redundancy. No need to show schematic representations so many times.

      Significance

      REFEREES CROSS-COMMENTING

      I totally agree with Reviewer #1 that the PInT targeting system is a potent experimental tool to study the function of specific chromatin binding proteins. However, the significance of the flanking regions is discounted. The fact that the recruitment of HDAC5 has resulted in a significant reduction in acetylated histones provides evidence that "the targeted proteins can interact normally with partner proteins to form functional complexes". Still, I agree that the activity of DNMT1 needs to be better established, considering the minor increase in DNA methylation levels. The request for a positive control for repeat instability is totally correct. It is difficult to discuss the missing effect of HDAC5 on contractions or the unexpected effect of HDAC3 on gene silencing bearing in mind the limits of the experimental system. I also agree that the statement "this manuscript settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability" is not supported by the data.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The manuscript describes two advances. First is the technical development for a protein targeting system called PInT that brings a target protein close to (~320 bp) a DNA sequence of interest. The idea is that localisation of the target protein allows one to distinguish its effects on the DNA sequence either in cis (when targeted) or in trans (when not targeted but expressed at the same level). Since targeting is conveyed by simply adding the small molecule ABA to the experiment, it is easy to compare the two situations. This is a clever idea and it is substantiated by data showing that the components of PInT do not affect triplet repeat instability or gene expression of GFP, into whose gene the PInT system is placed. Moreover, targeting is shown to enable enzymatic activity in the targeted region. Using the DNA methylase DNMT1, there are local increases in DNA methylation. Similarly, targeting the histone deacetylase HDAC5 results in local decreases in histone H3 acetylation. What is not clear from these experiments, however, is whether the targeted proteins can interact normally with partner proteins to form functional complexes. One necessary control is to add ChIP for at least one interacting protein each for DNMT1 and for HDAC5 and show that targeting permits normal protein-protein interactions. This experiment is straightforward as specific interacting proteins are known and good antibodies to precipitate those proteins are available. Overall, PInT would likely be useful for many groups studying the effects of chromatin modifiers on a DNA sequence of interest.

      The second advance is conceptual and is focused more specifically on triplet repeat expansions. The manuscript describes experiments that measure genetic instability of long CAG-CTG repeats with and without protein targeting. The results show that allele size distributions are not significantly affected by targeting either DNMT1 or HDAC5. One curious outcome that is not discussed is contraction frequency in the HDAC5 experiment. Zero contractions are reported compared to 10-20% contractions in the other two experiments. Authors need to provide an explanation. The major issue with this set of experiments is that there is no positive control where instability is shown to be clearly manipulated. A knockdown of FAN1 would be the most likely avenue to pursue for identifying a positive control. This is straightforward to perform since successful FAN1 knockdowns have been described in the literature. The manuscript also looks at effects on gene expression measured by GFP fluorescence intensity. The potential significance is to see if disease-causing genes with expanded triplet repeats can be silenced by targeting chromatin-modifying enzymes. In the examples tested here, the answer seems to be no. Expression of DNMT1 or HDAC5 reduce fluorescence even in the absence of targeting. Upon targeting, there is a small further decrease, but the expanded triplet repeat resists this further decrease. Domain analysis of HDAC5 indicates that protein-protein interactions, not deacetylase activity, are important for silencing. The key interaction may be with HDAC3, since small molecule inhibition of HDAC3 relieved repeat length-dependent silencing by HDAC5. It was very curious that targeting HDAC3 actually increased expression, instead of silencing. The explanation for this observation was inadequate. 
      The claim on page 16 final paragraph that the manuscript 'settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability' is not supported by the data. Most of the results are negative so it is premature to claim the question is 'settled'. Overall, with appropriate modifications described here, these experiments would be of interest with regards to potential therapies of triplet repeat expansion diseases, where silencing the expanded gene is the goal.

      Minor concerns

      P 4, last line. 59 bp should read 59 repeats

      P 5, line 2. 38 bp of what?

      P 10, first paragraph. DNA methylation levels rise from ~10% to ~20% with DNMT1 targeting. Is there a good precedent in the literature that the magnitude of this increase can be expected to be biologically meaningful?

      P12 first paragraph. Text describing Fig 5 is confusing. First, GFP expression is referred to in terms of fold decrease, but subsequently in percent. Second, the ABA-induced silencing looks to reduce expression from about 0.6 to 0.5 of control. I presume this is where the claim of 16% comes from but it was not clear.

      P 15 paragraph 2. Where does the P value of 0.78 come from? Fig 7B shows no corresponding value.

      Significance

      See above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Summary

      The authors present well written work on the evolution of proteome size and complexity, and the corresponding changes in chaperone proteins. Interestingly, they find chaperone copy numbers increase linearly with proteome size, despite the increasing 'complexity' of, in particular, post-LECA genomes. They suggest that to address the rise in complexity, organisms express chaperones at higher levels and an expanding network of co-chaperones has evolved across the tree of life.

      Major comments

      Comment-1. Summary reads strangely relative to the rest of the manuscript, and lists facts in a way that makes the purpose of the study confusing. I think most readers will dislike the characterisation of evolution as a progress from simple to complex, and the authors might want to avoid this language throughout the manuscript: bacteria and archaea have also been evolving over this period of time, and have not become more 'complex'? Similarly, the authors should reconsider their figure legend titles. As a specific example, 'in the course of evolution' should become 'across the tree of life'.

      Response

      Thank you for these crucial suggestions. We agree with the reviewer, and with Reviewer 2 (see below), that bacteria and archaea have also been evolving since their emergence, so basically, we (humans) and the simplest archaea have the same evolutionary origin. However, we all agree that the simplest archaea/bacteria are far more similar to LUCA than we are. That said, we accept the criticism that putting our analysis in the context of evolutionary time is an over-interpretation given that we have not examined the protein/proteome phylogeny (in relation to proteome complexity; for chaperones we have). We have thus reformulated the figures and text as a comparison across the Tree of Life, rather than a time-dependent evolutionary process. Specifically: as a first step, we revised the Figures to rename the X-axis as “Order of divergence”, rather than “Divergence time (million years)” as in the previous version. In the revised main text we emphasize the fact that the branch lengths of the Tree of Life represent the relative order of divergence of the different clades, rather than time. All instances of ‘in the course of evolution’ have been replaced by ‘across the Tree of Life’.

      Secondly, we revised the main text to emphasize on prokaryote vs. eukaryote comparison, rather than comparing organisms that diverged at different time-points. Within bacterial and archaeal domains, proteomes do not seem to expand against the order of divergence (as the reviewer argued, bacteria and archaea have not become more complex, also see Comment-5).

      Thirdly, the word ‘complexity’ has been omitted from the manuscript. The section “The expansion of proteome complexity” now reads as “Proteome expansion by de novo innovations”. In the previous version, increasing complexity in fact implied a torrent of de novo innovations that impose a larger burden on the chaperone machinery. Instead of ‘complexity’, the latter is clearly stated in the revised manuscript.

      In the spirit of these changes, the title of the revised manuscript, figure legend titles, and related section titles have been edited as follows.

      Paper title
      Submitted version: On the evolution of chaperones and co-chaperones and the exponential expansion of proteome complexity
      Revised version: On the evolution of chaperones and co-chaperones and the expansion of proteomes across the Tree of Life

      Section title
      Submitted version: A Tree of Life analysis of the expansion of proteome complexity and chaperones
      Revised version: A Tree of Life analysis of the expansion of proteomes and chaperones

      Section title
      Submitted version: The expansion of proteome size
      Revised version: The expansion of proteome size across the Tree of Life

      Section title
      Submitted version: The expansion of proteome complexity
      Revised version: Proteome expansion by de novo innovations

      Figure 1 legend title
      Submitted version: Expansion of proteome size
      Revised version: Expansion of proteome size across the Tree of Life

      Figure 2 legend title
      Submitted version: Expansion of proteome complexity
      Revised version: Expansion of proteomes by de novo innovations

      Further, changes have been made in the Summary and in the main text to exclude any impression that proteomes/organisms have become more complex with time. Rather, we emphasized the prokaryote versus eukaryote comparison.

      Comment-2. I think the manuscript would be improved if the authors significantly shortened the discussion of genome size evolution- this is fairly well understood, and could be covered briefly, especially as the main focus of the manuscript is on the evolution of chaperone and co-chaperone repertoire. They could also make clearer quantitative links between protein complexity and the evolution of chaperones and co-chaperones- perhaps this should be in the discussion? The authors might also consider referencing 'The evolution of genome complexity', which could be relevant to this manuscript and might make the work of broader interest.

      Response

      We thank the reviewer for this suggestion. The main focus of our paper is indeed the evolution of chaperones and co-chaperones, but within the context of the expansion of proteomes. With this focus in place, the discussion on proteome size evolution (section: The expansion of proteome size across the Tree of Life) has been revised and shortened to place more emphasis on the prokaryote versus eukaryote comparison.

      The suggestion to provide “clearer quantitative links between protein complexity and the evolution of chaperones and co-chaperones” is indeed very useful and we sincerely thank the reviewer. To address this suggestion we revised Figure 4 to quantitatively compare the expansion of proteomes and that of chaperones, under one roof. This Figure compares proteome parameters that supposedly demand more chaperone action in all three domains of life and simultaneously summarizes the expansion of the chaperone machinery lacking de novo innovations.

      The first paragraph of the Discussion section has been revised accordingly: it walks the reader through the revised Figure 4 and then introduces the dichotomy it implies.

      We did not understand the last comment “The authors might also consider referencing 'The evolution of genome complexity', which could be relevant to this manuscript and might make the work of broader interest.” We’d be glad to address it upon further clarification.

      Comment-3. The authors state 'protein trees were generated and compared with ToL to account for gene loss and transfer events'. The methodology for this procedure is not given in the manuscript. The authors should back up this point, and make it clear this is why they reconstruct the trees. Currently it is not convincing to me that the authors have found HGT given the considerable phylogenetic uncertainty in the basal events in the tree of life. I also expect the tree of a single protein to potentially lack information due to the short sequence considered and possible lack of power. The authors need to consider whether the data is really of high enough quality to assess this.

      Response

      Thank you for this suggestion. For the various chaperone families, we manually compared the protein trees with the Tree of Life. This is clearly stated in the revised Methods section (see Page 25, Lines 31-32). We agree, however, that identifying HGT and, more generally, interpreting trees of highly diverged single domains are tricky. We did our best to address these caveats. Specifically:

      We re-evaluated our work in the light of a recent study (PMID: 32316034). This paper discussed the phylogenetic uncertainties associated with molecular dating and re-evaluated the assignment of several protein families to LUCA. A careful analysis revealed that the reviewer is indeed right, meaning many of the HGT events shown in Figure 3B of the previous version were indistinguishable from the phylogenetic uncertainties.

      Accordingly, we revised the section “The core-chaperones emerged in early-diverging prokaryotes”. We removed the previous version Figure 3B, along with all instances of HGT events mentioned in the main text, except one (archaea to Firmicute HGT of HSP60, which is well-supported by the data and was also detected previously). Dating the emergence of chaperone families was also re-evaluated. Though the major conclusions were not altered, we discussed the phylogenetic uncertainties associated with our work and the overall confidence of each dating analysis. We believe these discussions would be very useful to the readers.

      Finally, we note that most of our key assignments (points of emergence, and major HGT events) are in agreement with previous works. Specifically: the assignment of the emergence of HSP20 and HSP60 to LUCA (Sousa et al., 2016; Weiss et al., 2016), the horizontal transfer of HSP60 from archaea to Firmicute (Techtmann and Robb, 2010), and the horizontal transfer of HSP20 between bacterial clades and between bacteria and archaea (Kriehuber et al., 2010).

      Comment-4. Methods- the authors could consider taking an alternative source of LUCA proteins, rather than those found in 'Nanoarchaeota and Aquificae': it's possible these are not representative of LUCA, and it seems a somewhat arbitrary choice- the authors could consider using one of the available curated sets, such as that generated by Ranea et al. (2006).

      Response

      The reviewer is right that a more robust LUCA set could be used. However, given that the revised manuscript focuses on comparison across the ToL, and foremost on prokaryote versus eukaryote comparison, we don’t think that refining this set is important. This set was used for one purpose only: determining changes in domain length. Moreover, the 38 X-groups used for this analysis are, in fact, the ones present in all organisms across the ToL. Hence, we kept the original analysis, while mentioning that these 38 X-groups are conserved across the ToL, and removed the argument for LUCA assignment. See Page 5, Line 22.

      Comment-5. The patterns observed might only hold because of differences in the taxa that diverged pre and post LECA? The authors might consider subgroup analyses to ensure this is not the case. The authors could also consider using methods that take phylogeny into account.

      Response

      The reviewer is right that within prokaryotic domains proteomes do not seem to expand. For example, excluding a few early-diverging prokaryotes and parasites, proteome size in bacteria and archaea varies within 2000-3000 proteins per proteome. Only when pre-LECA and post-LECA organisms are compared, significant differences are observed. We thank the reviewer for this suggestion. We revised the main text to focus on prokaryote versus eukaryote comparison. This re-focusing does not change any of our major conclusions, but rather puts our analysis in the right context (see Comment 1).

      Minor comments

      Comment-6. 'Life's habitability has also expanded from its specific niche of emergence, likely deep-sea hydrothermal vents, to highly variable and extreme ranges of temperature, pressure, exposure to high UV-light, dehydration and free oxygen.' This is not really correct, as bacteria and archaea are found worldwide, and in the most extreme environments.

      Response

      Thank you for this suggestion. We removed the above-mentioned sentence.

      Comment-7. 'We reconciled the topology of our tree'- on first read this was not clear, I did not realise the authors were only building trees for subsets of the data- time tree is the best source for the overall topology. The phrase 'manually curated and adjusted' is used in the methods. This language is much too vague, and not a clear explanation of the steps taken.

      Response

      We apologize for this confusion. The overall topology of our Tree of Life is indeed taken from TimeTree. We edited the text on Page 4, Line 4 to clarify this issue.

      The obtained tree topology was manually curated and adjusted to depict eukaryotes stemming from Asgard archaea and Alphaproteobacteria, by an endosymbiosis event. This is clearly mentioned in the Methods section (see Page 22, Lines 24-28).

      Reviewer #2

      Summary

      Rebeaud and colleagues analyze evolution of chaperones compared to the evolution of whole proteome complexity across the entire tree of life. Their principal conclusions are well captured in the following quote from the Discussion:

      "Comparison of the expansion of proteome complexity versus that of core-chaperones presents a dichotomy-a linear expansion of core-chaperones supported an exponential expansion of proteome complexity. We propose that this dichotomy was reconciled by two features that comprise the hallmark of chaperones: the generalist nature of core-chaperones, and their ability to act in a cooperative mode alongside co-chaperones as an integrated network. Indeed, in contrast to core chaperones, there exist a consistent trend of evolutionary expansion of co-chaperones."

      Major comments

      Comment-1. The general theme of the evolution of proteome management is of obvious interest. Unfortunately, the entire analysis is shaky and fails to convincingly ascertain the authors' conclusions. There are many issues. Throughout the manuscript, the authors discuss 'expansion' of the proteome in bacteria, archaea and eukaryotes, creating the impression of a consistent evolutionary trend. No such trend actually exists if one considers the means or medians of proteome sizes within each of the three domains of life (there is a transition to greater complexity in eukaryotes). The maximum complexity, certainly, increases with time which can be attributed to the 'drunkard's walk' effect. This hardly qualifies as 'expansion'.

      Response

      The reviewer is right that within prokaryotes proteomes do not seem to significantly expand. Reviewer-1 raised a similar concern that prokaryotes and eukaryotes have been evolving for the same period of time and have not expanded significantly. We understand the misconception created by the earlier version and we thank the reviewers for pointing it out. Accordingly, we revised the main text to clarify these issues, as described in the following.

      Firstly, the main text was revised to emphasize the prokaryote versus eukaryote comparison. The reviewer agrees that compared to prokaryotes, “there is a transition to greater complexity in eukaryotes”. This re-focusing does not change any of our major conclusions, but rather provides a systematic comparison that is adequately supported by data.

      Secondly, we revised the Figures to rename the X-axis as “Order of divergence”, rather than “Divergence time (million years)” as in the previous version. We emphasized the fact that the X-axis actually represents the relative order of divergence of the different clades, rather than absolute dates. This emphasis certainly does not create the impression of a consistent evolutionary trend. Instead, combined with the revised main text, it shows that clear trends of proteome expansion are observed only when pre-LECA and post-LECA organisms are compared.

      Comment-2. The authors further claim a 'linear' expansion of the chaperone set and 'exponential' expansion of the total proteome size. These are precise mathematical terms and, as such, require fitting to the respective functions. No such thing in this manuscript. Even apart from that shortcoming, the explanations of both 'linear' and 'exponential' are quite confusing. Thus, when explaining the 'linearity' of chaperone evolution, the authors refer to the lack of major innovation among the chaperones. This is correct in itself but has nothing to do with linearity. Apart from the aforementioned conceptual problems, the estimates of the 'exponential' growth of the proteome are naive, inconsistent and inaccurate.
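
      As an illustration of the kind of fitting this comment asks for, the sketch below compares a linear and an exponential model on the same series. It is a minimal example under stated assumptions: the function names are hypothetical, the input values are synthetic rather than taken from the manuscript, and the exponential model is fitted by least squares on log-transformed values, which is only one of several possible choices.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def exponential_fit(xs, ys):
    """Fit y = A * exp(k*x) via a linear fit on log(y) (requires y > 0).

    Returns (A, k, rss), with the residual sum of squares computed in the
    original (untransformed) units so it can be compared with linear_fit.
    """
    log_a, k, _ = linear_fit(xs, [math.log(y) for y in ys])
    big_a = math.exp(log_a)
    rss = sum((y - big_a * math.exp(k * x)) ** 2 for x, y in zip(xs, ys))
    return big_a, k, rss


def better_model(xs, ys):
    """Name the two-parameter model with the smaller residual sum of squares."""
    rss_linear = linear_fit(xs, ys)[2]
    rss_exponential = exponential_fit(xs, ys)[2]
    return "linear" if rss_linear <= rss_exponential else "exponential"


# Synthetic series for illustration only (not data from the manuscript).
order = [1, 2, 3, 4, 5, 6]
toy_linear = [10 + 3 * x for x in order]
toy_exponential = [100 * math.exp(0.8 * x) for x in order]

print(better_model(order, toy_linear))
print(better_model(order, toy_exponential))
```

      Both models have two free parameters, so comparing residual sums of squares is a fair first check; a fuller analysis would also report confidence intervals and account for phylogenetic non-independence among the taxa.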

      Response

      Our use of ‘linear expansion’ versus ‘exponential expansion’ may have been confusing, although we defined quite clearly what we meant by these terms (i.e., that they were not used in the mathematical sense). The statement regarding “the lack of major innovation among the chaperones” was made in this context/definition and was consistent with it.

      Nonetheless, to avoid confusion, we revised the main text by excluding the ‘linear expansion’ and ‘exponential expansion’ terms. We simply stated that a torrent of de novo innovations has occurred during the expansion of proteomes from prokaryotes to eukaryotes. In contrast, the evolutionary history of core-chaperones lacks such major innovations. Accordingly, the title of the revised manuscript, figure legend titles, and related section titles have been edited as follows.

      Paper title
      Submitted version: On the evolution of chaperones and co-chaperones and the exponential expansion of proteome complexity
      Revised version: On the evolution of chaperones and co-chaperones and the expansion of proteomes across the Tree of Life

      Section title
      Submitted version: A Tree of Life analysis of the expansion of proteome complexity and chaperones
      Revised version: A Tree of Life analysis of the expansion of proteomes and chaperones

      Section title
      Submitted version: The expansion of proteome complexity
      Revised version: Proteome expansion by de novo innovations

      Figure 1 legend title
      Submitted version: Expansion of proteome size
      Revised version: Expansion of proteome size across the Tree of Life

      Figure 2 legend title
      Submitted version: Expansion of proteome complexity
      Revised version: Expansion of proteomes by de novo innovations

      Comment-3. As the base point for the expansion estimates for archaea and eukaryotes, the authors take parasitic forms. Even leaving aside the highly dubious claims that these organisms belong to the clades that diverged first from the respective ancestors, parasites are not an appropriate choice for such estimates because they certainly are products of reductive evolution. For bacteria, inconsistently, the authors choose a free-living form from a dubious ancient clade, and not even the one with the smallest genome. All taken together, this robs the expansion estimates of any substantial meaning.

      Response

      This point is overall valid. However, we adamantly reject the insinuation of “dubious claims that these organisms belong to the clades that diverged first from the respective ancestors”: first, we did not make any claims to this end, but took the ToL constructed by others (Hedges et al., 2015); second, the assertion that these claims are dubious needs to be backed up by counter-evidence or data and, with all due respect, neither was provided by the reviewer. What is of concern, however, is that in a symbiont/parasite the chaperones of the host may have a key role, and thus the comparison to free-living organisms could be misleading. To address this concern we excluded the obligate endosymbiont Nanoarchaeum equitans and the parasitic organisms from the expansion estimates, and such discussions are now limited to free-living organisms only. Further, as described in response to Comment-1, the revised manuscript focuses on prokaryote versus eukaryote comparison.

      Note that phylogenetic analysis often assigns parasitic and symbiotic organisms that have experienced reductive evolution as the earliest diverging clades of their corresponding kingdoms of life. Examples include Nanoarchaeum equitans, an obligate symbiont, assigned as the earliest diverging archaea (Hedges et al., 2015; Huber et al., 2002; Waters et al., 2003), and parasitic Excavate assigned as one of the earliest diverging eukaryotes (Burki et al., 2020; Simpson et al., 2002). In accordance with these studies, these parasitic and symbiotic organisms were included in our analysis. We acknowledged this fact in the Methods section (see Page 22, Lines 9-16).

      Comment-4. The authors do make a salient and I think essentially correct observation: chaperones typically comprise about 0.3% of the proteins in any organism. As such, this presents no dichotomy in evolutionary trends to be explained. Surely, as examined and discussed in the paper, eukaryotes also show significant increases in the size and domain content of the encoded proteins, suggesting the possibility that they might need more chaperones. However, if this is the explanandum, rather than the number of proteins in the proteome as such, it should be clearly stated. Furthermore, it is quite natural to assume that this increase in protein complexity without a commensurate increase in the chaperone diversity, is enabled by higher expression of the chaperones as suggested in the Discussion of this paper. I doubt there is any big surprise here and even much need for an extended discussion let alone a special publication.

      Response

      As emphasized, and shown, eukaryotes not only have larger proteomes in terms of the number of proteins or protein size; they also have a higher content of proteins that are prone to misfolding. This is shown explicitly in Figure 2 (namely, multidomain proteins, repeat proteins, beta-rich proteins, etc.) and is reiterated in a summary figure (suggested by Reviewer 1). Further, in response to Reviewer-3’s suggestion, we showed that eukaryotes feature much higher proportions of aggregation-prone proteins per proteome than prokaryotes (Figure 2E).

      To further clarify, we revised Figure 4 to quantitatively compare the expansion of proteomes and that of chaperones, under one roof. This Figure compares proteome parameters that supposedly demand more chaperone action in all three domains of life and simultaneously summarizes the expansion of the chaperone machinery lacking de novo innovations.

      In addition, the first paragraph of the Discussion section is revised to state that from prokaryotes to eukaryotes, proteomes have expanded by duplication-divergence as well as by innovations (de novo emergence of new folds). Thus, it’s not about the size only (a challenge that a proportional expansion of chaperone genes would resolve, i.e., the 0.3%) but about proteome composition changing in a way that demands more and more chaperone action.

      We also agree with the assertion that “it is quite natural to assume that this increase in protein complexity without a commensurate increase in the chaperone diversity, is enabled by higher expression of the chaperones”. However, we belong to a group of scientists for whom natural assumptions are insufficient, and think that supporting evidence is of importance.

      Reviewer’s significance statement

      As such, in the opinion of this reviewer, there is no substantial advance over the existing knowledge in this paper. Should the authors wish to revise, they would need to develop robust methodology to measure proteome expansion. That would involve starting from reconstructed ancestors rather than any extant forms (let alone parasites). I doubt that such analysis, non-trivial in itself, would reveal any strong, consistent trends other than the well known increase in complexity in eukaryotes.

      Response

      We agree that to assert evolutionary, time-dependent trends one needs to analyze phylogenies and reconstructed ancestors, but still think that a comparison of proteome and chaperone contents along the Tree of Life is meaningful. We thus respectfully, yet adamantly disagree with “no substantial advance over the existing knowledge”. We strongly believe, as does Reviewer-3, that the results and the model presented in this paper are “fascinating to consider and… will stimulate a good deal of important discussion…”.

      Reviewer #3

      Summary

      The manuscript by Rebeaud et al describes phylogenetic analyses of proteome and chaperone complexity. The authors analyzed species across the tree of life to predict the proteome and chaperone properties of ancestors spanning to the last universal common ancestor. Their analyses indicate that many proteome properties increased in complexity over evolutionary time including: average protein length, the number of multi-domain proteins, the size of the proteome, the number of repeat proteins, and the number of beta-superfold proteins that are known to be difficult to fold. Their analyses also indicate an expansion in chaperone families that corresponds to the increase in proteome complexity. Based on their analyses, the authors propose a model where early life relied on a limited number of chaperones (Hsp20 and Hsp60) and that as proteome complexity evolved, so did chaperone complexity. Core chaperones including Hsp90, Hsp70, and Hsp100 evolved relatively early, and later chaperone evolution was driven by the appearance and alterations of co-chaperones and auxiliary factors as well as by increases in the protein abundance of chaperones.

      Major concerns

      Comment-1. This work is appropriately based on phylogenetic inferences, but as such, the limitations and uncertainties of phylogenetic inferences need to be discussed. This in no way takes away from the work, quite the opposite, it would make it richer by encouraging broader interpretations where justified and clear understanding of where support for the model is strongest. Posterior probabilities need to be discussed and the range of properties that a likely ancestor might have based on the data should be discussed. How this impacts the conclusions and models should be discussed. Throughout the manuscript, the authors present most-likely ancestral models (as I understood it), what are the next most likely models? How much power is there to distinguish one model from another? It would be very helpful to have a section describing the limitations and uncertainties of the phylogenetic analyses and how these relate to the main findings and conclusions.

      Response

      We thank the reviewer for this suggestion. Reviewer-1 raised a similar suggestion (see Comment-3). The phylogenetic analysis in our paper included dating the emergence of core- and co-chaperone families, and an attempt to infer their major HGT events, foremost in relation to the origin of eukaryotic chaperones. To highlight the uncertainties of phylogenetic inferences we re-evaluated our work in the light of a recent study (PMID: 32316034) that carefully analyzed the uncertainties associated with the assignment of several protein families to LUCA.

      Ideally, for a protein family to be assigned to LUCA, there must be a single split of bacterial and archaeal domains at the root of the protein tree with strong bootstrap support, and the inter-domain branches would be longer than the intra-domain branches (PMID: 32316034). In the revised main text we discussed that only the HSP60 protein tree satisfies this criterion. The HSP20 protein tree depicts a clear single split of bacterial and archaeal domains at the root, albeit with weak bootstrap support, and its inter-domain branch lengths are smaller than its intra-domain branch lengths. We discussed that this is indeed a case of phylogenetic uncertainty, which means the sequence of this small, single-domain chaperone lacks the information to make reliable inferences about the basal events in the ToL.

      In addition, the HGT events discussed in the previous version appear to be indistinguishable from phylogenetic uncertainties, and we removed all instances of HGT events mentioned in the main text as well as Figure 3B. Only one HGT event, the horizontal transfer of HSP60 from archaea to Firmicute, which is well-supported by the data, is kept in the revised main text. We believe these discussions would be very useful to the readers.

      Finally, we note that most of our key assignments (points of emergence, and major HGT events) are in agreement with previous works. Specifically: the assignment of the emergence of HSP20 and HSP60 to LUCA (Sousa et al., 2016; Weiss et al., 2016), the horizontal transfer of HSP60 from archaea to Firmicute (Techtmann and Robb, 2010), and the horizontal transfer of HSP20 between bacterial clades and between bacteria and archaea (Kriehuber et al., 2010).
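
      The three-part criterion described above can be written as a simple check. This is only a sketch under stated assumptions: the data structure and function names are hypothetical, the 70% cutoff for "strong" bootstrap support is an assumed placeholder, comparing mean branch lengths is one possible reading of "longer", and the two example records use invented numbers chosen solely to mirror the qualitative description of the HSP60 and HSP20 trees.

```python
from dataclasses import dataclass


@dataclass
class ProteinTreeSummary:
    """Hand-entered summary of one protein-family tree (hypothetical structure)."""
    family: str
    single_bacteria_archaea_split_at_root: bool
    root_bootstrap_percent: float
    mean_inter_domain_branch_length: float
    mean_intra_domain_branch_length: float


def meets_luca_criterion(tree, min_bootstrap_percent=70.0):
    """Return True only if all three conditions described in the text hold.

    The 70% default for "strong" support is an assumption, not a value
    taken from the response or from the cited study.
    """
    return (
        tree.single_bacteria_archaea_split_at_root
        and tree.root_bootstrap_percent >= min_bootstrap_percent
        and tree.mean_inter_domain_branch_length
        > tree.mean_intra_domain_branch_length
    )


# Invented values for illustration only; they are not measurements.
hsp60_like = ProteinTreeSummary("HSP60-like", True, 95.0, 1.2, 0.6)
hsp20_like = ProteinTreeSummary("HSP20-like", True, 40.0, 0.5, 0.9)

for tree in (hsp60_like, hsp20_like):
    print(tree.family, meets_luca_criterion(tree))
```

      Keeping the three conditions as separate fields makes it easy to report which one fails for a given family, for example weak root support versus short inter-domain branches.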

      Comment-2. General features that impact foldability, including contact order, should be discussed and what features can be searched for in genomes that relate to these - e.g. beta-rich proteins.

      Response

      Thanks for this valuable idea! Contact order and other predictors of problematic folding are highly relevant, but their analysis is structure-based and hence inapplicable at the proteome (sequence) scale. We did, however, estimate the proportion of aggregation-prone proteins in the proteome. These proteins were identified by the CamSol method, which assigns poorly soluble regions from sequence data. Indeed, some of these predicted ‘poorly soluble segments’ refer to the hydrophobic core of the respective folded state instead of ‘true’ aggregation hotspots. With this unavoidable potential caveat, it appears that compared to prokaryotes, aggregation-prone proteins in the proteome have become nearly 6-fold more frequent in Chordates.

      The following changes were made to accommodate this new analysis:

      Figure 2 is revised to include a new panel (panel-E) that shows the expansion of aggregation-prone proteins in the proteome across the Tree of Life. The same result is summarized in the summary Figure 4.

      A new paragraph entitled “Proteins predicted as aggregation-prone became ~6-fold more frequent in the proteome” is added to the Results section, which describes the principle and the main results (see Page 7, Lines 14-28).

      The methodology is included in the Methods section, in a paragraph entitled “Predicted proportion of aggregation-prone proteins in the proteome”, see Page 24 Lines 17-27. For each representative organism, the percent of aggregation-prone proteins in proteome data are provided as Data S10.

      This analysis is also included in the revised Abstract: “Proteins prone to misfolding and aggregation, such as repeat and beta-rich proteins, proliferated ~600-fold, and accordingly, proteins predicted as aggregation-prone became 6-fold more frequent in mammalian compared to bacterial proteomes.” See Page 2, Lines 7-9.

      Comment-3. "Core" chaperones needs to be defined.

      Response

      Thank you for this suggestion. We restructured Page 3 Lines 19-23 in the Introduction to clearly explain this aspect. The current text is quoted below.

      “Chaperones can be broadly divided into core- and co-chaperones. Core-chaperones can function on their own, and include ATPases HSP60, HSP70, HSP100, and HSP90 and the ATP-independent HSP20. The basal protein holding, unfolding, and refolding activities of the core-chaperones are facilitated and modulated by a range of co-chaperones such as J-domain proteins (Caplan, 2003; Duncan et al., 2015; Schopf et al., 2017).”

      Minor concerns and thoughts

      Comment-4. This manuscript stimulated me to think about the dynamics between chaperone evolution and proteome evolution. The ability to tolerate proteins that need chaperones seems linked to major evolutionary innovations. Once you have these innovations though, you are addicted to the chaperones - and an expansion of the number of sub-optimal proteins. These ideas seem like they would be valuable to include in the discussion of this work. More generally, it would be wonderful to have a discussion of future directions that this work may spark.

      Response

      This is indeed a fascinating question, or set of questions, that we have also become intrigued about following this work. We introduced a short section, though more of an ‘appetizer’ than a detailed discussion, as we know almost nothing about the co-evolution of new proteins and chaperones.

      Reviewer’s significance statement

      This manuscript provides a fascinating glimpse back in time of a fundamental interplay - between chaperone evolution/addiction and proteome evolution. I am not an expert in phylogenetic analyses so I cannot judge the details of the analyses. As an expert in molecular evolution and chaperones, I found the approach and model fascinating to consider and I believe it will stimulate a good deal of important discussion in these fields. I have one major concern that I feel ought to be addressed in the manuscript and a number of points that I would encourage the authors to consider. I am sure that these can be readily addressed and I look forward to seeing this work published and the further discussion and ideas that it may stimulate.

      Response

      Thank you!

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript by Rebeaud et al describes phylogenetic analyses of proteome and chaperone complexity. The authors analyzed species across the tree of life to predict the proteome and chaperone properties of ancestors spanning to the last universal common ancestor. Their analyses indicate that many proteome properties increased in complexity over evolutionary time including: average protein length, the number of multi-domain proteins, the size of the proteome, the number of repeat proteins, and the number of beta-superfold proteins that are known to be difficult to fold. Their analyses also indicate an expansion in chaperone families that corresponds to the increase in proteome complexity. Based on their analyses, the authors propose a model where early life relied on a limited number of chaperones (Hsp20 and Hsp60) and that as proteome complexity evolved, so did chaperone complexity. Core chaperones including Hsp90, Hsp70, and Hsp100 evolved relatively early, and later chaperone evolution was driven by the appearance and alterations of co-chaperones and auxiliary factors as well as by increases in the protein abundance of chaperones.

      Major concerns:

      1. This work is appropriately based on phylogenetic inferences, but as such, the limitations and uncertainties of phylogenetic inferences need to be discussed. This in no way takes away from the work, quite the opposite, it would make it richer by encouraging broader interpretations where justified and clear understanding of where support for the model is strongest. Posterior probabilities need to be discussed and the range of properties that a likely ancestor might have based on the data should be discussed. How this impacts the conclusions and models should be discussed. Throughout the manuscript, the authors present most-likely ancestral models (as I understood it), what are the next most likely models? How much power is there to distinguish one model from another? It would be very helpful to have a section describing the limitations and uncertainties of the phylogenetic analyses and how these relate to the main findings and conclusions.
      2. General features that impact foldability, including contact order, should be discussed and what features can be searched for in genomes that relate to these - e.g. beta-rich proteins.
      3. "Core" chaperones needs to be defined.

      Minor concerns and thoughts:

      1. This manuscript stimulated me to think about the dynamics between chaperone evolution and proteome evolution. The ability to tolerate proteins that need chaperones seems linked to major evolutionary innovations. Once you have these innovations though, you are addicted to the chaperones - and an expansion of the number of sub-optimal proteins. These ideas seem like they would be valuable to include in the discussion of this work. More generally, it would be wonderful to have a discussion of future directions that this work may spark.

      Significance

      This manuscript provides a fascinating glimpse back in time of a fundamental interplay - between chaperone evolution/addiction and proteome evolution. I am not an expert in phylogenetic analyses so I cannot judge the details of the analyses. As an expert in molecular evolution and chaperones, I found the approach and model fascinating to consider and I believe it will stimulate a good deal of important discussion in these fields. I have one major concern that I feel ought to be addressed in the manuscript and a number of points that I would encourage the authors to consider. I am sure that these can be readily addressed and I look forward to seeing this work published and the further discussion and ideas that it may stimulate.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Rebeaud and colleagues analyze evolution of chaperones compared to the evolution of whole proteome complexity across the entire tree of life. Their principal conclusions are well captured in the following quote from the Discussion:

      "Comparison of the expansion of proteome complexity versus that of core-chaperones presents a dichotomy-a linear expansion of core-chaperones supported an exponential expansion of proteome complexity. We propose that this dichotomy was reconciled by two features that comprise the hallmark of chaperones:the generalist nature of core-chaperones,and their ability to act in a cooperative mode alongside co-chaperones as an integrated network.Indeed, in contrast to core chaperones, there exist a consistent trend of evolutionary expansion of co-chaperones."

      The general theme of the evolution of proteome management is of obvious interest. Unfortunately, the entire analysis is shaky and fails to convincingly support the authors' conclusions. There are many issues. Throughout the manuscript, the authors discuss 'expansion' of the proteome in bacteria, archaea and eukaryotes, creating the impression of a consistent evolutionary trend. No such trend actually exists if one considers the means or medians of proteome sizes within each of the three domains of life (there is a transition to greater complexity in eukaryotes). The maximum complexity, certainly, increases with time, which can be attributed to the 'drunkard's walk' effect. This hardly qualifies as 'expansion'. The authors further claim a 'linear' expansion of the chaperone set and an 'exponential' expansion of the total proteome size. These are precise mathematical terms and, as such, require fitting to the respective functions. No such fitting is done in this manuscript. Even apart from that shortcoming, the explanations of both 'linear' and 'exponential' are quite confusing. Thus, when explaining the 'linearity' of chaperone evolution, the authors refer to the lack of major innovation among the chaperones. This is correct in itself but has nothing to do with linearity. Apart from the aforementioned conceptual problems, the estimates of the 'exponential' growth of the proteome are naive, inconsistent and inaccurate. As the base point for the expansion estimates for archaea and eukaryotes, the authors take parasitic forms. Even leaving aside the highly dubious claims that these organisms belong to the clades that diverged first from the respective ancestors, parasites are not an appropriate choice for such estimates because they certainly are products of reductive evolution. For bacteria, inconsistently, the authors choose a free-living form from a dubious ancient clade, and not even the one with the smallest genome. All taken together, this robs the expansion estimates of any substantial meaning.
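
      The point about fitting to the respective functions can be made concrete with a minimal sketch. It assumes only NumPy; the function name, the variable names and the numbers are hypothetical and are not taken from the manuscript. It fits a straight line and an exponential (via least squares on the log-transformed values) to the same series and compares the goodness of fit on the original scale.

```python
import numpy as np

def compare_linear_exponential(x, y):
    """Fit y = a*x + b and y = A*exp(k*x); return R^2 of each on the original scale."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Linear model: ordinary least squares.
    a, b = np.polyfit(x, y, 1)
    linear_pred = a * x + b

    # Exponential model: least squares on log(y), which requires y > 0.
    k, log_A = np.polyfit(x, np.log(y), 1)
    exp_pred = np.exp(log_A) * np.exp(k * x)

    ss_tot = np.sum((y - y.mean()) ** 2)

    def r_squared(pred):
        return 1.0 - np.sum((y - pred) ** 2) / ss_tot

    return {"linear_r2": r_squared(linear_pred), "exponential_r2": r_squared(exp_pred)}

# Made-up series purely for illustration (NOT data from the manuscript).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_exp = 500.0 * np.exp(0.8 * x)   # exactly exponential
y_lin = 10.0 + 3.0 * x            # exactly linear

exp_result = compare_linear_exponential(x, y_exp)
lin_result = compare_linear_exponential(x, y_lin)
```

      On real data, a comparison of this kind (ideally with an information criterion and with phylogenetic non-independence taken into account) would be needed before the words 'linear' or 'exponential' are used in their mathematical sense.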

      The authors do make a salient and, I think, essentially correct observation: chaperones typically comprise about 0.3% of the proteins in any organism. As such, this presents no dichotomy in evolutionary trends to be explained. Surely, as examined and discussed in the paper, eukaryotes also show significant increases in the size and domain content of the encoded proteins, suggesting the possibility that they might need more chaperones. However, if this is the explanandum, rather than the number of proteins in the proteome as such, it should be clearly stated. Furthermore, it is quite natural to assume that this increase in protein complexity without a commensurate increase in chaperone diversity is enabled by higher expression of the chaperones, as suggested in the Discussion of this paper. I doubt there is any big surprise here or even much need for an extended discussion, let alone a special publication.

      Significance

      As such, in the opinion of this reviewer, there is no substantial advance over the existing knowledge in this paper. Should the authors wish to revise, they would need to develop robust methodology to measure proteome expansion. That would involve starting from reconstructed ancestors rather than any extant forms (let alone parasites). I doubt that such an analysis, non-trivial in itself, would reveal any strong, consistent trends other than the well-known increase in complexity in eukaryotes.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors present well written work on the evolution of proteome size and complexity, and the corresponding changes in chaperone proteins. Interestingly, they find chaperone copy numbers increase linearly with proteome size, despite the increasing 'complexity' of, in particular, post-LECA genomes. They suggest that to address the rise in complexity, organisms express chaperones at higher levels and an expanding network of co-chaperones has evolved across the tree of life.

      Major comments:

      -Summary reads strangely relative to the rest of the manuscript, and lists facts in a way that makes the purpose of the study confusing. I think most readers will dislike the characterisation of evolution as progress from simple to complex, and the authors might want to avoid this language throughout the manuscript: bacteria and archaea have also been evolving over this period of time, and have not become more 'complex'. Similarly, the authors should reconsider their figure legend titles. As a specific example, 'in the course of evolution' should become 'across the tree of life'.

      -I think the manuscript would be improved if the authors significantly shortened the discussion of genome size evolution- this is fairly well understood, and could be covered briefly, especially as the main focus of the manuscript is on the evolution of chaperone and co-chaperone repertoire. They could also make clearer quantitative links between protein complexity and the evolution of chaperones and co-chaperones- perhaps this should be in the discussion? The authors might also consider referencing 'The evolution of genome complexity', which could be relevant to this manuscript and might make the work of broader interest.

      -The authors state 'protein trees were generated and compared with ToL to account for gene loss and transfer events'. The methodology for this procedure is not given in the manuscript. The authors should back up this point, and make it clear this is why they reconstruct the trees. Currently it is not convincing to me that the authors have found HGT, given the considerable phylogenetic uncertainty in the basal events in the tree of life. I also expect the tree of a single protein to potentially lack information due to the short sequence considered and a possible lack of power. The authors need to consider whether the data are really of high enough quality to assess this.

      -Methods- the authors could consider taking an alternative source of LUCA proteins, rather than those found in 'Nanoarchaeota and Aquificae': it's possible these are not representative of LUCA, and it seems a somewhat arbitrary choice. The authors could consider using one of the available curated sets, such as that generated by Ranea et al. (2006).

      -The patterns observed might only hold because of differences in the taxa that diverged pre and post LECA? The authors might consider subgroup analyses to ensure this is not the case. The authors could also consider using methods that take phylogeny into account.

      Minor comments:

      'Life's habitability has also expanded from its specific niche of emergence-likely deep-sea hydrothermal vents, to highly variable and extreme ranges of temperature, pressure, exposure to high UV-light, dehydration and free oxygen.' This is not really correct, as bacteria and archaea are found worldwide, and in the most extreme environments.

      'We reconciled the topology of our tree'- on first read this was not clear; I did not realise the authors were only building trees for subsets of the data. Time tree is the best source for the overall topology. The phrase 'manually curated and adjusted' is used in the methods. This language is much too vague, and not a clear explanation of the steps taken.

      Significance

      The work presents interesting results that suggest that more 'complex' organisms have evolved a strategy to cope with increasing proteome size, and is interesting to researchers in the field of molecular evolution.

      I am a researcher in population genetics and molecular evolution.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This study outlines calcium probes for assessing the poorly understood role of peroxisomes in calcium signaling. The authors suggest that these organelles sequester calcium from either calcium influx across the plasma membrane or from release from the ER/SR. This is important since we need to know more about the roles of these organelles in calcium homeostasis and signaling. However, it needs to be robustly demonstrated that the probes are targeted to the right organelle without confounding contamination from other organelles which can be very significant even for a small degree of mis-targeting.

      Major

      1. The differences between the signals from the peroxisomal and cytosolic D3 versions are not compelling, other than a dampened spike with the former (higher resting levels, smaller peak). See below for pH concerns.
      2. How clean is the peroxisome distribution? Prove that D3 spillover from its being partially in (or on) other compartments (e.g. cyto, ER) is not contributing to the changes. Selective manipulation of Ca2+ in these other compartments should not affect the peroxisome signal.
        • a. For example, the small changes in the D3-px could be explained by the peroxisome not changing at all, with the signal(s) from the other compartments (where larger responses are observed) contaminating the response.
        • b. E.g. if the probe is in the ER lumen, the signal should be eliminated with SERCA inhibitors (thapsigargin, CPA). They used thapsigargin in cardiac myocytes; why not in HeLa cells during characterization?
      3. Any Ca2+ reporter will be pH-sensitive to an extent, even D3 (Ca2+ binding, inherent fluorescent proteins).
        • a. It is essential to prove that the signal changes are not due to changes in peroxisomal pH. Target pH-sensitive proteins to the peroxisomal lumen by the same strategy and show that the same Ca2+ interventions do not cause pH changes.
        • b. The authors claim different resting levels of [Ca2+] in cytosol/mitochondria/peroxisome. The resting FRET level also depends on the resting pH of the compartments, which may also be different. Certainly, mitochondria are more alkaline than the cytosol. Again, to interpret these as real Ca2+ differences requires the pH to be accounted for.
      4. I am puzzled by the model, in particular in view of Fig 3. The genetically-encoded calcium indicator (GECI) is allegedly on the cytosolic face of the peroxisome and measuring peri-peroxisomal Ca2+.
        • a. The changes with this reporter look pretty similar to the luminal reporter (save that the resting ratio may be lower). I don't understand how the lumen [Ca2+] > cytosolic [Ca2+] without a higher local [Ca2+] (unless there is an energy-driven uptake mechanism, but then how does this fit in with ER-driven Ca2+ release?).
      5. The claim that resting peroxisome [Ca2+] is higher than cytosol is questionable. Is this a calibration artifact (e.g. compartment pH-differences or the reporter behaves differently in the lumen)? Such a gradient could not be sustained without energy-dependent Ca2+ uptake. The authors make no discussion of this.

      Minor

      1. Quantitate localization. Pearson's coefficients for GECIs and Peroxisomes.
      2. Different upstroke rates of D3 with His vs Cao. Quantify.
      3. Page 5. Line 161. 'Different sites', do the authors mean different sides? Similarly, the Legend of Fig 3.
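
      The quantification requested in minor point 1 can be sketched as follows. This is a minimal illustration that assumes two registered single-channel images held as NumPy arrays; the function name, the optional mask and the synthetic images are hypothetical and do not reproduce the authors' analysis.

```python
import numpy as np

def pearson_colocalization(channel_a, channel_b, mask=None):
    """Pearson's correlation coefficient between two image channels.

    mask (optional) is a boolean array selecting the pixels to include,
    for example a cell outline that excludes background.
    """
    a = np.asarray(channel_a, dtype=float)
    b = np.asarray(channel_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("channels must have the same shape")
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    a = a[mask]
    b = b[mask]
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.sqrt(np.sum(a_centered ** 2) * np.sum(b_centered ** 2))
    return float(np.sum(a_centered * b_centered) / denom)

# Synthetic images for illustration only (NOT data from the manuscript).
rng = np.random.default_rng(0)
marker = rng.random((64, 64))            # stands in for a peroxisome marker
probe_colocalized = 2.0 * marker + 5.0   # perfectly linearly related signal
probe_unrelated = rng.random((64, 64))   # independent signal

r_colocalized = pearson_colocalization(marker, probe_colocalized)
r_unrelated = pearson_colocalization(marker, probe_unrelated)
```

      In practice the coefficient is sensitive to background and thresholding, so the choice of mask should be reported alongside the value.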

      Significance

      Good peroxisome calcium probes are important to the general calcium signaling field. This is fundamental science of interest to all cell biologists.

      There has been little published on peroxisome calcium, although for example, the Pozzan lab published a paper in JBC in 2008 on a GFP-based lumenally targeted peroxisome probe. There is contradictory data in the field and reliable new approaches are needed.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Sargsyan et al describes an unappreciated role for peroxisomes in calcium dynamics. Specifically, the authors propose that GPCR/VDCC/SOCE-mediated cytosolic Ca2+ elevation is rapidly sensed by peroxisomes and sequestered. The authors used/generated peroxisome-targeted genetically encoded Ca2+ indicators, which are an elegant and powerful tool to monitor the luminal Ca2+ dynamics. While the results and conclusions are novel, there are some important gaps that need to be addressed for consideration for publication in EMBO J.

      Comments:

      Peroxisomes are single-membrane-bound organelles which are conserved across species spanning from yeast to humans. While housing only ~100 proteins, they are responsible for essential steps in lipid metabolism, amino acid metabolism and ROS homeostasis. Unlike other organelles, peroxisomes import fully folded and cofactor-bound proteins into their matrix. Though peroxisomes house specific metabolic functions, there is extensive crosstalk with other organelles, including mitochondria. It is essential to test and define whether silencing/knockdown of mitochondrial Ca2+ transport components like MCU will impact peroxisome Ca2+ uptake upon stimulation with histamine or electrical stimulation.

      Since peroxisomes buffer a significant amount of Ca2+, it is worth testing whether blockade of mitochondrial Ca2+ uptake alters peroxisome-mediated Ca2+ influx. This analysis will provide the Ca2+ uptake rates of mitochondria vs peroxisomes (Mallilankaraman K. et al., Cell 2012 and Nemani N. et al., Science Signaling 2020).

      Because peroxisomal synthesis of plasmalogens is Ca2+ and oxygen tension dependent, it is essential to show that altering Ca2+ controls plasmalogen synthesis.

      In the introduction the authors have stated that "Elevated mitochondrial uptake increases mitochondrial reactive oxygen species (ROS) production and is associated with heart failure and ischemic brain injury (Starkov et al., 2004; Santulli et al., 2015)." These cited articles only remotely link MCU and ROS elevation. It is important to point out that Tomar et al 2016 Cell Reports clearly demonstrated that genetic ablation of MCU suppresses mROS production that is mitochondrial Ca2+ dependent.

      Significance

      The significance of the work is very high. The authors employ a variety of complementary techniques and experimental systems to demonstrate that peroxisomes indeed buffer a large quantity of Ca2+ upon stimulation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      These are straightforward studies aimed at developing probes to assess peroxisomal Ca2+ at rest and in response to receptor stimulation. The probes were designed to measure intraperoxisomal Ca2+ and the Ca2+ the peroxisome experiences when cytoplasmic Ca2+ is increased. The probes fill a need in understanding peroxisomal Ca2+ and Ca2+ signaling in general and should be very useful to investigators in the field.

      The comments are aimed to help in improving the studies and taking them to the next stage.

      The grammar needs improvement and the introduction needs sharpening. It is long and, in many places, not to the point. The results and discussion sections are also quite verbose.

      The sidedness of the probes needs to be validated further, especially since the peroxisomal Ca2+ increase follows the cytoplasmic increase and the slower reduction rate may result from the environment experienced by the probe. Simple experiments: how do the probes respond to Ca2+ ionophore; is Ca2+ reduced rapidly when it is removed from the media of digitonin-permeabilized cells; how do the cytoplasmic and peroxisomal thapsigargin responses compare using the protocols in 2A and 4A? Sidedness of PEX13-D3cpV was not examined.

      Calculations of peroxisomal Ca2+ are based on Kd values reported in the literature. The Kds of D3cpV-px and PEX13-D3cpV should be determined when in the peroxisome in permeabilized cells for the numbers to have any meaning.
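
      Why the in situ Kd matters can be seen from the calibration relation commonly used for ratiometric FRET sensors, [Ca2+] = Kd' * ((R - Rmin) / (Rmax - R))^(1/n). The sketch below assumes that this relation is the one applied; the function name and all numerical values are hypothetical and are not taken from the manuscript.

```python
def calcium_from_fret_ratio(ratio, r_min, r_max, kd, hill=1.0):
    """Estimate [Ca2+] from a FRET ratio: kd * ((R - Rmin) / (Rmax - R)) ** (1 / n).

    The result is in the same units as kd.
    """
    if not (r_min < ratio < r_max):
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd * ((ratio - r_min) / (r_max - ratio)) ** (1.0 / hill)

# Hypothetical values for illustration only (NOT measurements from the manuscript).
literature_kd_um = 0.6   # assumed Kd taken from the literature, in micromolar
in_situ_kd_um = 1.2      # assumed Kd measured inside the organelle, in micromolar
ratio, r_min, r_max = 2.0, 1.0, 3.0

estimate_literature = calcium_from_fret_ratio(ratio, r_min, r_max, literature_kd_um)
estimate_in_situ = calcium_from_fret_ratio(ratio, r_min, r_max, in_situ_kd_um)
```

      Because the estimate scales linearly with Kd, any difference between the literature Kd and the Kd inside the peroxisome carries over proportionally to the reported concentrations.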

      How does the localization of the probes look in the differentiated cardiomyocytes? How does it compare to RyRs, VACC, etc.?

      The major weakness of the study is that the probes are used only as a tool. To enhance the study and bring it beyond an excellent technical achievement, the authors should use them to study a significant Ca2+-dependent peroxisomal function and show how the use of the tools illuminates the role of Ca2+ in such a function.

      Significance

      These are straightforward studies aimed at developing probes to assess peroxisomal Ca2+ at rest and in response to receptor stimulation. The probes were designed to measure intraperoxisomal Ca2+ and the Ca2+ the peroxisome experiences when cytoplasmic Ca2+ is increased. The probes fill a need in understanding peroxisomal Ca2+ and Ca2+ signaling in general and should be very useful to investigators in the field.

      The major weakness of the study is that the probes are used only as a tool. To enhance the study and bring it beyond an excellent technical achievement, the authors should use them to study a significant Ca2+-dependent peroxisomal function and show how the use of the tools illuminates the role of Ca2+ in such a function.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:** This interesting study by Putker et al. showed that circadian rhythmicity persists in several typical circadian assay systems lacking Cry, including Cry knockout mouse behavior and gene expression in Cry knockout fibroblasts. They further demonstrated weak but significant circadian rhythmicity in Cry- and Per- knockout cells. Cry- (and potentially Per-)-independent oscillations are temperature compensated, and CKId/e still has a role in the period regulation of Cry-independent oscillations.

      **Major comments:**

      1) The authors propose that the essential role of mammalian Cryptochrome is to bring about robust oscillation. As the authors analyze in many parts, the robustness of oscillation can be validated by the (relative) amplitude and phase/period variation, both of which should be affected significantly by the method for cell synchronization. Unfortunately, the method for synchronization is not adequately described in this version of the supplementary information. This reviewer has no objection to the "iterative refinement of the synchronization protocol" but at least the correspondence between which methods were used in which experiments needs to be clearly explained. The detailed method may be found in the thesis of Dr. Wong, but the methods used in this manuscript need to be detailed within this manuscript.

      We thank the reviewer for recognising the importance of different synchronisation protocols. In experiments where bioluminescent CKO rhythms were observed, different synchronisation protocols resulted in similar results when comparing WT with CKO cells. The different synchronisation methods used in each experiment are now specified in the supplementary methods.

      2) The authors revealed that CKO mice have apparent behavioral rhythmicity under the condition of LL>DD. This is an intriguing finding. However, it should be carefully evaluated whether this rhythmicity (16 hr cycle) is the direct consequence of the circadian rhythmicity observed in CKO and CPKO cells (24 hr cycle), because the period length is very different. Is it possible to induce the 16 hr periodicity in CKO mouse behavior by a 16 hr-L:16 hr-D cycle? Is another plausible possibility that the 16 hr rhythmicity is the mouse version of internal desynchronization or another type of methamphetamine-induced-oscillation/food-entrainable-oscillation?

      The reviewer makes an excellent suggestion. As described in the manuscript text (page 13), CKO mice have already been shown to entrain to restricted feeding cycles (Iijima et al., 2005) and we therefore assessed whether CKO rhythms would entrain to a 16h day as suggested. Whilst CKO (but not WT) mice showed 16h behavioural rhythms during entrainment, they were arrhythmic under constant darkness thereafter (Revised Figure S2A). CKO cellular rhythms show reduced robustness under constant conditions ex vivo, and our other work has revealed that CRY-deficiency renders cells much more susceptible to stress (Wong et al, 2020, BioRxiv). The parsimonious explanation, therefore, is that whilst the cellular timing mechanism remains functional when CRY is absent, the amplitude of cellular clock outputs is severely attenuated (as we showed previously in Hoyle et al., Sci Trans Med, 2017) in a fashion that impairs the fidelity of intercellular synchronisation under most conditions in vivo, as well as the molecular mechanisms of entrainment to light-dark cycles.

      With respect to the apparent discrepancy between mean periods of CKO cultured cells (~21h), SCN (~19h) and mice (~17h): this is also observed in WT cells (~26h), SCN (~25h) and mice (~24h), simply with a smaller effect size and longer intrinsic period.

      We believe this difference in effect size can adequately be explained by differences in oscillator coupling, combined with the reduced robustness of CKO timekeeping. In Figure 1F we show that the range of rhythmic periods expressed by cultured CKO fibroblasts (14-30h) is much greater than for their WT counterparts (range of 22-26h), or that which is observed when cellular oscillators are coupled in CKO SCN (19h). Thus the period of CKO oscillations is demonstrably more plastic (less robust) than WT, with a cell-intrinsic tendency towards shorter period which is revealed more clearly when oscillators are coupled.

      In vivo there is more oscillator coupling in the intact SCN than in an isolated slice, from which communication with the caudal and rostral hypothalamus has been removed. Thus it seems plausible that increased coupling in vivo, combined with positive feedback via behavioural cycles of feeding and locomotor activity, resonates with a common frequency which is shorter than in isolated tissue.

      Critically, for both WT and CKO mice/SCN, the circadian period lies within the range of periods observed in isolated fibroblasts. To communicate this rather nuanced point we have inserted the following text into the supplementary discussion:

      “Circadian timekeeping is a cellular phenomenon. Co-ordinated ~24h rhythms in behaviour and physiology are observed in multi-cellular mammals under non-stressed conditions when individual cellular rhythms are synchronised and amplified by appropriate extrinsic and intrinsic timing cues. In light of short period (~16.5h) locomotor rhythms observed in CKO mice after transition from constant light to constant dark, but failure to entrain to 12h:12h light:dark cycles, it seemed plausible that either CKO mice might entrain to a short 8h:8h light:dark (16h day) or else have a general deficiency to entrainment by light:dark cycles. The data in Figure S2 supports the latter possibility, in that neither WT nor CKO mice stably entrained to 16h cycles whereas WT but not CKO mice entrained to 24h days. The bioluminescence oscillations observed in CKO cells conform to the long-established definition of a circadian rhythm (temperature-compensated ~24h period of oscillation with appropriate phase-response to relevant environmental stimuli), whereas the locomotor rhythms observed in CKO mice under quite specific environmental conditions correlate with both the cellular and SCN data to suggest the persistence of capacity to maintain behavioural rhythms close to the circadian range, but which is masked under most circumstances. We suggest that in vivo the (pathophysiological) stress of CRY-deficiency is epistatic to the expression of daily rhythms in locomotor activity following standard entrainment by light:dark cycles and thus, whilst not arrhythmic, also cannot be described as circadian in the strictest sense.”

      3) The authors proposed that CKId/e at least in part is a component of the cytoscillator (Fig. 5D), and that turnover control of PER (likely to be controlled by CKId/e) may be an interaction point between the cytoscillator and the canonical circadian TTFL (Fig. 4). Strictly speaking, this model is not directly supported by the experimental setting of the current manuscript. The contribution of CKId/e is evaluated in the presence of PER by monitoring the canonical TTFL output (i.e. PER2::LUC); thus it is not clear whether the kinase determines the period of the cytoscillator. It would be valuable to ask whether PF and CHIR have the period-lengthening effect on Nr1d1:LUC in the CPKO cell.

      Another excellent suggestion, thanks. The experiment, showing similar results in CKO and CPKO cells, was performed and is now reported in Revised Figure S5D. The text was amended as follows: “We found that inhibition of CK1d/e and GSK3-α/β had the same effect on circadian period in CKO cells, CPKO cells, and WT controls (Figure 5A, B, S5A, B, D).”

      Moreover, our data are further supported by findings in RBCs, where CK1 inhibition affects circadian period in a similar manner as in WT and CKO cells (Beale et al, JBR 2019).

      **Minor comments:**

      4) The authors argue that the CKO cells' rhythmicity is entrained by the temperature cycle (Fig. 2C). Because the data from the CKO cells show only one peak after the release into the constant temperature phase, it is difficult to conclude whether the cells are entrained or just respond to the final temperature shift.

      We agree with the reviewer and have replaced the original figure with another recording that includes an extra circadian cycle in free-running conditions (Revised Figure 2C).

      5) It would be useful for readers to provide information on the known phenotype of TIMELESS knockout flies; TIM is widely accepted as an essential component of the circadian clock in flies; are there any studies showing the presence of circadian rhythmicity in Tim-knockout flies (even if it is an oscillation seen in limited conditions, such as the neonatal SCN rhythm in mammalian Cry knockout)?

      The reviewer is correct that TIM is widely accepted as an essential component of the circadian clock in flies. Using more sensitive modern techniques however, ~50% of classic Tim01 mutant flies exhibit significant behavioural rhythms in the circadian range under constant darkness, as reported:

      https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2015/docId/11914

      For this reason we employed a full gene knockout of the Timeless gene (Lamaze et al., Sci Rep, 2017), where the majority of flies are behaviourally arrhythmic under constant conditions following standard entrainment by light cycles and therefore represents a more appropriate model for CRY-deficient cells.

      We have revised the legend of Figure S2 to include the following:

      “N.B. The generation of Timout flies is reported in Lamaze et al, Sci Rep, 2017. Similar to CRY-deficient mice, whole gene Timeless knockout flies are characterised as being behaviourally arrhythmic under constant darkness following entrainment by light:dark cycles: https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2015/docId/11914”

      5) Figure 3C shows that the amount of PER2::LUC mRNA changes ~2 fold between time = 0 hr and 24 hr in the CKO cell. This amplitude is similar to that observed in WT cell although the peak phase is different. Does the PER2::LUC mRNA level show the oscillation in CKO cells?

      No, we think we have shown convincingly this is not the case. We argue the data in figure 3C show that: (a) there is no circadian variation in mRNA PER2::LUC expression (mRNA levels increase but no trough is observed) and (b) that the temporal relationship between protein and mRNA as observed in WT is broken; i.e. the CRY-independent circadian variation in protein levels cannot be “driven by” changes in transcript levels. Similar results were obtained using transcriptional reporters Per2:LUC and Cry1:LUC (Figure S3E and F). Moreover, our findings are also in line with previous reports, such as Nangle et al. (2014, eLife) and Ode et al. (Mol Cell, 2017).

      6) Figure 3D: the authors discuss the amplitude and variation (whether the signal is noisier or not) of reporter luciferase expression between different cell lines. However, a huge difference in the luciferase signal can be observed even in the detrended bioluminescence plot. This reviewer is concerned that some of the phenotypes of CKO and CPKO MEFs reflect a lower transfection efficiency of the reporter gene, not the nature of the circadian oscillators of these cell lines.

      As reported in the methods, these are stable cell lines rather than transiently transfected cells. The detrended luciferase data presented here do not actually reflect raw levels of luciferase protein expression, but rather reflect the amount of deviation from the 24 hour average. To make it easier to compare expression levels of Per2:LUC and Nr1d1:LUC between the different cell lines we have added figure S3H, presenting the average raw bioluminescence levels over 24 hours (after 24 hours of recovery from media change; ie from 24-48 hours). Using these data one can appreciate that expression levels of the Per2 reporter are never lower in CRY KO cells when compared to WT. We hope these data can take away the reviewer’s concerns about expression levels causing the differences observed.

      Reviewer #1 (Significance (Required)): Although Cryptochrome (Cry) has been considered a central component of the mammalian circadian clock, several studies have shown that circadian rhythms are maintained in the absence of Cry, including in the neonate SCN and red blood cells. Thus, although the need for Cry as a circadian oscillator has been debated, its essential role as a circadian oscillator remains established, at least in the cell-autonomous clock driven by the TTFL. This study provides additional evidence that circadian rhythmicity can persist in the absence of Cry. In a more general context, the presence of a non-TTFL circadian oscillator has been one of the major topics in the field of circadian clocks, except for the cyanobacteria. In mammals, the authors’ and other groups have led the finding of circadian oscillation in the absence of the canonical TTFL by showing the redox cycle in red blood cells (O’Neill, Nature 2011). The presence of circadian oscillation in the absence of Bmal1 was also reported recently (Ray, Science 2020). Bmal1(-CLOCK), CRY, and PER compose the core mechanism of the canonical circadian TTFL; thus, this manuscript adds another layer of evidence for non-TTFL circadian oscillation in mammals. Overall, the manuscript reports several surprising results that will receive considerable attention from the circadian community. This reviewer has expertise in the field of mammalian circadian clocks, including genomics, biochemistry, and analysis of mouse behavior.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In the canonical model of the mammalian circadian system, transcription factors, BMAL1/CLOCK, drive transcription of Cry and Per genes and CRY and PER proteins repress the BMAL1/CLOCK activity to close the feedback loop in a circadian cycle. The dominant opinion was that CRY1 and CRY2 are essential repressors of the mammalian circadian system. However, this was challenged by persistent bioluminescence rhythms observed in SCN slices derived from Cry-null mice (Maywood et al., 2011 PNAS) and then by persistent behavior rhythms shown by the Cry1 and Cry2 double knockout mice if they are synchronized under constant light prior to free running in the dark (Ono et al., 2013 PLOS One). In the manuscript, the authors first confirmed behavioral and molecular rhythms in the Cry1/Cry2- deficient mice and then provided evidence to suggest the rhythms of Per2:LUC and Nr1d1:LUC in CKOs are generated from the cytoplasmic oscillator instead of the well-studied transcription and translation feedback loop: Constant Per2 transcription driven by BMAL1/CLOCK plus rhythmic degradation of the PER protein result in a rhythmic PER2 level in the absence of both Cry1 and Cry2, which suggests a connection between the classic transcription- and translation-based negative feedback loops and non-canonical oscillators. **Major points:** Line 38-39, "Challenging this interpretation, however, we find evidence for persistent circadian rhythms in mouse behavior and cellular PER2 levels when CRY is absent." The rhythmic behavioral phenotype of cry1 and cry2 double knockout mice was first documented by Ono et al., 2013 PLOS ONE, in which eight cry1 and cry2 double knockout mice after synchronization in the light displayed circadian periods with different lengths and qualities. 
The paper reported two period lengths from the Cry mutant mice: "An eye-fitted regression line revealed that the mean shorter period was 22.86 ± 0.4 h (n = 8) and the mean longer period was 24.66 ± 0.2 h (n = 9). The difference of two periods was statistically significant (p < 0.01).", either of which is quite different from the ~16.5 hr period in Figure 1B of the manuscript. A brief discussion of the period difference between studies would help readers. Period information from each individual mouse should be calculated and shown, since large period variations exist among CKO mice (Ono et al., 2013 PLOS One).

      Thanks for this suggestion. The mice used by Ono et al were raised from birth in constant light, whereas we used mice that were weaned and raised in normal LD cycles before being subject to constant light then constant dark as adults. Instead of the somewhat subjective fitting of regression lines by eye performed by Ono et al, our analysis was performed using the periodogram analysis routine of ClockLab 6.0 with a significance threshold for rhythmicity of p=0.0001. We have now repeated this experiment with 10 adult CKO mice (male and female), and found no evidence for two period lengths in that the second most significant period was consistently double that of the first. As the reviewer suggests, there is a much broader distribution of CKO mouse periods compared with WT, as we also found in cultured cells and SCN. These new data are now reported in revised Figure S1B & C. We have also included a statement about how our study differs from Ono et al in the supplementary discussion.

      The behavioral phenotype of Cry-null mice and luminescence from their SCNs are robustly rhythmic while fibroblasts derived from these mice only produce rhythms with very low amplitudes compared with those in WT, which may reflect the difference between the SCN’s rhythm and peripheral clocks. The behavioral phenotype is supposed to be controlled mainly by SCN. However, most molecular analyses in the work were done with MEF and lung fibroblasts. These tissues may not be the best representative of the behavioral phenotype of the CKO mice.

      Behavioural rhythms of CKO mice are significantly less robust than WT, with mean amplitude less than 50% of WT controls (Figures 1A & B, revised S1B). Furthermore, as reported, 40% of CKO SCN slices exhibited PER2::LUC rhythms, compared with 100% of WT SCN slices (as also observed by Maywood et al., PNAS, 2013), and therefore are also less robust by the definition used in this manuscript.

      As now discussed in the revised supplementary discussion:

      “Circadian timekeeping is a cellular phenomenon. Co-ordinated ~24h rhythms in behaviour and physiology are observed in multi-cellular mammals under non-stressed conditions when individual cellular rhythms are synchronised and amplified by appropriate extrinsic and intrinsic timing cues.”

      The objective of this study was to understand the fundamental determinants that allow mammalian cells to generate a circadian rhythm, which we find does not include an essential role for CRY genes/proteins. Thus the cell is the appropriate level of biological abstraction at which to investigate the phenomenon, whereas the SCN and behavioural recordings simply serve to illustrate the competence of CRY-independent timing mechanisms to co-ordinate biological rhythms at higher levels of biological scale which are manifest under some conditions. To reiterate, the behavioural data supports the cellular observations, not the converse.

      Stronger evidence is needed to fully exclude the possibility that in CKO cells, the rhythm is not generated by PERs' compensation for the loss of Crys to repress BMAL1 and CLOCK. Since the rhythms of Per:LUC or Nr1d1:LUC (Figures 3D and S3E) are much weaker than those in WT, molecular analyses might not be sensitive enough to reflect the changes across a circadian cycle in the CKOs if the TTFL still occurs. CLOCKΔ19 mutant mice have a ~4 hr longer period than WT (Antoch et al., 1997 Cell; King et al., 1997 Cell). CLOCKΔ19; CKO cells or mice should be very helpful to address the question. Periods of Per:LUC and Nr1d1:LUC from the CLOCKΔ19; CKO should be similar to those in the CKO alone if the transcription feedback does not contribute to their oscillations.

      We agree this would be an interesting experiment, however the data in this manuscript and Wong et al. (BioRxiv, 2020), whilst not disputing the existence of the TTFL, strongly suggest that it fulfils a different function to that which is currently accepted and is not the mechanism that ultimately confers circadian periodicity upon mammalian cells. CLOCKΔ19 is an antimorphic gain-of-function mutation with many pleiotropic effects. Therefore, if the TTFL is not the basis of circadian timekeeping in mammalian cells, it follows that the CLOCKΔ19 mutation may not elicit its effects on circadian rhythms through delaying the timing of transcriptional activation, as was proposed. As such, whether or not CLOCKΔ19 alters circadian period of CKO cells/mice would not allow the two models to be distinguished in the way that the reviewer envisions.

      Secondly, we cannot detect any interaction between PER2 and BMAL1 in the absence of CRY using an extremely sensitive assay.

      Thirdly, very strong biochemical evidence suggests that PER has no repressive function in the absence of CRY (Chiou et al., 2016; Kume et al., 1999; Ode et al., 2017; Sato et al., 2006).

      Finally, in several figures, particularly 3C and 4A, we show that PER2 peaks at the same time in CKO and WT cells, but in CKO cells this is not accompanied by a coincident peak in the mRNA. Thus, even if PER were able to repress BMAL1/CLOCK without CRY, rhythms in PER2 protein level could not be explained by some residual PER/BMAL1-dependent TTFL mechanism.

      To address the reviewer’s concern however, we have employed mouse red blood cells which offer unambiguous insight into the causal determinants of circadian timing, as we can be absolutely confident that there is no transcriptional contribution to cellular timekeeping. Briefly, we took fibroblasts and RBCs from WT, short period Tau/Tau and long period Afh/Afh mutant mice. The basis of the circadian phenotype of these mutations is quite well established as occurring through the post-translational regulation of PER and CRY proteins respectively, and the mutations result in short and long period PER2::LUC rhythms compared with WT fibroblasts. RBCs do not express PER or CRY proteins, and commensurately no genotype-dependent differences of RBC circadian period were observed (Beale et al, 2020, in submission). In contrast, RBC circadian rhythms are sensitive to pharmacological inhibition of casein kinase 1 (Beale et al., JBR, 2019).

      Lines 51-52, "PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping" and lines 310-311, "We found that transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in animal and cellular models." The authors have not excluded the possibility that the rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN. In the fibroblasts, only two genes, Per2 and Nr1d1, have been studied in the work, which cannot be simply expanded to the thousands of circadian controlled genes. Also amplitudes of PER2:LUC and NR1D1:LUC in the CKOs are much lower than those in WT and no evidence has been provided to show that their weak rhythms are biologically relevant.

      The definition of a circadian rhythm (Pittendrigh, 1960) does not mention biological relevance or stipulate any lower threshold for amplitude. As now stated in the revised text (page 6):

      PER2::LUC rhythms in CKO cells were temperature compensated (Figure 2A, B) and entrained to 12h:12h 32°C:37°C temperature cycles in the same phase as WT controls (Figures 2C), and thus conform to the classic definition of a circadian rhythm (Pittendrigh, 1960) – which does not stipulate any lower threshold for amplitude or robustness.

      We make no claims about biological relevance or amplitude in this manuscript, which are addressed in our related manuscript (Wong et al., BioRxiv, 2020). In this related manuscript, we explicitly address whether CRY is necessary for mammalian cells to maintain a circadian rhythm in the abundance of clock-controlled proteins and find that it is not. Indeed, twice as many rhythmically abundant proteins are observed in CKO cells than WT controls, which suggests that, if anything, CRY functions to suppress rhythms in protein abundance rather than to generate them.

      We observe circadian rhythms in the activity of two different bioluminescent reporters, which have already been extensively characterised. The mouse and SCN data in figure 1 are correlative, and simply show that previous published observations are reproducible. PER2::LUC oscillations are not accompanied by Per2 mRNA oscillations. This, together with the absence of a BMAL1-PER2::LUC complex strongly argues against a model where PER2 oscillations are driven by residual (PER2-driven) transcriptional oscillations.

      We therefore concede the reviewer’s point that we “cannot exclude rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN”. The reviewer will agree however, that there exists very strong biochemical evidence suggesting that PER has no repressive function in the absence of CRY (Chiou et al., 2016; Kume et al., 1999; Ode et al., 2017; Sato et al., 2006); that there exists no experimental evidence to suggest that PERs can fulfil this function in the absence of CRY in any mammalian cellular context; and finally that our observations are not consistent with the canonical model for the generation of circadian rhythms in mammals.

      We have therefore amended the text to focus on CRY specifically, as follows:

      PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping

      Page 12. “We found that CRY-mediated transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in cellular models. Whilst we cannot exclude the possibility that in the SCN, but not fibroblasts, PER alone may be competent to effect transcriptional feedback repression in the absence of CRY, we are not aware of any evidence that would render this possibility biochemically feasible.”

      **Minor points:** Lines 66-67, "...(Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)." to "... (reviewed in Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)."

      Thanks, changed as requested.

      Line 70, "...((Liu et al., 2008..." to "...(Liu et al., 2008..."

      Thanks, changed as requested.

      Lines 174-175, "Considering recent reports that transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ...". Larrondo et al., 2015 paper says "however, in such ∆fwd-1 cells, the amount of FRQ still oscillated, the result of cyclic transcription of frq and reinitiation of FRQ synthesis." The point of the paper is "we unveiled an unexpected uncoupling between negative element half-life and circadian period determination." instead of "...transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ,"

      This is a good point which, following discussion with Profs Dunlap and Larrondo, we have revised into “no obligate relationship between clock protein turnover and circadian regulation of its activity” – a more accurate summary of their findings.

      Lines 249-252, "CKO cells exhibit no rhythm in Per2 mRNA (Figure 3C, D), nor do they show a rhythm in global translational rate (Figure S4A, B), nor did we observe any interaction between BMAL1 and S6K/eIF4 as occurs in WT cells (Lipton et al, 2015) (Figure S4C)." In figures 3D and S3E, in CKO and CPKO cells the Per2:LUC data without fitting look better than that of Nr1d1:LUC. But the Nr1d1:LUC rhythm became clear after fitting the raw data. So to better visualize the low amplitude rhythm, if any, of Per2:LUC and compare with Nr1d1:LUC, fitted the Per2:LUC data in CKOs and CPKOs in Figure 3D and S3E should be shown as what has been done to Nr1d1:LUC.

      Thanks, these data can be found in Figure S3F. The detrended Per2:Luc CKO and CPKO bioluminescence traces were better fit by the null hypothesis (straight line) than a damped sine wave (p>0.05) and so were not significantly rhythmic by the criteria used in this manuscript.

      Lines 258-259, "much less than the half-life of luciferase expressed in fibroblasts under a constitutive promoter" In figure S4D, the y-axis of the PER2::LUC is ~800 while the y-axis of the SV40::LUC is ~600000. The over-expressed LUC by the SV40 promoter might saturate the degradation system in the cell so the comparison is not fair. A weaker promoter with the level similar to Per2 should be used to make the comparison.

      Thank you for this suggestion. In our experience, the SV40 promoter is actually a rather weak promoter compared with CMV, and faithfully facilitates the constitutive (non-rhythmic) expression of heterologous proteins such as Luciferase (Feeney et al., JBR, 2016). It has been shown previously that constitutive over-expression of heterologous proteins such as GFP or even CRY1 does not affect circadian rhythms in fibroblast cells (e.g. Chen et al., Mol Cell, 2009). To address the reviewer’s reasonable concern however, multiple stable SV40:Luc fibroblast lines were generated by puromycin selection, grown to confluence in 96-well plates, then treated with 25 μg/mL CHX at the beginning of the recording. Random genomic integration of SV40:Luc leads to a broad range of different levels of luciferase expression, evident from the broad range of initial luciferase activities. For each line the decline in luciferase activity was fit with a simple one-phase exponential decay curve (R² ≥ 0.98) to derive the half-life of luciferase in each cell line. There was no significant relationship between the level of luciferase expression and luciferase stability (straight line vs. horizontal line fit p-value = 0.82). Therefore constitutive expression of SV40:Luc in fibroblasts does not affect the cellular protein degradation machinery within the range of expression used for our half-life measurements. These new data are reported in Revised Figure S3H.
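
      As an illustration of how a half-life can be derived from a one-phase exponential decay, the sketch below fits a straight line to the logarithm of the activity and converts the slope into a half-life. This log-linear regression is a simplification chosen for the example and is not necessarily the fitting procedure used for the data above; the function name `fit_half_life`, the 2 h half-life and the starting activities are hypothetical, synthetic values.

```python
import math


def fit_half_life(times_h, activity):
    """Return (half_life_h, r_squared) for activity = A0 * exp(-k * t).

    Assumption: all activities are positive, so ln(activity) is linear in
    time and ordinary least squares on the log values recovers the slope -k.
    """
    logs = [math.log(a) for a in activity]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    syy = sum((y - mean_y) ** 2 for y in logs)
    slope = sxy / sxx                      # equals -k for a pure decay
    r_squared = (sxy * sxy) / (sxx * syy)  # goodness of the log-linear fit
    return math.log(2) / -slope, r_squared


# Two synthetic cell lines with very different starting activities but the
# same underlying 2 h half-life (illustrative numbers, not measured data).
times = [0.5 * i for i in range(17)]  # 0 to 8 h in 30 min steps
high_expresser = [600000.0 * 0.5 ** (t / 2.0) for t in times]
low_expresser = [800.0 * 0.5 ** (t / 2.0) for t in times]

half_life_high, r2_high = fit_half_life(times, high_expresser)
half_life_low, r2_low = fit_half_life(times, low_expresser)
```

      With noise-free synthetic input both lines return the same half-life regardless of starting level; with real recordings, one would compare the fitted half-lives across lines against their initial activities, as described in the reply above.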

      Line 430, "sigma" to "Sigma".

      Changed

      In figure S2, the classification of rhythms in Drosophila is not clear since even the "Robustly rhythmic" ones have high background noise. Detrending or fitting the data might be able to improve the quality of the rhythms prior to classification.

      These are noisy data as they come from freely behaving flies. The mean data was shown in Figure S3A and individual examples in S3B, and look very similar to previous bioluminescence fly recordings of XLG-LUC flies in papers from the Stanewsky lab who have published extensively using this model. The classifications arose from double-blinded analysis of the bioluminescence traces by several individuals, but we agree that this was not clearly communicated in our original submission. In Revised figure S2 we now present the mean bioluminescence traces, with and without damped sine wave vs. straight line fitting, as suggested, which is more consistent with the mammalian cellular data presented elsewhere.

      In figure S3B, the original blots for Per2 including Input and IP should be shown.

      The original blots for BMAL1 are shown in figure S3I. PER2::LUC levels were assessed by measuring bioluminescence levels present on the anti-bmal1-beads, as described in the figure 3B legend.

      Supplemental information Line 44, "...(reviewed in (Lakin-Thomas,..." to "...(reviewed in Lakin-Thomas,..."

      Changed

      Line 188, "Period CDS", the full name of CDS should be provided the first time it appearances.

      Changed to “coding sequence”.

      Reviewer #2 (Significance (Required)): The work suggests a link between the TTFL and non-canonical oscillators, which should be interesting to the circadian field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): **Summary:** The paper "CRYPTOCHROMES confer robustness, not rhythmicity, to circadian timekeeping" by Putker et al. answers the question of whether or not the rhythmic abundance of clock proteins is a prerequisite for circadian timekeeping. They addressed this by monitoring PER2::LUC rhythms in WT and CRY KO (CKO) cells. CRY forms a complex with PER, which in turn represses the ability of CLOCK/BMAL1 to drive the expression of clock-controlled genes, including PER and CRY. Consistent with previous observations, the authors found residual PER2::LUC rhythms in CKO SCN slices, fibroblasts and in a functional analogue KO of CRY in Drosophila, even in the absence of rhythmic Per2 transcription due to the loss of CRY as a negative regulator of the oscillation. They have shown that these rhythms, in the absence of CRY, follow the formal definition of circadian rhythms. They attributed these residual PER2::LUC rhythms to the maintenance of oscillation in PER2::LUC stability independent of CRY, by testing the decay kinetics of luciferase activity when translation is inhibited. Moreover, they implicated the kinases CK1d/e and GSK3 to be involved in regulating PER2::LUC post-translational rhythms through kinase inhibitor studies. They concluded that CRY is not necessary for maintaining PER2::LUC rhythms, but plays an important role in reinforcing high-amplitude rhythms when coupled to a proposed "cytoscillator" likely composed of CK1d/e and GSK3. **Major comments:** The authors have shown sufficient data that under different testing conditions (mice locomotor activity, SCN preps or fibroblasts), behavioral rhythms and PER2::LUC rhythms are still observed in the CRY KO (CKO) cells, contrary to a previous study (Liu et al., 2007). They also indicated limitations to some of the experimental work. However, there are some parts of the paper that need clarification to support their conclusions. 1. In Fig. 1A, the x-axes of the actograms for WT and CKO are different. While they mentioned this in the figure legend, and described the axis transformation in Fig. S1A, they need a justification statement about why they did this in the results.

      Thanks, we have included the following sentence in the results section as requested:

      Figure 1 representative actograms are plotted as a function of endogenous tau (τ) to allow the periodic organisation of rest-activity cycles to be readily discerned; 24h-plotted actograms are shown in Figure S1A and S2A.

      2.In an attempt to show conservation of their proposed role for CRY, they tested the model system Drosophila melanogaster where TIMELESS serves as the functional analogue of CRY. While they showed in the figures and described in the text that rhythms still persisted with lower relative amplitude in the TIMELESS-deficient flies, they did not describe any period differences between WT and mutant. Showing the period quantification in Supp. Fig. S2 using the robustly rhythmic datasets, and describing this data in the text, will strengthen their claim.

      These analyses are now reported in revised Figure S2 as requested. As described in our response to reviewer 2, the “robustly rhythmic” flies were scored as such through double-blinded analysis by several individuals. We hope the reviewer will appreciate our concern that exclusion of the majority of TIMELESS-deficient flies that were not robustly rhythmic might skew their apparent period by unconscious bias towards favouring traces that most clearly resemble robustly rhythmic WT controls. To avoid any potential bias we therefore included all flies of both genotypes in the analysis of circadian period for the revised figure, as suggested by our other reviewers.

      In Fig. S2B, there is no clear distinction between the representative datasets shown for poorly rhythmic and arrhythmic, i.e. they all appear arrhythmic, without an indicated statistical test. The authors could present better representative data to better reflect the categories.

      As described above, we now show the grouped mean with and without fitting for all flies of both genotypes. The statistical test for rhythmicity and analysis of circadian period is now the same as was performed for the cellular data presented elsewhere.

      3. In Fig. 2A, the authors note the lack of rhythmicity in the CKO fibroblasts in the 1st three days at 37°C. How are the conditions here different from fibroblasts in Fig. 1E, where rhythms are seen during the 1st three days in CKO fibroblasts?

      As discussed in the manuscript, PER2::LUC rhythms in CKO cells and SCN are observed stochastically between recordings i.e. if one dish in a recording showed rhythms, all dishes showed rhythms and vice versa. The media change that occurred after 3 days in Fig 2A, in this case, was sufficient to initiate clear rhythms of PER2::LUC in all experimental replicates. In other experiments, media change did not have this effect. Herculean efforts by multiple lab members over many years, including the PI, have been unable to delineate the basis of this variability – which is discussed at length in the thesis of Dr. David Wong https://www.repository.cam.ac.uk/handle/1810/300610. As such, we clearly state in the discussion:

      “We were unable to identify all of the variables that contribute to the apparent stochasticity of CKO PER2::LUC oscillations, and so cannot distinguish whether this variability arises from reduced fidelity of PER2::LUC as a circadian reporter or impaired timing function in CKO cells. In consequence, we restricted our study to those recordings in which clear bioluminescence rhythms were observed, enabling the interrogation of TTFL-independent cellular timekeeping.”

      4. The authors claimed in the results section: "in contrast and as expected, Per2 mRNA in WT cells varied in phase with co-recorded PER2::LUC oscillations." but Fig. 3C does not show this expected lag between mRNA and protein levels. This needs to be explained.

      No lag is expected in vitro. A lag between PER protein levels and Per mRNA does occur in vivo and is very likely attributable to daily rhythms in feeding (Crosby et al, Cell, 2019), where increased insulin signalling elicits an increase in PER protein production 4-6h after E-box and GRE-stimulated increase in Per transcription.

      When luciferin is saturating intracellularly, PER2::LUC activity correlates most closely with the amount of PER2::LUC protein that was translated during the preceding 1-2h, rather than the total amount of PER2, due to the enzymatic inactivation of the luciferase protein (Feeney et al, JBR, 2016). Consistent with many previous observations, under constant conditions, the rate of nascent PER protein synthesis is largely determined by the level of Per2 mRNA, and thus more similar phases are observed between protein and mRNA in vitro than in vivo.

      We have inserted an additional citation of Feeney et al at this point in the text to make this clear.

      5. In Figs. 5A-B, the PER2::LUC periods in the CKO untreated cells seem to vary significantly between A, B, and C. While this could be due to the high variability in the rhythms that were previously described by the authors, the average periods here seem to be longer than the one reported in Fig. 1F. Are there specific condition differences?

      There are no specific condition differences. As reported in Figure S1B, D & E, the range of CKO cellular periods is simply much broader than for WT cells. Over several dozen experiments the average period was significantly shorter, but the period variance is an equally striking feature of rhythms in these cells which we take as evidence for their lack of robustness.

      *Would additional experiments be essential to support the claims of the paper?*

      1. There is sufficient experimental data to support the major claims; however some suggested experiments are listed below.

        a. If CKO exhibits residual rhythms in PER::LUC, it would be interesting to know how CRY overexpression influences PER2::LUC rhythms, or point to previous reference papers which may have already shown such effects. The prediction would be PER2::LUC levels will still be rhythmic when CRY is overexpressed. What would be the extent of "robustness" conferred by CRY on PER2::LUC rhythms based on CRY KO and overexpression studies?

      These experiments have largely already been performed (see Chen et al., Mol Cell; Nangle et al., eLife, 2014; Fan et al., Curr Biol, 2007; Edwards et al., PNAS, 2016) and are cited in this manuscript. As suggested, PER2 rhythms remain intact under CRY1 over-expression, though they are clearly perturbed, but their robustness was not investigated in any detail. We hope to be able to address this important question in our subsequent work.

      The authors found that CK1d/e and GSK3 contribute to CRY-independent PER2 oscillations by showing that addition of kinase inhibitors affect the PER2::LUC period lengths in WT and CKO in the same manner. It would be interesting to know if a) PER2::LUC stability and b) PER2 phosphorylation status, is affected in WT and CKO in the presence of the inhibitors, or point to previous reference papers which may already have shown such effects.

      As the reviewer points out, PER2 stability is already reported to be regulated via phosphorylation by GSK3 and CK1. We have made explicit reference to this in the revised manuscript as follows:

      “In contemporary models of the mammalian cellular clockwork CRY proteins are essential for rhythmic PER protein production, however, the stability and activity of PER proteins are also regulated post-translationally (Lee et al., 2009; Philpott et al., 2020; Iitaka et al, 2005).”

      *Are the data and the methods presented in such a way that they can be reproduced?*

      1. The protocol for the inhibitor treatments is not in the main or supplemental methods.

      In the main text methods, section luciferase recordings we state: “For pharmacological perturbation experiments (unless stated otherwise in the text) cells were changed into drug-containing air medium from the start of the recording. Mock-treatments were carried out with DMSO or ethanol as appropriate.”

      *Are the experiments adequately replicated and statistical analysis adequate?*

      1. All experiments had the sufficient number of technical and biological replicates to make valid statistical analyses. For Fig. S2, the authors used RAIN to assess rhythmicity in WT and mutant flies, but it is not clear whether the different categories (rhythmic, poorly rhythmic, and arrhythmic) were based on amplitude differences alone, or a combination of amplitude and p-values as determined by RAIN.

      As reported above, we have revised the analysis of the fly data to be consistent with the cellular data reported elsewhere in the manuscript.

      **Minor comments:** *1. Are prior studies referenced appropriately?* Authors may wish to include Fan et al., 2007, Current Biology which demonstrated that cycling of CRY1, CRY2, and BMAL1 is not necessary for circadian-clock function in fibroblasts.

      Apologies for the omission of citation to this excellent paper. Now referenced in the introduction.

      *2. Are the text and figures clear and accurate?* Figures were clear and illustrated well. See minor comments on text below:

      1. Other minor comments

      Main Text: p3, line 62; p12, line l32: It doesn't seem necessary or appropriate to cite the dictionary for the definition of robust.

      Thanks for this suggestion. During preparation of the manuscript we found that there was some disagreement between authors as to the meaning of robustness in a circadian context. We therefore feel it most necessary to define clearly what we mean by the use of this word to avoid any potential ambiguity.

      p4, line l87: "~20 h" rhythms instead of "~20h-hour" p3, line 70; p5, line 121; p14, line 380; p16, line 416 and p18, line 458: Close parentheses have been doubled in parenthetical references. p14, line 363: "crassa" instead of "Crassa" p17, line 430: "Sigma" instead of "sigma" p18, lines 464 and 483; p20, line 521: put a space between numerical values and units, to be consistent with other entries p19, line 488: "luciferase" instead of Luciferase p20, line 512: "Cell Signaling" instead of "cell signalling" p20, line 526: "single" instead of "Single"

      We thank the reviewer for his/her thoroughness, all of the above have been changed.

      Main figures: Fig. 2 p37, line 921: close parenthesis was doubled on "red"

      This was actually correct.

      Fig. 4 p41, line 989: "0.1 mM" instead of "0.1 mM" for consistency throughout text

      Supplementary text:

      line 171: "30 mM HEPES" instead of "30mM HEPES"

      line 184: "Cell Signaling" instead of "cell signalling"

      Supplementary figures:

      Fig. S2A "Drosophila melanogaster" instead of "Drosophila Melanogaster"

      All of the above have been changed.

      Reviewer #3 (Significance (Required)): This paper revisits the previously proposed idea that rhythmic expression of central TTFL components is not essential for circadian timekeeping to persist. However, this paper does not add a significant advance in the understanding of the underlying reasons behind sustained clock protein rhythmicity like PER in the absence of CRY, since such mechanisms in functional analogs have been shown in other systems, like Neurospora (Larrondo et al., 2015). However, this paper does clarify some issues in the field, such as discrepancies between behavioral and cellular rhythms observed in CKO mice, leading future researchers to examine closely the conditions of their CKO rhythmic assays before making conclusions pertaining to rhythmicity. The identification of the kinases as components of the proposed cytosolic oscillator (cytoscillator) needs further validation, but this is perhaps beyond the scope of the paper. The data provides incremental evidence for the existence of a cytoscillator, but opens up opportunities to identify other players, like phosphatases, to establish the connection between the central TTFL and the proposed cytoscillator.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The paper "CRYPTOCHROMES confer robustness, not rhythmicity, to circadian timekeeping" by Putker et al. answers the question of whether or not the rhythmic abundance of clock proteins is a prerequisite for circadian timekeeping. They addressed this by monitoring PER2::LUC rhythms in WT and CRY KO (CKO) cells. CRY forms a complex with PER, which in turn represses the ability of CLOCK/BMAL1 to drive the expression of clock-controlled genes, including PER and CRY. Consistent with previous observations, the authors found residual PER2::LUC rhythms in CKO SCN slices, fibroblasts and in a functional analogue KO of CRY in Drosophila, even in the absence of rhythmic Per2 transcription due to the loss of CRY as a negative regulator of the oscillation. They have shown that these rhythms, in the absence of CRY, follow the formal definition of circadian rhythms. They attributed these residual PER2::LUC rhythms to the maintenance of oscillation in PER2::LUC stability independent of CRY, by testing the decay kinetics of luciferase activity when translation is inhibited. Moreover, they implicated the kinases CK1 and GSK3 in regulating PER2::LUC post-translational rhythms through kinase inhibitor studies. They concluded that CRY is not necessary for maintaining PER2::LUC rhythms, but plays an important role in reinforcing high-amplitude rhythms when coupled to a proposed "cytoscillator" likely composed of CK1 and GSK3.

      Major comments:

      The authors have shown sufficient data that under different testing conditions (mice locomotor activity, SCN preps or fibroblasts), behavioral rhythms and PER2::LUC rhythms are still observed in the CRY KO (CKO) cells, contrary to a previous study (Liu et al., 2007). They also indicated limitations to some of the experimental work. However, there are some parts of the paper that need clarification to support their conclusions.

      1. In Fig. 1A, the x-axes of the actograms for WT and CKO are different. While they mentioned this in the figure legend, and described the axis transformation in Fig. S1A, they need a justification statement about why they did this in the results.

      2. In an attempt to show conservation of their proposed role for CRY, they tested the model system Drosophila melanogaster where TIMELESS serves as the functional analogue of CRY. While they showed in the figures and described in the text that rhythms still persisted with lower relative amplitude in the TIMELESS-deficient flies, they did not describe any period differences between WT and mutant. Showing the period quantification in Supp. Fig. S2 using the robustly rhythmic datasets, and describing this data in the text, will strengthen their claim.

      In Fig. S2B, there is no clear distinction between the representative datasets shown for poorly rhythmic and arrhythmic, i.e. they all appear arrhythmic, without an indicated statistical test. The authors could present better representative data to better reflect the categories.

      3. In Fig. 2A, the authors note the lack of rhythmicity in the CKO fibroblasts in the 1st three days at 37 °C. How are the conditions here different from fibroblasts in Fig. 1E, where rhythms are seen during the 1st three days in CKO fibroblasts?

      4. The authors claimed in the results section, "in contrast and as expected, Per2 mRNA in WT cells varied in phase with co-recorded PER2::LUC oscillations," but Fig. 3C does not show this expected lag between mRNA and protein levels. This needs to be explained.

      5. In Figs. 5A-B, the PER2::LUC periods in the CKO untreated cells seem to vary significantly between A, B, and C. While this could be due to the high variability in the rhythms that were previously described by the authors, the average periods here seem to be longer than the one reported in Fig. 1F. Are there specific condition differences?

      Would additional experiments be essential to support the claims of the paper?

      1. There is sufficient experimental data to support the major claims; however some suggested experiments are listed below.

      a. If CKO exhibits residual rhythms in PER2::LUC, it would be interesting to know how CRY overexpression influences PER2::LUC rhythms, or point to previous reference papers which may have already shown such effects. The prediction would be that PER2::LUC levels will still be rhythmic when CRY is overexpressed. What would be the extent of "robustness" conferred by CRY on PER2::LUC rhythms based on CRY KO and overexpression studies?

      b. The authors found that CK1and GSK3 contribute to CRY-independent PER2 oscillations by showing that addition of kinase inhibitors affect the PER2::LUC period lengths in WT and CKO in the same manner. It would be interesting to know if a) PER2::LUC stability and b) PER2 phosphorylation status, is affected in WT and CKO in the presence of the inhibitors, or point to previous reference papers which may already have shown such effects.

      Are the data and the methods presented in such a way that they can be reproduced?

      1. The protocol for the inhibitor treatments is not in the main or supplemental methods.

      Are the experiments adequately replicated and statistical analysis adequate?

      1. All experiments had a sufficient number of technical and biological replicates to make valid statistical analyses. For Fig. S2, the authors used RAIN to assess rhythmicity in WT and mutant flies, but it is not clear whether the different categories (rhythmic, poorly rhythmic, and arrhythmic) were based on amplitude differences alone, or a combination of amplitude and p-values as determined by RAIN.

      Minor comments:

      1. Other minor comments

      Main Text:

      p3, line 62; p12, line l32: It doesn't seem necessary or appropriate to cite the dictionary for the definition of robust.

      p4, line l87: "~20 h" rhythms instead of "~20h-hour"

      p3, line 70; p5, line 121; p14, line 380; p16, line 416 and p18, line 458: Close parentheses have been doubled in parenthetical references.

      p14, line 363: "crassa" instead of "Crassa"

      p17, line 430: "Sigma" instead of "sigma"

      p18, lines 464 and 483; p20, line 521: put a space between numerical values and units, to be consistent with other entries

      p19, line 488: "luciferase" instead of Luciferase

      p20, line 512: "Cell Signaling" instead of "cell signalling"

      p20, line 526: "single" instead of "Single"

      Main figures:

      Fig. 2 p37, line 921: close parenthesis was doubled on "red"

      Fig. 4 p41, line 989: "0.1 mM" instead of "0.1 mM" for consistency throughout text

      Supplementary text:

      line 171: "30 mM HEPES" instead of "30mM HEPES"

      line 184: "Cell Signaling" instead of "cell signalling"

      Supplementary figures:

      Fig. S2A "Drosophila melanogaster" instead of "Drosophila Melanogaster"

      Significance

      This paper revisits the previously proposed idea that rhythmic expression of central TTFL components is not essential for circadian timekeeping to persist. However, this paper does not add a significant advance in the understanding of the underlying reasons behind sustained clock protein rhythmicity like PER in the absence of CRY, since such mechanisms in functional analogs have been shown in other systems, like Neurospora (Larrondo et al., 2015). However, this paper does clarify some issues in the field, such as discrepancies between behavioral and cellular rhythms observed in CKO mice, leading future researchers to examine closely the conditions of their CKO rhythmic assays before making conclusions pertaining to rhythmicity. The identification of the kinases as components of the proposed cytosolic oscillator (cytoscillator) needs further validation, but this is perhaps beyond the scope of the paper. The data provides incremental evidence for the existence of a cytoscillator, but opens up opportunities to identify other players, like phosphatases, to establish the connection between the central TTFL and the proposed cytoscillator.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In the canonical model of the mammalian circadian system, transcription factors, BMAL1/CLOCK, drive transcription of Cry and Per genes and CRY and PER proteins repress the BMAL1/CLOCK activity to close the feedback loop in a circadian cycle. The dominant opinion was that CRY1 and CRY2 are essential repressors of the mammalian circadian system. However, this was challenged by persistent bioluminescence rhythms observed in SCN slices derived from Cry-null mice (Maywood et al., 2011 PNAS) and then by persistent behavior rhythms shown by the Cry1 and Cry2 double knockout mice if they are synchronized under constant light prior to free running in the dark (Ono et al., 2013 PLOS One). In the manuscript, the authors first confirmed behavioral and molecular rhythms in the Cry1/Cry2- deficient mice and then provided evidence to suggest the rhythms of Per2:LUC and Nr1d1:LUC in CKOs are generated from the cytoplasmic oscillator instead of the well-studied transcription and translation feedback loop: Constant Per2 transcription driven by BMAL1/CLOCK plus rhythmic degradation of the PER protein result in a rhythmic PER2 level in the absence of both Cry1 and Cry2, which suggests a connection between the classic transcription- and translation-based negative feedback loops and non-canonical oscillators.

      Major points:

      Lines 38-39, "Challenging this interpretation, however, we find evidence for persistent circadian rhythms in mouse behavior and cellular PER2 levels when CRY is absent." The rhythmic behavioral phenotype of cry1 and cry2 double knockout mice was first documented by Ono et al., 2013 PLOS ONE, in which eight cry1 and cry2 double knockout mice after synchronization in the light displayed circadian periods with different lengths and qualities. The paper reported two period lengths from the Cry mutant mice: "An eye-fitted regression line revealed that the mean shorter period was 22.86+/-0.4 h (n = 8) and the mean longer period was 24.66+/-0.2 h (n = 9). The difference of two periods was statistically significant (p < 0.01).", either of which is quite different from the ~16.5 hr period in Figure 1B of the manuscript. A brief discussion of the period difference between studies would help readers. Period information for each individual mouse should be calculated and shown, since large period variations exist among CKO mice (Ono et al., 2013 PLOS One).

      The behavioral phenotype of Cry-null mice and luminescence from their SCNs are robustly rhythmic while fibroblasts derived from these mice only produce rhythms with very low amplitudes compared with those in WT, which may reflect the difference between the SCN's rhythm and peripheral clocks. The behavioral phenotype is supposed to be controlled mainly by SCN. However, most molecular analyses in the work were done with MEF and lung fibroblasts. These tissues may not be the best representative of the behavioral phenotype of the CKO mice.

      Stronger evidence is needed to fully exclude the possibility that in CKO cells, the rhythm is generated by PERs' compensation for the loss of Crys to repress BMAL1 and CLOCK. Since the rhythms of Per:LUC or Nr1d1:LUC (Figures 3D and S3E) are much weaker than those in WT, molecular analyses might not be sensitive enough to reflect the changes across a circadian cycle in the CKOs if the TTFL still occurs. CLOCKΔ19 mutant mice have a ~4 hr longer period than WT (Antoch et al., 1997 Cell; King et al., 1997 Cell). CLOCKΔ19; CKO cells or mice should be very helpful to address the question. Periods of Per:LUC and Nr1d1:LUC from the CLOCKΔ19; CKO should be similar to those in the CKO alone if the transcription feedback does not contribute to their oscillations.

      Lines 51-52, "PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping" and lines 310-311, "We found that transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in animal and cellular models." The authors have not excluded the possibility that the rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN. In the fibroblasts, only two genes, Per2 and Nr1d1, have been studied in the work, which cannot be simply expanded to the thousands of circadian controlled genes. Also amplitudes of PER2:LUC and NR1D1:LUC in the CKOs are much lower than those in WT and no evidence has been provided to show that their weak rhythms are biologically relevant.

      Minor points:

      Lines 66-67, "...(Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)." to "... (reviewed in Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)."

      Line 70, "...((Liu et al., 2008..." to "...(Liu et al., 2008..."

      Lines 174-175, "Considering recent reports that transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ...". Larrondo et al., 2015 paper says "however, in such ∆fwd-1 cells, the amount of FRQ still oscillated, the result of cyclic transcription of frq and reinitiation of FRQ synthesis." The point of the paper is "we unveiled an unexpected uncoupling between negative element half-life and circadian period determination." instead of "...transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ,"

      Lines 249-252, "CKO cells exhibit no rhythm in Per2 mRNA (Figure 3C, D), nor do they show a rhythm in global translational rate (Figure S4A, B), nor did we observe any interaction between BMAL1 and S6K/eIF4 as occurs in WT cells (Lipton et al, 2015) (Figure S4C)." In figures 3D and S3E, in CKO and CPKO cells the Per2:LUC data without fitting look better than that of Nr1d1:LUC. But the Nr1d1:LUC rhythm became clear after fitting the raw data. So to better visualize the low amplitude rhythm, if any, of Per2:LUC and compare it with Nr1d1:LUC, the fitted Per2:LUC data in CKOs and CPKOs in Figures 3D and S3E should be shown, as has been done for Nr1d1:LUC.

      Lines 258-259, "much less than the half-life of luciferase expressed in fibroblasts under a constitutive promoter" In figure S4D, the y-axis of the PER2::LUC is ~800 while the y-axis of the SV40::LUC is ~600000. The over-expressed LUC by the SV40 promoter might saturate the degradation system in the cell so the comparison is not fair. A weaker promoter with the level similar to Per2 should be used to make the comparison.

      Line 430, "sigma" to "Sigma".

      In figure S2, the classification of rhythms in Drosophila is not clear since even the "Robustly rhythmic" ones have high background noise. Detrending or fitting the data might be able to improve the quality of the rhythms prior to classification.

      In figure S3B, the original blots for Per2 including Input and IP should be shown.

      Supplemental information

      Line 44, "...(reviewed in (Lakin-Thomas,..." to "...(reviewed in Lakin-Thomas,..."

      Line 188, "Period CDS", the full name of CDS should be provided the first time it appears.

      Significance

      The work suggests a link between the TTFL and non-canonical oscillators, which should be interesting to the circadian field.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This interesting study by Putker et al. showed that circadian rhythmicity persists in several typical circadian assay systems lacking Cry, including Cry knockout mouse behavior and gene expression in Cry knockout fibroblasts. They further demonstrated weak but significant circadian rhythmicity in Cry- and Per- knockout cells. Cry- (and potentially Per-)-independent oscillations are temperature compensated, and CKId/e still has a role in the period regulation of Cry-independent oscillations.

      Major comments:

      1) The authors propose that the essential role of mammalian Cryptochrome is to bring the robust oscillation. As the authors analyze in many parts, the robustness of oscillation can be validated by the (relative) amplitude and phase/period variation, both of which should be affected significantly by the method for cell synchronization. Unfortunately, the method for synchronization is not adequately written in this version of supplementary information. This reviewer has no objection to the "iterative refinement of the synchronization protocol" but at least the correspondence between which methods were used in which experiments needs to be clearly explained. The detailed method may be found in the thesis of Dr. Wong, but the methods used in this manuscript need to be detailed within this manuscript.

      2) The authors revealed that CKO mice have apparent behavioral rhythmicity under the condition of LL>DD. This is an intriguing finding. However, it should be carefully evaluated whether this rhythmicity (16 hr cycle) is the direct consequence of circadian rhythmicity observed in CKO and CPKO cells (24 hr cycle) because the period length is much different. Is it possible to induce the 16 hr periodicity in CKO mice behavior by a 16 hr-L:16 hr-D cycle? Would another plausible possibility be that the 16 hr rhythmicity is the mouse version of internal desynchronization or another type of methamphetamine-induced-oscillation/food-entrainable-oscillation?

      3) The authors proposed that CKId/e at least in part is the component of cytoscillator (Fig. 5D), and turnover control of PER (likely to be controlled by CKId/e) may be an interaction point between cytoscillator and canonical circadian TTFL (Fig. 4). Strictly speaking, this model is not directly supported by the experimental setting of the current manuscript. The contribution of CKId/e is evaluated in the presence of PER by monitoring the canonical TTFL output (i.e. PER2::LUC); thus it is not clear whether the kinase determines the period of cytoscillator. It would be valuable to ask whether PF and CHIR have the period-lengthening effect on Nr1d1:LUC in the CPKO cell.

      Minor comments:

      4) The authors argue that the CKO cells' rhythmicity is entrained by the temperature cycle (Fig. 2C). Because the data for the CKO cells show only one peak after the release into the constant temperature phase, it is difficult to conclude whether the cells are entrained or just respond to the final temperature shift.

      5) It would be useful for readers to provide information on the known phenotype of TIMELESS knockout flies; TIM is widely accepted as an essential component of the circadian clock in flies; are there any studies showing the presence of circadian rhythmicity in Tim-knockout flies (even if it is an oscillation seen in limited conditions, such as the neonatal SCN rhythm in mammalian Cry knockout)?

      5) Figure 3C shows that the amount of PER2::LUC mRNA changes ~2 fold between time = 0 hr and 24 hr in the CKO cell. This amplitude is similar to that observed in WT cell although the peak phase is different. Does the PER2::LUC mRNA level show the oscillation in CKO cells?

      6) Figure 3D: the authors discuss the amplitude and variation (whether the signal is noisier or not) of reporter luciferase expression between different cell lines. However, a huge difference in the luciferase signal can be observed even in the detrended bioluminescence plot. This reviewer is concerned that some of the phenotypes of CKO and CPKO MEF reflect the lower transfection efficiency of the reporter gene, not the nature of circadian oscillators of these cell lines.

      Significance

      Although Cryptochrome (Cry) has been considered a central component of the mammalian circadian clock, several studies have shown that circadian rhythms are maintained in the absence of Cry, including in the neonate SCN and red blood cells. Thus, although the need for Cry as a circadian oscillator has been debated, its essential role as a circadian oscillator remains established, at least in the cell-autonomous clock driven by the TTFL. This study provides additional evidence that the circadian rhythmicity can persist in the absence of Cry.

      In a more general context, the presence of a non-TTFL circadian oscillator has been one of the major topics in the field of circadian clocks except for the cyanobacteria. In mammals, the authors' and other groups led the discovery of circadian oscillation in the absence of canonical TTFL by showing the redox cycle in red blood cells (O'Neill, Nature 2011). The presence of circadian oscillation in the absence of Bmal1 was also reported recently (Ray, Science 2020). Bmal1(-CLOCK), CRY, and PER compose the core mechanism of canonical circadian TTFL; thus, this manuscript adds another layer of evidence for the non-TTFL circadian oscillation in mammals.

      Overall, the manuscript reports several surprising results that will receive considerable attention from the circadian community.

      This reviewer has expertise in the field of mammalian circadian clocks, including genomics, biochemistry, and mice's behavior analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the Reviewers for the positive assessment of our work and their insightful remarks. Please find below a point-by-point response to each comment.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Scheckel et al. report a large dataset on cell type-specific translational profiling of PrD-associated molecular alterations in a mouse model through RiboTRAP and ribosome profiling approaches. They report a more severe alteration in the translatome specifically in astrocytes and microglia as compared to neuronal populations. This highlights that changes in these two cell classes might have a predominant role in the pathology of PrD.

      Data and the methods are presented such that they can be reproduced. The data analysis section of the manuscript could be further elaborated. In particular, it could be clarified which comparisons with existing datasets have been performed, and how. The description of the statistical analysis is sometimes missing (e.g. fig 6e: it is not clear what the stars on top of the bars stand for, which test was performed, and the significance). Moreover, the section of the methods regarding the western blots presented in figure 6 appears to be missing.

      Fig 6e shows the output (log2 fold change) of DESeq2. Genes with a Benjamini-Hochberg adjusted p value

      **Major concern:**

      The most important improvement the authors should consider for their paper is to more specifically attempt to isolate specific effects on translational efficiency of mRNAs. As it stands, the authors largely use RiboTrap data as a reference to compare their footprinting data - but arguably, this misses mRNAs that are present in the transcriptome and not efficiently recruited onto ribosomes. It appears to be somewhat a lost opportunity to not attempt to test in the dataset (possibly by comparison to RNA-Seq from FACS isolated cells as a reference) whether there is a systematic change in translational efficiency (possibly in mRNAs with specific features?). In the current form, the RiboTrap and footprinting approaches largely serve to isolate mRNAs from cre-defined cell types but given the lack of a "total transcriptome" reference from the respective cells, it can not be easily interpreted whether certain transcripts are heavily regulated at the level of translation. Thus, despite using much more advanced methodologies than the Sorce study, the fundamental conclusions emerging from this work are rather similar to this previously published piece of work.

      Translational changes can be assessed in a cell-type specific manner without artefacts related to dissociation/isolation procedures and are arguably more relevant than transcriptional changes (Haimon et al., Nat. Immunol. 2018). Both, the assessment of translation as well as the investigation of specific cell types differentiates this study from transcriptional profiling studies including Sorce et al. Accordingly, our approach identified > 1000 cell-type specific translational changes that were missed in the Sorce study (Fig. 5a-d).

      We agree however with the reviewer that a comparison of our data with RiboTrap data does not take non-transcribed RNAs into account. We have refrained from such a comparison for several reasons:

      We agree with the reviewer that a systematic comparison of transcriptomes and translatomes in the assessed cell types at every time point would have allowed us to identify genes regulated on a post-transcriptional level. The goal of this study was however to identify biologically relevant prion-induced molecular changes in a cell-type specific manner rather than identify post-transcriptional regulation. To assess the validity of our approach we chose closely related datasets (RiboTrap datasets) to compare our data to. The inclusion of RNAseq datasets from FACS-isolated cells would require an additional 2 years of work since all samples and datasets would need to be newly generated (breeding mice, inoculating mice with prions and waiting for up to 8 months for mice to reach the terminal time point, establishing procedures, generating and analyzing datasets). RNA-Seq from FACS isolated neurons is problematic due to neuronal processes often being lost during the dissociation/isolation procedures. Additionally, dissociation/isolation procedures typically introduce stress-related artefacts. These procedure-induced changes complicate comparisons with techniques that have been optimized to avoid such artefacts (including the method applied in this manuscript). Differences between transcriptional and translational datasets could thus be either due to post-transcriptional regulation or due to artefact differences and are likely difficult to interpret.

      **Additional suggestions:**

      1) In Figure 1d the authors point out occasional neuronal cells exhibiting Rpl10a-GFP expression with arrows. It appears that these arrows may have moved during figure preparation - please check/fix if necessary.

      Thank you for pointing this out. We have fixed the arrows.

      2) In Supplementary Figure 1b and c it appears that the PV labeling is missing in the panel for Rpl10a:GFP controls. If this is intentional please indicate this in the figure legend.

      A co-localization of GFP-positive cells and PV was assessed only in Cre-positive (GFP expressing) mice but not in Cre-negative mice that don’t express GFP. We have clarified this point in the corresponding figure legend.

      3) It appears that the authors sequenced a significant number of libraries generated for multiple time points post-inoculation. From the figures and legends it was not entirely clear to me, how many replicates were analyzed given that in some analyses samples from different time points were combined in a single plot.

      All analyzed samples are listed in Supplementary File 1. We have emphasized this point in the results section.

      4) It was unclear to me how long after inoculation the group of "terminally ill" mice were sacrificed. Somewhere in the text it states that there are 2 months between 24 wpi and terminally ill - but it appears that this was not a preset timepoint but varied from animal to animal based on symptoms. Please clarify.

      We sacrifice mice at the last humane time point possible at which they show terminal disease symptoms, including piloerection, hind limb clasping, kyphosis and ataxia. Intraperitoneal inoculated mice reach that time point at 31 - 32 weeks post inoculation (+/- few days). Control mice (inoculated with non-infectious brain homogenate) were sacrificed at the same time. We have clarified this point in the methods section.

      5) From the Western blot data in Figure 6f the authors conclude that GFAP expression is upregulated in PrD mice whereas astrocyte number is unchanged. Given that the translatome is assessed based on a Rpl10-GFP dependent on recombination mediated by cre driven from GFAP promoter it is possible that the astrocytic alterations in ribosome footprints are in part a secondary consequence of increased Rpl10-GFP recombination/ expression in PrD mice (due to activation of the GFAP promoter). To estimate the impact of such an effect the authors should compare GFP levels in terminally ill control and PrD mice by western blotting.

      We agree with the reviewer that this information would be important to add. We have therefore assessed GFP levels in Rpl10a:GFP mice bred with GFAPCre and Cx3cr1CreER mice. The corresponding western blots are included in Supplementary Figure 11. GFP levels remained constant in terminally ill GFAPCre mice. This is not surprising since even a low GFAP promoter activity is likely to allow sufficient Cre recombinase expression to remove a STOP cassette allowing GFP expression (controlled by the Rosa26 promoter) in GFAPCre mice. In contrast, we observed an increase in GFP expression in terminally ill Cx3cr1CreER mice, which is most likely linked to the increase in microglia numbers. As pointed out in the manuscript, the translational changes we identified cannot reflect differences in cell numbers due to the nature of our assay. This suggests that a difference in GFP expression does not impact our analyses.

      We have added this data to the manuscript.

      6) The western blot analysis of fig 6f-g has been performed using a normalization over calnexin, yet no calnexin signal is shown to support this statement.

      We have included blots of the normalization control calnexin as Supplementary Figure 11a.

      7) Clarify the percentage of non-parenchymal macrophages that are labeled in the Cx3cr1-creER mouse line since the authors consider this only to be a minor contamination.

      The labeling of non-parenchymal macrophages using Cx3cr1CreER mice has previously been estimated to be ~1% (Haimon et al., Nat. Immunol. 2018). We have added this information to the manuscript.

      8) Regarding the presentation of the data, Fig 5a would be clearer if in the y axes, for each cell type the order of PrD and Ctrl samples was maintained.

      Fig 5a displays hierarchical clustering based on Euclidean distances. As samples are ordered according to their distance from each other, we cannot change the order as suggested by the reviewer.

      Reviewer #1 (Significance (Required)):

      Overall, this is an important and interesting study. Besides its insights into the biology, the transcriptomic data will provide a valuable resource for researchers in the field.

      Previous studies employed bulk RNAseq or microdissection for mapping transcriptomic changes (Majer et al. 2019; Sorce et al. 2020 and others). The Sorce et al. study concluded that astrocytic alterations in the transcriptome are more dominant than neuronal gene expression changes. While the conclusion of the present study remains the same, it is the first to use ribosome profiling to dissect actively translated transcripts over the progression of the pathology in the mouse model. Thus, the data presented here would allow for identifying cell type-specific alterations as well as alterations specifically in mRNA translation which would be missed by bulk RNA-Seq and RNA-Seq on FACS-isolated cells. However, the authors do not fully capitalize on this strength, given that no detailed comparisons to a real transcriptome reference are performed (see above).

      This work is of broad interest to scientists in neurodegeneration as well as glial biology.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Using a series of Cre-driven mouse strains a GFP-tagged version of RPL10a (a ribosomal protein) was targeted to different cell types allowing Dr Scheckel and colleagues to investigate translational changes as prion disease progresses in mice. Their data suggest massive changes in microglia and astrocytes but not neurons. The approach was particularly powerful as ribosome IP has been combined with ribosome profiling. The manuscript is very well written. What might help, however, is to make the figures more accessible (perhaps change some of the labelling?)

      I have only minor comments regarding some of the figures:

      Fig 1a: This scheme could be improved, adding wpi and better aligning the cell-types in relation to the time when the cell-types were analysed.

      We have replaced weeks with wpi and changed the alignment of cell types to clarify that all cell types were analyzed at every time point.

      Fig 1b-e: The resolution could be improved to better discern the different cell-types.

      We submitted low-quality figures due to an upload limit but will submit final figures of higher quality. Additionally, we have added higher magnification pictures to better discern the different cell types as Supplementary Fig. 1d-e.

      Fig 4: Astrocytes are categorised into A1 and A2 and microglia based on DAM and homeostatic signature (How does this relate to the M1 and M2 classification?).

      The categorization of microglia into homeostatic and disease-associated (as well as other) microglia has largely replaced the initial categorization into pro-inflammatory M1 and anti-inflammatory M2 microglia (Dubbelaar et al., Front Immunol. 2018). We have therefore opted for the more current categorization. This explanation has also been added to the manuscript.

      Reviewer #2 (Significance (Required)):

      Highly significant. I have published on de novo protein synthesis in neurodegenerative disease

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors sampled actively translated proteins by cell type in the brains of RiboTag expressing mice under the control of cell specific cre recombination to determine changes in the translational profiles. They injected prions IP to induce prion disease. Their model shows little to no neuron loss at the terminal stage due to animal welfare regulations, but neuronal loss is a key hallmark of prion disease, along with gliosis. However, since other groups under different animal welfare regulations have shown that prion injection is sufficient to fully model the disease given enough time, there is sufficient evidence that this model captures early disease pathogenesis. The methodology used here has some clear advantages over previous cell-type isolation methods that require more lengthy sorting procedures. However, proteins with a long half-life or tightly regulated levels (such as TDP-43) are likely underrepresented by this method. The method also depends strongly on the specificity of the cre driver used; CamkIIa (excitatory N), parvalbumin (inhibitory N), GFAP (A), Cx3cr1 (microglia). While there is some off-target expression of the GFAP and Cx3cr1, the overall expression profiles generally match cell-specific transcriptomes obtained by other groups using other methods. They find major changes in astrocytes and microglia at terminal stages, after the onset of neurological symptoms, and comparatively fewer in neurons. Oligodendrocytes are not examined. The authors are commended on a thorough and well-designed study, especially in the comparison of multiple neuronal and glial types simultaneously.

      **Major comments:**

      Key conclusion 1: "Our results suggest that aberrant translation within glia may suffice to cause severe neurological symptoms and may even be the primary driver of prion disease." This conclusion is well-supported, serving as a hypothesis for future work. The data shows that the most abundant PTG changes are indeed in microglia at 24 wpi, before the onset of symptoms. In addition, although some genes are also differentially translated in the neuronal populations, examination of the Supplemental Tables shows that these are mostly highly expressed glial genes and could represent contamination of the sample during gliosis. The authors may wish to discuss this more prominently to avoid confusion. This data indeed suggests that glial changes alone could be sufficient to produce the neurological symptoms in these mice. However, the authors should include discussion that the two genes changed at 24 weeks in PV neurons (Oprm1, Cyp2s1) do appear to be neuronal and may be relevant to pathogenesis as well. These mRNAs were also decreased in their previous paper conducting bulk sequencing in the hippocampus, according to the authors' online Prion RNAseq Database. Knockout experiments in mouse models have shown that dysregulation of one or a few critical genes in neurons can be sufficient to induce dysfunction and neurological symptoms, and the current evidence does not seem sufficient to rule it out. Fig 3d also suggests that PTGs in PV neurons may be particularly important, even accounting for the additional regions present in the RP analysis.

      We agree with the reviewer that a few critical neuronal genes might be sufficient to induce neurological dysfunction and symptoms and have added this point to the results and discussion. Additionally, we have highlighted that many neuronal genes are glia-enriched and might reflect glia contamination.

      Key Conclusion 2: "Cell-type specific changes become only evident at late PrD stages." This conclusion is well supported. However, as the authors noted, due to legal constraints their model represents early to mid disease onset rather than a true terminal environment matching that of patients. Therefore, it would be advantageous to choose a more appropriate name for the "terminal" group, perhaps based on one of the key humane endpoint criteria that would help readers in the field to place these important results in context of the overall disease process.

      We have added additional information to clarify our definition of terminal stage to the methods.

      Key Conclusion 3: "This suggests that the prion-induced molecular phenotypes reflect major glia alterations, whereas the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes such as altered neuronal connectivity." The authors should modify the second half of this claim. As discussed above, changes to even a few neuronal genes can be sufficient to induce neurodegeneration. The claim that "the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes," fails to acknowledge the changes in PV neurons observed in this study, however few they may be. The authors also do not take into account the possible role of transcribed RNAs that are not immediately translated (for example those that accumulate at synapses for fast translation on demand) or the overall proteome, which are not included in their analysis. Though their method cannot detect these components, the authors should examine the implications that such other changes may still be present in the discussion. The authors should also discuss the functions of the few specific PV PTGs and explore their potential relationship with neurodegeneration. This is especially important since the authors acknowledge that a key reason for including PV neurons in the analysis is ample evidence in the literature that they play a role in disease pathogenesis. Finally, the authors note that a top GO term in microglial cells was synaptic transmission. The authors should expand on this finding in the discussion, as the interplay of glia and neurons in the pathogenesis of disease is likely highly relevant.

      We have removed the claim that “behavioral phenotypes may be ascribed to biochemically undetectable changes” and added the point that a few neuronal changes might be sufficient to induce neuronal dysfunction and symptoms. As stated in the manuscript, we believe that the enrichment of the GO term synaptic transmission in microglia is an artefact. We therefore refrained from further discussing this finding and have highlighted in the results that it is an artefact.

      • *Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.* - *Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      As discussed above, the inclusion of RNAseq datasets from FACS isolated cells would require an additional 2 years of work since all samples and datasets would need to be newly generated (breeding mice, inoculating mice with prions and waiting for up to 8 months for mice to reach the terminal time point, establishing procedures, generating and analyzing datasets).

      Key Conclusion 1: No additional experiments needed. Key Conclusion 2: No additional experiments needed. Key Conclusion 3: No additional experiments needed for a modified statement.

      The data and methods are largely reproducible. Additional information should be provided about the methods for Gene Ontology analysis, how it was controlled, and what was used as a significance measure.

      We have added additional information about the GO analysis to the methods section. The complete list of GO terms is now included as Supplementary File 10.

      Some groups contain only two animals. At least three should be included per group for a minimally robust analysis.

      We have tried to include 3 replicates per group as suggested by the reviewer. In a few exceptions, we lost an individual sample and one sample had to be excluded due to low quality. In these instances (GFAP_2wpi Ctrl; CamKIIa_CX_term_Ctrl, CamKIIa_CX_term_PrD, Cx3cr1_term_Ctrl and Cx3cr1_term_PrD) we ensured that both replicates showed a high correlation and could still yield reliable results (see below). Consistently, the DESeq2 algorithm (which can also handle just 2 replicates per group) identified differentially translated genes in the terminal samples.

      **Minor comments:**

      Fig. 1 c-e all panels should have a scale bar. E, closer insets or larger images are needed to see the colocalization in these very small cells.

      We have added scale bars to all panels. A colocalization is indeed not visible in the uploaded low-quality Figures that were submitted due to the size limit. We believe that a colocalization is visible in the high-quality final pictures but are also happy to provide closer insets upon editorial request.

      Fig. 5f: To allow interpretation of the Gene Ontology analysis, authors should include the number of genes involved in the pathway and the number of those genes found in their sample input list.

      We have added details regarding the GO analysis to the methods section, and are now providing the requested information in Supplementary File 10.
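
      The two numbers requested for Fig. 5f (genes annotated to a term, and how many of those appear in the input list) are the inputs to a standard over-representation test. The sketch below is illustrative only: it uses a hypergeometric upper tail, which is one common choice and is not confirmed here as the authors' method. The function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, in_term, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes considered in total (assumed universe)
    in_term:    genes in the background annotated to the GO term
    list_size:  genes in the input list (e.g. differentially translated)
    overlap:    genes in the input list that are annotated to the term
    """
    total = comb(background, list_size)
    upper = min(in_term, list_size)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(in_term, k) * comb(background - in_term, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    p = enrichment_p_value(background=20000, in_term=300, list_size=200, overlap=12)
    print(f"p = {p:.3g}")
```

      Reporting `in_term` and `overlap` next to each term lets a reader judge whether a small p-value rests on a handful of genes or on a substantial share of the pathway.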

      Fig. S6: It is not clear from viewing the figure or the legend what the percentages on the axes refer to.

      The principal components 1 and 2 are plotted on the x and y axes, respectively. The % of variance explained by these principal components is indicated. We have added this information to the figure legend.
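
      For readers unfamiliar with the convention, the percentage on each axis is usually the share of total variance captured by that component. As a general statement (not taken from the manuscript), with $\lambda_k$ the $k$-th eigenvalue of the sample covariance matrix and $p$ components in total:

```latex
\[
\text{variance explained by PC}_k \;=\; \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j} \times 100\%
\]
```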

      Fig. S7: the gene numbers are confusing because they do not match the data in Fig. 4a. It would be helpful to use the same LFC cutoff as in Fig. 4a to avoid misunderstandings by the reader, or explain why no cutoff is used and what information the authors wish to convey by presenting the data that way.

      *Typically, all significant changes (p adj

      Fig S9: The legend indicates that genes changed in all 5 datasets are colored in green, however this is not easily visible on the graphs (appears more gray).

      Genes changing in all datasets are colored in green in Fig. 5. Genes changing in all datasets are colored in grey in Supplementary Fig. 9. We have adjusted the corresponding legends. The quality of the figures is very low due to the upload limit. The final figures will be of higher quality.

      Fig. S10: on page 12 Supplementary Fig. 10c is referenced, but likely refers to 10b. Throughout manuscript: It should be RNase, not RNAse.

      Both points have been addressed.

      Reviewer #3 (Significance (Required)):

      This work provides an important conceptual advance in prion disease research that glia may be primary drivers of disease equal to or surpassing certain neuronal populations. Though the authors have shown previously that glial changes are dominant in bulk sequencing of the hippocampus, cell type-specific analysis adds an important level of detail to convince the field that few transcriptional changes occur in neurons though neurological defects are already present. Historically, neuronal defects have been assumed to occupy the main role, with glia being largely ignored. This echoes recent similar changes in other areas of the neurodegenerative disease field where we are recognizing the important roles of glia in pathogenesis, and how they may be modulated to treat disease.

      Their findings in PV neurons also may reflect early key changes in this important neuronal population that contribute to neurological symptom onset. They will allow further study of the genes and pathways involved and may lead to additional effective treatments for disease. Finally, the thorough comparison of multiple neuronal and glial populations will allow future investigation of the interplay of neurons and microglia in pathogenesis and shows the importance of studying them synergistically rather than individually.

      *Audience:*

      The neurodegenerative disease field in general will be interested in the findings. Immunologists, other neuroscientists, and pharmaceutical and other drug development organizations will also be influenced by the work.

      *Own expertise:*

      Neurodegenerative disease, transgenic mouse models, neuropathology, translational neuroscience

      REFEREE'S CROSS-COMMENTING:

      I agree with Reviewer 1 that a comparison of the total transcriptome with ribosomally active transcripts would aid the interpretation of this work. It would also uncover or refute the presence of cell-type differences in translation efficiency that directly impact the authors' major conclusion that glia are more affected than neurons. I support the request of this additional experiment.

      As discussed above, we have refrained from such a comparison since 1) the scope of this study was to identify biologically relevant prion-induced molecular changes and not to study post-transcriptional regulation, 2) the generation of such a dataset would take ~ 2 years, and 3) differences between transcriptional and translational changes are likely a combination of post-transcriptional regulation and artefact-induced changes that are probably difficult to interpret.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors sampled actively translated proteins by cell type in the brains of RiboTag expressing mice under the control of cell specific cre recombination to determine changes in the translational profiles. They injected prions IP to induce prion disease. Their model shows little to no neuron loss at the terminal stage due to animal welfare regulations, but neuronal loss is a key hallmark of prion disease, along with gliosis. However, since other groups under different animal welfare regulations have shown that prion injection is sufficient to fully model the disease given enough time, there is sufficient evidence that this model captures early disease pathogenesis. The methodology used here has some clear advantages over previous cell-type isolation methods that require more lengthy sorting procedures. However, proteins with a long half-life or tightly regulated levels (such as TDP-43) are likely underrepresented by this method. The method also depends strongly on the specificity of the cre driver used; CamkIIa (excitatory N), parvalbumin (inhibitory N), GFAP (A), Cx3cr1 (microglia). While there is some off-target expression of the GFAP and Cx3cr1, the overall expression profiles generally match cell-specific transcriptomes obtained by other groups using other methods. They find major changes in astrocytes and microglia at terminal stages, after the onset of neurological symptoms, and comparatively fewer in neurons. Oligodendrocytes are not examined. The authors are commended on a thorough and well-designed study, especially in the comparison of multiple neuronal and glial types simultaneously.

      Major comments:

      Key conclusion 1: "Our results suggest that aberrant translation within glia may suffice to cause severe neurological symptoms and may even be the primary driver of prion disease." This conclusion is well-supported, serving as a hypothesis for future work. The data shows that the most abundant PTG changes are indeed in microglia at 24 wpi, before the onset of symptoms. In addition, although some genes are also differentially translated in the neuronal populations, examination of the Supplemental Tables shows that these are mostly highly expressed glial genes and could represent contamination of the sample during gliosis. The authors may wish to discuss this more prominently to avoid confusion. This data indeed suggests that glial changes alone could be sufficient to produce the neurological symptoms in these mice. However, the authors should include discussion that the two genes changed at 24 weeks in PV neurons (Oprm1, Cyp2s1) do appear to be neuronal and may be relevant to pathogenesis as well. These mRNAs were also decreased in their previous paper conducting bulk sequencing in the hippocampus, according to the authors' online Prion RNAseq Database. Knockout experiments in mouse models have shown that dysregulation of one or a few critical genes in neurons can be sufficient to induce dysfunction and neurological symptoms, and the current evidence does not seem sufficient to rule it out. Fig 3d also suggests that PTGs in PV neurons may be particularly important, even accounting for the additional regions present in the RP analysis.

      Key Conclusion 2: "Cell-type specific changes become only evident at late PrD stages." This conclusion is well supported. However, as the authors noted, due to legal constraints their model represents early to mid disease onset rather than a true terminal environment matching that of patients. Therefore, it would be advantageous to choose a more appropriate name for the "terminal" group, perhaps based on one of the key humane endpoint criteria that would help readers in the field to place these important results in context of the overall disease process.

      Key Conclusion 3: "This suggests that the prion-induced molecular phenotypes reflect major glia alterations, whereas the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes such as altered neuronal connectivity." The authors should modify the second half of this claim. As discussed above, changes to even a few neuronal genes can be sufficient to induce neurodegeneration. The claim that "the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes," fails to acknowledge the changes in PV neurons observed in this study, however few they may be. The authors also do not take into account the possible role of transcribed RNAs that are not immediately translated (for example those that accumulate at synapses for fast translation on demand) or the overall proteome, which are not included in their analysis. Though their method cannot detect these components, the authors should examine the implications that such other changes may still be present in the discussion. The authors should also discuss the functions of the few specific PV PTGs and explore their potential relationship with neurodegeneration. This is especially important since the authors acknowledge that a key reason for including PV neurons in the analysis is ample evidence in the literature that they play a role in disease pathogenesis. Finally, the authors note that a top GO term in microglial cells was synaptic transmission. The authors should expand on this finding in the discussion, as the interplay of glia and neurons in the pathogenesis of disease is likely highly relevant.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Key Conclusion 1: No additional experiments needed. Key Conclusion 2: No additional experiments needed. Key Conclusion 3: No additional experiments needed for a modified statement.

      The data and methods are largely reproducible. Additional information should be provided about the methods for Gene Ontology analysis, how it was controlled, and what was used as a significance measure. Some groups contain only two animals. At least three should be included per group for a minimally robust analysis.

      Minor comments:

      Fig. 1 c-e all panels should have a scale bar. E, closer insets or larger images are needed to see the colocalization in these very small cells. Fig. 5f: To allow interpretation of the Gene Ontology analysis, authors should include the number of genes involved in the pathway and the number of those genes found in their sample input list. Fig. S6: It is not clear from viewing the figure or the legend what the percentages on the axes refer to. Fig. S7: the gene numbers are confusing because they do not match the data in Fig. 4a. It would be helpful to use the same LFC cutoff as in Fig. 4a to avoid misunderstandings by the reader, or explain why no cutoff is used and what information the authors wish to convey by presenting the data that way. Fig S9: The legend indicates that genes changed in all 5 datasets are colored in green, however this is not easily visible on the graphs (appears more gray). Fig. S10: on page 12 Supplementary Fig. 10c is referenced, but likely refers to 10b. Throughout manuscript: It should be RNase, not RNAse.

      Significance

      This work provides an important conceptual advance in prion disease research that glia may be primary drivers of disease equal to or surpassing certain neuronal populations. Though the authors have shown previously that glial changes are dominant in bulk sequencing of the hippocampus, cell type-specific analysis adds an important level of detail to convince the field that few transcriptional changes occur in neurons though neurological defects are already present. Historically, neuronal defects have been assumed to occupy the main role, with glia being largely ignored. This echoes recent similar changes in other areas of the neurodegenerative disease field where we are recognizing the important roles of glia in pathogenesis, and how they may be modulated to treat disease.

      Their findings in PV neurons also may reflect early key changes in this important neuronal population that contribute to neurological symptom onset. They will allow further study of the genes and pathways involved and may lead to additional effective treatments for disease. Finally, the thorough comparison of multiple neuronal and glial populations will allow future investigation of the interplay of neurons and microglia in pathogenesis and shows the importance of studying them synergistically rather than individually.

      Audience:

      The neurodegenerative disease field in general will be interested in the findings. Immunologists, other neuroscientists, and pharmaceutical and other drug development organizations will also be influenced by the work.

      Own expertise:

      Neurodegenerative disease, transgenic mouse models, neuropathology, translational neuroscience

      REFEREE'S CROSS-COMMENTING:

      I agree with Reviewer 1 that a comparison of the total transcriptome with ribosomally active transcripts would aid the interpretation of this work. It would also uncover or refute the presence of cell-type differences in translation efficiency that directly impact the authors' major conclusion that glia are more affected than neurons. I support the request of this additional experiment.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Using a series of Cre-driven mouse strains a GFP-tagged version of RPL10a (a ribosomal protein) was targeted to different cell types allowing Dr Scheckel and colleagues to investigate translational changes as prion disease progresses in mice. Their data suggest massive changes in microglia and astrocytes but not neurons. The approach was particularly powerful as ribosome IP has been combined with ribosome profiling. The manuscript is very well written. What might help, however, is to make the figures more accessible (perhaps change some of the labelling?)

      I have only minor comments regarding some of the figures:

      Fig 1a: This scheme could be improved, adding wpi and better aligning the cell-types in relation to the time when the cell-types were analysed. Fig 1b-e: The resolution could be improved to better discern the different cell-types. Fig 4: Astrocytes are categorised into A1 and A2 and microglia based on DAM and homeostatic signature (How does this relate to the M1 and M2 classification?).

      Significance

      Highly significant. I have published on de novo protein synthesis in neurodegenerative disease

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Scheckel et al. report a large dataset on cell type-specific translational profiling of PrD-associated molecular alterations in a mouse model through RiboTrap and ribosome profiling approaches. They report a more severe alteration in the translatome specifically in astrocytes and microglia as compared to neuronal populations. This highlights that changes in these two cell classes might have a predominant role in the pathology of PrD.

      Data and the methods are presented such that they can be reproduced. The data analysis section of the manuscript could be further elaborated. In particular, it could be clarified which comparisons with existing datasets have been performed, and how. Statistical analysis description is sometimes missing (e.g. fig 6e, not clear what the stars on top of the bars stand for, which test was performed and the significance). Moreover, the section of the methods regarding the western blots presented in figure 6 appears to be missing.

      Major concern:

      The most important improvement the authors should consider for their paper is to more specifically attempt to isolate specific effects on translational efficiency of mRNAs. As it stands, the authors largely use RiboTrap data as a reference to compare their footprinting data - but arguably, this misses mRNAs that are present in the transcriptome and not efficiently recruited onto ribosomes. It appears to be somewhat of a lost opportunity not to test in the dataset (possibly by comparison to RNA-Seq from FACS isolated cells as a reference) whether there is a systematic change in translational efficiency (possibly in mRNAs with specific features?). In the current form, the RiboTrap and footprinting approaches largely serve to isolate mRNAs from cre-defined cell types but given the lack of a "total transcriptome" reference from the respective cells, it cannot be easily interpreted whether certain transcripts are heavily regulated at the level of translation. Thus, despite using much more advanced methodologies than the Sorce study, the fundamental conclusions emerging from this work are rather similar to this previously published piece of work.
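
      To make the requested comparison concrete, translational efficiency is commonly expressed as the ratio of ribosome footprint abundance to mRNA abundance for the same gene. The sketch below is an illustration under stated assumptions, not the authors' or the referee's pipeline: it assumes library-size-normalized counts, adds a pseudocount to avoid division by zero, and uses hypothetical gene names, values, and function names.

```python
import math


def translational_efficiency(footprints, mrna, pseudocount=1.0):
    """log2 ratio of footprint abundance to mRNA abundance (normalized counts)."""
    return math.log2((footprints + pseudocount) / (mrna + pseudocount))


def delta_te(fp_ctrl, rna_ctrl, fp_prd, rna_prd, pseudocount=1.0):
    """Change in log2 translational efficiency, PrD minus control."""
    te_prd = translational_efficiency(fp_prd, rna_prd, pseudocount)
    te_ctrl = translational_efficiency(fp_ctrl, rna_ctrl, pseudocount)
    return te_prd - te_ctrl


if __name__ == "__main__":
    # Hypothetical values: (footprints ctrl, mRNA ctrl, footprints PrD, mRNA PrD)
    genes = {
        "gene_a": (100, 100, 400, 400),  # footprints and mRNA rise together
        "gene_b": (100, 100, 400, 100),  # footprints rise, mRNA unchanged
    }
    for name, counts in genes.items():
        print(name, round(delta_te(*counts), 3))
```

      In this toy case both genes show the same footprint increase, but only `gene_b` shows a change in efficiency; without the mRNA reference the two cases are indistinguishable, which is the referee's point.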

      Additional suggestions:

      1) In Figure 1d the authors point out occasional neuronal cells exhibiting Rpl10a-GFP expression with arrows. It appears that these arrows may have moved during figure preparation - please check/fix if necessary.

      2) In Supplementary Figure 1b and c it appears that the PV labeling is missing in the panel for Rpl10a:GFP controls. If this is intentional please indicate this in the figure legend.

      3) It appears that the authors sequenced a significant number of libraries generated for multiple time points post-inoculation. From the figures and legends it was not entirely clear to me, how many replicates were analyzed given that in some analyses samples from different time points were combined in a single plot.

      4) It was unclear to me how long after inoculation the group of "terminally ill" mice were sacrificed. Somewhere in the text it states that there are 2 months between 24 wpi and terminally ill - but it appears that this was not a preset timepoint but varied from animal to animal based on symptoms. Please clarify.

      5) From the Western blot data in Figure 6f the authors conclude that GFAP expression is upregulated in PrD mice whereas astrocyte number is unchanged. Given that the translatome is assessed using Rpl10a-GFP, whose expression depends on recombination mediated by cre driven from the GFAP promoter, it is possible that the astrocytic alterations in ribosome footprints are in part a secondary consequence of increased Rpl10a-GFP recombination/expression in PrD mice (due to activation of the GFAP promoter). To estimate the impact of such an effect, the authors should compare GFP levels in terminally ill control and PrD mice by western blotting.

      6) The western blot analysis of Fig. 6f-g has been performed using normalization to calnexin, yet no calnexin signal is shown to support this statement.

      7) Clarify the percentage of non-parenchymal macrophages labeled by the Cx3cr1-creER mouse line, since the authors consider this to be only a minor contamination.

      8) Regarding the presentation of the data, Fig. 5a would be clearer if the order of PrD and Ctrl samples on the y-axis were kept the same for each cell type.

      Significance

      Overall, this is an important and interesting study. Besides its insights into the biology, the transcriptomic data will provide a valuable resource for researchers in the field.

      Previous studies employed bulk RNA-Seq or microdissection for mapping transcriptomic changes (Majer et al. 2019; Sorce et al. 2020 and others). The Sorce et al. study concluded that astrocytic alterations in the transcriptome are more dominant than neuronal gene expression changes. While the conclusion of the present study remains the same, it is the first to use ribosome profiling to dissect actively translated transcripts over the progression of the pathology in the mouse model. Thus, the data presented here would allow for identifying cell type-specific alterations as well as alterations specifically in mRNA translation, which would be missed by bulk RNA-Seq and RNA-Seq on FACS-isolated cells. However, the authors do not fully capitalize on this strength, given that no detailed comparisons to a real transcriptome reference are performed (see above).

      This work is of broad interest to scientists in neurodegeneration as well as glial biology.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1

      Reviewer 1 Summary: In this manuscript, Borah et al. showed that Heh2, a component of the INM, can be co-purified with a specific subset of nucleoporins. They also found that disrupting interactions between Heh2 and the NPC causes NPC clustering. Lastly, they showed that the knockout of Nup133, which does not physically interact with Heh2, causes the dissociation of Heh2 from NPCs. These findings led the authors to propose that Heh2 acts as a sensor of NPC assembly state.

      Reviewer 1 major comment 1: The authors claimed that Heh2 acts as a sensor of NPC assembly state, as evidenced by their finding that Heh2 fails to bind NPCs in nup133Δ cells (Fig. 2, Fig. 5). However, there is a possibility that the association between Heh2 and NPCs is merely affected by the clustering of the NPCs (as the authors discussed) and is not related to the structural integrity of the NPC.

      Our Response: We agree that this is a possibility, however, we ask the reviewer to also consider that we artificially cluster NPCs using the anchor away system (Figure 3C) and this does not affect Heh2’s association with NPCs. Thus, clustering per se is insufficient to disrupt Heh2 binding to NPCs. We will also make changes in the text to make this point.

      Reviewer 1 major comment 2: In addition, their data showing that the Heh2-NPCs association is not easily disrupted by knocking out the individual components of the IRC (Fig. 5A and 5D), also disfavor the idea that Heh2 could sense NPC assembly state.

      Our Response: There are three considerations here. The first is that as this is the first evidence of any kind of “NPC assembly state” sensor, it is difficult to make any assumptions as to what specifically such a sensor would be monitoring. i.e. perhaps sensing only the ORC is what is functionally important. Second, for obvious reasons, we only tested non-essential IRC nups so by definition there is inherent functional redundancy that maintains NPC function and thus there may be no need to “sense” anything in the absence of these IRC nups. Further (and last), the IRC is essential for NPC assembly. Thus, without an IRC there is no NPC assembly state to sense.

      Reviewer 1 major comment 3: Since some nup knockout strains other than nup133Δ are also known to show NPC clustering (e.g. nup159 (Gorsch JCB 1995) and nup120 (Aitchison JCB 1995; Heath JCB 1995)), it will be worth monitoring the localization of Heh2 and its interaction with nucleoporins (by Heh2-TAP) in these strains. While Nup159 is a member of the cytoplasmic complex, Nup120 is an ORC nucleoporin. Thus, biochemical and phenotypic analysis using these mutant cells will be useful to clarify whether the striking phenotypes the authors found are specific to the nup133 knockout strain (or ORC Nup knockouts) or are commonly observed in strains that show NPC clustering. Another interesting point is that Nup159 shows strong interaction with Heh2, even in nup133Δ cells. As the authors mentioned, the Nup159-Heh2 interaction may not be sufficient for Heh2-NPC association, but it could be important for NPC clustering.

      Our Response: These are excellent points and we agree that there is a need to more thoroughly explore how NPC clustering driven by abrogating the function of other nups impacts Heh2’s association with NPCs. Thus, in a revised manuscript, we would examine Heh2’s association with NPCs in several additional genetic backgrounds where NPCs cluster.

      Reviewer 1 major comment 4: Figure 4C: Is it known that rapamycin treatment in this strain did not affect the protein levels of nucleoporins? Otherwise, the authors should confirm this by western blotting (at least some of them).

      Our Response: This is a good point and we will directly address this with Western blotting of some nups.

      Reviewer 1 major comment 5: Figure 5: The authors mentioned (line 256-257) that "in all cases the punctate, NPC-like distribution of Heh2-GFP was retained (Fig 5D)". However, nup107 KO strain seems to show more diminished punctate staining as compared with other strains. To clarify this, the authors should express mCherry tagged Nup as in Fig. 2 or Fig. 3.

      Our Response: Yes, we agree and in fact this observation is consistent with the fact that there is an ER-pool of Heh2 observed in this strain and we observe loss of nup interactions in the affinity purification. We will include a more thorough quantification of this in a revised manuscript and more directly address this in the text.

      Minor comments:

      Reviewer 1 minor comment 1: Figure 4A and 4B: The authors should show scatter plots as in Fig. 2 and Fig. 3.

      We will include this in a revised manuscript.

      Reviewer 1 minor comment 2: Figure 5C: An explanation of the arrowheads is missing from the figure legend.

      Thank you for pointing this out, it will be fixed in a revised manuscript.

      Reviewer 1 minor comment 3: Figure 6: Is there any information as to where Heh2 (316-663) is localized in the cell?

      As this truncation lacks INM targeting sequences, it is found throughout the cortical ER. The determinants of Heh2 targeting (including truncations) have been extensively evaluated in King et al. 2006, Meinema et al. 2011 and Rempel et al. 2020. We will make this clearer in the revised manuscript.

      Reviewer 1 minor comment 4: Figure 6B: Nucleoporins should be marked with color circles as in Fig. 1 and Fig. 5.

      This will be done.

      Reviewer 2

      Borah et al. present a biochemical and cell biological examination of the inner nuclear membrane (INM) protein Heh2 and its putative interactions with the nuclear pore complex (NPC). The potential conceptual advance of this study is that Heh2 interacts with the NPC, while mutations believed to trigger NPC mis-assembly are shown to abolish interaction with Heh2, leading to the hypothesis that Heh2 is a sensor for NPC assembly states within the INM. The conclusions would undoubtedly be of broad interest to the nucleocytoplasmic transport field, but the evidence provided thus far is insufficient to build confidence and consequently this manuscript is premature for publication.

      Our Response: We thank the reviewer for recognizing the potential for a significant conceptual advance for the field but object to the notion that the work is “premature for publication”. This is a highly subjective statement that does not seem to meet the mission or purpose of the Review Commons platform. While it is possible that some of the conclusions drawn in our manuscript might not be fully supported by the data in its current form, there is a substantial body of work here that is certainly publishable.

      Reviewer 2 major comment 1: The TAP-tag Heh1/Heh2 pulldowns are the most significant experiment presented, and on face value provide compelling evidence that Heh2 interacts with the NPC. It is stated that mass spectrometry (MS) was used to confirm the identities of the labeled bands, yet there is no methods section, nor any MS data reported in the manuscript. Given the large number of unspecified proteins observed in these gels, and the single-step pulldown methodology used, knowledge of the contaminants present may aid in elucidating how Heh2 pulls down NPC components. Consequently, within the supplementary materials, the authors must indicate which regions of the gel were excised for MS analysis and provide a table listing all of the proteins that were detected for each sample, including the number of unique/expected peptides observed.

      Our Response: This was a major oversight on our part and a revised manuscript will contain all relevant details with regard to the MS analysis, including a more detailed description of the excised bands and the quantification of spectra derived from these bands.

      Reviewer 2 major comment 2a: The representative micrographs provided across Figures 2, 3, 4, 5 and 6 are very noisy. Particularly in the case of the mCherry labeled nucleoporins, this is both unusual and unfortunate given this is used to infer colocalization of Heh2 with the NPC.

      Our Response: These micrographs are not unusual and are in fact of respectable quality. We agree that the apparent “noise” is unfortunate, but this is simply a reality of the yeast system. We remind the reviewer that there are only ~100 to ~200 NPCs per budding yeast nucleus, which is an order of magnitude fewer than in a typical mammalian cell nucleus. Further, the copy number of yeast nups per NPC is half that of the mammalian NPC. In addition, budding yeast are spherical with a cell wall that is extremely effective at scattering light; they are also highly autofluorescent (particularly in the red channel). Lastly, unlike in mammalian cells, budding yeast NPCs are mobile on the nuclear envelope. Thus, co-localization is challenging (particularly with the long exposures required to obtain good images). This is why clustering of NPCs in nup133Δ cells has provided one of the key assays in the field to assess whether a given protein associates with NPCs at the level of light microscopy.

      Reviewer 2 major comment 2b: As a result it is unclear whether this experiment can be used to differentiate between NPC colocalization vs. nuclear envelope colocalization.

      Our Response: The reviewer is correct. Co-localization between Heh2-GFP and any Nup-mCherry is insufficient to assess NPC association in WT cells. In fact, as we point out in Figure 3B, at best one can expect a correlation of r = 0.48 for two well established nups. Thus, to further support the conclusion that Heh2 associates with NPCs, we established the Nsp1-FRB NPC clustering assay (Figure 3).

      Reviewer 2 major comment 2c: The authors should include negative controls for an alternative NE membrane protein that doesn't bind the NPC, which would be expected to exhibit a reduced level of colocalization with NPC proteins when compared to Heh2. For example, Heh1 would be suitable, given the clear-cut negative pulldown data and its prior usage as a negative control in Figure 4.

      Our Response: This is included in Figure 3D.

      Reviewer 2 major comment 3a. Figure 2. The rim staining for the Nup82-mCherry in the WT background is unusually punctate, bringing into question the viability of the cells imaged.

      Our Response: As the middle cell in the panel is undergoing cell division, these cells are clearly viable. All our imaging is performed on mid-log phase cultures.

      Reviewer 2 major comment 3b. Why has ScNup82, a cytoplasmic filament component, been selected for colocalization experiments when Heh2 is proposed to interact with the inner ring complex?

      Our Response: The resolution of a conventional light microscope is, at best, 200 nm in x, y. As NPCs are 100 nm in diameter, even two NPCs side-by-side cannot be resolved. The IRC is tens of nm away from the cytoplasmic filaments thus any nup is relevant for a co-localization analysis with a light microscope.

      Reviewer 2 major comment 3c: Additionally, the experiments shown in panels A and C are not directly comparable: ScNup82 is an asymmetric cytoplasmic nucleoporin, while SpNup107 is located in the Y-shaped Nup84 nucleoporin complex and is present on both faces of the NPC. This experiment should be repeated with ScNup84 to match panel C; additionally, a viability dot spot assay and western blot analysis of the labeled proteins should be conducted.

      Our response: These are in fact directly comparable within the limits of resolution of light microscopy as described above. Viability assays are not required here as both nups are essential and perturbation to their function would lead to inviability.

      Reviewer 2 major comment 4: Figure 3, the authors use yeast strains where proteins are tagged with FRB and FKBP12 domains, which dimerize upon the addition of rapamycin inducing NPC clusters. The authors then observe the effect this has on Heh2 NPC colocalization. However, Rapamycin may also have an effect independent from the induced dimerization event. Negative controls should be performed in strains lacking the FRB and FKBP12 tagged proteins to demonstrate that Rapamycin doesn't modify Heh2 localization independently of NPC clustering.

      Our response: This is a good point and important control that we performed in prior studies, see Colombi et al., JCB, 2013. We will be more explicit in describing that this control has been done.

      Reviewer 2 major comment 5: Figure 4. The authors provide a qualitative description of the colocalization presented, while in all other instances they calculate a Pearson correlation coefficient. This is significant because Heh2 appears to be evenly distributed within the NE of the DMSO control (panel B). Given the presented hypothesis, isn't colocalization with Nup192 expected? At a minimum, a Pearson correlation coefficient analysis should be conducted and added to Figure 4.

      Our response: This will be included in a revised manuscript.
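      For readers less familiar with the metric requested here, the Pearson correlation coefficient between two fluorescence channels is computed over paired pixel intensities. The sketch below is a generic, minimal illustration and not the authors' analysis code. The function name and the intensity values are hypothetical, and a real analysis would first restrict the pixels to a region of interest such as the nuclear envelope.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired pixel intensities (e.g. GFP vs. mCherry channel).
gfp = [1.0, 2.0, 3.0, 4.0]
r_same = pearson_r(gfp, [2.0, 4.0, 6.0, 8.0])      # channels rise together
r_opposite = pearson_r(gfp, [8.0, 6.0, 4.0, 2.0])  # one rises as the other falls
r_none = pearson_r(gfp, [1.0, -1.0, -1.0, 1.0])    # no linear relationship
```

      Values near +1 indicate co-varying signals, values near -1 indicate anticorrelation, and values near 0 indicate no linear relationship between the two channels.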

      Reviewer 2 major comment 6: Figure 4. Pom152-mCherry localizes at both the NE and strongly within the cytoplasm, which is unexpected given typical rim staining phenotypes observed previously for both Pom152-YFP and Pom152-GFP strains (Katta, ..., Jaspersen et al., Genetics (2015) & Upla, ..., Fernandez-Martinez et al., Structure (2017), respectively). Given the unusually weak rim staining observed throughout, viability assays of the strains listed in Table S1 and protein expression analysis of the tagged nucleoporins via western blot is necessary.

      Our response: This is not localization in the cytoplasm but is in fact autofluorescence from the yeast vacuole. We regret we were not more explicit in describing this and we will make the manuscript more accessible for the non-yeast expert. Performing the Western blot analysis requested by the reviewer for all strains would require a battery of antibodies to the endogenous proteins to directly assess how tagging influences nup levels, which we do not have (nor does anyone else that we are aware of). This is also not standard practice in the field as it is an onerous and unnecessary burden.

      Reviewer 2 major comment 7: Figure 5A. The TAP-tagged pulldowns from ∆Pom152 and ∆Nup133 strains appear to be from a different round of experiments than the previous deletion strains presented. Interestingly, there appears to be an additional band at approximately 250 kDa in both cases that is not present in any other experiments. This band could be a contaminant observed due to different experimental conditions, or a protein that exclusively binds to Heh2 in the ∆Pom152 and ∆Nup133 background. Either way the authors should identify this protein with MS to address this ambiguity.

      Our response: We will include negative controls for these specific experiments to show that this is a nonspecific band.

      Reviewer 2 major comment 8: Figure 6B. Please label the nucleoporin bands in the TAP-tagged pulldowns.

      Our response: This will be done.

      Reviewer 2 major comment 9: Figure 6D. Please specify Heh2-GFP clustering in the y-axis.

      Our response: As this represents both Heh2-GFP and heh2-1-570-GFP, we will keep it as is to avoid confusion.

      Reviewer 2 major comment 10: Under the results section titled 'Heh2 binds to specific nups in evolutionarily distant yeasts', the authors state that spHeh2 co-purifies with "several specific species". The meaning is unclear; this sentence should be rephrased and the specific species clearly described.

      Our response: Ok.

      Reviewer 2 major comment 11: Under the results section titled 'Heh2 fails to interact with NPCs lacking Nup133', the authors refer to a Pearson correlation coefficient of -0.03 as a clear anticorrelation. Instead state there was no correlation.

      Our response: Ok.

      Reviewer 2 major comment 12: In the discussion, the authors state that "clustering itself may sterically preclude an interaction with Heh2". The text should be expanded to explain this in more detail, as it is not clear from the presented data why this would occur.

      Our response: Ok.

      Reviewer 2 comment on significance: the manuscript is premature for publication.

      Our Response: Such a statement has no relevance to this form of review, as a decision as to whether a study is premature for publication should be made by journal editors, not reviewers. We would argue quite strongly that we have definitively shown that Heh2 binds to NPCs, that it does so in multiple evolutionarily distant yeasts and that this binding is functionally relevant. For example, we can specifically disrupt the association of Heh2 with NPCs with a specific domain deletion and observe a loss of function phenotype (e.g. NPC clustering). What all three reviewers agree on is that the concept of an “NPC assembly state sensor” needs additional data to be fully supported, although we note that this reviewer did not provide any suggestions for how we might achieve this goal. We further note that we added the qualifier “may” into the title of the work. We will therefore perform additional experiments, as outlined in our comments to Reviewer 1, to support this conclusion and introduce this as a new concept in the field.

      Reviewer Comment from Cross Commenting: It seems to me that all reviewers agree that the manuscript is premature for publication. The data thus far do not support the conclusion that Heh2 may be an NPC assembly sensor, nor do they provide any mechanistic insight. Reading the comments of the other two reviewers makes me more negative, as it is clear that the paper also lacks scientific rigor. The manuscript is a great starting point for a rigorous dissection, but I do not see this paper as a candidate for a broad impact journal.

      Our Response: The statement that this manuscript is premature for publication is an opinion and does not seem to reflect the sentiment of the other reviewers. It is also confounding that this reviewer suggests that this work lacks rigor. With the exception of the omission of the MS analysis (our fault), the data are of high quality and rigorously quantified. Our assertion of rigor and data quality is based on our collective team’s many decades-long history of publishing and reviewing papers at the highest levels in this field. Questions as to the quality of the data as stated by this reviewer (and only this reviewer) in fact address limitations of light microscopy and the yeast system more generally in this one respect.


      Reviewer 3

      Reviewer 3 Summary part a: This is quite an interesting manuscript that explores the relationship between an INM protein, Heh2, and NPCs. It represents an extension of earlier work performed by this group in which it was shown that the HEH2 gene shares genetic interactions with the genes encoding various nucleoporins. Heh2 belongs to an intriguing family of conserved proteins that includes its orthologue, Heh1, as well as human MAN1 (LEMD3) and LEMD2, among others. Each of these proteins contains two transmembrane domains with the N- and C-terminal regions extending into the nucleoplasm. The two TM domains are separated by a short lumenal loop.

      In this study, the authors show that a population of Heh2 is associated with Nups of the NPC inner ring complex. This was demonstrated initially in pulldown experiments. The authors go on to show that when NPCs are caused to aggregate, by physical tethering employing an FKBP/FRB system in combination with Rapamycin, Heh2, but not Heh1, colocalizes with the NPC clusters.

      Our Response: Thank you to the reviewer for recognizing the value of this work.

      Reviewer 3 Summary part b: Although not stated explicitly in the manuscript, this would imply that there is a population of Heh2 that resides in the NPC membrane domain, with the remainder in the INM. As an idle question, is there any evidence for a similar localization of MAN1 or LEMD2 in mammals? I am guessing probably not.

      Our Response: We regret this was not made more clear but the idea that there is a pool of Heh2 at the POM and a pool at the INM is an important conclusion of the work and was stated in the results - we’ll re-emphasize in the revised discussion. As to whether MAN1 or LEMD2 has a similar NPC association, we hypothesize that MAN1 but not LEMD2 will indeed interact with NPCs in mammalian cells. This is based on considering that we show that both the budding and fission yeast orthologues of MAN1 share this association so unless it was lost in evolution, this is a likely outcome of future studies.

      Reviewer 3 Significance statement a: The complications arise when the authors show that an alternative method of NPC aggregation (although they did this first), involving Nup133 deletion, results in failure of Heh2 to co-aggregate. In other words, Nup133 is required for the association of Heh2 with NPCs. The issue here is that there is no evidence for an interaction between Heh2 and Nup133, and furthermore that loss of Nup133 (a Y complex component of the outer ring complex) leaves the inner ring complex intact.

      Our Response: We tested the nup133Δ background first as this is the standard approach for assessing NPC-association of a given protein so we felt this would be logical for a reader in the field. Further, while the disruption of Heh2’s binding by loss of Nup133 may be a complication, we prefer to see it as an opportunity for discovery. As described in our manuscript, we have chosen to interpret this result in the context of a new biological function/concept with Heh2 being a novel “NPC assembly state” sensor. While one could argue that we have not fully met this bar yet, we will perform additional experiments as outlined in our response to reviewer 1 to help support this compelling conclusion.

      Reviewer 3 Significance statement b: What is clear, however, is that Heh2 seems to be required to inhibit NPC aggregation since Heh2 deficient cells exhibit NPC clusters. The association between Heh2 and IRC Nups resides in the C-terminal nucleoplasmic winged helix domain. The N-terminal domain, in contrast, confers INM localization.

      Our Response: We agree.


      Reviewer 3 Significance statement c: I must admit, I am in two minds about this manuscript. The data clearly show that Heh2 is associated with IRC components and I agree with the authors that this protein may well have a role in NPC assembly quality control, perhaps in the guise of a chaperone. However, I find it hard to come up with a convincing model for the effects of Nup133. On the one hand, one could make an argument that the data presented here are too preliminary and fail to provide a complete story. On the other hand, it does provide an intriguing foundation for future studies and I do feel positively disposed towards it. In short, I have no fundamental complaints about the science, I am just uncertain as to whether the study is ready for publication.

      Our Response: This statement nicely articulates the challenge with this manuscript as there are some solid findings (that Heh2 binds specifically to NPCs etc.) but also a provocative finding (that loss of Nup133 breaks Heh2’s interaction with NPCs despite not physically interacting). Thus, there is a decision to be made about whether there is value in introducing a novel concept to the field once additional data is provided in a revised manuscript.

      Reviewer 3 Cross commenting: I have no fundamental disagreements with either of the other two reviewers. The comment from Reviewer#2 summarises this quite neatly. While I have fewer concerns about the quality of the data as presented, I think we all agree that at best the study is preliminary. What the authors need to do is to construct a coherent model that will account for the observations described here and then to design experiments that will test this model. I'm not suggesting that they must have a complete story, but they do need to go beyond what is in the current manuscript.

      Our Response: We appreciate that the reviewer does not have any questions about the quality of our data, but we argue that we have in fact presented the most coherent interpretation of the data as it currently stands. As described above, we intend to attempt to solidify this model by performing experiments suggested by reviewer 1.



      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Reply to the Reviewers

      I thank the Referees for their...

      Referee #1

      1. The authors should provide more information when...

      Responses

      The typical domed appearance of a hydrocephalus-harboring skull is apparent as early as P4, as shown in a new side-by-side comparison of pups at that age (Fig. 1A). Though this is not stated in the MS

      1. Figure 6: Why has only... Response: We expanded the comparison

      Minor comments:

      2. The text contains several... Response: We added...

      Referee #2

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This is quite an interesting manuscript that explores the relationship between an INM protein, Heh2, and NPCs. It represents an extension of earlier work performed by this group in which it was shown that the HEH2 gene shares genetic interactions with the genes encoding various nucleoporins. Heh2 belongs to an intriguing family of conserved proteins that includes its orthologue, Heh1, as well as human MAN1 (LEMD3) and LEMD2, among others. Each of these proteins contains two transmembrane domains with the N- and C-terminal regions extending into the nucleoplasm. The two TM domains are separated by a short lumenal loop.

      In this study, the authors show that a population of Heh2 is associated with Nups of the NPC inner ring complex. This was demonstrated initially in pulldown experiments. The authors go on to show that when NPCs are caused to aggregate, by physical tethering employing an FKBP/FRB system in combination with Rapamycin, Heh2, but not Heh1, colocalizes with the NPC clusters. Although not stated explicitly in the manuscript, this would imply that there is a population of Heh2 that resides in the NPC membrane domain, with the remainder in the INM. As an idle question, is there any evidence for a similar localization of MAN1 or LEMD2 in mammals? I am guessing probably not.

      Significance

      The complications arise when the authors show that an alternative method of NPC aggregation (although they did this first), involving Nup133 deletion, results in failure of Heh2 to co-aggregate. In other words, Nup133 is required for the association of Heh2 with NPCs. The issue here is that there is no evidence for an interaction between Heh2 and Nup133, and furthermore that loss of Nup133 (a Y complex component of the outer ring complex) leaves the inner ring complex intact. What is clear, however, is that Heh2 seems to be required to inhibit NPC aggregation since Heh2 deficient cells exhibit NPC clusters. The association between Heh2 and IRC Nups resides in the C-terminal nucleoplasmic winged helix domain. The N-terminal domain, in contrast, confers INM localization.

      I must admit, I am in two minds about this manuscript. The data clearly show that Heh2 is associated with IRC components and I agree with the authors that this protein may well have a role in NPC assembly quality control, perhaps in the guise of a chaperone. However, I find it hard to come up with a convincing model for the effects of Nup133. On the one hand, one could make an argument that the data presented here are too preliminary and fail to provide a complete story. On the other hand, it does provide an intriguing foundation for future studies and I do feel positively disposed towards it. In short, I have no fundamental complaints about the science, I am just uncertain as to whether the study is ready for publication.

      REFEREES CROSS COMMENTING

      I have no fundamental disagreements with either of the other two reviewers. The comment from Reviewer#2 summarises this quite neatly. While I have fewer concerns about the quality of the data as presented, I think we all agree that at best the study is preliminary. What the authors need to do is to construct a coherent model that will account for the observations described here and then to design experiments that will test this model. I'm not suggesting that they must have a complete story, but they do need to go beyond what is in the current manuscript.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Borah et al. present a biochemical and cell biological examination of the inner nuclear membrane (INM) protein Heh2 and its putative interactions with the nuclear pore complex (NPC). The potential conceptual advance of this study is that Heh2 interacts with the NPC, while mutations believed to trigger NPC mis-assembly are shown to abolish interaction with Heh2, leading to the hypothesis that Heh2 is a sensor for NPC assembly states within the INM. The conclusions would undoubtedly be of broad interest to the nucleocytoplasmic transport field, but the evidence provided thus far is insufficient to build confidence and consequently this manuscript is premature for publication.

      Specific comments:

      (1)The TAP-tag Heh1/Heh2 pulldowns are the most significant experiment presented, and at face value provide compelling evidence that Heh2 interacts with the NPC. It is stated that mass spectrometry (MS) was used to confirm the identities of the labeled bands, yet there is no methods section, nor any MS data reported in the manuscript. Given the large number of unspecified proteins observed in these gels, and the single-step pulldown methodology used, knowledge of the contaminants present may aid in elucidating how Heh2 pulls down NPC components. Consequently, within the supplementary materials, the authors must indicate which regions of the gel were excised for MS analysis and provide a table listing all of the proteins that were detected for each sample, including the number of unique/expected peptides observed.

      (2)The representative micrographs provided across Figures 2, 3, 4, 5 and 6 are very noisy. Particularly in the case of the mCherry-labeled nucleoporins, this is both unusual and unfortunate given this is used to infer colocalization of Heh2 with the NPC. As a result it is unclear whether this experiment can be used to differentiate between NPC colocalization vs. nuclear envelope colocalization. The authors should include negative controls for an alternative NE membrane protein that doesn't bind the NPC, which would be expected to exhibit a reduced level of colocalization with NPC proteins when compared to Heh2. For example, Heh1 would be suitable, given the clear-cut negative pulldown data and its prior usage as a negative control in Figure 4.

      (3)Figure 2. The rim staining for the Nup82-mCherry in the WT background is unusually punctate, bringing into question the viability of the cells imaged. Why has ScNup82, a cytoplasmic filament component, been selected for colocalization experiments when Heh2 is proposed to interact with the inner ring complex? Additionally, the experiments shown in panels A and C are not directly comparable: ScNup82 is an asymmetric cytoplasmic nucleoporin, while SpNup107 is located in the Y-shaped Nup84 nucleoporin complex and present on both faces of the NPC. This experiment should be repeated with ScNup84 to match panel C. Additionally, a viability dot spot assay and western blot analysis of the labeled proteins should be conducted.

      (4)Figure 3, the authors use yeast strains where proteins are tagged with FRB and FKBP12 domains, which dimerize upon the addition of rapamycin inducing NPC clusters. The authors then observe the effect this has on Heh2 NPC colocalization. However, Rapamycin may also have an effect independent from the induced dimerization event. Negative controls should be performed in strains lacking the FRB and FKBP12 tagged proteins to demonstrate that Rapamycin doesn't modify Heh2 localization independently of NPC clustering.

      (5)Figure 4. The authors provide a qualitative description of the colocalization presented, while in all other instances they calculate a Pearson correlation coefficient. This is significant because Heh2 appears to be evenly distributed within the NE of the DMSO control (panel B). Given the presented hypothesis, isn't colocalization expected with Nup192? As a minimum, a Pearson correlation coefficient analysis should be conducted and added to Figure 4.

      (6)Figure 4. Pom152-mCherry localizes at both the NE and strongly within the cytoplasm, which is unexpected given typical rim staining phenotypes observed previously for both Pom152-YFP and Pom152-GFP strains (Katta, ..., Jaspersen et al., Genetics (2015) & Upla, ..., Fernandez-Martinez et al., Structure (2017), respectively). Given the unusually weak rim staining observed throughout, viability assays of the strains listed in Table S1 and protein expression analysis of the tagged nucleoporins via western blot is necessary.

      (7)Figure 5A. The TAP-tagged pulldowns from ∆Pom152 and ∆Nup133 strains appear to be from a different round of experiments than the previous deletion strains presented. Interestingly, there appears to be an additional band at approximately 250 kDa in both cases that is not present in any other experiments. This band could be a contaminant observed due to different experimental conditions, or a protein that exclusively binds to Heh2 in the ∆Pom152 and ∆Nup133 background. Either way the authors should identify this protein with MS to address this ambiguity.

      (8)Figure 6B. Please label the nucleoporin bands in the TAP-tagged pulldowns.

      (9)Figure 6D. Please specify Heh2-GFP clustering in the y-axis.

      (10)Under the results section titled 'Heh2 binds to specific nups in evolutionarily distant yeasts', the authors state that spHeh2 co-purifies with "several specific species". The meaning is unclear; this sentence should be rephrased and the specific species clearly described.

      (11)Under the results section titled 'Heh2 fails to interact with NPCs lacking Nup133', the authors refer to a Pearson correlation coefficient of -0.03 as a clear anticorrelation. Instead, they should state that there was no correlation.

      (12)In the discussion, the authors state that "clustering itself may sterically preclude an interaction with Heh2". The text should be expanded to explain this in more detail; it is not clear from the presented data why this would occur.
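
      Comments (5) and (11) both turn on how a Pearson correlation coefficient is read. As a minimal sketch, assuming two matched lists of per-pixel intensities sampled along the nuclear rim, the coefficient could be computed as below. The function name `pearson_r` and all intensity values are hypothetical and invented for illustration; none of them come from the manuscript.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    # The coefficient is undefined when either signal is constant.
    if max(xs) == min(xs) or max(ys) == min(ys):
        raise ValueError("correlation is undefined for a constant sequence")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical intensities (arbitrary units) at eight positions along the rim.
nup_signal = [10, 40, 12, 38, 11, 41, 9, 39]
colocalized = [12, 42, 11, 40, 13, 43, 10, 37]     # peaks where the Nup peaks
anticorrelated = [41, 11, 39, 12, 40, 10, 42, 12]  # peaks where the Nup is low

print(pearson_r(nup_signal, colocalized))
print(pearson_r(nup_signal, anticorrelated))
```

      A value near +1 indicates colocalization and a value near -1 indicates anticorrelation. A value close to 0, such as the -0.03 discussed in comment (11), indicates no linear relationship between the two signals.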

      Significance

      The manuscript is premature for publication.

      REFEREES CROSS COMMENTING

      It seems to me that all reviewers agree that the manuscript is premature for publication. The data thus far do not support the conclusion that Heh2 may be an NPC assembly sensor, nor do they provide any mechanistic insight. Reading the comments of the other two reviewers makes me more negative, as it is clear that the paper also lacks scientific rigor. The manuscript is a great starting point for a rigorous dissection, but I do not see this paper as a candidate for a broad impact journal.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Borah et al showed that Heh2, a component of INM, can be co-purified with a specific subset of nucleoporins. They also found that disrupting interactions between Heh2 and NPC causes NPC clustering. Lastly, they showed that the knockout of Nup133, which does not physically interact with Heh2, causes the dissociation of Heh2 from NPCs. These findings led the authors to propose that Heh2 acts as a sensor of NPC assembly state.

      Major comments:

      The authors claimed that Heh2 acts as a sensor of NPC assembly state, as evidenced by their finding that Heh2 fails to bind with NPCs in nup133Δ cells (Fig. 2, Fig. 5). However, there is a possibility that the association between Heh2 and NPCs is merely affected by the clustering of the NPCs (as the authors discussed) but not related to the structural integrity of the NPC. In addition, their data showing that the Heh2-NPC association is not easily disrupted by knocking out the individual components of the IRC (Fig. 5A and 5D) also disfavor the idea that Heh2 could sense NPC assembly state. Since some nup knockout strains, other than nup133Δ, are also known to show NPC clustering (e.g., nup159 (Gorsch JCB 1995) and nup120 (Aitchison JCB 1995; Heath JCB 1995)), it will be worth trying to monitor the localization of Heh2 and its interaction with nucleoporins (by Heh2-TAP) using these strains. While Nup159 is a member of the cytoplasmic complex, Nup120 is an ORC nucleoporin. Thus, biochemical and phenotypical analysis using these mutant cells will be useful to clarify if the striking phenotypes the authors found are specific to the nup133 knockout strain (or ORC Nup knockouts) or could be commonly observed in the strains that show NPC clustering. Another interesting point is that Nup159 shows strong interaction with Heh2, even in nup133Δ cells. As the authors mentioned, Nup159-Heh2 interaction may not be sufficient for Heh2-NPC association, but it could be important for NPC clustering.

      Figure 4C: Is it known that rapamycin treatment in this strain did not affect the protein levels of nucleoporins? Otherwise, the authors should confirm this by western blotting (at least some of them).

      Figure 5: The authors mentioned (lines 256-257) that "in all cases the punctate, NPC-like distribution of Heh2-GFP was retained (Fig 5D)". However, the nup107 KO strain seems to show more diminished punctate staining as compared with other strains. To clarify this, the authors should express an mCherry-tagged Nup as in Fig. 2 or Fig. 3.

      Minor comments:

      Figure 4A and 4B: The authors should show scatter plots as in Fig. 2 and Fig. 3.

      Figure 5C: An explanation of the arrowheads is missing from the figure legend.

      Figure 6: Is there any information as to where Heh2 (316-663) is localized in the cell?

      Figure 6B: Nucleoporins should be marked with color circles as in Fig. 1 and Fig. 5.

      Significance

      Heh2 has been implicated in the quality control of NPC assembly; however, the molecular mechanism of how Heh2 interacts with and affects NPC assembly/function has remained largely unknown. The relationship between Heh2 and specific nucleoporins shown in this study is novel and interesting. While the data are overall of good quality and convincing, the current manuscript still lacks molecular mechanistic insight. In particular, it is not clear if the observed phenotypes are due to structural defects of the NPC or to NPC clustering.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): The manuscript by Huh et al. reports that oxidative stress causes fragmentation of a specific tyrosine pre-tRNA, leading to two parallel outcomes. First, the fragmentation depletes the mature tRNA, causing translational repression of genes that are disproportionately rich in tyrosine codons. These genes are enriched for those involved in the electron transport chain, cell cycle and growth. Second, the fragmentation generates tRNA fragments (tRFs) that bind to two known RNA binding proteins. Finally, the authors identify a nuclease that is needed for efficient formation of tyrosine tRFs.

      Comment 1: The authors should include a short diagram indicating the various known steps of pre-tRNA fragmentation (perhaps as a supplement) for general readers.

      Response: We thank the reviewer for their suggestion. Pre-tRNA fragmentation is still a largely unexplored field, but an initial introduction is best seen from pre-tRNA processing, where there is a cleavage event for pre-tRNAs with an intron. This is a complex subject, but a recent review from Hopper and Nostramo has done an excellent job in describing the current field in yeast and vertebrate species (Hopper and Nostramo, Front. Genet., 2019). We have added this citation and new text in the manuscript about pre-tRNA processing for general readers to follow up on. We feel that a supplementary figure might be a bit too brief in describing the knowns and unknowns of pre-tRNA processing and fragmentation.

      Comment 2: I find the enrichment for mitochondrial electron transport chain (ETC) curious. The ETC includes several oxidoreductases, which may be rich in tyrosine as it is a common amino acid used in electron transfer. The depletion of the tyrosine tRNA from among many tRNAs under oxidative stress may not be incidental but related to an attempt by the cell to decrease oxygen consumption to avoid further oxidative damage. The authors could further mine their data to corroborate this hypothesis. For example, are the ETC genes among the targets of the RNA binding proteins targeted by tyrosine tRFs? This could potentially connect the effects of mature tRNA depletion and tRFs.

      Response: We thank the reviewer for this very interesting comment and insight, which had not occurred to us. The relationship between this response and oxidoreductase regulation could be a factor in both the tRNA and tRF modulations seen in our cells. Interestingly, we find that many oxidoreductase genes (such as the NDUF family) are bound by hnRNPA1 by CLIP. In new data, we have done stability experiments with the tRF (new Fig 7E-F) to show the regulon of hnRNPA1 is modulated with overexpression and LNA against the tRF, revealing that this tRNA fragmentation response modulates expression of certain oxidoreductase genes. However, we do not see clear and significant differences for ETC genes in particular. As hnRNPA1 is known to act as both a promoter and destabilizer of genes depending on context, it is likely that further and more detailed work will be needed to parse this hypothesis out in future studies.

      Comment 3: In figure 4A, the authors should provide the tyrosine codon content of the overlap genes and show how much it differs from a randomly selected sample.

      Response: We have identified an error in our manuscript where the overlap actually identifies 109 proteins rather than the 102 reported in the original manuscript. We apologize for this oversight. As for the overlap proteins, we plotted the downstream proteins detected in the proteome by mass spectrometry based on Tyr-codon content. As explained in the text, the targets we tested were chosen for having higher than median levels of Tyr-codon content, as seen in the histogram, and for showing some of the greatest reduction after Tyr tRNA-GUA depletion (Fig S4A). The other proteins found in the overlap will fall in a similar pattern along the histogram.
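
      The comparison requested in Comment 3, the tyrosine codon content of the overlap genes against a randomly selected sample, could be sketched along the following lines. This is an illustration under stated assumptions, not the analysis performed in the manuscript: the function names `tyr_codon_fraction` and `empirical_p_value` are hypothetical, coding sequences are assumed to be in frame, and the resampling test is one possible choice among several.

```python
import random

# Tyrosine codons in both RNA (U) and DNA (T) alphabets.
TYR_CODONS = {"UAC", "UAU", "TAC", "TAT"}


def tyr_codon_fraction(cds):
    """Fraction of in-frame codons in a coding sequence that encode tyrosine."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("coding sequence is shorter than one codon")
    return sum(codon in TYR_CODONS for codon in codons) / len(codons)


def empirical_p_value(subset_values, background_values, n_draws=10000, seed=0):
    """Share of random, same-size background samples whose mean Tyr-codon
    fraction is at least as high as the mean of the subset of interest."""
    rng = random.Random(seed)
    k = len(subset_values)
    observed = sum(subset_values) / k
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(background_values, k)
        if sum(sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)
```

      In use, `subset_values` would hold the Tyr-codon fractions of the overlap proteins and `background_values` those of all proteins detected in the proteome; a small value would indicate that the overlap set is richer in tyrosine codons than expected by chance.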

      Comment 4: Fig.6F, lower panel: the model should show pre-tRNA, as opposed to mature tRNA, because it is the former that is fragmented.

      Response: We apologize for the confusion. The model in Fig 7F was supposed to denote the pre-tRNA with the trailer and leader sequences intact initially, then lost with processing to mature tRNA. To make it clearer, we have now labeled the first species as “Pre-tRNA.”

      Reviewer #1 (Significance (Required)): This study is comprehensive and novel, and includes several orthogonal and complementary approaches to provide convincing evidence for the conclusions. The main discovery is significant because it presents an important advance in post-transcriptional control of gene expression. The process of tRF formation was previously thought not to affect the levels of mature tRNA. This study changes that understanding by describing for the first time the depletion of a specific mature tRNA as its precursor form is fragmented to generate tRFs. Finally, the authors identify DIS3L2 as a nuclease involved in fragmentation. This is also an important finding as the only other suspected nuclease, albeit with contradictory evidence, is angiogenin. Collectively, the findings of this study would be of interest to a broad group of scientists. I only have a few minor comments and suggestions (see above).

      Response: We thank the reviewer for their very positive and insightful comments and feedback.

      REFEREES CROSS-COMMENTING I have the following comments on other reviewers' critiques. Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly? Reviewer 3 raises the issue of cross hybridization in Northern blots. The authors indicate that they "could not detect the other tyrosyl tRNA (tRNA Tyr AUA) in MCF10A cells by northern blot..." (page 6). Also, they gel extracted tRFs and sequenced them (figure S6B), directly identifying the fragments. I think these findings mitigate the concern of cross hybridization and clearly identify the nature of tRFs. Finally, I think that the codon-dependent reporter experiment (figure 5D) addresses many issues surrounding codon dependent vs indirect effects. In that experiment, the authors mutate 5 tyrosine codons of a reporter gene and demonstrate that the encoded protein is less susceptible to repression in response to oxidative stress.

      Response: We thank the reviewer for their tremendous insights. We are in agreement regarding the three points in the cross-comments.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): This very interesting study from Sohail Tavazoie's lab describes the consequences of oxidative stress on the tRNA pool in human epithelial cell lines. As previously described, the authors observed that tRNA fragments were generated upon exposure of cells to ROS. In addition, the authors made the novel observation that specific mature tRNAs were also depleted under these conditions. In particular, the authors focused on tyrosyl tRNA-GUA, which was decreased ~50% after 24 hours of ROS exposure, an effect attributable to a decrease in the pre-tRNA pool. Depletion of tyrosyl tRNA resulted in reduced translation of specific mRNAs that are enriched in tyr codons and likely contributed to the anti-proliferative effects of ROS exposure. In addition, the authors demonstrated that the tRFs produced from tyr tRNA-GUA can interact with specific RNA binding proteins (SSB and hnRNPA1). The major contribution of this paper is the novel finding that stress-induced tRNA fragmentation can result in a measurable reduction of specific mature tRNAs, leading to a selective reduction in translation of mRNAs that are enriched for the corresponding codons. Previously, studies of tRNA fragmentation largely focused on the functions of the tRFs themselves and it was generally believed that the mature tRNA pool was not impacted sufficiently to reduce translation. The findings reported here therefore add a new dimension to our understanding of the cellular consequences of stress-induced tRNA cleavage. Overall, the data are of high quality, the experiments are convincing, and the conclusions are well supported. I have the following suggestions that would further strengthen the study and bolster the conclusions.

      Comment 1: The authors have not formally demonstrated that the reduction in pre-tRNA in H2O2-treated cells is a consequence of pre-tRNA cleavage. It is possible that reduced transcription contributes to this effect. Pulse-chase experiments with nucleotides such as EU would provide a tractable approach to demonstrate that a labelled pool of pre-tRNA is rapidly depleted upon H2O2 treatment, which would further support their model. Since the response occurs rapidly (within 1 hour), it would be feasible to monitor the rate of pre-tRNA depletion during this time period in control vs. H2O2-treated cells.

      Response: We thank the reviewer for their suggestion and agree that testing for a transcriptional effect using a pulse-chase experiment would further support these findings. We are grateful to both reviewer 1 and reviewer 2 in the cross-comments for recognizing that the tRNA repression response we see is too rapid to be a transcriptional response and that the fact that this tRNA depletion response occurs concomitantly with the tRF generation supports our model that this is a pre-tRNA fragmentation response. It would be of interest for future studies to also examine the impact of cellular stress on tRNA transcription.

      Comment 2: To what extent is the growth arrest that results from H2O2 treatment attributable to tyr tRNA-GUA depletion (Fig. 3A)? Since the reduction in tRNA levels is only partial (~50%), it should be feasible to restore tRNA levels by overexpression (strategy used in Fig. 3E, S3B) and determine whether this measurably rescues growth in H2O2-treated cells.

      Response: We thank the reviewer for their suggestion. Originally, we had also thought of this experiment and attempted to test this hypothesis. Upon experimentation, we ran into technical challenges that prevented us from drawing any conclusions. The problems were that we were unable to develop a cell line that stably overexpressed the Tyr tRNA-GUA and had to settle for a transient overexpression that only lasted for a couple of days (Fig S3B). For transient transfection, we used Lipofectamine 3000 (Invitrogen) that has associated cell toxicities and requires a control RNA transfection in lipofectamine. In addition, H2O2 in itself is a stress. The simultaneous occurrence of these two stresses led to a combination of cell death and cell growth for the control and experimental group. Given the high variability, we were unable to draw any conclusions on cell growth with this combination. We hope to identify a way to stably overexpress Tyr tRNA-GUA in the future to address this hypothesis.

      Comment 3: Knockdown of YARS/tyr tRNA-GUA resulted in reduced expression of EPCAM, SCD, and USP3 at both the protein and mRNA levels (Fig. 4C-D, S4C). In contrast, H2O2-exposure reduced the abundance of these proteins without affecting mRNA levels (Fig. 5A-B, S5A). The authors should comment on this apparent discrepancy. Perhaps translational stalling induces No-Go decay, but it is unclear why this response would not also be triggered by ROS.

      Response: We would like to clarify that out of the three genes in Fig. S5A, only EPCAM mRNA levels were significantly reduced with H2O2-exposure while no changes were observed in the mRNA levels of USP3 or SCD. It is difficult to ascertain the reason for EPCAM mRNA reduction but one hypothesis is due to timing and steady state levels. Levels of mRNAs seen with knockdown of YARS or tRNA represent steady state levels where mRNA decay and transcriptional changes can be easily seen. Following H2O2, the data is collected at 24 hours, which may be before mRNA effects can be fully appreciated. We have edited the text to clarify the uncertainty involved. We agree with the reviewer’s insightful comment and find these differences to be interesting and will consider them in future studies to better understand the interplay between translation and mRNA levels in the context of tRNA depletion.

      Comment 4: In addition to the analyses of ribosome profiling in Fig. 5E-F, it might also be helpful to show a metagene analysis of ribosome occupancy centered upon UAC/UAU codons (for an example, see Figure 2 of Schuller et al., Mol Cell, 2017). This has previously been used as an effective way to visualize ribosome stalling at specific codons. Additionally, do the authors see a global correlation between tyrosine codon density and reduced translational efficiency in tRNA knockdown cells?

      Response: We thank the reviewer for their important suggestion. We have expanded the analysis to look at codon usage scatterplots across all codons for shTyr and shControl replicates (Fig S5D). The 5 most changed codons are labeled with UAC, a codon for the tyrosine amino acid, being the most affected (red arrow). Consistent with our model, a tyrosine codon, when at the ribosome A-site, is most affected with depletion of the corresponding tRNA. The text has also been edited to reflect our new analysis providing further evidence that ribosomal stalling could occur upon depletion of this tRNA. The gray outline around the regression line represents the 95% confidence interval.

      Fig S5D

      As seen in Fig 5F, a significant overlap was noted for genes with the lowest translational efficiency and tyrosine enrichment. We did further analysis to test if a direct and linear relationship exists between tyrosine codon density and reduced translational efficiency on the global scale (i.e. does more stalling occur with more tyrosine codons on a global scale). We again see that a reduced translational efficiency is significantly correlated with tyrosine codon enrichment (above median parameters) in the tRNA knockdown ribosome profiling data. However, our analysis on a direct relationship between codon density and translational efficiency is inconclusive. This analysis is limited given the sequencing depth and number of experimental replicates available and we lack the statistical power to draw strong conclusions. To prevent overstating our claims, we have omitted any conclusions regarding this second analysis.

      Comment 5: MINOR: On pg. 4, the authors state that tRF-tyrGUA is the most highly induced tRF, but Fig. S1B appears to show stronger induction of tRF-LeuTAA.

      Response: The reviewer is correct in that the data from Fig S1B shows Leu-tRFs with higher induction. Our text was meant to suggest we focused on tRF-TyrGUA due to higher band intensity seen on northern blot validation. We have edited the text in the manuscript to clarify this.

      Reviewer #2 (Significance (Required)): The major advance provided by this work is the demonstration that stress-induced tRNA cleavage can reduce the abundance of the mature tRNA pool sufficiently to impact translation. Moreover, the effect on mature tRNAs is selective, resulting in the reduced translation of a specific set of mRNAs under these conditions. These findings reveal previously unknown consequences of oxidative stress on gene expression and will be of interest to scientists working on cellular stress responses and post-transcriptional regulation.

      Response: We thank the reviewer for the kind comments and feedback.

      REFEREES CROSS-COMMENTING Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly? Here is what I was thinking: The generation of tRFs does not generally result in reduction in levels of the mature tRNAs. So you can imagine a scenario where oxidative stress causes tRF generation from the mature tyr tRNA (which does not impact its steady-state levels), as is the case for other tRNAs. At the same time, decreased transcription would reduce the pre-tRNA pool, leading to a delayed reduction in mature tRNA, as observed. However, looking back at the data, I see that after only 5 min of H2O2 treatment, the authors observed reduced pre-tRNA and increased tRFs (Fig. 2A). This seems very fast for a transcriptional response, which would presumably require some kind of signal transduction. In addition, when you consider the amount of tRFs produced in Fig. S2C, it is hard to imagine that this would not impact the mature tRNA pool if they were derived from there. So I agree that the transcriptional scenario seems unlikely. Nevertheless, I think that looking at pre-tRNA degradation directly with the pulse-chase strategy would strengthen their story, so I would like to give the authors this suggestion. However, I am fine with listing this as an optional experiment which would enhance the paper but should not be essential for publication.

      Response: We thank the reviewer for these insightful comments. As mentioned above, five minutes is likely too rapid for a transcriptional response to be the main effect of H2O2 on Tyr-tRNA GUA. Moreover, the concomitant appearance of the tRF at this time-point makes tRNA fragmentation the most parsimonious and likely explanation rather than transcriptional repression, which would not cause a tRNA fragment to occur concurrently. Moreover, extraction and sequencing of the tRF shows it likely derives from the pre-tRNA as a 5’ leader sequence is present. We appreciate the reviewer’s suggestion and scholarly willingness to reassess their own hypothesis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): The major findings in this manuscript are: 1.) Oxidative stress in human cells causes a decrease in tyrosine tRNA levels and accumulation of tyrosine tRNA fragments; 2.) The depletion of tyrosyl-tRNA synthetase or tyrosine tRNAs in human cells results in altered translation of certain genes and reduced cell growth and 3.) hnRNPA1 and SSB/La can bind tyrosine tRNA fragments. There is also preliminary evidence that the DIS3L2 endonuclease contributes to the appearance of tyrosine tRNA fragments upon oxidative stress. Based upon these results, the Authors conclude that tyrosine tRNA depletion is part of a conserved stress-response pathway to regulate translation in a codon-based manner.

      **Major comments:**

      Comment 1: There is a considerable amount of data in this paper and the experiments are performed in a generally rigorous manner. Sufficient details are provided for reproducing the findings and all results have been provided to appropriate databases (RNA-Seq and ribosome profiling).

      Response: We thank the reviewer for the positive comments and feedback.

      Comment 2: The manuscript uses a probe against the 5' half of Tyrosine tRNA for Northern blotting. However, tRNA probes can be prone to cross-hybridization, especially with some tRNA isoacceptors being similar in sequence. Thus, the blots in Figure 2 and Supplemental Figures should be probed with an oligonucleotide against the 3' half of tRNA-Tyr. This will confirm the pre- and mature tRNA-Tyr bands detected with the 5' probe. Moreover, this will determine whether 3' tRNA-Tyr fragments accumulate.

      Response: We agree that the reviewer is correct in suggesting that the 3’ tRNA-Tyr might also accumulate. However, we disagree that any accumulation of the 3’ tRF might be relevant in our particular model for multiple reasons. As supported by reviewer 1’s cross-comments, cross-hybridization between isoacceptors (GUA vs AUA) would be unlikely as Tyr-AUA could not even be detected by the initial 5’ tRF probe. Additionally, the sequences for Tyr-GUA are different with no nucleotide alignment from Tyr-AUA. Furthermore, the extraction and sequencing of the 5’ tRF (Fig S6B) confirms the 5’ leader sequence unique to the pre-tRNA (also noted by reviewer 1). While the 3’ half of many Tyr-GUA are similar, we find selective binding of our RNA binding proteins only to the 5’ tRF. The 3’ tRF may play some role in binding to other proteins in cell regulatory pathways but such experiments would be outside the scope of this study.

      Comment 3: The analysis of the proteomic and ribosome profiling experiments seems rather limited, or based upon what was presented in this manuscript. If additional analyses were performed, then they should be included as well, even if they yielded negative results. For example, the manuscript identifies 102 proteins that decrease after tRNA-Tyr depletion and YARS-depletion with a certain threshold of Tyr codon content. We realize the Authors were trying to find potential genes that are modulated under all three conditions. However, this does not provide information on whether there is a relationship between a certain codon such as Tyr and protein abundance if only binning into two categories representing below and above a certain codon content. The Authors should plot the abundance change of each detected protein versus each codon and determine the correlation coefficient. This analysis is important for substantiating the conclusion of a codon-based system of specifically modulating transcripts enriched for certain codons. Otherwise, how could changes in tRNA-Tyr levels modulate codon-dependent gene expression if two different transcripts with the same Tyr codon content exhibit differences in translation? Moreover, this analysis should be performed with all the other codons as well.

      Response: We have identified an error in our manuscript where the overlap identified 109 proteins and not 102 as reported previously. We apologize for this oversight. While the reviewer is correct in that identifying codon-dependent changes for all 3500+ proteins detected would offer greater insight, our study was specifically focused on tyrosine as we observed this tRNA to become depleted and our experimental system modulated this specific tRNA. As for the second point on Tyr tRNA level effects on translation, we felt that the most rigorous course would be to assess causality rather than an association for this tRNA and its codon in regulating a target gene. The only way to do this is to perform mutagenesis and reporter studies. Our codon-dependent reporter clearly shows a direct effect on translation in a tyrosine-codon-dependent manner. As for translational regulation of two different transcripts with the same Tyr codon content, the molecular mechanisms that could dictate these differences are unclear. The reviewer has already brought up possibilities in the next comment regarding Tyr codons in 5’ or 3’ ends or consecutive Tyr codons. These are all interesting hypotheses, and others in the field have devoted entire publications to understanding how and why codon interactions and localizations impact translation (see Gamble et al., Cell 2016; Kunec and Osterrieder, Cell Reports 2016; Gobet et al., PNAS 2020). While these further analyses would be interesting, our current experimental data would be insufficient to properly address these questions. We have focused on a specific tRNA and its fragment, and demonstrated direct effects of the tRNA on the codon-dependent translation of a specific growth-regulating target gene and of the tRNA fragment on the activity of the RNA binding protein it binds to with respect to its regulon. We believe that these findings individually reveal causal roles for this tRNA and tRF in downstream gene regulation and collectively reveal a previously unappreciated post-transcriptional response. We hope the reviewer agrees with us regarding the already deep extent of the studies and that further such analyses beyond this tRNA are outside the scope and focus of this current study.
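
      For illustration only, the per-protein correlation requested in Comment 3 (abundance change versus codon content) could be computed along the following lines. This is a minimal sketch, not the analysis performed in the study: the function name `pearson_r` and the toy values are assumptions made up for the example, and real input would be one Tyr-codon fraction and one log2 abundance change per detected protein.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Toy, invented values (NOT data from the study): one entry per protein.
tyr_fraction = [0.01, 0.02, 0.03, 0.04, 0.05]
log2_fold_change = [0.2, 0.0, -0.1, -0.4, -0.5]

r = pearson_r(tyr_fraction, log2_fold_change)
print(f"Pearson r = {r:.3f}")
```

      Repeating the same calculation with the fraction of every other codon in place of `tyr_fraction` would give the codon-by-codon comparison the reviewer describes.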

      Comment 4: The Authors should provide the specific parameters used to calculate the median abundance of Tyr codons in a protein and the list of proteins containing higher than median abundance of Tyr codon content. Moreover, the complete list of 102 candidate genes should also be provided. This will allow one to determine what percentage of these Tyr-enriched proteins exhibited a decrease in levels. Moreover, is there anything special about these Tyr codon-enriched transcripts where they are affected at the level of translation but not the other Tyr-codon enriched transcripts? For example, are these transcripts enriched at the 5' or 3' ends for Tyr codons? Do these transcripts exhibit multiple consecutive Tyr codons? This deeper analysis would enrich the findings in this manuscript.

      Response: For the proteins identified in the mass spectrometry and overlap listed in Fig 4A, Tyr codon abundance was calculated by dividing the number of Tyr amino acids present by the total number of amino acids for each protein. For genes with different isoforms possible, the principal isoform, using ENSEMBL, was used for calculations. We are also happy to provide the entire list of proteins. Additionally, please see above response to comment 3. We wish to emphasize that the goal of identification of these proteins was to identify downstream targets of this response for functional studies, which we have done. We have identified downstream genes that become modulated by this response and that regulate cell growth, consistent with the phenotype of the tRNA. We then demonstrated a direct causal tRNA-dependent codon-based response with a specific target gene using mutagenesis.
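
      The calculation described above (number of Tyr residues divided by the total number of residues, followed by a comparison with the median) can be sketched as follows. The function name `tyrosine_fraction` and the three toy sequences are hypothetical and serve only to show the arithmetic; the study used the principal ENSEMBL isoform of each protein, which is not reproduced here.

```python
from statistics import median

def tyrosine_fraction(protein_sequence):
    """Fraction of residues that are tyrosine (one-letter code 'Y')."""
    seq = protein_sequence.strip().upper()
    if not seq:
        raise ValueError("empty protein sequence")
    return seq.count("Y") / len(seq)

# Toy, invented sequences (NOT proteins from the study).
toy_proteins = {"protA": "MYYKLY", "protB": "MKLAAG", "protC": "MYKLAG"}

fractions = {name: tyrosine_fraction(seq) for name, seq in toy_proteins.items()}
cutoff = median(fractions.values())

# Proteins whose Tyr content lies strictly above the median of the set.
above_median = sorted(name for name, frac in fractions.items() if frac > cutoff)
print(fractions, cutoff, above_median)
```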

      While we agree that the additional analysis the reviewer is requesting to determine what constitutes heightened translational sensitivity to this response is interesting, we believe this is a challenging question for future studies. It is possible that enrichment at 5’ or 3’ or concentration of tyrosine codons could cause increased sensitivity. Ideally, one would have information on a larger set of proteins so that such challenging questions could be better statistically bolstered. Ultimately, the requested experiments that go beyond our current work would require further analyses and experiments to allow firm conclusions to be drawn. As the other reviewers state and this reviewer agrees, we have uncovered the initial discovery regarding this tRNA fragmentation response and provided mechanistic characterization. Future studies, which are beyond the scope of the current work will undoubtedly further characterize features of this response.

      Comment 5: The ribosome profiling results are condensed into two panels of Figure 5E and 5F. We recommend the ribosome profiling experiment be expanded into its own figure with more extensive analysis and comparison beyond just looking at tRNA-Tyr. This could reveal insight into other codons that are impacted coordinately with Tyr codons and perhaps strengthen their conclusion. As an example of a more thorough analysis of ribosome profiling and proteomics, we point the Authors to this recent paper: Lyu et al. 2020 PLoS Genetics, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008836

      Response: We thank the reviewer for their suggestion. We have expanded the analysis to look at codon usage scatterplots across all codons for shTyr and shControl replicates (Fig S5D). The 5 most changed codons are labeled with UAC, a codon for the tyrosine amino acid, being the most affected (red arrow). Consistent with our model, a tyrosine codon, when at the ribosome A-site, is most affected with depletion of the corresponding tRNA. The text has also been edited to reflect our new analysis providing further evidence that ribosomal stalling might occur with depletion of a given tRNA. The gray outline around the regression line represents the 95% confidence interval.

      Fig S5D

      Comment 6: Moreover, one would expect that the mRNAs encoding USP3, EPCAM and SCD would exhibit increased ribosome occupancy. Thus, the authors should at least provide relative ribosome occupancy information on these transcripts to provide evidence that the decrease in protein levels is indeed linked to ribosome pausing or stalling.

      Response: We would like to emphasize that resolution of ribosomal profiling data at the codon level for specific genes requires a high number of reads and replicates to draw accurate conclusions. There is an inherent level of stochasticity when mapping RPFs to specific genes and as a result, our analysis revolved around Tyr-enriched vs Tyr-low populations as this analysis was appropriate for our sequencing depth and number of replicates. To be able to conclusively make claims regarding ribosome pausing or stalling for specific genes, we would likely need further experimentation than can be currently done. However, we are currently conducting the requested bioinformatic analysis and have promising preliminary transcript-level data supporting our model.

      Comment 7: The results with hnRNPA1 and SSB/La are extremely preliminary and simply show binding of tRNA fragments but no biological relevance. We realize that the Authors attempted to see if Tyr-tRNA fragments impacted RNA Pol III RNA but found no effect. A potential experiment would be to perform HITS-CLIP on H2O2-treated cells to see if stress-induced tRNA fragments bind to SSB/La or hnRNPA1. In this case, at least the Authors would link the oxidative stress results found in Figure 1 and 2 with La/SSB and hnRNPA1.

      Response: We agree with the reviewer that a tRF function was not established in the manuscript. As a result, we have recently completed experiments looking at mRNA stability of the hnRNPA1 regulon in the context of overexpressing the tRF as well as using LNA to inhibit this Tyr-tRF (Fig 7E-F). Our data shows, in an hnRNPA1-dependent manner, that its regulon can be functionally regulated by Tyr-tRF. With tRF overexpression and RNAi-mediated depletion of hnRNPA1, a right shift in transcript stability is seen. Importantly, when we do the converse experiment with tRF inhibition in the same RNAi-mediated reduction of hnRNPA1, we see a left shift. These complementary experiments provide data that the Tyr-tRF has a functional role when bound to hnRNPA1 by modulating the regulon of hnRNPA1 and expand the scope of this manuscript and extend the pathway defined downstream of this tRNA fragmentation event.

      Fig 7E-F

      Comment 8: The manuscript concludes that "Tyrosyl tRNA-GUA fragments are generated in a DIS3L2-dependent manner" based upon data in Supplemental Figure S7. However, there is still a substantial amount of tyrosine tRNA fragments in both worms and human cells depleted of DIS3L2. Thus, DIS3L2 could play a role in the formation of Tyrosine tRNA fragments but it is too strong a claim to say that tRNA fragments are "dependent" upon DIS3L2. We suggest that the Authors soften their conclusions.

      Response: While there are certainly tRFs still apparent with DIS3L2 depletion (Fig S7F-I), we note significant impairment of tRF induction with DIS3L2 knockdown/knockout with multiple different methods in C. elegans and human cells. This data supports our conclusion that tRF generation is dependent on DIS3L2 as this ribonuclease is necessary to elicit the full Tyr-tRF response. We do not make claims that Tyr-tRFs are solely or completely dependent on DIS3L2. There must be other RNases involved given the data highlighted by the reviewer. To this point, we have added clarifying text that DIS3L2 depletion does not completely eliminate the tRF induction.

      Comment 9: Moreover, what is the level of DIS3L2 depletion in the worm and human cell lines? The Authors should provide the immunoblot of DIS3L2 that was described in the Materials and Methods.

      Response: An immunoblot of DIS3L2 depletion in human cells has now been added as a supplementary figure (Fig S7I). Depletion in C. elegans was confirmed through sequencing of a mutation, as is standard in the field. The wild-type PCR product (859 bp) is 1 nt longer than the mutant product (858 bp), which carries a CTC to TAG nonsynonymous mutation preceding a single nucleotide deletion.

      Wild-type disl-2: GTTGAAGCCGCAGGGC[CTC]ACTCAGACAGCTACAGG

      disl-2 (syb1033): GTTGAAGCCGCAGGGC[TAG]-CTCAGACAGCTACAGG
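
      As a quick consistency check, the two aligned sequences above can be compared programmatically. The sketch below uses only the sequences as printed; the helper names are hypothetical, and treating the bracketed triplet as an in-frame codon is an assumption of this example rather than something stated in the response.

```python
# Aligned sequences copied from the response; [] marks the changed triplet, - a gap.
WILD_TYPE = "GTTGAAGCCGCAGGGC[CTC]ACTCAGACAGCTACAGG"
MUTANT = "GTTGAAGCCGCAGGGC[TAG]-CTCAGACAGCTACAGG"

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet

def strip_markup(aligned):
    """Drop bracket and gap characters, keeping only nucleotides."""
    return "".join(ch for ch in aligned if ch in "ACGT")

def bracketed_triplet(aligned):
    """Return the triplet enclosed in square brackets."""
    return aligned[aligned.index("[") + 1 : aligned.index("]")]

wt_raw = strip_markup(WILD_TYPE)
mut_raw = strip_markup(MUTANT)

# A 1-nt deletion should make the mutant exactly one nucleotide shorter.
length_difference = len(wt_raw) - len(mut_raw)

# Assumption: the bracketed triplet is read in frame.
mutant_is_stop = bracketed_triplet(MUTANT) in STOP_CODONS

print(length_difference, bracketed_triplet(WILD_TYPE), bracketed_triplet(MUTANT), mutant_is_stop)
```

      Under that reading-frame assumption, the substituted triplet is a stop codon and the mutant excerpt is one nucleotide shorter, matching the 859 bp versus 858 bp product sizes quoted above.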

      Fig S7I

      Comment 10: The key conclusions of "a tRNA-regulated growth suppressive oxidative stress response pathway" and an "underlying adaptive codon-based gene regulatory logic inherent to the genetic code" are overstated. This is because of the major caveat that knockdown of tyrosine-tRNA or tyrosyl-tRNA synthetase are likely to trigger numerous indirect effects. While the authors validate that three proteins are expressed at lower levels under all three conditions (H2O2, tRNA-Tyr and YARS), they might overlap in some manner but not necessarily define a coordinated response. Thus, a glaring gap in this paper is a clear, mechanistic link between H2O2-induced changes in translation versus the changes in expression when either tRNA-Tyr or YARS is depleted. Thus, it is too preliminary to conclude that tRNA depletion is part of a "pathway" and "regulatory logic" when it could all be pleiotropic effects. At the very least, the authors should discuss the possibility of indirect effects to provide a more nuanced discussion of the results obtained using two different cell systems and oxidative stress.

      Response: We thank the reviewer for the feedback. While we agree that indirect effects may exist, we do not make any claims that our pathway is the only one required to have translation effects. The text for Fig 4A already acknowledges the pleiotropic effects of tRNA depletion. Our data shows that H2O2 stress leads to a depletion of Tyr tRNA-GUA and that depletion of this tRNA through multiple complementary methods has a codon-dependent effect on protein expression. We hope the reviewer agrees that the reduction of a specific target gene in a tyrosine codon-dependent manner (demonstrated by mutagenesis) and the binding of the tRF directly to an RBP and the modulation of the regulon of this RBP by this tRF (demonstrated by gain- and loss-of-function studies) demonstrates a direct role of this response on specific downstream target genes rather than pleiotropy. This is in keeping with the cross-comments of reviewer 1, where Fig 5D shows a direct Tyr codon link between H2O2 and downstream effects. As a result, we feel that our conclusions of a pathway (not the only pathway) are valid. However, the conclusion of a “regulatory logic” might not be interpreted in the same way by all readers and we have thus changed the text to reflect a more nuanced position.

      Minor comments:

      Comment 11: Tyrosyl-tRNAs refers to the aminoacylated form of tRNA. We recommend that all instances of tyrosyl-tRNA be changed to tyrosine tRNA or tRNA-Tyr which is more generic and provides no indication as to the aminoacylation status of a tRNA.

      Response: We thank the reviewer for their correction. We have changed all instances of “tyrosyl” to “tyrosine” in the text.

      Comment 12: In Figure 5C, the promoter is drawn as T7, which is a bacteriophage promoter. While the plasmid used in this manuscript (psiCHECK2) does contain a T7 promoter, mammalian gene expression is driven from the SV40 promoter. Thus, the relevant label in Figure 5C should be "SV40 promoter". Moreover, additional details should be provided on how the construct was made (such as sequence information etc.).

      Response: We thank the reviewer for their correction. We have changed the promoter text in the figure. In the methods for the construct, we have included which USP3 was used and would be happy to include further information if requested.

      Comment 13: Please provide original blots for each of the replicates in: Figure 4C, n=4; Figure 4A, n=9; Figure 4D, n=3; Figure 5D, n=3.

      Response: There appears to be an unintentional mislabeling of the requested blots by the reviewer. The original blots for Fig 4C, Fig 5A, Fig 5D, and Fig 6D have been made available in a separate file for reviewers.

      Reviewer #3 (Significance (Required)): This manuscript provides evidence that specific tRNAs are depleted upon oxidative stress as part of a conserved stress-response pathway in humans (and worms) to regulate translation in a codon-based manner. Unfortunately, the manuscript attempts to tie together results from different conditions and systems without providing any definitive links that suggest a "pathway" involved in the oxidative stress response. The findings in this paper provide a useful starting point but fall short of being a major advance due to the lack of a clear mechanism. However, there are intriguing results in this manuscript based upon the cell lines depleted of tRNA-Tyr or tyrosine synthetase that could interest researchers in the field of tRNA biology.

      Response: We thank the reviewer for the positive comments regarding our demonstration of a conserved stress response, and for acknowledging the intriguing nature of our findings, that they will be a starting point for future studies, and that our work will be of interest to researchers in the field of tRNA biology. We hope that the very positive comments of reviewers 1 and 2, the cross-comments of reviewer 1 in response to reviewer 3’s comments regarding the specificity of this response, our inclusion for reviewer 3 of additional data on the function of the tRF in regulating the activity of the hnRNPA1 RNA binding protein (defining a post-transcriptional pathway), and the additional corroborating codon-level computational analyses that were requested provide compelling support that our findings indeed represent a major advance for the field.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The major findings in this manuscript are: 1.) Oxidative stress in human cells causes a decrease in tyrosine tRNA levels and accumulation of tyrosine tRNA fragments; 2.) The depletion of tyrosyl-tRNA synthetase or tyrosine tRNAs in human cells results in altered translation of certain genes and reduced cell growth and 3.) hnRNPA1 and SSB/La can bind tyrosine tRNA fragments. There is also preliminary evidence that the DIS3L2 endonuclease contributes to the appearance of tyrosine tRNA fragments upon oxidative stress. Based upon these results, the Authors conclude that tyrosine tRNA depletion is part of a conserved stress-response pathway to regulate translation in a codon-based manner.

      Major comments:

      •There is a considerable amount of data in this paper and the experiments are performed in a generally rigorous manner. Sufficient details are provided for reproducing the findings and all results have been provided to appropriate databases (RNA-Seq and ribosome profiling).

      •The manuscript uses a probe against the 5' half of Tyrosine tRNA for Northern blotting. However, tRNA probes can be prone to cross-hybridization, especially with some tRNA isoacceptors being similar in sequence. Thus, the blots in Figure 2 and Supplemental Figures should be probed with an oligonucleotide against the 3' half of tRNA-Tyr. This will confirm the pre- and mature tRNA-Tyr bands detected with the 5' probe. Moreover, this will determine whether 3' tRNA-Tyr fragments accumulate.

      •The analysis of the proteomic and ribosome profiling experiments seem rather limited, or based upon what was presented in this manuscript. If additional analyses were performed, then they should be included as well, even if they yielded negative results. For example, the manuscript identifies 102 proteins that decrease after tRNA-Tyr depletion and YARS-depletion with a certain threshold of Tyr codon content. We realize the Authors were trying to find potential genes that are modulated under all three conditions. However, this does not provide information whether there is a relationship between a certain codon such as Tyr and protein abundance if only binning into two categories representing below and above a certain codon content. The Authors should plot the abundance change of each detected protein versus each codon and determine the correlation coefficient. This analysis is important for substantiating the conclusion of a codon-based system of specifically modulating transcripts enriched for certain codons. Otherwise, how could changes in tRNA-Tyr levels modulate codon-dependent gene expression if two different transcripts with the same Tyr codon content exhibit differences in translation? Moreover, this analysis should be performed with all the other codons as well.

      •The Authors should provide the specific parameters used to calculate the median abundance of Tyr codons in a protein and the list of proteins containing higher than median abundance of Tyr codon content. Moreover, the complete list of 102 candidate genes should also be provided. This will allow one to determine what percentage of these Tyr-enriched proteins exhibited a decrease in levels. Moreover, is there anything special about these Tyr codon-enriched transcripts where they are affected at the level of translation but not the other Tyr-codon enriched transcripts? For example, are these transcripts enriched at the 5' or 3' ends for Tyr codons? Do these transcripts exhibit multiple consecutive Tyr codons? This deeper analysis would enrich the findings in this manuscript.

      •The ribosome profiling results are condensed into two panels of Figure 5E and 5F. We recommend the ribosome profiling experiment be expanded into its own figure with more extensive analysis and comparison beyond just looking at tRNA-Tyr. This could reveal insight into other codons that are impacted coordinately with Tyr codons and perhaps strengthen their conclusion. As an example of a more thorough analysis of ribosome profiling and proteomics, we point the Authors to this recent paper: Lyu et al. 2020 PLoS Genetics, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008836

      •Moreover, one would expect that the mRNAs encoding USP3, EPCAM and SCD would exhibit increased ribosome occupancy. Thus, the authors should at least provide relative ribosome occupancy information on these transcripts to provide evidence that the decrease in protein levels is indeed linked to ribosome pausing or stalling.

      •The results with hnRNPA1 and SSB/La are extremely preliminary and simply show binding of tRNA fragments but no biological relevance. We realize that the Authors attempted to see if Tyr-tRNA fragments impacted RNA Pol III RNA but found no effect. A potential experiment would be to perform HITS-CLIP on H2O2-treated cells to see if stress-induced tRNA fragments bind to SSB/La or hnRNPA1. In this case, at least the Authors would link the oxidative stress results found in Figure 1 and 2 with La/SSB and hnRNPA1.

      •The manuscript concludes that "Tyrosyl tRNA-GUA fragments are generated in a DIS3L2-dependent manner" based upon data in Supplemental Figure S7. However, there is still a substantial amount of tyrosine tRNA fragments in both worms and human cells depleted of DIS3L2. Thus, DIS3L2 could play a role in the formation of Tyrosine tRNA fragments but it is too strong a claim to say that tRNA fragments are "dependent" upon DIS3L2. We suggest that the Authors soften their conclusions.

      •Moreover, what is the level of DIS3L2 depletion in the worm and human cell lines? The Authors should provide the immunoblot of DIS3L2 that was described in the Materials and Methods.

      •The key conclusions of "a tRNA-regulated growth suppressive oxidative stress response pathway" and an "underlying adaptive codon-based gene regulatory logic inherent to the genetic code" are overstated. This is because of the major caveat that knockdown of tyrosine-tRNA or tyrosyl-tRNA synthetase are likely to trigger numerous indirect effects. While the authors validate that three proteins are expressed at lower levels under all three conditions (H2O2, tRNA-Tyr and YARS), they might overlap in some manner but not necessarily define a coordinated response. Thus, a glaring gap in this paper is a clear, mechanistic link between H2O2-induced changes in translation versus the changes in expression when either tRNA-Tyr or YARS is depleted. Thus, it is too preliminary to conclude that tRNA depletion is part of a "pathway" and "regulatory logic" when it could all be pleiotropic effects. At the very least, the authors should discuss the possibility of indirect effects to provide a more nuanced discussion of the results obtained using two different cell systems and oxidative stress.

      Minor comments:

      •Tyrosyl-tRNAs refers to the aminoacylated form of tRNA. We recommend that all instances of tyrosyl-tRNA be changed to tyrosine tRNA or tRNA-Tyr which is more generic and provides no indication as to the aminoacylation status of a tRNA.

      •In Figure 5C, the promoter is drawn as T7, which is a bacteriophage promoter. While the plasmid used in this manuscript (psiCHECK2) does contain a T7 promoter, mammalian gene expression is driven from the SV40 promoter. Thus, the relevant label in Figure 5C should be "SV40 promoter". Moreover, additional details should be provided on how the construct was made (such as sequence information etc.).

      •Please provide original blots for each of the replicates in:

      Figure 4C, n=4

      Figure 4A, n=9

      Figure 4D, n=3

      Figure 5D, n=3

      Significance

      This manuscript provides evidence that specific tRNAs are depleted upon oxidative stress as part of a conserved stress-response pathway in humans (and worms) to regulate translation in a codon-based manner. Unfortunately, the manuscript attempts to tie together results from different conditions and systems without providing any definitive links that suggest a "pathway" involved in the oxidative stress response. The findings in this paper provide a useful starting point but fall short of being a major advance due to the lack of a clear mechanism. However, there are intriguing results in this manuscript based upon the cell lines depleted of tRNA-Tyr or tyrosine synthetase that could interest researchers in the field of tRNA biology.

      This review is written from the perspective of a researcher with expertise in RNA processing, RNA biology and translation regulation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This very interesting study from Sohail Tavazoie's lab describes the consequences of oxidative stress on the tRNA pool in human epithelial cell lines. As previously described, the authors observed that tRNA fragments were generated upon exposure of cells to ROS. In addition, the authors made the novel observation that specific mature tRNAs were also depleted under these conditions. In particular, the authors focused on tyrosyl tRNA-GUA, which was decreased ~50% after 24 hours of ROS exposure, an effect attributable to a decrease in the pre-tRNA pool. Depletion of tyrosyl tRNA resulted in reduced translation of specific mRNAs that are enriched in tyr codons and likely contributed to the anti-proliferative effects of ROS exposure. In addition, the authors demonstrated that the tRFs produced from tyr tRNA-GUA can interact with specific RNA binding proteins (SSB and hnRNPA1).

      The major contribution of this paper is the novel finding that stress-induced tRNA fragmentation can result in a measurable reduction of specific mature tRNAs, leading to a selective reduction in translation of mRNAs that are enriched for the corresponding codons. Previously, studies of tRNA fragmentation largely focused on the functions of the tRFs themselves and it was generally believed that the mature tRNA pool was not impacted sufficiently to reduce translation. The findings reported here therefore add a new dimension to our understanding of the cellular consequences of stress-induced tRNA cleavage.

      Overall, the data are of high quality, the experiments are convincing, and the conclusions are well supported. I have the following suggestions that would further strengthen the study and bolster the conclusions.

      1. The authors have not formally demonstrated that the reduction in pre-tRNA in H2O2-treated cells is a consequence of pre-tRNA cleavage. It is possible that reduced transcription contributes to this effect. Pulse-chase experiments with nucleotides such as EU would provide a tractable approach to demonstrate that a labelled pool of pre-tRNA is rapidly depleted upon H2O2 treatment, which would further support their model. Since the response occurs rapidly (within 1 hour), it would be feasible to monitor the rate of pre-tRNA depletion during this time period in control vs. H2O2-treated cells.

      2. To what extent is the growth arrest that results from H2O2 treatment attributable to tyr tRNA-GUA depletion (Fig. 3A)? Since the reduction in tRNA levels is only partial (~50%), it should be feasible to restore tRNA levels by overexpression (strategy used in Fig. 3E, S3B) and determine whether this measurably rescues growth in H2O2-treated cells.

      3. Knockdown of YARS/tyr tRNA-GUA resulted in reduced expression of EPCAM, SCD, and USP3 at both the protein and mRNA levels (Fig. 4C-D, S4C). In contrast, H2O2-exposure reduced the abundance of these proteins without affecting mRNA levels (Fig. 5A-B, S5A). The authors should comment on this apparent discrepancy. Perhaps translational stalling induces No-Go decay, but it is unclear why this response would not also be triggered by ROS.

      4. In addition to the analyses of ribosome profiling in Fig. 5E-F, it might also be helpful to show a metagene analysis of ribosome occupancy centered upon UAC/UAU codons (for an example, see Figure 2 of Schuller et al., Mol Cell, 2017). This has previously been used as an effective way to visualize ribosome stalling at specific codons. Additionally, do the authors see a global correlation between tyrosine codon density and reduced translational efficiency in tRNA knockdown cells?

      5. MINOR: On pg. 4, the authors state that tRF-tyrGUA is the most highly induced tRF, but Fig. S1B appears to show stronger induction of tRF-LeuTAA.
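
      To make the metagene suggestion in point 4 concrete, the following is a minimal sketch of averaging ribosome occupancy in a window centred on each UAC/UAU codon. It is a simplification under stated assumptions: occupancy is taken as one value per codon for a single transcript, the function name `metagene_profile` and the toy values are invented for illustration, and a real analysis would typically normalise each transcript and pool windows across many transcripts.

```python
TYR_CODONS = {"UAC", "UAU"}

def metagene_profile(codons, occupancy, flank=2):
    """Mean occupancy at each offset (-flank..+flank) around every Tyr codon.

    Windows that would run past either end of the transcript are skipped.
    Returns None when no complete window exists.
    """
    if len(codons) != len(occupancy):
        raise ValueError("codons and occupancy must have the same length")
    width = 2 * flank + 1
    sums = [0.0] * width
    n_windows = 0
    for i, codon in enumerate(codons):
        if codon not in TYR_CODONS:
            continue
        if i - flank < 0 or i + flank >= len(codons):
            continue
        for offset in range(-flank, flank + 1):
            sums[offset + flank] += occupancy[i + offset]
        n_windows += 1
    if n_windows == 0:
        return None
    return [s / n_windows for s in sums]

# Toy, invented transcript and per-codon footprint counts (NOT study data).
toy_codons = ["AUG", "GCU", "UAC", "GGU", "AAA", "UAU", "CUG", "GAA", "UAA"]
toy_occupancy = [1, 1, 5, 1, 1, 3, 1, 1, 0]

profile = metagene_profile(toy_codons, toy_occupancy, flank=2)
print(profile)
```

      A peak at the central position of such a profile in the knockdown condition, relative to control, is the kind of signal a metagene plot is meant to expose.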

      Significance

      The major advance provided by this work is the demonstration that stress-induced tRNA cleavage can reduce the abundance of the mature tRNA pool sufficiently to impact translation. Moreover, the effect on mature tRNAs is selective, resulting in the reduced translation of a specific set of mRNAs under these conditions. These findings reveal previously unknown consequences of oxidative stress on gene expression and will be of interest to scientists working on cellular stress responses and post-transcriptional regulation.

      REFEREES CROSS-COMMENTING

      Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly?

      Here is what I was thinking: The generation of tRFs does not generally result in reduction in levels of the mature tRNAs. So you can imagine a scenario where oxidative stress causes tRF generation from the mature tyr tRNA (which does not impact its steady-state levels), as is the case for other tRNAs. At the same time, decreased transcription would reduce the pre-tRNA pool, leading to a delayed reduction in mature tRNA, as observed.

      However, looking back at the data, I see that after only 5 min of H2O2 treatment, the authors observed reduced pre-tRNA and increased tRFs (Fig. 2A). This seems very fast for a transcriptional response, which would presumably require some kind of signal transduction. In addition, when you consider the amount of tRFs produced in Fig. S2C, it is hard to imagine that this would not impact the mature tRNA pool if they were derived from there. So I agree that the transcriptional scenario seems unlikely.

      Nevertheless, I think that looking at pre-tRNA degradation directly with the pulse-chase strategy would strengthen their story, so I would like to give the authors this suggestion. However, I am fine with listing this as an optional experiment which would enhance the paper but should not be essential for publication.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Huh et al. reports that oxidative stress causes fragmentation of a specific tyrosine pre-tRNA, leading to two parallel outcomes. First, the fragmentation depletes the mature tRNA, causing translational repression of genes that are disproportionally rich in tyrosine codon. These genes are enriched for those involved in electron transport chain, cell cycle and growth. Second, the fragmentation generates tRNA fragments (tRFs) that bind to two known RNA binding proteins. Finally, the authors identify a nuclease that is needed for efficient formation of tyrosine tRFs.

      The authors should include a short diagram indicating the various known steps of pre-tRNA fragmentation (perhaps as a supplement) for general readers.

      I find the enrichment for mitochondrial electron transport chain (ETC) curious. The ETC includes several oxidoreductases, which may be rich in tyrosine as it is a common amino acid used in electron transfer. The depletion of the tyrosine tRNA from among many tRNAs under oxidative stress may not be incidental but related to an attempt by the cell to decrease oxygen consumption to avoid further oxidative damage. The authors could further mine their data to corroborate this hypothesis. For example, are the ETC genes among the targets of the RNA binding proteins targeted by tyrosine tRFs? This could potentially connect the effects of mature tRNA depletion and tRFs.

      In figure 4A, the authors should provide the tyrosine codon content of the overlap genes and show how much it differs from a randomly selected sample.

      Fig.6F, lower panel: the model should show pre-tRNA, as opposed to mature tRNA, because it is the former that is fragmented.

      Significance

      This study is comprehensive and novel, and includes several orthogonal and complementary approaches to provide convincing evidence for the conclusions. The main discovery is significant because it presents an important advance in post-transcriptional control of gene expression. The process of tRF formation was previously thought not to affect the levels of mature tRNA. This study changes that understanding by describing for the first time the depletion of a specific mature tRNA as its precursor form is fragmented to generate tRFs. Finally, the authors identify DIS3L2 as a nuclease involved in fragmentation. This is also an important finding as the only other suspected nuclease, albeit with contradictory evidence, is angiogenin. Collectively, the findings of this study would be of interest to a broad group of scientists. I only have a few minor comments and suggestions (see above).

      REFEREES CROSS-COMMENTING

      I have the following comments on other reviewers' critiques.

      Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly?

      Reviewer 3 raises the issue of cross hybridization in Northern blots. The authors indicate that they "could not detect the other tyrosyl tRNA (tRNA Tyr AUA) in MCF10A cells by northern blot..." (page 6). Also, they gel extracted tRFs and sequenced them (figure S6B), directly identifying the fragments. I think these findings mitigate the concern of cross hybridization and clearly identify the nature of tRFs.

      Finally, I think that the codon-dependent reporter experiment (figure 5D) addresses many issues surrounding codon dependent vs indirect effects. In that experiment, the authors mutate 5 tyrosine codons of a reporter gene and demonstrate that the encoded protein is less susceptible to repression in response to oxidative stress.

  4. Aug 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to the Referees

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript Yan et al describe a method to perform imaging based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes.

      They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      **Major points:**

      1. This year Hasle et al. described a very similar approach that they name Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hasle et al. 2020 MSB paper is only mentioned together with the other methods in the introduction in one sentence (ref #19 in this manuscript):

      " Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al. paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      We agree with the reviewer that we need to describe more about the Hasle et al. paper (now ref #20 in the revised manuscript) and expand our description of other applications that could be performed with the method. For this purpose, we have made the following changes:

      We have modified the relevant paragraph in the Introduction.

      p.3 the second paragraph

      Recently, an imaging based method named “visual cell sorting” was described that uses the photo-convertible fluorescent protein Dendra2 to enrich phenotypes optically, enabling pooled genetic screens and transcription profiling(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). Here, we developed an analogous approach to execute an imaging-based pooled CRISPR screen using optical enrichment by automated photo-activation of the photo-activatable fluorescent protein, PA-mCherry.

      We have also added the following paragraph in the Discussion.

      p.14 line 1

      In our study, optical enrichment was utilized for pooled CRISPR screens on phenotypes identifiable through microscopy. However, optical enrichment can be used for other purposes, as demonstrated previously(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). In a recent study by Hasle et al.(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020), the process of separating cells by FACS after optical enrichment was termed “visual cell sorting”. This method was used to evaluate hundreds of nuclear localization sequence variants in a pooled format and to identify transcriptional regulatory pathways associated with paclitaxel resistance using single cell sequencing(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020), demonstrating the broad applicability and power of this approach beyond CRISPR screening.

      1. While I understand that the authors mean conversion from the dark state to the fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader, since photo-conversion typically refers to a change in color. I would suggest sticking to the term photo-activation.

      We thank the reviewer for pointing this out and to avoid future confusion, we restricted the usage of photo-conversion to specifically indicate conversion of fluorescence from one color into another: e.g. when talking about the published visual cell sorting paper in which Dendra2 is used as a photo-convertible fluorescent protein. We use photo-activation in reference to the activation of PA-mCherry in our work.

      1. For validation of the hits coming from the nuclear size screen: Did the authors have any controls making sure that the right targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins that are known to induce division errors display the nuclear fragmentation that the authors also observe) but especially for the ones that are less known or unknown to induce any nuclear size change, it will be important to demonstrate the specificity of the targets.

      For validating hits coming from the nuclear size screen, we have verified the successful transduction of the corresponding sgRNA constructs by FACS analysis, but have not confirmed the knockdown. Before final journal publication, we propose to perform RT-qPCR on each of our 15 gene hits before and after knockdown to measure the percentage of knockdown for each gene.

      In addition, it is not clear from the figure legends and the material and methods if these phenotypes are verified by 3-4 gRNAs they use in the validation. Are the histograms representative of a single experiment with one gRNA or a combination of gRNAs in different experiments? Methods of replication of the data presented in Fig4 is unclear.

      We apologize for the confusion. These phenotypes were verified with pools of 3-4 sgRNAs and the histograms are representative of a single replicate infected with a mixed 3-4 sgRNA pool. We have modified the legend to Figure 5 (original Fig. 4) and the method section to explain this point.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.

      The description of the experiment and information about the selected sgRNAs has been added in the Method section as follows:

      p.23

      Verification of hits from nuclear size screen

      For each hit in the nuclear size screen, the two sgRNAs with the highest phenotypic score in the screen and the two sgRNAs with the highest score predicted by the CRISPRi-v2 algorithm (ref. 24) were selected and pooled to generate a mixed sgRNA pool of 3-4 sgRNAs (detailed information in Supplementary file 8). Cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were transduced with pooled sgRNAs targeting each gene and puromycin selected for 2 days to prepare for imaging. Cells were then seeded into 96-well glass bottom imaging dishes. Images were collected the next day and nuclear size was measured using the Auto-PhotoConverter µManager plugin. To focus on cells with successful transduction, BFP was co-expressed on the sgRNA construct and only cells with BFP intensity above a threshold value were included in nuclear size measurements. This BFP threshold was established by comparing the average BFP intensity of cells with and without sgRNA transduction (Fig.S3a).

      We agree with this important point and have changed the figure legend of Fig. 5c (original Fig. 4c) to just describe the plot:

      c, The ratios between median level of nuclear size measured from microscopy and H2B-mGFP fluorescence or FSC signal measured from FACS after knockdown, were plotted separately. TACC3, confirmed to be a control gene, was used for comparison (Grey bar).

      The typo has been corrected.

      Reviewer #1 (Significance (Required)):

      I think the idea of performing pooled screens coupled to microscopy is exciting and this approach definitely has more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach described in this manuscript is convincing, and although the analysis part will require more work in the future (to adapt the software to recognise different types of phenotypic readouts) to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2 s vs 100 ms) followed by separation of these populations by FACS.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting), and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a uManager protocol and discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive, and the demonstrated use of low cell number input for next generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      **General comments**

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (eg https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      Seeding density

      Time elapsed (in hours) between cell plating and optical enrichment

      The number of fields of view examined

      The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted

      The total time taken (in hours) to perform imaging and photoconversion

      The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      We agree with the reviewer and apologize for the confusion that arose from our description. We also thank the reviewer for suggesting the use of established methods. However, MAUDE, an analysis method for sorting-based CRISPR screens with multiple expression bins, might not be suitable for our study since 1) the distribution of mCherry fluorescence intensity is a reflection of photo-activation efficiency and not sgRNA effect, and 2) only one sorting bin is collected for each experimental condition. Our analysis is adapted from an existing method from the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing).

      We agree with the reviewer regarding clarifying other points and rewrote the following part in the Method section:

      p. 20

      mIFP proof-of-principle screen, Nuclear size screen, FSC screen and H2B-mGFP screen

      For the mIFP proof-of-principle screen, mIFP positive cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP mIFP-NLS) and mIFP negative cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “mIFP sgRNA library” (CRISPRa library with 860 elements, see Supplementary file 5) and the “control sgRNA library” (CRISPRa library with 6100 elements, see Supplementary file 6) separately. For the nuclear size screen, FSC screen and H2B-mGFP screen, cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “nuclear size library” (CRISPRi library with 6190 elements, see Supplementary file 7). To guarantee that cells receive no more than one sgRNA per cell, BFP was expressed on the same sgRNA construct and cells were analyzed by FACS the day after transduction. The experiment only continued when 10-15% of the cells were BFP positive. These cells were further enriched by puromycin selection (a puromycin resistance gene was expressed from the sgRNA construct) for 3 days to prepare for imaging. For FSC and H2B-mGFP screens, cells were then subjected to FACS sorting. Cells before FACS (unsorted sample for FSC and H2B-mGFP screens) and top 10% cells based on either FSC signal (high FSC sample) or GFP fluorescence signal (high GFP sample) were separately collected and prepared for high throughput sequencing. For mIFP proof-of-principle screen and nuclear size screen, cells were then seeded into 96-well glass bottom imaging dishes (Matriplate, Brooks) and imaged starting from the morning of the next day (around 15 hr after plating). A series of densities ranging from 0.5E4 cells/well to 2.5E4 cells/well with 0.5E4 cells/well interval were selected and seeded. The imaging dish with cells around 70% confluency was selected to be screened on the imaging day. For mIFP proof-of-principle screen, a single imaging plate was performed for each replicate while 4 imaging plates per replicate were imaged for the nuclear size screen. 
When executing multiple imaging runs, 2 consecutive runs could be imaged on the same day (day run and night run). 64 (8x8, day run) or 81 (9x9, night run) fields of view were selected for each imaging well and each field of view was subjected to an individual round of imaging directly followed by photo-activation. Around 200-250 cells were present in each given field of view and 60% to 80% surface area of each well was covered. Either mIFP positive cells or cells passing the nuclear size filter were identified and photo-activated automatically using the Auto-PhotoConverter µManager plugin. The total time to perform imaging and photo-activation of a single 96-well imaging dish with around 1.5 million cells was around 8 hr. The night run generally took longer, since more fields of view were included than in the day run. Cells were then harvested by trypsinization and pooled into a single tube for isolation by FACS. Sorting gates were pre-defined using samples with different photo-activation times (e.g. 0s, 200ms, 2s) and detailed gating strategies are described in Supplementary file 1. Sorted samples were used to prepare sequencing samples.
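
      The quoted throughput of around 1.5 million cells per dish can be checked with a short calculation. The sketch below multiplies wells, fields of view per well and cells per field of view, using the numbers given above; it assumes that all 96 wells of a dish are imaged, which the text does not state explicitly, and the function name is illustrative only.

```python
# Rough throughput estimate for one 96-well imaging dish.
# Assumption (not stated in the text): all 96 wells are imaged.
WELLS_PER_DISH = 96


def cells_per_dish(fields_per_well: int, cells_per_field: int) -> int:
    """Total number of cells visited by the microscope in one dish."""
    return WELLS_PER_DISH * fields_per_well * cells_per_field


# Day run: 8x8 = 64 fields per well; night run: 9x9 = 81 fields per well.
# Each field of view holds roughly 200-250 cells.
day_run = (cells_per_dish(64, 200), cells_per_dish(64, 250))
night_run = (cells_per_dish(81, 200), cells_per_dish(81, 250))

print("day run:  ", day_run)
print("night run:", night_run)
```

      Under this assumption a day run covers roughly 1.2 to 1.5 million cells, in line with the figure quoted above, and a night run covers somewhat more.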

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      To address these, we have added the following sentences:

      p. 4 line 16

      A photo-activatable fluorescent protein was chosen over a photo-convertible fluorescent protein to increase the number of channels available for imaging. PA-mCherry was chosen to leave the better performing green channel open for labeling of other cellular features. Moreover, non-activated PA-mCherry has low background fluorescence in the mCherry channel (Fig. S1b), and it can be activated to different intensities when photo-activated for various amounts of time.

      p. 14 line 10

      Phenotypes of interest should be identifiable under the microscope and generally require fluorescent labeling. Commonly used fluorescence microscopes use four channels for fluorescent imaging with little spectral overlap: blue, green, red and far red. In our study, the red channel was occupied by cell labeling with PA-mCherry and the blue channel was used to estimate sgRNA transduction efficiency. Since sgRNA transduction efficiency can be measured by other approaches, the blue channel could be used together with the remaining two channels to label cellular structures. Combining bright field imaging with deep learning can be used to reconstruct the localization of fluorescent labels(Ounkomol, C.; Seshamani, S.; Maleckar, M. M.; Collman, F.; Johnson, G. R. 2018), making it possible to use bright field imaging to further expand the phenotypes that can be studied with our technique.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      We thank the reviewer for this remark and revised all figures accordingly. Points and fonts were enlarged, and schematics were simplified or removed.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      We agree with the reviewer and the following adverbs have been removed: “highly” in abstract and page 15; “easily” on pages 4 and 11; “completely” on page 11 and three “only” on page 12.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      We apologize that we missed this, and the sequencing data has been deposited to GEO (GSE156623) which will be made available before final publication. The following part has been added to address this.

      p. 24

      DATA AND SOFTWARE AVAILABILITY

      The raw and processed data for the high throughput sequencing results have been deposited in the NCBI GEO database under accession number GSE156623. The Auto-PhotoConverter plugin developed for the open source microscope control software μManager (Edelstein, A. D.; Tsuchida, M. A.; Amodaj, N.; Pinkard, H.; Vale, R. D.; Stuurman, N. 2014) has been deposited on GitHub (https://github.com/nicost/mnfinder).

      **Specific comments**

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.

      We apologize for our poor terminology. Our original definition of “specificity” is the same as “precision” suggested by the reviewer. To avoid future confusion, we have changed all relevant occurrences of “specificity” into “precision”. The following sentence was modified to clarify the definition:

      p. 5 line 15

      To evaluate the precision (the fraction of called positives that are true positives) of this assay, all cells were collected and analyzed by FACS after image analysis and photo-activation (Fig. 2d and 2e). We calculated precision as the fraction of photo-activated cells (mCherry positive cells) that are true positives (mIFP-mCherry double positive cells) (Fig. 2f).

      Measuring recall is complicated because the microscope is unable to visit all locations in the imaging plate, hence recall will depend on the fraction of cells actually “seen” by the microscope. For the screening strategy employed in the nuclear size screen, recall is not as important as precision, since lower recall rates are compensated for by screening larger cell numbers. We therefore did not attempt to measure recall directly.
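
      As a minimal sketch of the two definitions discussed here, precision and recall can be computed from FACS counts as follows. The counts in the example are hypothetical and are not taken from the manuscript; recall is included only for completeness, since it was not measured.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of called positives (mCherry+) that are true positives."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no cells were called positive")
    return true_positives / called


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of all true positives that were called positive."""
    actual = true_positives + false_negatives
    if actual == 0:
        raise ValueError("no true positives in the sample")
    return true_positives / actual


# Hypothetical counts: 1000 mCherry+ cells, of which 800 are also mIFP+,
# and 1200 mIFP+ cells that were never photo-activated.
print(precision(true_positives=800, false_positives=200))
print(recall(true_positives=800, false_negatives=1200))
```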

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      We agree with the reviewer and have changed the following description:

      p. 5 line 20

      The precision varied with the initial percentage of mIFP positive cells and ranged from 80% to ~100% (initial percentage of mIFP positive cells ranging between 2.3% and 43.7%) (Fig. 2f). Precision is expected to fall below 80% when the initial percentage of mIFP positive cells is less than 2.3%. However, these results indicate that optical enrichment can be used to identify hits with high precision even at relatively low hit rates.

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      We apologize for our confusing description. To avoid the confusion, we rewrote the paragraph describing the experiment and added a schematic (Fig. 3a) to better describe this experiment. We also simplified the result by just presenting the lower right panel of original Fig. 2b (current Fig. 3b) and moved the other data into supplementary figures (Fig. S2).

      p. 6 line 4

      mIFP negative cells and mIFP positive cells were separately infected with two different CRISPRa sgRNA libraries (6100 sgRNAs for mIFP negative cells; 860 sgRNAs for mIFP positive cells) at a low multiplicity of infection (MOI) to guarantee a single sgRNA per cell. Note that in these experiments, the sgRNAs only function as barcodes to be read out by sequencing, but do not cause phenotypic changes as the cells do not express corresponding CRISPR reagents. These two populations were then mixed at a ratio of 9:1 mIFP negative cells: mIFP positive cells. We again used mIFP expression as our phenotype of interest (outlined in Fig. 3a). Two biological replicates were performed and at least 200-fold coverage of each sgRNA library was guaranteed throughout the screen, including library infection, puromycin selection, imaging/photo-activation and FACS.
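
      The reasoning behind a low MOI and the coverage requirement can be illustrated numerically. The sketch below assumes that the number of sgRNA integrations per cell follows a Poisson distribution, a common modelling assumption that is not stated in the text, and estimates how many transduced cells carry exactly one sgRNA when 10-15% of cells are BFP positive (the range given in the Methods). It also computes the number of cells needed for 200-fold coverage of each library. Function names are illustrative.

```python
import math


def moi_from_infected_fraction(infected_fraction: float) -> float:
    """Poisson mean (MOI) implied by the fraction of transduced cells."""
    return -math.log(1.0 - infected_fraction)


def single_sgrna_fraction(infected_fraction: float) -> float:
    """Among transduced cells, the fraction carrying exactly one sgRNA.

    Assumes Poisson-distributed integrations: P(k=1) / P(k>=1).
    """
    moi = moi_from_infected_fraction(infected_fraction)
    return moi * math.exp(-moi) / infected_fraction


def cells_for_coverage(library_size: int, fold: int) -> int:
    """Minimum number of cells for a given fold coverage of a library."""
    return library_size * fold


for fraction in (0.10, 0.15):
    print(fraction, round(single_sgrna_fraction(fraction), 3))

print(cells_for_coverage(6100, 200))  # control sgRNA library
print(cells_for_coverage(860, 200))   # mIFP sgRNA library
```

      Under the Poisson assumption, more than 90% of transduced cells carry a single sgRNA in this range, so "a single sgRNA per cell" holds for most but not strictly all cells.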

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      We have clarified our description. There are no clear BFP+ and BFP- populations but instead one continuous population due to the background expression of BFP from the dCas9 construct: dCas9-KRAB-BFP (which is now clearly indicated in the manuscript). On top of the dCas9-KRAB-BFP, another BFP is encoded on the sgRNA construct, which leads to a higher BFP expression level.

      There was no gating in the experiment. The grey dots in the figure represent wild type cells without viral transduction, while the red dots (partially covered by the grey dots) represent cells infected with the two negative control sgRNAs. We mistakenly wrote in the legend of original Fig. S3 (current Fig. S3a) that these were FACS data; however, the data were acquired by imaging. We apologize for the confusion and thank the reviewer for detecting the issue. We completely rewrote the legend to Fig. S3a (original Fig. S3) to clarify.

      We now include the number of cells analyzed and the number of replicates for the other flow plots and histograms in the manuscript.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      To address the reviewer’s point, we have included the following sentence:

      p. 8 line 23

      We defined nuclear size as the 2D area in square microns measured by H2B-mGFP using an epifluorescence microscope, as determined by automated image analysis (Fig. 4a and Supplementary file 2).

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. It seems like appropriate false-discovery rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen applies to the FSC and GFP screens.
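
      The randomization test suggested in the comments above could be sketched as follows. This is an illustration of the general idea only, not the analysis used in the manuscript: a gene's mean sgRNA score is compared with a null distribution built by repeatedly drawing the same number of sgRNA scores at random from the whole library. The function name, the two-sided comparison on absolute means and the add-one p-value correction are all choices made for this example.

```python
import random
from statistics import mean


def randomization_p_value(gene_scores, all_scores, n_iter=2000, seed=0):
    """Empirical two-sided p-value for a gene's mean phenotypic score.

    gene_scores: scores of the sgRNAs targeting one gene.
    all_scores:  scores of every sgRNA in the library (the null pool).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(mean(gene_scores))
    k = len(gene_scores)
    as_extreme = 0
    for _ in range(n_iter):
        # Draw k sgRNA scores without replacement as a pseudo-gene.
        null_sample = rng.sample(all_scores, k)
        if abs(mean(null_sample)) >= observed:
            as_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (as_extreme + 1) / (n_iter + 1)


# Toy library: 100 sgRNAs with no effect plus one gene with a strong effect.
library = [0.0] * 100 + [5.0, 5.0, 5.0]
print(randomization_p_value([5.0, 5.0, 5.0], library))
```

      P-values obtained this way for many genes would still need adjustment for multiple testing, as raised in the following comments.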

      We clarified the description of the statistical analysis of the screen (see new/changed text below). Mann-Whitney p-values for the two replicates were calculated independently. The Mann-Whitney U test was not performed against a randomized set of phenotypic scores, but using the phenotypic scores of the 22 control non-targeting sgRNAs that were part of the library. Because there are only 22 control sgRNAs (adding more control sgRNAs would increase the size of the library, and reduce the number of genes that can be screened within a given amount of time), the statistical significance of testing genes against these controls is not expected to be very high, and direct approaches such as correction for multiple hypothesis testing are not expected to yield hits. Instead, we calculated a score combining the severity (phenotypic score) and the trustworthiness (Mann-Whitney p value) of the phenotype (a method previously developed in the Weissman lab at UCSF: https://github.com/mhorlbeck/ScreenProcessing; ref. 24). We thank the reviewer for suggesting false discovery rate control as a better method for choosing a cutoff. We modified our original analysis and now determine the threshold of our score based on a calculated empirical false discovery rate (eFDR). We used this approach to maximize the number of true hits and relied on a repeat of the screen and follow-up testing of hits to narrow down true hits. We added the following part in the method section and added an analysis example to the supplementary files (Supplementary file 9).

      p. 22

      Bioinformatic analysis of the screen

      Analysis was based on the ScreenProcessing pipeline developed in the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing) (Horlbeck, M. A.; Gilbert, L. A.; Villalta, J. E.; Adamson, B.; Pak, R. A.; Chen, Y.; Fields, A. P.; Park, C. Y.; Corn, J. E.; Kampmann, M.; Weissman, J. S. 2016). The phenotypic score (ε) of each sgRNA was quantified as previously defined (Kampmann, M.; Bassik, M. C.; Weissman, J. S. 2013) (Supplementary file 9). For the mIFP proof-of-principle screen, the phenotypic score of each group was the average score of the two sgRNAs assigned to the group, averaged between two replicates unless otherwise described. For the nuclear size screen, FSC screen and H2B-mGFP screen, genes were scored based on the average phenotypic scores of the sgRNAs targeting them. For the nuclear size screen, phenotypic scores were further averaged between 4 runs for each replicate. For the nuclear size screen, FSC screen and H2B-mGFP screen, sgRNAs were first clustered by transcription start site (TSS) and scored by the Mann-Whitney U test against 22 non-targeting control sgRNAs included in the library. Since only 22 control sgRNAs were included, significance of hits was assessed by comparison with simulated negative controls that were generated by random assignment of all sgRNAs in the library, and phenotypic scores of these simulated negative controls were scored in the same way as phenotypic scores for genes. A score η that includes the phenotypic score and its significance was calculated for each gene and simulated negative control. The optimal cut-off for score η was determined by calculating an empirical false discovery rate (eFDR) at multiple values of η as the number of simulated negative controls with score η higher than the cut-off (false positives) divided by the sum of genes and simulated negative controls with score η higher than the cut-off (all positives). The cut-off score η resulting in an eFDR of 0.1% was used to call hits for further analysis (Supplementary file 9). An example analysis is described in detail in Supplementary file 9, and raw counts and phenotypic scores for all four screens are listed in Supplementary files 10 and 11.
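
      The eFDR definition in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `empirical_fdr` and `choose_cutoff` are hypothetical, the η scores are taken as already computed, and the rule of scanning candidate cut-offs in ascending order and keeping the first one that meets the target is an assumption about how "the optimal cut-off" might be picked.

```python
def empirical_fdr(gene_scores, control_scores, cutoff):
    """eFDR at one cut-off: false positives / all positives.

    False positives are simulated negative controls scoring above the cut-off;
    all positives are genes plus simulated controls scoring above it.
    """
    false_pos = sum(1 for s in control_scores if s > cutoff)
    gene_pos = sum(1 for s in gene_scores if s > cutoff)
    all_pos = false_pos + gene_pos
    return false_pos / all_pos if all_pos else 0.0


def choose_cutoff(gene_scores, control_scores, target=0.001):
    """Smallest candidate cut-off whose eFDR is at or below the target.

    Assumption (not from the source): candidates are the observed scores,
    and cut-offs that leave no gene above them are skipped.
    """
    for cutoff in sorted(set(gene_scores) | set(control_scores)):
        n_genes = sum(1 for s in gene_scores if s > cutoff)
        if n_genes and empirical_fdr(gene_scores, control_scores, cutoff) <= target:
            return cutoff
    return None  # no cut-off reaches the target while still calling a gene


# Toy scores, purely illustrative (not data from the screen).
genes = [0.5, 1.0, 4.0, 5.0, 6.0]
controls = [0.2, 0.8, 1.5, 2.0]
print(empirical_fdr(genes, controls, 1.0))
print(choose_cutoff(genes, controls))
```

      With real data the control list would hold the η scores of the simulated negative controls, and `target=0.001` corresponds to the 0.1% eFDR quoted in the text.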

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      We agree with the reviewer and have changed the sentence into:

      p. 10 line 17

      These data suggest that a direct measurement utilizing a microscope can provide different information and reveal hits that are inaccessible using other screening approaches.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      We agree with the reviewer and have modified the sentence with a citation:

      p. 12 line 20

      This is significantly faster than in situ methods, which process millions of cells over a period of a few days (Feldman, D.; Singh, A.; Schmid-Burgk, J. L.; Carlson, R. J.; Mezger, A.; Garrity, A. J.; Zhang, F.; Blainey, P. C. 2019).

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      We have added more description in the discussion section:

      p. 13 the second paragraph

      Optical enrichment screening is also possible for phenotypic screens with relatively low hit rates (defined as the fraction of all genes screened that are true hits). The ability to detect hits at low hit rates in our method depends on multiple factors, including: 1) the penetrance of the phenotype; 2) the cellular fitness effect of the phenotype; 3) detection and photo-activation accuracy of the phenotype; 4) limitations imposed by FACS recovery and sequencing sample preparation from low cell numbers. The first three factors vary with the phenotype of interest. We optimized the genomic DNA preparation protocol (Methods), and are now able to process sequencing samples from a few thousand cells, enabling screens of low hit rate phenotypes. In our nuclear size screen, more than 1.5 million cells were analyzed during each run, with 2000-4000 cells recovered after FACS sorting. The hit rate of this screen was 2.76%, similar to optical CRISPR screens performed in an arrayed format (de Groot, R.; Luthi, J.; Lindsay, H.; Holtackers, R.; Pelkmans, L. 2018), demonstrating the possibility of applying our approach to investigate phenotypes with low hit rates.

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      The relevant description has been moved to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      To address this point, we have added the following sentences to the legend of Fig. 5:

      The cell population is heterogeneous due to inefficient knockdown, incomplete puromycin selection, and penetrance of the phenotype. A BFP was expressed from the same sgRNA construct. Only cells with high BFP intensity, indicating successful sgRNA transduction, were included for data analysis as described in Methods.

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test, etc). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.

      We applied the Kolmogorov-Smirnov test and the corresponding sentence was changed into:

      p. 10 last line

      *14 out of 15 hits were confirmed to be real hits (Kolmogorov-Smirnov test two tailed p-value

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      The figures were simplified to just show the example images.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      We agree with the reviewer and changed the color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      We agree with the reviewer’s point; “most” has been changed to “some”.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      The bars represent the ratio of the median level of H2B-mGFP intensity (the axis is now labeled with "GFP" rather than "FITC", the colloquial name for the channel used on the FACS machine) measured by FACS and the median nuclear size of the same population of cells measured by microscopy. We plan to perform additional experiments to measure DNA content using a DNA dye in the same cell by microscopy so that we will be able to correlate these on a cell by cell basis. Data will be added before final publication.

      Reviewer #2 (Significance (Required)):

      I don't generally comment on significance in reviews. Since ReviewCommons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study reports a rapid and high-throughput CRISPR-based phenotypic screen approach consisting of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting (has been around) in principle. The key advantage is that it is relatively simple and accessible to many groups, as it does not require robotics. However, the manuscript is so badly written and hard to follow that it is difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practices (GSP) have been followed, as the description of the experiments is sometimes lacking totally. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens and the computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens, which are of course the current gold standard.

      We do not in any way claim that the presented method replaces arrayed screens. However, most current sgRNA libraries are pooled libraries, and the few available arrayed sgRNA libraries are expensive and difficult to maintain, hence our methods to screen pooled sgRNA libraries are timely and useful. Comparisons with arrayed screens are unwarranted as no claims are made with respect to arrayed screens.

      We have clarified the manuscript in many places, and hope it is now readable and better understandable by more readers with diverse backgrounds.

      **Specific points:**

      The specificity test (Fig 1) does not make sense as described. If the authors spike a certain percentage of cells that can be photoconverted, when analysing the outcome, there will be three classes: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also, the formula used is questionable, or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it on a multi-parametric phenotype. Again, this is now the gold standard.

      We used the term specificity inadvertently and should have used precision, as also pointed out by Referee 2. This has been corrected in the current manuscript. We picked the mIFP phenotype as this was a proof of principle screen to clarify the performance of our screening approach and needed a phenotype that can be measured both by microscopy and FACS. We demonstrate that multi-parametric read-outs are possible, but do not think that the first demonstration of new technology needs such an application.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positive which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      We improved the description of this experiment. To clarify, we used mIFP in a proof of concept screen to validate whether sgRNAs infecting mIFP positive cells can be distinguished from those infecting mIFP negative cells. No phenotypic signature other than the mIFP signal is used (as described in the text). As customary in pooled screens, a primary comparison was made between the positive (optically selected) cells and the complete population. To improve the clarity of this screen, we further described the concept of pooled sgRNA screens, which may have made this section harder to follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      There are no phenotypes induced in the mIFP population (as now explicitly explained in the text). The mIFP population is isolated using optical enrichment, and we test our ability to discriminate the sgRNAs present in the enriched population. It is unsurprising that comparing to the negatively selected population (which is not possible in most other pooled screens) is significantly better than comparing against the total population (as customary in pooled screens).

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      There are no arrayed screens performed in our study.

      Reviewer #3 (Significance (Required)):

      Overall, there is not sufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      This study reports a rapid and high-throughput CRISPR-based phenotypic screen approach consisting of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting (has been around) in principle. The key advantage is that it is relatively simple and accessible to many groups, as it does not require robotics. However, the manuscript is so badly written and hard to follow that it is difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practices (GSP) have been followed, as the description of the experiments is sometimes lacking totally. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens and the computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens, which are of course the current gold standard.

      Specific points:

      The specificity test (Fig 1) does not make sense as described. If the authors spike a certain percentage of cells that can be photoconverted, when analysing the outcome, there will be three classes: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also, the formula used is questionable, or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it on a multi-parametric phenotype. Again, this is now the gold standard.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positive which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      Significance

      Overall, there is not sufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting), and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a uManager protocol and the discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive, and the demonstrated use of low cell number input for next generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      General comments

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (e.g. https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      • Seeding density
      • Time elapsed (in hours) between cell plating and optical enrichment
      • The number of fields of view examined
      • The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted
      • The total time taken (in hours) to perform imaging and photoconversion
      • The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      Specific comments

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.
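
      The two quantities defined in the comment above can be written out as a short Python sketch. It is illustrative only: the function name `precision_recall` and the example counts are hypothetical and are not taken from the manuscript.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from raw cell counts.

    precision = fraction of called positives that are true positives
    recall    = fraction of all true positives that are called positive
    Returns NaN for a ratio whose denominator is zero.
    """
    called = true_pos + false_pos   # everything the sort called positive
    actual = true_pos + false_neg   # everything that really was positive
    precision = true_pos / called if called else float("nan")
    recall = true_pos / actual if actual else float("nan")
    return precision, recall


# Hypothetical counts: 100 cells sorted as positive, 80 of them truly positive,
# while 120 truly positive cells were never photo-activated or recovered.
print(precision_recall(true_pos=80, false_pos=20, false_neg=120))
```

      In this toy case precision is high while recall is low, which is the distinction the comment asks the authors to report.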

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.
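
      The randomization test suggested above could look roughly like the following Python sketch. It rests on several assumptions that the comment does not specify: the gene-level score is taken to be the mean of its sgRNA scores, the null is built by drawing the same number of sgRNA scores at random from the whole library, the test is one-sided (larger scores are more extreme), and the name `randomization_p_value` is hypothetical.

```python
import random


def randomization_p_value(gene_sgrna_scores, all_sgrna_scores, n_iter=10000, seed=0):
    """One-sided randomization p-value for a gene-level phenotypic score.

    Null distribution: mean score of k sgRNAs drawn without replacement from
    the whole library, where k is the number of sgRNAs targeting the gene.
    """
    rng = random.Random(seed)            # fixed seed so the sketch is reproducible
    k = len(gene_sgrna_scores)
    observed = sum(gene_sgrna_scores) / k
    at_least_as_extreme = 0
    for _ in range(n_iter):
        sample = rng.sample(all_sgrna_scores, k)
        if sum(sample) / k >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Toy library of 100 sgRNA scores, purely illustrative.
library = [float(i) for i in range(100)]
print(randomization_p_value([98.0, 99.0], library, n_iter=2000))  # strong phenotype
print(randomization_p_value([0.0, 1.0], library, n_iter=2000))    # no phenotype
```

      The smallest p-value this can report is 1 / (n_iter + 1), so the number of randomizations bounds the resolution of the test.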

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.
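
      The comment above does not name a correction procedure. As one common option, a Benjamini-Hochberg adjustment can be sketched in plain Python; the choice of method and the function name `benjamini_hochberg` are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, purely illustrative.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Genes whose adjusted p-value falls below the chosen false discovery rate would then be called as hits.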

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. It seems like appropriate false-discovery rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen apply to the FSC and GFP screens.

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test, etc). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      Significance

      I don't generally comment on significance in reviews. Since ReviewCommons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript Yan et al describe a method to perform imaging based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes. They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      Major points:

      1. This year Hassle et al described a very similar approach that they name Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hassle et al 2020 MSB paper is only mentioned together with the other methods in the introduction in one sentence (ref #19 in this manuscript):

      " Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hassle et al paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR-based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      2. While I understand that the authors mean conversion from the dark state to the fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader, since photo-conversion typically refers to a change in color. I would suggest sticking to the term photo-activation.
      3. For validation of the hits coming from the nuclear size screen: Did the authors have any controls making sure that the right targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins that are known to induce division errors display the nuclear fragmentation that the authors also observe), but especially for the ones that are less known or unknown to induce any nuclear size change, it will be important to demonstrate the specificity of the targets. In addition, it is not clear from the figure legends and the materials and methods whether these phenotypes are verified by the 3-4 gRNAs they use in the validation. Are the histograms representative of a single experiment with one gRNA or a combination of gRNAs in different experiments? The method of replication for the data presented in Fig 4 is unclear.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.
      2. The legend of Figure 4c is not describing what the plot is showing. Instead it tells the readers the authors' interpretation of the data.
      3. Figure S1b: there is a typo.

      Significance

      I think the idea of performing pooled screens coupled to microscopy is exciting, and this approach definitely has more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach described in this manuscript is convincing. Although the analysis part will require more work in the future (to adapt the software to recognise different types of phenotypic readouts) to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2sec vs 100ms) followed by separation of these populations by FACS.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to express our utmost gratitude to the three anonymous reviewers for their constructive and insightful comments on our manuscript. We broadly agree with all comments made and have uploaded a preliminary revised version with changes highlighted in bold. We now deal with each of the reviewer comments in turn.

      Reviewer #1

      L50-52: Can you predict where the unmapped read came from? Could viral infections be the source as in land plants?

      Having done a crude examination of unmapped reads, we couldn't find compelling evidence of them being of viral origin. The unmapped fraction was in fact in the same range as seen for other sRNA libraries in our lab, which we found to occur for a number of reasons such as sequencing errors, incomplete assembly, and differences between the sequenced lines and the reference line. Those all result in unmapped reads, particularly since we employed a stringent mapping (0 mismatches).

      L67-68, which is the explanation?

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak, the evidence for the 23nt peak doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper and we have thus decided to remove this sentence.

      Fig 1D the reference to the A,C,G,U 5' should be re-positioned within Figure 1D panel space.

      Thanks, this has been addressed.

      Figure 3: it could be a supplementary figure based on the relevance given in the manuscript to this point.

      We agree, and have moved Fig3 to Supplement.

      P5, line 107: while commenting on strand bias there seems to be a mistake in strong bias definition, it should be x < 0.2 and x > 0.8, not "strong bias (0.2 < x < 0.8)", as in the text.

      Thank you for pointing this out; we have now corrected this error in the text.
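
      The corrected threshold can be made concrete with a minimal sketch. It assumes, purely for illustration, that x is the fraction of a locus's reads mapping to the plus strand; the manuscript's exact definition of x is not given in this exchange, and the function name and the "not_strong" label are hypothetical.

```python
def strand_bias_class(plus_reads, minus_reads, low=0.2, high=0.8):
    """Classify a locus by the fraction x of its reads on the plus strand.

    Assumption: x = plus / (plus + minus). The 0.2 and 0.8 cut-offs are the
    ones quoted by the reviewers; everything else here is illustrative.
    """
    total = plus_reads + minus_reads
    if total == 0:
        return "undefined"  # no reads, so x cannot be computed
    x = plus_reads / total
    # Strong bias means reads come predominantly from one strand:
    # x < 0.2 or x > 0.8, not 0.2 < x < 0.8.
    return "strong" if (x < low or x > high) else "not_strong"


print(strand_bias_class(95, 5))
print(strand_bias_class(50, 50))
```

      Written this way, the original typo is easy to see: the interval 0.2 < x < 0.8 selects exactly the loci with reads on both strands, which is the opposite of a strong bias.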

      P5, line 110: marked changes regarding locus size are not as striking in my opinion, in particular log size 6 and following, which is not marked in the graph (the cut off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important features (as repetitiveness?) that might allow a better definition of cut-offs.

      Thank you for pointing this out. You are correct that the changes in the density distribution are not as striking for locus size. A great deal of deliberation on our part went into deciding what to do about this. In the end, we decided that for the size classes there was benefit in having several different classes with the understanding that having additional potentially redundant cut-offs would not adversely affect the analysis. In doing this, we were partially driven by the albeit subtle changes in the curve, but also by the desire to have size classes that were biologically relevant and informative. For example, a locus 3000nt captures the long tail. However, we neglected to fully explain these subtleties in our decision-making, something we have now rectified through some added explanation in the text. These choices were validated by the way size classes are differentially associated with different locus clusters in Figure 8.

      Fig 5: the legend has the C subfigure twice, the second should be D.

      Thank you for highlighting this. It has now been corrected.

      Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.

      Thanks for pointing this out. We agree with this suggestion and have replaced Table 1 with a Figure (Fig 5) which is indeed a better way to present those results.

      Fig 6A: The boxplots regarding Stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?

      Thank you for pointing this out, we understand that this isn't clear at the moment. Briefly, for this analysis we performed the clustering multiple times each time with a random sample of the loci (with replacement) of the same size as the original dataset. We then calculated the proportion of loci that retained their original clustering. We have clarified this in the figure legend and also elaborated on the approach in the methods section to ensure that it is better described.
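
      The resampling idea described above can be sketched as follows. This is a toy illustration, not the authors' pipeline: it swaps in a tiny 1-D k-means for the MCA-based clustering, and all function names and parameters are hypothetical. Each bootstrap draws loci with replacement (same size as the original data), re-clusters, and records the proportion of original loci that keep their original cluster.

```python
import random


def kmeans_1d(values, k, iters=50):
    """Tiny 1-D k-means with deterministic quantile initialisation."""
    s = sorted(values)
    centroids = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[j].append(v)
        # Keep the old centroid if a group ends up empty.
        new = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    # Sorting aligns cluster labels between runs (only valid in 1-D).
    return sorted(centroids)


def assign(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))


def bootstrap_stability(values, k, n_boot=100, seed=0):
    """Per-bootstrap proportion of loci that retain their original cluster."""
    rng = random.Random(seed)
    ref = kmeans_1d(values, k)
    ref_labels = [assign(v, ref) for v in values]
    scores = []
    for _ in range(n_boot):
        # Resample with replacement, same size as the original dataset.
        sample = [rng.choice(values) for _ in values]
        boot = kmeans_1d(sample, k)
        same = sum(assign(v, boot) == lab for v, lab in zip(values, ref_labels))
        scores.append(same / len(values))
    return scores


values = [i * 0.01 for i in range(20)] + [10 + i * 0.01 for i in range(20)]
scores = bootstrap_stability(values, k=2, n_boot=50)
print(sum(scores) / len(scores))
```

      Under this reading, the y-axis of each small boxplot would be the distribution of such per-bootstrap proportions for a given k and number of dimensions.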

      P6, line 142: analyses of stability and variance shows 7 as the optimal k, while gap statistics and NMI suggested 6 as the optimal. It is not clear why 6 was preferred. The MCA section in Methods is unclear regarding this point too.

      Thank you for querying this. The process of choosing the appropriate value of k is a complicated one and we appreciate that the explanation could be clearer. After your comment, we re-visited our decision-making process and were reassured that a k value of 6 rather than 7 was indeed appropriate. The stability plots in Fig. 6A start with k=2 and it can be clearly seen for k=6 that stability is comparatively high for dimensions 7-10. Indeed, k values of 2,3 and 6 seem to be the only feasible values. k=7 is fairly unstable for all dimensions from 1-8. We have done some rewording of the methods to hopefully make this clearer.

      Fig S2-S5: please check legends, they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.

      Thanks for pointing this out. This is now corrected and we have referenced all figures in the main text.

      Fig 9: I suggest using different colors in density plots to ease interpretation. LC tracks could share a color and Gene, TEs, DNA meth, and All loci should have a different color each.

      A good suggestion - this has been replotted with different colours.

      Supplementary Files S1: The full-annotated locus map should be provided as a spreadsheet file or as a text (.csv) file, not as a pdf file.

      Thanks for pointing this out. We originally submitted this file in gff format. We are not sure why this got converted. We will make sure it is in an appropriate format in the final version, especially having suffered from the pains of pdf tables ourselves in the past.

      I may be misunderstanding Fig. 6E, but it looks strange that the observed sum-of-squares is smooth, but the expected is not. Is it possible that the in-figure reference is inverted?

      Indeed, the colours were inverted. Thanks a lot for that spot, we have now swapped them around.

      Reviewer #2

      I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.

      Thank you for this very insightful comment which has helped us to reflect on the methodological approach. While it is likely that there are some RNA breakdown products picked-up in the sRNA sequencing, we do not think that the locus-map as a whole is undermined by this. For example 54% of loci have a predominance for 21-nt sRNAs and 18% for 20-nt sRNAs, so the majority of sRNA loci do have a predominance for a specific RNA size.

      However, your point does raise a very valid concern with implications for the interpretation of LC4. Although we posit some explanations for these loci (e.g. DCL-mediated sRNA production without an accessory protein to provide PAZ domain-like sRNA measurement), given the very strong strand bias and association with genic regions we do agree that there is a risk that these loci predominantly represent degradation fragments. Therefore, we have now reworded how we discuss LC4 in the discussion to reflect this. This also reveals a key advantage of the clustering approach in that, should LC4 indeed represent degradation products, they have been successfully grouped together into a separate cluster such that they don't undermine the insights gained from the other locus clusters.
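
      The reviewer's suggestion of tracking the proportion of "DCL-sized" RNAs per locus can be sketched roughly as below. This is an illustration under stated assumptions, not a method from the manuscript: the DCL size set (20 and 21 nt, taken from the predominant sizes mentioned in the reply), the 0.5 cut-off, and the function names are all hypothetical.

```python
from collections import Counter


def dcl_size_fraction(read_lengths, dcl_sizes=(20, 21)):
    """Fraction of a locus's reads whose length is in the assumed DCL size set."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[s] for s in dcl_sizes) / total


def flag_possible_degradation(read_lengths, plus_fraction, overlaps_exon,
                              min_dcl_fraction=0.5):
    """Flag loci that look like mRNA breakdown products.

    A locus is flagged only if all three reviewer criteria hold: a wide size
    distribution (low DCL-sized fraction), single-strandedness (strong strand
    bias), and overlap with an exon. The threshold is a placeholder.
    """
    broad_sizes = dcl_size_fraction(read_lengths) < min_dcl_fraction
    single_stranded = plus_fraction < 0.2 or plus_fraction > 0.8
    return broad_sizes and single_stranded and overlaps_exon


narrow = [21] * 8 + [20, 25]   # mostly 21-nt reads
broad = list(range(15, 35))    # one read of each length from 15 to 34 nt
print(flag_possible_degradation(narrow, plus_fraction=0.5, overlaps_exon=False))
print(flag_possible_degradation(broad, plus_fraction=0.95, overlaps_exon=True))
```

      A flag of this kind would not remove loci from the map; it would only mark candidates, such as those in LC4, for more cautious interpretation.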

      One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      Thank you for pointing this out. In our pursuit to share as many details as possible, we appreciate that the descriptions became too verbose, as rightfully noted here. In order not to compromise detail too much, we have created a second, pared-down version as csv, which includes essential details such as name, position and LC. As for the gff, we kept all details in, since it loads quickly in a genome browser and also in other tools such as R, in which those features can be used as efficient filters.
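
      As a rough illustration of how a verbose GFF3 file can be reduced to such a pared-down csv, the sketch below keeps only position and a few attributes. The attribute keys (Name, LC), the example locus name, and the function names are assumptions for illustration; the actual keys in the authors' file may differ.

```python
import csv
import io


def gff3_to_slim_rows(gff_text, keep=("Name", "LC")):
    """Extract position plus selected attributes from GFF3 feature lines."""
    rows = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        # Column 9 holds semicolon-separated key=value attribute pairs.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        row = {"seqid": cols[0], "start": int(cols[3]),
               "end": int(cols[4]), "strand": cols[6]}
        for key in keep:
            row[key] = attrs.get(key, "")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the slim rows as CSV text (empty string if there are no rows)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical one-locus example; the locus name and attributes are made up.
example = ("##gff-version 3\n"
           "chromosome_1\tsRNA\tlocus\t100\t250\t.\t+\t.\t"
           "ID=x1;Name=CSRL000001;LC=LC3;note=long descriptive text\n")
print(rows_to_csv(gff3_to_slim_rows(example)))
```

      Keeping the full GFF3 alongside a slim table of this kind serves both use cases raised here: browser-friendly key fields and detailed attributes for filtering.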

      Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      We have added a table (Sup Table S6, "Supp_Table_S6_library_ENA_accession") that includes the direct ENA accession numbers as well as numerous other metadata.

      Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      The snapshots have been improved to ensure better readability.

      Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak, the evidence for the 23nt peak doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper and we have thus decided to remove this sentence.

      Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      We agree and have changed it accordingly

      Figure 4 caption: The acronym "CRSL" should be defined.

      CRSL has now been duly defined in the manuscript.

      Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      We have used the appropriate bibtex code to reference this Zenodo share (https://zenodo.org/record/3862405/export/hx). The current citation format somehow omits some information. We hope this will be fixed by the publisher, but we have also provided the full DOI address in the “additional information” section just in case. We will keep an eye on how it comes out.

      Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Thank you for this suggestion, we have adapted the title to make it more descriptive.

      Reviewer #3

      …the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since, at least some loci, do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Thank you for these insightful comments. As we followed a very similar methodological approach to that used to produce the Arabidopsis sRNA locus map published in Hardcastle et al. (2018), we wanted to take the opportunity to compare the results and build upon the ongoing discussion concerning the evolution of sRNA mechanisms in Chlamydomonas (e.g. Valli et al. 2016). Your point about the possibility of an ancestral progenitor with greater diversity that was then lost is very valid. You are also of course correct about the limitations to what can be concluded from this study and the limited comparisons that can be made. We see our approach as a useful tool for hypothesis generation which can be complemented by more in-depth exploration in the future. With this in mind, and taking on board your comments, we have elaborated on our discussion of the evolutionary implications of our study, which we hope now gives a more balanced account.

      I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      That information was indeed missing, thanks for bringing it up. We have now included this in the gff file (column LC) as well as in another cleaner table (Supp_Table_S7_loci_class_annotation).

      Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      Thanks for pointing this out; this was an error on our part, and the captions have now been corrected. Unfortunately, due to the ongoing pandemic-related restrictions, we have so far been unable to get a genome browser session running to create more loci figures.

      If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      Thank you for this suggestion. In fact, not all annotation features were used predictively in the MCA and clustering process, and so these "supplementary" annotations as outlined in supplementary table S3 can provide some cross-validation. With that in mind, we have now included an additional heatmap as a supplementary figure which shows associations for some of these supplementary annotations as well as corresponding explanations in the text. Further validation is provided by the chromosome tracks in figure 9 showing the distinct genomic distributions of each locus cluster despite chromosomal location not being a factor in the clustering.

      Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      Thank you for pointing this out, we have now corrected this error.

      Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      Table 1 has been replaced by the new Fig 5, annotation/abbreviations should now be more obvious.

      Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Thank you for pointing this out. We have now moved the reference to the paragons to earlier in the section where we introduce the six clusters. We hope this is now clearer.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript presents a detailed map of sRNA (precursor) loci in the green alga Chlamydomonas reinhardtii based on large volumes of sequencing data (145 sRNA libraries). The locus map based on a false discovery rate of less than 0.05 had 6164 loci, covering 4.1% of the Chlamydomonas reference genome. Individual loci were annotated based on both intrinsic features, such as sRNA size, 5'-nucleotide, strand bias and phasing pattern, and extrinsic features, such as sRNA expression, genotype and overlap with genomic attributes (e.g., genes, transposons, methylation levels).

      By using the intrinsic and extrinsic features of each sRNA locus and Multiple Correspondence Analysis (MCA) approaches, the sRNA loci were clustered into six distinct classes, referred to as locus class (LC) 1-6. This strategy is partly validated by the grouping of well-characterized Chlamydomonas miRNAs into the same cluster, LC3.

      As the authors state, this data-driven approach is valuable for hypothesis generation since (with the possible exception of LC3) the biogenesis and function of most sRNA loci (and of the corresponding locus classes) remain uncharacterized in Chlamydomonas. The analysis provides a framework to facilitate future characterization of the diverse types of sRNAs in this model algal system.

      However, the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since, at least some loci, do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Some specific comments:

      1. I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      2. Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      3. If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      4. Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      5. Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      6. Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Significance

      Chlamydomonas reinhardtii is a model unicellular green alga, the lineage of which diverged from land plants approximately one billion years ago. Chlamydomonas encodes a great number of diverse small RNAs. However, the biogenesis and function of the majority of these sRNAs are not known. By grouping sRNA loci into specific classes (based on intrinsic and extrinsic features), this manuscript provides a framework that will facilitate the future characterization of sRNAs in Chlamydomonas and, very likely, in other algal species. This information may also contribute to our understanding of the evolution of sRNA loci within eukaryotes.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript describes the annotation of small RNA-producing loci from the green alga Chlamydomonas reinhardtii. A large number of small RNA-sequencing datasets were analyzed and used to create genome-wide annotations of small RNA-producing loci. These loci were annotated based on several features, and then classified into six major groups based on these features.

      Major comments:

      Are the key conclusions convincing? --> Yes.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? --> No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation. --> Yes, additional analyses should be conducted, see itemized list below.

      Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments. --> Perhaps a few weeks to a month of analysis and revision time.

      Are the data and the methods presented in such a way that they can be reproduced? --> Yes.

      Are the experiments adequately replicated and statistical analysis adequate? --> Yes.

      SPECIFIC COMMENTS:

      1. I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.

      MINOR COMMENTS:

      2. One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      3. Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      4. Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      5. Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      6. Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      7. Figure 4 caption: The acronym "CRSL" should be defined.

      8. Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      9. Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.:

      This study provides a genome-wide annotation of small RNA-producing loci from Chlamydomonas reinhardtii. This will serve as a useful data resource for researchers working with this model system. The results overall confirm what is known from previous studies of Chlamy small RNAs: they are rather distinct from angiosperm small RNAs and from animal small RNAs.

      Place the work in the context of the existing literature (provide references, where appropriate).:

      This may be the first study to provide a genome-wide annotation (as opposed to a focused effort) for Chlamy small RNA populations.

      State what audience might be interested in and influenced by the reported findings:

      Chlamy researchers, especially those interested in gene silencing and genome annotations, and small RNA specialists with interest in annotations and in wide phylogenetic comparisons.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. :

      Plant microRNAs, siRNAs, genetics, and genomics.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this study, Müller, Matthews, Vali, and Baulcombe have used data-driven machine learning approaches to annotate and classify sRNA loci of Chlamydomonas reinhardtii. I have found the manuscript very interesting and a handy handbook for the appropriate way to annotate sRNA loci in different organisms. I believe this is not only a great resource paper on its own, but it also contains essential information to start understanding how Chlamydomonas silences TEs without an RdDM pathway. I have a few comments that may help to improve the manuscript.

      • L50-52: Can you predict where the unmapped read came from? Could viral infections be the source as in land plants?
      • L67-68, which is the explanation?
      • Fig 1D the reference to the A,C,G,U 5' should be re-positioned within Figure 1D panel space.
      • Figure 3: it could be a supplementary figure based on the relevance given in the manuscript to this point.
      • P5, line 107: while commenting on strand bias there seems to be a mistake in strong bias definition, it should be x < 0.2 and x > 0.8, not "strong bias (0.2 < x < 0.8)", as in the text.
      • P5, line 110: marked changes regarding locus size are not as striking in my opinion, in particular log size 6 and following, which is not marked in the graph (the cut off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important features (as repetitiveness?) that might allow a better definition of cut-offs.
      • Fig 5: the legend has the C subfigure twice, the second should be D.
      • Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.
      • Fig 6A: The boxplots regarding Stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?
      • P6, line 142: analyses of stability and variance shows 7 as the optimal k, while gap statistics and NMI suggested 6 as the optimal. It is not clear why 6 was preferred. The MCA section in Methods is unclear regarding this point too.
      • Fig S2-S5: please check legends, they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.
      • Fig 9: I suggest using different colors in density plots to ease interpretation. LC tracks could share a color and Gene, TEs, DNA meth, and All loci should have a different color each.
      • Supplementary Files S1: The full-annotated locus map should be provided as a spreadsheet file or as a text (.csv) file, not as a pdf file.
      • I may be misunderstanding Fig. 6E, but it looks strange that the observed sum-of-squares is smooth, but the expected is not. Is it possible that the in-figure reference is inverted?

      Significance

      This is a very interesting article. It may look a little bit technical, but it provides useful information for people studying Chlamydomonas. In addition, the way the authors approached the annotation of sRNA is very meticulous and elegant. I would suggest that people exploring small RNAs in non-model organisms use this article as a handbook of how to annotate sRNAs. In this particular way the article will be of interest beyond the Chlamydomonas, and even plant, research field.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript is clearly written and the figures appropriate and informative. Some descriptions of data analyses are a little dense but reflect what would appear to be long, hard efforts on the part of the authors to identify and control for possible sources of misinterpretation due to sensitivities of parameters in their fitness model. The authors' efforts to retest interactions under non-competition conditions allay most of the concerns that I would have. One problem, though, that I could not see explicitly addressed was that of potential effects of interactions between methotrexate and the other conditions and how this is controlled for. Specifically, it could be argued that the fact that a particular PPI is observed under a specific condition could have more to do with a synthetic effect of treatment of cells with a drug plus methotrexate. Is this controlled for and how? I raise this because in a chemical genetic screen for fitness it was shown that methotrexate is particularly promiscuous for drug-drug interactions (Hillenmeyer ME, et al. Science 2008). I tried to think of how this works but couldn't come up with anything immediately. I'd appreciate it if the authors would take a crack at resolving this issue. Otherwise I have no further concerns about the manuscript.

      We thank the reviewer for the kind comments. We agree with the reviewer’s point that methotrexate could be interacting with drugs or other perturbagens, similar to how the chosen nitrogen source, carbon source, or other growth conditions may interact with a drug. However, the methotrexate concentration is held constant across all conditions, as are the rest of the media components such as the nitrogen and carbon source (with the exception of the raffinose perturbation). Any interactions with methotrexate, or other media components, are undetectable without systematically varying all components for all stressors. Therefore, we use the typical experimental design of measuring molecular variation from a reference, holding invariant media components (such as methotrexate, glucose, or vitamins) fixed between conditions. This is a general practice, and we describe that every condition contains methotrexate on page 3, line 10.

      The library was grown under mild methotrexate selection in 9 environments for 12-18 generations in serial batch culture, diluting 1:8 every ~3 generations, with a bottleneck population size greater than 2 x 10^9 cells (Table S1).

      We also list the full details of each environment in Table S1.

      Reviewer #1 (Significance (Required)):

      Liu et al. expand on previous work from the Levy group to explore a massive in vivo protein interactome in the yeast S. cerevisiae. They achieve this by performing screens across 9 growth conditions, which, with replication, results in a total of 44 million measurements. Interpreting their results based on a fitness model for pooled growth under methotrexate selection, they make the key observation that there is a vastly expanded pool of protein-protein interactions (PPI) that are found under only one or two conditions compared to a more limited set of PPI that are found under a broad set of conditions (mutable versus immutable interactors). The authors show that this dichotomy suggests some important features of proteins and their PPIs that raise important questions about functionality and evolution of PPIs. Among these are that mutable PPIs are enriched for cross-compartmental, high disorder and higher rates of evolution and subcellular localization of proteins to chromatin, suggesting roles in gene regulation that are associated with cellular responses to new conditions. At the same time these interactions are not enriched for changes in abundance. These results are in contrast to those of immutable PPIs, which seem to form a core background noise, more determined by changes in abundance than what the authors interpret must be post-translational processes that may drive, for instance, changes in subcellular localization resulting in appearance of PPIs under specific conditions. The authors are also able to address a couple of key issues about protein interactomes, including the controversial Party-date Hub hypothesis of Vidal, in which they could now affirm support for this hypothesis based on their results and notably negative correlation of PPIs to protein abundance for mutable PPIs. Finally, they also addressed the problem of predicting the upper limit of PPIs in yeast, showing the remarkable result that it may be no more than about 2 times the number of proteins expressed by yeast. Such an upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.

      This manuscript is a very important contribution to understanding dynamics of molecular networks in living cells and should be published with high priority.

      Reviewer 2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Report on Liu et al. "A large accessory protein interactome is rewired across environments"

      Liu et al. use an mDHFR-based, pooled barcode sequencing / competitive growth / mild methotrexate selection method to investigate changes of PPI abundance of 1.6 million protein pairs across 9 different growth conditions. Because most PPI screens aim to identify novel PPIs in standard growth conditions, the currently known yeast PPI network may be incomplete. The key concept is to define "immutable" PPIs that are found in all conditions and "mutable" PPIs that are present in only some conditions.

      The assay identified 13764 PPIs across the 9 conditions, using optimized fitness cut-offs. Steady PPIs, i.e. those found across all environments, were identified in membrane compartments and cell division. Processes associated with the chromosome, transcription, protein translation, RNA processing and ribosome regulation were found to change between conditions. Mutable PPIs form modules, as topological analyses reveal.

      Interestingly, a correlation between intrinsic disorder and PPI mutability was found, with mutable PPIs postulated to be more flexible in the conformational context, while at the same time being formed by less abundant proteins.

      I appreciate the trick to use homodimerization as an abundance proxy to predict interaction between heterodimers (of proteins that homodimerize). This "mass-action kinetics model" explains the strength of 230 out of 1212 tested heterodimers.

      A validation experiment of the glucose transporter network was performed and 90 "randomly chosen" PPIs that were present in the SD environment were tested in NaCl (osmotic stress) and Raffinose (low glucose) conditions through recording optical density growth trajectories. Hxt5 PPIs stayed similar in the tested conditions, supported by the current knowledge that Hxt5 is highly expressed in stationary phase and under salt stress. In Raffinose, Hxt7, previously reported to increase the mRNA expression, lost most PPIs indicating that other factors might influence Hxt7 PPIs.

      **Points for consideration:**

      *) A clear definition of mutable and immutable is missing, or could not be found e.g. at page 4 second paragraph.

      We thank the reviewer for pointing this out. We have now added a better definition of mutable and immutable on line 19 page 4:

      We partitioned PPIs by the number of environments in which they were identified and defined PPIs at opposite ends of this spectrum as “mutable” PPIs (identified in only 1-3 environments) and “immutable” (identified in 8-9 environments).
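
      The binning rule quoted above can be written as a small function. This is only an illustrative sketch: the function name and the "intermediate" label for PPIs seen in 4-7 environments are assumptions, since the quoted text names only the two ends of the spectrum.

```python
def classify_mutability(n_environments: int) -> str:
    """Label a PPI by the number of environments (1-9) in which it was identified."""
    if not 1 <= n_environments <= 9:
        raise ValueError("expected a count between 1 and 9")
    if n_environments <= 3:
        return "mutable"        # identified in only 1-3 environments
    if n_environments >= 8:
        return "immutable"      # identified in 8-9 environments
    return "intermediate"       # hypothetical label; not defined in the quoted text


labels = [classify_mutability(n) for n in range(1, 10)]
```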

      *) Approximately half of the PPIs have been identified in one environment. Many of those mutable PPIs were detected in the 16 °C condition. Is there an explanation for the predominance of this specific environment? What are these PPIs about?

      The reviewer is correct that ~40% of the PPIs identified in only one environment were found in the 16 ℃ environment. One reason for this could be technical: the positive predictive value (PPV) is the lowest amongst the conditions (16 ℃: 31.6%, mean: 57%, Table SM6). It must be noted, however, that PPVs are calculated using reference data that has generally been collected in standard growth conditions. So, it might be expected that the most divergent environment from standard growth conditions (resulting in the most differences in PPIs) would result in a lower PPV in our study even if the true frequency of false positives was equivalent across environments. We have attempted to be transparent about the quality of the data in each environment by reporting PPVs and other metrics in Table SM6. However, we suspect that the large number of PPIs unique to 16 ℃ is due in part to the fact that it causes the largest changes in the protein interactome, and believe that it should be included, even at the risk of lowering the overall quality of the data. The main reason for this is that this data is likely to contain valuable information about how the cell copes with this stress. For example, we find, but do not highlight in the manuscript, that 16 ℃-specific PPIs contain two major hubs (DID4: 285 PPIs involved in endocytosis and vacuolar trafficking, and DED1: 102 PPIs involved in translation), both of which are reported to be associated with cold adaptation in yeast (Hilliker et al., 2011; Isasa et al., 2015).

      To assess whether the potentially higher false-positive rate in 16 ℃ could be impacting our conclusions related to PPI network organization and features of immutable and mutable PPIs, we repeated these analyses leaving out the 16 ℃ data and found that our main conclusions did not change. This new analysis is now presented in Figure S8 and described on page 5, line 10.

      Finally, we used a pair of more conservative PPI calling procedures that either identified PPIs with a low rate of false positives across all environments (FPR

      We have also added references to other panels in Figure S8 throughout the manuscript, where appropriate.

      *) 50 % overall retest validation rate is fair and reflects a value comparable to other large-scale approaches. However, what is the actual variation, e.g. between mutable and immutable PPIs or between conditions, e.g. at 16 °C?

      We validated 502 PPIs present in the SD environment and an additional 36 PPIs in the NaCl environment. As the reviewer suggests, we do indeed observe differences in the validation rate across mutability bins. This data is reported in Figures 3B and S6B, and we use this information to provide a confidence score for each PPI on page 5, line 4.

      To better estimate how the number of PPIs changes with PPI mutability, we used these optical density assays to model the validation rate as a function of the mean PPiSeq fitness and the number of environments in which a PPI is detected. This accurate model (Spearman's r =0.98 between predicted and observed, see Methods) provided confidence scores (predicted validation rates) for each PPI (Table S5) and allowed us to adjust the true positive PPI estimate in each mutability bin. Using this more conservative estimate, we still found a preponderance of mutable PPIs (Figure S6E).

      The validation rate in NaCl is similar to SD (39%, 14/36), suggesting that validation rates do not vary excessively across environments. Because validation experiments are time consuming (we performed 6 growth experiments per PPI), performing a similar scale of validations in all environments as in SD would be resource intensive. Instead, we report a number of metrics (true positive rate, false positive rate, positive predictive value) in Table SM6 using large positive and random reference sets. We believe these metrics are sufficient for readers to compare the quality of data across environments.

      *) What is the R correlation cutoff for PPIs explained in the mass equilibrium model vs. not explained?

      We do not use an R correlation cutoff to assess if a PPI is explained by the mass-action equilibrium model. We instead rely on ordinary least-squares regression as detailed in the methods on page 68, line 13.

      ...we used ordinary least-squares linear regression in R to fit a model of the geometric mean of the homodimer signals multiplied by a free constant and plus a free intercept. Significantly explained heterodimer PPIs were judged by a significant coefficient (FDR 0.05, single-test). This criteria was used to identify PPIs for which protein expression does or does not appear to play as significant of a role as other post-translational mechanisms.

      The first criterion identifies a quantitative fit to the model of variation being related. The second criterion is used to filter out PPIs for which the relationship appears to be explained by more than just the homodimer signals. This approach is more stringent, but we believe this is the most appropriate statistical test to assess fit to this linear model.
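
      As a rough illustration of the regression described above, the sketch below fits a heterodimer signal to the geometric mean of the two homodimer signals with a free slope and intercept. It is not the authors' R code: the function name and the toy numbers are hypothetical, homodimer signals are assumed to be non-negative so that the geometric mean is defined, and the significance testing (FDR) step is not shown.

```python
import math


def fit_mass_action(homo_a, homo_b, hetero):
    """Ordinary least squares fit of hetero ~ slope * sqrt(homo_a * homo_b) + intercept.

    Each argument is a sequence with one value per environment.
    """
    # Predictor: geometric mean of the two homodimer signals in each environment.
    x = [math.sqrt(a * b) for a, b in zip(homo_a, homo_b)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(hetero) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, hetero))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy values for three environments (made up for illustration only).
slope, intercept = fit_mass_action([1.0, 4.0, 9.0], [1.0, 4.0, 9.0], [2.5, 8.5, 18.5])
```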

      *) 90 "randomly chosen" PPIs for validation. It needs to be demonstrated that these interactions are a random subset; otherwise it could also mean cherry-picked interactions.

      We selected 90 of the 284 glucose transport-related PPIs for validation using the “sample” function in R (replace = FALSE). We have now included text that describes this on page 63, line 3 in the supplementary methods:

      Diploids (PPIs) on each plate were randomly picked using the “sample” function in R (replace = FALSE) from PPIs that meet specific requirements.
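
      For readers who do not use R, drawing 90 of 284 candidates without replacement looks roughly like this in Python. The identifiers, the helper name, and the seed are hypothetical; the authors' selection was made with R's "sample" function, not with this code.

```python
import random


def pick_validation_set(candidates, k, seed=None):
    """Draw k distinct items uniformly at random (sampling without replacement)."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(candidates, k)


# Placeholder identifiers standing in for the 284 glucose transport-related PPIs.
candidates = [f"PPI_{i:03d}" for i in range(284)]
chosen = pick_validation_set(candidates, 90, seed=0)
```

      Recording the seed alongside the candidate list is one way to let others confirm that a subset was drawn at random rather than hand-picked.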

      *) Figure 4 provides interesting correlations with the goal to reveal properties of mutable and less mutable PPIs. PPIs detected in the PPiSeq screen can partially be correlated to co-expression (4A) as well as co-localization. Does it make sense to correlate the co-expression across number of conditions? Is the expression correlation condition-specific? In this graph it could be that expression correlation stems from condition 1 and 2 and the interaction takes place in 4 and 5 still leading to the same conclusion ... Is the picture of the co-expression correlation similar when you simply look at individual environments like in S4A?

      We use co-expression mutual rank scores from the COXPRESdb v7.3 database (Obayashi et al., 2019). These mutual rank scores are derived from a broad set of 3593 environmental perturbations that are not limited to the environments we tested here. By using this data, we are asking if co-expression in general is correlated with mutability and report that it is in Figure 4A. We thank the reviewer for pointing out that this was not clear and have now added text to clarify that the co-expression analysis is derived from external data on page 6, line 7.

      We first asked whether co-expression is indeed a predictor of PPI mutability and found that it is: co-expression mutual rank (which is inversely proportional to co-expression across thousands of microarray experiments) declined with PPI mutability (Figures 4A and S11) (Obayashi and Kinoshita, 2009; Obayashi et al., 2019).

      The new figure S11 examines how the co-expression mutual rank changes with PPI mutability for PPIs identified in each environment, as the reviewer suggested. For each environment, we find the same general pattern as in Figure 4A (which considers PPIs from all environments).

      *) Figure 4C: Interesting, how dependent are the various categories?

      It is well known that many of these categories are correlated (e.g. mRNA expression level and protein abundance, and deletion fitness effect and genetic interaction degree). However, we believe it is most valuable to report the correlation of each category with PPI mutability independently in Figures 4C and S12, since similar correlations with related categories provide more confidence in our conclusions.

      *) Figure 4 F: When binned in the number of environments in which the PPI was found, the distribution peaks at 6 environments and decreases with higher and lower number of environments. The description /explanation in the text clearly says something else.

      We reported on page 7, line 15:

      We next used logistic regression to determine what features may underlie a good or poor fit to the model (Figure S14C) and found that PPI mutability was the best predictor, with more mutable PPIs being less frequently explained (Figure 4F). Unexpectedly, mean protein abundance was the second best predictor, with high abundance predicting a poor fit to the model, particularly for less mutable PPIs (Figure S14D and S14E).

      As the reviewer notes, Figure 4F shows that the percent of heterodimers explained by the model does appear to decrease for PPIs observed in the most environments. We suspect that the reviewer is correct that something more complicated is going on. One possibility is that extraordinarily stable PPIs (stable in all conditions) would have less quantitative variation in protein or PPI abundance across environments. If this is true, it would be statistically difficult to fit the mass action kinetics model for these PPIs (lower signal relative to noise), thereby resulting in the observed dip.

      A second possibility is that multiple correlated factors are associated with contributing positively or negatively to a good fit, and the simplicity of Figure 4F or a Pearson correlation does not capture this interplay. This second possibility is why we used multivariate logistic regression (Figure S14C) to dissect the major contributing factors. In the text quote above, we report that high abundance is anti-correlated with a good fit to the model (S14D, S14E). Figure 4C shows that immutable PPIs tend to be formed from highly abundant proteins. One possible explanation is that highly abundant proteins saturate the binding sites of their binding partners, breaking from the assumptions of mass action kinetics model. We have now changed the word “limit” to “saturate” on page 7, line 22 to make this concept more explicit.

      Taken together, these data suggest that mutable PPIs are subject to more post-translational regulation across environments and that high basal protein abundance may saturate the binding sites of their partners, limiting the ability of gene expression changes to regulate PPIs.

      A third possibility is that the dip is simply due to noise. Given the complexity of the possible explanations and our uncertainty about which is more likely, we chose to leave this description out of the main text and focus on the major finding: that PPIs detected in more environments are generally associated with a better fit to the mass action kinetics model.

      *) Figure 6: I apologize, but for my taste this is not a final figure 6 for this study. Investigation of different environments increases the PPI network in yeast, yes, yet it is very well known that a saturation is reached after testing of several conditions, different methods and even screening repetition (sampling). It does not represent an important outcome. Move to suppl or remove.

      We included Figure 6 to summarize and illustrate the path forward from this study. This is an explicit reference to impactful computational analyses done using earlier generations of data to assess the completeness of single-condition interaction networks (Hart et al., 2006; Sambourg and Thierry-Mieg, 2010). Here, we are extending PPI measurement of millions-scale networks across multiple environments, and are using this figure to extend these concepts to multi-condition screens. We agree that the property of saturation in sampling is well known, but it is surprising that we can quantitatively estimate convergence of this expanded condition-specific PPI set using only 9 conditions. Thus, we agree with Reviewer 1 that these are “remarkable results” and that the “upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.” We think this is an important advance of the paper, and this figure is useful to stimulate discussion and guide future work.

      Reviewer #2 (Significance (Required)):

      Liu et al. increase the current PPI network in yeast and offer a substantial dataset of novel PPIs seen in specific environments only. This resource can be used to further investigate the biological meaning of the PPI changes. The data set is compared to previous DHFR providing some sort of quality benchmarking. Mutable interactions are characterized well. Clearly a next step could be to start some "orthogonal" validation, i.e. beyond yeast growth under methotrexate treatment.

      The reviewer makes a great point that we also discuss on page 9, line 33:

      While we used reconstruction of C-terminal-attached mDHFR fragments as a reporter for PPI abundance, similar massively parallel assays could be constructed with different PCA reporters or tagging configurations to validate our observations and overcome false negatives that are specific to our reporter. Indeed, the recent development of “swap tag” libraries, where new markers can be inserted C- or N-terminal to most genes (Weill et al., 2018; Yofe et al., 2016), in combination with our iSeq double barcoder collection (Liu et al., 2019), makes extension of our approach eminently feasible.

      Reviewer 3

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The manuscript "A large accessory protein interactome is rewired across environments" by Liu et al. scales up a previously-described method (PPiSeq) to test a matrix of ~1.6 million protein pairs of direct protein-protein interactions in each of 9 different growth environments.

      While the study found a small fraction of immutable PPIs that are relatively stable across environments, the vast majority were 'mutable' across environments. Surprisingly, PPIs detected only in one environment made up more than 60% of the map. In addition to a false positive fraction that can yield apparently-mutable interactions, retest experiments demonstrate (not surprisingly) that environment-specificity can sometimes be attributed to false-negatives. The study authors predict that the whole subnetwork within the space tested will contain 11K true interactions.

      Much of the environment-specific rewiring seemed to take place in an 'accessory module', which surrounds the core module made of mostly immutable PPIs. A number of interesting network clustering and functional enrichment analyses are performed to characterize the network overall and 'mutable' interactions in particular. The study reports other global properties such as expression level, protein abundance and genetic interaction degree that differ between mutable and immutable PPIs. One of the interesting findings was evidence that many environmentally mutable PPI changes are regulated post-translationally. Finally, the authors provide a case study about network rewiring related to glucose transport.

      **Major issues**

      -The results section should more prominently describe the dimensions of the matrix screen, both in terms of the set of protein pairs attempted and the set actually screened (I think this was 1741 x 1113 after filtering?). More importantly, the study should acknowledge in the introduction that this was NOT a random sample of protein pairs, but rather focused on pairs for which interaction had been previously observed in the baseline condition. This major bias has a potentially substantial impact on many of the downstream analyses. For example, any gene which was not expressed under the conditions of the original Tarassov et al. study on which the screening space was based will not have been tested here. Thus, the study has systematically excluded interactions involving proteins with environment-dependent expression, except where they happened to be expressed in the single Tarassov et al. environment. Heightened connectivity within the 'core module' may result from this bias, and if Tarassov et al. had screened in hydrogen peroxide (H2O2) instead of SD media, perhaps the network would have exhibited a core module in H2O2 decorated by less-densely connected accessory modules observed in other environments. The paper should clearly indicate which downstream analyses have special caveats in light of this design bias.

      We have now added text describing the matrix dimensions of our study on page 3, line 3:

      To generate a large PPiSeq library, all strains from the protein interactome (mDHFR-PCA) collection that were found to contain a protein likely to participate in at least one PPI (1742 X 1130 protein pairs), (Tarassov et al., 2008) were barcoded in duplicate using the double barcoder iSeq collection (Liu et al., 2019), and mated together in a single pool (Figure 1A). Double barcode sequencing revealed that the PPiSeq library contained 1.79 million protein pairs and 6.05 million double barcodes (92.3% and 78.1% of theoretical, respectively, 1741 X 1113 protein pairs), with each protein pair represented by an average of 3.4 unique double barcodes (Figure S1).

      We agree with the reviewer that our selection of proteins from a previously identified set can introduce bias in our conclusions. Our research question was focused on how PPIs change across environments, and thus we chose to maximize our power to detect PPI changes by selecting a set of protein pairs that are enriched for PPIs. We have now added a discussion of the potential caveats of this choice to the discussion on page 9, line 4:

      Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately 3-fold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs, however a more important outcome could be the identification of coordinated network changes across conditions. Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs. Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. growth as a colony, anaerobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

      -Related to the previous issue, a quick look at the proteins tested (if I understood them correctly) showed that they were enriched for genes encoding the elongator holoenzyme complex, DNA-directed RNA polymerase I complex, membrane docking and actin binding proteins, among other functional enrichments. Genes related to DNA damage (endonuclease activity and transposition), were depleted. It was unclear whether the functional enrichment analyses described in the paper reported enrichments relative to what would be expected given the bias inherent to the tested space?

      We did two functional enrichment analyses in this study: network density within Gene Ontology terms (related to Figure 2) and Gene Ontology enrichment of network communities (related to Figure 3). For both analyses, we performed comparisons to proteins included in the PPiSeq library. This is described in the Supplementary Materials on page 63, line 35:

      To estimate GO term enrichment in our PPI network, we constructed 1000 random networks by replacing each bait or prey protein that was involved in a PPI with a randomly chosen protein from all proteins in our screen. This randomization preserves the degree distribution of the network.

      And on page 66, line 38:

      The set of proteins used for enrichment comparison are proteins that are involved in at least one PPI as determined by PPiSeq.
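
      One way to picture the network randomization quoted from page 63 is as a random relabeling of the proteins in the observed network. The sketch below is an interpretation, not the authors' code: it assumes each protein is replaced consistently by a distinct protein drawn from a single pooled list of screened proteins (bait and prey are not treated separately), which is what keeps the degree distribution unchanged. All names and the toy network are hypothetical.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Sorted list of node degrees for an edge list of (protein, protein) pairs."""
    counts = Counter(protein for edge in edges for protein in edge)
    return sorted(counts.values())


def randomize_network(edges, screened_proteins, seed=None):
    """Relabel every protein in the network with a distinct, randomly chosen screened protein."""
    rng = random.Random(seed)
    in_network = sorted({protein for edge in edges for protein in edge})
    # Sampling without replacement gives a one-to-one relabeling, so degrees are preserved.
    replacements = rng.sample(sorted(screened_proteins), len(in_network))
    mapping = dict(zip(in_network, replacements))
    return [(mapping[a], mapping[b]) for a, b in edges]


# Toy network and screen (made up for illustration only).
observed = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
screen = ["A", "B", "C", "D", "E", "F", "G"]
random_networks = [randomize_network(observed, screen, seed=i) for i in range(1000)]
```

      An enrichment statistic computed on the observed network can then be compared with its distribution over the 1000 relabeled networks.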

      -Re: data quality. To the study's great credit, they incorporated positive and random reference sets (PRS and RRS) into the screen. However, the results from this were concerning: Table SM6 shows that assay stringency was set such that between 1 and 3 out of 67 RRS pairs were detected. This specificity would be fine for an assay intended to retest or validate previous hits, where the prior probability of a true interaction is high, but in large-scale screening the prior probability of true interactions that are detectable by PCA is much lower, and a higher specificity is needed to avoid being overwhelmed by false positives. Consider this back of the envelope calculation: Let's say that the prior probability of true interaction is 1% as the authors suggest (pg 49, section 6.5), and if PCA can optimistically detect 30% of these pairs, then the number of true interactions we might expect to see in an RRS of size 67 is 1% * 30% * 67 = 0.2. This back of the envelope calculation suggests that a stringency allowing 1 hit in RRS will yield 80% [ (1 - 0.2) / 1 ] false positives, and a stringency allowing 3 hits in RRS will yield 93% [ (3 - 0.2) / 3 ] false positives. How do the authors reconcile these back of the envelope calculations from their PRS and RRS results with their estimates of precision?
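
      The reviewer's back of the envelope arithmetic can be reproduced in a few lines. The function names are illustrative; the inputs (1% prior, 30% detection, 67 RRS pairs, 1 or 3 observed hits) are the values given in the comment above.

```python
def expected_true_hits(prior, sensitivity, n_pairs):
    """Expected number of detectable true interactions in a random reference set."""
    return prior * sensitivity * n_pairs


def false_positive_fraction(observed_hits, expected_true):
    """Share of observed hits not accounted for by expected true interactions."""
    return (observed_hits - expected_true) / observed_hits


expected = expected_true_hits(prior=0.01, sensitivity=0.30, n_pairs=67)
for hits in (1, 3):
    fraction = false_positive_fraction(hits, expected)
    print(f"{hits} RRS hit(s): about {fraction:.0%} false positives")
```

      The two fractions come out at roughly 80% and 93%, matching the figures in the comment.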

      We thank the reviewer for raising this issue. We included positive and random reference sets (PRS: 70 protein pairs, RRS: 67 protein pairs) to benchmark our PPI calling (Yu et al., 2008). The PRS reference lists PPIs that have been validated by multiple independent studies and is therefore likely to represent true PPIs that are present in some subset of the environments we tested. For the PRS set, we found a rate of detection that is comparable to other studies (PPiSeq in SD: 28%, Y2H and yellow fluorescent protein-PCA: ~20%) (Yu et al., 2008). The RRS reference, developed ten years ago, consists of randomly chosen protein pairs for which there was no evidence of a PPI in the literature at the time (mostly in standard growth conditions). Given the relatively high rate of false negatives in PPI assays, this set may in fact contain some true PPIs that have yet to be discovered. We could detect PPIs for four RRS protein pairs in our study, when looking across all 9 environments. Three of these (Grs1_Pet10, Rck2_Csh1, and YDR492W_Rpd3) could be detected in multiple environments (9, 7, and 3, respectively), suggesting that their detection was not a statistical or experimental artifact of our bar-seq assay (see table below derived from Table S4). The remaining PPI detected in the RRS was only detected in SD (standard growth conditions) but with a relatively high fitness (0.35), again suggesting its detection was not a statistical or experimental artifact. While we do acknowledge it is possible that these are indeed false positives due to erroneous interactions of chimeric DHFR-tagged versions of these proteins, the small size of the RRS combined with the fact that some of the protein pairs could be true PPIs did not give us confidence that this rate (4 of 67) is representative of our true false positive rate. 
To determine a false positive rate that is less subject to biases stemming from sampling of small numbers, we instead generated 50 new, larger random reference sets, by sampling for each set ~ 60,000 protein pairs without a reported PPI in BioGRID. Using these new reference sets, we found that the putative false positive rate of our assay is generally lower than 0.3% across conditions for each of the 50 reference sets. We therefore used this more statistically robust measure of the false positive rate to estimate positive predictive values (PPV = 62%, TPR = 41% in SD). We detail these statistical methods in Section 6 of the supplementary methods and report all statistical metrics in Table SM6.

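      | PPI | Environment_number | SD | H2O2 | Hydroxyurea | Doxorubicin | Forskolin | Raffinose | NaCl | 16℃ | FK506 |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Rck2_Csh1 | 7 | 0.35 | 0.35 | 0 | 0.20 | 0.54 | 0.74 | 0 | 0.17 | 0.59 |
      | Grs1_Pet10 | 9 | 0.44 | 0.39 | 0.34 | 0.25 | 0.65 | 1.19 | 0.2 | 0.16 | 0.95 |
      | YDR492W_Rpd3 | 3 | 0 | 0.18 | 0 | 0 | 0 | 0 | 0 | 0.17 | 0.61 |
      | Mrps35_Bub3 | 1 | 0.35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
      | Positive_control | 9 | 1 | 0.8 | 0.73 | 0.62 | 1.4 | 2.44 | 0.4 | 0.28 | 1.8 |

      Table. Mean fitness in each environment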

      -Methods for estimating precision and recall were not sufficiently well described to assess. Precision vs recall plots would be helpful to better understand this tradeoff as score thresholds were evaluated.

      We describe in detail our approach to calling PPIs in section 6.6 of the supplementary methods, including Table SM6, and Figures SM3, SM4, SM6, and now Figure SM5. We identified positive PPIs using a dynamic threshold that considers the mean fitness and p-value in each environment. For each dynamic threshold, we estimated the precision and recall based on the reference sets (described in section 6.5 of the supplementary methods). We then chose the threshold with the maximal Matthews correlation coefficient (MCC) to obtain the best balance between precision and recall. We have now added an additional plot (Figure SM5) that shows the precision and recall for the chosen dynamic threshold in each environment.
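
      The idea of choosing the threshold with the maximal MCC can be illustrated with a simplified sketch. It assumes a single numeric score per protein pair and a labeled reference set; the dynamic threshold described above combines mean fitness and p-value, which is not reproduced here. All names and the toy data are hypothetical.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_threshold(scores, labels, thresholds):
    """Return (threshold, mcc) for the candidate threshold with the highest MCC."""
    best = None
    for t in thresholds:
        calls = [s >= t for s in scores]
        tp = sum(c and l for c, l in zip(calls, labels))
        fp = sum(c and not l for c, l in zip(calls, labels))
        fn = sum((not c) and l for c, l in zip(calls, labels))
        tn = sum((not c) and (not l) for c, l in zip(calls, labels))
        value = mcc(tp, fp, tn, fn)
        if best is None or value > best[1]:
            best = (t, value)
    return best

# Toy reference set: True = positive reference pair, False = random reference pair.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [True, True, False, True, False, False, False, False]
threshold, value = best_threshold(scores, labels, [0.3, 0.4, 0.7, 0.8, 0.9])
print(f"chosen threshold = {threshold}, MCC = {value:.3f}")
```

      MCC uses all four cells of the confusion matrix, so it penalizes a threshold that gains recall only by admitting many false positives, which is why it serves as a single summary of the precision-recall balance.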

      -Within the tested space, the Tarassov et al map and the current map could each be compared against a common 'bronze standard' (e.g. literature curated interactions), at least for the SD map, to have an idea about how the quality of the current map compares to that of the previous PCA map. Each could also be compared with the most recent large-scale Y2H study (Yu et al).

      We thank the reviewer for this suggestion. We have now added a figure panel (Figure S4) that compares PPiSeq in SD (2 replicates) to mDHFR-PCA (Tarassov et al., 2008), Y2H (Yu et al., 2008), and our newly constructed ‘bronze standard’ high-confidence positive reference set (PRS, supplementary methods section 6.4).

      • Experimental validation of the network was done by conventional PCA. However, it should be noted that this is a form of technical replication of the DHFR-based PCA assay, and not a truly independent validation. Other large-scale yeast interaction studies (e.g., Yu et al, Science 2008) have assessed a random subset of observed PPIs using an orthogonal approach, calibrated using PRS and RRS sets examined via the same orthogonal method, from which overall performance of the dataset could be determined.

      We appreciate the reviewer’s perspective, since orthogonal validation experiments have been a critical tool to establish assay performance following early Y2H work. We know from careful work done previously that modern orthogonal assays have a low cross validation rate (Yu et al., 2008) and that they tend to be enriched for PPIs in different cellular compartments (Jensen and Bork, 2008), indicating that high false negative rates are the likely explanation. High false negative rates have been confirmed here and elsewhere using positive reference sets (e.g. Y2H 80%, PCA 80%, PPiSeq 74% using the PRS in (Yu et al., 2008)). Therefore, the expectation is that PPiSeq, as with other assays, will have a low rate of validation using an orthogonal assay -- although we would not know if this rate is 10%, 30% or somewhere in between without performing the work. However, the exact number -- whether it be 10% or 30% -- has no practical impact on the main conclusions of this study (focused on network dynamics rather than network enumeration). Neither does that number speak to the confidence in our PPI calls, since a lower number may simply be due to less overlap in the sets of PPIs that are callable by PPiSeq and another assay. Our method uses bar-seq to extend an established mDHFR-PCA assay (Tarassov et al., 2008). The validations we performed were aimed at confirming that our sequencing, barcode counting, fitness estimation, and PPI calling protocols were not introducing excessive noise relative to mDHFR-PCA that resulted in a high number of PPI miscalls. Confirming this, we do indeed find a high rate of validation by lower throughput PCA (50-90%, Figure 3B). Finally, we do include independent tests of the quality of our data by comparing it to positive and random reference sets from literature curated data. We find that our assay performs extremely well (PPV > 61%, TPR > 41%) relative to other high-throughput assays.

      -The Venn diagram in Figure 1G was not very informative in terms of assessing the quality of data. It looks like there is a relatively little overlap between PPIs identified in standard conditions (SD media) in the current study and those of the previous study using a very similar method. Is there any way to know how much of this disagreement can be attributed to each screen being sub-saturation (e.g. by comparing replica screens) and what fraction to systematic assay or environment differences?

      We have now added a figure panel (Figure S4) that compares PPiSeq in SD (2 replicates) to mDHFR-PCA (Tarassov et al., 2008), Y2H (Yu et al., 2008), and our newly constructed ‘bronze standard’ high-confidence positive reference sets (PRS, supplementary methods section 6.4). We find that SD replicates have an overlap coefficient of 79% with each other, ~45% with mDHFR-PCA, ~45% with the ‘bronze standard’ PRS, and ~13% with Y2H. Overlap coefficients between the SD replicates and mDHFR-PCA are much higher than those found between orthogonal methods (Yu et al., 2008), indicating that these two assays are identifying a similar set of PPIs. We do note that PPiSeq and mDHFR-PCA screen for PPIs under different growth conditions (batch liquid growth vs. colonies on agar), so some fraction of the disagreement is due to environmental differences. PPIs that overlap between the two PPiSeq SD replicates are more likely to be found in mDHFR-PCA, PRS, and Y2H, indicating that PPIs identified in a single SD replicate are more likely to be false positives. However, we do find (a lower rate of) overlaps between PPIs identified in only one SD replicate and other methods, suggesting that a single PPiSeq replicate is not finding all discoverable PPIs.
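
      For readers unfamiliar with the metric, the overlap coefficient is commonly defined as the size of the intersection divided by the size of the smaller set. The sketch below assumes that standard (Szymkiewicz-Simpson) definition, which the response does not spell out, and uses made-up PPI sets in which each interaction is an unordered protein pair.

```python
def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|); returns 0.0 if either set is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def ppi(p1, p2):
    """Represent an interaction as an unordered pair so (A, B) equals (B, A)."""
    return frozenset((p1, p2))

# Hypothetical PPI calls from two screens.
replicate_1 = {ppi("A", "B"), ppi("B", "C"), ppi("C", "D"), ppi("D", "E")}
replicate_2 = {ppi("B", "A"), ppi("C", "B"), ppi("E", "F")}

print(f"overlap coefficient = {overlap_coefficient(replicate_1, replicate_2):.2f}")
```

      Normalizing by the smaller set means a small, high-confidence data set that is fully contained in a larger one scores 100%, which makes the metric less sensitive to differences in screen size than the Jaccard index.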

      -In Figure S5C, the environment-specificity rate of PPIs might be inflated due to the fact that authors only test for the absence of SD hits in other conditions, and the SD condition is the only condition that has been sampled twice during the screening. What would be the environment-specific verification rate if sample hits from each environment were tested in all environments? This seems important, as robustly detecting environment-specific PPIs is one of the key points of the study.

      We use PPIs found in the SD environment to determine the environment-specificity because this provides the most conservative (highest) estimate of the number of PPIs found in other environments that were not detectable by our bar-seq assay. To identify PPIs in the SD environment, we pooled fitness estimates across the two replicates (~ 4 fitness estimates per replicate, ~ 8 total). The higher number of replicates results in a reduced rate of false positives (an erroneous fitness estimate has less impact on a PPI call), meaning that we are more confident that PPIs identified in SD are true positives. Because false positives in one environment (but not other environments) are likely to erroneously contribute to the environment-specificity rate, choosing the environment with the lowest rate of false positives (SD) should result in the lowest environment-specificity rate (highest estimate of PPIs found in other environments that were not detectable by our bar-seq assay).

      **Minor issues**

      -Re: "An interaction between the proteins reconstitutes mDHFR, providing resistance to the drug methotrexate and a growth advantage that is proportional to the PPI abundance" (pg 2). It may be more accurate to say "monotonically related" than "proportional" here. Fig 2 from the cited Freschi et al ref does suggest linearity with colony size over a wide range of inferred complex abundances, but non-linearity at low complex abundance. Also note that Freschi measured colony area, which is not linear with exponential growth rate nor with cell count.

      We agree with the reviewer and have changed “proportional” to “monotonically related” on page 2, line 41.

      -Re: "Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) > 61% in SD media, Methods, section 6)." (pg 3). Should state the recall at this PPV.

      We agree with the reviewer and have added the recall (41%) in the main text (page 3, line 26).

      Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) > 61% and true positive rate > 41% in SD media, Methods, section 6).

      -Authors could discuss the extent to which related methods (e.g. PMID: 28650476, PMID: 27107012, PMID: 29165646, PMID: 30217970) would be potentially suitable for screening in different environments.

      We have now added a reference to a barcode-based Y2H study that examined interactions between yeast proteins to the introduction on page 2, line 2:

      Yet, little is known about how PPI networks reorganize on a global scale or what drives these changes. One challenge is that commonly-used high-throughput PPI screening technologies are geared toward PPI identification (Gavin et al., 2002; Ito et al., 2001; Tarassov et al., 2008; Uetz et al., 2000; Yu et al., 2008, Yachie et al., 2016), not a quantitative analysis of relative PPI abundance that is necessary to determine if changes in the PPI network are occurring. The murine dihydrofolate reductase (mDHFR)‐based protein-fragment complementation assay (PCA) provides a viable path to characterize PPI abundance changes because it is a sensitive test for PPIs in the native cellular context and at native protein expression levels (Freschi et al., 2013; Remy and Michnick, 1999; Tarassov et al., 2008).

      We have excluded the references to other barcode-based Y2H studies that the reviewer mentions because they test heterologous proteins within yeast, and the effect of perturbations to yeast on these proteins would be difficult to interpret in the context of our questions. The yeast protein Y2H study, although a wonderful approach and paper, would also not be an appropriate method to examine how PPI networks change across environments because protein fusions are not expressed under their endogenous promoters and must be transported to, in many cases, a non-native compartment (cell nucleus) to be detected. Rather than explicitly discuss the caveats of this particular approach, we have instead chosen to discuss why we use PCA.

      • the term "mutable" is certainly appropriate according to the dictionary definition of changeable. The authors may wish to consider though, that in a molecular biology context the term evokes changeability by mutation (a very interesting but distinct topic). Maybe another term (environment-dependent interactions or ePPIs?) would be clearer. Of course this is the authors' call.

      We thank the reviewer for this suggestion, and have admittedly struggled with the terminology. For clarity of presentation, we strived to have a single word that describes the property of a PPI that is at the core of this manuscript -- how frequently a PPI is found across environments. However, the most descriptive words come with preloaded meanings in PPI research (e.g. transient, stable, dynamic), as does “mutable” with another research field. We are, quite frankly, open to suggestions from the reviewers or editors for a more appropriate word that does not raise similar objections.

      -Some discussion is warranted about the phenomenon that a PPI that is unchanged in abundance could appear to change because of statistical significance thresholds that differ between screens. This would be a difficult question for any such study, and I don't think the authors need to solve it, but just to discuss.

      We agree with the reviewer that significance thresholds could be impacting our interpretations and discuss this idea at length on page 4, line 23 of the Results. This section has been modified to include an additional analysis (excluding 16 ℃ data) in response to another reviewer’s comment:

      Immutable PPIs were likely to have been previously reported by colony-based mDHFR-PCA or other methods, while the PPIs found in the fewest environments were not. One possible explanation for this observation is that previous PPI assays, which largely tested in standard laboratory growth conditions, and variations thereof, are biased toward identification of the least mutable PPIs. That is, since immutable PPIs are found in nearly all environments, they are more readily observed in just one. However, another possible explanation is that, in our assay, mutable PPIs are more likely to be false positives in environment(s) in which they are identified or false negatives in environments in which they are not identified. To investigate this second possibility, we first asked whether PPIs present in very few environments have lower fitnesses, as this might indicate that they are closer to our limit of detection. We found no such pattern: mean fitnesses were roughly consistent across PPIs found in 1 to 6 conditions, although they were elevated in PPIs found in 7-9 conditions (Figure S6A). To directly test the false-positive rate stemming from pooled growth and barcode sequencing, we validated randomly selected PPIs within each mutability bin by comparing their optical density growth trajectories against controls (Figure 3B). We found that mutable PPIs did indeed have lower validation rates in the environment in which they were identified, yet putative false positives were limited to ~50%, and, within a bin, do not differ between PPIs that have been previously identified and those that have been newly discovered by our assay (Figure S6B). We also note that mutable PPIs might be more sensitive to environmental differences between our large pooled PPiSeq assays and clonal 96-well validation assays, indicating that differences in validation rates might be overstated. 
To test the false-negative rate, we assayed PPIs identified in only SD by PPiSeq across all other environments by optical density growth and found that PPIs can be assigned to additional environments (Figure S6C). However, the number of additional environments in which a PPI was detected was generally low (2.5 on average), and the interaction signal in other environments was generally weaker than in SD (Figure S6D). To better estimate how the number of PPIs changes with PPI mutability, we used these optical density assays to model the validation rate as a function of the mean PPiSeq fitness and the number of environments in which a PPI is detected. This accurate model (Spearman's r =0.98 between predicted and observed, see Methods) provided confidence scores (predicted validation rates) for each PPI (Table S5) and allowed us to adjust the true positive PPI estimate in each mutability bin. Using this more conservative estimate, we still found a preponderance of mutable PPIs (Figure S6E). Finally, we used a pair of more conservative PPI calling procedures that either identified PPIs with a low rate of false positives across all environments (FPR

      We later examine major conclusions of our study using more conservative calling procedures, and find that they are consistent. On page 6, line 14:

      Both the co-expression and co-localization patterns were also apparent in our higher confidence PPI sets (Figures S7B, S7C, S8B, and S8C), indicating that they are not caused by different false positive rates between the mutability bins.

      And on page 6, line 19:

      We binned proteins by their PPI degree, and, within each bin, determined the correlation between the mutability score and another gene feature (Figure 4C and S12A, Table S8) (Costanzo et al., 2016; Finn et al., 2014; Gavin et al., 2006; Holstege et al., 1998; Krogan et al., 2006; Levy and Siegal, 2008; Myers et al., 2006; Newman et al., 2006; Östlund et al., 2010; Rice et al., 2000; Stark et al., 2011; Wapinski et al., 2007; Ward et al., 2004; Yang, 2007; Yu et al., 2008). These correlations were also calculated using our higher confidence PPI sets, confirming results from the full data set (Figures S7D, S7E, S8D, and S8E). We found that mutable hubs (> 15 PPIs) have more genetic interactions, in agreement with predictions from co-expression data (Bertin et al., 2007; Han et al., 2004), and that their deletion tends to cause larger fitness defects.

      -More discussion would be helpful about the idea that immutability may to some extent favor interactions that PCA is better able to detect (possibly including membrane proteins?)

      We agree with the reviewer and have now added this potential caveat to the Discussion on page 9, line 4:

      Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately 3-fold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs, however a more important outcome could be the identification of coordinated network changes across conditions. Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs. Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. 
growth as a colony, anaerobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

      -Re: "As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins." (pg 6) This is a cool result. To what extent was this result driven by members of one or two complexes? If so, it would be worth noting them.

      We thank the reviewer for this question. We have now included Figure S13, which shows the number and size of protein complexes that underlie the finding that mutable hubs are more likely to participate in multiple protein complexes. We find that proteins in our screen that participate in multiple complexes are distributed over a wide range of complexes, indicating that this observation is not driven by one or two complexes. On page 6, line 34:

      As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins (Figures S13A-C) (Costanzo et al., 2016).

      -Re: "Borrowing a species richness estimator from ecology (Jari Oksanen et al., 2019), we estimate that there are ~10,840 true interactions within our search space across all environments, ~3-fold more than are detected in SD (note difference to Figure 3, which counts observed PPIs)." (pg 8) Should note that this only allows estimation of the number of interactions that are detectable by PCA methods. Previous work (Braun et al, 2019) showed that every known protein interaction assay (including PCA approaches) can only detect a fraction of bona fide interactions.

      We agree with the reviewer and have modified the discussion to make this point explicit on page 9, line 4:

      Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately 3-fold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs, however a more important outcome could be the identification of coordinated network changes across conditions.

      We continue in this paragraph to discuss the implications:

      Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs.
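
      To make the richness-estimation idea concrete, here is a sketch of one classic incidence-based estimator from ecology, Chao2 with a small-sample factor. The response does not state which estimator was used, so this choice, the function name, and the toy counts are assumptions for illustration only. PPIs play the role of species and environments the role of sampling sites.

```python
from collections import Counter

def chao2(incidence_counts, n_sites):
    """Estimate total richness from how many sites each observed item occurs in.

    incidence_counts: for each observed PPI, the number of environments
                      in which it was detected (each value >= 1).
    n_sites:          number of environments sampled.
    """
    observed = len(incidence_counts)
    freq = Counter(incidence_counts)
    q1, q2 = freq[1], freq[2]          # items seen in exactly one / two sites
    correction = (n_sites - 1) / n_sites
    if q2 > 0:
        unseen = correction * q1 * q1 / (2 * q2)
    else:
        # Bias-corrected form, used when no item occurs in exactly two sites.
        unseen = correction * q1 * (q1 - 1) / 2
    return observed + unseen

# Hypothetical data: 8 observed PPIs across 9 environments.
counts = [1, 1, 1, 1, 2, 2, 3, 9]
print(f"observed = {len(counts)}, estimated total = {chao2(counts, 9):.2f}")
```

      As the reviewer notes, any such estimate is bounded by what the assay can detect: items that can never be observed under any sampled condition contribute nothing to the singleton and doubleton counts that drive the extrapolation.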

      -Re: "This analysis shows that the number of PPIs present across all environments is much larger than the number observed in a single condition, but that it is feasible to discover most of these new PPIs by sampling a limited number of conditions." (pg 8). The main point is surely correct, but it is worth noting that extrapolation to the number of true interactions depends on the nine chosen environments being representative of all environments. The situation could change under more extreme, e.g., anaerobic, conditions.

      We agree with the reviewer and make this point explicit, continuing from the paragraph quoted above on page 9, line 22:

      Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. growth as a colony, anaerobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

      -It stands to reason that proteins expressed in all conditions will yield less mutable interactions, if 'mutability' is primarily due to expression change at the transcriptional level. They should at least discuss that measuring mRNA levels could resolve questions about this. Could use Waern et al G3 2013 data (H2O2, SD, HU, NaCl) to predict the dynamic interactome purely by node removal, and see how conclusions would change.

      We agree with the reviewer that mRNA abundance could potentially be used as a proxy for protein abundance and have added this point on page 10, line 28:

      Here we use homodimer abundance as a proxy for protein abundance. However, genome-wide mRNA abundance measures could be used as a proxy for protein abundance or protein abundance could be measured directly in the same pool (Levy et al., 2014) by, for example, attaching a full length mDHFR to each gene using “swap tag” libraries mentioned above (Weill et al., 2018; Yofe et al., 2016).

      However, using mRNA abundance as a proxy for protein abundance in this study has several important caveats that would make interpretation difficult. First, mRNA and protein abundance correlate, but not perfectly (R2 = 0.45) (Lahtvee et al., 2017), and our findings suggest that post-translational regulation may be important to driving PPI changes. Second, mRNA abundance measures are for a single time point, while our PPI measures coarse grain over a growth cycle (lag, exponential growth, diauxic shift, saturation). Although we may be able to take multiple mRNA measures across the cycle, time delays between changes in mRNA and protein levels, combined with the fact that we do not know when a PPI is occurring or most prominent over the cycle, would pose a significant challenge to making any claims that PPI changes are driven by changes in protein abundance. We instead chose to focus on a subset of proteins (homodimers) where abundance measures can be coarse grained in the same way as PPI measures. In the above quote, we point to a potential method by which this can be done for all proteins. We also point to how a continuous culturing design could be used to better determine how protein (or mRNA proxy) abundance impacts PPI abundance on page 10, line 6:

      Finally, our assays were performed across cycles of batch growth meaning that changes in PPI abundance across a growth cycle (e.g. lag, exponential growth, saturation) are coarse grained into one measurement. While this method potentially increases our chance of discovering a diverse set of PPIs, it might have an unpredictable impact on the relationship between fitness and PPI abundance (Li et al., 2018). To overcome these issues, strains containing natural or synthetic PPIs with known abundances and intracellular localizations could be spiked into cell pools to calibrate the relationship between fitness and PPI abundance in each environment. In addition, continuous culturing systems may be useful for refining precision of growth-based assays such as ours.

      -The analysis showing that many interactions are likely due to post-translational modifications is very interesting, but caveats should be discussed. Where heterodimers do not fit the expression-level dependence model, some cases of non-fitting may simply be due to measurement error or non-linearity in the relationship between abundance and fitness.

      We show the measurement error in Figures 1, S2, S3. While we agree with the reviewer that measurement error is a general caveat for all results reported, we do not feel that it is necessary to point to that fact in this particular case, which uses a logistic regression to report that PPI mutability was the best predictor of fit to the expression-level dependence model. We discuss the non-linearity caveat on page 9, line 41:

      Our assay detected subtle fitness differences across environments (Fig S5B and S5C), which we used as a rough estimate for changes in relative PPI abundance. While it would be tempting to use fitness as a direct readout of absolute PPI abundance within a cell, non-linearities between fitness and PPI abundance may be common and PPI dependent. For example, the relative contribution of a reconstructed mDHFR molecule to fitness might diminish at high PPI abundances (saturation effects) and fitness differences between PPIs may be caused, in part, by differences in how accessible a reconstructed mDHFR molecule is to substrate. In addition, environmental shifts might impact cell growth rate, initiate a stress response, or result in other unpredictable cell effects that impact the selective pressure of methotrexate and thereby fitness (Figure S2 and S3).

      -Line numbers would have been helpful to note more specific minor comments

      We are sorry for this inconvenience. We have added line numbers in our revised manuscript.

      -Sequence data should be shared via the Short-Read Archive.

      The raw sequencing data have been uploaded to the Short-Read Archive. We mentioned it in the Data and Software Availability section on page 68, line 41.

      Raw barcode sequencing data are available from the NIH Sequence Read Archive as accession PRJNA630095 (https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP259652).

      Reviewer #3 (Significance (Required)):

      Knowledge of protein-protein interactions (PPIs) provides a key window on biological mechanism, and unbiased screens have informed global principles underlying cellular organization. Several genome-scale screens for direct (binary) interactions between yeast proteins have been carried out, and while each has provided a wealth of new hypotheses, each has been sub-saturation. Therefore, even given multiple genome-scale screens our knowledge of yeast interactions remains incomplete. Different assays are better suited to find different interactions, and it is now clear that every assay evaluated thus far is only capable (even in a saturated screen) of detecting a minority of true interactions. More relevant to the current study, no binary interaction screen has been carried out at the scale of millions of protein pairs outside of a single 'baseline' condition.

      The study by Liu et al is notable from a technology perspective in that it is one of several recombinant-barcode approaches that have been developed to multiplex pairwise combinations of two barcoded libraries. Although other methods have been demonstrated at the scale of 1M protein pairs, this is the first study using such a technology at the scale of >1M pairs across multiple environments.

      A limitation is that this study is not genome-scale, and the search space is biased towards proteins for which interactions were previously observed in a particular environment. This is perhaps understandable, as it made the study more tractable, but this does add caveats to many of the conclusions drawn. These would be acceptable if clearly described and discussed. There were also questions about data quality and assessment that would need to be addressed.

      Assuming issues can be addressed, this is a timely study on an important topic, and will be of broad interest given the importance of protein interactions and the status of S. cerevisiae as a key testbed for systems biology.

      *Reviewers' expertise:* Interaction assays, next-generation sequencing, computational genomics. Less able to assess evolutionary biology aspects.

      References

      Brauer, M.J., Huttenhower, C., Airoldi, E.M., Rosenstein, R., Matese, J.C., Gresham, D., Boer, V.M., Troyanskaya, O.G., and Botstein, D. (2007). Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast. Mol. Biol. Cell 19, 352–367.

      Breker, M., Gymrek, M., and Schuldiner, M. (2013). A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. J. Cell Biol. 200, 839–850.

      Chong, Y.T., Koh, J.L.Y., Friesen, H., Kaluarachchi Duffy, S., Cox, M.J., Moses, A., Moffat, J., Boone, C., and Andrews, B.J. (2015). Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis. Cell 161, 1413–1424.

      Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell 11, 4241–4257.

      Hart, G.T., Ramani, A.K., and Marcotte, E.M. (2006). How complete are current yeast and human protein-interaction networks? Genome Biol. 7, 120.

      Hilliker, A., Gao, Z., Jankowsky, E., and Parker, R. (2011). The DEAD-box protein Ded1 modulates translation by the formation and resolution of an eIF4F-mRNA complex. Mol. Cell 43, 962–972.

      Isasa, M., Suñer, C., Díaz, M., Puig-Sàrries, P., Zuin, A., Bichmann, A., Gygi, S.P., Rebollo, E., and Crosas, B. (2015). Cold Temperature Induces the Reprogramming of Proteolytic Pathways in Yeast. J. Biol. Chem. jbc.M115.698662.

      Jensen, L.J., and Bork, P. (2008). Not Comparable, But Complementary. Science 322, 56–57.

      Lahtvee, P.-J., Sánchez, B.J., Smialowska, A., Kasvandik, S., Elsemman, I.E., Gatto, F., and Nielsen, J. (2017). Absolute Quantification of Protein and mRNA Abundances Demonstrate Variability in Gene-Specific Translation Efficiency in Yeast. Cell Syst. 4, 495-504.e5.

      Obayashi, T., Kagaya, Y., Aoki, Y., Tadaka, S., and Kinoshita, K. (2019). COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 47, D55–D62.

      Sambourg, L., and Thierry-Mieg, N. (2010). New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size. BMC Bioinformatics 11, 605.

      Tarassov, K., Messier, V., Landry, C.R., Radinovic, S., Molina, M.M.S., Shames, I., Malitskaya, Y., Vogel, J., Bussey, H., and Michnick, S.W. (2008). An in Vivo Map of the Yeast Protein Interactome. Science 320, 1465–1470.

      Yu, H., Braun, P., Yıldırım, M.A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science 322, 104–110.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      The manuscript "A large accessory protein interactome is rewired across environments" by Liu et al. scales up a previously-described method (PPiSeq) to test a matrix of ~1.6 million protein pairs for direct protein-protein interactions in each of 9 different growth environments.

      While the study found a small fraction of immutable PPIs that are relatively stable across environments, the vast majority were 'mutable' across environments. Surprisingly, PPIs detected only in one environment made up more than 60% of the map. In addition to a false positive fraction that can yield apparently-mutable interactions, retest experiments demonstrate (not surprisingly) that environment-specificity can sometimes be attributed to false-negatives. The study authors predict that the whole subnetwork within the space tested will contain 11K true interactions.

      Much of the environment-specific rewiring seemed to take place in an 'accessory module', which surrounds the core module made of mostly immutable PPIs. A number of interesting network clustering and functional enrichment analyses are performed to characterize the network overall and 'mutable' interactions in particular. The study reports other global properties such as expression level, protein abundance and genetic interaction degree that differ between mutable and immutable PPIs. One of the interesting findings was evidence that many environmentally mutable PPI changes are regulated post-translationally. Finally, the authors provide a case study about network rewiring related to glucose transport.

      Major issues

      -The results section should more prominently describe the dimensions of the matrix screen, both in terms of the set of protein pairs attempted and the set actually screened (I think this was 1741 x 1113 after filtering?). More importantly, the study should acknowledge in the introduction that this was NOT a random sample of protein pairs, but rather focused on pairs for which interaction had been previously observed in the baseline condition. This major bias has a potentially substantial impact on many of the downstream analyses. For example, any gene which was not expressed under the conditions of the original Tarassov et al. study on which the screening space was based will not have been tested here. Thus, the study has systematically excluded interactions involving proteins with environment-dependent expression, except where they happened to be expressed in the single Tarassov et al. environment. Heightened connectivity within the 'core module' may result from this bias, and if Tarassov et al had screened in hydrogen peroxide (H2O2) instead of SD media, perhaps the network would have exhibited a core module in H2O2 decorated by less-densely connected accessory modules observed in other environments. The paper should clearly indicate which downstream analyses have special caveats in light of this design bias.

      -Related to the previous issue, a quick look at the proteins tested (if I understood them correctly) showed that they were enriched for genes encoding the elongator holoenzyme complex, DNA-directed RNA polymerase I complex, membrane docking and actin binding proteins, among other functional enrichments. Genes related to DNA damage (endonuclease activity and transposition) were depleted. It was unclear whether the functional enrichment analyses described in the paper reported enrichments relative to what would be expected given the bias inherent to the tested space.

      -Re: data quality. To the study's great credit, they incorporated positive and random reference sets (PRS and RRS) into the screen. However, the results from this were concerning: Table SM6 shows that assay stringency was set such that between 1 and 3 out of 67 RRS pairs were detected. This specificity would be fine for an assay intended to retest or validate previous hits, where the prior probability of a true interaction is high, but in large-scale screening the prior probability of true interactions that are detectable by PCA is much lower, and a higher specificity is needed to avoid being overwhelmed by false positives. Consider this back of the envelope calculation: Let's say that the prior probability of true interaction is 1% as the authors suggest (pg 49, section 6.5), and if PCA can optimistically detect 30% of these pairs, then the number of true interactions we might expect to see in an RRS of size 67 is 1% × 30% × 67 = 0.2. This calculation suggests that a stringency allowing 1 hit in RRS will yield 80% [ (1 - 0.2) / 1 ] false positives, and a stringency allowing 3 hits in RRS will yield 93% [ (3 - 0.2) / 3 ] false positives. How do the authors reconcile these back of the envelope calculations from their PRS and RRS results with their estimates of precision?

      -Methods for estimating precision and recall were not sufficiently well described to assess. Precision vs recall plots would be helpful to better understand this tradeoff as score thresholds were evaluated.

      -Within the tested space, the Tarassov et al map and the current map could each be compared against a common 'bronze standard' (e.g. literature curated interactions), at least for the SD map, to have an idea about how the quality of the current map compares to that of the previous PCA map. Each could also be compared with the most recent large-scale Y2H study (Yu et al).

      -Experimental validation of the network was done by conventional PCA. However, it should be noted that this is a form of technical replication of the DHFR-based PCA assay, and not a truly independent validation. Other large-scale yeast interaction studies (e.g., Yu et al, Science 2008) have assessed a random subset of observed PPIs using an orthogonal approach, calibrated using PRS and RRS sets examined via the same orthogonal method, from which overall performance of the dataset could be determined.

      -The Venn diagram in Figure 1G was not very informative in terms of assessing the quality of data. It looks like there is relatively little overlap between PPIs identified in standard conditions (SD media) in the current study and those of the previous study using a very similar method. Is there any way to know how much of this disagreement can be attributed to each screen being sub-saturation (e.g. by comparing replica screens) and what fraction to systematic assay or environment differences?

      -In Figure S5C, the environment-specificity rate of PPIs might be inflated because the authors only test for the absence of SD hits in other conditions, and the SD condition is the only condition that has been sampled twice during the screening. What would be the environment-specific verification rate if sample hits from each environment were tested in all environments? This seems important, as robustly detecting environment-specific PPIs is one of the key points of the study.
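
      The back-of-the-envelope estimate in the RRS comment above can be reproduced in a few lines of Python. This is only a sketch of the reviewer's arithmetic: the 1% prior, the optimistic 30% PCA sensitivity and the RRS size of 67 come from the comment, while the function and variable names are illustrative and do not appear in the manuscript.

```python
def expected_true_in_rrs(prior, sensitivity, rrs_size):
    """Expected number of true, detectable interactions in a random reference set."""
    return prior * sensitivity * rrs_size


def false_positive_fraction(rrs_hits, expected_true):
    """Fraction of RRS hits not accounted for by expected true interactions."""
    return (rrs_hits - expected_true) / rrs_hits


# Values quoted in the comment: 1% prior, 30% sensitivity, 67 RRS pairs.
expected = expected_true_in_rrs(prior=0.01, sensitivity=0.30, rrs_size=67)

# Table SM6 is reported to show between 1 and 3 RRS hits.
for hits in (1, 3):
    fp = false_positive_fraction(hits, expected)
    print(f"{hits} RRS hit(s): ~{fp:.0%} false positives")
```

      Changing the assumed prior or sensitivity shows how strongly the implied false-positive fraction depends on these two inputs.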

      Minor issues

      -Re: "An interaction between the proteins reconstitutes mDHFR, providing resistance to the drug methotrexate and a growth advantage that is proportional to the PPI abundance" (pg 2). It may be more accurate to say "monotonically related" than "proportional" here. Fig 2 from the cited Freschi et al ref does suggest linearity with colony size over a wide range of inferred complex abundances, but non-linearity at low complex abundance. Also note that Freschi measured colony area, which is not linear with exponential growth rate nor with cell count.

      -Re: "Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) > 61% in SD media, Methods, section 6)." (pg 3). Should state the recall at this PPV.

      -Authors could discuss the extent to which related methods (e.g. PMID: 28650476, PMID: 27107012, PMID: 29165646, PMID: 30217970) would be potentially suitable for screening in different environments.

      -The term "mutable" is certainly appropriate according to the dictionary definition of changeable. The authors may wish to consider though, that in a molecular biology context the term evokes changeability by mutation (a very interesting but distinct topic). Maybe another term (environment-dependent interactions or ePPIs?) would be clearer. Of course this is the authors' call.

      -Some discussion is warranted about the phenomenon that a PPI that is unchanged in abundance could appear to change because of statistical significance thresholds that differ between screens. This would be a difficult question for any such study, and I don't think the authors need to solve it, but just to discuss.

      -More discussion would be helpful about the idea that immutability may to some extent favor interactions that PCA is better able to detect (possibly including membrane proteins?)

      -Re: "As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins." (pg 6) This is a cool result. To what extent was this result driven by members of one or two complexes? If so, it would be worth noting them.

      -Re: "Borrowing a species richness estimator from ecology (Jari Oksanen et al., 2019), we estimate that there are ~10,840 true interactions within our search space across all environments, ~3-fold more than are detected in SD (note difference to Figure 3, which counts observed PPIs)." (pg 8) Should note that this only allows estimation of the number of interactions that are detectable by PCA methods. Previous work (Braun et al, 2019) showed that every known protein interaction assay (including PCA approaches) can only detect a fraction of bona fide interactions.

      -Re: "This analysis shows that the number of PPIs present across all environments is much larger than the number observed in a single condition, but that it is feasible to discover most of these new PPIs by sampling a limited number of conditions." (pg 8). The main point is surely correct, but it is worth noting that extrapolation to the number of true interactions depends on the nine chosen environments being representative of all environments. The situation could change under more extreme, e.g., anaerobic, conditions.

      -It stands to reason that proteins expressed in all conditions will yield less mutable interactions, if 'mutability' is primarily due to expression change at the transcriptional level. The authors should at least discuss that measuring mRNA levels could resolve questions about this. They could use the Waern et al G3 2013 data (H2O2, SD, HU, NaCl) to predict the dynamic interactome purely by node removal, and see how conclusions would change.

      -The analysis showing that many interactions are likely due to post-translational modifications is very interesting, but caveats should be discussed. Where heterodimers do not fit the expression-level dependence model, some cases of non-fitting may simply be due to measurement error or non-linearity in the relationship between abundance and fitness.

      -Line numbers would have been helpful to note more specific minor comments

      -Sequence data should be shared via the Short-Read Archive.
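
      To make the species-richness comment above concrete, the sketch below computes an incidence-based Chao estimate from a PPI-by-environment detection matrix. It assumes the Chao formula with a small-sample correction, S_obs + ((m - 1) / m) * Q1^2 / (2 * Q2); the manuscript cites Oksanen et al. (2019), but the text quoted here does not say which estimator was actually used, and the toy matrix is invented purely for illustration. As the comment notes, any such estimate covers only interactions that the assay is able to detect.

```python
def chao_incidence(detections):
    """Incidence-based Chao richness estimate.

    detections: list of rows, one per PPI, each a list of 0/1 flags
    (one flag per environment). Rows with no detections are ignored.
    """
    m = len(detections[0])                      # number of environments sampled
    counts = [sum(row) for row in detections if sum(row) > 0]
    s_obs = len(counts)                         # PPIs seen at least once
    q1 = sum(1 for c in counts if c == 1)       # seen in exactly one environment
    q2 = sum(1 for c in counts if c == 2)       # seen in exactly two environments
    if q2 == 0:
        raise ValueError("undefined when no PPI is seen in exactly two environments")
    return s_obs + ((m - 1) / m) * q1 ** 2 / (2 * q2)


# Hypothetical toy data: 6 PPIs scored in 3 environments.
toy = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
]
estimate = chao_incidence(toy)
print(round(estimate, 2))
```

      The estimate grows with the number of PPIs seen in only one environment, which is why a map dominated by single-environment hits implies many undetected interactions.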

      Significance

      Knowledge of protein-protein interactions (PPIs) provides a key window on biological mechanism, and unbiased screens have informed global principles underlying cellular organization. Several genome-scale screens for direct (binary) interactions between yeast proteins have been carried out, and while each has provided a wealth of new hypotheses, each has been sub-saturation. Therefore, even given multiple genome-scale screens our knowledge of yeast interactions remains incomplete. Different assays are better suited to find different interactions, and it is now clear that every assay evaluated thus far is only capable (even in a saturated screen) of detecting a minority of true interactions. More relevant to the current study, no binary interaction screen has been carried out at the scale of millions of protein pairs outside of a single 'baseline' condition.

      The study by Liu et al is notable from a technology perspective in that it is one of several recombinant-barcode approaches that have been developed to multiplex pairwise combinations of two barcoded libraries. Although other methods have been demonstrated at the scale of 1M protein pairs, this is the first study using such a technology at the scale of >1M pairs across multiple environments.

      A limitation is that this study is not genome-scale, and the search space is biased towards proteins for which interactions were previously observed in a particular environment. This is perhaps understandable, as it made the study more tractable, but this does add caveats to many of the conclusions drawn. These would be acceptable if clearly described and discussed. There were also questions about data quality and assessment that would need to be addressed.

      Assuming issues can be addressed, this is a timely study on an important topic, and will be of broad interest given the importance of protein interactions and the status of S. cerevisiae as a key testbed for systems biology.

      Reviewers' expertise: Interaction assays, next-generation sequencing, computational genomics. Less able to assess evolutionary biology aspects.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Report on Liu et al. "A large accessory protein interactome is rewired across environments"

      Liu et al. use a mDHFR-based, pooled barcode sequencing / competitive growth / mild methotrexate selection method to investigate changes of PPI abundance of 1.6 million protein pairs across 9 different growth conditions. Because most PPI screens aim to identify novel PPIs in standard growth conditions, the currently known yeast PPI network may be incomplete. The key concept is to define "immutable" PPIs that are found in all conditions and "mutable" PPIs that are present in only some conditions. The assay identified 13764 PPIs across the 9 conditions, using optimized fitness cut offs. Steady PPIs, i.e. those found across all environments, were identified in membrane compartments and cell division. Processes associated with the chromosome, transcription, protein translation, RNA processing and ribosome regulation were found to change between conditions. Mutable PPIs form modules, as topological analyses reveal.

      Interestingly, a correlation between intrinsic disorder and PPI mutability was found, and mutable PPIs are postulated to be more flexible in the conformational context, while at the same time they are formed by less abundant proteins.

      I appreciate the trick to use homodimerization as an abundance proxy to predict interaction between heterodimers (of proteins that homodimerize). This "mass-action kinetics model" explains the strength of 230 out of 1212 tested heterodimers.

      A validation experiment of the glucose transporter network was performed and 90 "randomly chosen" PPIs that were present in the SD environment were tested in NaCl (osmotic stress) and Raffinose (low glucose) conditions through recording optical density growth trajectories. Hxt5 PPIs stayed similar in the tested conditions, supported by the current knowledge that Hxt5 is highly expressed in stationary phase and under salt stress. In Raffinose, Hxt7, previously reported to increase in mRNA expression, lost most PPIs, indicating that other factors might influence Hxt7 PPIs.

      Points for consideration:

      *) A clear definition of mutable and immutable is missing, or could not be found e.g. at page 4 second paragraph.

      *) Approximately half of the PPIs have been identified in one environment. Many of those mutable PPIs were detected in the 16°C condition. Is there an explanation for the predominance of this specific environment? What are these PPIs about?

      *) A 50% overall retest validation rate is fair and reflects a value comparable to other large-scale approaches. However, what is the actual variation, e.g. between mutable and immutable PPIs, or between conditions, e.g. at 16°C?

      *) What is the R correlation cutoff for PPIs explained in the mass equilibrium model vs. not explained?

      *) 90 "randomly chosen" PPIs for validation. It needs to be demonstrated that these interactions are a random subset, otherwise it could also mean cherry-picked interactions ...

      *) Figure 4 provides interesting correlations with the goal to reveal properties of mutable and less mutable PPIs. PPIs detected in the PPIseq screen can partially be correlated to co-expression (4A) as well as co-localization. Does it make sense to correlate the co-expression across number of conditions? Are the expression correlations condition-specific? In this graph it could be that expression correlation stems from conditions 1 and 2 and the interaction takes place in 4 and 5, still leading to the same conclusion ... Is the picture of the co-expression correlation similar when you simply look at individual environments like in S4A?

      *) Figure 4C: Interesting, how dependent are the various categories?

      *) Figure 4F: When binned by the number of environments in which the PPI was found, the distribution peaks at 6 environments and decreases with higher and lower numbers of environments. The description/explanation in the text clearly says something else.

      *) Figure 6: I apologize, but for my taste this is not a final figure 6 for this study. Investigation of different environments increases the PPI network in yeast, yes, yet it is very well known that a saturation is reached after testing of several conditions, different methods and even screening repetition (sampling). It does not represent an important outcome. Move to suppl or remove.

      Significance

      Liu et al. increase the current PPI network in yeast and offer a substantial dataset of novel PPIs seen in specific environments only. This resource can be used to further investigate the biological meaning of the PPI changes. The data set is compared to previous DHFR-based data, providing some sort of quality benchmarking. Mutable interactions are characterized well. Clearly a next step could be to start some "orthogonal" validation, i.e. beyond yeast growth under methotrexate treatment.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript is clearly written and the figures appropriate and informative. Some descriptions of data analyses are a little dense but reflect what would appear to be long, hard efforts on the part of the authors to identify and control for possible sources of misinterpretation due to sensitivities of parameters in their fitness model. The authors' efforts to retest interactions under non-competition conditions allay most concerns that I would have. One problem though that I could not see explicitly addressed was that of potential effects of interactions between methotrexate and the other conditions and how this is controlled for. Specifically, it could be argued that the fact that a particular PPI is observed under a specific condition could have more to do with a synthetic effect of treatment of cells with a drug plus methotrexate. Is this controlled for and how? I raise this because in a chemical genetic screen for fitness it was shown that methotrexate is particularly promiscuous for drug-drug interactions (Hillenmeyer ME, et al. Science 2008). I tried to think of how this works but couldn't come up with anything immediately. I'd appreciate if the authors would take a crack at resolving this issue. Otherwise I have no further concerns about the manuscript.

      Significance

      Liu et al expand on previous work from the Levy group to explore a massive in vivo protein interactome in the yeast S. cerevisiae. They achieve this by performing screens across 9 growth conditions, which, with replication, results in a total of 44 million measurements. Interpreting their results based on a fitness model for pooled growth under methotrexate selection, they make the key observation that there is a vastly expanded pool of protein-protein interactions (PPI) that are found under only one or two conditions compared to a more limited set of PPI that are found under a broad set of conditions (mutable versus immutable interactors). The authors show that this dichotomy suggests some important features of proteins and their PPIs that raise important questions about functionality and evolution of PPIs. Among these are that mutable PPIs are enriched for cross-compartmental, high disorder and higher rates of evolution and subcellular localization of proteins to chromatin, suggesting roles in gene regulation that are associated with cellular responses to new conditions. At the same time these interactions are not enriched for changes in abundance. These results are in contrast to those of immutable PPIs, which seem to form a core background noise, more determined by changes in abundance than what the authors interpret must be post-translational processes that may drive, for instance, changes in subcellular localization resulting in appearance of PPIs under specific conditions. The authors are also able to address a couple of key issues about protein interactomes, including the controversial Party-date Hub hypothesis of Vidal, in which they could now affirm support for this hypothesis based on their results and notably negative correlation of PPIs to protein abundance for mutable PPIs. Finally, they also addressed the problem of predicting the upper limit of PPIs in yeast, showing the remarkable result that it may be no more than about 2 times the number of proteins expressed by yeast. Such an upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.

      This manuscript is a very important contribution to understanding dynamics of molecular networks in living cells and should be published with high priority.
      
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their close reading and constructive comments on our manuscript. We believe that their insight has substantially strengthened our manuscript. Please find our response/revision plan for each comment below (in blue). Note, because of the substantial changes to the figures and the additional experiments that we are undertaking, we have not initially revised the text. The proposed textual revisions will be included in the full revision.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The Katz lab has contributed greatly to the field of epigenetic reprogramming over the years, and this is another excellent paper on the subject. I enjoyed reviewing this manuscript and don't have any major comments/suggestions for improving it. The findings presented are novel and important, the results are clear cut, and the writing is clear.

      It's important to stress the novelty of the findings, which build upon previous studies from the same lab (upon a shallow look one might think that some of the conclusions were described before, but this is not the case). Despite the fact that this system has been studied in depth before, it remained unclear why and how germline genes are bookmarked by H3K36 in the embryo, and it wasn't known why germline genes are not expressed in the soma.

      To study these questions Carpenter et al. examine multiple phenotypes (developmental aberrations, sterility), that they combine with analysis of multiple genetic backgrounds, RNA-seq, ChIP-seq, single molecule FISH, and fluorescent transgenes.

      Previous observations from the Katz lab suggested that progeny derived from spr-5;met-2 double mutants can develop abnormally. They show here that the progeny of these double mutants (unlike spr-5 and met-2 single mutants) develop severe and highly penetrant developmental delays, a Pvl phenotype, and sterility. They show also that spr-5; met-2 maternal reprogramming prevents developmental delay by restricting ectopic MES-4 bookmarking, and that developmental delay of spr-5;met-2 progeny is the result of ectopic expression of MES-4 germline genes. The bottom line is that they shed light on how SPR-5, MET-2 and MES-4 balance inter-generational inheritance of H3K4, H3K9, and H3K36 methylation, to allow correct specification of germline and somatic cells. This is all very important and relevant also to other organisms.

      **(very) Minor comments:**

      -Since the word "heritable" is used in different contexts, it could be helpful to elaborate, perhaps in the introduction, on the distinction between cellular memory and transgenerational inheritance.

      We are happy to elaborate on this in the revised manuscript.

      -It might be interesting in the Discussion to expand further about the links between heritable chromatin marks and heritable small RNAs. They do hint that the results regarding the silencing of the somatic transgene are especially intriguing.

      We are happy to expand this in the revised manuscript.

      Reviewer #1 (Significance (Required)):

      This is an exciting paper which builds upon years of important work in the Katz lab. The novelty of the paper is in pinpointing the mechanisms that bookmark germline genes by H3K36 in the embryo, and explaining why and how germline genes are prevented from being expressed in the soma.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis. The authors hypothesize that SPR-5 and MET-2 modify chromatin of germline genes (MES-4 targets) in somatic cells, and this is required to silence germline genes in the soma. A few issues need to be resolved to test these ideas and rule out others.

      **Main comments:**

      The authors' hypothesis is that SPR-5 and MET-2 act directly, to modify chromatin of germline genes (MES-4 targets), but an alternate hypothesis is that the key regulated genes are i) MES-4 itself and/or ii) known regulators of germline gene expression, e.g. the piwi pathway. Misregulation of these factors in the soma could be responsible for the phenotypes. Therefore, the authors should analyze expression (smFISH and where possible protein stains) for MES-4 and PIWI components in the embryo and larvae of wildtype, double and triple mutant strains. These experiments are essential and not difficult to perform.

      In our RNA-seq analysis we see a small elevation of MES-4 itself (average 1.18 log2 fold change across 5 replicates). This does not seem likely to be solely driving such a dramatic phenotype. Nevertheless, it is possible that the small increase in expression of MES-4 itself could be contributing. To determine if MES-4 is being ectopically expressed in spr-5; met-2 double mutants, we have obtained a tagged version of MES-4 from Dr. Susan Strome and will use this to examine the localization of MES-4 protein in spr-5; met-2 double mutants. We are definitely interested in the potential interaction between PIWI components and the histone modifying enzymes that we have explored in this study. However, since RNAi of MES-4 is sufficient to rescue the developmental delay of spr-5; met-2 mutants, we have chosen to focus on that interaction in this paper. In the future, we hope to examine the role of PIWI components in this system.

      A second aspect of the hypothesis is that spr-5 and met-2 act before mes-4 and that while these genes are maternally expressed, they act in the embryo. There really aren't data to support these ideas - the timing and location of the factors' activities have not been pinned down. One way to begin to address this question would be to perform smFISH on the target genes and on mes-4 in embryos and determine when and where changes first appear. smFISH in embryos is critical - relying on L1 data is too late. If timing data cannot be obtained, then I suggest that the authors back off of the timing ideas or at least explain the caveats. Certainly, figure 8 should be simplified and timing removed. (note: Typical maternal effect tests probably won't work because if the genes' RNAs are germline deposited, then a maternal effect test will reflect when the RNA is expressed but not when the protein is active. A TS allele would be needed, and that may not be available.)

      To determine the timing of the ectopic expression of MES-4 targets, we have performed smFISH on two MES-4 targets in embryos. Thus far, these experiments show that MES-4 targets are ectopically expressed in the embryo, but only after the maternal to zygotic transition. This is consistent with our proposed model. A figure containing this data will be added to the revised manuscript. In addition, our model is predicated on the known embryonic protein localization of SPR-5 and MES-4. Maternal SPR-5 protein is present in the early embryo up to around the 8-cell stage, but absent in later embryos (Katz et al., 2009). In addition, in mice, the SPR-5 ortholog LSD1 is required maternally prior to the 2-cell stage (Wasson et al., 2016 and Ancelin et al., 2016). In contrast, MES-4 continues to be expressed in the embryo until later embryonic stages where it is concentrated into the germline precursors Z2 and Z3 (Fong et al., 2002). This is consistent with SPR-5 establishing a chromatin state that continues to be antagonized by MES-4. There is evidence that MET-2 is expressed both in early embryos and later embryos. However, since the phenotype of MET-2 so closely resembles the phenotype of SPR-5 (Kerr et al., 2014), we have included it in our model as working with SPR-5. Further experimentation will be required to substantiate the model, but we believe the model is consistent with all of the current data.

      Writing/clarity:

      -It would be helpful to include a table that lists the specific genes studied in the paper and how they behaved in the different assays e.g. RNAseq 1, RNAseq 2, MES-4 target, ChIP. That way, readers will understand each of the genes better.

      We are happy to include a table in the revised manuscript.

      -At the end of each experiment, it would be helpful to explain the conclusion and not wait until the Discussion. For readers not in the field, the logic of the Results section is hard to follow.

      This seems like a stylistic choice. Traditionally, papers did not include any conclusions in the results section, and it is our preference to keep our paper organized this way. However, if the reviewer would still like us to change this, we are happy to do so.

      -The model is explained over three pages in the Discussion. It would be great to begin with a single paragraph that summarizes the model/point of the paper simply and clearly.

      The discussion in the revised manuscript will be altered to include this.

      **Specific comments:**

      -Figure 1 has been published previously and should be moved to the supplement.

      In our original paper (Kerr et al.) we reported in the text that spr-5; met-2 mutants have a developmental delay. However, we did not characterize this developmental delay. Nor did we include any images of the double mutants, except for one image of the adult germline phenotype. As a result, we believe that the inclusion of the developmental delay in the main body of this manuscript is warranted.

      -Cite their prior paper for the vulval defects e.g. page 6 or show in supplement.

      We are happy to include a citation of our previous paper for the vulval defects in the revised manuscript.

      -The second RNAseq data should be shown in the Results since it is much stronger. The first RNAseq, which is less robust, should be moved to supplement.

      The revised manuscript will include this alteration.

      -Figure 3 is very nice. Please explain why the RNAs were picked (+ the table, see comment above), and please add here or in a new figure mes-4 and piwi pathway expression data in wildtype vs double/triple mutants.

      We performed RT-PCR on 9 MES-4 targets. These 9 targets were picked because they had the highest ectopic expression in spr-5; met-2 mutants and largest change in H3K36me3 in spr-5; met-2 mutants versus Wild Type. Amongst these 9 genes, we performed smFISH on htp-1 and cpb-1 because they are relatively well characterized as germline genes.

      The revised manuscript will include added panels to supplemental figure 2 showing the expression of PIWI pathway components.

      -Figure 3 here or later, please show if mes-4 RNAi removes somatic expression of target genes.

      We are currently carrying out this experiment. Once it is completed, the data will hopefully be added to the paper.

      -Is embryogenesis delayed?

      Embryogenesis seems to be sped up in spr-5; met-2 mutants. A supplemental figure will be added to the revised manuscript showing this. It is unclear why embryogenesis is sped up. However, this confirms that the developmental delay is unique to the L1/L2 stages.

      -Figure 4 since htp-1 smFISH is so dramatic, it would be helpful to include htp-1 in the lower panels.

      htp-1 will be added to the lower panels in the revised manuscript.

      -Figure 4, please add an extra 2 upper panels showing all the genes in N2 vs spr-5;met-2, for comparison to the mes-4 cohort.

      As a control, we will add panels showing a comparison to all germline genes, excluding MES-4 targets. This new data shows that germline genes that are not MES-4 targets do not have ectopic H3K36me3. This data, which further suggests that the phenomenon is confined to MES-4 targets, is consistent with our results showing that MES-4 RNAi is sufficient to suppress the developmental delay.

      -Figure 6. Please show a control that met-1 RNAi is working.

      We performed RT-PCR to try and confirm that met-1 RNAi was working. Despite controls repeating the MES-4 suppression and verifying that RNAi was working, we were unable to demonstrate that met-1 was knocked down. As a result, we will remove this result from the paper. Importantly, this does not affect the conclusion of the paper.

      -To quantify histone marks more clearly, it would be wonderful to have a graph of the mean log across the gene. Showing the mean numbers would help clarify the degree of the effect. We had an image as an example but it does not paste into the reviewer box. Instead, see figure 2 or figure 4 here: https://www.nature.com/articles/ng.322

      We will attempt to include this analysis in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis.

      This work will be of interest to people following transgenerational inheritance, generally in the C. elegans field. People using other organisms may read it also, although some of the worm genetics may be complicated. Some of the writing suggestions could make a difference.

      I study C. elegans embryogenesis, chromatin and inheritance.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the paper entitled "C. elegans establishes germline versus soma by balancing inherited histone methylation" Carpenter BS et al examined a double mutant worm strain they had previously produced of the H3K4me1/2 demethylase spr-5 and the predicted H3K9me1/me2 methylase met-2. These mutant worms have a developmental delay that arises by the L2 larval stage. They performed an analysis of what genes get misexpressed in these double mutants by performing RNAseq and compare this to datasets generated from other labs on an H3K36me2/me3 methylase MES-4 where they see a high degree of overlap. They validate the misexpression of some germline specific genes in the soma by in situ and validate that there is a dysregulation of H3K36me3 in their double mutant worms. They further find that knocking down mes-4 reverts the developmental delay.

      I think that the authors need to make more of an effort to be a bit more scholarly in terms of placing their work in the context of the field as a whole and also need to add a few additional experiments as well as reorganize a bit before this is ready for publication. Remember that the average reader is not necessarily an expert in C. elegans or this particular field and you really want to try and make the manuscript as accessible to everyone as possible.

      **Major Points**

      1)It would be good to see western blots or quantitative mass spec examining H3K36me3 in the WT and spr-5;met-2 double mutant worms. I believe this was also previously reported by Greer EL et al Cell Rep 2014 in the single spr-5 mutant worm so that work should be cited here in addition to the identification of JMJD-2 as an enzyme involved in the inheritance of H3K4me2 phenotype.

      The ectopic H3K36me3 is confined to a small set of MES-4 targets. We don’t even see ectopic H3K36me3 at non-MES-4 germline genes (see above). Therefore, we don’t expect to see any global differences in bulk H3K36me3. Greer et al reported that there are elevated H3K36me3 levels in spr-5 mutants. This discrepancy may be due to different stages (embryos, germline) present in their bulk preparation. Alternatively, the met-2 mutant may counteract the effect of the spr-5 mutation on H3K36me3. Regardless, we believe that the genome-wide ChIP-seq is more informative than bulk H3K36me3 levels.

      We will add a citation for the Greer paper in the revised manuscript.

      2)Missing from Fig.5 is mes-4 KD by itself. This is needed to determine whether these effects are specific to the spr-5;met-2 double mutants or more general effects that KD of mes-4 would decrease the expression of all these genes to a similar extent. Then statistics should be done to see if the decrease in the WT context is the same or greater than the decrease in the double mutants.

      The MES-4 targets are generally expressed only in the germline and defined by having mes-4 dependent H3K36me3. Knocking down mes-4 would be expected to prevent the expression of these genes in the germline, but this is difficult to test because mes-4 mutants basically don’t make a germline. Regardless, knocking down mes-4 by itself would only assess the role of MES-4 in germline transcription, not the ectopic expression that is being assayed in spr-5; met-2 mutants in Fig 5. Importantly, it remains possible that spr-5; met-2 mutants might also result in an increase in the expression of MES-4 targets in the germline. However, the experiments performed in this manuscript were conducted on L1 larvae, which do not have any germline expression, to eliminate this potential confounding contribution.

      **Minor Points**

      1)A greater attempt needs to be made to be more scholarly for citing previously published literature. This includes work on the inheritance of H3K27 and H3K36 methylation in C. elegans and other species as well. A few papers which seem germane to this story which should be cited in the intro are (Nottke AC et al PNAS 2011, Gaydos LJ et al Science 2014, Ost A et al Cell 2014, Greer EL et al Cell Rep 2014, Siklenka K et al Science 2015, Tabuchi TM et al Nat Comm 2018, Kaneshiro KR et al Nat Comm 2019). This problem is not restricted to the intro.

      Although many of these excellent papers are broadly relevant to this current work, they are not necessarily directly relevant to this paper. For this reason, they were not originally cited. Nevertheless, we will attempt to cite these papers in the revised version when possible.

      2)I think that the authors need to be a little less definitive with their language. Theories should be introduced as possibilities rather than conclusions. They should remove "comprehensive" from the intro, as there are many other methods which could be used to test this.

      Throughout the manuscript, we have tried to be clear about what the data suggest versus what is a model based on the data. Nevertheless, to further clarify this, we are happy to remove “comprehensive” from the intro.

      3)The authors should describe what PIE-1 is. Is this a transcription factor?

      PIE-1 is a transcriptional inhibitor that is thought to block RNA polII elongation by mimicking the CTD of RNA polII and competing for phosphorylation. We are happy to add a reference to this function in the revised manuscript.

      4)The language needs clarification about MES-4 germline genes and bookmark genes. Are these bound by MES-4 or marked with K36me2/3?

      The revised manuscript will be modified to make this definition more clear.

      5)I think Fig S1 E+F should be in the main figure 1 so readers can see the extent of the phenotype.

      The original single image of the spr-5; met-2 adult germline phenotype (including the protruding vulva) was included in our previous publication. In this manuscript, we have now quantified this phenotype, which is why it is included in the supplement here. However, because the original picture was included in our original publication, we prefer to leave it as supplemental.

      6)For Fig S2 it would be good to do the same statistics that is done in Fig 2 and mention them in the text so the readers can see that the overlap is statistically significant.

      We are happy to include these statistics in the revised manuscript.

      7)Fig S2.2 should be yellow blue rather than red green for the colorblind out there.

      Thanks for pointing this out. We are happy to change the colors in the revised manuscript.

      8)When saying "Many of these genes involved in these processes..." the authors need to include numbers and statistics.

      We will amend the revised text to make the definition of the MES-4 genes more clear.

      9)Should use WT instead of N2 and specify what wildtype is in methods.

      We will use WT instead of N2 in the revised manuscript.

      10)Fig. 2A + B could be displayed in a single figure. And Fig 2D seems superfluous and could be combined with 2C or alternatively it could be put in supplementary.

      Figure 2A and 2B were purposely separated to make it clear how many of the overlapped changes are up versus down. In the revised manuscript, Figure 2D will be moved to the supplement.

      11)Non-C. elegans experts won't understand what balancers are. An effort should be made to make this accessible to all. Explaining when genes are heterozygous or homozygous mutants seems relevant here.

      The text of the revised manuscript will be amended to make it more accessible for non-C. elegans readers.

      12)The GO categories (Fig. S2) should be in the main figure and need to be made to look more scientific rather than copied and pasted from a program.

      The GO categories were included to be comprehensive and do not contribute substantially to the main conclusion of the paper. This is why they are supplemental. In the revised manuscript, we will edit the GO results so that they look more scientific.

      13)Fig. 7 seems a bit out of place. If the authors were to KD mes-4 and similarly show that the phenotype reverts that would help justify its inclusion in this paper. Without it, this seems like a bit of an add-on that belongs elsewhere.

      We believe that the somatic expression of a transgene in spr-5; met-2 mutants adds to our potential understanding of how this double mutant may lead to developmental delay. This is true regardless of whether the somatic transgene expression is mes-4 dependent or not.

      Reviewer #3 (Significance (Required)):

      I think this is an interesting and timely piece of work. A little more effort needs to be put in to make sure it is accessible to the average reader and has sufficient inclusion of more of the large body of work on inheritance of histone modifications. I think C. elegans researchers as well as people interested in inheritance and the setup of the germline will be interested in this work.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2's comments on experiments to include or exclude alternative models. I also agree about their statement about rewriting to make it more accessible to others who aren't experts in this specialized portion of C. elegans research. All in all it seems like the experiments which are required by reviewer #2 and myself as well as the rewriting should be quite feasible.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In the paper entitled "C. elegans establishes germline versus soma by balancing inherited histone methylation" Carpenter BS et al examined a double mutant worm strain they had previously produced of the H3K4me1/2 demethylase spr-5 and the predicted H3K9me1/me2 methylase met-2. These mutant worms have a developmental delay that arises by the L2 larval stage. They performed an analysis of what genes get misexpressed in these double mutants by performing RNAseq and compare this to datasets generated from other labs on an H3K36me2/me3 methylase MES-4 where they see a high degree of overlap. They validate the misexpression of some germline specific genes in the soma by in situ and validate that there is a dysregulation of H3K36me3 in their double mutant worms. They further find that knocking down mes-4 reverts the developmental delay.

      I think that the authors need to make more of an effort to be a bit more scholarly in terms of placing their work in the context of the field as a whole and also need to add a few additional experiments as well as reorganize a bit before this is ready for publication. Remember that the average reader is not necessarily an expert in C. elegans or this particular field and you really want to try and make the manuscript as accessible to everyone as possible.

      Major Points

      1)It would be good to see western blots or quantitative mass spec examining H3K36me3 in the WT and spr-5;met-2 double mutant worms. I believe this was also previously reported by Greer EL et al Cell Rep 2014 in the single spr-5 mutant worm so that work should be cited here in addition to the identification of JMJD-2 as an enzyme involved in the inheritance of H3K4me2 phenotype.

      2)Missing from Fig.5 is mes-4 KD by itself. This is needed to determine whether these effects are specific to the spr-5;met-2 double mutants or more general effects that KD of mes-4 would decrease the expression of all these genes to a similar extent. Then statistics should be done to see if the decrease in the WT context is the same or greater than the decrease in the double mutants.

      Minor Points

      1)A greater attempt needs to be made to be more scholarly for citing previously published literature. This includes work on the inheritance of H3K27 and H3K36 methylation in C. elegans and other species as well. A few papers which seem germane to this story which should be cited in the intro are (Nottke AC et al PNAS 2011, Gaydos LJ et al Science 2014, Ost A et al Cell 2014, Greer EL et al Cell Rep 2014, Siklenka K et al Science 2015, Tabuchi TM et al Nat Comm 2018, Kaneshiro KR et al Nat Comm 2019). This problem is not restricted to the intro.

      2)I think that the authors need to be a little less definitive with their language. Theories should be introduced as possibilities rather than conclusions. They should remove "comprehensive" from the intro, as there are many other methods which could be used to test this.

      3)The authors should describe what PIE-1 is. Is this a transcription factor?

      4)The language needs clarification about MES-4 germline genes and bookmark genes. Are these bound by MES-4 or marked with K36me2/3?

      5)I think Fig S1 E+F should be in the main figure 1 so readers can see the extent of the phenotype.

      6)For Fig S2 it would be good to do the same statistics that is done in Fig 2 and mention them in the text so the readers can see that the overlap is statistically significant.

      7)Fig S2.2 should be yellow blue rather than red green for the colorblind out there.

      8)When saying "Many of these genes involved in these processes..." the authors need to include numbers and statistics.

      9)Should use WT instead of N2 and specify what wildtype is in methods.

      10)Fig. 2A + B could be displayed in a single figure. And Fig 2D seems superfluous and could be combined with 2C or alternatively it could be put in supplementary.

      11)Non-C. elegans experts won't understand what balancers are. An effort should be made to make this accessible to all. Explaining when genes are heterozygous or homozygous mutants seems relevant here.

      12)The GO categories (Fig. S2) should be in the main figure and need to be made to look more scientific rather than copied and pasted from a program.

      13)Fig. 7 seems a bit out of place. If the authors were to KD mes-4 and similarly show that the phenotype reverts that would help justify its inclusion in this paper. Without it, this seems like a bit of an add-on that belongs elsewhere.

      Significance

      I think this is an interesting and timely piece of work. A little more effort needs to be put in to make sure it is accessible to the average reader and has sufficient inclusion of more of the large body of work on inheritance of histone modifications. I think C. elegans researchers as well as people interested in inheritance and the setup of the germline will be interested in this work.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2's comments on experiments to include or exclude alternative models. I also agree about their statement about rewriting to make it more accessible to others who aren't experts in this specialized portion of C. elegans research. All in all it seems like the experiments which are required by reviewer #2 and myself as well as the rewriting should be quite feasible.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis. The authors hypothesize that SPR-5 and MET-2 modify chromatin of germline genes (MES-4 targets) in somatic cells, and this is required to silence germline genes in the soma. A few issues need to be resolved to test these ideas and rule out others.

      Main comments:

      The authors' hypothesis is that SPR-5 and MET-2 act directly, to modify chromatin of germline genes (MES-4 targets), but an alternate hypothesis is that the key regulated genes are i) MES-4 itself and/or ii) known regulators of germline gene expression, e.g. the piwi pathway. Misregulation of these factors in the soma could be responsible for the phenotypes. Therefore, the authors should analyze expression (smFISH and where possible protein stains) for MES-4 and PIWI components in the embryo and larvae of wildtype, double and triple mutant strains. These experiments are essential and not difficult to perform.

      A second aspect of the hypothesis is that spr-5 and met-2 act before mes-4 and that while these genes are maternally expressed, they act in the embryo. There really aren't data to support these ideas - the timing and location of the factors' activities have not been pinned down. One way to begin to address this question would be to perform smFISH on the target genes and on mes-4 in embryos and determine when and where changes first appear. smFISH in embryos is critical - relying on L1 data is too late. If timing data cannot be obtained, then I suggest that the authors back off of the timing ideas or at least explain the caveats. Certainly, figure 8 should be simplified and timing removed. (note: Typical maternal effect tests probably won't work because if the genes' RNAs are germline deposited, then a maternal effect test will reflect when the RNA is expressed but not when the protein is active. A TS allele would be needed, and that may not be available.)

      Writing/clarity:

      -It would be helpful to include a table that lists the specific genes studied in the paper and how they behaved in the different assays e.g. RNAseq 1, RNAseq 2, MES-4 target, ChIP. That way, readers will understand each of the genes better.

      -At the end of each experiment, it would be helpful to explain the conclusion and not wait until the Discussion. For readers not in the field, the logic of the Results section is hard to follow.

      -The model is explained over three pages in the Discussion. It would be great to begin with a single paragraph that summarizes the model/point of the paper simply and clearly.

      Specific comments:

      -Figure 1 has been published previously and should be moved to the supplement.

      -Cite their prior paper for the vulval defects e.g. page 6 or show in supplement.

      -The second RNAseq data should be shown in the Results since it is much stronger. The first RNAseq, which is less robust, should be moved to supplement.

      -Figure 3 is very nice. Please explain why the RNAs were picked (+ the table, see comment above), and please add here or in a new figure mes-4 and piwi pathway expression data in wildtype vs double/triple mutants.

      -Figure 3 here or later, please show if mes-4 RNAi removes somatic expression of target genes.

      -Is embryogenesis delayed?

      -Figure 4 since htp-1 smFISH is so dramatic, it would be helpful to include htp-1 in the lower panels.

      -Figure 4, please add an extra 2 upper panels showing all the genes in N2 vs spr-5;met-2, for comparison to the mes-4 cohort.

      -Figure 6. Please show a control that met-1 RNAi is working.

      -To quantify histone marks more clearly, it would be wonderful to have a graph of the mean log across the gene. Showing the mean numbers would help clarify the degree of the effect. We had an image as an example but it does not paste into the reviewer box. Instead, see figure 2 or figure 4 here: https://www.nature.com/articles/ng.322

      Significance

      Katz and colleagues examine the interaction between the methyltransferase MES-4 and spr-5; met-2 double mutants. Their prior analysis (PNAS, 2014) showed the dramatic enhancement in sterility and development for spr-5; met-2; this paper extends that finding by showing these effects depend on MES-4. The results are interesting and the genetic interactions dramatic. The examination by RNAseq and ChIP helps move the phenotypes into a more molecular analysis.

      This work will be of interest to people following transgenerational inheritance, generally in the C. elegans field. People using other organisms may read it also, although some of the worm genetics may be complicated. Some of the writing suggestions could make a difference.

      I study C. elegans embryogenesis, chromatin and inheritance.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The Katz lab has contributed greatly to the field of epigenetic reprogramming over the years, and this is another excellent paper on the subject. I enjoyed reviewing this manuscript and don't have any major comments/suggestions for improving it. The findings presented are novel and important, the results are clear cut, and the writing is clear.

      It's important to stress the novelty of the findings, which build upon previous studies from the same lab (upon a shallow look one might think that some of the conclusions were described before, but this is not the case). Despite the fact that this system has been studied in depth before, it remained unclear why and how germline genes are bookmarked by H3K36 in the embryo, and it wasn't known why germline genes are not expressed in the soma.

      To study these questions, Carpenter et al. examine multiple phenotypes (developmental aberrations, sterility), which they combine with analysis of multiple genetic backgrounds, RNA-seq, ChIP-seq, single-molecule FISH, and fluorescent transgenes.

      Previous observations from the Katz lab suggested that progeny derived from spr-5;met-2 double mutants can develop abnormally. They show here that the progeny of these double mutants (unlike spr-5 and met-2 single mutants) develop severe and highly penetrant developmental delays, a Pvl phenotype, and sterility. They also show that spr-5; met-2 maternal reprogramming prevents developmental delay by restricting ectopic MES-4 bookmarking, and that developmental delay of spr-5;met-2 progeny is the result of ectopic expression of MES-4 germline genes. The bottom line is that they shed light on how SPR-5, MET-2 and MES-4 balance inter-generational inheritance of H3K4, H3K9, and H3K36 methylation, to allow correct specification of germline and somatic cells. This is all very important and relevant also to other organisms.

      (very) Minor comments:

      -Since the word "heritable" is used in different contexts, it could be helpful to elaborate, perhaps in the introduction, on the distinction between cellular memory and transgenerational inheritance.

      -It might be interesting in the Discussion to expand further on the links between heritable chromatin marks and heritable small RNAs. The authors do hint at this, and the result regarding the silencing of the somatic transgene is especially intriguing.

      Significance

      This is an exciting paper which builds upon years of important work in the Katz lab. The novelty of the paper is in pinpointing the mechanisms that bookmark germline genes by H3K36 in the embryo, and explaining why and how germline genes are prevented from being expressed in the soma.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for providing insightful critiques on our manuscript.

      Changes to the document and comments made are marked e.g. “Reply 1.1” (referring to Reviewer #1, item #1, etc.) as described below.

      Reviewer #1

      I found this study to be very convincing. Prior studies are referenced appropriately, the text is well written and clear, the figures are clear also. In my opinion the paper does not need further experiment.

      [1.1] The conclusions are well supported by the data. However, the concatenation model seems very speculative at this point. Also, it does not take into account the dynamics of these molecules.

      Reply 1.1: The concatenation model combines the structural data from our manuscript with prior biochemical insights into tetraspanin homodimerization and with scanning-EM data on immunogold-labeled CD81 and CD9 on cells. It is not completely clear to us what reviewer #1 refers to with “the dynamics of these molecules”. The cryo-EM data revealed that CD9 - EWI-F is a dynamic complex with straight and bent conformations, which could account for both circular and linear arrangements of tetraspanin-microdomains in cell membranes through the higher-order oligomerization of stable CD9 - EWI-F tetramers. Moreover, transient CD9 - CD9 interactions likely yield a variable number of complexes present in these concatenated and flexible strings of complexes. Such a concatenation model indeed requires further validation. However, it is consistent with experimental data and, importantly, provides a long-awaited molecular basis for TEM assembly. Although it was not within the scope of the current study, it will be of great interest to further investigate the concatenation model through detailed cell-biology based approaches.

      **Minor comment:**

      [1.2] There seems to be a mix up between the two structures in the following sentence p4: "In CD9EC2 - 4C8, the D loop adopts a partially helical conformation and central residue F176 is sandwiched by 4E8 residues W59 of CDR2 and W102 and R105 of CDR3 (Fig. 1D). In the 4C8-bound CD9EC2 structure the tip of the D loop points more outward and the Cα atom of F176"

      Reply 1.2: The first sentence indeed mixed up the two structures and wrongfully mentioned CD9EC2 - 4C8 instead of CD9EC2 - 4E8. This has now been updated: “In CD9EC2 - 4E8, the D loop adopts …”

      Reviewer #2

      The paper is well written and the conclusions made are supported by the data presented.

      [2.1] The ternary structure is in agreement with that of CD9 in complex with the related EWI-2 published earlier this year by Umeda et al (ref #25). The present work thus adds little structural insights but may be useful in showing that the interaction pattern seen extends to another EWI protein family member.

      Reply 2.1: We agree with reviewer #2 that the CD9 - EWI-F structure presented in our work is similar to the CD9 - EWI-2 structure published recently by Umeda et al. (ref #25). However, as also pointed out by reviewer #1, we believe that the CD9 - EWI-F structure adds new important information to understand the molecular mechanism underlying the assembly of tetraspanin-enriched microdomains. Notably, the different conformations of the CD9 - EWI-F complex observed in the cryo-EM data provide structural biology evidence for the dynamic nature of the interaction between a tetraspanin and a partner protein, which is consistent with a wealth of prior biochemical data. Guided by the distinct shape of the CD9EC2 - 4C8 densities, we were able to distinguish a range of straight to bent conformations of the complex. CD9 regions that represent known tetraspanin homo-dimerization sites orient away from EWI-F and are available for interactions. Thus, combining our structural data with previous biochemical interaction data allowed for the generation of a long-awaited model for the assembly of tetraspanin-microdomains at the molecular level. We believe that these implications for TEM assembly will stimulate new, innovative research into the molecular principles that govern the function of tetraspanins.

      [2.2] As such it may be acceptable for publication. In this case, the authors should improve the quality of Figs. 3D and 4D.

      Reply 2.2: Figures 3D and 4D depict raw cryo-electron microscopy images (micrographs). The protein complexes imaged in this study only contain light atoms (H, N, C, O, S). Therefore, the collected micrographs only reveal low-contrast images of protein particles, and, for a typical cryo-EM experiment, it is required to average particles from thousands of micrographs to obtain a 3-dimensional reconstruction. We would like to keep the raw micrographs in figures 3 and 4, as it will aid cryo-EM scientists in judging the quality of the data.

      Reviewer #3

      The work is technically well performed and clearly presented including methodological details. I just have a few minor comments:

      [3.1] Page 4 and Figure S1: it is hard to see how a reliable affinity for 4E8 can be obtained from the cell binding data in S1A, as there is no indication of saturation. It would be good to at least acknowledge that this is at best a rough estimate. Fortunately, the data for this nanobody in the purified situation seem solid.

      Reply 3.1: The obtained affinities are indeed an approximate estimation based on a non-linear regression curve fitting on the measured data, performed in triplicate. The text has been updated and now reads as “4C8 and 4E8 bind to purified, full-length CD9 as well as to endogenous CD9 expressed on HeLa cells with apparent binding affinities in the nanomolar range (Fig. S1A, B, C)”. In addition, a table stating the calculated KDs has been included as Fig. S1C.

      [3.2] Page 6: Does the absence of micellar density for the EWI-F complex indicate flexibility of the extracellular domain relative to the TM? Does this happen because the classification focuses on the highly elongated Ig region?

      Reply 3.2: These are indeed plausible assumptions. We observed highly heterogeneous, elongated particles in the micrograph shown in Fig. 3D, indicating inter-domain flexibility. If the alignment software focusses on certain Ig-like domains, other regions of the protein complex will be averaged out. An additional complexity with these elongated particles was to select an appropriate box size for particle picking and particle extraction, because the particles differ greatly in size based on their orientation (fully elongated side-views vs. much smaller top-views). When taken together, the complex of CD9 with full-length EWI-F was unsuitable for high-resolution structure determination; the subsequent strategy using EWI-FΔIg1-5 resulted in globular particles with less flexibility (Fig. 4D), which allowed for a more detailed structural characterization of the complex.

      [3.3] Page 8: "Recently, a cryo-EM density map has been reported..." - please reference here.

      Reply 3.3: We added the appropriate reference to the sentence: “Recently, a cryo-EM density map has been reported of CD9 in complex with an EWI-F homolog, EWI-2 (25).”

      [3.4] Relatively little is known about how tetraspanins help to organize partner receptors into defined membrane domains, evidence for which has emerged from super-resolution light microscopy. Based on their structural analysis of the CD9-EWI-F complex, including the heterogeneity apparent in the cryo-EM structure, they propose a feasible concatenation model for higher order oligomerization of these complexes in the membrane. Obviously the model will need to be tested rigorously by mutational analysis, particularly the EWI Ig6 interface, but as it stands the paper is a significant contribution to the field of tetraspanins.

      Reply 3.4: From the 8.6 Å cryo-EM data, the amino-acid residues that form the EWI-F Ig6 dimer interface can indeed not be distinguished. However, our data on CD9 in complex with full-length EWI-F (Fig. 3E) and previous cross-linking data (André et al. In situ chemical cross-linking on living cells reveals CD9P-1 cis-oligomer at cell surface - PMID: 19703604) support that EWI-F forms dimeric assemblies. Regarding the concatenation model, we therefore think that it will be of great interest to establish the putative CD9 - CD9 interactions (identified through biochemical approaches), that would link CD9 - EWI-F tetramers into higher assemblies, in the context of native membranes. However, investigating these transient interactions would require various non-trivial experiments and was therefore not within the scope of the current study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This paper describes the structure of the tetraspanin CD9 and its interaction with the single pass protein EWI-F. The variability in the D loop of EC2 and the domain swapping is a useful addition to the limited structural database of these proteins and correlates with the relatively poor sequence conservation of this region. The key message is that dimerization of the single pass protein extracellular region, and interaction of its transmembrane helix with the tetraspanin, produces a heterodimeric structure that may further oligomerize. The authors propose a feasible concatenation model for higher order oligomerization of these complexes in the membrane.

      The work is technically well performed and clearly presented including methodological details. I just have a few minor comments:

      Page 4 and Figure S1: it is hard to see how a reliable affinity for 4E8 can be obtained from the cell binding data in S1A, as there is no indication of saturation. It would be good to at least acknowledge that this is at best a rough estimate. Fortunately, the data for this nanobody in the purified situation seem solid.

      Page 6: Does the absence of micellar density for the EWI-F complex indicate flexibility of the extracellular domain relative to the TM? Does this happen because the classification focuses on the highly elongated Ig region?

      Page 8: "Recently, a cryo-EM density map has been reported..." - please reference here.

      Significance

      Relatively little is known about how tetraspanins help to organize partner receptors into defined membrane domains, evidence for which has emerged from super-resolution light microscopy. Based on their structural analysis of the CD9-EWI-F complex, including the heterogeneity apparent in the cryo-EM structure, they propose a feasible concatenation model for higher order oligomerization of these complexes in the membrane. Obviously the model will need to be tested rigorously by mutational analysis, particularly the EWI Ig6 interface, but as it stands the paper is a significant contribution to the field of tetraspanins.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, Dr. Oosterheert and colleagues report the crystal structures of CD9EC2 bound to nanobodies 4C8 and 4E8. The CD9EC2/4C8 structure was useful in determining a low resolution cryo-EM structure of EWI-F in complex with CD9/4C8. The observed sample heterogeneity of this ternary complex was reduced by deleting the N-terminal five Ig domains of EWI-F, yielding a modest maximum global resolution of ~ 8.6 Å. The structural approaches used are standard. The crystallographic and structure refinement statistics are sound, as is the cryo-EM image processing. The overall cryo-EM structure of the ternary complex shows a central EWI-F protein dimer flanked by one CD9 molecule on each side. The paper is well written and the conclusions made are supported by the data presented.

      Significance

      The ternary structure is in agreement with that of CD9 in complex with the related EWI-2 published earlier this year by Umeda et al (ref #25). The present work thus adds little structural insights but may be useful in showing that the interaction pattern seen extends to another EWI protein family member. As such it may be acceptable for publication. In this case, the authors should improve the quality of Figs. 3D and 4D.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this article, the authors provide new insights into the structure of the tetraspanin CD9. On the one hand, they provide crystal structures of the large extracellular domain of CD9, alone or bound to two nanobodies. The three structures are similar to one another and to that of CD81, a related tetraspanin, except for a portion of the molecule, the so-called D-domain, showing flexibility of this domain. On the other hand, they obtained the cryo-EM structure of CD9 in association with a known partner (EWI-F) with a resolution of 8.6 Å. More precisely, the complex of CD9 and the full-length EWI-F showed heterogeneity, which they interpret as a consequence of the flexibility between the six Ig-like domains of EWI-F, precluding high-resolution structure determination. However, they showed that CD9 still interacted with a molecule lacking the 5 most membrane-distal Ig domains of EWI-F, and obtained the structure using this construct and an anti-CD9 nanobody. This structure reveals a hetero-tetrameric arrangement of CD9-EWI-F, with a central EWI-F dimer flanked by a CD9 molecule on each side. CD9 and EWI-F interact through their transmembrane domains and the two truncated EWI-F molecules through the remaining Ig domains. Importantly, CD9 and EWI-F do not make contacts in the extracellular region, and CD9 shows a semi-open conformation. The structure also shows different configurations of the complex.

      I found this study to be very convincing. Prior studies are referenced appropriately, the text is well written and clear, the figures are clear also.

      In my opinion the paper does not need further experiment.

      The conclusions are well supported by the data. However, the concatenation model seems very speculative at this point. Also, it does not take into account the dynamics of these molecules.

      Minor comment:

      There seems to be a mix up between the two structures in the following sentence p4: "In CD9EC2 - 4C8, the D loop adopts a partially helical conformation and central residue F176 is sandwiched by 4E8 residues W59 of CDR2 and W102 and R105 of CDR3 (Fig. 1D). In the 4C8-bound CD9EC2 structure the tip of the D loop points more outward and the Cα atom of F176"

      Significance

      Tetraspanins have been shown over the years to play an essential role in various biological functions. Among them, CD9, which is strongly expressed on the oocyte plasma membrane, is essential for sperm-egg fusion. However, the mechanisms by which CD9 regulates this fusion process as well as other cell-cell fusion events remain unknown. The elucidation of its structure and of how it interacts with well-characterized partner proteins is clearly a major advance in our understanding of the function of this molecule.

      The absence of a structure for tetraspanins has long been a knowledge gap. Following a breakthrough in 2001 with the publication of the crystal structure of the large extracellular domain of CD81 (Kitadokoro et al., EMBO J 2001), it was only recently that the structure of a full-length tetraspanin, again that of CD81, was published (Zimmermann et al., Cell 2016). Earlier this year, the crystal structure of a truncated version of CD9 was published, as well as the cryo-EM structure of CD9 in association with another molecular partner, EWI-2 (Umeda et al., Nat Commun 2020).

      The present structure adds new important information such as the existence of different conformations in the large extracellular domain of CD9 or the structure of CD9 with another molecular partner. It also highlights the different configurations of the complex. It will be of interest to researchers interested in tetraspanins, in membrane organization as well as researchers interested in the biological processes regulated by CD9, notably sperm-egg fusion.

      My field of expertise concerns tetraspanins. I cannot comment on the technical aspects of the structures.

  5. Jul 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      1. The first hypothesis of the manuscript is that, rather than a change in a single immune pathway being responsible for the lack of response to the virus, the response will be systemic involving multiple inter-related pathways. The data show that this was the case after presenting convincing transcriptome analysis.

      We thank the reviewer for agreeing that we have convincingly shown that the response to the virus is systemic, involving the induction of interrelated pathways.

      The second hypothesis is that the differences in responses between bats and humans are due to evolutionarily divergent genes. The authors provide evidence for this in the transcriptome differences in the C-reactive protein, aspects of the complement system, iron regulation and M1/M2 macrophage polarization. The second hypothesis is broad, but there are clearly differences in the genes involved in humans and bats. Without mechanistic information on the function of the proteins/cells investigated, it is hard to determine that the changes the authors are observing are the cause of the different responses, rather than an effect of some upstream response, and so difficult to pin-point specific divergent genes.

      We agree that mechanistic studies will be required to test causal links between the genes we identified and specific anti-viral responses, an effort that is likely to require multiple laboratories and some time. The aim of this study was to enable this effort by identifying a list of candidate genes affected by EBOV and MARV infection in bats, not merely in cultured bat cells.

      The authors wish to compare the response to the virus in bats to the better characterized human tissue responses, but because this relies on previously published work in humans, it is sometimes unclear whether "more bat-like" responses are definitely associated with positive outcomes in humans. As the benefit of certain responses in human infections can depend on the timing of the response, it might be helpful to include summarized human data in the manuscript to aid comparison with the bat responses.

      We agree and have added the following data and discussion (inserted into Discussion, page 9, and added two new tables, Tables 2 and 3).

      The comparison of our observations to human responses to filoviruses is limited by the scarcity of studies in humans. Nevertheless, this comparison suggests potential directions to explore. In one study, individuals who succumbed to the disease showed stronger upregulation of interferon signaling and acute phase responses compared to survivors during the acute phase of infection[1], consistent with the anti-inflammatory response gene expression signature identified in this study in bats. However, most of the genes used in the study by Liu et al. to classify survivors are either barely expressed in bats or do not respond to filoviral infection (Table 2); these differences provide potential clues as to why bats can tolerate the infection.

      A study of patients infected with Sudan Ebola virus (SUDV) analyzed protein levels for a panel of genes using a Luminex multiplex assay (using antibodies)[2]. The panel was based on results from other studies and pathways involved in the response to infections. The patients were classified into 3 possible dichotomies (fatal/non-fatal, hemorrhaging/non-hemorrhaging, or high/low viremia) correlated with genes that characterized these states. Most of these genes either are barely expressed, if at all, or are unaffected by infection in bats, except for ferritin (FTL, FTH1), whose expression is lowered by MARV infection, consistent with the observation that ferritin is higher in fatal human cases (Table 3).

      For instance, the T-cell response section concludes "Bats mount a T cell response against the infection" but there is no discussion of the impaired but complex lymphocyte response in humans, so comparison is not possible.

      We have expanded the discussion on T cells (Results, page 7) as follows.

      Previous studies on the adaptive immune response to Ebola and Marburg viruses in humans, non-human primates, and non-primate mammals show that long-term immunity is conferred by both T cell and antibody responses. Mostly CD8+ T cells were elicited and helpful against Ebola in mice[3],[4], while SUDV infection in humans[5] and MARV infection in cynomolgus monkeys[6] and humans[7] elicited mostly CD4+ T cells. In most human EBOV infections, CD8+ T cells against the EBOV NP protein dominated the responses, while a minority of individuals harbored memory CD8+ T cells against the EBOV-GP[8].

      Consistent with this, in MARV-infected bats, CD4 expression (specific to CD4+ T cells) was higher, while in EBOV-infected bats, CD8 expression (specific to CD8+ T cells) was higher. The overall levels are low because the tissue samples are heterogeneous and expression of these markers is not high in T cells to begin with. T cell markers (such as CCL3, ANAX1, TIMD4 and MAGT1) are also upregulated in liver, suggesting that a T cell response is mounted.

      Mock infected IHC should be included in Figure 1F to demonstrate the antibodies are not background.

      We have added IHC data of two mock-infected animals (Fig. S1 panels A and B).

      See comment in hypotheses- a summarized table of findings from previous studies of early responses to the virus would be helpful for comparisons to the bat response and for determining the second hypothesis.

      We have expanded our comparisons to previous studies by adding the following text to the Introduction (page 3):

      A potential source of the difficulty in understanding how bats tolerate or eliminate viruses that are deadly to humans is the lack of studies that analyze the response to infection in bats rather than in cultured bat cells. The results obtained using cell lines have been contradictory. Some studies claim that both EBOV and MARV replicate to similar levels in ERB- and human-derived cell lines[9], with a robust innate immune response mounted by ERB cells and, to a lesser degree, human cells, while others claim that MARV inhibited the antiviral program in ERB cells, as in primate cells, and induced almost no IFN genes[10], or report little antiviral gene induction[11]. An experiment with pig (PK15A) and bat (EhKiT) cells suggested that they responded to EBOV through the upregulation of immune, inflammatory, and coagulation pathways, in contrast to a limited response in human (HEK293T) cells[12]. To comprehensively understand the pathways involved in the bat filoviral response, we infected bats, rather than their isolated cells, and analyzed tissue-specific RNA expression through mRNA-seq in the organs of the infected animals.

      Reviewer #2

      1. The authors provide this contribution to the extremely interesting topic of the immunobiology that facilitates filovirus infections of bats without overt pathology. They focused entirely on gene transcription signatures from different tissue sites following experimental infection, and sometimes compare those signatures with those generated in humans following natural exposures to filoviruses. The strength of the paper is the sheer breadth of data generated that is available openly to the scientific community and the development of novel mRNA datasets from bats, in the absence and presence of infection. One of the major limitations of this systems-based approach is that there are no mechanistic data that link gene function to the immune response to filovirus infection. Rather, associations are made and functional links are inferred. This limitation makes the title of the manuscript "...is controlled by a systemic response" an overstatement.

      We thank the reviewer and agree that mechanistic studies were out of scope of this study and have reflected this fact in the title by replacing “is controlled” with “induces”:

      Ebola and Marburg filovirus infection in bats induces a systemic response

      The authors indicate that one of their main objectives is to understand differences in the responses to infection between bats and humans. But this submission says little about the transcriptome-level responses to filovirus infection in humans. It does, on at least one occasion, state that some of the bat genes with altered expression levels were also altered in a study of human filovirus infections (reference #67). I think it would be helpful if the authors devoted a figure or table to the direct comparison between their analysis of MARV- and EBOV-infected bats and the findings of filovirus-infected humans, highlighting genes that are differentially up- or downregulated between the two species.

      This discussion, which was also requested by Reviewer 1, is now included in the manuscript (Discussion page 9 and Tables 2 and 3).

      Figure 2 is neither described nor presented usefully. Instead of providing a figure title "Upset plot...", the authors should clearly describe the type of transcriptomic data being presented. Moreover, the way the data is plotted does not reveal any direct information about the genes that are up- or downregulated in each condition, thus reducing its utility to the reader. I suggest that this Figure be placed in the Supplemental information. In fact, Figure 3 could also be moved to the Supplemental information.

      Figure 2 makes the point that the response is a broad one, while Figure 3 presents evidence from expression data that there are tissue-specific responses to the viruses. Together they provide convincing evidence of a systemic, wide-ranging response to both MARV and EBOV infections. We have edited the caption to Figure 2 by changing it to the following:

      Figure 2: Broad response of bat liver genes to filoviral infection. Many genes in the liver respond to filoviral infections, with MARV having a bigger impact than EBOV (840 genes are responsive to MARV alone, compared to the 43 specific to EBOV alone). The EBOV-specific (EBOV/MARV) and MARV-specific (MARV/EBOV) genes are likely host responses specific to the viral VP40, VP35 and VP24 genes. In the plot, mock refers to livers of mock-infected bats, EBOV to livers of EBOV-infected bats, and MARV to livers of MARV-infected bats. Each row in the lower panel represents a set; there are six sets of genes based on various comparisons, e.g., EBOV/mock is the set of genes at least 2-fold upregulated in EBOV infection compared to the mock samples. The gray bars at the lower left represent membership in the sets. The vertical blue lines with bulbs represent set intersections, e.g., the last bar is the set of genes common to EBOV/MARV, EBOV/mock and MARV/mock, so the genes in this set are up at least 2-fold in EBOV compared to the mock and MARV samples, and at least 2-fold up in MARV compared to mock. The main bar plot (top) shows the number of genes unique to each intersection, so the total belonging to a set, say mock/EBOV, is the sum of the numbers in all intersections that have mock/EBOV as a member (41+203+6+31=281).
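
      The counting rule in the caption can be illustrated with a minimal sketch. The gene names and toy sets below are hypothetical and are not taken from the manuscript; only the four bar heights (41, 203, 6, 31) come from the caption. Each gene is assigned to the one intersection matching exactly the sets it belongs to, so the size of any set is the sum of the bars for every intersection that includes it.

```python
def exclusive_intersections(sets):
    """Group genes by the exact combination of sets they belong to."""
    membership = {}
    for name, genes in sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(name)
    bars = {}
    for gene, names in membership.items():
        bars.setdefault(frozenset(names), set()).add(gene)
    return bars


def set_total(bars, name):
    """Sum the bar heights of every intersection that contains `name`."""
    return sum(len(genes) for combo, genes in bars.items() if name in combo)


# Hypothetical toy data: three of the comparison sets with made-up genes.
toy_sets = {
    "EBOV/mock": {"g1", "g2", "g3"},
    "MARV/mock": {"g2", "g3", "g4", "g5"},
    "EBOV/MARV": {"g3", "g6"},
}
bars = exclusive_intersections(toy_sets)
totals = {name: set_total(bars, name) for name in toy_sets}

# The caption's own arithmetic for the mock/EBOV set.
caption_bars = [41, 203, 6, 31]
mock_ebov_total = sum(caption_bars)
```

      In this sketch, `totals` reproduces the original set sizes, which is the property the caption relies on when it adds the four bars to obtain 281.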

      The authors do not specify in the main text, figure captions, or methods sections how they objectively assigned bat homologs as being "similar to " or "divergent from" their human counterparts. What is the cut-off in terms of sequence similarity?

      We apologize for this omission. In addition to a description in Methods, we have added the following statement to the Results section (Page 4).

      To identify divergent genes, we relied on BLASTn[13]. Genes detected as homologues (16,004, or 87% of the 18,443 genes in our database) using BLASTn default settings were labelled “similar”. The remaining 2,439 genes (13%) were considered “divergent”. Of these genes, 1,548 transcripts (8% of the total) could be identified as homologous by reducing the word size in BLASTn from 11, the default, to 9. This approach is equivalent to matching at the protein level, but we find that using nucleotide-level matches provides a cleaner separation of the two classes than using translated proteins (Fig. 4, Methods).
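
      As a quick consistency check on the counts quoted above, the following sketch recomputes the percentages. The variable names are illustrative, and the final remainder (divergent genes still unmatched at the smaller word size) is derived here by subtraction and is not stated in the reply.

```python
# Counts quoted in the reply.
total_genes = 18443        # genes in the authors' database
similar = 16004            # homologues found with BLASTn default settings
rescued_at_word_size_9 = 1548  # divergent genes matched at word size 9

# Genes not matched with default settings are labelled "divergent".
divergent = total_genes - similar


def percent_of_total(count):
    """Percentage of all genes, rounded to the nearest whole number."""
    return round(100 * count / total_genes)


similar_pct = percent_of_total(similar)
divergent_pct = percent_of_total(divergent)
rescued_pct = percent_of_total(rescued_at_word_size_9)

# Derived here, not reported in the reply: divergent genes that remain
# unmatched even with the reduced word size.
still_unmatched = divergent - rescued_at_word_size_9
```

      The rounded values agree with the 87%, 13% and 8% given in the text.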

      In the Discussion, it is surprising that the authors state that "the majority of interferon response genes are not divergent from human homologs" since genes involved in innate immunity are some of the most rapidly evolving genes known to exist. Again, clarification over what dictates "divergence" over "similarity" is warranted. Many previous studies have shown how a single residue change in an innate immune effector can drastically alter its specificity and/or potency.

      We have clarified this point by adding the following statement in the Discussion (pages 8 and 9):

      There are hundreds of genes involved in the interferon response, some key components can mutate to change specificity of their interactions, but most, especially those in the core ISG category[14], evolve slowly and have conserved function and sequence[15]. Our analysis of gene divergence shows that the majority of interferon response genes are not divergent from their human homologs, consistent with prior observations that the innate responses are quite similar between human and bat cell lines[9]. This implies that other systems are involved in generating the difference in response between bats and humans.

      The authors state in the introduction, and point to citation #21, that ERBs are "refractory to infection." In Figure 1, the authors indicate that experimental inoculation of ERBs with EBOV led to detectable infection in some animals, particularly in the liver. At this point in the manuscript, the authors should state if and how this result differs from what is published in #21, and they should comment on whether this is scientifically significant, or not. This is eventually discussed briefly in the Discussion but adding a sentence to the Results section would be helpful for readers.

      To emphasize that our results contradict prior reports of ERB being refractory to EBOV infection, we have modified the statement in the Results (page 3) as follows.

      Two of the three EBOV-inoculated animals presented with histopathological lesions in the liver, consisting of pigmented and unpigmented infiltrates of aggregated mononuclear cells compressing adjacent tissue structures, and eosinophilic nuclear and cytoplasmic inclusions, changes consistent with previous reports[16], [17]. In EBOV-infected animals, focal immunostaining with both pan-filovirus and EBOV-VP40 antibodies was observed in the liver of one animal, but very few foci were found, suggesting limited viral replication.

      The research question at hand, concerning how bats serve as reservoirs for multiple viruses which are pathogenic to humans without succumbing to disease, is one of the hottest topics in immunology and virology. However, the authors do not provide a clear enough explanation of how their approach to study the transcriptome response following filovirus infection goes beyond what has been published in previous studies. This manuscript would greatly benefit from a discussion of its novelty in the Introduction and Discussion sections.

      We have reviewed prior human and bat studies (Introduction, page 3, and Discussion, page 9, shown above) to highlight the novelty of our findings. We have also added the following sentence at the end of the Introduction highlighting the novelty of the study.

      This is the first in vivo study that focuses on the coordinated transcriptional response to filoviruses at the level of individual organs in bats.

      References

      [1] X. Liu et al., “Transcriptomic signatures differentiate survival from fatal outcomes in humans infected with Ebola virus,” Genome Biology, vol. 18, no. 1, p. 4, Jan. 2017, doi: 10.1186/s13059-016-1137-3.

      [2] A. K. McElroy et al., “Ebola hemorrhagic Fever: novel biomarker correlates of clinical outcome,” J. Infect. Dis., vol. 210, no. 4, pp. 558–566, Aug. 2014, doi: 10.1093/infdis/jiu088.

      [3] S. B. Bradfute, K. L. Warfield, and S. Bavari, “Functional CD8+ T cell responses in lethal Ebola virus infection,” J. Immunol., vol. 180, no. 6, pp. 4058–4066, Mar. 2008, doi: 10.4049/jimmunol.180.6.4058.

      [4] M. N. Rahim et al., “Complete protection of the BALB/c and C57BL/6J mice against Ebola and Marburg virus lethal challenges by pan-filovirus T-cell epigraph vaccine,” PLOS Pathogens, vol. 15, no. 2, p. e1007564, Feb. 2019, doi: 10.1371/journal.ppat.1007564.

      [5] A. Sobarzo et al., “Multiple viral proteins and immune response pathways act to generate robust long-term immunity in Sudan virus survivors,” EBioMedicine, vol. 46, pp. 215–226, Aug. 2019, doi: 10.1016/j.ebiom.2019.07.021.

      [6] L. Fernando et al., “Immune Response to Marburg Virus Angola Infection in Nonhuman Primates,” J Infect Dis, vol. 212, no. suppl_2, pp. S234–S241, Oct. 2015, doi: 10.1093/infdis/jiv095.

      [7] S. W. Stonier et al., “Marburg virus survivor immune responses are Th1 skewed with limited neutralizing antibody responses,” J. Exp. Med., vol. 214, no. 9, pp. 2563–2572, Sep. 2017, doi: 10.1084/jem.20170161.

      [8] S. Sakabe et al., “Analysis of CD8+ T cell response during the 2013–2016 Ebola epidemic in West Africa,” PNAS, vol. 115, no. 32, pp. E7578–E7586, Aug. 2018, doi: 10.1073/pnas.1806200115.

      [9] I. V. Kuzmin et al., “Innate Immune Responses of Bat and Human Cells to Filoviruses: Commonalities and Distinctions,” J. Virol., vol. 91, no. 8, Apr. 2017, doi: 10.1128/JVI.02471-16.

      [10] C. E. Arnold et al., “Transcriptomics Reveal Antiviral Gene Induction in the Egyptian Rousette Bat Is Antagonized In Vitro by Marburg Virus Infection,” Viruses, vol. 10, no. 11, Nov. 2018, doi: 10.3390/v10110607.

      [11] M. Hölzer et al., “Differential transcriptional responses to Ebola and Marburg virus infection in bat and human cells,” Scientific Reports, vol. 6, p. 34589, Oct. 2016, doi: 10.1038/srep34589.

      [12] J. W. Wynne et al., “Comparative Transcriptomics Highlights the Role of the Activator Protein 1 Transcription Factor in the Host Response to Ebolavirus,” Journal of Virology, vol. 91, no. 23, Dec. 2017, doi: 10.1128/JVI.01174-17.

      [13] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Mol. Biol., vol. 215, no. 3, pp. 403–410, Oct. 1990, doi: 10.1016/S0022-2836(05)80360-2.

      [14] A. E. Shaw et al., “Fundamental properties of the mammalian innate immune system revealed by multispecies comparison of type I interferon responses,” PLOS Biology, vol. 15, no. 12, p. e2004086, Dec. 2017, doi: 10.1371/journal.pbio.2004086.

      [15] T. B. Sackton, B. P. Lazzaro, T. A. Schlenke, J. D. Evans, D. Hultmark, and A. G. Clark, “Dynamic evolution of the innate immune system in Drosophila,” Nat. Genet., vol. 39, no. 12, pp. 1461–1468, Dec. 2007, doi: 10.1038/ng.2007.60.

      [16] M. E. B. Jones et al., “Experimental Inoculation of Egyptian Rousette Bats (Rousettus aegyptiacus) with Viruses of the Ebolavirus and Marburgvirus Genera,” Viruses, vol. 7, no. 7, pp. 3420–3442, Jun. 2015, doi: 10.3390/v7072779.

      [17] J. T. Paweska, N. Storm, A. A. Grobbelaar, W. Markotter, A. Kemp, and P. Jansen van Vuren, “Experimental Inoculation of Egyptian Fruit Bats (Rousettus aegyptiacus) with Ebola Virus,” Viruses, vol. 8, no. 2, Jan. 2016, doi: 10.3390/v8020029.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      The authors provide this contribution to the extremely interesting topic of the immunobiology that facilitates filovirus infections of bats without overt pathology. They focused entirely on gene transcription signatures from different tissue sites following experimental infection, and sometimes compare those signatures with those generated in humans following natural exposures to filoviruses. The strengths of the paper are the sheer breadth of data generated that is available openly to the scientific community and the development of novel mRNA datasets from bats, in the absence and presence of infection. One of the major limitations of this systems-based approach is that there is no mechanistic data that links gene function to the immune response to filovirus infection. Rather, associations are made and functional links are inferred. This limitation makes the title of the manuscript "...is controlled by a systemic response" an overstatement.

      Major points:

      The authors indicate that one of their main objectives is to understand differences in the responses to infection between bats and humans. But this submission says little about the transcriptome-level responses to filovirus infection in humans. It does, on at least one occasion, state that some of the bat genes with altered expression levels were also altered in a study of human filovirus infections (reference #67). I think it would be helpful if the authors devoted a figure or table to the direct comparison between their analysis of MARV- and EBOV-infected bats and the findings of filovirus-infected humans, highlighting genes that are differentially up- or downregulated between the two species.

      Figure 2 is neither described nor presented usefully. Instead of providing a figure title "Upset plot..." the authors should clearly describe the type of transcriptomic data being presented. Moreover, the way the data is plotted does not reveal any direct information about the genes that are up- or downregulated in each condition, thus reducing its utility to the reader. I suggest that this Figure be placed in the Supplemental information. In fact, Figure 3 could also be moved to the Supplemental information.

      The authors do not specify in the main text, figure captions, or methods sections how they objectively assigned bat homologs as being "similar to" or "divergent from" their human counterparts. What is the cut-off in terms of sequence similarity?

      In the Discussion, it is surprising that the authors state that "the majority of interferon response genes are not divergent from human homologs" since genes involved in innate immunity are some of the most rapidly evolving genes known to exist. Again, clarification over what dictates "divergence" over "similarity" is warranted. Many previous studies have shown how a single residue change in an innate immune effector can drastically alter its specificity and/or potency.

      Minor points:

      The authors state in the introduction, and point to citation #21, that ERBs are "refractory to infection." In Figure 1, the authors indicate that experimental inoculation of ERBs with EBOV led to detectable infection in some animals, particularly in the liver. At this point in the manuscript, the authors should state if and how this result differs from what is published in #21, and they should comment on whether this is scientifically significant, or not. This is eventually discussed briefly in the Discussion but adding a sentence to the Results section would be helpful for readers.

      Significance

      The research question at hand, concerning how bats serve as reservoirs for multiple viruses which are pathogenic to humans without succumbing to disease, is one of the hottest topics in immunology and virology. However, the authors do not provide a clear enough explanation of how their approach to study the transcriptome response following filovirus infection goes beyond what has been published in previous studies. This manuscript would greatly benefit from a discussion of its novelty in the Introduction and Discussion sections.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Jayaprakash et al. investigates the response to the filoviruses Marburg and Ebola virus in Rousettus aegyptiacus bats, the natural reservoir of Marburg virus. The response to infection is investigated by comparing transcriptomes of different bat tissues in infected and uninfected bats. The manuscript groups the observed transcriptome changes into pathways that are impacted, and discusses how those pathways may cause subclinical infection in bats, compared to severe disease in humans. The data included also shed light on bat immunology and reservoir characteristics more generally, which is particularly timely during the SARS-CoV-2 pandemic.

      Major comments:

      Are the key conclusions convincing?

      The first hypothesis of the manuscript is that, rather than a change in a single immune pathway being responsible for the lack of response to the virus, the response will be systemic, involving multiple inter-related pathways. The data show that this was the case after presenting convincing transcriptome analysis. The second hypothesis is that the differences in responses between bats and humans are due to evolutionarily divergent genes. The authors provide evidence for this in the transcriptome differences in the C-reactive protein, aspects of the complement system, iron regulation and M1/M2 macrophage polarization. The second hypothesis is broad, but there are clearly differences in the genes involved in humans and bats. Without mechanistic information on the function of the proteins/cells investigated, it is hard to determine that the changes the authors are observing are the cause of the different responses, rather than an effect of some upstream response, and so difficult to pin-point specific divergent genes. The authors wish to compare the response to the virus in bats to the better characterized human tissue responses, but because this relies on previously published work in humans, it is sometimes unclear whether "more bat-like" responses are definitely associated with positive outcomes in humans. As the benefit of certain responses in human infections can depend on the timing of the response, it might be helpful to include summarized human data in the manuscript to aid comparison with the bat responses. For instance, the T-cell response section concludes "Bats mount a T cell response against the infection" but there is no discussion of the impaired but complex lymphocyte response in humans, so comparison is not possible.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      No, speculative discussion of potential drugs is already qualified as speculative, and adds to the understanding of the significance of the data.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      No

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      N/A

      Are the data and the methods presented in such a way that they can be reproduced?

      Yes

      Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      Specific experimental issues that are easily addressable.

      Mock infected IHC should be included in Figure 1F to demonstrate the antibodies are not background.

      Are prior studies referenced appropriately?

      Mostly yes. The discussion of the T-cell responses in infection could be expanded to include more information on human responses.

      Are the text and figures clear and accurate?

      Yes

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      See the comment on the hypotheses above: a summarized table of findings from previous studies of early responses to the virus would be helpful for comparisons to the bat response and for evaluating the second hypothesis.

      Significance

      Nature and Significance of the advance.

      Bat immune responses to filoviruses are poorly characterized, and this paper contains much information that can aid future investigation of reservoir responses. This data also has broad application to other bat-borne pathogens.

      Compare to existing published knowledge.

      There is little published about the in vivo bat immune response to filoviral infections. Significantly, this report describes a non-refractory response to Ebola virus infection in Rousettus aegyptiacus.

      Audience

      This paper would be of interest to filovirologists and those interested in zoonotics and bat immunology.

      Your expertise.

      I am a viral immunologist with >15 years' experience with filoviruses. Ms. Clarke is a senior graduate student whose thesis focuses on immune responses to filovirus glycoproteins.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      INITIAL RESPONSE TO REVIEWERS / REVISION PLAN

      We are grateful to the three reviewers for reviewing our manuscript and providing comments that helped to further improve the quality of the current study. We attach an initial revised version of the manuscript in which the changes corresponding to the reviewers’ comments are highlighted. We now provide:

      • 18 new main figure panels (Fig.1E, Figs.2D-F, Figs.3E-F, Figs.4B,C,E, Figs.6B-F, Figs.7B,D,E,F),
      • 9 new supplementary figures, and
      • 13 new supplementary tables.

      These correspond to the points raised by the reviewers. In this initial response to reviewers and revision plan, we have already performed the bioinformatics analysis and the majority of the new wet lab experiments requested by the reviewers, while we are still awaiting the results of three sets of wet lab experiments (RIP-seq, additional protein/RT-qPCR confirmations and B2 incubations with other proteins), which, due to their nature, take longer. We have also revised the main text accordingly, with only a number of updates (regarding some methods of experiments currently in progress and the respective discussion) still missing.

      In detail:

      REVIEWER 1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      B2 RNAs, encoded from SINE B2 elements, have been directly implicated in stress response by their inherent ability to bind RNA Pol II and suppress stress response genes (SRG) in homeostatic conditions. However, upon stimuli, B2 RNAs are cleaved and degraded, resulting in the release of RNA pol II and upregulation of SRGs. Previous work from the senior author identified PRC2 component EZH2 to be the B2 RNA processing factor, cleaving B2, and releasing POL2. SRGs are upregulated upon stress, for example in age-associated neuropathologies like Alzheimer's disease (AD). Considering that the hippocampus is a primary target of amyloid pathologies, and since SRGs are suggested to be key for the function of a healthy hippocampus, the authors set out to understand the role of B2 RNAs that are linked to SRG regulation in the mouse hippocampus with amyloid pathology. They use disease-relevant in vivo and in vitro models combined with unbiased RNA seq data analysis for this endeavor, which indicates the potential relevance of B2 RNAs in APP mediated neuronal pathologies in mice as well as identifies Hsf1 as the factor cleaving B2 RNAs in the hippocampus.

      This reviewer generally remarks that “The work is interesting and identification of Hsf1 as the processing factor for B2 RNAs in the hippocampus is significant. I would like to credit the authors for their elegant in vivo experimental design in Figure 2.”

      We appreciate the encouraging comments made by this reviewer.

      General comment: The reviewer finds “some of the conclusions to be overstated” and has brought a number of concerns to our attention. Indeed, we agree that provision of additional data and details is needed to avoid any confusion about the gene pathways to which our findings apply. In the initial manuscript (Figures 2 D, F and 6 D, F), we presented the gene expression levels of all B2 RNA regulated SRGs identified in our previous study (Zovoilis et al, Cell 2016), referred to as B2 RNA regulated SRGs or B2-SRGs throughout the manuscript. To this end, we performed the respective statistical tests between the different conditions considering these genes, in order to show the transcription dynamics of these genes in either amyloid beta pathology (APP mice / Figs. 2D, F) or amyloid beta toxicity (HT22 cells / Figs. 6D, F). Since we were not looking for new candidate genes upregulated in APP mice or in our HT22 cell culture system, we did not narrow our analysis only to genes delivered by a general-purpose differential gene expression approach such as DESeq but tested all B2-SRGs. However, based on the reviewer’s comments below, we realize that the paper would benefit from presenting in the main figures only those B2 RNA regulated SRGs that overlap with differentially expressed genes identified by DESeq in each experimental system. This will help to avoid confusion and any misunderstanding that all B2 RNA regulated genes are equally affected in our system, which is not the case and would be an overstatement. We are now presenting in new Figure 2 (2E, 2F) only those B2-SRGs that overlap with upregulated genes identified by DESeq in 6m old APP mice (listed in new Suppl. Table 5) and in new Figure 7 (7D, F) we are now presenting only those B2-SRGs that overlap with upregulated genes identified by DESeq in HT22 cells treated with amyloid beta (listed in new Suppl. Table 11).
      The conclusions drawn from the new figures remain the same as with the old ones and we believe that this new way of presenting the data will prevent confusion and potential over-statements. We thank the reviewer for bringing this to our attention. Based also on this reviewer’s minor point 3, we recommend that the old figures that included all B2-SRGs (and not only the differentially expressed ones identified by DESeq) are moved to the Supplement as new Supplementary Figures 1 and 7, respectively, so that readers can still get a view of all the data and the transcription dynamics of all B2-SRGs, while we provide both in the text and the supplement an explanation about the value as well as the limitations of these figures.

      **Major comments:**

      Major point 1. The reviewer asks: “In figure 1, the authors indicate a strong connection between B2 RNA regulated SRGs and learning and memory. In figure 2, they identify the SRGs in the hippocampus, please provide a direct comparison of learning and memory associated SRGs and the SRGs they identify in figure 2 that are significantly upregulated in APP mice in 6 months.”

      In the revised version of the manuscript we now provide: i) As a new figure panel (lower panel in new Fig.1E), the number of B2 RNA regulated SRGs that are associated with learning based on our Peleg et al, Science 2010 paper and as a new Supplementary Table 3, the exact list of these genes. ii) As a new Supplementary Table 4, the list of all genes that are significantly upregulated in APP mice (6 months). iii) As a new Supplementary Table 5, the list of those genes upregulated in amyloid pathology (APP 6 months) that are B2-SRGs (expression levels of these genes are presented in new Figure 2E,F). Per reviewer’s question, we now provide as a new Supplementary Table 6, the list of B2 RNA regulated SRGs that are both learning associated genes and upregulated in 6 month old APP mice. In the text (first two sections of the results), we provide direct comparisons of the number of genes in each category and their overlap.
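
      The comparisons described in this reply amount to pairwise and three-way intersections of gene lists. A minimal sketch of that bookkeeping is given below; the gene identifiers are hypothetical placeholders and the function name `overlap_summary` is an assumption introduced for illustration, not the authors' code or data.

```python
def overlap_summary(b2_srgs, learning_genes, app_upregulated):
    """Return sorted gene lists for each overlap between the three categories."""
    b2_learning = b2_srgs & learning_genes      # B2-SRGs associated with learning
    b2_app_up = b2_srgs & app_upregulated       # B2-SRGs upregulated in APP mice
    all_three = b2_learning & app_upregulated   # genes present in all three lists
    return {
        "b2_learning": sorted(b2_learning),
        "b2_app_up": sorted(b2_app_up),
        "all_three": sorted(all_three),
    }

# Hypothetical placeholder identifiers, not genes from the study.
b2_srgs = {"g1", "g2", "g3", "g4"}
learning_genes = {"g2", "g3", "g9"}
app_upregulated = {"g3", "g4", "g7"}

summary = overlap_summary(b2_srgs, learning_genes, app_upregulated)
n_all_three = len(summary["all_three"])
```

      Reporting both the member lists and their sizes, as done here, mirrors the combination of supplementary tables and in-text counts described in the reply.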

      Major point 2. The reviewer asks: “To better understand the data in the context of hippocampal function, please include functional annotation of SRGs they identified in Figure 2F as they do it in Figure 1 (desirably for each time point, at least for 6M). How many of the SRGs they identify in Figure 1 are part of Figure 2F? Please include functional annotation of significantly upregulated B2 regulated SRGs in Fig2 and compare them with that of Figure 1.”

      The number of B2 RNA regulated SRGs in Figure 1 that are part of Figure 2 (in particular Figs.2E,F) is now presented in the new Supplementary Table 5 and also in the text. We now provide as a new Supplementary Table 7 the functional annotation of these genes (see also general comment for this reviewer) and discuss the findings in the text.

      We recommend including only the 6M old mice, as this is the time point at which B2 RNA processing was found to differ between WT and APP mice. However, if the reviewer thinks that it is necessary, we will also add differential expression lists for the other ages as additional supplementary tables.

      Major point 3. The reviewer asks: “In figure 3, the authors report that the B2 processing rates are high at the 6M time point in hippocampi of the APP mice. Please include the levels of unprocessed and processed B2 RNAs in these samples along with this figure, without which it is difficult to gauge the significance of its correlation with SRGs in Figure 2.”

      We now provide as new figure panels 3E and 3F the levels of processed B2 RNA fragments and unprocessed (full length) B2 RNAs in these samples, respectively, along with the processing ratio which is now labeled as subfigure 3G.

      Major point 4. The reviewer asks: “What is the % of B2 regulated SRGs that are hsf1 bound in Figure 4C? What is their dynamics in the wild type and APP hippocampi?”.

      Old Figure 4C is now Figure 4A. The exact number of B2 RNA regulated SRGs that are close to Hsf1 binding sites is now presented as a new figure (Figure 4C) and discussed in the text. A list of these genes is provided as new Supplementary Table 8. For genes that are upregulated in APP mice compared to wild type, the difference in Hsf1 binding dynamics between B2 RNA regulated and not regulated genes is now presented as Suppl. Figure 4D.

      Major point 5. The reviewer asks: “What is the distribution of Hsf1 binding sites on (a) non-B2 regulated SRGs and (b) non-SRG genes in hippocampi?”.

      This point is related to point 4. We now present a new panel (Fig. 4B) for non B2 RNA regulated genes (listed in Suppl. Table 13) along with the distribution we had in the initial manuscript for all B2 RNA regulated SRGs (now presented as Fig. 4A). The direct comparison of these genes is presented in the new Suppl. Figure 4C together with a similar comparison only for genes upregulated in APP mice (Suppl. Fig. 4D).

      Major point 6. The reviewer notes: “In Figure 4D, the 3months old Wt HSF1 levels are high, yet B2 processing (Figure 3E) is low. Please comment.”

      The reviewer’s comment made us realize that we should include a plot that describes the correlation between Hsf1 levels and B2 RNA processing ratio across all sequenced samples. This should reveal whether differences such as those observed by the reviewer affect our conclusion regarding the relationship between these two parameters. We now provide this in the new Supplementary Figure 6D, where we found a strong positive correlation between Hsf1 levels and B2 RNA processing ratio. We thank the reviewer for this comment, which helped us to further substantiate this relationship.
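
      For readers who want to see what such a per-sample correlation involves, a minimal sketch of a Pearson correlation coefficient follows. The reply does not state which correlation measure was used, so Pearson's r is an assumption here, as are the function name `pearson_r` and the hypothetical per-sample values; none of these numbers come from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and the two root sums of squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

# Hypothetical per-sample values, for illustration only.
hsf1_levels = [1.0, 2.0, 3.0, 4.0]
processing_ratio = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(hsf1_levels, processing_ratio)
```

      A value of r close to +1 indicates that samples with higher Hsf1 levels also tend to have higher processing ratios; a rank-based measure such as Spearman's coefficient would be a reasonable alternative if the relationship is monotonic but not linear.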

      Major point 7. The reviewer notes: “While the authors show in vitro cleavage of B2 RNA by Hsf1, the experiment lacks controls to be conclusive. At least, please include a similar size protein as HSF1 with no-known RNA binding activity and a similar size protein with RNA binding activity as controls in 5A. Please justify the use of PNK as the control protein. Please include the use of domain-based deletions of Hsf1 to map the region of HSF1 that is binding and potentially cleaving the B2 RNA. Please include an RNA of similar size and Antisense-B2 RNA to show the specificity of the Hsf1 based cleavage of B2 RNA. Without these controls, the conclusions in Figure 5 cannot be substantiated.”

      The endogenous ribozyme activity of B2 RNA compared to other control RNAs has already been shown in two previous works, but we will also include the relevant controls here by providing control incubations with other RNAs. We will also include the incubations with additional control proteins as suggested by the reviewer. We are currently performing these experiments and will include them in the revised version. PNK is used as a control protein because it is an RNA binding protein that is used in the construction of our short RNA libraries, and we wanted to show that short RNA seq data are free of such confounding factors that could potentially generate artificial fragments. We now include this information in the text.

      We feel that the application of domain based deletions for Hsf1, while it would add information on the exact biochemistry underlying B2 RNA processing through Hsf1, is beyond the scope of this manuscript. In the current manuscript we are focusing only on the fact that Hsf1 can accelerate B2 RNA processing in vitro and not on the mechanism by which this happens. In our opinion, this should be addressed in a separate manuscript.

      Major point 8. The reviewer asks: “The authors should show that the incubated APP peptides are taken up by the cells (experiments in Figure 5F and Figure 6).”

      These figures are now labelled as Fig.6C and Figure 7, respectively. That’s a very interesting point and we thank the reviewer for this comment. Multiple studies have shown that toxicity after incubation with amyloid beta is mediated mainly by cell surface receptors, which through cell signalling lead to the response to cellular toxicity that induces stress genes such as Hsf1. Nevertheless, APP peptides may enter the cell, and the reviewer’s question raised the possibility that oligomers entering the cell could have a direct impact on the stability of the B2 RNA. In that case, providing evidence that the amyloid enters the cell would be important if we had indications that amyloid beta interacts directly with B2 RNA. We did test this and we found no direct effect of amyloid beta on B2 RNA, so the processing in our case is not induced by oligomers that may have entered the cell. We were planning to present this information in a different manuscript, but if the reviewer or editor thinks that it would be beneficial for the paper, we could present this as a supplementary figure that shows that amyloid beta incubations with B2 RNA do not induce further processing beyond what Hsf1 causes. For the moment we just present this below:

      Major point 9. The reviewer asks: “Please provide the list, functional annotation, and % of the SRGs upregulated upon incubation with APP in HT22 cells in comparison to 6month old APP mice. Comment on learning-related Genes.”

      In the revised version, we now provide and mention in the text the following data: i) a list of genes upregulated in HT22 cells during amyloid toxicity upon incubation with amyloid beta (new Suppl. Table 9), ii) a list of the genes from point (i) that are also upregulated in APP mice (new Suppl. Table 10), iii) the list and number of B2-SRGs that are upregulated in HT22 cells during amyloid toxicity (the reviewer’s question) (new Suppl. Table 11), and iv) the functional annotation of the genes from point (iii) (new Suppl. Table 12). We mention in the text the gene numbers and also the genes that are common to all three lists.

      We also mention in the text the limitations of our comparisons between the in vivo model of amyloid pathology (APP mice) and the in vitro cell culture model of amyloid toxicity (HT22 cells), and we clarify that the cell culture model is used only as a simulation of the effect of amyloid beta on gene pathways associated with the response to cellular stress and the role of Hsf1 in B2 RNA processing.

      Major point 10. The reviewer asks: “The authors should show the efficient downregulation of Hsf1 (protein) upon anti-Hsf1 LNA transfection.”

      In the revised version, in addition to the RNA-seq data, we provide a second confirmation at the mRNA level with an independent method (RT-qPCR) in new Figures 4E and 7B (lower panel). We are currently performing the protein extractions and will provide a Western blot or an ELISA in the revised version.

      Major point 11. The reviewer asks: “Please present the total B2 RNA levels for conditions in Figure 6C.”

      We now provide as a new supplementary figure (Suppl. Fig. 6B and C) the levels of processed B2 RNA fragments and the total levels of unprocessed full length B2 RNAs of these samples that relate to old Figure 6C (now labeled as Fig.7C).

      Major point 12. The reviewer notes: “Hsf1 levels are not significantly downregulated in Control cells which were inoculated with the reverse APP peptide. Please comment.”

      We assume that the reviewer here refers to the lack of reduction in Hsf1 levels in the cells inoculated with the reverse peptide and the anti-Hsf1 LNA. Indeed, this lack of reduction is confirmed also by the new qPCR we performed (new Figure 7B, lower panel, R-ctrl vs R-anti-Hsf1). This should likely be attributed to compensation during non-stress conditions. In contrast, under stress conditions, Hsf1 is heavily used in stress response, which could explain the differences we see as cellular needs surpass the available Hsf1 transcripts due to degradation by the LNA. This is also supported by the new RT-qPCR experiments we have performed for B2-SRGs (new Figure 7E). In agreement with what is known for stress response genes such as immediate early genes (for example FosB), levels of these genes are minimal in both R-ctrl and R-anti-Hsf1 conditions and only become activated during stress response. We now discuss this in the text of the revised manuscript.

      Major point 13. The reviewer asks: “Please compare and contrast the % of genes, the overlap, and the functional distinctions in 6F to that of 5G and Figure1. What are the genes that are common between Figure1, and that are specifically upregulated upon Anti-Hsf1 LNA transfection along with 1-42 APP. What is % of the occurrence of B2 binding sites in those genes? What are their functional annotations and what is their connection to learning, memory, and cell survival?”

      Old Figure 6F is now Figure 7F, while old Figure 5G is now Figure 6C. This point is discussed in the response to points 1 and 9 of this reviewer. In summary, genes upregulated in our amyloid toxicity model included 25 B2-SRGs (new Suppl. Table 11). When testing for enriched terms in these 25 genes, biological processes related with apoptosis, such as regulation of apoptotic process and programmed cell death were at the top of the list (new Suppl. Table 12) and included, among others, genes such as FosB and Mitf that have been connected with Alzheimer’s disease. Out of the 25 genes that are up-regulated in both mice and our cell culture system, six are B2-SRGs (4932438A13Rik, Fosb, Pag1, Ptprs, Sema5a, and Sgms1) and include a well-known immediate early gene (Fosb), genes associated with sensitivity to amyloid toxicity (Pag1, Sema5a, Sgms1, Fosb), as well as genes associated with p53 (Ptprs, Fosb). All these genes get upregulated in amyloid toxicity (42-Ctrl vs R-Ctrl) but are not upregulated when Hsf1 LNA is applied (42-anti-Hsf1 vs R-anti-Hsf1, no significant difference). This information is now included in the text.
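
      The gene-list comparisons described in this response (genes shared between the cell culture model, the APP mice, and the B2-SRG list) amount to set intersections. The following is only a minimal sketch of that kind of overlap calculation; the function name `overlap_summary` and the gene identifiers are hypothetical placeholders and are not taken from the supplementary tables.

```python
def overlap_summary(list_a, list_b):
    """Return the genes shared by two lists and what share of list_a they make up."""
    a, b = set(list_a), set(list_b)
    common = a & b
    pct_of_a = 100.0 * len(common) / len(a) if a else 0.0
    return {"common": sorted(common), "n_common": len(common), "pct_of_a": pct_of_a}


# Hypothetical placeholder identifiers, not real data from the study.
upregulated_in_cells = ["GeneA", "GeneB", "GeneC", "GeneD"]
upregulated_in_mice = ["GeneB", "GeneD", "GeneE"]

summary = overlap_summary(upregulated_in_cells, upregulated_in_mice)
```

      Applying the same function a second time, to the shared genes and a list of B2-SRGs, would give the three-way overlap.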

      **Minor.**

      1 . Please include TPM/ FPKM values for hippocampal markers as control in Figure 2 to do justice to the hippocampus specific RNA seq conducted by the Authors.

      To our understanding, the reviewer here suggests the testing of well-known hippocampal markers in our mouse data as controls to confirm that they are indeed hippocampus specific. We have selected as reference markers, the genes employed by the Allen Brain Atlas RNA-sequencing project and we provide a comparison of their data in hippocampal cells with our data from mouse hippocampus. This is now presented as new Supplementary Figure 2.

      2 . In figure 2D the authors show that B2 RNA regulated SRGs in the 3 months' wild type mice are significantly high. P53 has been reported to be high in young wild types hippocampus, but not SRGs in my opinion. The authors should comment on this.

      Old Figure 2D is now Figure 2E. We now mention the reviewer’s comment particularly in the discussion and cite a landmark review article in Neuron by Michael Greenberg regarding the role of stress response genes, such as FosB, early during development. To prevent any confusion, we have also replaced SRGs with B2-SRGs since we tested only B2-SRGs in our study.

      3 . In figure 2F, under the 6m APP condition, the replicate 3 looks substantially different from the other replicate. This can significantly impact the analysis and conclusions made. Either remove that replicate and present the analysis without it or please provide a valid explanation. To make the data more valid, please provide hierarchical clustering of the entire data, the non-B2 regulated genes and the B2 regulated SRGs.

      We now provide in the new Supplementary Figure 9C a PCA plot, which includes 6m APP mice vs. their WT counterparts and HT22 cells, and shows that this variability is within the biological replicate variability we can expect in these models. To substantiate this further, we have constructed the correlation matrix of the RNA-seq data of both WT and APP 6 month old mice in the new Supplementary Figure 9D. As shown in this matrix, all APP mice clearly correlate with each other and not with their WT counterparts.
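
      For readers unfamiliar with sample correlation matrices, the idea is that each sample's expression profile is correlated against every other sample's, so that replicates of the same condition show high pairwise values. The sketch below illustrates this with Pearson correlation in plain Python. The sample names and expression values are invented for illustration, and the choice of Pearson correlation is an assumption, since the response does not state which correlation measure was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(samples):
    """Map each pair of sample names to the correlation of their profiles."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Invented expression values for four genes in three hypothetical samples.
samples = {
    "APP_1": [1.0, 5.0, 2.0, 8.0],
    "APP_2": [1.2, 4.8, 2.1, 7.9],
    "WT_1": [6.0, 1.0, 7.0, 2.0],
}
matrix = correlation_matrix(samples)
```

      In this toy input the two "APP" profiles correlate strongly with each other and negatively with the "WT" profile, which is the qualitative pattern a correlation matrix is meant to reveal.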

      In the initial manuscript the heatmaps of former Figure 2 were indeed provided with hierarchical clustering of the entire data and also included non-B2 RNA regulated genes. This data is included now as Supplementary figure 2.

      In Figure 2C RNA seq data is represented in TPM while its FPKM in Figure 2D.

      Figure 2D is now Figure 2E, while Figure 2C remains labelled with the same number. Given that TPM already includes scaling of the data, it is unsuitable for the averaging of the gene expression levels of multiple genes (B2-SRGs) used in the boxplots of Figure 2. This does not apply in the case of single genes as in Fig 2C (p53) or in the heatmap where each gene is presented in a separate row. This explanation is now included in the methods section.
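
      As background on the two units, the standard definitions can be sketched as follows. This is a generic illustration and not the authors' pipeline; the counts and transcript lengths are hypothetical. It shows the scaling referred to above: TPM values in a sample always sum to one million, because each gene is expressed as a share of the length-normalised total, whereas FPKM values carry no such constraint.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


# Hypothetical read counts and transcript lengths for three genes in one sample.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 1500]

tpm_values = tpm(counts, lengths_bp)
fpkm_values = fpkm(counts, lengths_bp)
```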

      Figure 2: the number of replicates in the case of 3-month-old wild types only 2. Please specifically denote it and comment why only 2 replicates are provided.

      During the hippocampal RNA extractions, the RNA of one of the three 3m old mice had very low RIN scores, which could be a confounding factor for the short-RNA-seq. As this happened some months after the hippocampal extractions, we did not have any other 3 month mice of the same cohort used for the behavioral and IHC studies. Thus, we decided to include only two replicates in this condition. Since the results presented in the current study focus mainly on 6 month old mice, we expect the impact to be minimal. We include this note in the methods section.

      4 . Considering that p53 and SRGs are significantly upregulated in 6months in the APP model, it would be great if (allowing that these samples are still available) the authors can include a staining for apoptotic markers, for example, Active Casp3 or similar. This will allow us to better gauge the gene expression changes presented by the authors especially regarding SRGs.

      Unfortunately, we do not have these slides but in the revised version we will provide qPCR data for some of these markers.

      5 . Under subheading: Hsf1 accelerates B2 RNA processing, 3rd paragraph when the authors comment on known hsf1 binding sites on SRG genes, please correct from: Increased Hsf1-binding was found.... "To the increased number of hsf1 binding sites were found", unless the authors would like to show increased Hsf1 binding by performing CHIP-seq for Hsf1 in the hippocampus at least at the 6-month time point between Wt and APP mice.

      We have changed the text accordingly.

      Reviewer #1 (Significance (Required)):

      B2 RNAs, encoded from SINE B2 elements has been directly implicated in stress response by its inherent ability to bind RNA Pol II and suppress stress response genes (SRG) in homeostatic conditions. However, upon stimuli, B2 RNAs are cleaved and degraded, resulting in the release of RNA pol II and upregulation of SRGs. Previous work from the senior author identified PRC2 component EZH2 to be the B2 RNA processing factor, cleaving B2, and releasing POL2. SRGs are upregulated upon stress, for example in age-associated neuropathologies like Alzheimer's disease (AD). Considering that the hippocampus is a primary target of amyloid pathologies as well as since SRGs are suggested to be key for the function of a healthy hippocampus, the authors set to understand the role of B2 RNAs that are linked to SRG regulation in the mouse hippocampus with amyloid pathology. They use disease-relevant in vivo and in vitro models combined with unbiased RNA seq data analysis for this endeavor, which indicates the potential relevance of B2 RNAs in APP mediated neuronal pathologies in mice as well as identifies Hsf1 as the factor cleaving B2 RNAs in the hippocampus.

      The work is interesting and identification of Hsf1 as the processing factor for B2 RNAs in the hippocampus is significant. I would like to credit the authors for their elegant in vivo experimental design in Figure 2.

      REVIEWER 2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This manuscript follows from previous work by the corresponding author showing that SINE-encoded B2 RNAs function as regulators of the expression of stress response genes (SRGs). Specifically, stimulus triggers the processing of repressive B2 RNAs that are bound at the SRGs, thereby activating SRG transcription. In this work, the authors investigate whether a similar mechanism might be controlling the expression of genes in models of amyloid beta neuropathology (i.e. mouse hippocampi from an amyloid precursor protein knock-in mouse model, and a cell culture model of amyloid beta toxicity). They performed RNA-seq in these models. Their data show a correlation between the progression of amyloid pathology, expression of genes thought to be regulated by B2 RNA, and the processing of B2 RNA. In addition, they show biochemical data supporting a role for Hsf1 in enhancing the processing of B2 RNA. Knockdown of Hsf1 also reduced B2 RNA processing and the expression of SRGs.

      **Major comments:**

      Major point 1. The reviewer asks: “In the RNA-seq data one cannot distinguish between Pol III transcribed B2 RNA and Pol II transcribed B2 RNA (typically embedded within introns and UTRs of mRNAs). The models they present, and the structures they show, clearly imply regulation by Pol III transcribed B2 RNA. However, there is no way to know that the short B2 RNAs they sequence aren't coming from degraded mRNAs. This needs to addressed. Minimally, in writing as a caveat of their model. Ideally, it would be addressed experimentally.”

      That’s a very interesting point, as it implies that the regulatory role of B2 RNAs may extend from Pol III transcribed B2 RNAs to B2 RNAs embedded in mRNAs (likely nascent ones), which may be subject to the same endogenous ribozyme activity of this sequence, suppress Pol II, and be processed in response to stimuli. The RIN values of our samples were high, except for one 3m old mouse sample, which was excluded from further analysis for this reason. Moreover, during library construction shorter and longer RNAs were separated. Thus, any B2 RNA fragment that may have originated from mRNA should be biological rather than technical in origin and must have been generated in the cell before our RNA extraction. To address this point, we now provide a new supplementary figure (Suppl. Figure 8), where we have separated the B2 elements against which we map the RNA fragments into two categories, those that fall within exonic/genic regions and those outside of these regions. Although B2 RNAs are produced from multiple copies in the genome, each copy does harbor multiple SNPs, insertions and deletions, which means that each B2 RNA fragment is mapped to a specific set of B2 elements and not to all of them. In other words, despite multiple mapping, a level of spatial specificity is maintained. If the B2 RNAs we map were coming exclusively from either only Pol III B2 elements or mRNA embedded B2 elements, we would expect at least some difference in the distribution of fragments between B2 elements of these two categories, as the second one overlaps with mRNAs. As shown in the new supplementary figure 8, the fact that the distribution models are very similar between the two categories indeed supports the hypothesis that both types of B2 elements may contribute to B2 RNA processing. Most importantly, the profile of B2 RNAs in genic regions shows that B2 RNA processing is not random but follows the same processing rules as B2 RNAs from Pol III promoters.
      Given the limitations posed by the repetitive nature of B2 RNAs, it remains difficult, though, to provide an exact number regarding the portion of B2 RNA fragments produced by each category, and this is clearly noted in our revised discussion. However, even the indication that B2 RNAs embedded in mRNAs may also play an important role in our model provides a new perspective that should be investigated further in future studies.

      Major point 2. The reviewer asks: “The direct regulation of SRGs by B2 RNA was not shown in their model systems for amyloid beta neuropathology. Rather, the authors' used the genes identified in their prior studies as B2 RNA-regulated, which I believe were in the NIH3T3 cell line. Given that transcription is highly cell-type specific, these genes might not be regulated by B2 RNA in mouse hippocampi or their cell culture model, despite the correlations shown. This needs to be addressed. Ideally, a targeted approach to show that transcription of even a couple genes in their system is indeed regulated by B2 RNA would provide stronger support for their conclusions.”

      We agree with the reviewer and now provide a new figure (Fig.6D-F) with the targeted approach that the reviewer proposed. In particular, we have tested whether fragmentation of full length B2 RNAs is associated with activation of target genes in our biological system (HT22 cells), as it was in NIH/3T3 cells in our Cell paper. We now show in new Figure 6 that this is indeed the case.

      Major point 3. The reviewer requests additional analyses: “The following bioinformatics analyses would strengthen their conclusions. This should be straightforward to do because it involves data they already have, and perhaps analyses they have already have performed.”

      a. Regarding the plot in Figure 3A (lower panel). The same plot should be shown for the 3m old and the 12m old APP mice (i.e. not just the 6m data). This would show the specificity of processing B2 RNA and that it indeed correlates with disease progression.

      We now provide this plot as new supplementary figure (Suppl. Figure 3). It shows that increased B2 RNA processing coincides only with the active neurodegeneration phase at 6 months and not the terminal stage.

      b. Regarding the plots of B2 RNA processing rate. This value could increase either due to more short RNAs or less full length RNA. Which is it for the 3m, 6m, and 12m APP mice? Showing the short and long B2 RNAs as boxplots (as opposed to only the processing rate) would address this and also provide additional insight into the regulation involved. The same applies to the data in Figure 6. (As an aside... do the authors mean processing ratio as opposed to rate? I'm not clear where the time component is coming into play to call this a rate.)

      Old Figure 6 is now Figure 7. We now provide all these figures, which show that the increase in processing ratio at 6 months is mainly due to an increase in the processed fragments and not a decrease in full length B2 RNAs. For APP mice these are new Figures 3E and F, and for HT22 cells, these are new Supp. Figures 6B and C.
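
      The reviewer's concern can be illustrated with a small numerical sketch. It assumes, for illustration only, that the processing ratio is the number of reads from processed fragments divided by the number of reads from full length B2 RNA; the exact definition used in the manuscript is not restated in this response, and the read counts below are hypothetical.

```python
def processing_ratio(fragment_reads, full_length_reads):
    """Assumed definition: processed-fragment reads over full-length reads."""
    if full_length_reads <= 0:
        raise ValueError("full_length_reads must be positive")
    return fragment_reads / full_length_reads


# Hypothetical read counts showing two routes to the same ratio.
baseline = processing_ratio(50, 100)          # reference condition
more_fragments = processing_ratio(100, 100)   # numerator doubles
less_full_length = processing_ratio(50, 50)   # denominator halves
```

      Both changes double the ratio relative to the baseline, which is why the fragment and full length levels need to be reported separately to tell the two cases apart.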

      c. The random genes in Figures 2E and 6E are plotted as heat maps, but statistical significance is hard to see. What do boxplots of the random genes look like, and is the significant difference between 6m old APP and 6m old WT then lost?

      Old Figure 2E is now new Suppl. Figure 1C, while old Figure 6E is now new Suppl. Figure 7C. We now provide these boxplots in new supplementary figures 1B and 7B.

      Major point 4. The reviewer comments: “ It is interesting that B2 RNA self-processing is enhanced by both Ezh2 and also Hsf1. It would strengthen the data to perform a control with a protein prepared more similarly to the Hsf1 (rather than PNK) to confirm that the enhanced B2 RNA breakdown is indeed attributable to Hsf1 and not a contaminant in the protein prep. Similarly, the authors should provide information on which RNA was added as the negative control for Hsf1-stimulated breakdown (i.e. the ~80 nt RNA).”

      This point is also discussed in Reviewer 1 point 7. The ribozyme endogenous activity of B2 RNA has been shown already in two previous studies that performed incubations with control RNAs and proteins. We are currently preparing and will provide these additional incubations as a new supplementary figure in the revised manuscript.

      **Minor comments:**

      1 . Regarding the GO analyses in Figure 1 (panels B, C, and D). I wasn't clear whether the authors are showing all statistically enriched terms, or only those relevant to neuronal processes and learning. I recommend showing a supplemental table with all terms that have an adjusted p value below a specified cut-off (e.g. 0.05).

      The statistical threshold used was an EASE score of 0.05 and all presented terms passed this threshold. In the initial manuscript we showed only the top 5 terms in tissue enrichment and the top 10 terms for GO Biol process and Cell Compartment among those that had passed the threshold. We now provide all the terms that passed the threshold as a new Supplementary Table 2, including gene counts, exact gene numbers and related statistics.

      2 . The authors show several figures that are not new data (2B, 4A, 4B, Suppl. Fig 1 and 2). I think it would be more clear if these data were summarized and referenced in the results, rather than shown.

      Old Suppl. Fig1 and 2 that were results of previous studies or web resources directly available (such as Human Protein Atlas) have been now removed and they are now just referenced in the text. Old Figures 4A and 4B have been removed from the main figures but may be helpful to the readers if they are still available in the Supplement (currently as Suppl. Figure 4A and B), as not all users are familiar with the RNA-seq browsing tools of Allen Brain Atlas resources. Regarding figure 2B that contains data from our previous study on this exact cohort of mice: If the reviewer and the editor agree we recommend that it remains in the main figure (with the appropriate image credit citations), as it provides in an efficient way the clear connection between amyloid load and our results at the molecular level, and, most importantly, it clearly draws a line in amyloid pathology progression between 3m old and 6m old, that agrees with our findings in the RNA-seq data of these mice.

      3 . In Figure 3A the schematic shows that B2 is 155 nt, the plots in Figures 3A,B,C show B2 RNA is 120 nt, and Figure 5 shows the RNA is 188 nt. Can the authors please clarify these differences?

      The full length of the B2 consensus sequence is 188nt and this is the one we use for the in vitro experiments. However, the structure of the B2 RNA has been resolved only for the first 155nt by the Kugel lab, and this is the only publicly available structure that we can reference in our figures. For the mapping of 5’ends of short fragments in Fig.3A we have used the same range tested in our Cell paper to maintain consistency of the results. The reason why this 120nt threshold was selected in the Cell paper was to exclude artifacts from short RNAs mapping partially in our metagene as well as downstream of those B2 elements that are shorter than the consensus sequence. We now explain these differences in the methods section.

      4 . In the Methods section, the sequence of the g block template didn't contain the T7 promoter sequence that was used as the forward primer for PCR amplification?

      We have now included this sequence in lower case.

      5 . In Figure 6B, why were Hsf1 levels not decreased in the R treated cells after treatment with the LNA?

      Old Figure 6B is now new Figure 7B. Please see response to Reviewer 1, major point 12.

      Reviewer #2 (Significance (Required)):

      Finally, this reviewer generally remarks that “The models presented for the regulation of stress response genes (SRGs) in amyloid beta neuropathologies are compelling. As are the correlations they found between the progression of amyloid pathology, expression of genes thought to be regulated by B2 RNA, and the processing of B2 RNA. This is a unique direction of research for brain disease and represents an interesting conceptual advance. Most prior studies in this area use common model cell lines, and this lab seems well-positioned to unravel the proposed molecular mechanisms in neuronal systems.”

      We appreciate the encouraging comments made by this reviewer.

      REVIEWER 3

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript describes a regulatory mechanism involving Hsf1 and B2 RNAs in the control of stress response genes (SRGs) during amyloid induced toxicity. In particular Hsf1, upregulated in 6m old APP mice and in HT22 cells treated with beta amyloid peptides, is shown to stimulate the B2 RNA destabilization leading to SRGs activation. While in healthy cells this upregulation can be reverted once the stimulus is removed, the pathological condition fuels the circuitry leading to p53 upregulation and neuronal cell death. The authors previously described the same mechanism acting during cellular heat shock response but in this case the protein identified as trigger of B2 RNA destabilization and SRGs activation was EZH2 (Zovoilis et al, 2016).

      This reviewer generally remarks that “Indeed, the first part of the manuscript describes additional analyses of the previous data that prompts further investigation on the potential role of B2 RNA in AD condition. Nevertheless, it is not clear how the prior findings obtained in not biologically related cellular models might be used to obtain helpful indication of B2 RNA neuronal activity.”

      We thank the reviewer for this comment. Indeed, the current study’s main aim was to expand the findings of our previous work on the role of B2 RNA in cellular response to thermal stress in NIH/3T3 cells to other types of cellular response to stress, in our case to amyloid toxicity and the resulting amyloid pathology in neural cells. Response to thermal stress (Heat Shock) has been used for years as a basic study model for cellular response to stress. Proteins and gene pathways initially identified in heat shock have been subsequently shown to play identical pro-survival roles in other biological systems and there are studies showing the role of Hsf1, heat shock related proteins and cell stress response pathways in neural cells and the mammalian brain (we will provide these references in the revised version). For example, pathways such as the MAPK pathway and early response genes, that constitute the basis of response to heat shock, have been shown in studies by us and others to be activated and play a critical role in hippocampal function. Thus, examining the role of B2 RNA in the context of neural response to stress constituted a natural continuation of our previous study in NIH/3T3 cells. The fact that the list of B2 RNA regulated SRGs was found to be highly enriched in neuronal tissue terms and cellular compartments related to neuronal functions plainly confirms the close relationship among cellular response pathways in the two biological systems. Due to these facts we were compelled to investigate in more detail our previous findings also in a neural cell model. However, as discussed in point 2 of Reviewer 2, the initial manuscript did not confirm the direct control of B2 RNA on expression of target genes also in our cellular model. This information is now part of the new figure 6 and we thank both reviewers for bringing this to our attention.

      The reviewer also remarks that “The research fields of non coding RNAs and neurodegeneration are attractive and challenging and, in my opinion, the molecular circuitry involving B2 RNAs might add important insights for understanding beta amyloid toxicity and neuronal death; however, the data provided are not in the shape making the manuscript suitable for publication: some controls are missing, the way the experiments are presented is not easy to follow and more importantly the authors does not provide any data (tables or lists) of the NGS experiments and the study lacks validation of them. Therefore, in my opinion the manuscript needs a profound revision before to be considered for publication in Review Commons.”

      Based on this reviewer’s and the other reviewers’ suggestions we now provide additional controls, detailed tables and gene lists, and qPCR validation of these results. We have also substantially revised the text in the first section of the results and beginning of the discussion, to make our rationale for testing B2-SRGs clearer and easier to follow.

      **major concerns:**

      Major point 1. The reviewer asks: “The first paragraph of the Results is entirely dedicated to re-analyze the data previously published by the same group (Zovoilis et al., 2016). However, this is not adequately explained. In line with this, the table 1 is not required since the data are already provided by Zovoilis et al., 2016, unless the authors handled the data using additional new criteria that have to be explained.”

      We now explain our rationale for using this data in more detail in the text. Please see also response to the general comment of this reviewer and response to the next point.

      In the Zovoilis et al (2016) study, the data presented did not include the list of regulated genes in a direct way but as part of the annotation of the B2 CHART peaks. This may pose difficulty to non-experts to extract the gene list from that data and we thought to include them as separate gene list here so that readers can directly use it for their analysis. Nevertheless, if the reviewer or the editor think that the list is redundant, we can surely omit it.

      In addition, the reviewer comments: “Moreover, Zovoilis and colleagues (2016) focused on SRGs regulated upon heat shock and using NIH/3T3 and HeLa cell lines, therefore, it is difficult to me understand how, searching for "cellular function connected with B2 RNA regulated SRGs", the list resulted enriched of neuronal tissue terms or cellular compartments related to neuronal functions. Please clarify this point since the following analyses are based on these findings.”

      Neural pathologies, such as amyloid pathology in brain, are often connected with cellular stress due to proteotoxicity. The ability of neural cells to respond to proteotoxicity challenges is connected with various molecular mechanisms, including stress related proteins that were first described in the context of heat shock. Thus, both contexts (heat shock and amyloid toxicity) refer to cellular response to stress, which explains why genes identified to be regulated during stress response in NIH/3T3 cells constitute part of the basic stress response toolbox that neural cells have also been described to possess. We have now modified the text accordingly to make our rationale clearer.

      Major point 2. The reviewer comments: “In Figure 1F there is no arrow indicating that some of the SRGs regulate directly miR-34 as stated in the main text. Moreover, it is more appropriate to replace SRGs with learning‐associated genes both in the figure and in text (2nd paragraph of the results) since Zovoilis and colleagues focused on them. Finally, they did not show in their manuscript the rescue of p53 expression mediated by mir-34; indeed, for miR-34-p53 regulatory axis Zovoilis and colleagues referred to Peleg et al, 2010 and Yamakuchi & Lowenstein, 2009. Please fix all these concerns.”

      We have restructured the figure as suggested by the reviewer and made clear the distinction between learning genes and B2 RNA regulated SRGs (B2-SRGs) from the two different studies. In connection with point 1 of Reviewer 1, we believe that new Figure 1E, that includes the exact number of B2-SRGs that are learning associated, will represent more efficiently and accurately the data. We have also corrected in the text the citation regarding miR-34c and p53 in both the introduction and first section of the results (last paragraph).

      -The Fig.1A and Fig.1F are wrongly indicated at the end of the sentence "....levels of these genes are normally downregulated in 6m and 12m old mice compared to 3m old mice (p=0.02 and p=0.04, respectively)"; please correct this point.

      The error has been corrected.

      Major point 3. The reviewer comments regarding Figure 2:

      a) Since three mice for each condition have been used for the RNA seq analyses, please provide a blot with the Principal Component Analysis (PCA).

      Please see also response to minor point 3 of Reviewer 1. We provide the PCA plots for WT and APP mice in the new Supplementary Figure 9 and we also provide a comparison of the six month old mice with the HT cell samples as well as a correlation matrix for 6 month old mice in the same figure.

      b) Fig 2F comes first of Fig 2E in the text, however, I suggest to move this latter to supplementary material.

      Old figure 2E has now been moved to supplementary material as new Supplementary Figure 2C and we also provide in a boxplot the exact gene expression levels as new Supplementary Figure 2B.

      c) In general, this study lacks validation of the RNA-seq results. Western blot and/or qRTR-PCR to verify the variation of p53 and of some selected SRGs have to be provided.

      In the current revised version we already provide qPCRs for p53 and Hsf1 in APP mice and we will include additional genes in the final version.

      d) It is also not clear how the authors defined SRGs in the hippocampus: do they correspond to learning‐associated genes described by in Zovoilis et al, 2011 or to B2 RNA H/S regulated genes by Zovoilis et al, 2016?

      The way we presented B2 RNA SRGs in the results with regard to learning associated genes was indeed unclear. We now present the distinction between the two gene categories and their relationship as a new Fig.1E panel and we also provide detailed gene lists of common genes and the exact numbers (please see also response to Reviewer 1, major point 1).

      -APP 12 month old mice show the sever phenotype of the terminal AD-like pathology, however this does not correlate with significant SRGs and B2 processing increase. Can the author make a comment on this?

      That’s a very important point and we thank the reviewer for raising this point. We now comment on this in the discussion part explaining how our findings are characteristic of the initial active neurodegeneration phase of amyloid pathology rather than more terminal stages.

      Major point 4: The reviewer comments regarding Figure 5:

      a) a gel with no-protein control for the time course of panel B was cited in the text but missing among the panels. Moreover, the time course shown in the graph in 5C does not correspond to the one in 5B.

      Indeed, the no-protein control time line should refer only to panel C and not to B; we have now corrected the text. In addition, we now present in the new Supplementary Fig. 5 the gels on which the graph in panel C was based, including the gel with the no-protein time line. The time course shown in the initial 5C had been mislabeled. It has now been corrected. We apologize for this and we thank the reviewer for bringing this to our attention.

      b) 5G indicates that four samples for each condition have been analysed by RNA-seq, since they do not seem to be homogeneous please provide a PCA analysis together with the validation by qRT-PCR of a selected group of deregulated genes.

      Old Figure 5G is new Figure 6C. PCA analysis for these samples is now provided in Supplementary Figure 9 and qPCR validation of a number of these genes is provided in new Fig. 7E.

      Moreover, it is not clear whether all the genes shown in the heatmap or a number of them, as stated in the text, were found upregulated in 6m old APP mice. Please clarify this point and modify the figure and the text accordingly. A Venn diagram showing the overlap between genes upregulated in 42vsR treatment and those upregulated in 6m old APP mice might help the comprehension of the experiment.

      Please see response to Reviewer 1, point 9. We now provide as new supplementary tables the exact overlapping lists and mention these numbers in the text.
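
      As a minimal sketch of how such overlapping lists and counts could be derived (an illustration only, not the pipeline used for the supplementary tables; the function name `overlap_summary` and the gene identifiers are hypothetical placeholders):

```python
def overlap_summary(genes_a, genes_b):
    """Summarize the overlap between two lists of upregulated genes."""
    a, b = set(genes_a), set(genes_b)
    common = a & b
    return {
        "common": sorted(common),    # genes found in both lists
        "only_a": sorted(a - b),     # genes specific to the first list
        "only_b": sorted(b - a),     # genes specific to the second list
        # share of the first list that is also in the second list
        "pct_of_a": 100.0 * len(common) / len(a) if a else 0.0,
    }


# Placeholder identifiers, not genes from the manuscript.
up_in_cells = ["gene1", "gene2", "gene3", "gene4"]  # e.g. 42 vs R treatment
up_in_mice = ["gene3", "gene4", "gene5"]            # e.g. 6m old APP mice

summary = overlap_summary(up_in_cells, up_in_mice)
print(summary["common"], summary["pct_of_a"])
```

      The three returned lists correspond to the three regions of a two-set Venn diagram, so the same summary could serve both the table and the diagram the reviewer suggests.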

      Major point 5: The reviewer comments regarding Figure 6 (now labeled as Fig.7):

      a) The evaluation of the levels of Hsf1 mRNA and protein upon LNA transfection is missing for both R and 42 treated HT22 cells. From TPM in panel B, Hsf1 downregulation seems to have been more effective in 42 than in R condition. This would mess up the interpretation of the data.

      We now provide qPCR data for Hsf1 gene expression levels which confirm the ones from the RNAseq. The reason why Hsf1 downregulation seems not to affect the R condition is discussed in our response to Reviewer 1, major point 12, and the respective explanation is provided in the revised text.

      b) Again, in this case no validation of the RNA seq data is provided (any B2 regulated SRGs).

      Now, we provide qPCR data for these genes in Fig.7B and new Fig.7E.

      c) Panels E and F should be swapped or panel E moved to supplementary material.

      Panel E is now moved to supplementary material as new Suppl. Figure 7C.

      Major point 6. The reviewer comments: “In a previous paper the authors discovered B2 RNAs as a class of transcripts bound to EZH2 and this interaction leads to B2 RNA destabilization in heat shock (H/S) condition. The authors also conclude that the genes controlled by B2 RNAs may not overlap with the ones controlled by Hsf1 during H/S. The author should make a comment on this explaining why during H/S B2 RNAs work independently from Hsf1 and on different target SRGs while, during beta amyloid stress, the two act together on the same SRGs. Moreover, as shown for EZH2, Hsf1-RIP experiment should be performed in order to confirm the direct involvement of Hsf1 in the SRGs-B2 destabilization.”

      In the last two paragraphs of our discussion we indicate that B2 RNA regulation is a new process implicated in the response to stress in amyloid pathology but certainly not the only one. We have revised the text in this part accordingly in the revised version to prevent any confusion. We are currently performing a series of RIP-seq experiments with various antibodies. As, to our knowledge, there is no prior published study performing RIP-seq or CLIP-seq for any tissue using Hsf1 antibodies, the success of this experiment is not guaranteed and depends on the existence of appropriate antibodies.

      Major point 7. The reviewer comments: “There is no table listing the results of the RNA seq experiments performed in this paper: control vs APP 3-6-12 m old mice and in R vs 42 treated HT22 cells in presence or absence of LNA against Hsf1. Please provide these data.”

      We now provide these lists as new supplementary tables. Please see response to major points 1 and 9 of reviewer 1.

      Major point 8. The reviewer comments: “In the discussion the authors claim that healthy cells are able to restore the expression of Hsf1, SRGs and B2 RNA upon removal of the stress. While there is evidence for the rescue of SRGs and B2 RNA expression post H/S, no data are available for Hsf1, SRGs and B2 RNA upon the removal of 1-42 beta amyloid peptide. This might be a nice piece of information to add to the manuscript.”

      This would indeed further substantiate our results in our HT22 cell model. We have now performed this experiment, in which HT-22 cells were removed from the amyloid 42 (and the respective R peptide control) and left to recover for 12 hours before estimating the Hsf1 levels through RT-qPCR (see graph below; REC corresponds to recovered HT-22 cells). Hsf1 levels in 42-REC have returned to the same levels as in R. We are currently performing the RT-qPCRs of these samples also for B2-SRGs and will include them in the final version as a supplementary figure.

      **Minor criticisms:**

      -In the introduction the reference Yamakuchi M and Lowenstein CJ, (2009) MiR‐34, SIRT1 and p53: the feedback loop. Cell Cycle, should be added in the sentence: "In contrast, hippocampi of mouse models of amyloid pathology and post- mortem brains of human patients of AD.....and neural death (Zovoilis et al., 2011)."

      We have now changed the text at that point accordingly and also updated the legend of Figure 1F that also refers to this same study.

      -Authors refer to Hernandez et al., 2020 to state that B2 self cleavage is stimulated by some proteins however, Hernandez and colleagues studied only the effect of EZH2 protein. Please rephrase the sentence accordingly.

      Text has been modified accordingly.

      -Indicate a reference for the sentence: "......Ezh2, was reported as being responsible for the B2 RNA accelerated destabilization and processing during response to stress."

      The respective citation was added.

      -The format of many references is not consistent and has to be revised.

      We have switched to the Vancouver style. Some references in the legend and methods sections are referred independently from EndNote in case these text sections have to be moved to supplement in the final version in order to not create inconsistencies with endnote.

      Reviewer #3 (Significance (Required)):

      Finally, this reviewer generally remarks that “The research fields of non coding RNAs and neurodegeneration are attractive and challenging and, in my opinion, the molecular circuitry involving B2 RNAs might add important insights for understanding beta amyloid toxicity and neuronal death.

      However, this manuscript does not really add technical advances since the authors employed experimental approaches and bioinformatic analyses previously published by Zovoilis and colleagues in 2011 and 2016.”

      Our aim in the current manuscript was not to introduce a new method or experimental approach but rather to study the mechanisms behind B2 RNA regulation of gene expression in neural cells and particularly in amyloid pathology. Nevertheless, the current study constitutes the first reported short-RNA seq in this tissue and offers for the first time the ability to study B2 RNA processing in this tissue which is not possible with standard small and long RNA-seq.

      The reported findings might be of interest to an audience of experts in non coding RNAs and neurodegeneration. The area of my expertise mostly regards the biology of non coding RNAs, from biogenesis to function, mainly focusing on neuronal and muscular systems in both physiological and pathological conditions.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript describes a regulatory mechanism involving Hsf1 and B2 RNAs in the control of stress response genes (SRGs) during amyloid induced toxicity. In particular Hsf1, upregulated in 6m old APP mice and in HT22 cells treated with beta amyloid peptides, is shown to stimulate the B2 RNA destabilization leading to SRGs activation. While in healthy cells this upregulation can be reverted once the stimulus is removed, the pathological condition fuels the circuitry leading to p53 upregulation and neuronal cell death. The authors previously described the same mechanism acting during cellular heat shock response but in this case the protein identified as trigger of B2 RNA destabilization and SRGs activation was EZH2 (Zovoilis et al, 2016). Indeed, the first part of the manuscript describes additional analyses of the previous data that prompt further investigation on the potential role of B2 RNA in AD condition. Nevertheless, it is not clear how the prior findings obtained in biologically unrelated cellular models might be used to obtain helpful indication of B2 RNA neuronal activity. The research fields of non coding RNAs and neurodegeneration are attractive and challenging and, in my opinion, the molecular circuitry involving B2 RNAs might add important insights for understanding beta amyloid toxicity and neuronal death; however, the data provided are not in a shape that makes the manuscript suitable for publication: some controls are missing, the way the experiments are presented is not easy to follow and, more importantly, the authors do not provide any data (tables or lists) of the NGS experiments and the study lacks validation of them. Therefore, in my opinion the manuscript needs a profound revision before being considered for publication in Review Commons.

      major concerns:

      -The first paragraph of the Results is entirely dedicated to re-analyzing the data previously published by the same group (Zovoilis et al., 2016). However, this is not adequately explained. In line with this, the table 1 is not required since the data are already provided by Zovoilis et al., 2016, unless the authors handled the data using additional new criteria that have to be explained. Moreover, Zovoilis and colleagues (2016) focused on SRGs regulated upon heat shock and using NIH/3T3 and HeLa cell lines; therefore, it is difficult for me to understand how, searching for "cellular function connected with B2 RNA regulated SRGs", the list was enriched for neuronal tissue terms or cellular compartments related to neuronal functions. Please clarify this point since the following analyses are based on these findings.

      -In Figure 1F there is no arrow indicating that some of the SRGs regulate directly miR-34 as stated in the main text. Moreover, it is more appropriate to replace SRGs with learning‐associated genes both in the figure and in text (2nd paragraph of the results) since Zovoilis and colleagues focused on them. Finally, they did not show in their manuscript the rescue of p53 expression mediated by mir-34; indeed, for miR-34-p53 regulatory axis Zovoilis and colleagues referred to Peleg et al, 2010 and Yamakuchi & Lowenstein, 2009. Please fix all these concerns.

      -The Fig.1A and Fig.1F are wrongly indicated at the end of the sentence "....levels of these genes are normally downregulated in 6m and 12m old mice compared to 3m old mice (p=0.02 and p=0.04, respectively)"; please correct this point.

      -Figure 2:

      a) Since three mice for each condition have been used for the RNA seq analyses, please provide a plot with the Principal Component Analysis (PCA).

      b) Fig 2F comes before Fig 2E in the text, however, I suggest moving the latter to supplementary material.

      c) In general, this study lacks validation of the RNA-seq results. Western blot and/or qRT-PCR to verify the variation of p53 and of some selected SRGs have to be provided.

      d) It is also not clear how the authors defined SRGs in the hippocampus: do they correspond to learning‐associated genes described in Zovoilis et al, 2011 or to B2 RNA H/S regulated genes by Zovoilis et al, 2016?

      -APP 12 month old mice show the severe phenotype of the terminal AD-like pathology, however this does not correlate with significant SRGs and B2 processing increase. Can the author make a comment on this?

      -Figure 5:

      a) a gel with no-protein control for the time course of panel B was cited in the text but missing among the panels. Moreover, the time course shown in the graph in 5C does not correspond to the one in 5B.

      b) 5G indicates that four samples for each condition have been analysed by RNA-seq, since they do not seem to be homogeneous please provide a PCA analysis together with the validation by qRT-PCR of a selected group of deregulated genes. Moreover, it is not clear whether all the genes shown in the heatmap or a number of them, as stated in the text, were found upregulated in 6m old APP mice. Please clarify this point and modify the figure and the text accordingly. A Venn diagram showing the overlap between genes upregulated in 42vsR treatment and those upregulated in 6m old APP mice might help the comprehension of the experiment.

      -Figure 6:

      a) The evaluation of the levels of Hsf1 mRNA and protein upon LNA transfection is missing for both R and 42 treated HT22 cells. From TPM in panel B, Hsf1 downregulation seems to have been more effective in 42 than in R condition. This would mess up the interpretation of the data.

      b) Again, in this case no validation of the RNA seq data is provided (any B2 regulated SRGs).

      c) Panels E and F should be swapped or panel E moved to supplementary material.

      -In a previous paper the authors discovered B2 RNAs as a class of transcripts bound to EZH2 and this interaction leads to B2 RNA destabilization in heat shock (H/S) condition. The authors also conclude that the genes controlled by B2 RNAs may not overlap with the ones controlled by Hsf1 during H/S. The author should make a comment on this explaining why during H/S B2 RNAs work independently from Hsf1 and on different target SRGs while, during beta amyloid stress, the two act together on the same SRGs. Moreover, as shown for EZH2, Hsf1-RIP experiment should be performed in order to confirm the direct involvement of Hsf1 in the SRGs-B2 destabilization.

      -There is no table listing the results of the RNA seq experiments performed in this paper: control vs APP 3-6-12 m old mice and in R vs 42 treated HT22 cells in presence or absence of LNA against Hsf1. Please provide these data.

      -In the discussion the authors claim that healthy cells are able to restore the expression of Hsf1, SRGs and B2 RNA upon removal of the stress. While there is evidence for the rescue of SRGs and B2 RNA expression post H/S, no data are available for Hsf1, SRGs and B2 RNA upon the removal of 1-42 beta amyloid peptide. This might be a nice piece of information to add to the manuscript.

      Minor criticisms:

      -In the introduction the reference Yamakuchi M and Lowenstein CJ, (2009) MiR‐34, SIRT1 and p53: the feedback loop. Cell Cycle, should be added in the sentence: "In contrast, hippocampi of mouse models of amyloid pathology and post- mortem brains of human patients of AD.....and neural death (Zovoilis et al., 2011)."

      -Authors refer to Hernandez et al., 2020 to state that B2 self cleavage is stimulated by some proteins however, Hernandez and colleagues studied only the effect of EZH2 protein. Please rephrase the sentence accordingly.

      -Indicate a reference for the sentence: "......Ezh2, was reported as being responsible for the B2 RNA accelerated destabilization and processing during response to stress."

      -The format of many references is not consistent and has to be revised.

      Significance

      The research fields of non coding RNAs and neurodegeneration are attractive and challenging and, in my opinion, the molecular circuitry involving B2 RNAs might add important insights for understanding beta amyloid toxicity and neuronal death. However, this manuscript does not really add technical advances since the authors employed experimental approaches and bioinformatic analyses previously published by Zovoilis and colleagues in 2011 and 2016.

      The reported findings might be of interest to an audience of experts in non coding RNAs and neurodegeneration.

      The area of my expertise mostly regards the biology of non coding RNAs, from biogenesis to function, mainly focusing on neuronal and muscular systems in both physiological and pathological conditions.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript follows from previous work by the corresponding author showing that SINE-encoded B2 RNAs function as regulators of the expression of stress response genes (SRGs). Specifically, stimulus triggers the processing of repressive B2 RNAs that are bound at the SRGs, thereby activating SRG transcription. In this work, the authors investigate whether a similar mechanism might be controlling the expression of genes in models of amyloid beta neuropathology (i.e. mouse hippocampi from an amyloid precursor protein knock-in mouse model, and a cell culture model of amyloid beta toxicity). They performed RNA-seq in these models. Their data show a correlation between the progression of amyloid pathology, expression of genes thought to be regulated by B2 RNA, and the processing of B2 RNA. In addition, they show biochemical data supporting a role for Hsf1 in enhancing the processing of B2 RNA. Knockdown of Hsf1 also reduced B2 RNA processing and the expression of SRGs.

      Major comments:

      1 . In the RNA-seq data one cannot distinguish between Pol III transcribed B2 RNA and Pol II transcribed B2 RNA (typically embedded within introns and UTRs of mRNAs). The models they present, and the structures they show, clearly imply regulation by Pol III transcribed B2 RNA. However, there is no way to know that the short B2 RNAs they sequence aren't coming from degraded mRNAs. This needs to be addressed. Minimally, in writing as a caveat of their model. Ideally, it would be addressed experimentally.

      2 . The direct regulation of SRGs by B2 RNA was not shown in their model systems for amyloid beta neuropathology. Rather, the authors used the genes identified in their prior studies as B2 RNA-regulated, which I believe were in the NIH3T3 cell line. Given that transcription is highly cell-type specific, these genes might not be regulated by B2 RNA in mouse hippocampi or their cell culture model, despite the correlations shown. This needs to be addressed. Ideally, a targeted approach to show that transcription of even a couple genes in their system is indeed regulated by B2 RNA would provide stronger support for their conclusions.

      3 . The following bioinformatics analyses would strengthen their conclusions. This should be straightforward to do because it involves data they already have, and perhaps analyses they have already performed.

      a. Regarding the plot in Figure 3A (lower panel). The same plot should be shown for the 3m old and the 12m old APP mice (i.e. not just the 6m data). This would show the specificity of processing B2 RNA and that it indeed correlates with disease progression.

      b. Regarding the plots of B2 RNA processing rate. This value could increase either due to more short RNAs or less full length RNA. Which is it for the 3m, 6m, and 12m APP mice? Showing the short and long B2 RNAs as boxplots (as opposed to only the processing rate) would address this and also provide additional insight into the regulation involved. The same applies to the data in Figure 6. (As an aside... do the authors mean processing ratio as opposed to rate? I'm not clear where the time component is coming into play to call this a rate.)

      c. The random genes in Figures 2E and 6E are plotted as heat maps, but statistical significance is hard to see. What do boxplots of the random genes look like, and is the significant difference between 6m old APP and 6m old WT then lost?

      4 . It is interesting that B2 RNA self-processing is enhanced by both Ezh2 and also Hsf1. It would strengthen the data to perform a control with a protein prepared more similarly to the Hsf1 (rather than PNK) to confirm that the enhanced B2 RNA breakdown is indeed attributable to Hsf1 and not a contaminant in the protein prep. Similarly, the authors should provide information on which RNA was added as the negative control for Hsf1-stimulated breakdown (i.e. the ~80 nt RNA).
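
      To illustrate the ambiguity raised in comment 3b, here is a minimal sketch. It assumes the processing value is the ratio of short (processed) to full-length B2 RNA reads; that definition, the function name `processing_ratio`, and the read counts are assumptions made for illustration, not taken from the manuscript.

```python
def processing_ratio(short_reads: float, full_length_reads: float) -> float:
    """Ratio of processed (short) to full-length B2 RNA reads (assumed definition)."""
    if full_length_reads <= 0:
        raise ValueError("full_length_reads must be positive")
    return short_reads / full_length_reads


# Hypothetical normalized read counts, chosen only for illustration.
baseline = processing_ratio(short_reads=100, full_length_reads=200)
more_fragments = processing_ratio(short_reads=200, full_length_reads=200)
less_full_length = processing_ratio(short_reads=100, full_length_reads=100)

# The last two scenarios give the same ratio for different underlying reasons.
print(baseline, more_fragments, less_full_length)
```

      Because a doubling of short reads and a halving of full-length reads yield the same ratio, plotting the two read classes separately is what distinguishes the scenarios.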

      Minor comments:

      1 . Regarding the GO analyses in Figure 1 (panels B, C, and D). I wasn't clear whether the authors are showing all statistically enriched terms, or only those relevant to neuronal processes and learning. I recommend showing a supplemental table with all terms that have an adjusted p value below a specified cut-off (e.g. 0.05).

      2 . The authors show several figures that are not new data (2B, 4A, 4B, Suppl. Fig 1 and 2). I think it would be more clear if these data were summarized and referenced in the results, rather than shown.

      3 . In Figure 3A the schematic shows that B2 is 155 nt, the plots in Figures 3A,B,C show B2 RNA is 120 nt, and Figure 5 shows the RNA is 188 nt. Can the authors please clarify these differences?

      4 . In the Methods section, the sequence of the g block template didn't contain the T7 promoter sequence that was used as the forward primer for PCR amplification?

      5 . In Figure 6B, why were Hsf1 levels not decreased in the R treated cells after treatment with the LNA?

      Significance

      The models presented for the regulation of stress response genes (SRGs) in amyloid beta neuropathologies are compelling. As are the correlations they found between the progression of amyloid pathology, expression of genes thought to be regulated by B2 RNA, and the processing of B2 RNA. This is a unique direction of research for brain disease and represents an interesting conceptual advance. Most prior studies in this area use common model cell lines, and this lab seems well-positioned to unravel the proposed molecular mechanisms in neuronal systems.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      B2 RNAs, encoded from SINE B2 elements, have been directly implicated in stress response by their inherent ability to bind RNA Pol II and suppress stress response genes (SRG) in homeostatic conditions. However, upon stimuli, B2 RNAs are cleaved and degraded, resulting in the release of RNA pol II and upregulation of SRGs. Previous work from the senior author identified PRC2 component EZH2 to be the B2 RNA processing factor, cleaving B2, and releasing POL2. SRGs are upregulated upon stress, for example in age-associated neuropathologies like Alzheimer's disease (AD). Considering that the hippocampus is a primary target of amyloid pathologies and that SRGs are suggested to be key for the function of a healthy hippocampus, the authors set out to understand the role of B2 RNAs that are linked to SRG regulation in the mouse hippocampus with amyloid pathology. They use disease-relevant in vivo and in vitro models combined with unbiased RNA seq data analysis for this endeavor, which indicates the potential relevance of B2 RNAs in APP mediated neuronal pathologies in mice as well as identifies Hsf1 as the factor cleaving B2 RNAs in the hippocampus. The work is interesting and identification of Hsf1 as the processing factor for B2 RNAs in the hippocampus is significant. I would like to credit the authors for their elegant in vivo experimental design in Figure 2. However, I find some of the conclusions to be overstated and I would like to bring the following concerns I have to your attention:

      Major comments:

      1 . In figure 1, the authors indicate a strong connection between B2 RNA regulated SRGs and learning and memory. In figure 2, they identify the SRGs in the hippocampus, please provide a direct comparison of learning and memory associated SRGs and the SRGs they identify in figure 2 that are significantly upregulated in APP mice in 6 months.

      2 . To better understand the data in the context of hippocampal function, please include functional annotation of SRGs they identified in Figure 2F as they do it in Figure 1 (desirably for each time point, at least for 6M). How many of the SRGs they identify in Figure 1 are part of Figure 2F? Please include functional annotation of significantly upregulated B2 regulated SRGs in Fig2 and compare them with that of Figure 1.

      3 . In figure 3, the authors report that the B2 processing rates are high at the 6M time point at in hippocampi of the APP mice. Please include the levels of unprocessed and processed B2 RNAs in these samples along with this figure, without which it is difficult to gauge the significance of its correlation with SRGs in Figure 2.

      4 . What is the % of B2 regulated SRGs that are hsf1 bound in Figure 4C? What are their dynamics in the wild type and APP hippocampi?

      5 . What is the distribution of Hsf1 binding sites on (a) non-B2 regulated SRGs and (b) non-SRG genes in hippocampi?

      6 . In Figure 4D, the 3months old Wt HSF1 levels are high, yet B2 processing (Figure 3E) is low. Please comment.

      7 . While the authors show in vitro cleavage of B2 RNA by Hsf1, the experiment lacks controls to be conclusive. At least, please include a similar size protein as HSF1 with no-known RNA binding activity and a similar size protein with RNA binding activity as controls in 5A. Please justify the use of PNK as the control protein. Please include the use of domain-based deletions of Hsf1 to map the region of HSF1 that is binding and potentially cleaving the B2 RNA. Please include an RNA of similar size and Antisense-B2 RNA to show the specificity of the Hsf1 based cleavage of B2 RNA. Without these controls, the conclusions in Figure 5 cannot be substantiated.

      8 . The authors should show that the incubated APP peptides are taken up by the cells (experiments in Figure 5F and Figure 6).

      9 . Please provide the list, functional annotation, and % of the SRGs upregulated upon incubation with APP in HT22 cells in comparison to 6month old APP mice. Comment on learning-related Genes.

      10 . The authors should show the efficient downregulation of Hsf1 (protein) upon anti-Hsf1 LNA transfection.

      11 . Please present the total B2 RNA levels for conditions in Figure 6C.

      12 . Hsf1 levels are not significantly downregulated in Control cells which were inoculated with the reverse APP peptide. Please comment.

      13 . Please compare and contrast the % of genes, the overlap, and the functional distinctions in 6F to that of 5G and Figure1. What are the genes that are common between Figure1, and that are specifically upregulated upon Anti-Hsf1 LNA transfection along with 1-42 APP. What is % of the occurrence of B2 binding sites in those genes? What are their functional annotations and what is their connection to learning, memory, and cell survival?

      Minor.

      1 . Please include TPM/ FPKM values for hippocampal markers as control in Figure 2 to do justice to the hippocampus specific RNA seq conducted by the Authors.

      2 . In figure 2D the authors show that B2 RNA regulated SRGs in the 3 months' wild type mice are significantly high. P53 has been reported to be high in young wild types hippocampus, but not SRGs in my opinion. The authors should comment on this.

      3 . In figure 2F, under the 6m APP condition, replicate 3 looks substantially different from the other replicates. This can significantly impact the analysis and conclusions made. Either remove that replicate and present the analysis without it or please provide a valid explanation. To make the data more valid, please provide hierarchical clustering of the entire data, the non-B2 regulated genes and the B2 regulated SRGs. In Figure 2C RNA seq data is represented in TPM while it is in FPKM in Figure 2D. Figure 2: the number of replicates in the case of 3-month-old wild types is only 2. Please specifically denote it and comment on why only 2 replicates are provided.

      4 . Considering that p53 and SRGs are significantly upregulated in 6months in the APP model, it would be great if (allowing that these samples are still available) the authors can include a staining for apoptotic markers, for example, Active Casp3 or similar. This will allow us to better gauge the gene expression changes presented by the authors especially regarding SRGs.

      5 . Under subheading: Hsf1 accelerates B2 RNA processing, 3rd paragraph when the authors comment on known hsf1 binding sites on SRG genes, please correct from: Increased Hsf1-binding was found.... "To the increased number of hsf1 binding sites were found", unless the authors would like to show increased Hsf1 binding by performing CHIP-seq for Hsf1 in the hippocampus at least at the 6-month time point between Wt and APP mice.

      Significance

      B2 RNAs, encoded from SINE B2 elements, have been directly implicated in stress response by their inherent ability to bind RNA Pol II and suppress stress response genes (SRG) in homeostatic conditions. However, upon stimuli, B2 RNAs are cleaved and degraded, resulting in the release of RNA pol II and upregulation of SRGs. Previous work from the senior author identified PRC2 component EZH2 to be the B2 RNA processing factor, cleaving B2, and releasing POL2. SRGs are upregulated upon stress, for example in age-associated neuropathologies like Alzheimer's disease (AD). Considering that the hippocampus is a primary target of amyloid pathologies and that SRGs are suggested to be key for the function of a healthy hippocampus, the authors set out to understand the role of B2 RNAs that are linked to SRG regulation in the mouse hippocampus with amyloid pathology. They use disease-relevant in vivo and in vitro models combined with unbiased RNA seq data analysis for this endeavor, which indicates the potential relevance of B2 RNAs in APP mediated neuronal pathologies in mice as well as identifies Hsf1 as the factor cleaving B2 RNAs in the hippocampus.

      The work is interesting and identification of Hsf1 as the processing factor for B2 RNAs in the hippocampus is significant. I would like to credit the authors for their elegant in vivo experimental design in Figure 2.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their useful suggestions to improve the manuscript and their support for publication. We have addressed all the comments that have been raised and carried out the suggested additional analyses, resulting in a significantly improved revised version of the manuscript. We provide hereafter a detailed point-by-point response to all questions and comments of the three reviewers.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Centriole structure has been an attractive but challenging research topic for years. Pierre Gonczy's group has been working on its structure using cryo-electron tomography (cryo-ET). While the axoneme, which has longitudinal periodicity, was analyzed by several groups by cryo-ET for more than a decade, cryo-ET study on the centriole suffers from poor signal to noise ratio due to its limited length and thus fewer periodicity. They chose the centriole of flagellate Trichonympha, which have exceptionally long centrioles and thus offer opportunity of relatively straightforward sub-tomogram averaging. Their approach has been successful, and they revealed intermediate resolution structure of the cartwheel, key of 9-fold symmetry formation, and it's joint to triplet microtubules (Guichard et al. 2012, 2013, 2018).

      In this work, they employed modern state-of-the-art cryo-ET techniques, such as direct electron detection and 3D image classification, to upgrade our knowledge of centriole structure. In their past works, the central hub of the cartwheel, made of SAS-6 protein forming a 9-fold complex, was described as an 8 nm periodic object. With improved spatial resolution, they provide further detail with clear polarity, which will deepen our thinking about the initial stage of ciliogenesis. They also compared two Trichonympha species (spp and agilis) as well as another flagellate, Teranympha mirabilis, and extended their intriguing evolutionary and mechanical hypotheses based on structural differences.

      Despite improved spatial resolution, it is still not possible to identify proteins in the cryo-ET map (cellular cryo-ET will not reach such high resolution in the near future). Therefore, this work is rather geometrically descriptive, which will inspire molecular biologists to identify molecules by other methods. Nevertheless, this work demonstrates the capability of cellular cryo-ET, especially for the analysis of structural heterogeneity. Thus, while the biological topics handled are rather specialized for cilia from flagellates, this work will attract the attention of any biologist interested in molecular structure in vivo. It is worthy of publication in a high-profile journal after the points below are addressed. This reviewer believes that the authors can address these points easily with additional analysis.

      We are grateful to the reviewer for the favorable evaluation and the many valuable suggestions, in particular concerning the processing pipeline, which we addressed by additional analyses, as detailed below.

      Major points:

      1. Entire scheme: A graphic diagram of the entire cartwheel area, summarizing this work, is necessary for the readers' understanding (similar to Fig. 6 of the other manuscript, Klena et al.).

      We thank the reviewer for this interesting suggestion, which we fully adhere to. As a result, we have generated a graphical summary of the work, which is shown in the new Figure panels 6B-F. Moreover, Figure 6A provides an evolutionary perspective regarding the presence of the CID and of what is now referred to as the fCID (filamentous CID, previously: FLS, see response to reviewer 3). This also helps to link our findings with the companion manuscript by Klena et al. This new Figure 6 is referred to extensively in the discussion of the revised manuscript (pages 13-16).

      The averaging scheme should then be shown in more detail in the Materials and Methods, especially the assumed periodicity. The cartwheel hub was averaged with 25 nm periodicity (as discussed below). Was the pinhead averaged with 16 nm periodicity (as detected by FFT in Fig. S2L)? How about the triplet?

      This reviewer is not completely sure that the longitudinal averaging strategy is justifiable. Since the periodicity of each domain is not trivial, logically the initial average must be done with the size of the least common multiple (or larger). It is likely 96 nm, assuming that the 25 nm of the central hub is 3 times the microtubule (MT) periodicity and the 16 nm of the pinhead is twice that of the MT. A 96 nm average should be possible with the long cartwheel in this work. An alternative, in case the periodicity is independent of the MT and there is thus no least common multiple, is the random picking and classification mentioned in "4. Periodicity". This should also be possible, since they can pick a sufficient number of particles from long cartwheels.

      We apologize that the initial version of the manuscript was not sufficiently clear regarding the averaging pipeline that was pursued. To rectify this, we now provide a new Figure S1B to graphically explain the approach followed for STA. As depicted in this figure panel, the step size for sub-volume extraction was 25 nm both centrally and peripherally. This step size was selected because it corresponds to ~3x the major periodicity of ~8.5 nm observed in the power spectra of the sub-volumes. The 25 nm step size is larger than that previously used (i.e. 17 nm in Guichard et al. 2013), in order to identify potential features with larger periodicities. The fact that the step size was of 25 nm in all cases is now mentioned explicitly in the Materials and Methods section of the revised manuscript (line 649).

      We agree with the reviewer that 96 nm averaging is possible given the long cartwheel analyzed here, and such a piece of data was in fact included in the original submission, although with a different purpose. Indeed, we carried out STA using ~(100 nm)³ sub-volumes (with binning 3 to reduce computational time), the results of which are reported in Figure S7 (previously Fig. S6). For the purpose of this analysis, we focused on the lateral organization of the cartwheel, but did not use this dataset to explore other periodicities because of the limitations inherent to a binning 3 data set.

      2. Classification

      The authors analyzed structural heterogeneity inside the cartwheel hub, employing reference-free classification with the Relion software. The program reveals multiple coexisting structures: two from Trichonympha agilis and three from Teranympha. Whereas this is an exciting finding that shows a future research direction for this field, the interpretation of this classification must be done carefully. It is puzzling that the major (55%) population of T. agilis shows more ambiguous features than the minor population (45%), while the spatial resolutions by FSC are not so different (for example, Fig. 2H vs Fig. S5C). In the case of Teranympha, it is even more drastic: Fig. 4D (major class) seems blurred along the centriolar axis compared to Fig. 4E (minor class). This reviewer is afraid that these "major" classes might contain more than one structure, so that detailed features are blurred after subaveraging. The apparently good spatial resolution could be explained if two structures coexist and subtomograms are aligned within each subclass. The lower resolution at the spoke region of the major class (Fig. S2A) compared with that of the minor class (Fig. S2D) is probably a sign of heterogeneity within this class. Another risk is that subtomograms with poorer S/N are assigned to one class (for lack of features that allow proper classification). Fig. S5F (black dots localized in one tomogram) raised this concern.

      The following investigations will help to resolve this issue. 1. Extract and re-classify subtomograms belonging to the major population. 2. Direct observation of tomograms. The authors could plot the two classes of Teranympha (as they did for T. agilis in Fig. S5) and find features of the cylindrical cartwheel hub in two conformations (as shown in Fig. 4D,E). Since such a feature was directly observed in tomograms from the other manuscript (left panels of Fig. S6A,C in Klena et al.), it should be possible in this work as well.

      We agree with the reviewer that the interpretation of the classification must be done with care, and share her/his interest in better understanding the structural variability between cartwheel classes in T. agilis and T. mirabilis. Although poor S/N may in theory result in erroneous joint classifications, we note that all maps in the original submission stemmed from extensive focused 3D classification, which removed defective and spurious sub-volumes, nevertheless defining distinct classes in the cases reported. Obviously, however, we cannot exclude that much larger data sets and future software advances may lead to the identification of additional features that would allow further sub-classes to be identified.

      Regardless, we followed the two suggestions the reviewer offered to us and have (1) extracted and re-classified sub-tomograms belonging to the major populations and (2) undertaken a direct observation of tomograms. These two points are developed in turn below.

      (1) We have performed a further round of classification of the major populations in T. agilis (55 % class) and T. mirabilis (64 % class), to assess whether additional sub-classes might be identified and thus help further improve the quality of the central cartwheel map. However, this additional round did not yield new sub-classes nor notable improvement in the map quality as judged by visual inspections. We show in Rebuttal Figure 1 a comparison in each case of the original STA and the corresponding STA upon such re-classification. Importantly, all conclusions spelled out in the original submission hold upon further re-classification, indicating that the initial classification converged to the best map quality based on the current data set and available computational resources.

      (2) We have followed the suggestion of the reviewer and now show raw tomograms to confirm that the classes correspond to bona fide structures and not to processing artefacts (new Figures S1C-F). The resulting new Figure S1D for instance shows that the striking variations observed between classes in the T. agilis STA are also visible in the raw tomogram. The more subtle variations among T. mirabilis classes are more difficult to observe in the raw tomogram, but inherent variations that reflect the presence of two classes are nevertheless observed.

      Furthermore, following the reviewer’s suggestion, we have now mapped the distribution of the two T. mirabilis cartwheel classes onto tomograms, revealing that both classes can occur next to each other within the same centriole (new Figure S8E).

      3. Periodicity mismatch

      In Fig. 2C,D, the periodicity of the CID differs from that of the stacked SAS-6 rings (8.5 nm versus 8.0 nm). Do the authors think this is a significant difference or within error? The same question applies to the other subtomogram averages. It would be nice to show errors, as in their other manuscript (Fig. 3C of Klena et al.), and to clarify their view. If there is a systematic difference in periodicity between the stacked rings and the CID, the shift will accumulate along the entire cartwheel region: after 100 nm, the 8.5 nm/8.0 nm difference accumulates to ~6 nm, which should change the entire view of the subtomogram and could become the main factor driving classification (periodicity mismatch). This artifact (or influence) should be removed (or separately evaluated) by masking the CID (out and in) and running the classification separately. Clarifying this could improve the quality of the major subaverages (mentioned in the previous paragraph).
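
      The ~6 nm figure quoted above can be checked with a short calculation. The sketch below is illustrative only: the function name `accumulated_offset` is hypothetical, and it assumes that the two lattices start in register and that the offset grows by the period difference once per repeat of the shorter period.

```python
def accumulated_offset(length_nm, period_a_nm, period_b_nm):
    """Offset (nm) built up between two periodic lattices over length_nm.

    Assumption: both lattices start in register, and the mismatch grows by
    |period_a - period_b| for every repeat of the shorter period.
    """
    repeats = length_nm / min(period_a_nm, period_b_nm)
    return repeats * abs(period_a_nm - period_b_nm)


# Values taken from the reviewer's comment: 100 nm, 8.5 nm and 8.0 nm.
drift_nm = accumulated_offset(100.0, 8.5, 8.0)
print(f"accumulated offset: {drift_nm:.2f} nm")
```

      With the reviewer's numbers this corresponds to 12.5 repeats of 0.5 nm each, i.e. 6.25 nm, which is consistent with the quoted ~6 nm.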

      The reviewer wonders whether there might be a periodicity discrepancy within one map, for instance between CID and spokes in the T. spp. cartwheel map (Fig. 2C and Fig. 2D). Here, the periodicity determined from the STA maps is 8.5 ± 0.2 nm (SD, N=4) for the CID and 8.0 ± 1.5 nm (SD, N=2) for the spokes. Based on these standard deviations, there is indeed no significant difference between the two, and thus no periodicity discrepancy. The same applies for measurements in T. agilis and T. mirabilis. The SDs were reported already in the figure legends of the original submission, and we would prefer to leave them there if possible and not mention them in the figures, which are pretty busy as is. We apologize if this was not clear enough in the initial manuscript. Likewise, one may wonder whether there might be periodicity discrepancies between structures from distinct maps, for instance between CID and A-links from T. spp. (Fig. 2C and Fig. 3D). Again, the measurements are within error, since the distance between adjacent CIDs is 8.5 ± 0.2 nm (N=4) and between adjacent A-links 8.4 ± 0.4 nm (N=6); a similar conclusion applies for the corresponding measurement comparisons in T. agilis and T. mirabilis. The figure legends have been altered in the revised manuscript to spell out that there are no significant differences between periodicities (lines 856-858).

      Furthermore, we would like to stress that, by definition, STA values are average distances. For instance, in the case of T. spp., the central cartwheel STA was obtained from 511 sub-volumes, and thus the reported N=2 represents the average distance from 511 sub-volumes. Since this is an average, errors cannot accumulate over longer distances. This point has also been clarified in the figure legends (lines 856-858).
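
      As a rough illustration of the "within error" argument, the reported means, SDs and N values can be compared with a test that works from summary statistics. The rebuttal does not state which test, if any, was used; Welch's unequal-variance t-test is an assumption made here, `welch_t` is a hypothetical helper, and the reported N is treated as the sample size of each measurement.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's t-test computed from summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# CID (8.5 +/- 0.2 nm, N=4) versus spokes (8.0 +/- 1.5 nm, N=2) in T. spp.
t_spokes, df_spokes = welch_t(8.5, 0.2, 4, 8.0, 1.5, 2)
# CID (8.5 +/- 0.2 nm, N=4) versus A-links (8.4 +/- 0.4 nm, N=6) in T. spp.
t_alinks, df_alinks = welch_t(8.5, 0.2, 4, 8.4, 0.4, 6)

print(f"CID vs spokes:  t = {t_spokes:.2f}, df = {df_spokes:.1f}")
print(f"CID vs A-links: t = {t_alinks:.2f}, df = {df_alinks:.1f}")
```

      Under these assumptions both t values are below 1. Since the two-sided 5 % critical value of Student's t is never below 1.96, neither comparison would count as significant, in line with the statement that the periodicities agree within error.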

      4. Periodicity

      They averaged subtomograms extracted with a spacing of 252 Å, with the initial average as the first template (p. 18, line 22). This means that they assumed 25 nm periodicity from the beginning and excluded a different or larger unit size (with a wide search range they could detect a different periodicity, but the result would still be biased by the initially assumed 25 nm). The 25 nm average allowed them to see more detail than before (when they assumed 8 nm periodicity), but there is still a risk of bias from the references. To avoid this risk, this reviewer would propose classification of randomly extracted subtomograms (extracted, of course, along the cylindrical hub or along the triplet microtubules, i.e. one-dimensional random picking). This experiment should yield multiple sub-averages that are shifted from each other by 25 nm (or multiples thereof), which would prove their assumption.

      We agree with the reviewer that in theory the choice of periodicity could introduce a bias. This is why we have chosen a larger step size than in our initial work, corresponding to ~3x the major periodicity of ~8.5 nm observed in the power spectrum of the sub-volumes, as mentioned above. Regardless, following the reviewer’s suggestion, we have now explored other types of periodicities by re-analyzing the dataset through extraction of non-overlapping sub-volumes along the proximal-distal centriole axis. In doing so, we randomized the starting position of the first box between tomograms, reaching the same goal as with random picking but maximizing the number of sub-volumes. We carried out this analysis for all T. spp., T. agilis and T. mirabilis cartwheel classes, and found no notable differences that would affect the conclusions of the manuscript compared to the initial overlapping sub-volume classification, albeit generally with a noisier STA due to the lower number of sub-volumes. A comparison of the two approaches is provided in Rebuttal Figure 2. Moreover, all the points regarding the choice of periodicity have been further clarified in the expanded Materials and Methods section (pages 19-21).
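
      The extraction strategy described in this reply (non-overlapping boxes along the proximal-distal axis, with a randomized position of the first box in each tomogram) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `box_starts`, the 1000 nm axis length and the 25 nm box length are all assumptions chosen for the example.

```python
import random


def box_starts(axis_length_nm, box_nm, rng):
    """Start positions (nm) of non-overlapping boxes along one centriole axis.

    The first box is shifted by a random offset in [0, box_nm), so that box
    boundaries are not locked to the same phase in every tomogram.
    """
    pos = rng.uniform(0.0, box_nm)
    if pos >= box_nm:  # guard against floating-point rounding at the upper edge
        pos = 0.0
    starts = []
    while pos + box_nm <= axis_length_nm:
        starts.append(pos)
        pos += box_nm  # step equals box length, so boxes never overlap
    return starts


rng = random.Random(0)  # seeded only to make the example reproducible
for tomogram in ("tomogram_1", "tomogram_2"):  # hypothetical tomogram labels
    starts = box_starts(1000.0, 25.0, rng)
    print(tomogram, len(starts), "boxes, first start at", round(starts[0], 2), "nm")
```

      Because the step equals the box length, each position along the axis contributes to at most one sub-volume, which is why this scheme yields fewer sub-volumes than extraction with overlap.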

      Minor points:

      They discussed difference of stacked SAS-6 rings in the cartwheel from various species. How much is the sequence difference of SAS-6 among these species?

      Unfortunately, no genomic or transcriptomic data has been published for the species investigated here, although the sparse molecular data available from small subunit rRNA sequences allows one to establish an overall molecular phylogeny. We previously identified a SAS-6 homologue in T. agilis (Guichard et al. 2013), which shares 20 % identity and 45 % similarity with C. reinhardtii SAS-6. Despite low sequence conservation, the structural conservation of SAS-6 is predicted to be high between the two organisms (Guichard et al. 2013). We apologize if these points were not expressed sufficiently clearly in the initial rendition and have adapted the wording in the revised manuscript (lines 325-332).

      Are the authors sure that CID is nine-fold symmetric? It is not trivial.

      We thank the reviewer for bringing up this interesting point. We have applied 9-fold symmetrization to the entire central cartwheel comprising spokes, hub and CID/fCID, a choice guided by the apparent 9-fold symmetry of the spokes and peripheral elements. We investigated the impact of symmetrization on the CID by relaxing symmetry from C9 to C1 during refinement, but did not observe a difference, and thus continued with C9 symmetry, which improves map resolution by S/N ratio enhancement and additional missing wedge compensation. In addition, we have also analyzed the CID without symmetrization, as reported in Figure S7 (previously: Fig. S6). Note that these maps were generated with larger sub-volumes centered on the spokes to comprise hub, spokes and microtubule triplets, explaining the resulting lower resolution, as the missing wedge is not compensated. Despite these limitations, however, the unsymmetrized CID shown in Figure S7A and S7E resembles the one in the symmetrized maps of Figure 2, indicating that the CID indeed exhibits 9-fold radial symmetry. That this is the case is spelled out explicitly in the revised manuscript (lines 1145-1147).

      Fig.1C: Another cross-section from the distal region will be helpful. A longer scale bar is better for readers' understanding.

      We understand that the reviewer is curious about the distal region, and cross-section views of resin-embedded sections from T. agilis are available and could be provided if necessary. However, given that the focus of the manuscript is strictly on the cartwheel-bearing proximal region, we felt that featuring the distal region in detail would break the narrative. Therefore, we suggest keeping Figure 1 as in the original manuscript. Following the reviewer’s suggestion, we increased the size of the scale bars from 10 nm to 20 nm in Figure 1C as well as in the corresponding Figure S8C.

      Fig.S6F: It would be informative if the subclasses (25% and 20%) are distinguished in this mapping.

      As per the reviewer’s request, we provide in Rebuttal Figure 3 a side-by-side comparison of the T. agilis 25 % and 20 % classes centered on the spokes, which are noisier than the composite 45 % class due to the lower number of sub-volumes in each sub-class. Given that there are no notable differences between the two maps that would affect any of the conclusions of the manuscript, we feel it is best to keep what is now Figure S7F (previously: Fig. S6F) unchanged in the revised manuscript.

      A figure explaining the classification scheme would help readers understand. With how many subtomograms did the classification start? Was the 45% class classified into two groups (25% and 20%) by two-step classification, or at once (with all subtomograms classified into three groups directly)?

      We thank the reviewer for this useful suggestion. As a result, we have generated a new Supplemental Figure S1G-J that provides a graphical overview of the classification scheme, together with sub-volume numbers for all deposited maps, thus nicely complementing Table S1.

      Reviewer #1 (Significance (Required)):

      Nevertheless, this work demonstrates the capability of cellular cryo-ET, especially for the analysis of structural heterogeneity. Thus, while the biological topics handled are rather specialized for cilia from flagellates, this work will attract the attention of any biologist interested in molecular structure in vivo. It is worthy of publication in a high-profile journal after the points above are addressed. This reviewer believes that the authors can address these points easily with additional analysis.

      We reiterate our thanks to this reviewer for her/his favorable evaluation and detailed suggestions, which enabled us to generate a strengthened manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Here, Nazarov and colleagues report sub-tomogram average (STA) maps of centrioles with 16 to 40 Å resolution from Trichonympha spp., Trichonympha agilis, and Teranympha mirabilis. Even though the authors have previously described the centriole architecture of T. spp., these higher-resolution STA maps revealed new features of centrioles, such as the polarized Cartwheel Inner Density (CID) and the pinhead. They also observed a Filament-like structure (FLS) in T. mirabilis, which seems to correspond to the CID of other species. Interestingly, they suggest that one and two SAS-6 rings are stacked in an alternating fashion to make the central hub in T. mirabilis (Figure 5). The following issue should be addressed:

      Major points

      • Figure 4E. The authors mention in the manuscript that "We observed that every other double hub units in the 36% T. mirabilis class appears to exhibit a slight tilt angle relative to the vertical axis". When I look at the other side, it does not seem to be tilted. Could the authors explain this?

      We apologize that this aspect was not explained in sufficient detail. The left and right sides of the hub indeed appeared different in transverse views across the cartwheel center (previous Fig. 4E). This was because the area we selected in the original submission was centered on one emanating spoke. Due to the 9-fold symmetry one spoke density was selected on the right side, while the region between two spokes was displayed on the left side (as was illustrated by the slice across the center in previous Figure 4A; dashed rectangles in 4.0 nm panel). We have now selected a larger area to include spokes from both sides of the hub and thus better visualize this offset as shown in the modified Figure 4D-E.

      Reviewer #2 (Significance (Required)):

      I believe these results are of interest for all centrosome researchers and would like to recommend this manuscript be published in the EMBO journal which is affiliated with the Review Commons.

      We thank the reviewer for the recommendation to submit the revised manuscript to EMBO Journal, which we have followed.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript Nazarov et al. use cryo-electron tomography (CET) to analyse the structure of the centriole cartwheel. The Gonczy lab have previously generated a ground-breaking structure of the cartwheel from Trichonympha spp (T. spp.) (Guichard et al., Science, 2012; Guichard et al., Curr. Biol., 2013). This work is a direct continuation of those studies but using modern technology to get higher resolution images of the T. spp. cartwheel and comparing this to the cartwheel from Trichonympha agilis and from another distantly related flagellate Teranympha mirabilis.

      The data is generally well presented and of high quality. I am not an expert in CET, so it would be advisable to get the opinion from a reviewer who is, but the Gonczy lab are experienced in these techniques so I would not anticipate any problems. I have to admit that the title of the paper did not excite me, and I expected this to be a very worthy, but incremental study. It was a pleasure to find out that the extra detail provided by the increased resolution has revealed several new and unexpected features that have important implications for our understanding of cartwheel assembly and function. Most important are the potential asymmetry of the cartwheel hub, apparent variations in the packing mechanism of the stacked rings (even within the same cartwheel), and the potential offsetting of ring stacking. These findings will be of great interest to the field, and so I am strongly supportive of publication in The EMBO Journal. I have only a few points that I think the authors should consider.

      We thank the reviewer for this positive feedback and the recommendation to submit to EMBO Journal, which we hereby follow.

      Prompted by the comment of the reviewer, we revised the title to make it more informative and appealing to readers: “Novel features of centriole polarity and cartwheel stacking revealed by cryo-tomography”.

      • Nazarov et al. conclude that the cartwheel structure is intrinsically asymmetric. This is most convincingly based on the displacement of the CID within the hub, but they state in the Discussion that the potential offset between the Sas-6 double rings generates an inherently polar structure. I didn't understand why this is the case. Looking at Fig. S9A,B I can see that the offset in B could tilt to the left (as shown here) or to the right (if the structure was flipped by 180°). But I couldn't see how this makes this structure polar in the sense that a molecule coming into dock with the structure could only bind to one side of the offset structure shown in B, but to both sides of the aligned structure shown in A. I think this needs to be explained better, as it is crucial to understand where any potential polarity in the cartwheel structure comes from.

      We apologize for not having been sufficiently clear about how two SAS-6 rings with an offset could impart organelle polarity. The reviewer is correct that an offset between superimposed rings alone is not sufficient to generate polarity at a larger scale. The important point we would like to stress, however, is that we discovered concerted polarity in multiple locations, from the central hub to the peripheral elements as illustrated in Fig. S7C-D, S7G-H, S7K-L and S7O-P (previously: Fig. S6). Prompted by the reviewer’s comment, we now better emphasize the asymmetric tilt angles of merging spokes, as highlighted also in the improved Figure S7. This asymmetric spoke tilt angle allows one to discriminate the proximal and distal side of a double SAS-6 ring, which is now explained better in the text (lines 259-263 & 502-510).

      • Related to this last point, in a co-submitted paper Klena et al. do not report such an asymmetry in the hub structures they have solved from several different species (neither in the tilting of the hub, nor in the displacement of the CID). I think it would be worth both sets of authors commenting on this point.

      We agree that comparing and contrasting the results of the two companion manuscripts is important and we have updated the text as a consequence in several places (lines 444, 467, 507, 536, 985, 1000). We know from our previous work (Guichard et al. 2013) that the asymmetry of the hub and spoke is not visible at lower resolution. In the accompanying manuscript by Klena et al., no offset in the hub or asymmetric CID localization is reported, probably due to lower resolution and differences between species.

      • The authors' data strongly suggest that the T. ag. and Te. mir. hubs are composed of a mixture of single and double Sas-6 rings. In contrast, the T. spp. cartwheel only has a single class of rings, but it wasn't absolutely clear if the authors think this comprises a single or double ring. In the text it is presented as though the elongation of the hub densities in the vertical direction is a new feature of the T. ag. cartwheel (Fig. 2H,I), but to me it looks as though this is also apparent in the T. spp. cartwheel (Fig. 2C,D). The authors should address this directly and, if they believe that T. spp. has a double ring, they should comment on whether this more regular structure seems to have offset rings. If not, then the offset rings are unlikely to be the source of asymmetry that leads to the asymmetric displacement of the CID. Finally, if the authors think these are double rings, they should also be clear that they would now slightly re-interpret their original T. spp. cartwheel model (Figure 2, Guichard et al., Curr. Biol.). There is no embarrassment in this: a higher resolution structure has simply revealed more detail.

      We apologize if the conclusions drawn about T. spp. cartwheel hubs were not sufficiently clearly expressed. Like the reviewer, we think that elongated hub elements are also discernible in T. spp., something that is also illustrated by the intensity plot profile in Figure 2C (double peaks on light blue line). These points are spelled out more explicitly in the revised manuscript (lines 177-179). In addition, to emphasize the conservation of the double hub units in both Trichonympha species, we have likewise adapted the text for T. agilis (lines 198-201).

      As for the offset observed within T. spp. spoke densities in Figure S10H, we interpret this as evidence for an offset of the double ring at the level of the hub, although we have not observed such an offset in T. spp. for reasons that are unclear. The fact that this revises our previous interpretation based on a lower resolution map of T. spp. was already mentioned in the initial submission but is now better emphasized (lines 171-172 & 179-181).

      • The authors conclude that T. mirabilis cartwheels lack a CID and instead have a filament-like structure (FLS). I wonder whether it is more likely that the FLS is really a highly derived CID that appears to be structurally distinct when analysed in this way, but that will ultimately have a similar molecular composition. This situation might be analogous to the central tube in C. elegans, which by EM appears to be distinct from the central cartwheel seen in most other species, but is of course still composed of Sas-6. This historical tube/cartwheel nomenclature is now cumbersome to deal with, so perhaps it would be better to be cautious and not give the T. mirabilis structure a completely new name. How about "unusual CID" (uCID)?

      We share the view that the CID and the “FLS” (the term used in the initial submission) may have a related molecular composition and function, as we had also speculated in the discussion of the original submission. Following the reviewer’s suggestion, and in an effort to have a more uniform nomenclature, we propose to dub the T. mirabilis structure “filamentous CID” (fCID). This highlights better the similar location of these two entities and their potential shared function, while stressing the filamentous nature of the fCID. We further emphasize this point by providing the new Figure 6A to compare the presence of the two entities in select species. The discussion has also been adapted accordingly (pages 13-14).

      Rebuttal Figure Legends

      Rebuttal Figure 1: Re-classification of major classes

      (A-D) Transverse (top) and longitudinal (bottom) views of T. agilis (A, B) and T. mirabilis (C, D) central cartwheel 3D maps. The final major classes reported in the manuscript (A: 55 % class, C: 64 % class) were subjected to re-classification, which again yielded one major class in each case, with no notable improvement (B, D).

      Rebuttal Figure 2: Reclassification with non-overlapping sub-volumes

      (A-F) Transverse (top) and longitudinal (bottom) views of T. spp. (A, B), T. agilis (C, D) and T. mirabilis (E, F) central cartwheel 3D maps. The final maps reported in the manuscript (A, C, E) were generated with a 25 nm step size, yielding overlapping sub-volumes, whereas the maps in (B, D, F) were generated from non-overlapping sub-volumes, with no notable differences between the two that would affect the conclusions of the manuscript.

      Rebuttal Figure 3: Polar centriolar cartwheel upon sub-classification

      (A-C) 3D transverse views of non-symmetrized STA centered on the spokes to jointly show the central cartwheel and peripheral elements in the T. agilis 45 % class (A), as well as separately in the 25 % class (B) and 20 % class (C). No notable differences are apparent following such re-classification, apart from the output being noisier due to the lower number of sub-volumes in each sub-class.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript Nazarov et al. use cryo-electron tomography (CET) to analyse the structure of the centriole cartwheel. The Gonczy lab have previously generated a ground-breaking structure of the cartwheel from Trichonympha spp (T. spp.) (Guichard et al., Science, 2012; Guichard et al., Curr. Biol., 2013). This work is a direct continuation of those studies but using modern technology to get higher resolution images of the T. spp. cartwheel, and comparing this to the cartwheel from Trichonympha agilis and from another distantly related flagellate Teranympha mirabilis.

      The data is generally well presented and of high quality. I am not an expert in CET, so it would be advisable to get the opinion from a reviewer who is, but the Gonczy lab are experienced in these techniques so I would not anticipate any problems. I have to admit that the title of the paper did not excite me, and I expected this to be a very worthy, but incremental study. It was a pleasure to find out that the extra detail provided by the increased resolution has revealed several new and unexpected features that have important implications for our understanding of cartwheel assembly and function. Most important are the potential asymmetry of the cartwheel hub, apparent variations in the packing mechanism of the stacked rings (even within the same cartwheel), and the potential offsetting of ring stacking. These findings will be of great interest to the field, and so I am strongly supportive of publication in The EMBO Journal. I have only a few points that I think the authors should consider.

      1. Nazarov et al. conclude that the cartwheel structure is intrinsically asymmetric. This is most convincingly based on the displacement of the CID within the hub, but they state in the Discussion that the potential offset between the Sas-6 double rings generates an inherently polar structure. I didn't understand why this is the case. Looking at Fig.S9A,B I can see that the offset in B could tilt to the left (as shown here) or to the right (if the structure was flipped by 180°). But I couldn't see how this makes this structure polar in the sense that a molecule coming into dock with the structure could only bind to one side of the offset structure shown in B, but to both sides of the aligned structure shown in A. I think this needs to be explained better, as it is crucial to understand where any potential polarity in the cartwheel structure comes from.

      2. Related to this last point, in a co-submitted paper Klena et al. do not report such an asymmetry in the hub structures they have solved from several different species (neither in the tilting of the hub, or the displacement of the CID). I think it would be worth both sets of authors commenting on this point.

      3. The authors' data strongly suggest that the T. agg. and Te. mir. hubs are composed of a mixture of single and double Sas-6 rings. In contrast, the T. spp. cartwheel only has a single class of rings, but it wasn't absolutely clear if the authors think this comprises a single or double ring. In the text it is presented as though the elongation of the hub densities in the vertical direction is a new feature of the T. agg cartwheel (Fig.2H,I), but to me it looks as though this is also apparent in the T. spp. cartwheel (Fig.2C,D). The authors should address this directly and, if they believe that T. spp. has a double ring, they should comment on whether this more regular structure seems to have offset rings. If not, then the offset rings are unlikely to be the source of asymmetry that leads to the asymmetric displacement of the CID. Finally, if the authors think these are double rings, they should also be clear that they would now slightly re-interpret their original T. spp. cartwheel model (Figure 2, Guichard et al., Curr. Biol.). There is no embarrassment in this: a higher resolution structure has simply revealed more detail.

      4. The authors conclude that T. mirabilis cartwheels lack a CID and instead have a filament-like structure (FLS). I wonder whether it is more likely that the FLS is really a highly derived CID that appears to be structurally distinct when analysed in this way, but that will ultimately have a similar molecular composition. This situation might be analogous to the central tube in C. elegans, which by EM appears to be distinct from the central cartwheel seen in most other species, but is of course still composed of Sas-6. This historical tube/cartwheel nomenclature is now cumbersome to deal with, so perhaps it would be better to be cautious and not give the T. mirabilis structure a completely new name. How about "unusual CID" (uCID)?

      Significance

      see above

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Here, Nazarov and colleagues report sub-tomogram average (STA) maps of centrioles with 16 to 40 Å resolution from Trichonympha spp., Trichonympha agilis, and Teranympha mirabilis. Even though the authors have previously described the centriole architecture of T. spp, these STA maps of higher resolution revealed new features of centrioles, like the polarized Cartwheel Inner Density (CID) and the pinhead. They also observed a Filament-like structure (FLS) in T. mirabilis which seems to correspond to the CID from other species. Interestingly, they suggest that one and two SASS6 rings are stacked in an alternating fashion to make the central hub in T. mirabilis (Figure 5). The following issue should be addressed:

      Major points

      1. Figure 4E. Authors mentioned in the manuscript that "We observed that every other double hub units in the 36% T. mirabilis class appears to exhibit a slight tilt angle relative to the vertical axis". When I see the other side, it does not seem to be tilted. Could the authors explain this?

      Minor Points

      1. Page 11, I think Fig. 9G indicates Fig. S9G.

      Significance

      I believe these results are of interest for all centrosome researchers, and would like to recommend that this manuscript be published in The EMBO Journal, which is affiliated with Review Commons.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Centriole structure has been an attractive but challenging research topic for years. Pierre Gonczy's group has been working on its structure using cryo-electron tomography (cryo-ET). While the axoneme, which has longitudinal periodicity, was analyzed by several groups by cryo-ET for more than a decade, cryo-ET study on the centriole suffers from poor signal to noise ratio due to its limited length and thus fewer periodic repeats. They chose the centriole of the flagellate Trichonympha, which has exceptionally long centrioles and thus offers the opportunity of relatively straightforward subtomogram averaging. Their approach has been successful and they revealed an intermediate resolution structure of the cartwheel, key to 9-fold symmetry formation, and its joint to triplet microtubules (Guichard et al. 2012, 2013, 2018). In this work, they employed modern state-of-the-art cryo-ET techniques, such as direct electron detection and 3D image classification, to upgrade our knowledge of centriole structure. In their past works, the central hub of the cartwheel, made of SAS-6 protein forming a 9-fold complex, was described as an 8nm periodic object. With improved spatial resolution, they provided further detail with clear polarity, which will deepen our thinking about the initial stage of ciliogenesis. They also compared two Trichonympha species (spp and agilis) as well as another flagellate, Teranympha mirabilis, and extended their intriguing evolutionary and mechanical hypotheses based on structural differences. Despite improved spatial resolution, it is still not possible to identify proteins in the cryo-ET map (cellular cryo-ET will not reach such high resolution in the near future). Therefore this work is rather geometrically descriptive, which will inspire molecular biologists to identify molecules by other methods. Nevertheless this work demonstrated the capability of cellular cryo-ET, especially for analysis of structural heterogeneity. Thus, while the biological topics handled are rather specialized for cilia from flagellates, this work will attract the attention of any biologist interested in molecular structure in vivo. It is worthy of publication in a high-ranking journal after addressing the points below. This reviewer believes that the authors can address these points easily with additional analysis.

      Major points:

      1. Entire scheme: A graphic diagram of the entire cartwheel area, summarizing this work, is necessary for the readers' understanding (similar to Fig.6 of the other manuscript, Klena et al.). The averaging scheme should then be described in more detail in Materials and Methods, especially the assumed periodicity. The cartwheel hub was averaged with 25nm periodicity (as discussed below). Was the pinhead averaged with 16nm (as detected by FFT in Fig.S2L)? How about the triplet? This reviewer is not completely sure if the longitudinal averaging strategy is justifiable. Since the periodicity of each domain is not trivial, logically the initial average must be done with the size of the least common multiple (or larger). It is likely 96nm, assuming the 25nm of the central hub is 3 times the microtubule periodicity and the 16nm of the pinhead is twice that of MT. A 96nm average should be possible with a long cartwheel in this work. An alternative, in case the periodicity is independent of MT and thus there is no least common multiple, is the random picking and classification mentioned in "4. Periodicity". This should also be possible, since they can pick a sufficient number of particles from long cartwheels.
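The common averaging length discussed in this point can be illustrated with a short calculation. The sketch below is not part of the referee report; it assumes, as the text implies, an 8nm microtubule repeat with the hub at 3 repeats and the pinhead at 2 repeats, and the function name is hypothetical. Under these assumptions the least common multiple is 48nm, and the 96nm box suggested above is a whole multiple of it, consistent with "least common multiple (or larger)".

```python
import math

def common_period_nm(periods_nm):
    """Smallest length (nm) holding a whole number of repeats of every period."""
    return math.lcm(*periods_nm)

# Assumed values: 8 nm microtubule repeat, hub = 3 repeats, pinhead = 2 repeats.
mt_repeat_nm = 8
hub_nm = 3 * mt_repeat_nm
pinhead_nm = 2 * mt_repeat_nm

smallest = common_period_nm([hub_nm, pinhead_nm])
print(smallest)             # smallest common length in nm
print(96 % smallest == 0)   # True if a 96 nm box is also a common multiple
```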

      2. Classification: The authors analyzed structural heterogeneity inside the cartwheel hub, employing reference-free classification by Relion software. The program reveals multiple coexisting structures - two from Trichonympha agilis and three from Teranympha, respectively. Whereas this is an exciting finding and shows a future research direction for this field, interpretation of this classification must be done carefully. It is puzzling that the major (55%) population of T. agilis shows more ambiguous features than the minor population (45%), while spatial resolutions by FSC are not so different - for example, Fig.2H vs Fig.S5C. In the case of Teranympha, it is even more drastic - Fig.4D (major class) seems blurred along the centriolar axis, compared to Fig. 4E (minor class). This reviewer is afraid that these "major" classes might contain more than one structure and that detailed features are blurred after subaveraging. The apparent good spatial resolution could be explained if two structures coexist and subtomograms are aligned within each subclass. The lower resolution at the spoke region of the major class (Fig.S2A) compared with that of the minor class (Fig.S2D) is probably a sign of heterogeneity within this class. Another risk could be subtomograms with poorer S/N being categorized into one class (due to a lack of features to be properly classified). Fig.S5F (black dots localized in one tomogram) raised this concern. The following investigation will help to solve this issue. 1. Extract and re-classify subtomograms belonging to the major population. 2. Direct observation of tomograms. The authors could plot two classes of Teranympha (as they did for T. agilis in Fig.S5) and find features of the cylindrical cartwheel hub in two conformations (as shown in Fig.4DE). Since such a feature was directly observed in tomograms from the other manuscript (left panels of Fig.S6AC in Klena et al.), it should be possible in this work as well.

      3. Periodicity mismatch: In Fig. 2CD, the periodicity of the CID differs from that of the stacked SAS-6 ring (8.5nm and 8.0nm). Do the authors think this is a significant difference or within error? The same question can be asked of the other subtomogram averages. It would be nice to show errors as in their other manuscript (Fig.3C of Klena et al.) and clarify their idea. If there is a systematic difference in periodicity between the stacked ring and the CID, this shift will accumulate through the entire cartwheel region - after 100nm, the 8.5nm/8.0nm difference can accumulate to ~6nm, which should change the entire view of the subtomogram - and become the main factor driving the classification (periodicity mismatch). This artifact (or influence) should be removed (or separately evaluated) by masking the CID (out and in) and running classification separately. By clarifying this, the quality of the major subaverages (mentioned in the previous paragraph) could be improved.
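The accumulated shift quoted in this point follows from simple arithmetic, sketched below. This is an illustration only, not the reviewer's or authors' calculation; the function name is hypothetical and the offset is counted per repeat of the shorter (8.0nm) period over a 100nm stretch.

```python
def accumulated_offset_nm(length_nm, period_a_nm, period_b_nm):
    """Offset built up between two periodic features over a given length.

    Counts how many repeats of period_a fit in length_nm and multiplies by
    the per-repeat difference between the two periods.
    """
    repeats = length_nm / period_a_nm
    return repeats * abs(period_b_nm - period_a_nm)

# Values from the text: 8.0 nm (stacked ring) vs 8.5 nm (CID) over 100 nm.
offset = accumulated_offset_nm(100.0, 8.0, 8.5)
print(offset)  # roughly 6 nm, matching the "~6nm" estimate in the text
```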

      4. Periodicity: They averaged subtomograms extracted with a spacing of 252A with the initial average as the first template (p.18 Line22). This means they assumed 25nm periodicity from the beginning and excluded different or larger unit sizes (if they take a wide search range, they could detect a different periodicity, but will still be biased by the initially assumed 25nm). The 25nm average allowed them to see more detail than before (when they assumed 8nm periodicity), but there is still a risk of bias from references. To avoid this risk, this reviewer would propose classification of randomly extracted (but of course along the cylindrical hub or along the triplet microtubules, so one-dimensionally random picking) subtomograms. This experiment will end up with multiple subaverages, which are shifted by 25nm (or multiples of that) from each other. Then it will prove their assumption.

      Minor points:

      They discussed differences in the stacked SAS-6 rings in the cartwheel from various species. How large is the sequence difference of SAS-6 among these species?

      Are the authors sure that the CID is nine-fold symmetric? It is not trivial.

      p.7 Line21 "Fig.S1D-O": D-L

      p.8 Line1: It would be nice to have a more detailed description of MIPs, correlated with recent high resolution works from the Bui and Brown labs.

      p.9 Line6 "Focused 3D classification...": This sentence is unclear.

      p.18 5 lines from bottom "S6C, S6F": How can these panels be power spectra to measure spacing? Typo?

      Fig.1C: Another cross-section from the distal region will be helpful. A longer scale bar is better for readers' understanding.

      p.29 Line6: pin -> pink

      Fig.S6F: It would be informative if the subclasses (25% and 20%) were distinguished in this mapping.

      A figure to explain the classification scheme will help readers understand. With how many subtomograms did classification start? Was the 45% class classified into two (25% and 20%) groups by two-step classification or at once (the entire set of subtomograms classified into three groups directly)?

      Significance

      Nevertheless this work demonstrated the capability of cellular cryo-ET, especially for analysis of structural heterogeneity. Thus, while the biological topics handled are rather specialized for cilia from flagellates, this work will attract the attention of any biologist interested in molecular structure in vivo. It is worthy of publication in a high-ranking journal after addressing the points above. This reviewer believes that the authors can address these points easily with additional analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Note from the authors (AU): This manuscript has been reviewed by subject experts for Review Commons. The authors would like to thank the reviewers for their comments on the manuscript, and the editor for patience with our response. Our response was delayed due to the COVID-19 lock-down situation in our institution. We are now pleased to provide the point-by-point response detailed below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Suomalainen et al. describes a fluorescence-based approach combined with high-resolution confocal microscopy to study the heterogeneity of adenovirus infection in a population of human cells. The main focus of the authors is the detection of viral transcripts in infected cells, how this correlates with viral genomes, the cell state, and how it varies between different cells in a single population. The paper is generally well written and easy to read, with a few typos, although I found parts of it to be somewhat lengthy and repetitive. Particularly the results section could be pruned somewhat for readability and clarity. The major limitation of the study as it stands is its overall impact and novelty, which limits journal selection somewhat. A very similar study was recently published, which the authors cite (Krzywkowski et al, 2017). Nevertheless, I think the study design is rigorous and well executed, but I do have some specific comments which may enhance its overall impact and novelty.

      **Major:**

      Results "Visualization of AdV-C5..." section:

      Why not also look at normal cells that can be synchronized? Cancer cells, such as A549 will by definition be highly heterogenous and at all phases of the cell cycle. Primary non-transformed cells can easily be synchronized by contact inhibition and are much more physiologically relevant.

      AU: In the current manuscript, we concentrated on the early phases of the AdV-C5 infection, on the question of how virus gene expression is initiated and whether the cell cycle phase of the host cell impacts the initiation of virus gene expression. Answering these questions requires the use of cells that express a good amount of virus receptors, so that viruses bind efficiently to the cells and infections can be synchronized so that extended time does not elapse between virus addition and accumulation of E1A transcripts; extended time between these two steps would make interpretation of the results more complex since cells could have progressed from one cell cycle stage to another during the experiment. Furthermore, having cells at all phases of the cell cycle is actually a benefit since the experiment can then be carried out under an “unperturbed” condition; all cell cycle synchronization methods have pleiotropic effects on the cells.

      It is true that primary non-transformed cells are physiologically more relevant than cancer cells, but primary cells have issues with donor-to-donor variability and many primary cells express rather low amounts of AdV-C5 receptors, so synchronized infections in these cells are not possible. Furthermore, the extended cell morphology of many normal fibroblast cell lines and the tendency of cell extensions from neighboring cells to overlap makes fluorescent images of these cells incompatible for automated cell segmentation.

      Here, we also provide data from HDF-TERT cells (nontransformed human diploid fibroblasts immortalized by human telomerase expression) to show that two of our key findings from A549 cells are not artefacts of cancer cells. That is, akin to A549 cells, the infected HDF-TERT cells accumulate a high number of E1A transcripts (Fig.1C), and also in these cells nuclear vDNA numbers do not predict the cytoplasmic E1A transcript counts during early phases of infection (S2C Fig). However, since HDF-TERT cells are rather inefficiently infected by AdV-C5, correlation of early E1A transcript accumulation with the cell cycle phase of the host cell could not be done in these cells. We have been unable to identify primary or normal immortalized cells that would be easily available and efficiently infected by AdV-C5 (synchronized infection with short time elapsed between virus addition and accumulation of E1A transcripts).

      "The virus particles bound..." - Can the spatial resolution of a confocal microscope truly differentiate individual particles that are sub-wavelength in size? What about the sensitivity for single particles? Some sort of experiment to show that single particles can be detected should be performed and shown to assure the readers that this is in fact possible. Furthermore, even when based on the particle to pfu ratio, the MOI would still be nearly 2000pfu/cell, so the actual number of observed particles is an order of magnitude lower than what was applied to the cells.

      AU: The fluorescence signal from an individual fluorophore-tagged AdV or anti-hexon antibody-decorated particle is bright enough to be picked up by the PMT or HyD detectors of current confocal laser scanning microscopes. In fact, tracking fluorophore-tagged particles of the size of AdV has been a standard microscopy procedure since the late 1990s.

      Because the Reviewers were questioning the apparently high multiplicity of infection used in the experiments, we clarify the difference between “standard” MOI estimations and our infection set-up. First of all, as described in Material and Methods, we estimated the number of physical virus particles in our virus preparations using A260 measurements (J.A. Sweeney et al., Virol. 2002, doi: 10.1006/viro.2002.1406). This method, like all other methods used to estimate virus particle numbers, is likely not 100% reliable.

      Second, we incubated the virus inoculum with cells only for 60 min, after which the unbound viruses were washed away. During this short incubation time only a small fraction of input virus particles bind to cells, and indeed as shown in Fig.1A, a theoretical MOI of 54400 physical virus particles/cell or 13600 physical virus particles/cell yielded a median of 75 and 26 bound virus particles per cell, respectively. Interpretation of the results from the cell cycle assays required a relatively short time between infection and analysis so that cells did not change their cell cycle status on a large scale during the experiment. This required the use of a rather high MOI. Furthermore, for collection of a large data set, it is convenient that every cell is infected.
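To make the gap between theoretical and actual MOI concrete, the following sketch computes the fraction of input particles that end up bound per cell, using only the two pairs of numbers quoted above from Fig.1A. It is an illustration and not part of the authors' analysis; the function name is hypothetical, and comparing a median of bound particles to the nominal input per cell is a simplifying assumption.

```python
def bound_fraction(median_bound_per_cell, input_particles_per_cell):
    """Fraction of the theoretical per-cell input that is found bound to a cell."""
    return median_bound_per_cell / input_particles_per_cell

# (theoretical MOI in physical particles/cell, median bound particles/cell)
conditions = [(54400, 75), (13600, 26)]

for theoretical_moi, bound in conditions:
    percent = 100 * bound_fraction(bound, theoretical_moi)
    print(f"theoretical MOI {theoretical_moi}: {percent:.2f}% of input bound")
```

In both conditions well under 1% of the nominal input is bound, which is the point the authors make about the short incubation.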

      Third, what exactly does one pfu mean in terms of physical adenovirus particles? There is no clear answer to this, since several parameters affect the pfu. In which cells was the titration carried out? How long was the input virus inoculum incubated with the cells? How many of the virus particles entering the cell actually established an infection? And, as described in A. Yakimovich et al. (J. Virol. 2012, DOI: 10.1128/JVI.01102-12), only a fraction of infected cells produce a plaque. The majority of papers stating that x pfu/cell was used for infection, usually incubate the cells with the virus inoculum for several hours at 37°C, and never make any attempts to estimate exactly how many virus particles entered into the cells.

      Fig. 4 - I am not certain that the observed difference is significant, at least looking at it, beyond the width difference of the peaks, highest expression for both is largely in G1. It would be nice to see this using a western blot of cell cycle sorted cells, which can easily be accomplished using FACS.

      AU: In the highest GFP expression bin, CMV-eGFP expressing cells have 43% cells in G1 and 50% in S/G2/M. In comparison, E1A-GFP expressing cells have 58% cells in G1 and 35% in S/G2/M. The difference in G1 cells in the highest eGFP bin is statistically significant (p

      Page 15, 2nd paragraph. It would be valuable and informative to determine whether there is heterogeneity in histone association with these different vDNAs and whether these histones exhibit divergent modifications (enabling or restricting transcription). Same as above. I am rather surprised that the DBP signal did not correlate well with vDNA signal, particularly for the larger replication centers. How can this be reconciled? Was there an increase in overall vDNA signal later in infection? It is important to know this as it determines whether the observed vDNA signal is real or could be caused by viral RNA or other background causes (non-infected controls notwithstanding). Can the signal be detected with inactivated viruses (via UV for example)?

      AU: Whether histone modifications impact the transcriptional output of adenovirus genomes early in infection is indeed an intriguing question, but unfortunately this is very challenging, if not impossible, to study at single-cell / single vDNA level with the existing technology. Techniques for single-cell measurements of chromatin states are still in infancy, although some notable advancements in this field were reported in 2019 (e.g. K. Grosselin et al. Nature Genetics, DOI: https://doi.org/10.1038/s41588-019-0424-9 and S. Ai et al. Nature Cell Biology, DOI: https://doi.org/10.1038/s41556-019-0383-5).

      Furthermore, current literature offers a confused picture as to when exactly protein VII on incoming virus genomes is replaced by histones (reviewed in the reference 39, Giberson et al.). Of note, the vast majority of incoming nuclear vDNA molecules scored protein VII-positive with anti-VII staining under the experimental conditions used for the Fig. 2C data. However, we did not include these results into the manuscript because VII-positive signal on vDNAs does not exclude these vDNAs having histones on certain parts of the genome.

      The Reviewer wonders why the DBP signal in Fig.6C does not correlate with vDNA signal. There is no discrepancy here because DBP signal in the figure is a proxy for replicating vDNA whereas the click vDNA signal reports incoming vDNA. The one DBP spot without an associated click vDNA signal could be due to a replication center originated from a replicated viral genome, not from incoming viral genome. The figure shows that incoming vDNAs within the same nucleus initiate replication asynchronously.

      Page 18, 1st paragraph. It would be interesting to determine whether there was association between pol II and those genomes that showed no E1A, similarly to the histone suggestion. What about things like viral chromatin organization? Soriano et al. 2019 showed how E1A and E4orf3 work in tandem to alter viral chromatin organization by varying histone loading on the viral genome.

      AU: This again would be technically very challenging to show. We actually tried to visualize active transcription using an antibody against RNA polymerase II CTD repeat YSPTSPS (phosphor S5), azide-alexa fluor488 and anti-alexa fluor488 antibody to mark EdC-labeled incoming vDNAs and proximity ligation assay for signal amplification. However, this method was not sensitive enough to detect RNA polymerase II association with individual viral genomes. We only detected the proximity ligation signal in replication centers when replicated viral genomes were tagged with EdC.

      Fig. 2. Can you really say that a single dot correlates with a single transcript? Has that been validated in any way?

      AU: Signal amplification with branched DNA technology leads to binding of a large number of fluorescent probes to a mRNA and thus enables detection of single nucleic acid molecules. This has been validated e.g. in A.N. Player et al. 2001. J. Histochem. Cytochem (https://doi.org/10.1177/002215540104900507) and N. Battich et al. 2013. Nature Methods (https://doi.org/10.1038/nmeth.2657).

      **Minor:**

      Page 5, last paragraph. "Transcripts from the viral late transcription unit,..." This is not correct as recently shown by Crisostomo et al, 2019.

      AU: The data in Crisostomo et al. paper suggest that some late gene expression can occur before vDNA replication, but an abundant accumulation of late transcripts coincides with onset of vDNA replication. However, the Crisostomo et al. study did not test what the levels of late gene transcripts are if the vDNA replication was inhibited. But to acknowledge the possibility that there might be some level of late gene transcription prior to replication of the viral genomes, the sentence is modified as follows: “Transcripts from the viral late transcription unit, amongst them mRNAs for the viral structural proteins, vastly increase in abundance concomitant with the onset of vDNA replication”. Furthermore, we have added the Crisostomo et al. reference here as well.

      Page 10, "... because AdvV-infected cells are less well adherent..." This is not strictly true as loss of attachment only occurs later on in infection. It would be helpful to have statistical significance indicated directly in the figures.

      AU: Although clearly visible cell rounding indeed occurs only late in infection, also during early stages of infection the HAdV-C5-infected cells are less adherent than non-infected cells. In many assays this is not obvious, but the RNA FISH staining procedure includes several incubation and washing steps in rather harsh buffers, and we observed random, sometimes considerable, cell loss with infected cultures but not with non-infected cultures.

      In the revised manuscript we have included the statistical significance P values both into the main text and the figure legends, but not to the figures directly, because the P values were generated with different statistical tests and P values should not be shown/mentioned without stating which statistical test was used. However, we noticed that we had in some cases omitted to mention what was the number of pairs analyzed in some of the Spearman’s correlation tests. This has now been corrected in the revised manuscript.

      The very high MOIs used are concerning, could these have negative effects on the cell viability or overall state?

      AU: We refer to our explanation above about the theoretical MOI and the actual MOI. Furthermore, in the experiment described in Fig.2C (correlation of E1A transcripts per cell vs. viral genomes per cell), 42% of analyzed cells had ≤ 5 viral genomes/cell and 27.5% of analyzed cells had between 6-10 viral genomes per cell; these are not high numbers. We also provide controls that the EdC-labeled genomes are detected with good efficiency. Hence the EdC-labeled genomes per cell are a good estimate of the numbers of virus particles that indeed entered into the cells.

      There are a few typos and such that should be corrected. AU: We have tried to find and correct the typos.

      Reviewer #1 (Significance (Required)):

      As I stated above, the work is interesting and significant, to a degree. The major limitation is that the novelty is low as a paper published in 2017 (cited by the authors) used a very similar approach to investigate a similar problem. In addition, there are multiple other recent papers looking at cell populations in the context of adenovirus infection, and whether a single cell or population based approach is better is unclear. This is something the authors might want to strengthen prior to submission.

      AU: In the current study, we focused on the early phase of HAdV-C5 infection, on how viral gene expression is initiated and how individual nuclear viral genomes proceed to a replicative phase. The Krzywkowski et al. 2017 J. Virol. paper that the reviewer refers to used a padlock probe-based rolling circle amplification technique to simultaneously detect HAdV-C5 genomes and viral mRNAs in individual infected cells.

      The shortcoming of this method is inferior sensitivity compared to the branched DNA technology-based method used by us in the current study. Krzywkowski et al. were able to pick up signals from virus mRNAs and virus genome only relatively late in the infection, i.e. at the time when incoming genomes were expected to have multiplied by replication. Thus the study by Krzywkowski et al. was unable to provide information for the questions addressed in our study, i.e. do the levels of E1A transcripts early in infection correlate with viral vDNA counts in the nucleus and is there variability in the transcription output from individual vDNAs within the same nucleus, or variability in how individual vDNAs within the same nucleus proceed into the replication phase. We hence do provide novel information, and do not consider this as a limitation of our paper.

      We emphasize that population assays are done to attempt to understand molecular basis of a phenomenon by correlations. Instead, deep molecular insights require to-the-point-assays, in the case of transcription, single-molecule live cell assays at the level of single genes. Technically, we (and also the field) are not quite there yet.

      Regardless, our study is a first step towards understanding transcription output of nuclear HAdV-genome at single-cell, single-genome levels. It has revealed insight that was not apparent from population assays. It is clear that the next step will be time-resolved live cell assays with simultaneous detection of transcription output, genome detection and transcription factor clustering on the genomic loci. With current technology the simultaneous detection of all these events is challenging, and requires the development of further technology.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors show heterogeneity of AdV-C5 mRNA transcript quantity and dynamics in different cell types, which is regulated by the cell cycle phase and does not correlate to incoming viral DNA, using single molecule RNA FISH technologies and detection of incoming viral DNA by EdC labeling.

      **Major Comments:**

      The authors change the MOI used in their experiments (7 different MOIs are used throughout the paper) in a manner that appears random and without explanation. (54400 for Figure 1A, 1B, 3B, S3B; 37500 for Figure 1C; 23440 for Figure 2A, 2C, S5A; 13600 for Figure 1A, 1D; 36250 for Figure 3C, S3D; 11200 for Figure 4B; 23400 for Figure 6B). The authors should explain why these changes in MOIs are necessary.

      AU: The MOIs given are theoretical MOIs, and essentially all figures indicate what was the actual MOI, that is, the real number of virus particles entering into the cells. This is beyond what is commonly provided in virology. It is essential, however, since MOI differs between different cell types. Therefore, we prefer to use the actual MOI as shown in Fig.1A, or we indicate the number of vDNAs that were delivered to the cells of interest.

      Variable MOIs had to be used to ensure that different cell lines received comparable numbers of virions, in particular of virus particles binding to and entering the cells. Infection kinetics differ between cell types but can be tuned by the MOI used. Furthermore, different virus preparations were used in the experiments, and we performed analyses at different stages of the infection cycle. Because of all these different facets of our experiments, it was impossible to choose one standard (theoretical) MOI for all the experiments.

      The authors use mean fluorescence intensity of E1A probes per cell as estimate for viral transcript abundance for some of their experiments (Figure 1D, E, 3B), and count E1A punctae as measure for E1A transcripts in other experiments (Figure 2C, 3C, 5), without showing data, that these measures correlate. Problematic is hereby, that not all E1A punctae have the same signal intensity, as can be seen in Figure S1, which makes the estimation of the correlation of E1A punctae (= number of transcripts) and fluorescence intensity difficult. The authors should provide both (E1A punctae counts and estimation via fluorescence intensity) for at least one experiment, to prove, that the estimation of E1A transcript levels via fluorescence intensity is feasible.

      AU: The quantification method had to be adjusted to the number of virus transcripts in the cell at the time of analysis. The best quantification method is segmentation and counting the individual fluorescent puncta per cell, but, as stated in the manuscript, this method does not accurately quantify the mRNA puncta from maximum projections of confocal or widefield image stacks when the number of puncta per cell exceeds ~ 200.

      On the other hand, as shown in the quantification below, mean fluorescence intensity measurements per cell do not of course distinguish between cells having one vs. two mRNA puncta. Yet, as shown in the figure below, a relatively good correlation between puncta counting and fluorescence intensity measurements is achieved when cells have ≥ 10 transcripts per cell. Subsets of randomly picked images of the Fig.2C/Fig.5 dataset were included into the analysis (rs is Spearman’s rank correlation coefficient, approximate P

      p.15: "The nuclear E1A signals in AraC-treated cells were resistant to RNase A, but they were dampened by treatment with S1 nuclease (S6B Fig)." The authors make this statement based on (i) two completely different timepoints (12 h.p.i. for RNaseA treatment, 24.5 h.p.i. for S1 nuclease treatment) and (ii) in different clones of the A549 cells as stated in the methods section on p.21 (Two different clones of human lung epithelial carcinoma A549 cells were used in the study: our laboratory's old A549 clone (experiments shown in Fig. 1, Fig. 3B and S1 Fig., S3B and S3C Fig., S6A and S6B Fig., RNase A treatment) and A549 from American Type Culture Collection (ATCC, experiments shown in Fig. 2 and Fig. 5, Fig. 6, S2B Fig., S4 Fig., S5 Fig., and S6B Fig. S1 nuclease-treatment)). This makes it difficult to tell whether the data reflect differences in the timepoints or cell types, or binding of the E1A probe to single stranded vDNA.

      AU: This is a fair criticism, thank you. We have replaced the RNase A figure S6B in the revised manuscript. The RNase A experiment was repeated in ATCC A549 cells using the same infection conditions as with the S1 nuclease-treated cells.

      **Minor Comments:**

      p.4: "AdV are non-enveloped, double-stranded DNA viruses that cause mild respiratory infections in immuno-competent hosts, and establish persistent infections, which can develop into life-threatening infections if the host becomes immuno-compromised [reviewed in 6]." Not all AdV cause respiratory diseases, the disease outcome of human AdV depends on the site of primary infection, which differs between the different AdV types.

      AU: We have modified the text as follows: AdV are non-enveloped, double-stranded DNA viruses that cause mild respiratory, gastrointestinal or ocular infections…

      p.7: The authors state, that "At the 17 h time point, about half of the cells had high numbers of protein VI transcripts, and most of them very high numbers of E1A transcripts.", however, the picture shown in Figure 1F shows a different phenotype, with low transcript levels of VI in E1A high cells and high transcript levels of VI in E1A low cells.

      AU: This was perhaps a bit difficult to see in the overlay images since one has to distinguish between green and yellowish green. We have provided the individual channels along the overlay picture in Fig. S1D, and now it is clear that at 17h pi cells with high numbers of VI transcripts have also high numbers of E1A transcripts.

      p.8: "This nuclear E1A signal is due to binding of the E1A probe to single-stranded vDNA in the replication centers (see below)." The authors should state here, that due to the binding of the probes to the single stranded vDNA in the replication centers, the nucleus was excluded from the analysis for Figure 1F in late timepoints.

      AU: We have modified the text according to the Reviewer’s suggestion. The text is now as follows: ‘Due to further studies (see below), we assume that this nuclear E1A signal represents binding of the E1A probe to single-stranded vDNA in the replication centers. Accordingly, the nuclear area was excluded when quantifying the viral transcripts per cell in late timepoints (Fig. 1F).’

      Due to this time point the author cannot state that the E1A staining seen (Fig. 1F; indicated with white arrows) are replication centers; this is just an assumption, since there is no evidence in Fig 1 the author cannot be sure; the author should change the text: "taking the following experiments into account...", "due to further studies (see below)..... we assume that..."

      AU: We have modified the text according to the Reviewer’s suggestion; see also the previous comment above.

      p.8: The authors should mention the figure they refer to, since there is no E1B-55K staining in Fig. 1F

      AU: The text has been modified as follows: Whereas other time points showed relatively few E1A, E1B-55K or VI puncta over the nuclear area (Fig. 1B, 1F, S1A Fig.), clustered nuclear E1A signals were apparent at 23 h.

      p.9: Which test was used to calculate the additional p-values?

      AU: As stated in the Material and Methods section or the figure legends, the p-values were calculated either by a permutation test using a custom-programmed R script (the code has been deposited on Mendeley Data along with other data associated with this manuscript), or by a Kolmogorov-Smirnov test using GraphPad Prism. GraphPad Prism was also used to calculate Spearman’s correlation coefficients and the associated approximate p values. In the revised manuscript, we have added the following sentence into the Material and Methods section / Statistical analyses: Spearman’s correlation tests were done using GraphPad Prism.
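
      For readers unfamiliar with the permutation test mentioned in this response, the following is a minimal Python sketch of a two-sample version. It is not the authors' deposited R script: the test statistic (absolute difference of group means), the number of permutations, and the function name `permutation_p_value` are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Hypothetical helper for illustration; the statistic and the number
    of permutations are assumptions, not the authors' settings.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1

    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimated p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)
```

      The function takes two lists of per-cell measurements (for example transcript counts from two treatments) and returns the fraction of random relabellings that give a difference at least as large as the observed one.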

      p.10: For the experiment for the correlation of viral genomes per cell and E1A transcripts in HDF-TERT cells (Figure S2C), the MOI is missing in the description of the results, as well as in the corresponding figure legends.

      AU: We have indicated the theoretical MOI (~ 4800 virus particles per cell) in the figure legend and in the Material and Methods section. The actual MOI, i.e. the actual number of virus particles entering into the cells, could not be determined due to the long (15 h) incubation time of virus inoculum with the cells, which in turn was required because these cells bind AdV-C5 rather inefficiently. However, between 1 and 32 EdC-labeled virus genomes were detected per cell nucleus at 22 h pi.

      p.11: calculation of correlation? rs? Why does the author combine S and G2/M phase? Fig. S3A shows different values for the phases

      AU: rs is the abbreviation for Spearman’s correlation coefficient, and, as indicated in the Material and Methods, we used GraphPad Prism to calculate the Spearman’s correlation coefficients.

      Different methods were used to estimate cell cycle stages. The DNA content method cannot separate S and G2/M with great confidence, whereas Kusabira Orange-hCdt1 and Azami-Green-hGeminin expression in HeLa-Fucci cells allows a more fine-tuned assessment of the cell cycle phases.

      p.11: "Thus, the total intensity of nuclear DAPI signal can be used to accurately assign G1 vs S/G2/M stage to cells." The authors should also here refer to other papers, which showed that this correlation is feasible, as they did in the methods section (67. Roukos V, Pegoraro G, Voss TC, Misteli T. Cell cycle staging of individual cells by fluorescence microscopy. Nature protocols. 2015;10(2):334-48. Epub 2015/01/31. doi: 10.1038/nprot.2015.016. PubMed PMID: 25633629; PubMed Central PMCID:PMCPMC6318798.), and maybe also refer to a newer paper which deals with this technique: Ferro, A., Mestre, T., Carneiro, P. et al. Blue intensity matters for cell cycle profiling in fluorescence DAPI-stained images. Lab Invest 97, 615-625 (2017). https://doi.org/10.1038/labinvest.2017.13

      AU: The integrated nuclear DAPI signal intensity is indeed a widely used method to assign cell-cycle stage to individual cells. We have added the second reference suggested by the Reviewer to the reference list for this method.

      p.11: "Furthermore, when focusing on the highest E1A expressing cells, i.e. the cells with mean cytoplasmic E1A intensities larger than 1.5 × interquartile range from the 75th percentile, 71.9% of these cells were found to be in the G1 phase of cell cycle, whereas only 55.8% of cells in the total sampled cell population were G1 cells." The authors do not provide any reference to a figure within the manuscript or the supplements, which contains these data. Are these data not shown in the manuscript?

      AU: These values are calculated from the data shown in Fig.3B. The source data supporting findings of this study (maximum projection images, excel files of the CellProfiler and Knime workflows) have now been deposited to Mendeley Data as stated in the Material and Methods / Data availability section of the revised manuscript and listed in Supplementary tables.
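
      The cut-off quoted by the reviewer (intensities larger than 1.5 × interquartile range from the 75th percentile) can be written out explicitly. The sketch below is only an illustration: the quartile convention (`method="inclusive"`) and the function names are assumptions, and the authors' CellProfiler/Knime workflow may compute percentiles differently.

```python
from statistics import quantiles


def high_expresser_threshold(intensities):
    """Return Q3 + 1.5 * IQR for a list of per-cell intensities.

    The 'inclusive' quartile method is an assumption; other percentile
    conventions give slightly different cut-offs.
    """
    q1, _median, q3 = quantiles(intensities, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def select_high_expressers(intensities):
    """Return the intensities that lie above the Q3 + 1.5 * IQR cut-off."""
    cutoff = high_expresser_threshold(intensities)
    return [value for value in intensities if value > cutoff]
```

      Applied to the mean cytoplasmic E1A intensities per cell, the selected subset would be the "highest E1A expressing cells" whose cell cycle distribution is then compared with that of the whole population.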

      p.12: punctuation mistake; . instead of , To enrich G1 cells. AdV-C-5 (moi ~ 36250) was added. Why does the author switch between signal intensities and counting E1A puncta per cell (limited to 200) in the different experiments to illustrate accumulation of E1A transcripts?

      AU: The same answer as above: the quantification method had to be adjusted to the number of virus transcripts in the cell at the time of analysis. The best quantification method is segmentation and counting the individual fluorescent puncta per cell, but, as stated in the manuscript, this method does not accurately quantify the mRNA puncta from maximum projections of confocal or widefield image stacks when the number of puncta per cell exceeds ~ 200. On the other hand, as shown in the quantification in the new S1C Fig., mean fluorescence intensity measurements per cell do not of course distinguish between cells having one vs. two mRNA puncta, but a relatively good correlation between puncta counting and fluorescence intensity measurements is achieved when cells have ≥ 10 transcripts per cell.
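
      The agreement between puncta counts and mean fluorescence intensity is summarized here with Spearman's rank correlation coefficient. The authors computed it in GraphPad Prism; the dependency-free Python sketch below only illustrates what the coefficient measures (the Pearson correlation of the ranks, with ties given their average rank). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sx * sy)
```

      Here `x` would hold the puncta count per cell and `y` the mean fluorescence intensity of the same cells. Because only ranks enter the calculation, the arbitrary intensity units do not affect the result.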

      p.14: "For E1A (or E1B-55K), we did not detect transcriptional bursts with bDNA-FISH probes on nuclear vDNAs, either prior to or after accumulation of viral transcripts in the cell cytoplasm." The authors do not provide any reference to a figure within the manuscript or the supplements, which contains these data. Are these data not shown in the manuscript?

      AU: This statement is based on hundreds of images we have analyzed during the course of the study. It is impossible to show all of these images, so in principle, this is “data not shown”. We have modified the text as follows: With hundreds of images analyzed, we never unambiguously detected transcriptional bursts with E1A (or E1B-55K) bDNA-FISH probes on nuclear vDNAs, either prior to or after accumulation of viral transcripts in the cell cytoplasm.

      p.14: space between number and %

      AU: Thank you for pointing this out. It has been corrected.

      p.15: "This is was also seen in AdV-C5-EdC-infected cells" should be changed to "This was also seen in AdV-C5-EdC-infected cells"

      AU: Thank you for pointing this out. It has been corrected.

      Fig. 1B:

      −figure legend does not indicate how cells were stained −also no description in the continuous text −which E1A transcripts are stained? all? 12S? 13S?

      AU: The first sentence in Results section states that “We used fluorescent in situ hybridization (FISH) with probes targeting E1A, E1B-55K and protein VI transcripts followed by branched DNA (bDNA) signal amplification to visualize the appearance and abundance of viral transcripts in AdV-C5-infected A549 lung carcinoma cells.” Furthermore, the legend to Figure 1 starts with the title “Visualization of AdV-C5 E1A, E1B-55K and protein VI transcripts in infected cells by bDNA-FISH technique”, and the legend to Fig.1B mentions that “cells were stained with probes against E1A and E1B-55K mRNAs or E1A and protein VI mRNAs”. We are of the opinion that this is enough information to understand the figures.

      The main text to Fig.1 also states that “The E1A probes covered the entire E1A primary transcript region and thus all E1A splice variants. The temporal control of E1A primary transcript splicing and E1A mRNA stability give rise predominantly to 13S and 12S E1A mRNAs at 5 h pi (references)”.

      Fig. 1D: −difference in accumulation of viral transcripts is not that visible as in IF staining (Fig. 1B; Fig. 1S);

      Fig. 1 or S1 Fig. do not show IF staining but signals from FISH.

      −graph does not show any difference between E1A and E1B-55K

      AU: The y-axes values in Fig.1D graph are arbitrary units and thus E1A and E1B-55K graphs are not directly comparable to each other. We have included into the revised manuscript S1B Fig., which shows quantification of E1A and E1B-55K fluorescent puncta per cell at the 5 h pi; the difference between E1A and E1B-55K was statistically significant.

      Fig. 1F: −figure legend does not fit with labelling of IF images and continuous text −description says 22 h, while IF labeling and text (p. 7, last lane) mentions 23 h pi

      AU: The figure annotations state the time of analyses as total time after virus addition to cells, whereas text stated the time of analyses as x h post virus removal since we wanted to stress that the input virus was incubated only for 1 h with the cells. However, Reviewers found this confusing, so we have changed the text in the revised manuscript so that time of analysis is stated as total time after virus addition to cells (as in the figure annotations). Only in the Material and Methods section we maintain the original 1 h + x h statement for the time of analysis.

      Fig. 2A: −figure legend: lane 5 Punctuation wrong: azide-Alexa Fluor488. Alexa Fluor647

      AU: Thank you for pointing this out. It has been corrected.

      Fig. 4A: −difficulties to understand −author stated that promoter-driven EGFP expression is clearly dominated by G1 cells for E1A and by S/G2/M cells for CMV, however this is not clearly visible in the graph −no severe differences visible between CMV-eGFP and E1A-eGFP −author should include numbers for quantification and statistical calculations to illustrate the differences

      AU: In the highest GFP expression bin, CMV-eGFP expressing cells have 43% cells in G1 and 50% in S/G2/M (n=2149). In comparison, E1A-GFP expressing cells have 58% cells in G1 and 35% in S/G2/M (n=2258). The difference in G1 cells in the highest eGFP bin is statistically significant (p

      Fig. 4B: −amount of E1A protein levels calculated via IF (signal intensities) −immunofluorescence is not a suitable tool for protein quantification

      AU: It is true that not all antibodies are suitable for IF (or for Western blot), and we cannot be certain that the monoclonal anti-E1A antibody used by us detects all E1A forms with different post-translational modifications with equal efficiency. However, IF is a widely accepted method to estimate protein levels in the cell, especially if the proteins like E1A accumulate in the nucleus (makes segmentation of the signal easy) and give a rather uniform nuclear staining pattern.

      Fig. 5: −in A. it is stated, that E1A bDNA -FISH is not suitable, since it is too short to be detectable. However, in B E1A bDNA-FISH is used. is there a difference? −according to the method part just one E1A mRNA was used for the assays, why is it then not possible to use that one in Fig. 5A? −explanation of the procedure and the experiment is very confusing

      AU: The Reviewer probably refers to Fig.6 here, not to Fig.5. The E1A introns are short (about 100 bases) and cannot be picked up with bDNA FISH probes. In Fig. 6B we were using the E1A bDNA-FISH probes, which were made against the AdV-C5 genome map positions 551-1630 to detect vDNA single strands of the E1A region and these single strands were long enough to be picked out by our E1A probes.

      Fig. S6B: −authors want to show that it is RNase-insensitive, but S1 nuclease-sensitive

      −two different A549 cell clones and two different time points are used for the treatments → not comparable to each other

      AU: This is a fair criticism. We have replaced the RNase A figure in S6B Fig. in the revised manuscript. The new RNase A experiment was carried out in ATCC A549 cells using the same infection conditions as with the S1 nuclease-treated cells.

      Material and Methods: −headings do not indicate which methods are explained −no clear structure

      AU: We have made minor changes to the headings of Material and Methods section. We have first explained in detail the bDNA-FISH method, but otherwise the order is according to the order of the figures.

      Reviewer #2 (Significance (Required)):

      highly significant manuscript very important for the virology field

      my research topics are human adenoviruses and their replication cycle

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:** Soumalainen et al have studied adenovirus viral gene expression and replication at a single-cell level. They explore the extent of correlation between incoming genome copy number and early gene expression and progression into the late phase, revealing substantial variation between cells in the numbers of E1A transcripts (the first gene expressed upon infection) that is not explained by differences in the numbers of viral genome templates in the cells. They also explore the relevance of cell cycle stage to this variability and show a positive correlation between G1 cell cycle stage and higher levels of gene activity, which explains at least part of the variation. To form these conclusions they have applied new methods to visualise and quantify single molecules of nucleic acid in single cells. The experiments are all carefully and fully described with full detail of materials. Overall the manuscript is well written and easy to follow.

      **Major comments:**

      All of the experiments appear to be done with rigour and their results reported with due regard to statistical significance etc. My major concern though is that they have been done, perhaps out of necessity to get detectable signals, at very high multiplicities of infection. A well-accepted standard to achieve infection of all cells in a culture is an MOI of 10 infectious units per cell. Even this is acknowledged not to represent the biology of natural infection and it is striking that, where technically feasible, lower MOI studies are more revealing of how a virus actually works. Here, the authors have used counts of particles rather than infectious units to determine MOI and for Ad5, the particle/pfu ratio is typically 20-100. Their MOIs though are 13,000 - 50,000 per cell, implying an infectious MOI of at least 130 for their A549 experiments, which are known to be readily infected by Ad5 from other work.

      AU: Unlike common experiments done by others, we used a synchronized infection and removed the input virus after 1h incubation at 37°C. This type of infection initiation requires high input virus amounts, as opposed to studies in which the virus inoculum is incubated with cells for several hours/days, as is typically done in studies determining the infectious or plaque forming units in virus inoculum. Hence, the MOI used by others involved incubation of inoculum with cells over extended periods of time, and they cannot be compared to our pulsed infection conditions.

      Although the calculated theoretical MOIs (physical particles/cell) were high in our experiments, only 0.1% – 0.2% of input virus particles bound to cells during the 1h incubation period (Fig. 1 A; this estimation is based on the ratios between Median values for the number of cell-associated viruses vs input virus numbers).
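
      The two MOI estimates under discussion can be checked with simple arithmetic. The sketch below uses only numbers quoted in this exchange: the reviewer's particle/pfu ratio of 20-100 and MOI range of 13,000 - 50,000, and a median of 75 cell-associated particles. Pairing that median with the theoretical MOI of 54400 listed for Figure 1A is an assumption made for illustration, as are the function names.

```python
def infectious_moi_range(particle_moi, ratio_low=20, ratio_high=100):
    """Convert a particle-based MOI to an infectious-unit range.

    Dividing by the highest particle/pfu ratio gives the lower bound,
    dividing by the lowest ratio gives the upper bound.
    """
    return particle_moi / ratio_high, particle_moi / ratio_low


def bound_fraction(cell_associated, theoretical_moi):
    """Fraction of input particles that ended up cell-associated."""
    return cell_associated / theoretical_moi


# Reviewer's estimate for the lowest MOI quoted in the review.
print(infectious_moi_range(13_000))  # (130.0, 650.0)

# Assumption: median of 75 particles per cell at a theoretical MOI of 54400.
print(f"{bound_fraction(75, 54_400):.2%}")
```

      Under that pairing the bound fraction falls inside the 0.1% to 0.2% range stated by the authors, which is why the particle-based MOI and the number of genomes actually delivered differ by about three orders of magnitude.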

      Furthermore, in the experiment described in Fig.2C (correlation of E1A transcripts per cell vs. viral genomes per cell), 42% of analyzed cells had ≤ 5 viral genomes/cell and 27.5% of analyzed cells had between 6-10 viral genomes per cell. Please note, that these are not high numbers.

      The input virus amounts used were selected this way, because we aimed at getting a broader view of how virus transcription at early phases of infection responds to a varying number of virus genomes delivered to the nucleus. Therefore, we did not limit the analyses to a situation with 1 or less than 1 virus particles/genomes per cell.

      In addition, the analyses of how cell cycle phase impacts the initiation of virus gene expression requires a relatively short time between virus inoculation and time point of analysis (i.e. a rather high MOI). Otherwise, as also pointed out by the Reviewer, the cells could have experienced more than one cell cycle phase during the duration of the experiment. Furthermore, although the initial natural infection probably starts with a very low MOI, the second round of infection is a high MOI infection due to a large number of progeny virus particles released from an infected cell.

      Surprisingly, the authors do not see intracellular vDNA copy numbers that are fully reflective of this high MOI, with median intracellular vDNA of 75 /cell at the highest MOI. The authors should consider how the population distribution of vDNA /cell does or does not fit the predicted Poisson distribution. Nonetheless, at these high copy numbers / cell, there must surely be a risk that the variation in gene expression activity arises stochastically, out of competition between genomes for essential transcription factors. Given that multiple cellular factors are each required for E1A transcription, high genome copy numbers could actually inhibit E1A expression relative to cells with more modest copy numbers because limited supplies of individual factors are recruited to different viral genome copies.

      AU: The “discrepancy” between theoretical MOI and the actual observed number of cell-associated virus particles or cell-associated virus genomes is explained above. Furthermore, we would like to point out that we have directly estimated the number of virus particles bound to cells with the input virus amounts used, something that is usually not done in other studies.

      It is indeed theoretically possible that high nuclear genome numbers could lead to inhibition of transcription due to competition for limiting essential host factors. However, if we included only cells with ≤4 vDNA molecules per nucleus into the analysis (total number of cells analyzed was 258), then Spearman’s correlation coefficient for vDNA per nucleus vs E1A mRNAs per cell was 0.186 (p=0.0027). Thus, this would not support the notion that cells with moderate nuclear vDNA copy numbers would have a better correlation between the nuclear vDNA copies vs E1A mRNA counts per cell.

      The vDNA/cell in Fig.2C does not fit predicted Poisson distribution, var/mean=9.129.
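
      The var/mean ratio quoted here is the index of dispersion. For a Poisson distribution the variance equals the mean, so the index is expected to be close to 1, and a value of about 9 indicates strong overdispersion. A minimal sketch of the calculation follows; whether the authors used the sample or the population variance is not stated, so the use of the sample variance is an assumption.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of per-cell counts (about 1 for Poisson data).

    Uses the sample variance (n - 1 denominator); this choice is an
    assumption and matters little for large numbers of cells.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined for a zero mean")
    return variance(counts) / m
```

      The input would be the list of vDNA counts per cell; values well above 1 mean that cells differ in genome number more than random, independent delivery of genomes would predict.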

      It is important for the analysis of correlation of gene expression with cell cycle that the virus has not, at the time point analysed, already perturbed the cell cycle (a well-known effect of infection) which the authors document in Suppl Fig3B. To my eye, the G1 peak in infected cells is somewhat narrower than in the control while the S/G2 bump is a little greater. The % of cells in each of the two gates needs to be shown to support the conclusion.

      AU: In non-infected sample G1= 54.63% and S/G2/M = 45.37%, in infected cells G1= 51.4% and S/G2/M= 48.6%. We have added this information into the S3B Fig.

      Turning to the experiments documenting a correlation between E1A expression and cell cycle stage, the authors interpret their findings in terms of the stage the cells are at when the analysis was done (G1 stage cells have more E1A transcripts). The key experiment (Fig 3B) is analysed at only 4 h pi, so substantial progression from G2/M back to G1 after virus addition can probably be discounted, but the point should be discussed. The authors also use release from G1 in another cell line to support their argument that G1 supports higher levels of E1A expression (Fig 3C). Here, they elect to exclude all cells with fewer than 50 E1A transcripts from their analysis. The reason for this is completely obscure and isn't obviously justified; conceivably it could bias the outcome of the experiment. At minimum, this decision needs to be carefully explained; ideally, the full data set should be used.

      AU: Fig.3B: As suggested by the Reviewer, we have added to the main text the following explanation: “We used a high MOI infection (median 75 cell-associated virus particles, Fig. 1A) in order to achieve a rapid onset of E1A expression so that the time between virus addition and analysis was short. Thus, it is not expected that a substantial number of cells would have changed their cell cycle status during the experiment.”

      Fig.3C: We show the results also from the full data set of infected cells, i.e., cells with ≥ 1 E1A puncta in S3D Fig. We excluded the cells with zero E1A puncta because with these cells it is impossible to know whether they received no virus or whether E1A transcription had not yet started. Permutation test indicated that the difference between the starved+starved and starved+FCS is statistically significant even in this case. Because both samples are dominated by cells with low E1A counts, we log-transformed the E1A values for the box plot figure.

      The authors note the highest level of E1A activity (as opposed to RNA) was in G1/S cells and suggest that high E1A cells advance preferentially into S. Whilst in line with the literature that E1A promotes progression into S, an alternative explanation is simply that there is a time lag between RNA accumulation and protein accumulation, during which progression through the cycle would be expected.

      AU: This is a valid point, and we have modified the text as follows: “… which could reflect the advancement of high E1A expressing cells into S-phase. However, considering the time between virus addition and analysis (10.5 h), we cannot exclude the possibility that the observed G1/S preference is at least partly due to time-dependent progression of G1 cells to G1/S.”

      **Minor comments:** Fig 1 and elsewhere. Given that the 1 h incubations with virus were done at 37 C, the convention would be to include this period in the time post-infection at which harvest / fix time points are quoted. There is inconsistency between text and legend with 12 h pi being sometimes represented as 11 h after virus removal; this is an unnecessary confusion.

      AU: We have modified the text so that hours pi always include the 1h incubation with the input virus. Only in the Material and Methods section we kept the original 1h virus binding – fixing at xh post virus removal.

      Results description prior to the ref to Fig 1B: unclear what this is supposed to mean.

      AU: We have now slightly modified the first paragraph of the Results section. We mention the benefits of the bDNA signal amplification method and explain the experimental set up, i.e. that the input virus was incubated with the cells only for 1h. We also justify why we used a short incubation for the virus inoculum.

      Fig 4A: provide % of cells in each gate in each histogram.

      AU: In the highest GFP expression bin, CMV-eGFP expressing cells have 43% of cells in G1 and 50% in S/G2/M. In comparison, E1A-GFP expressing cells have 58% of cells in G1 and 35% in S/G2/M. This has been added to the figure, and it is also mentioned in the main text. Furthermore, we added to the text the results from Two Proportion Z-test to show that the proportion difference of G1 cells in the highest bin was statistically significant (p
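
      For reference, a two-proportion z-test of the kind named in this response can be sketched as follows. This is an illustration only: the pooled standard error and the two-sided p-value are assumptions about how the test was run, the function name is hypothetical, and the counts in the usage note are placeholders rather than the authors' data.

```python
from math import erfc, sqrt


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test for the difference between two proportions.

    Uses the pooled proportion for the standard error (an assumption).
    Returns (z, p_value).
    """
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value
```

      With placeholder counts, `two_proportion_z_test(60, 100, 40, 100)` compares 60% against 40% of 100 cells each. In the setting described above, the inputs would be the number of G1 cells and the total number of cells in the highest eGFP bin for each reporter.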

      Fig 5: bottom right panel x axis label is wrong

      AU: Thank you for pointing this out. This has been corrected.

      In the presentation of Fig 6, it would be much clearer for the reader if the detected replication foci (ss DNA detected as E1A puncta) were referred to as something other than E1A puncta. There is too much scope for confusion with the earlier experiments in which E1A RNA was detected.

      AU: We agree. In the revised manuscript, we refer to these puncta in the text as E1A ssDNA-foci.

      Reviewer #3 (Significance (Required)):

      The study represents the application of state of the art single-molecule visualization techniques to an as yet not understood aspect of virus infection. That said, there is prior experimentation in this area, which the authors fully acknowledge and build upon. The new work is largely descriptive, in that it reveals very clearly the discrepancy between genome copy number and amounts of mRNA without seeking to explain these, beyond the cell cycle analysis. Whilst there is a better correlation between vDNA number and transcript once the data are stratified by cell cycle stage, it is still not strong (Fig 5), indicating that other substantial contributing factors remain to be described.

      The work will be of interest certainly to adenovirologists, but also to others who study virus infections - particularly nuclear-replicating DNA viruses such as herpesviruses - where similar considerations are likely to apply.

      Expertise: adenovirus; gene expression; virus-host interactions; molecular biology

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary: Soumalainen et al have studied adenovirus viral gene expression and replication at a single-cell level. They explore the extent of correlation between incoming genome copy number and early gene expression and progression into the late phase, revealing substantial variation between cells in the numbers of E1A transcripts (the first gene expressed upon infection) that is not explained by differences in the numbers of viral genome templates in the cells. They also explore the relevance of cell cycle stage to this variability and show a positive correlation between G1 cell cycle stage and higher levels of gene activity, which explains at least part of the variation. To form these conclusions they have applied new methods to visualise and quantify single molecules of nucleic acid in single cells. The experiments are all carefully and fully described with full detail of materials. Overall the manuscript is well written and easy to follow.

      Major comments:

      All of the experiments appear to be done with rigour and their results reported with due regard to statistical significance etc. My major concern though is that they have been done, perhaps out of necessity to get detectable signals, at very high multiplicities of infection. A well-accepted standard to achieve infection of all cells in a culture is an MOI of 10 infectious units per cell. Even this is acknowledged not to represent the biology of natural infection and it is striking that, where technically feasible, lower MOI studies are more revealing of how a virus actually works. Here, the authors have used counts of particles rather than infectious units to determine MOI and for Ad5, the particle/pfu ratio is typically 20-100. Their MOIs though are 13,000 - 50,000 per cell, implying an infectious MOI of at least 130 for their A549 experiments, which are known to be readily infected by Ad5 from other work.
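
      The conversion behind this concern can be sketched as follows. This is only an illustration of the arithmetic in the comment above, assuming the quoted particle/pfu range of 20-100 applies to the virus preparations used; the function name and its defaults are hypothetical and do not come from the manuscript.

```python
def infectious_moi_range(particles_per_cell, ratio_low=20, ratio_high=100):
    """Convert a particle-based MOI into a (min, max) infectious MOI.

    Assumes the particle/pfu ratio lies between ratio_low and ratio_high,
    as quoted for Ad5 in the comment above. A higher ratio means fewer
    infectious units per particle, so it gives the lower bound.
    """
    return particles_per_cell / ratio_high, particles_per_cell / ratio_low


# Particle-based MOIs at the two ends of the range cited by the referee
for particles in (13_000, 50_000):
    low, high = infectious_moi_range(particles)
    print(f"{particles} particles/cell -> {low:.0f} to {high:.0f} infectious units/cell")
```

      Under these assumptions the lowest particle-based MOI still corresponds to at least 130 infectious units per cell, which is the figure the referee cites.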

      Surprisingly, the authors do not see intracellular vDNA copy numbers that are fully reflective of this high MOI, with median intracellular vDNA of 75 /cell at the highest MOI. The authors should consider how the population distribution of vDNA /cell does or does not fit the predicted Poisson distribution. Nonetheless, at these high copy numbers / cell, there must surely be a risk that the variation in gene expression activity arises stochastically, out of competition between genomes for essential transcription factors. Given that multiple cellular factors are each required for E1A transcription, high genome copy numbers could actually inhibit E1A expression relative to cells with more modest copy numbers because limited supplies of individual factors are recruited to different viral genome copies. It is important for the analysis of correlation of gene expression with cell cycle that the virus has not, at the time point analysed, already perturbed the cell cycle (a well-known effect of infection) which the authors document in Suppl Fig3B. To my eye, the G1 peak in infected cells is somewhat narrower than in the control while the S/G2 bump is a little greater. The % of cells in each of the two gates needs to be shown to support the conclusion.

      Turning to the experiments documenting a correlation between E1A expression and cell cycle stage, the authors interpret their findings in terms of the stage the cells are at when the analysis was done (G1 stage cells have more E1A transcripts). The key experiment (Fig 3B) is analysed at only 4 h pi, so substantial progression from G2/M back to G1 after virus addition can probably be discounted, but the point should be discussed. The authors also use release from G1 in another cell line to support their argument that G1 supports higher levels of E1A expression (Fig 3C). Here, they elect to exclude all cells with fewer than 50 E1A transcripts from their analysis. The reason for this is completely obscure and isn't obviously justified; conceivably it could bias the outcome of the experiment. At minimum, this decision needs to be carefully explained; ideally, the full data set should be used.

      The authors note the highest level of E1A activity (as opposed to RNA) was in G1/S cells and suggest that high E1A cells advance preferentially into S. Whilst in line with the literature that E1A promotes progression into S, an alternative explanation is simply that there is a time lag between RNA accumulation and protein accumulation, during which progression through the cycle would be expected.

      Minor comments:

      Fig 1 and elsewhere. Given that the 1 h incubations with virus were done at 37 C, the convention would be to include this period in the time post-infection at which harvest / fix time points are quoted. There is inconsistency between text and legend with 12 h pi being sometimes represented as 11 h after virus removal; this is an unnecessary confusion.

      Results description prior to the ref to Fig 1B: unclear what this is supposed to mean.

      Fig 4A: provide % of cells in each gate in each histogram.

      Fig 5: bottom right panel x axis label is wrong

      In the presentation of Fig 6, it would be much clearer for the reader if the detected replication foci (ss DNA detected as E1A puncta) were referred to as something other than E1A puncta. There is too much scope for confusion with the earlier experiments in which E1A RNA was detected.

      Significance

      The study represents the application of state of the art single-molecule visualization techniques to an as yet not understood aspect of virus infection. That said, there is prior experimentation in this area, which the authors fully acknowledge and build upon. The new work is largely descriptive, in that it reveals very clearly the discrepancy between genome copy number and amounts of mRNA without seeking to explain these, beyond the cell cycle analysis. Whilst there is a better correlation between vDNA number and transcript once the data are stratified by cell cycle stage, it is still not strong (Fig 5), indicating that other substantial contributing factors remain to be described.

      The work will be of interest certainly to adenovirologists, but also to others who study virus infections - particularly nuclear-replicating DNA viruses such as herpesviruses - where similar considerations are likely to apply.

      Expertise: adenovirus; gene expression; virus-host interactions; molecular biology

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The authors show heterogeneity of AdV-C5 mRNA transcript quantity and dynamics in different cell types, which is regulated by the cell cycle phase and does not correlate to incoming viral DNA, using single molecule RNA FISH technologies and detection of incoming viral DNA by EdC labeling.

      Major Comments:

      The authors change the MOI used in their experiments (7 different MOIs are used throughout the paper) in a manner that appears random and without explanation (54400 for Figure 1A, 1B, 3B, S3B; 37500 for Figure 1C; 23440 for Figure 2A, 2C, S5A; 13600 for Figure 1A, 1D; 36250 for Figure 3C, S3D; 11200 for Figure 4B; 23400 for Figure 6B). The authors should explain why these changes in MOI are necessary.

      The authors use mean fluorescence intensity of E1A probes per cell as an estimate of viral transcript abundance for some of their experiments (Figure 1D, E, 3B), and count E1A punctae as a measure of E1A transcripts in other experiments (Figure 2C, 3C, 5), without showing data that these measures correlate. The problem here is that not all E1A punctae have the same signal intensity, as can be seen in Figure S1, which makes it difficult to estimate the correlation between E1A punctae (= number of transcripts) and fluorescence intensity. The authors should provide both (E1A punctae counts and estimation via fluorescence intensity) for at least one experiment, to prove that estimating E1A transcript levels via fluorescence intensity is feasible.

      p.15: "The nuclear E1A signals in AraC-treated cells were resistant to RNase A, but they were dampened by treatment with S1 nuclease (S6B Fig)." The authors make this statement based on (i) two completely different timepoints (12 h.p.i. for RNase A treatment, 24.5 h.p.i. for S1 nuclease treatment) and (ii) different clones of the A549 cells, as stated in the methods section on p.21 (Two different clones of human lung epithelial carcinoma A549 cells were used in the study: our laboratory's old A549 clone (experiments shown in Fig. 1, Fig. 3B and S1 Fig., S3B and S3C Fig., S6A and S6B Fig., RNase A treatment) and A549 from American Type Culture Collection (ATCC, experiments shown in Fig. 2 and Fig. 5, Fig. 6, S2B Fig., S4 Fig., S5 Fig., and S6B Fig. S1 nuclease-treatment)). This makes it difficult to tell whether the data are due to differences in the timepoints or cell types, or to binding of the E1A probe to single-stranded vDNA.

      Minor Comments:

      p.4: "AdV are non-enveloped, double-stranded DNA viruses that cause mild respiratory infections in immuno-competent hosts, and establish persistent infections, which can develop into life-threatening infections if the host becomes immuno-compromised [reviewed in 6]." Not all AdV cause respiratory diseases, the disease outcome of human AdV depends on the site of primary infection, which differs between the different AdV types.

      p.7: The authors state, that "At the 17 h time point, about half of the cells had high numbers of protein VI transcripts, and most of them very high numbers of E1A transcripts.", however, the picture shown in Figure 1F shows a different phenotype, with low transcript levels of VI in E1A high cells and high transcript levels of VI in E1A low cells.

      p.8: "This nuclear E1A signal is due to binding of the E1A probe to single-stranded vDNA in the replication centers (see below)." The authors should state here that, due to the binding of the probes to the single-stranded vDNA in the replication centers, the nucleus was excluded from the analysis for Figure 1F at late timepoints. At this point in the manuscript the authors cannot state that the E1A staining seen (Fig. 1F; indicated with white arrows) marks replication centers; this is only an assumption, since there is no evidence for it in Fig. 1. The authors should change the text, e.g. "taking the following experiments into account...", "due to further studies (see below)..... we assume that..."

      p.8: The authors should mention the figure they refer to, since there is no E1B-55K staining in Fig. 1F.

      p.9: Which test was used to calculate the additional p-values?

      p.10: For the experiment for the correlation of viral genomes per cell and E1A transcripts in HDF-TERT cells (Figure S2C), the MOI is missing in the description of the results, as well as in the corresponding figure legends.

      p.11: calculation of correlation? rs? Why do the authors combine S and G2/M phase? Fig. S3A shows different values for the phases.

      p.11: "Thus, the total intensity of nuclear DAPI signal can be used to accurately assign G1 vs S/G2/M stage to cells." The authors should also here refer to other papers, which showed that this correlation is feasible, as they did in the methods section (67. Roukos V, Pegoraro G, Voss TC, Misteli T. Cell cycle staging of individual cells by fluorescence microscopy. Nature protocols. 2015;10(2):334-48. Epub 2015/01/31. doi: 10.1038/nprot.2015.016. PubMed PMID: 25633629; PubMed Central PMCID:PMCPMC6318798.), and maybe also refer to a newer paper which deals with this technique: Ferro, A., Mestre, T., Carneiro, P. et al. Blue intensity matters for cell cycle profiling in fluorescence DAPI-stained images. Lab Invest 97, 615-625 (2017). https://doi.org/10.1038/labinvest.2017.13

      p.11: "Furthermore, when focusing on the highest E1A expressing cells, i.e. the cells with mean cytoplasmic E1A intensities larger than 1.5 × interquartile range from the 75th percentile, 71.9% of these cells were found to be in the G1 phase of cell cycle, whereas only 55.8% of cells in the total sampled cell population were G1 cells." The authors do not provide any reference to a figure within the manuscript or the supplements, which contains these data. Are these data not shown in the manuscript?
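
      The selection rule quoted above (cells whose mean intensity exceeds the 75th percentile by more than 1.5 × the interquartile range) can be sketched as below. This is a minimal illustration with made-up intensity values; the quartile convention (`method="inclusive"`) and the function names are assumptions, since the manuscript's exact computation is not given here.

```python
import statistics


def upper_fence(values):
    """Return Q3 + 1.5 * IQR for a list of per-cell intensities.

    The quartile interpolation method is an assumption; other
    conventions give slightly different fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def high_expressers(values):
    """Return the values lying above the upper fence."""
    fence = upper_fence(values)
    return [v for v in values if v > fence]


# Hypothetical mean cytoplasmic E1A intensities, for illustration only
intensities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(upper_fence(intensities))
print(high_expressers(intensities))
```

      Reporting the resulting threshold and the number of cells above it alongside the G1 percentages would let readers check the 71.9% versus 55.8% comparison against the underlying data.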

      p.12: punctuation mistake; . instead of , To enrich G1 cells. AdV-C-5 (moi ~ 36250) was added. Why does the author switch between signal intensities and counting E1A puncta per cell (limited to 200) in the different experiments to illustrate accumulation of E1A transcripts?

      p.14: "For E1A (or E1B-55K), we did not detect transcriptional bursts with bDNA-FISH probes on nuclear vDNAs, either prior to or after accumulation of viral transcripts in the cell cytoplasm." The authors do not provide any reference to a figure within the manuscript or the supplements, which contains these data. Are these data not shown in the manuscript?

      p.14: space between number and %

      p.15: "This is was also seen in AdV-C5-EdC-infected cells" should be changed to "This was also seen in AdV-C5-EdC-infected cells"

      Fig. 1B:

      −figure legend does not indicate how cells were stained

      −also no description in the continuous text

      −which E1A transcripts are stained? all? 12S? 13S?

      Fig. 1D:

      −difference in accumulation of viral transcripts is not that visible as in IF staining (Fig. 1B; Fig. 1S);

      −graph does not show any difference between E1A and E1B-55K

      Fig. 1F:

      −figure legend does not fit with labelling of IF images and continuous text

      −description says 22 h, while IF labeling and text (p. 7, last lane) mentions 23 h pi

      Fig. 2A:

      −figure legend: lane 5 Punctuation wrong: azide-Alexa Fluor488. Alexa Fluor647

      Fig. 4A:

      −difficulties to understand

      −author stated that promoter-driven EGFP expression is clearly dominated by G1 cells for E1A and by S/G2/M cells for CMV, however this is not clearly visible in the graph

      −no severe differences visible between CMV-eGFP and E1A-eGFP

      −author should include numbers for quantification and statistical calculations to illustrate the differences

      Fig. 4B:

      −amount of E1A protein levels calculated via IF (signal intensities)

      −immunofluorescence is not a suitable tool for protein quantification

      Fig. 5:

      −in A. it is stated, that E1A bDNA -FISH is not suitable, since it is too short to be detectable. However, in B E1A bDNA-FISH is used. is there a difference?

      −according to the method part just one E1A mRNA was used for the assays, why is it then not possible to use that one in Fig. 5A?

      −explanation of the procedure and the experiment is very confusing

      Fig. S6B:

      −authors want to show that it is RNase-insensitive, but S1 nuclease-sensitive

      −two different A549 cell clones and two different time points are used for the treatments → not comparable to each other

      Material and Methods:

      −headings do not indicate which methods are explained

      −no clear structure

      Significance

      highly significant manuscript very important for the virology field

      my research topics are human adenoviruses and their replication cycle

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Suomalainen et al. describes a fluorescence-based approach combined with high-resolution confocal microscopy to study the heterogeneity of adenovirus infection in a population of human cells. The main focus of the authors is the detection of viral transcripts in infected cells, how this correlates with viral genomes, the cell state, and how it varies between different cells in a single population. The paper is generally well written and easy to read, with a few typos, although I found parts of it to be somewhat lengthy and repetitive. Particularly the results section could be pruned somewhat for readability and clarity. The major limitation of the study as it stands is its overall impact and novelty, which limits journal selection somewhat. A very similar study was recently published, which the authors cite (Krzywkowski et al, 2017). Nevertheless, I think the study design is rigorous and well executed, but I do have some specific comments which may enhance its overall impact and novelty.

      Major: Results "Visualization of AdV-C5..." section:

      Why not also look at normal cells that can be synchronized? Cancer cells, such as A549 will by definition be highly heterogenous and at all phases of the cell cycle. Primary non-transformed cells can easily be synchronized by contact inhibition and are much more physiologically relevant. "The virus particles bound..." - Can the spatial resolution of a confocal microscope truly differentiate individual particles that are sub-wavelength in size? What about the sensitivity for single particles? Some sort of experiment to show that single particles can be detected should be performed and shown to assure the readers that this is in fact possible. Furthermore, even when based on the particle to pfu ratio, the MOI would still be nearly 2000pfu/cell, so the actual number of observed particles is an order of magnitude lower than what was applied to the cells.

      Fig. 4 - I am not certain that the observed difference is significant, at least looking at it, beyond the width difference of the peaks, highest expression for both is largely in G1. It would be nice to see this using a western blot of cell cycle sorted cells, which can easily be accomplished using FACS. Page 15, 2nd paragraph. It would be valuable and informative to determine whether there is heterogeneity in histone association with these different vDNAs and whether these histones exhibit divergent modifications (enabling or restricting transcription). Same as above. I am rather surprised that the DBP signal did not correlate well with vDNA signal, particularly for the larger replication centers. How can this be reconciled? Was there an increase in overall vDNA signal later in infection? It is important to know this as it determines whether the observed vDNA signal is real or could be caused by viral RNA or other background causes (non-infected controls notwithstanding). Can the signal be detected with inactivated viruses (via UV for example?)

      Page 18, 1st paragraph. It would be interesting to determine whether there was association between pol II and those genomes that showed no E1A, similarly to the histone suggestion. What about things like viral chromatin organization? Soriano et al. 2019 showed how E1A and E4orf3 work in tandem to alter viral chromatin organization by varying histone loading on the viral genome. Fig. 2. Can you really say that a single dot correlates with a single transcript? Has that been validated in any way?

      Minor:

      Page 5, last paragraph. "Transcirpts from the viral late transcription unit,..." This is not correct as recently shown by Crisostomo et al, 2019.

      Page 10, "... because AdvV-infected cells are less well adherent..." This is not strictly true as loss of attachment only occurs later on in infection. It would be helpful to have statistical significance indicated directly in the figures.

      The very high MOIs used are concerning, could these have negative effects on the cell viability or overall state?

      There are a few typos and such that should be corrected.

      Significance

      As I stated above, the work is interesting and significant, to a degree. The major limitation is that the novelty is low as a paper published in 2017 (cited by the authors) used a very similar approach to investigate a similar problem. In addition, there are multiple other recent papers looking at cell populations in the context of adenovirus infection, and whether a single cell or population based approach is better is unclear. This is something the authors might want to strengthen prior to submission.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      First of all, we thank all reviewers for their constructive suggestions and comments.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This group has been at the forefront recently of using imaging technologies to understand how chromosome segregation is coordinated in mammalian oocytes, and why errors occur. In the current paper they examine the dynamics of microtubule organising centres (which effectively replace centrioles/centrosomes in oocytes) in MI. The imaging of oocytes in this paper is beautiful. The major findings are (1) that MTOCs that are supposed to be at the spindle pole sometimes end up at the spindle equator, and this is documented very beautifully and (2) the correct positioning of MTOCs at the spindle pole appears to require kinetochore microtubules, as indicated by experiments manipulating the kinetochore component NDC80.

      We appreciate the reviewer’s comment and clear description of our study.

      **Major Comments**

      As such the major claims of the paper are basically well supported. However, the analyses are almost entirely restricted to prometaphase/metaphase, and the conclusions are relatively limited. The salient omission is any analysis of MTOC/chromosome relationship during anaphase. Were the paper to be extended to determine whether the lingering of MTOCs at the spindle equator is related to chromosome segregation error, that would increase the reach and importance of the work substantially. Specifically:

      Can tracking experiments be performed to determine whether the chromosome that shows movement similarities to the errant MTOC is more/less likely to missegregate? Complete tracking as these authors are expert at should achieve this, or photo-labelling the desired chromosome.

      Thank you for your comment. In our experimental system, oocytes rarely exhibit chromosome segregation errors (

      Can the position of MTOCs (proportion that linger at the equator) be manipulated in the absence of other defects to determine whether this increases errors (lagging at anaphase, metaphase-II chromosome counting spreads)?

      We agree with the reviewer that a specific manipulation of MTOC positions is exactly what we would need to investigate the significance of central MTOCs. Unfortunately, there are currently no tools available to specifically manipulate MTOC positions without other defects. Therefore, the significance of central MTOCs is currently unclear. In the revised manuscript, we will state these points in Discussion.

      The above analysis would have to be well supported by controls showing that these constructs are having no impact on normal anaphase (proportion of oocytes completing meiosis-I, likelihood of lagging chromosomes etc).

      Thank you for the comment. As we answered above, control oocytes rarely exhibit chromosome segregation errors or lagging chromosomes (

      Related to the above, though I appreciate a fixed metaphase image of MTOC immunofluorescence is presented, the paper is about the dynamics of MTOCs and thus nonetheless relies heavily on the live imaging of cep192. The core results should be confirmed using another (substantially different) MTOC probe. *This final comment applies to the current metaphase data, regardless of whether the study is ultimately extended*

      Thank you for the suggestion. We will confirm the dynamics of MTOCs at metaphase with mEGFP-Cdk5Rap2, another established marker of MTOCs.

      Reviewer #1 (Significance (Required)):

      As explained above, as presented this paper is largely scientifically sound, but far more limited in scope than this group's other recent papers. As explained above, the paper would be made more impactful and the readership broadened if a relationship between MTOC position/movement and segregation problems were established. Or on the other hand if it were established why some MTOCs sometimes linger at the spindle equator. Whilst to my knowledge this is the first time that equator MTOCs have been documented so carefully, oocyte cell biologists may not find the core observation that MTOCs are occasionally at the spindle equator extremely surprising.

      Thank you for your helpful suggestions. Due to the lack of tools to specifically manipulate MTOC positions, we are unfortunately not able to directly address whether MTOC position/movement contributes to chromosome segregation problems. On the other hand, we are currently working to answer your important question ‘why some MTOCs sometimes linger at the spindle equator’. We speculate that MTOCs become central due to unstable kinetochore-microtubule attachments, which are predominantly observed at early metaphase in normal oocytes. To test this idea, we are currently investigating whether the appearance of central MTOCs is prevented by forced stabilization of kinetochore-microtubule attachments with Ndc80-9A. Our pilot analysis thus far supports this idea. In light of your suggestions, we will incorporate the results into the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      I am commenting on the work of Courtois et al. as an expert in the biochemistry of spindle formation with a focus on acentriolar assembly.

      First and foremost, this is a technically excellent study with a number of very interesting and well-documented observations, which are highly relevant for our understanding of the mechanisms of acentriolar spindle formation in the mouse oocyte model. In principle, the manuscript is in a very mature state. However, my major concern at this point would be that there is a break in the story. It starts describing the (very interesting) observation of "central MTOCs". After thoroughly investigating how these behave, the authors stop and look at overall MTOCs distribution after loss of stable MT-kinetochore interactions based on oocytes expressing the Ndc80_9D mutant instead of wt Ndc80. The two parts are experimentally and conceptually not well connected.

      We appreciate your comments on our techniques and novel observations in this study, and thank you for your helpful suggestions.

      Answering the following questions may help to further develop the paper:

      If I understand the arguments correctly, central MTOCs are an "accident" on the way to complete meiosis I spindle formation, which will eventually be corrected and all MTOCs clustered at the poles. Thus, they may serve as an assay for spindle assembly fidelity and kinetics (?). At this point, the reader is left with the observation without efforts to explain the meaning of this observation, ideally experimentally, or at least in a valid discussion.

      Thank you for your thoughtful comment. We agree that we should clearly explain our view on central MTOCs. We indeed speculate that central MTOCs are an “accident” due to unstable kinetochore-microtubule attachments, which are normally pronounced at early metaphase.

      We will revise the manuscript as follows: (1) Following the section for the observation of central MTOCs, we will state our hypothesis that central MTOCs may appear due to unstable kinetochore–microtubule attachments. (2) We will introduce our experiment of the manipulation of kinetochore–microtubule attachment stability as a test for our hypothesis. (3) We will present new results of our analysis for the effects of kinetochore–microtubule attachment stability on the appearance of central MTOCs (please see below).

      Enthusiasm for the technically excellent experiments using the Ndc80 variants is somewhat reduced as conclusions from these experiments are published in the parallel paper of the same laboratory (Yoshida et al.). In my opinion, it may thus be even more important to connect these observations with the first part describing central MTOCs and to clarify their significance.

      Thank you for the important suggestion.

      First, we agree that we should connect our observations of central MTOCs to the phenotypes of Ndc80 manipulations. To do this, we will reanalyze our dataset to quantify the effects of Ndc80 manipulations on central MTOCs. Our pilot analysis thus far suggests that the forced stabilization of kinetochore–microtubule attachments by Ndc80-9A reduces the appearance of central MTOCs. This would support our idea that central MTOCs appear due to unstable kinetochore–microtubule attachments.

      Second, we agree with the reviewer that experimental clarification of the significance of central MTOCs would be valuable. However, as outlined above, we unfortunately have no tool to directly address the significance of MTOC positioning in the fidelity of spindle assembly and chromosome segregation. Although we assume that MTOC positioning is critical for spindle assembly fidelity, as generally thought based on previous studies (Breuer et al., 2010; Clift and Schuh, 2015; Schuh and Ellenberg, 2007), the significance of MTOC positioning in spindle assembly remains uncertain, as you (and also reviewer 1) point out. We will discuss these points in the revised manuscript.

      Shown in Fig. 3B but not fully explained: How does the distribution of what is defined as central MTOCs behave in Ndc80_wt and Ndc80_9A mutant oocytes? Do the variants differ, i.e. are there fewer, or less persistent central MTOCs in the 9A mutant? Would they differ in kinetics of appearance and "rescue" to the poles?

      Thank you for the question. As outlined above, we will reanalyze our dataset to quantify the effects of Ndc80-9A on the behavior of central MTOCs. Our pilot analysis suggests that the forced stabilization of kinetochore–microtubule attachments suppresses the appearance of central MTOCs.

      Similarly: is there a correlation of central MTOC appearance, Ndc80 phosphorylation/stability of kinetochore attachment and Anaphase I onset? The authors mention that oocytes expressing the 9A mutant go faster into Anaphase.

      Thank you for this comment. First, we will investigate whether the levels of Ndc80 phosphorylation at kinetochores correlate with the distance to central MTOCs. Second, we will address whether microtubules connect kinetochores to central MTOCs. Third, we will track chromosomes that showed correlated motions with closely positioned MTOCs until anaphase onset.

      The observation that "central MTOCs exhibited correlated motions with closely positioned kinetochores" is poorly defined, yet an important observation. Does this mean some sort of short k-fibers remain to connect central MTOCs and kinetochores? Wouldn't one expect that the loss of stable end-on-attachment causes MTOCs to become central? How does this fit into a/the model?

      We believe these concerns will be addressed by the experiments/analyses proposed above. First, we will check if central MTOCs are connected to kinetochores by microtubules. Second, we indeed speculate that loss of stable kinetochore-microtubule attachment allows MTOCs to become central. We will test this idea by quantifying the appearance of central MTOCs in Ndc80-9A-expressing oocytes.

      Along the same lines: The authors hype their conclusion that kinetochores dominate meiosis I spindle formation based on the observation that loss of kinetochore functions results in less well-organized spindle poles and worse MTOC "confinement". This may mean that kinetochores, together with MTOCs, maintain stable k-fibers in meiosis, as shown here and in Yoshida et al. When one or the other end of k-fibers is destabilized (loss of end-on-attachment, loss of MTOC attachment), the fibers collapse and the remaining minus-or-plus-end associated structure loses its destination. We then see central MTOCs and/or kinetochores at poles. In this respect, the interpretation / discussion should be less "kinetochore-centered".

      We agree with your thoughtful comment that the regulations of minus-ends (e.g. MTOCs) and of plus-ends (e.g. kinetochores) are equally relevant for spindle bipolarization. We will tone down our kinetochore-centered view in the Abstract and Discussion and revise them into more balanced statements.

      Is there any way to determine the efficiency of Ndc80 knockdown in the respective gene replacement experiment? I share the view of the authors that their method may be more efficient and may explain apparent discrepancies to previous studies on Ndc80-9A (Gui and Homer, 2013) with more dramatic effects on spindle geometry. However, at that point, this remains speculative. For instance, one may also speculate vice versa that the ko strategy used here is less efficient in a maternally dominated system and leaves behind more wt Ndc80, which better compensates defects seen in the 9A mutant.

      Our gene deletion strategy (Zp3-Cre Ndc80f/f) resulted in >90% depletion of the Ndc80 protein (estimated by Western blot; Supplementary Figure 1c in Yoshida et al, Nat Commun 2020). On the other hand, Gui and Homer report that their morpholino-based depletion strategy resulted in 60–70% depletion of the Ndc80 protein (estimated by Western blot; Figure 1B in Gui and Homer, Dev Cell 2013). Thus, the depletion was more efficient in our experimental system. We will add this information in the manuscript.

      Reviewer #2 (Significance (Required)):

      Courtois et al. present data on mechanisms governing spindle assembly in mouse oocytes. Mouse oocytes serve as a model system for spindle formation in the absence of centriole-based MTOCs. At the onset of meiosis I, numerous MTOCs form, which shape a mass ("ball") of MT nucleated around chromatin into a bipolar structure. Accumulating evidence indicates that kinetochores play an important role in acentriolar spindle formation in mouse oocytes, yet the mechanisms behind kinetochore action remain unclear.

      Here, Courtois et al. analyze spindle formation in live mouse oocytes using 3D-time-lapse imaging. They use fluorescently tagged Cep192 to track MTOCs and Histone H2B or CENP-C to visualize chromatin or kinetochores. In the first part, the authors deal with the appearance of "central MTOCs", i.e. aggregates of centrosomal protein(s) that, apparently, fail to remain stably integrated into the spindle pole clusters on MTOCs during spindle formation. The authors convincingly demonstrate that these central MTOCs can be seen in the majority of spindles investigated. They demonstrate that central MTOCs generally come from positions at poles from where they "fall back" towards chromosomes. Central MTOCs may even cross the spindle and end up at the pole opposite to where they originated. Interestingly, central MTOCs are often found next to kinetochores.

      In the second part, the authors focus on the role of kinetochores and their stable MT attachment for spindle formation in general and bipolarity/pole organization in particular. The same lab has published data on the role of kinetochores in meiosis I spindle very recently (Yoshida et al. Nat Comm, 2020). Here, they successfully exploit Ndc80 phospho-mutants to compare MTOC distribution in oocytes with reduced or increased end-on-attachment. The data show that stable end-on attachment determines stable MTOC clustering at spindle poles and governs the maintenance of bipolarity and spindle length.

      Thank you for your clear description of our study.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In order to assemble a bipolar structure, acentrosomal spindles rely on multiple non-centrosomal pathways. Mouse oocytes specifically build bipolar spindles by sorting and clustering of microtubule organizing centers (MTOCs). While microtubule cross-linkers, spindle motors and microtubule nucleators are involved, the role of kinetochores and kinetochore-microtubule attachments in meiotic spindle assembly and maintenance has not been thoroughly tested. Using an impressive combination of live cell imaging and semi-automated image analysis, Courtois et al. quantified MTOC behavior in bipolar mouse oocyte spindles and found an ongoing MTOC sorting in metaphase and instances of MTOC-kinetochore associations. The authors further employed an elegant genetic system to replace NDC80 in maturing oocytes with a mutant almost completely unable to form stable microtubule-kinetochore attachments. The data show lack of MTOC confinement at the spindle poles and increased spindle elongation while maintaining spindle bipolarity. The authors concluded that stable kinetochore-microtubule attachments are required to confine MTOCs at the poles, which in turn sets an optimal spindle length. Overall, the data are of very high quality and clearly presented, the manuscript is easy to follow, and the methods are comprehensively described. One concern is the lack of mechanistic link between the natural metaphase MTOC sorting (Fig. 1-2) and massive MTOC rearrangements observed with the NDC80-9D mutant (Fig. 3). A second concern is that deficient MTOC confinements and spindle elongation observed with the 9D mutant could be due to unaligned chromosomes rather than lack of stable kinetochore-microtubule attachments, which is the authors' interpretation.

      **Major Points:**

      1) Massive MTOC rearrangements (Supplementary Video 6) are reminiscent of spindle assembly defects or spindle collapse. Since these spindles do not reach a normal metaphase and seem to change shape (Supplementary Video 6; 11:10), it is difficult to differentiate between spindle assembly and spindle maintenance defects. Is there a difference in the timing of bipolar spindle assembly for NDC80-9D vs WT? If so, one interpretation is that stable attachments not only ensure MTOC confinement but also contribute to bipolar spindle assembly.

      We apologize for the lack of explanation for the spindle dynamics seen in Supplementary Video 6, 11:10. At this time point, the spindle rotated in 3D, which appeared as if the spindle collapsed in the z-projection movie. We will add this explanation into the legend.

      Our quantitative analysis of spindle shape in 3D indicated no increased collapse in Ndc80-9D, based on the signals of the spindle marker EGFP-Map4. Moreover, we observed no detectable difference in the timing of the onset of bipolar spindle assembly, as long as we define it with EGFP-Map4 signals. These results are shown in Figure 4B.

      2) Fig. 1-2 vs Fig. 3 - It is not clear how the discrete MTOC sorting phenotype presented in Fig. 1-2 relates to the massive MTOC collapse shown in Fig. 3. The natural MTOC sorting and MTOC-kinetochore associations seem to be happening within the bipolar structure confined by the polar MTOCs. The MTOC rearrangements (e.g., Supplementary Video 6) are much more drastic, reminiscent of a spindle collapse. To make a mechanistic link between the phenotypes, it would be useful to use an intermediate NDC80 mutant (e.g., NDC80-4D; Zaytsev et al., 2014 JCB) that may support chromosome alignment and maintenance of the canonical bipolar spindle structure, but still show effects on MTOC sorting.

      Thank you for your nice suggestion. We will test Ndc80-4D. The construct is ready.

      3) Fig. 4 - The authors should provide evidence that unstable kinetochore-microtubule attachments, rather than chromosome-derived signals of misaligned chromosomes (e.g., from Ran or Aurora B), limit spindle elongation. For example, the authors could measure spindle elongation in oocytes with misaligned chromosomes but stable attachments: for example, NDC80-9A oocytes released from an Eg5 inhibition block should carry a number of polar chromosomes with stable attachments. The expectation would be that such spindles form with confined MTOCs and do not elongate as much as NDC80-9D expressing oocytes.

      Thank you for this important suggestion. Following your suggestion, we have conducted a pilot experiment using monastrol washout. Unfortunately, we did not observe increased chromosome misalignment in Ndc80-9A. We will continue to optimize the experimental conditions.

      Moreover, we propose to perform an additional experiment. We will use cohesin depletion with Rec8 TRIM-Away, which will produce chromosome misalignment and reduce kinetochore-microtubule attachment stability. We expect that these oocytes will exhibit excessive spindle elongation. We will then ask whether Ndc80-9A, which would forcibly stabilize kinetochore-microtubule attachment (but fail to align chromosomes due to loss of chromosome cohesion), can suppress excessive spindle elongation.

      These experiments will allow us to address the direct contribution of kinetochore-microtubule attachment to proper spindle elongation. However, in our opinion, regardless of the results, we cannot exclude the possibility that chromosome alignment contributes to proper spindle elongation, which is indeed an intriguing hypothesis. We will discuss these possibilities in the Discussion.

      4) Figure 5D - The authors' model suggests that MTOCs are confined due to their connection to stably attached k-fibers. It would be useful to speculate on the molecular mechanism behind the confinement. Does a maximal k-fiber length restrict the elongation, or is there a pulling force exerted by the kinetochores?

      Thank you for your thoughtful suggestion. As the reviewer suggests, we speculate that the length of k-fibers is critical for restricting MTOC position and spindle elongation. K-fibers may prevent excessive spindle elongation by anchoring MTOCs at their minus ends. Alternatively, k-fibers may act as a platform that inactivates spindle bipolarizers. We will discuss these possibilities in our revised manuscript.

      5) Discussion - Lines 203-204 - "The findings of this study, together with recent studies, suggest a model for how kinetochore-microtubule attachments contribute to acentrosomal spindle assembly (Figure 5D)". - Throughout the paper the authors underscore that bipolar spindles do assemble with the NDC80-9D mutant. The authors should clarify whether spindle assembly is affected by the NDC80-9D mutant or not.

      Thank you for your comment. We agree with the reviewer that we should clearly state our conclusion based on the phenotype of the Ndc80-9D mutant. Our conclusion is that stable kinetochore-microtubule attachment fine-tunes bipolar spindle assembly. If oocytes lack stable attachments, they can form a bipolar-shaped spindle composed of microtubule arrays that are largely bipolar, but the spindle becomes excessively elongated and lacks MTOCs at its poles. We will explicitly state these ideas in our revised manuscript.

      **Minor Points:**

      1) Introduction - Lines 38-44 - The authors should cite the role of the Augmin complex in acentrosomal spindle assembly (Watanabe et al., 2016 Cell Reports).

      Thank you for your excellent suggestion. We will cite this relevant paper.

      2) Results - Lines 55-56 - "However, the precise manipulation of the stability of kinetochore-microtubule attachments has not been tested" - Gui and Homer 2013 studied the outcome of NDC80 depletion and tested the NDC80-9A mutant in the context of oocyte spindle assembly. Although, as the authors point out in the Discussion section, there might be differences in the experimental design that lead to different conclusions, it is not entirely accurate that precise manipulations of attachment stability have not been tested. A different wording (e.g., "has not been comprehensively tested") may be better.

      Thank you for your suggestion. We agree that “has not been comprehensively tested” fits better.

      3) Results - Lines 162-164 - "Ndc80-9D-expressing oocytes had no significant delay in the onset of spindle elongation, but had significantly faster kinetics of elongation compared to Ndc80-WT- and Ndc80-9D-expressing oocytes" - The authors probably meant "... Ndc80-9A expressing oocytes."

      Thank you for pointing out this mistake. We will correct it.

      4) Discussion - Lines 239-242 - "... microtubule nucleation in later stages may not be determined by MTOCs but are largely attributed to nucleation within the spindle, as observed by microtubule plus-end tracking in bipolar-shaped spindles (Supplementary Figure 4)." - Strictly speaking, EB3 comets indicate microtubule polymerization rather than nucleation. Microtubule nucleation within the spindle is, however, supported by studies of the Augmin complex (e.g., Watanabe et al., 2016 Cell Rep).

      Thank you for your comment. We will correct our wording for EB3 comets and discuss that microtubule nucleation within the spindle is shown in Watanabe et al., 2016 Cell Rep.

      5) Discussion - Lines 257-260 - "The lagging MTOCs can be positioned close to kinetochores on bi-oriented chromosomes, underscoring the importance of active error corrections of kinetochore-microtubule attachments during metaphase (Lane and Jones, 2014; Yoshida et al., 2015)." - The reasoning here is not clear. Does the number/persistence of lagging MTOCs correlate with chromosome mis-alignment or with the efficiency/timing of chromosome alignment in WT cells?

      We apologize that our discussion was not clear. Previous studies (Lane and Jones, 2014; Yoshida et al., 2015) show that kinetochore-microtubule attachment errors are found on aligned chromosomes during metaphase and must be corrected before anaphase onset in oocytes. We speculate that lagging (or central) MTOCs may be a source of such kinetochore-microtubule attachment errors, although we cannot directly test this hypothesis due to lack of tools to specifically manipulate MTOC positions. We will discuss these points in the Discussion.

      To check if central MTOCs are correlated with chromosome misalignment, we will perform the tracking of chromosomes that were closely positioned to lagging MTOCs.

      6) Discussion - Line 266 - "Yoshida et al., 2020" - This article is cited elsewhere in the text as "Yoshida et al., in press".

      Thank you for pointing out these mistakes. We will correct them.

      Reviewer #3 (Significance (Required)):

      Courtois et al. have found a new mechanism contributing to acentrosomal spindle assembly in mouse oocytes. Although kinetochore-dependent spindle assembly occurs in mitotic cells (e.g., Toso et al., 2009 JCB), only the recent work from the Kitajima lab (Yoshida et al., 2020 Nat Comm; this manuscript) showed that kinetochores also impact acentrosomal spindle assembly in meiosis. The genetic model presented here brings a significant technical advance in dissecting relative contributions of spindle assembly pathways in mouse oocytes (e.g., Schuh and Ellenberg 2007 Cell; Watanabe et al., 2016 Cell Rep; Drutovic et al., 2020 EMBO J) and complements current methods used to study meiotic error-correction (e.g., Chmatal et al., 2015 Curr Biol, Yoshida et al., 2015 Dev Cell; Vallot et al., 2018 Curr Biol and many others). This model expands an existing toolbox of techniques allowing complete elimination of the endogenous protein specifically in mature mouse oocytes (Clift et al., 2017 Cell; Clift et al., 2018 Nat Protocols), which is a difficult feat due to a limited capacity of ex-vivo culture (Pfender et al., 2015 Nature). Therefore, the work presented in this manuscript may encourage other researchers to establish similar systems for oocyte-specific manipulations, which will allow more precise insight into oocyte biology.

      Expertise keywords: spindle dynamics, chromosome segregation, mitosis, meiosis

      We appreciate your comments. Additional experiments following on your constructive comments will further improve our manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In order to assemble a bipolar structure, acentrosomal spindles rely on multiple non-centrosomal pathways. Mouse oocytes specifically build bipolar spindles by sorting and clustering of microtubule organizing centers (MTOCs). While microtubule cross-linkers, spindle motors and microtubule nucleators are involved, the role of kinetochores and kinetochore-microtubule attachments in meiotic spindle assembly and maintenance has not been thoroughly tested. Using an impressive combination of live cell imaging and semi-automated image analysis, Courtois et al. quantified MTOC behavior in bipolar mouse oocyte spindles and found an ongoing MTOC sorting in metaphase and instances of MTOC-kinetochore associations. The authors further employed an elegant genetic system to replace NDC80 in maturing oocytes with a mutant almost completely unable to form stable microtubule-kinetochore attachments. The data show lack of MTOC confinement at the spindle poles and increased spindle elongation while maintaining spindle bipolarity. The authors concluded that stable kinetochore-microtubule attachments are required to confine MTOCs at the poles, which in turn sets an optimal spindle length. Overall, the data are of very high quality and clearly presented, the manuscript is easy to follow, and the methods are comprehensively described. One concern is the lack of mechanistic link between the natural metaphase MTOC sorting (Fig. 1-2) and massive MTOC rearrangements observed with the NDC80-9D mutant (Fig. 3). A second concern is that deficient MTOC confinements and spindle elongation observed with the 9D mutant could be due to unaligned chromosomes rather than lack of stable kinetochore-microtubule attachments, which is the authors' interpretation.

      Major Points:

      1) Massive MTOC rearrangements (Supplementary Video 6) are reminiscent of spindle assembly defects or spindle collapse. Since these spindles do not reach a normal metaphase and seem to change shape (Supplementary Video 6; 11:10), it is difficult to differentiate between spindle assembly and spindle maintenance defects. Is there a difference in the timing of bipolar spindle assembly for NDC80-9D vs WT? If so, one interpretation is that stable attachments not only ensure MTOC confinement but also contribute to bipolar spindle assembly.

      2) Fig. 1-2 vs Fig. 3 - It is not clear how the discrete MTOC sorting phenotype presented in Fig. 1-2 relates to the massive MTOC collapse shown in Fig. 3. The natural MTOC sorting and MTOC-kinetochore associations seem to be happening within the bipolar structure confined by the polar MTOCs. The MTOC rearrangements (e.g., Supplementary Video 6) are much more drastic, reminiscent of a spindle collapse. To make a mechanistic link between the phenotypes, it would be useful to use an intermediate NDC80 mutant (e.g., NDC80-4D; Zaytsev et al., 2014 JCB) that may support chromosome alignment and maintenance of the canonical bipolar spindle structure, but still show effects on MTOC sorting.

      3) Fig. 4 - The authors should provide evidence that unstable kinetochore-microtubule attachments, rather than chromosome-derived signals of misaligned chromosomes (e.g., from Ran or Aurora B), limit spindle elongation. For example, the authors could measure spindle elongation in oocytes with misaligned chromosomes but stable attachments: for example, NDC80-9A oocytes released from an Eg5 inhibition block should carry a number of polar chromosomes with stable attachments. The expectation would be that such spindles form with confined MTOCs and do not elongate as much as NDC80-9D expressing oocytes.

      4) Figure 5D - The authors' model suggests that MTOCs are confined due to their connection to stably attached k-fibers. It would be useful to speculate on the molecular mechanism behind the confinement. Does a maximal k-fiber length restrict the elongation, or is there a pulling force exerted by the kinetochores?

      5) Discussion - Lines 203-204 - "The findings of this study, together with recent studies, suggest a model for how kinetochore-microtubule attachments contribute to acentrosomal spindle assembly (Figure 5D)". - Throughout the paper the authors underscore that bipolar spindles do assemble with the NDC80-9D mutant. The authors should clarify whether spindle assembly is affected by the NDC80-9D mutant or not.

      Minor Points:

      1) Introduction - Lines 38-44 - The authors should cite the role of the Augmin complex in acentrosomal spindle assembly (Watanabe et al., 2016 Cell Reports).

      2) Results - Lines 55-56 - "However, the precise manipulation of the stability of kinetochore-microtubule attachments has not been tested" - Gui and Homer 2013 studied the outcome of NDC80 depletion and tested the NDC80-9A mutant in the context of oocyte spindle assembly. Although, as the authors point out in the Discussion section, there might be differences in the experimental design that lead to different conclusions, it is not entirely accurate that precise manipulations of attachment stability have not been tested. A different wording (e.g., "has not been comprehensively tested") may be better.

      3) Results - Lines 162-164 - "Ndc80-9D-expressing oocytes had no significant delay in the onset of spindle elongation, but had significantly faster kinetics of elongation compared to Ndc80-WT- and Ndc80-9D-expressing oocytes" - The authors probably meant "... Ndc80-9A expressing oocytes."

      4) Discussion - Lines 239-242 - "... microtubule nucleation in later stages may not be determined by MTOCs but are largely attributed to nucleation within the spindle, as observed by microtubule plus-end tracking in bipolar-shaped spindles (Supplementary Figure 4)." - Strictly speaking, EB3 comets indicate microtubule polymerization rather than nucleation. Microtubule nucleation within the spindle is, however, supported by studies of the Augmin complex (e.g., Watanabe et al., 2016 Cell Rep).

      5) Discussion - Lines 257-260 - "The lagging MTOCs can be positioned close to kinetochores on bi-oriented chromosomes, underscoring the importance of active error corrections of kinetochore-microtubule attachments during metaphase (Lane and Jones, 2014; Yoshida et al., 2015)." - The reasoning here is not clear. Does the number/persistence of lagging MTOCs correlate with chromosome mis-alignment or with the efficiency/timing of chromosome alignment in WT cells?

      6) Discussion - Line 266 - "Yoshida et al., 2020" - This article is cited elsewhere in the text as "Yoshida et al., in press".

      Significance

      Courtois et al. have found a new mechanism contributing to acentrosomal spindle assembly in mouse oocytes. Although kinetochore-dependent spindle assembly occurs in mitotic cells (e.g., Toso et al., 2009 JCB), only the recent work from the Kitajima lab (Yoshida et al., 2020 Nat Comm; this manuscript) showed that kinetochores also impact acentrosomal spindle assembly in meiosis. The genetic model presented here brings a significant technical advance in dissecting relative contributions of spindle assembly pathways in mouse oocytes (e.g., Schuh and Ellenberg 2007 Cell; Watanabe et al., 2016 Cell Rep; Drutovic et al., 2020 EMBO J) and complements current methods used to study meiotic error-correction (e.g., Chmatal et al., 2015 Curr Biol, Yoshida et al., 2015 Dev Cell; Vallot et al., 2018 Curr Biol and many others). This model expands an existing toolbox of techniques allowing complete elimination of the endogenous protein specifically in mature mouse oocytes (Clift et al., 2017 Cell; Clift et al., 2018 Nat Protocols), which is a difficult feat due to a limited capacity of ex-vivo culture (Pfender et al., 2015 Nature). Therefore, the work presented in this manuscript may encourage other researchers to establish similar systems for oocyte-specific manipulations, which will allow more precise insight into oocyte biology.

      Expertise keywords: spindle dynamics, chromosome segregation, mitosis, meiosis

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      I am commenting on the work of Courtois et al. as an expert in the biochemistry of spindle formation with a focus on acentriolar assembly.

      First and foremost, this is a technically excellent study with a number of very interesting and well-documented observations, which are highly relevant for our understanding of the mechanisms of acentriolar spindle formation in the mouse oocyte model. In principle, the manuscript is in a very mature state. However, my major concern at this point would be that there is a break in the story. It starts describing the (very interesting) observation of "central MTOCs". After thoroughly investigating how these behave, the authors stop and look at overall MTOCs distribution after loss of stable MT-kinetochore interactions based on oocytes expressing the Ndc80_9D mutant instead of wt Ndc80. The two parts are experimentally and conceptually not well connected.

      Answering the following questions may help to further develop the paper:

      1. If I understand the arguments correctly, central MTOCs are an "accident" on the way to complete meiosis I spindle formation, which will eventually be corrected and all MTOCs clustered at the poles. Thus, they may serve as an assay for spindle assembly fidelity and kinetics (?). At this point, the reader is left with the observation without efforts to explain the meaning of this observation, ideally experimentally, or at least in a valid discussion.
      2. Enthusiasm for the technically excellent experiments using the Ndc80 variants is somewhat reduced as conclusions from these experiments are published in the parallel paper of the same laboratory (Yoshida et al.). In my opinion, it may thus be even more important to connect these observations with the first part describing central MTOCs and to clarify their significance.
      3. Shown in Fig. 3B but not fully explained: How does the distribution of what is defined as central MTOCs behave in Ndc80_wt and Ndc80_9A mutant oocytes? Do the variants differ, i.e. are there fewer, or less persistent central MTOCs in the 9A mutant? Would they differ in kinetics of appearance and "rescue" to the poles?
      4. Similarly: is there a correlation of central MTOC appearance, Ndc80 phosphorylation/stability of kinetochore attachment and Anaphase I onset? The authors mention that oocytes expressing the 9A mutant go faster into Anaphase.
      5. The observation that "central MTOCs exhibited correlated motions with closely positioned kinetochores" is poorly defined, yet an important observation. Does this mean some sort of short k-fibers remain to connect central MTOCs and kinetochores? Wouldn't one expect that the loss of stable end-on-attachment causes MTOCs to become central? How does this fit into a/the model?
      6. Along the same lines: The authors hype their conclusion that kinetochores dominate meiosis I spindle formation based on the observation that loss of kinetochore functions results in less well-organized spindle poles and worse MTOC "confinement". This may mean that kinetochores, together with MTOCs, maintain stable k-fibers in meiosis, as shown here and in Yoshida et al. When one or the other end of k-fibers is destabilized (loss of end-on-attachment, loss of MTOC attachment), the fibers collapse and the remaining minus-or-plus-end associated structure loses its destination. We then see central MTOCs and/or kinetochores at poles. In this respect, the interpretation / discussion should be less "kinetochore-centered".
      7. Is there any way to determine the efficiency of Ndc80 knockdown in the respective gene replacement experiment? I share the view of the authors that their method may be more efficient and may explain apparent discrepancies to previous studies on Ndc80-9A (Gui and Homer, 2013) with more dramatic effects on spindle geometry. However, at that point, this remains speculative. For instance, one may also speculate vice versa that the ko strategy used here is less efficient in a maternally dominated system and leaves behind more wt Ndc80, which better compensates defects seen in the 9A mutant.

      Significance

      Courtois et al. present data on mechanisms governing spindle assembly in mouse oocytes. Mouse oocytes serve as a model system for spindle formation in the absence of centriole-based MTOCs. At the onset of meiosis I, numerous MTOCs form, which shape a mass ("ball") of MT nucleated around chromatin into a bipolar structure. Accumulating evidence indicates that kinetochores play an important role in acentriolar spindle formation in mouse oocytes, yet the mechanisms behind kinetochore action remain unclear.

      Here, Courtois et al. analyze spindle formation in live mouse oocytes using 3D-time-lapse imaging. They use fluorescently tagged Cep192 to track MTOCs and Histone H2B or CENP-C to visualize chromatin or kinetochores. In the first part, the authors deal with the appearance of "central MTOCs", i.e. aggregates of centrosomal protein(s) that, apparently, fail to remain stably integrated into the spindle pole clusters on MTOCs during spindle formation. The authors convincingly demonstrate that these central MTOCs can be seen in the majority of spindles investigated. They demonstrate that central MTOCs generally come from positions at poles from where they "fall back" towards chromosomes. Central MTOCs may even cross the spindle and end up at the pole opposite to where they originated. Interestingly, central MTOCs are often found next to kinetochores.

      In the second part, the authors focus on the role of kinetochores and their stable MT attachment for spindle formation in general and bipolarity/pole organization in particular. The same lab has published data on the role of kinetochores in meiosis I spindle very recently (Yoshida et al. Nat Comm, 2020). Here, they successfully exploit Ndc80 phospho-mutants to compare MTOC distribution in oocytes with reduced or increased end-on-attachment. The data show that stable end-on attachment determines stable MTOC clustering at spindle poles and governs the maintenance of bipolarity and spindle length.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      This group has been at the forefront recently of using imaging technologies to understand how chromosome segregation is coordinated in mammalian oocytes, and why errors occur. In the current paper they examine the dynamics of microtubule organising centres (which effectively replace centrioles/centrosomes in oocytes) in MI. The imaging of oocytes in this paper is beautiful. The major findings are (1) that MTOCs that are supposed to be at the spindle pole sometimes end up at the spindle equator, and this is documented very beautifully and (2) the correct positioning of MTOCs at the spindle pole appears to require kinetochore microtubules, as indicated by experiments manipulating the kinetochore component NDC80.

      Major Comments

      As such the major claims of the paper are basically well supported. However, the analyses are almost entirely restricted to prometaphase/metaphase, and the conclusions are relatively limited. The salient omission is any analysis of MTOC/chromosome relationship during anaphase. Were the paper to be extended to determine whether the lingering of MTOCs at the spindle equator is related to chromosome segregation error, that would increase the reach and importance of the work substantially. Specifically:

      1. Can tracking experiments be performed to determine whether the chromosome that shows movement similarities to the errant MTOC is more/less likely to missegregate? Complete tracking, at which these authors are expert, should achieve this, as would photo-labelling the desired chromosome.
      2. Can the position of MTOCs (proportion that linger at the equator) be manipulated in the absence of other defects to determine whether this increases errors (lagging at anaphase, metaphase-II chromosome counting spreads)?
      3. The above analysis would have to be well supported by controls showing that these constructs are having no impact on normal anaphase (proportion of oocytes completing meiosis-I, likelihood of lagging chromosomes etc).
      4. Related to the above, though I appreciate a fixed metaphase image of MTOC immunofluorescence is presented, the paper is about the dynamics of MTOCs and thus nonetheless relies heavily on the live imaging of Cep192. The core results should be confirmed using another (substantially different) MTOC probe. This final comment applies to the current metaphase data, regardless of whether the study is ultimately extended.

      Significance

      As explained above, this paper as presented is largely scientifically sound, but far more limited in scope than this group's other recent papers. The paper would be made more impactful and the readership broadened if a relationship between MTOC position/movement and segregation problems were established, or alternatively if it were established why some MTOCs sometimes linger at the spindle equator. Whilst to my knowledge this is the first time that equator MTOCs have been documented so carefully, oocyte cell biologists may not find the core observation that MTOCs are occasionally at the spindle equator extremely surprising.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank Reviewers #1 and #2 for their evaluation of our research and their comments on our manuscript. Their comments are highly appreciated and are addressed as described below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      *Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).*

      Here Ha et al. have further developed their Pumilio RNA tagging methodology for the isolation of UV-crosslinked proteins that are suggested to associate with Xist RNA in mouse embryonic stem cells (mESCs). Within this study the authors claim to have found the Lupus antigen RNA binding protein (La) as a novel Xist interacting partner that influences the efficacy of X-chromosome inactivation (XCI). The authors use a number of different techniques such as qPCR, fluorescent imaging, ATAC-SEQ and SHAPE to show aberration of XCI upon La shRNA knockdown. However, this study has significant flaws in the efficient isolation and validation of Xist associated proteins using their FLAG-out methodology. Furthermore, later experiments predominantly focus on cell death/survival assays, which is somewhat troubling given the essential roles La plays in processes such as cell differentiation and proliferation, ribosome biogenesis, transcriptional control and tRNA maturation. I feel the authors need to robustly address the potential effects La knockdown may be having on their mESCs.

      Reviewer #1 did not fully understand the basic designs of the experimental systems (FLAG-out and iXist), and completely rejected these experimental systems. Reviewer #1 also ignored the majority of the functional analysis on the candidate protein, Ssb. These issues cannot be addressed by additional experiments.

      **Major comments:**

      *-Are the key conclusions convincing?*

      My major concern is in their Xist RNA purification.

      First of all, I couldn't find any data proving the enrichment of Xist RNA itself in their Pumilio pull-down experiment. It would have been useful to show Xist RNA enrichment before the benzonase step. Secondly, it is hard to imagine that the protocol would successfully isolate Xist RNA-protein complexes from the cell. An earlier report by Clemson et al., (J Cell Biol., 1996) has shown that the majority of Xist RNA is still stuck in the nucleus after a nuclear matrix prep protocol using detergent, which is not so different from the authors' protocol. Moreover, the authors used UV crosslinking, which would have made it even harder to purify Xist RNA without sonication. Thirdly, as the tag is located at the 5' end of Xist RNA, it is rather surprising that Spen is not detected in their pulldown. Spen is one of the main functional interactors with Xist, robustly detected by several previous reports. Similarly, other high-affinity binders of Xist such as hnRNP-K and Ciz1 were also lacking from this screen. Finally, the peptide counts found associated with FLAG-out Xist are extremely low in comparison with other data using glutaraldehyde or formaldehyde crosslinking. For example, HnRNP-M found in Chu et al 2015 has 1120 peptide counts in differentiated cells. The authors here use HnRNP-M as a baseline for specific interactions and show a total of 6 peptide counts in Xist expressing cells and 5 in i-Empty cells (Supplementary excel sheet 1). Similarly, the La protein of interest in this study has 8 counts in i-FLAG-Xist and 6 counts in i-Empty. I struggle to see how this result indicates specific Xist binding. Worryingly, this is the starting rationale for the rest of their experiments, so it is hard to accept the rest of their conclusions either.

      We have detected Xist RNA after Pumilio pull-down, and added the data in the revised manuscript (Figure S1). The enrichment of Xist RNA by Pumilio pull-down is about 75-fold, comparable to the enrichment reported by Minajigi et al.

      Two out of three previous studies used similar protocols to prepare cell lysates for co-IP, including UV cross-linking and detergent (McHugh et al. 2015 and Minajigi et al. 2015). The major difference between their protocols and ours is the co-IP step. They used antisense oligos to pull down the Xist RNA-protein complex, while we take advantage of the specific interaction between PUF and PBS to pull down the Xist RNA-protein complex. With the data in Figure S1, we are confident that our strategy is successful in isolating Xist RNA.

      For systematic identification of Xist binding proteins, each method has its own strengths and weaknesses. As we described in the introduction, only 4 proteins were commonly identified by all three previous studies that systematically identified Xist binding proteins. There is no doubt that our method also missed some authentic Xist binding proteins (false negatives) and identified some false positive candidates. Thus, we have to be careful in balancing false negative and false positive calls. We applied the ranking gain to identify Xist binding protein candidates in order to minimize the false negative rate. Meanwhile, we compared our Xist binding protein candidate list with previously identified Xist-binding proteins to enhance the confidence in our candidate lists.

      Regardless of the strengths and weaknesses of our method, Ssb is also an Xist-binding protein identified by another study (Chu et al. 2015). More importantly, we have provided experimental validation to confirm Ssb’s involvement in XCI and extensive functional analysis to reveal the protein’s mechanistic role in XCI.

      The other key conclusion the authors make is from the use of numerous cell death/survival assays for both male and female cell lines. This is extremely troubling in the context of assessing their target protein La. La is involved in multiple RNA maturation events of rRNAs, tRNAs and other polIII transcripts. Furthermore, La has been implicated in binding to the mRNA for Cyclin D1 in both human cells and mouse fibroblasts (NIH/3T3 - male), which show a significant effect on cell proliferation upon siRNA knockdown (https://www.nature.com/articles/onc2010425). This, along with the observation that La knock-out blastocysts fail to yield any mice or ES cell lines (male or female), suggests that the effect observed in the authors' results is most likely not X-linked cell death (https://mcb.asm.org/content/mcb/26/4/1445.full.pdf). The authors need to show that their shRNA KD isn't affecting the proliferation and general fitness of their mESC lines.

      The cell death/survival assay was specially designed for analyzing the defect of XCI. The cell death of iXist ESCs upon adding Dox is due to the induction of Xist, which consequently initiates the silencing of the only X chromosome in male cells. Knockdown of genes involved in XCI compromises XCI, thus allowing cell survival. Given the diverse functions of Ssb in cell differentiation and proliferation, ribosome biogenesis, transcriptional control and tRNA maturation, one would expect slow growth and/or cell death of Ssb knockdown cells. Indeed, the result is consistent with our expectation (Figure 2C, without Dox). Nevertheless, more Ssb knockdown cells survive in the presence of Dox, compared with control cells (Figure 2C-E, with Dox), suggesting that Ssb plays an important role in XCI.

      *- Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?*

      As discussed above, I feel the authors have not clearly demonstrated Xist specific protein enrichment and haven't proven X-linked cell death. Due to the lack of necessary control experiments as discussed below, I feel the notion that La is involved directly in XCI as an RNA chaperone is currently preliminary/speculative.

      The FLAG-out experiment just provided an initial point for the study. We have demonstrated the interaction between Xist and Ssb by RIP. Moreover, Ssb knockdown antagonizes the lethal effect of induced XCI in male cells, allowing more cells to survive. This is contrary to what would be expected from the diverse house-keeping functions of Ssb, whose loss should lead to slow proliferation or cell death. Therefore, the data here (Figure 2C-E) suggest a role of Ssb in XCI. In addition, we showed that knockdown of Ssb compromises the silencing of X-linked genes (Figure 2F, 2G, and 3E), the compaction of the X chromosome (Figure 3D), Xist cloud formation (Figure 4), epigenetic modifications on Xi (Figure 5), Xist RNA folding (Figure 6F-I), and Xist RNA stability (Figure 7C and D). All these data indicate that Ssb is involved in XCI by regulating Xist RNA folding.

      *- Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.*

      I would suggest that they show RT-qPCR results of Xist RNA enrichment from the sample after FLAG IP, before benzonase treatment.

      We have these data and have added them to Figure S1.

      Also, it would have been more convincing if their negative control construct (i-Empty) would contain 25 copies of PBSb RNA at least.

      This is a good alternative design for the negative control. Using i-Empty expressing 25 copies of PBSb RNA would allow us to subtract the background caused by proteins binding to PBSb RNA. Yet, as discussed above, regardless of how we improve the experimental setting, we cannot completely avoid the issue of false positives and false negatives. Our goal for the FLAG-out experiment is to generate a list of Xist binding protein candidates, whose binding to Xist and functions in XCI should be validated by additional experiments. With our current experimental setting, a list of Xist binding protein candidates has been generated, and we have validated the role of Ssb in XCI with subsequent experiments.

      In Fig1b, the total amount of proteins loaded on the gel is not equivalent between two lanes. The gel should show equivalent amounts of proteins on the gel. It looks like if the negative control sample had been loaded at the same amount as the one with Xist, the band pattern wouldn't be distinguishable between the two samples. Furthermore, as these samples were used in the following mass spectrometry screen it may suggest that the minimal increase in peptide counts observed in the iXist FLAG-out were due to an increased amount of sample being loaded? No controls are conducted to account for this.

      IP samples of i-Empty and i-FLAG-Xist were loaded on the gel in Figure 1b. It is expected that the IP sample of i-FLAG-Xist should pull down more proteins than the IP sample of i-Empty. The FLAG-PUFb bands (the strongest band in each lane) are of about the same intensity in the two samples, indicating roughly equal loading. After normalization of gel loading according to the FLAG-PUFb bands, the upper part of the i-FLAG-Xist lane showed some unique bands.

      For mass spectrometry analysis, the two samples are loaded independently; therefore, comparing the absolute amount of each protein between the two samples does not always provide valuable information. The relative amounts of different proteins within one sample, however, are not affected by the loading amount and are thus more informative. Therefore, we used the ranking information to estimate the relative amount of different proteins in each sample and used the ranking gain to further identify protein candidates.
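
      The ranking-gain reasoning above can be illustrated with a minimal sketch. It assumes that a protein's rank is its position when peptide counts are sorted within one sample, and that the ranking gain is the control rank minus the Xist-sample rank; the exact definition used in the manuscript is not stated here, so both are assumptions. The counts for Ssb (8 vs 6) and HnRNP-M (6 vs 5) are those quoted by the reviewer; `ProteinA` and `ProteinB` are hypothetical placeholders.

```python
def rank_by_count(counts):
    """Rank proteins within one sample (1 = most peptides).

    Ties share the best rank: rank = 1 + number of proteins with a
    strictly higher peptide count. Only within-sample order matters,
    so the result does not depend on how much sample was loaded.
    """
    values = list(counts.values())
    return {p: 1 + sum(1 for v in values if v > n) for p, n in counts.items()}


def ranking_gain(sample_counts, control_counts):
    """Assumed definition: control rank minus sample rank.

    A positive gain means the protein ranks higher in the Xist
    pull-down than in the control pull-down.
    """
    sample_rank = rank_by_count(sample_counts)
    control_rank = rank_by_count(control_counts)
    shared = sample_rank.keys() & control_rank.keys()
    return {p: control_rank[p] - sample_rank[p] for p in shared}


# Ssb and Hnrnpm counts are quoted in the review; ProteinA/B are hypothetical.
i_flag_xist = {"Ssb": 8, "Hnrnpm": 6, "ProteinA": 20, "ProteinB": 7}
i_empty = {"Ssb": 6, "Hnrnpm": 5, "ProteinA": 20, "ProteinB": 9}

gains = ranking_gain(i_flag_xist, i_empty)
for protein, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(protein, gain)
```

      With these illustrative numbers a protein can gain rank in the Xist sample even when its absolute count changes only slightly, which is why a rank-based comparison is insensitive to loading but still depends on the composition of the rest of the sample.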

      The authors quantify cell death in figures 2C - E. It seems clear that shSsb 1 and 2 have an effect on cell count even in the absence of Dox. The rescue effect seen upon Dox addition is minimal when compared to Empty + Dox (Figure 2D). The authors' ∆A-iXist line with and without Ssb KD/Dox would be an informative control on whether the increase in cell survival that they see is X-linked.

      As the reviewer pointed out earlier, Ssb plays multiple roles in cellular processes. Inevitably, KD of Ssb leads to slow growth and/or cell death with or without Dox. Thus, it is less meaningful to compare the surviving cell counts in Figure 2D. Rather, the survival rate (Figure 2E) reflects the rescuing effect more precisely. As shown in Figure 2E, both shSsb 1 and 2 increase the survival rate significantly compared with the Empty control.
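
      The distinction between surviving cell counts and survival rate can be sketched as follows. This assumes the survival rate is the stained colony area with Dox divided by the area without Dox for the same cell line; the manuscript's exact definition for Figure 2E is not given here, and all areas below are hypothetical numbers chosen only for illustration.

```python
def survival_rate(area_with_dox, area_without_dox):
    """Assumed definition: fraction of colony area retained after Dox.

    Normalizing to the same line without Dox removes the general
    growth defect of a knockdown from the comparison.
    """
    if area_without_dox <= 0:
        raise ValueError("area without Dox must be positive")
    return area_with_dox / area_without_dox


# Hypothetical colony areas in cm^2: (without Dox, with Dox).
areas = {
    "Empty": (0.30, 0.03),
    "shSsb": (0.10, 0.03),
}

rates = {line: survival_rate(dox, no_dox) for line, (no_dox, dox) in areas.items()}
for line, rate in rates.items():
    print(f"{line}: {rate:.2f}")
```

      In this illustration both lines have the same absolute area with Dox, yet the knockdown line retains a larger fraction of its own Dox-free area, which is the kind of difference a normalized rate exposes and a raw count hides.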

      Moreover, the data in Figure 3B and C demonstrate that Ssb KD compromises the survival of female differentiating cells, but not the survival of male differentiating cells, also indicating a role of Ssb in XCI. These experiments should be sufficient to conclude that Ssb KD affects X-linked cell death/survival in both iXist male ESCs and WT female differentiating cells.

      The qPCR results used to validate silencing defects show minor changes in expression and also don't show significant silencing of X-linked genes sufficient for cell death. Could this be because only ~ 50 - 60% of Male iXist cells seem to be expressing in the movies and that this will have an effect on the observed qPCR results? Furthermore, it seems counterintuitive that expression in the Empty male cells increases in 48h compared to 14h. Is this due to cell death and positive selection of cells less able to silence their X-chromosome? How would these data look in the female XX line? How would the data look in a ∆A-iXist line in the presence and absence of shSsb/Dox?

      First, high-quality live-cell imaging can only be carried out for 2 hours at 2-min time intervals. The movies are meant to show the onset of Xist RNA signals. Therefore, they were taken one hour after Dox treatment (figure legend of Figure 4B-D). After overnight Dox treatment, Xist clouds can be seen in the majority of cells.

      Second, in Fig. 2F-G, we did not include uninduced iXist male ESCs. Therefore, it is impossible to judge whether induction of Xist in this male ESC line results in Xist-dependent silencing at 14 and 48 hr. However, in our previous publication (Li et al., JMB, 2018, 430: 2734-2746), it has been shown that Gpc4, Hprt, Mecp2, G418, and TomatoRed are silenced (4- to 16-fold reduction) at 24 and 48 hours after Dox induction.

      Third, the qRT-PCR results in 14 h and in 48 h are not normalized to the same internal control. Thus, they are not directly comparable.

      Confusingly, the male line in Fig 3C shows a drop in live cell count at day 6 of differentiation? Surely given their previous results in Fig 2 the Ssb KD should increase cell viability with +Dox? Ssb KD seems to have an adverse effect on ES cells during extended differentiation protocols. In Figure S1 the authors show ~ 8 - 10% survival of male lines during differentiation. Could the recombination of the Xist sequence around the loxP sites enable the cells to outcompete the dead cells? How would iEmpty and ∆A-iXist cells compare here? Have the differentiated cells been tested for their expression of Xist? Additionally, how are there similar live cell counts for male vs female lines when ~90% of male cells die during differentiation? Were more cells plated at day 4? If so, this would bias the competition of male cell survival and therefore make the male line an inappropriate control.

      Given the essential role of La during development a control is needed to prove that this death is X-linked in the female 3F1 line. For example, an XO cell line retaining the Cast allele and shSsb expression could show the amount of death caused from shSsb alone independent of X-linked cell death.

      The reviewer completely misunderstood the experiment. The severe cell death specifically observed in female differentiating ESCs is strong evidence that Ssb is involved in XCI (Figure 3).

      The male ESCs in Figure 3C are a WT ESC line without the inducible Xist transgene, in which no XCI occurs upon differentiation. This line is completely different from iXist male ESCs with Dox, in which forced Xist induction leads to XCI. Thus, the diverse functions of Ssb might contribute to the slight decrease in live cell count of wild type male cells at day 6 of differentiation.

      Figure S2 shows the differentiation of iXist male ESCs with or without Dox. As explained above, forced Xist induction silences the only X chromosome in male cells, resulting in cell death. In addition, XCI occurs more efficiently under differentiation conditions (Figure S2) than in the pluripotent state (Figure 2C).

      During differentiation, female ESCs silence one X chromosome, and the other X chromosome remains active. KD of Ssb compromises XCI, so both X chromosomes remain active in some female differentiating cells, leading to cell death. The reviewer is correct that we need a control to rule out that the essential role of Ssb during development affects cell survival and death. An XO cell line can be used as a control. Similarly, a male cell line (XY) is also a good control. We already included a male cell line as a control in Figure 3B and 3C.

      If I understood correctly, the RNA FISH used dsDNA probes ("Sx9") against 40 kb of the X-inactivation centre (Xic). Surely Tsix or other Xic transcripts will also be visible? Can the authors use their RNA FISH to determine the XX or XO status of their cells? In Figure S5 a number of cells appear to show a single pinpoint of transcription. This could either be low levels of Xist transcripts or Xic transcription from an XO line in which the 129 chromosome is missing. It would be best to quantify solely cells which have two X chromosomes, and if a significant number of X chromosomes have been kicked out, this should be discussed and controlled for.

      This is a valid concern, but this concern can be adequately addressed with the available data in the manuscript.

      First, if the female Ssb KD cell line is an “XO” cell line, in which the X129 allele is “kicked out”, the RNA allelotyping results should show an absolute “silencing” of the X129 allele. However, in complete contrast to this notion, RNA allelotyping detected “more” RNA transcripts from X129, showing the chromosome-wide XCI defects (Figure 3D).

      Second, overexpression of Ssb in Ssb KD female cells restores the Xist clouds and the polycomb marks (Figure S8), suggesting that the Ssb KD female cells are XX, but not XO.

      Third, the severe cell death occurring specifically in female Ssb KD lines also argues against the “XO” explanation (Figure 3B&C).

      In Fig6, the authors generated a number of Ssb constructs for a rescue assay. However, these results complicate the matter and raise more questions than they address. It seems odd that the ∆RRM1 does not rescue based on comparison with their putative negative control, ∆NLS. However, the ∆RRM1 + 2 and ∆LAM do rescue the phenotype better than the full length Ssb? This makes no logical sense and highlights the inherent variation in cell viability these generated cell lines seem to show.

      Following on from this, figure S7 quantifies the GFP tag mRNA levels, depicting all ∆RRM mutants with expression below ~30%? How can ∆RRM1 or 2 be rescuing in this scenario? Have these lines been tested for their XX or XO status? The loss of an X chromosome would lead to a rescue of the cell death phenotype, which is a process known to occur in XX lines that have been cultured for extended periods of time. Could it also be that the cell lines derived are more or less sensitive to exogenous shRNA expression? Also, further validation is needed to assess the efficiency of KD in these lines as theoretically most of these constructs will be targeted by shRNA? What is the endogenous Ssb expression level in these lines? Where in the mRNA sequence are the shRNAs targeted to? Does this make sense on the relative expression levels of ∆RRM1/2 for example? GFP expression could also be assessed by quantitative western blot of GFP or even visualised in their RNA FISH/IF samples (Figure S8); currently neither is shown. In addition, the stability of each Ssb protein construct has not been demonstrated.

      Our shRNA targets the LAM domain, so the expression of ∆LAM is not affected by the shRNA. The reviewer is correct that the detected GFP expression levels of ∆RRM1 and ∆RRM2 are too low to be conclusive. We have removed the data points for ∆RRM1 and ∆RRM2. Meanwhile, it is clear that ∆RRM1&2 has a better rescuing effect than ∆NLS when ∆RRM1&2 and ∆NLS are expressed at similar levels. Ssb is a well-known RNA chaperone/RNA helicase. Identifying Ssb as an Xist-binding protein already suggests a functional role of Ssb in XCI. The data from the plasmid rescue experiments further suggest that Ssb is involved in XCI as an RNA chaperone/RNA helicase.

      As for the Western blot and GFP fluorescence (IF), we have tried both. Neither of them detected GFP signal, reflecting the low expression level of these GFP fusion proteins. As the reviewer pointed out, the shSsb does not target the 5’- or 3’-UTR region and therefore interferes with the exogenous Ssb as well. This might be a reason for the low expression of these GFP fusion proteins.

      For the data shown in Figure 7A and B the authors quantify the % of cells with Xist signal. The authors have already shown a defect in Xist visualisation in Ssb KD. Surely it is plausible to assume a faster loss of Xist signal below background in weaker expressing cells. A more appropriate quantification would be the % loss of Xist signal per cell over time.

      With Figure 7C and D, the samples have been treated with actinomycin D which globally affects the transcription of cells even the PolIII associated genes Ssb is needed to mature. This treatment could have an added effect on cell mortality and function. Data confirming that actinomycin D doesn't affect the cells disproportionately is needed. The difference in half-life could be attributed to such a treatment.

      We agree with the reviewer that monitoring Xist signal loss per cell would be a better way to analyze the data. However, in the Xist signal loss experiment, snapshot images were taken at four time points (1h, 2h, 3h and 4h). This is not time-lapse imaging. High-quality time-lapse imaging can only be done within a 2-hour time period at 2-min time intervals. Therefore, cell tracking cannot be done in this experiment. In addition, even though Ssb KD slows down the formation of Xist clouds within the early phase (3 hours) of Xist induction (Figure 4), prolonged (overnight) Xist induction leads to Xist cloud formation in a significant fraction of Ssb KD cells, and the Xist cloud signals are about the same in WT and Ssb KD cells (Figure 7A, 0 h). Similarly, qRT-PCR also revealed that Xist RNA is at the same level in WT and Ssb KD cells (Figure 7C, 0 h). These data argue against the possibility that the faster loss of Xist signal in Ssb KD cells is due to a weaker initial Xist signal.

      Actinomycin D was added for the last 11 hours of the experiment. During this period, no obvious adverse effects on cells were observed.

      In summarising, the authors claim that La binds Xist to facilitate folding and appropriate spreading of Xist along the X-chromosome. No direct interaction has been shown; CLIP-seq data would resolve this, however I do understand this is a challenging technique. The authors have instead opted for RIP followed by qPCR (Figure S2). However, this process has a greater potential for non-specific recovery of RNAs via indirect binding. Furthermore, qPCR may also amplify the relative abundance of the RNA detected. As multiple nucleolar proteins came down in the mass spec screen and FLAG-Ssb is being overexpressed, it is plausible to assume some transient Xist interactions may arise from nucleolar association at which La will be in high abundance. Positive and negative nuclear RNA controls (e.g. 7SK and U1 snRNA respectively) could be used to determine the amount of non-specific protein-RNA interactions in their RIP pull downs. Cytoplasmic actin is not an appropriate control as it is cytosolic.

      We have to clarify one point that the mass spec screen analyzed samples pulled down by FLAG-PUFb, but not FLAG-Ssb.

      We did not intend to distinguish whether Ssb directly binds Xist or is just associated with Xist. RIP followed by qPCR is sufficient to prove the association between Ssb and Xist RNA.

      We can include nuclear RNAs as controls, if the reviewer regards RIP as a valid method to show protein and RNA association.

      Other than this, the authors may want to probe (via IF) for the presence of La accumulation on the X? Many other known factors such as Ciz1, hnRNP-K and PRC1/2 complexes show clear accumulation on the X. If I understand correctly, there are many La antibodies on the market and endogenous levels on the X could be assessed. These antibodies may be useful in IPs and pull downs also.

      Many XCI factors play extensive roles in the cell and are not clearly enriched on Xi, including Spen (Moindrot et al. 2015). We have tried the immunostaining and did not detect Ssb’s enrichment on Xi. Ssb shows a general distribution in the nucleus without a clear enrichment on Xi (data not shown).

      *-Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      The experiments suggested above are centrally focussed on the cell lines that are currently in the authors' possession, with the possible exception of the ∆A-iXist-shSsb line suggested. However, this should be reasonably quick to obtain given their previous work for this paper. Most experiments suggested will focus on the validation of karyotype, Xist expression, rescue construct expression, further RNA FISH classification and repeating a number of experiments with more appropriate positive and negative controls. In theory this can be obtained relatively simply and quickly from current resources. But with the sheer volume of further experiments that are required here, this may take a significant amount of time.

      One vital improvement needed is the replication of mass spec data and the validation of Xist specific recovery and protein enrichment. As it stands this manuscript seems to not have any replicates of the FLAG-out methodology and mass spec data. This is troubling given the poor recovery and specificity of the protein samples obtained. Repeating these experiments would be costly in time and also financially. As it stands, I feel this is essential to conclusively validate their target of interest.

      *- Are the data and the methods presented in such a way that they can be reproduced?*

      The data are presented relatively well; however, it would be beneficial if detailed methods were in the main text and not in a supplementary file. Similarly, more information about the process of differentiation and how cell death/survival was quantified and validated is needed.

      The reviewer rejected the basic design of the experimental system and ignored the majority of the functional analysis data. No additional experiment can address these issues.

      We can include more information in the main text regarding Ssb. However, space in the main text is limited and varies depending on the journal. Meanwhile, the current citations on Ssb are adequate to emphasize that Ssb is a versatile RNA binding protein involved in a variety of fundamental RNA processing events in the cell.

      *- Are the experiments adequately replicated and statistical analysis adequate?*

      For the most part yes; however, there seem to be no replicates of the FLAG-out mass spec screen, which is worrying given the minimal specificity observed in the current data.

      As we mentioned above, the FLAG-out experiment only serves as a starting point to generate a list of Xist binding protein candidates. Rather than repeating the FLAG-out experiment, we compared the result of FLAG-out to previously published lists of Xist binding protein candidates. More importantly, additional experiments are carried out to validate the Xist binding proteins identified by FLAG-out.

      **Minor comments:**

      *- Specific experimental issues that are easily addressable.*

      Unfortunately, the majority of experimental issues need to be addressed with more robust data which are highlighted above. However, some image analysis, quantification and classification can be amended relatively easily. For example, the live-cell imaging data should be quantified as loss of signal as discussed and RNA FISH should be used to classify XX positive cells and the XO cells can be discarded from analysis.

      We have addressed these issues in the previous sections of this rebuttal.

      *- Are prior studies referenced appropriately?*

      Most papers regarding Xist pull down and biology are discussed and referenced appropriately. However, the role that La plays during development and its aberrant effects upon KD are seemingly downplayed. I would like to see more discussion of potential defects that could be caused by globally altering cellular RNA folding.

      We have tried to cite key references about Ssb in development and RNA folding. Due to length limitations, we cannot cite all references on the topic. If necessary, we could discuss the possibility of an indirect effect of Ssb KD on XCI through globally altering cellular RNA folding.

      *- Are the text and figures clear and accurate?*

      For the most part, the figures are clear and accurate, apart from the following exceptions.

      1. The Y-axis of Figure 2D is confusing. What does 0.3 as a "sum of area" equate to? 30% of the area was ES cells? This doesn't look to be the case from Fig 2C. Also, how does the intensity of the signal compare? The area may not be a good quantification due to ES cells growing in colonies.

      We have revised the Y-axis labelling of Figure 2D to “sum of area cm2”. Thus, “0.3” means that the area of ESCs is 0.3 cm². ALPP is highly expressed on the ES cell surface. ALPP staining usually produces saturated stains on ES cell colonies. Thoroughly stained ES cell colonies, big and small, show similar signal intensity levels. Analyzing the “total signal intensity” would therefore not be much different from analyzing the “sum of area”.

      2. In the Movies S1-7 there are boxes around certain cells and marked with "Figure 5a - c". This seems to be incorrect as figure 5 is currently the IF staining of polycomb marks. I assume this is in relation to Figure 4b-d?

      We have corrected the labelling mistakes.

      3. Similarly, in Movies S1-7, the intensities of Xist foci seem by eye to be similar. In the paper it is claimed that the Xist clouds that do form are lower in intensity. Are the Movies depicting the same range of pixel intensities? If not, this should be amended. Similarly, figure 7 seems to show relatively equivalent RNA signal at 0 h?

      All the images were collected using fixed microscope and camera settings, and these movies depict the same range of pixel intensities. Movies S1-S3 are WT controls, and Movies S4-S7 are Ssb KD cells. The Xist cloud signals are weaker in Movies S4-S7 (also quantified in Figure 4E). For the Xist cloud signal, not only the intensity but also the area of the Xist cloud has to be taken into account.

      The 0 h time point in Figure 7 is after overnight Dox treatment and differs from the time points in Movies S1-7 (maximum 3 hours of Dox treatment; see the legend of Figure 4B-D). The discrepancy can be explained by the fact that knockdown of Ssb only slows down the formation of Xist clouds. After overnight forced expression, the Xist RNA still accumulates in the cells. Figure 7 shows that the Xist RNA accumulated after prolonged Dox treatment disappears faster after Dox withdrawal.

      4. In Figure 4A the data is from female XX cells; this should be highlighted to limit confusion with the male iXist data shown below in 4B-E. It would also be helpful to have the male/female icons (as in Figure 3B) for each figure that has images of cells. Currently Figures 4, 5, 7, S5 and S8 are lacking these icons.

      We have revised the labelling on Figures 3, 4, 5, 7, S6 and S9 (S5 and S8 before revision).

      5. No explanation of the Flag-Ssb expression is given for Figure S2. Furthermore, is it really necessary to express Flag-Ssb? There are reasonably good antibodies out there for Ssb, as this was how it was originally found in Systemic Lupus patients. Also, no data showing the amount of Ssb being overexpressed is shown. This may have big implications for the validity of the RIP-qPCR analysis.

      We could perform qRT-PCR to quantify the overexpression level of Flag-Ssb. If required, we could use Ssb antibody to do Western blot to show the amount of Flag-Ssb protein.

      *- Do you have suggestions that would help the authors improve the presentation of their data and conclusions?*

      Most of the data is presented reasonably well, but the limited robustness of the data somewhat detracts from their conclusions. I feel the certainty of their conclusion regarding Xist specific La binding and RNA chaperone activity is still presumptive and should be rewritten unless more robust data can confirm Xist interaction. I would also suggest deciding on the nomenclature for the protein of interest and using either La or Ssb; the continued use of both throughout the figures and text can get a little confusing to the reader.

      In the current literature, Ssb seems to be commonly used as the gene name and La as the protein name. We have revised the manuscript to use one name, “Ssb”, for both the gene and the protein.

      Reviewer #1 (Significance (Required)):

      *- Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.*

      It was a good trial to use the PBSb-PUFb system to purify Xist RNA binding proteins, whereas previous reports used antisense oligo purification with sequences complementary to Xist RNA. But currently the purification still needs further validation and repeats to confirm its use. A potential complementary technique could be to isolate Xist directly by using biotinylated probes against the PBSb sequence.

      The authors further claim the identification of a novel Xist RNA chaperone (La/Ssb) which they say facilitates XCI progression. This would be a novel finding in the field; however, the data is currently not robust enough to support this.

      *- Place the work in the context of the existing literature (provide references, where appropriate).*

      This work has focused on the development of a milder methodology for purifying Xist RNA during XCI. Others have published similar methodologies, predominantly focusing on purifying Xist RNA directly with biotinylated probes (McHugh et al. 2015; Minajigi et al. 2015; and Chu et al. 2015). Although this method boasts a milder purification, it seems to be low yielding in Xist specific proteins. Others have shown a more robust identification of bona fide Xist binding proteins, which are currently missing in this manuscript. A recent preprint from the Plath lab has identified new factors involved in XCI during differentiation, and their tethering/rescue experiments are far more convincing than the ones shown in this manuscript https://www.biorxiv.org/content/10.1101/2020.03.09.979369v1. The candidate protein Ha et al. have identified has multiple roles in developing cells and has been shown to be important during mouse development. However, Ha et al. do not robustly show that the knockdown of Ssb causes X-linked cell mortality. Alternatively, as would be presumed from Ssb's essential role in many housekeeping short non-coding RNAs, the cell death seems more ubiquitous upon shRNA KD. Therefore, the link the authors are making here is relatively weak.

      Ssb KD rescues cell death caused by forced induction of Xist in male ESCs. In addition, Ssb KD leads to cell death in differentiating female ESCs, while it has a negligible effect on cell death in differentiating male ESCs. These data clearly demonstrated X-linked cell survival/mortality by Ssb KD.

      The Plath lab’s work is different from ours. In their manuscript, the authors report the observation of a protein condensate which is assembled by Xist but persists in the absence of Xist. TDP-43 (a.k.a. Tardbp) happens to be one protein factor involved in the condensate and also one candidate protein selected for further validation in our study. In our study, Tardbp KD did not rescue cell death caused by induced XCI in male cells. Thus, Tardbp was not studied further. In the manuscript, we have discussed the possibility that low efficiency of knockdown and redundancy might contribute to the failure in validation of Tardbp.

      *- State what audience might be interested in and influenced by the reported findings.*

      The audience may be interested in the novel technique and the finding of a novel Xist binding protein.

      *- Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.*

      RNA biochemistry and developmental biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This manuscript describes a novel "FLAG-out" system, where the authors sought to identify Xist RNA binding proteins. The authors focused on a specific protein found in their screen and also identified in several other screens for Xist RNA binding proteins, Ssb/La, and further characterize the role of this protein in XCI. This manuscript describes the loss of Ssb/La and suggests that it predominantly impacts the canonical 'cloud' formation of Xist RNA on the X chromosome during XCI initiation. Further, they determine that loss of Ssb/La decreases Xist RNA half-life and alters folding of Xist RNA transcripts. Based on their findings, the authors propose that Ssb/La functions to directly bind and fold Xist RNA transcripts in a manner that stabilizes Xist RNA, allowing for proper 'cloud' formation and successful initiation of XCI.

      **Major comments:**

      The authors made an interesting finding that the SLE-relevant autoantigen Ssb/La stabilizes Xist RNA transcripts, and there is some evidence that this occurs by binding and maintaining proper folding of Xist RNA. Despite these intriguing observations, there are many parts of the manuscript that need to be addressed in order to support the authors' main conclusions.

      The most troubling aspect of this manuscript is the persistent use of an artificial XCI system in male cells to draw strong conclusions about the function of Ssb in XCI. This issue is prevalent throughout the manuscript, and I question why the authors chose to perform most of their experiments in male cells when the same experiments can be (and have previously been by other groups) performed in female cells. Using male ESCs and then making conclusions for XCI, which is a female-specific process, is a major concern.

      In addition to the iXist male ESC line, many experiments, such as cell death/survival (Figure 3B, C), allelotype (Figure 3E), Xist cloud formation (Figure 4A), and H3K27me3 and H2AK119ub IF (Figure 5), were performed in female ESCs. We chose to do the SHAPE and Xist RNA stability assays in the iXist male ESC line because the onset of XCI is much more synchronized in this system. Moreover, in female cells, Xa causes additional layers of complication/noise in the ATAC-seq data which may not be fully cleared up by data analysis. On the other hand, inducible Xist expression in male ESCs can be used as an experimental system to recapitulate the silencing step of XCI (Ha et al. 2018; Wutz et al. 2002).

      • Out of the 138 identified binding proteins, the authors chose to only validate three: Mybbp1a, Tardbp, and Ssb/La. The logic for choosing these candidates is weak, and the authors are only able to validate 1 out of 3 of these proteins.

      In theory, all candidate proteins in the list are possibly involved in XCI. There is no method that can make an accurate prediction. We did not follow a clear-cut logic in selecting candidates for validation, but we do consider the candidate gene’s knockout phenotype, “early embryonic lethality”, as a phenotype consistent with a critical role of the candidate gene in XCI. Meanwhile, in the manuscript, we have discussed why we chose the three proteins for validation, as follows:

      “……From the candidate proteins, we shortlisted three proteins for individual validation. Myb-binding protein 1A (Mybbp1a, Q7TPV4) and TAR DNA-binding protein 43 (Tardbp, Q921F2) were selected because they are known transcription repressors (11, 12). The Lupus autoantigen La (P32067, encoding-gene name: Ssb) was selected because systemic lupus erythematosus (SLE) is an autoimmune disease characterized by a strikingly high female to male ratios of 9:1 (13). Moreover, its autoimmune antigen La is a ubiquitous and versatile RNA-binding protein and a known RNA chaperone (14). All the three selected candidates have also been identified as Xist-binding proteins in previous studies (2, 4). Moreover, the knockout of these three genes all lead to early embryonic death. Tardbp knockout causes embryonic lethality at the blastocyst implantation stage (15). Mybbp1a and Ssb knockout affect blastocyst formation (16, 17). Early embryonic lethality is a mutant phenotype consistent with a critical role of the mutated gene in XCI (1) ……”

      We used cell death/survival assay to further validate the role of Xist binding protein candidates in XCI. This is a stringent assay. It requires not only that Xist binding protein candidates bind to Xist, but also that the candidates have to be functionally important in XCI.

      Indeed, it has been demonstrated by the Plath lab (the bioRxiv manuscript mentioned by reviewer 1) that Tardbp (also named TDP-43), together with other RBPs, binds to the E repeat of Xist to form a condensate and create an Xi-domain. Yet, Tardbp KD did not rescue cell death caused by forced XCI in male cells in our studies. Thus, only 1 out of 3 of these candidates was validated and further studied. In the manuscript, we also discussed that low efficiency of knockdown and redundancy might contribute to the failure in validation of Tardbp and Mybbp1a.

      • Use of the cell death assay is not strong enough to "confirm that La is involved in induced XCI" as stated by the authors. This is a huge overstatement.

      Given the diverse functions of Ssb in cell differentiation and proliferation, ribosome biogenesis, transcriptional control and tRNA maturation, one would expect fewer surviving Ssb knockdown cells. In contrast, more Ssb knockdown cells survive in the presence of Dox, suggesting that Ssb plays an important role in XCI. Considering the reviewer’s comment, we revised the sentence to “further suggest that Ssb is involved in induced XCI”.

      While the authors observed differences in X-linked gene expression after Ssb KD, they did not examine expression of these genes after KD of either Mybbp1a or Tardbp. Are the changes observed in these genes specific to Ssb KD? Or could there still be alterations of X-linked gene expression in the non-validated KDs? This experiment should be performed and included in the manuscript, either within Fig 2 or in the supplemental. As well, a well characterized positive control, for example Hnrnpu, should be included as comparison to Ssb.

      Mybbp1a and Tardbp were not validated by the cell death assay. Thus, compared with Ssb, Mybbp1a and Tardbp are functionally less important for XCI. We only focused on Ssb in the subsequent studies. Mybbp1a and Tardbp KD could be additional negative controls. Yet, we have used an empty vector as a negative control. We do not need so many controls.

      As mentioned, Tardbp indeed binds to Xist RNA. It is very likely that Tardbp KD might alter some X-linked gene expression. This rules out Tardbp KD as a good negative control.

      If we had not seen any effect of Ssb KD on X-linked gene expression, a positive control would be absolutely required. However, we have detected that Ssb KD compromises the silencing of several X-linked genes. A positive control might not be essential.

      • The authors perform RIP to validate the interaction of Ssb with Xist, but this is performed in male ES cells with induced Xist RNA and with FLAG-tagged Ssb. Aside from these cells being male, in this system Xist RNA expression is much higher than would be found endogenously. RIP should have been done in female differentiated ESCs if there is in fact a role for XCI.

      • The authors need to include more details in the methods section to explain how the FLAG-Ssb is expressed in these cells, and why the authors chose to use a tagged construct over endogenous Ssb. Due to these issues the result from this experiment is essentially meaningless and is not convincing of Ssb interaction with Xist RNA. There is no reason RIP cannot be performed in female cells, and the authors should repeat this experiment in the relevant experimental condition. As well, if a validated Ssb antibody exists the authors should perform RIP using the endogenous protein.

      If required, we could try to perform RIP and/or CLIP using Ssb antibody in female cells.

      The authors state in Fig 3A-C that the results of the cell death and differentiation experiments "...support a functional role of La in XCI". The authors state earlier that Ssb is a ubiquitous protein that is embryonic lethal (in both female and males). Based on this, the cell death results shown do not support a functional role of La in XCI as the Ssb KD could be having an indirect effect due to its other developmental functions. This manuscript lacks a direct functional link between Ssb and XCI; more data is necessary.

      Given the diverse functions of Ssb in cell differentiation and proliferation, ribosome biogenesis, transcriptional control and tRNA maturation, one would expect fewer surviving Ssb knockdown cells. In contrast, more Ssb knockdown cells survive in the presence of Dox, suggesting that Ssb plays an important role in XCI.

      For the data in Fig 3A-C, Ssb KD causes the death of female differentiating cells, but not male differentiating cells. Therefore, it rules out that the death of female cells is due to the general function of Ssb. Rather, the specific role of Ssb in XCI contributes to the female specific cell death.

      In Fig 3D, the authors perform ATAC-seq in inducible male ES cells. The authors claim that the extremely slight reduction in chromatin compaction of the Ssb KD compared to control iXist "directly connect La to the heterochromatinization of Xi, supporting a functional role of La in XCI". This is also an overstatement based on the minimal, and possibly indirect, change in compaction. The positive control i-detaA-Xist sample has significantly less compaction (and thus significantly higher compaction defect) than the Ssb KD, again disputing the claim stated above. It is unclear why performing ATAC-seq is even necessary, as Ssb isn't stated to have a function in regulating chromatin architecture. In addition, it is unclear why the authors performed ATAC-seq in the artificial male XCI system and not in the F1 female cells, and the N of the experiment is not stated. If the authors want to include the ATAC-seq in further revisions it should be repeated n=3 in the female system.

      The male induced XCI system provides a more synchronized onset of XCI. More importantly, in the male induced XCI system, only one X chromosome exists, avoiding the interference from the active X chromosome in female cells. If ATAC-seq were performed in female cells, only loci with SNPs could be distinguished. The sequencing reads from Xa would create additional layers of complication/noise which may not be cleared up fully by data analysis.

      “i-delat-Xist” is a positive control to show that the experimental system works. It is not justified to compare the chromatin accessibility of the mutant, which is only an Ssb “knockdown” mutant, with that of the control “i-delat-Xist”, in which the Repeat A is “deleted”. We admit that the ATAC-seq results did not reveal a drastic difference in chromatin accessibility between the wild type sample and the mutant sample. However, as we discussed in the manuscript, a clear difference can still be seen at the 14 h time point. This is shown clearly by the heatmap (Fig. 3E) and the sequencing coverage profile (Fig. S4A).

      • In Fig 6, the authors state in their methods that "The shRNA construct, which worked efficiently against Ssb, was not designed against the 3' UTR of the RNA. Therefore, the shRNA is against some of the rescue plasmid constructs. Nonetheless, transfecting the Ssb knockdown cells with the rescue plasmids should compensate the effect of Ssb knockdown and serve as a rescue assay to study the functional domains of La.". This is troubling and seems like a major experimental issue; the specific rescue constructs that may be impacted by this issue are not stated and should be explicitly mentioned. This becomes more confusing when examining the data from rescue experiments.

      We pointed out this issue in the original manuscript. We agree that the experiment was not perfectly designed. In the revision, we added the information on the shRNA target site. Our shRNA targets the LAM domain, so the expression of ∆LAM is not affected by the shRNA. We agree that the detected GFP expression levels of ∆RRM1 and ∆RRM2 are too low to be conclusive. In the revision, we have removed the data points of ∆RRM1 and ∆RRM2. Meanwhile, it is clear that ∆RRM1&2 has a better rescuing effect than ∆NLS when ∆RRM1&2 and ∆NLS are expressed at similar levels. Ssb is a well-known RNA chaperone/RNA helicase. Identifying Ssb as an Xist-binding protein already suggests a functional role of Ssb in XCI. The data of the plasmid rescue experiments further suggest that Ssb is involved in XCI as an RNA chaperone/RNA helicase.

      If necessary, we could redo these experiments using an shSsb targeting the 3’-UTR or by expressing a GFP-Ssb immune to shSsb.

      In Figure S7, the expression of the rescue constructs deltaRRM1 and deltaRRM2 is extremely low, yet the authors observe a rescue of the cloud phenotype (Fig 6D) from those constructs that reaches almost the level of full length Ssb. This is confusing, and the authors need to address this by performing a western blot to show the protein levels of these rescue constructs and discuss further how such a low level of expression can show a rescue phenotype. The results would also be stronger if the authors examined H3K27me3 and H2AK119ub1 enrichment since they observed decreased overlap of these marks with Xist RNA after Ssb KD. Finally, the authors state that "...all three RNA-binding domains are required for the functionality of La in XCI..."; however, I have trouble coming to this conclusion based on the above issues. As well, if the authors want to support direct function, they should repeat the RIP experiments with these rescue constructs to show that the domains capable of rescue can still bind to Xist RNA.

      Reviewer 1 raised similar concerns. In Figure 6C, the live cell counts of ∆RRM1 and ∆NLS are about the same. This might be due to the low expression level of ∆RRM1 (Figure S7). It is clear that ∆RRM1&2 has a better rescuing effect than ∆NLS when ∆RRM1&2 and ∆NLS are expressed at similar levels. To make the data more straightforward, we removed the data points of ∆RRM1 and ∆RRM2 because of their low expression levels.

      As for the Western blot and GFP fluorescence (IF), we have tried both. Neither of them detected GFP signal, reflecting the low expression level of these GFP fusion proteins. The shSsb does not target the 5’- or 3’-UTR region and therefore interferes with the exogenous Ssb as well. This might be a reason for the low expression of these GFP fusion proteins. If necessary, we could redo these experiments using an shSsb targeting the 3’-UTR or by expressing a GFP-Ssb immune to shSsb.

      We deleted the sentence "all three RNA-binding domains are required for the functionality of La in XCI".

      **Minor comments:**

      The authors may want to consider better highlighting the strengths of their "FLAG-out" system. As written, it is difficult to tell how this system sets them apart from the previously published studies referenced in the text, especially as some of these studies used similar crosslinking conditions and cell types. Additionally, the logic and questions the authors pose in the introduction as to why they performed this project are too general and not very strong. For example, the authors mention how protein machinery may assemble on Xist RNA, and how Xist RNA may spread on the X chromosome. However, neither of these topics is actually addressed in their experiments or discussion. These are interesting questions, but the authors should either discuss them further within the context of their results or take these questions out. It would also be helpful if the authors could better label Figure 4, as it is unclear in the figure itself that Fig 4A is in reference to female cells, but the remaining panels are in male cells.

      The inducible XCI in male cells is a valid system to recapitulate the silencing step of XCI. It also provides unique advantages in many experiments, such as ATAC-seq. Meanwhile, we did perform extensive functional analysis on the endogenous XCI process using female cells. However, we do realize that presenting the data of induced XCI in male cells together with the data from female cells is confusing to many readers. We have revised the labelling on Figures 3, 4, 5, 7, S6 and S9 (S5 and S8 before revision).

      Understanding “how the protein machinery is assembled by Xist” and “how Xist spreads along its host chromosome territory” was not specifically an initial aim of this study. We removed the sentences from the introduction section. However, we believe Ssb may provide clues for future studies to fully address these questions, and we did provide the following thoughts in the discussion section:

      “……Secondly, as Ssb is able to utilize ATP to unwind RNA-RNA and RNA-DNA duplex, it may play a more active role in controlling the structural dynamics of Xist in living cells (14, 23). These structural dynamics may be important for recruiting proteins onto the RNA and spreading of the RNA along its host chromosome territory……”

      Reviewer #2 (Significance (Required)):

      I am not convinced that this manuscript, as written, has sufficient novelty. Ssb/La has been previously identified to be an Xist RNA binding protein with older/different approaches. However, there are some interesting observations in this manuscript. Major revisions are necessary.

      We agree with the reviewer that identification of Ssb as an Xist RNA binding protein is not novel. The novelty of our discovery lies in: 1) we developed a new method for isolating lincRNA associated proteins; 2) we confirmed that Ssb is an important player involved in XCI; 3) we showed that Ssb regulates the folding of Xist RNA, consequently the stability of Xist and the formation of Xist cloud.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript describes a novel "FLAG-out" system, where the authors sought to identify Xist RNA binding proteins. The authors focused on a specific protein found in their screen and also identified in several other screens for Xist RNA binding proteins, Ssb/La, and further characterize the role of this protein in XCI. This manuscript describes the loss of Ssb/La and suggests that it predominantly impacts the canonical 'cloud' formation of Xist RNA on the X chromosome during XCI initiation. Further, they determine that loss of Ssb/La decreases Xist RNA half-life and alters folding of Xist RNA transcripts. Based on their findings, the authors propose that Ssb/La functions to directly bind and fold Xist RNA transcripts in a manner that stabilizes Xist RNA, allowing for proper 'cloud' formation and successful initiation of XCI.

      Major comments:

      The authors made an interesting finding that the SLE-relevant autoantigen Ssb/La stabilizes Xist RNA transcripts, and there is some evidence that this occurs by binding and maintaining proper folding of Xist RNA. Despite these intriguing observations, there are many parts of the manuscript that need to be addressed in order to support the authors' main conclusions.

      • The most troubling aspect of this manuscript is the persistent use of an artificial XCI system in male cells to draw strong conclusions about the function of Ssb in XCI. This issue is prevalent throughout the manuscript, and I question why the authors chose to perform most of their experiments in male cells when the same experiments can be (and have previously been by other groups) performed in female cells. Using male ESCs and then making conclusions for XCI, which is a female-specific process, is a major concern.

      • Out of the 138 identified binding proteins, the authors chose to only validate three: Mybbp1a, Tardbp, and Ssb/La. The logic for choosing these candidates is weak, and the authors are only able to validate 1 out of 3 of these proteins.

      • Use of the cell death assay is not strong enough to "confirm that La is involved in induced XCI" as stated by the authors. This is a huge overstatement.

      • While the authors observed differences in X-linked gene expression after Ssb KD, they did not examine expression of these genes after KD of either Mybbp1a or Tardbp. Are the changes observed in these genes specific to Ssb KD? Or could there still be alterations of X-linked gene expression in the non-validated KDs? This experiment should be performed and included in the manuscript, either within Fig 2 or in the supplemental. As well, a well characterized positive control, for example Hnrnpu, should be included as comparison to Ssb.

      • The authors perform RIP to validate the interaction of Ssb with Xist, but this is performed in male ES cells with induced Xist RNA and with FLAG-tagged Ssb. Aside from these cells being male, in this system Xist RNA expression is much higher than would be found endogenously. RIP should have been done in female differentiated ESCs if there is in fact a role for XCI.

      • The authors need to include more details in the methods section to explain how the FLAG-Ssb is expressed in these cells, and why the authors chose to use a tagged construct over endogenous Ssb. Due to these issues the result from this experiment is essentially meaningless and is not convincing of Ssb interaction with Xist RNA. There is no reason RIP cannot be performed in female cells, and the authors should repeat this experiment in the relevant experimental condition. As well, if a validated Ssb antibody exists the authors should perform RIP using the endogenous protein.

      • The authors state in Fig 3A-C that the results of the cell death and differentiation experiments "...support a functional role of La in XCI". The authors state earlier that Ssb is a ubiquitous protein that is embryonic lethal (in both female and males). Based on this, the cell death results shown do not support a functional role of La in XCI as the Ssb KD could be having an indirect effect due to its other developmental functions. This manuscript lacks a direct functional link between Ssb and XCI; more data is necessary.

      • In Fig 3D, the authors perform ATAC-seq in inducible male ES cells. The authors claim that the extremely slight reduction in chromatin compaction of the Ssb KD compared to control iXist "directly connect La to the heterochromatinization of Xi, supporting a functional role of La in XCI". This is also an overstatement based on the minimal, and possibly indirect, change in compaction. The positive control i-detaA-Xist sample has significantly less compaction (and thus significantly higher compaction defect) than the Ssb KD, again disputing the claim stated above. It is unclear why performing ATAC-seq is even necessary, as Ssb isn't stated to have a function in regulating chromatin architecture. In addition, it is unclear why the authors performed ATAC-seq in the artificial male XCI system and not in the F1 female cells, and the N of the experiment is not stated. If the authors want to include the ATAC-seq in further revisions it should be repeated n=3 in the female system.

      • In Fig 6, the authors state in their methods that "The shRNA construct, which worked efficiently against Ssb, was not designed against the 3' UTR of the RNA. Therefore, the shRNA is against some of the rescue plasmid constructs. Nonetheless, transfecting the Ssb knockdown cells with the rescue plasmids should compensate the effect of Ssb knockdown and serve as a rescue assay to study the functional domains of La.". This is troubling and seems like a major experimental issue; the specific rescue constructs that may be impacted by this issue are not stated and should be explicitly mentioned. This becomes more confusing when examining the data from rescue experiments.

      • In Figure S7, the expression of the rescue constructs deltaRRM1 and deltaRRM2 is extremely low, yet the authors observe a rescue of the cloud phenotype (Fig 6D) from those constructs that reaches almost the level of full length Ssb. This is confusing, and the authors need to address this by performing a western blot to show the protein levels of these rescue constructs and discuss further how such a low level of expression can show a rescue phenotype. The results would also be stronger if the authors examined H3K27me3 and H2AK119ub1 enrichment since they observed decreased overlap of these marks with Xist RNA after Ssb KD. Finally, the authors state that "...all three RNA-binding domains are required for the functionality of La in XCI..."; however, I have trouble coming to this conclusion based on the above issues. As well, if the authors want to support direct function, they should repeat the RIP experiments with these rescue constructs to show that the domains capable of rescue can still bind to Xist RNA.

      Minor comments:

      The authors may want to consider better highlighting the strengths of their "FLAG-out" system. As written, it is difficult to tell how this system sets them apart from the previously published studies referenced in the text, especially as some of these studies used similar crosslinking conditions and cell types. Additionally, the logic and questions the authors pose in the introduction as to why they performed this project are too general and not very strong. For example, the authors mention how protein machinery may assemble on Xist RNA, and how Xist RNA may spread on the X chromosome. However, neither of these topics is actually addressed in their experiments or discussion. These are interesting questions, but the authors should either discuss them further within the context of their results or take these questions out. It would also be helpful if the authors could better label Figure 4, as it is unclear in the figure itself that Fig 4A is in reference to female cells, but the remaining panels are in male cells.

      Significance

      I am not convinced that this manuscript, as written, has sufficient novelty. Ssb/La has previously been identified as an Xist RNA-binding protein with older/different approaches. However, there are some interesting observations in this manuscript. Major revisions are necessary.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      Here Ha et al. have further developed their Pumilio RNA tagging methodology for the isolation of UV-crosslinked proteins that are suggested to associate with Xist RNA in mouse embryonic stem cells (mESCs). Within this study the authors claim to have found the Lupus antigen RNA-binding protein (La) as a novel Xist-interacting partner that influences the efficacy of X-chromosome inactivation (XCI). The authors use a number of different techniques, such as qPCR, fluorescent imaging, ATAC-seq and SHAPE, to show aberration of XCI upon La shRNA knockdown. However, this study has significant flaws in the efficient isolation and validation of Xist-associated proteins using their FLAG-out methodology. Furthermore, later experiments predominantly focus on cell death/survival assays, which is somewhat troubling given the essential roles La plays in processes such as cell differentiation and proliferation, ribosome biogenesis, transcriptional control and tRNA maturation. I feel the authors need to robustly address the potential effects La knockdown may be having on their mESCs.

      Major comments:

      -Are the key conclusions convincing?

      My major concern is with their Xist RNA purification. First of all, I couldn't find any data demonstrating enrichment of Xist RNA itself in their Pumilio pull-down experiment. It would have been useful to show Xist RNA enrichment before the benzonase step. Secondly, it is hard to imagine that the protocol would successfully isolate Xist RNA-protein complexes from the cell. An earlier report by Clemson et al. (J Cell Biol., 1996) has shown that the majority of Xist RNA remains stuck in the nucleus after a nuclear matrix prep protocol using detergent, which is not so different from the authors' protocol. Moreover, the authors used UV crosslinking, which would have made it even harder to purify Xist RNA without sonication. Thirdly, as the tag is located at the 5' end of Xist RNA, it is rather surprising that Spen is not detected in their pulldown. Spen is one of the main functional interactors of Xist, robustly detected in several previous reports. Similarly, other high-affinity binders of Xist such as hnRNP-K and Ciz1 were also lacking from this screen. Finally, the peptide counts found associated with FLAG-out Xist are extremely low in comparison with other data obtained using glutaraldehyde or formaldehyde crosslinking. For example, HnRNP-M in Chu et al. 2015 has 1120 peptide counts in differentiated cells. The authors here use HnRNP-M as a baseline for specific interactions and show a total of 6 peptide counts in Xist-expressing cells and 5 in i-Empty cells (Supplementary excel sheet 1). Similarly, La, the protein of interest in this study, has 8 counts in i-FLAG-Xist and 6 counts in i-Empty. I struggle to see how this result indicates specific Xist binding. Worryingly, this is the starting rationale for the rest of their experiments, and it is therefore hard to accept the rest of their conclusions either.
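
      To make the scale of this concern concrete, the peptide counts quoted above can be expressed as simple ratios of the Xist sample over the empty control. This is only an illustrative sketch: the function name `fold_enrichment` and the use of a raw, unnormalized ratio from a single unreplicated run are assumptions made here, not the authors' analysis.

```python
# Peptide counts as quoted in this review (Supplementary excel sheet 1).
counts = {
    "HnRNP-M": {"i-FLAG-Xist": 6, "i-Empty": 5},
    "La": {"i-FLAG-Xist": 8, "i-Empty": 6},
}


def fold_enrichment(xist: int, empty: int) -> float:
    """Raw ratio of peptide counts in the Xist sample over the empty control.

    Hypothetical helper: no normalization for loading and no replicates.
    """
    if empty <= 0:
        raise ValueError("control count must be positive")
    return xist / empty


ratios = {
    name: fold_enrichment(c["i-FLAG-Xist"], c["i-Empty"])
    for name, c in counts.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}-fold over i-Empty")
```

      Both ratios are below 1.5, so on these numbers La is barely more enriched than the protein used as the baseline, and without replicates no variance can be attached to either value.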

      The other key conclusion the authors make comes from the use of numerous cell death/survival assays for both male and female cell lines. This is extremely troubling in the context of assessing their target protein La. La is involved in multiple RNA maturation events of rRNAs, tRNAs and other polIII transcripts. Furthermore, La has been implicated in binding to the mRNA for Cyclin D1 in both human cells and mouse fibroblasts (NIH/3T3 - male), with siRNA knockdown showing a significant effect on cell proliferation https://www.nature.com/articles/onc2010425. This, along with the observation that La knock-out blastocysts fail to yield any mice or ES cell lines (male or female), suggests that the effect observed in the authors' results is most likely not X-linked cell death https://mcb.asm.org/content/mcb/26/4/1445.full.pdf. The authors need to show that their shRNA KD isn't affecting the proliferation and general fitness of their mESC lines.

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      As discussed above, I feel the authors have not clearly demonstrated Xist specific protein enrichment and haven't proven X-linked cell death. Due to the lack of necessary control experiments as discussed below, I feel the notion that La is involved directly in XCI as an RNA chaperone is currently preliminary/speculative.

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      I would suggest that they show RT-qPCR results of Xist RNA enrichment from the sample after the FLAG IP, before benzonase treatment.

      Also, it would have been more convincing if their negative control construct (i-Empty) contained at least 25 copies of PBSb RNA.

      In Fig 1b, the total amount of protein loaded on the gel is not equivalent between the two lanes. Equivalent amounts of protein should be loaded. It looks as if, had the negative control sample been loaded at the same amount as the one with Xist, the band pattern wouldn't be distinguishable between the two samples. Furthermore, as these samples were used in the subsequent mass spectrometry screen, this may suggest that the minimal increase in peptide counts observed in the iXist FLAG-out was due to an increased amount of sample being loaded. No controls are conducted to account for this.

      The authors quantify cell death in Figures 2C - E. It seems clear that shSsb 1 and 2 have an effect on cell count even in the absence of Dox. The rescue effect seen upon Dox addition is minimal when compared to Empty + Dox (Fig 2D). The authors' ∆A-iXist line with and without Ssb KD/Dox would be an informative control on whether the increase in cell survival that they see is X-linked.

      The qPCR results used to validate silencing defects show minor changes in expression and also don't show significant silencing of X-linked genes sufficient for cell death. Could this be because only ~ 50 - 60% of Male iXist cells seem to be expressing in the movies and that this will have an effect on the observed qPCR results? Furthermore, it seems counterintuitive that expression in the Empty male cells increases in 48h compared to 14h. Is this due to cell death and positive selection of cells less able to silence their X-chromosome? How would these data look in the female XX line? How would the data look in a ∆A-iXist line in the presence and absence of shSsb/Dox?

      Confusingly, the male line in Fig 3C shows a drop in live cell count at day 6 of differentiation? Surely given their previous results in Fig 2 the Ssb KD should increase cell viability with +Dox? Ssb KD seems to have an adverse effect on ES cells during extended differentiation protocols. In Figure S1 the authors show ~ 8 - 10% survival of male lines during differentiation. Could the recombination of the Xist sequence around the loxP sites enable the cells to outcompete the dead cells? How would iEmpty and ∆A-iXist cells compare here? Have the differentiated cells been tested for their expression of Xist? Additionally, how are there similar live cell counts for male vs female lines when ~90% of male cells die during differentiation? Were more cells plated at day 4? If so, this would bias the competition of male cell survival and therefore make the male line an inappropriate control. Given the essential role of La during development a control is needed to prove that this death is X-linked in the female 3F1 line. For example, an XO cell line retaining the Cast allele and shSsb expression could show the amount of death caused from shSsb alone independent of X-linked cell death.

      If I understood correctly, the RNA FISH used dsDNA probes ("Sx9") against 40 kb of the X-inactivation centre (Xic). Surely Tsix or other Xic transcripts will also be visible? Can the authors use their RNA FISH to determine the XX or XO status of their cells? In Figure S5 a number of cells appear to show a single pinpoint of transcription. This could either be low levels of Xist transcripts or Xic transcription from an XO line in which the 129 chromosome is missing. It would be best to quantify only cells that have two X chromosomes, and if a significant number of X chromosomes have been lost, this should be discussed and controlled for.

      In Fig 6, the authors generated a number of Ssb constructs for a rescue assay. However, these results complicate the matter and raise more questions than they address. It seems odd that the ∆RRM1 does not rescue based on comparison with their putative negative control, ∆NLS. However, the ∆RRM1 + 2 and ∆LAM do rescue the phenotype better than the full-length Ssb? This makes no logical sense and highlights the inherent variation in cell viability these generated cell lines seem to show. Following on from this, Figure S7 quantifies the GFP tag mRNA levels, depicting all ∆RRM mutants with expression below ~30%. How can ∆RRM1 or 2 be rescuing in this scenario? Have these lines been tested for their XX or XO status? The loss of an X chromosome would lead to a rescue of the cell death phenotype, which is a process known to occur in XX lines that have been cultured for extended periods of time. Could it also be that the cell lines derived are more or less sensitive to exogenous shRNA expression? Also, further validation is needed to assess the efficiency of KD in these lines, as theoretically most of these constructs will be targeted by the shRNA. What is the endogenous Ssb expression level in these lines? Where in the mRNA sequence are the shRNAs targeted to? Does this make sense given the relative expression levels of ∆RRM1/2, for example? GFP expression could also be assessed by quantitative western blot of GFP or even visualised in their RNA FISH/IF samples (Figure S8); currently neither is shown. In addition, the stability of each Ssb protein construct has not been demonstrated.

      For the data shown in Figure 7A and B the authors quantify the % of cells with Xist signal. The authors have already shown a defect in Xist visualisation in Ssb KD. Surely it is plausible to assume a faster loss of Xist signal below background in weaker expressing cells. A more appropriate quantification would be the % loss of Xist signal per cell over time.
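
      The suggested quantification could look roughly like the sketch below. The intensity values are hypothetical and chosen purely for illustration, and the helper name `percent_loss` and the assumption of background-subtracted per-cell intensities are mine rather than anything taken from the manuscript.

```python
def percent_loss(intensities):
    """Percent loss of signal at each time point, relative to the first one."""
    baseline = intensities[0]
    if baseline <= 0:
        raise ValueError("baseline intensity must be positive")
    return [100.0 * (baseline - value) / baseline for value in intensities]


# Hypothetical background-subtracted Xist intensities per cell (arbitrary
# units) at three successive time points. These are NOT measured data.
cells = {
    "strong_cloud": [200.0, 150.0, 100.0],
    "weak_cloud": [80.0, 60.0, 40.0],
}

loss = {name: percent_loss(trace) for name, trace in cells.items()}

for name, values in loss.items():
    print(name, [f"{v:.0f}%" for v in values])
```

      In this toy case both cells lose the same fraction of signal over time, yet a fixed detection threshold would score the weaker cell as "Xist negative" earlier, which is the bias a per-cell loss measure avoids.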

      In Figure 7C and D, the samples have been treated with actinomycin D, which globally affects transcription in cells, including the PolIII-associated genes whose transcripts Ssb is needed to mature. This treatment could have an added effect on cell mortality and function. Data confirming that actinomycin D doesn't affect the cells disproportionately are needed. The difference in half-life could be attributed to such a treatment.

      In summary, the authors claim that La binds Xist to facilitate folding and appropriate spreading of Xist along the X chromosome. No direct interaction has been shown. CLIP-seq data would resolve this, although I do understand this is a challenging technique. The authors have instead opted for RIP followed by qPCR (Figure S2). However, this process has a greater potential for non-specific recovery of RNAs via indirect binding. Furthermore, qPCR may also amplify the relative abundance of the RNA detected. As multiple nucleolar proteins came down in the mass spec screen and FLAG-Ssb is being overexpressed, it is plausible to assume that some transient Xist interactions may arise from nucleolar association, where La will be in high abundance. Positive and negative nuclear RNA controls (e.g. 7SK and U1 snRNA respectively) could be used to determine the amount of non-specific protein-RNA interactions in their RIP pull-downs. Cytoplasmic actin is not an appropriate control as it is cytosolic.

      Other than this, the authors may want to probe (via IF) for the presence of La accumulation on the X. Many other known factors such as Ciz1, hnRNP-K and PRC1/2 complexes show clear accumulation on the X. If I understand correctly, there are many La antibodies on the market, and endogenous levels on the X could be assessed. These antibodies may also be useful in IPs and pull-downs.

      -Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The experiments suggested above are centred on the cell lines that are currently in the authors' possession, with the possible exception of the suggested ∆A-iXist-shSsb line. However, this should be reasonably quick to obtain given their previous work for this paper. Most of the suggested experiments focus on the validation of karyotype, Xist expression, rescue construct expression, further RNA FISH classification, and repeating a number of experiments with more appropriate positive and negative controls. In theory this can be obtained relatively simply and quickly from current resources. But given the sheer volume of further experiments required here, this may take a significant amount of time. One vital improvement needed is the replication of the mass spec data and the validation of Xist-specific recovery and protein enrichment. As it stands, this manuscript does not seem to have any replicates of the FLAG-out methodology and mass spec data. This is troubling given the poor recovery and specificity of the protein samples obtained. Repeating these experiments would be costly in time and also financially. Nevertheless, I feel this is essential to conclusively validate their target of interest.

      - Are the data and the methods presented in such a way that they can be reproduced?

      The data is presented relatively well; however, it would be beneficial if detailed methods were in the main text and not in a supplementary file. Similarly, more information is needed about the process of differentiation and how cell death/survival was quantified and validated.

      - Are the experiments adequately replicated and statistical analysis adequate?

      For the most part, yes; however, there seem to be no replicates of the FLAG-out mass spec screen, which is worrying given the minimal specificity observed in the current data.

      Minor comments:

      - Specific experimental issues that are easily addressable.

      Unfortunately, the majority of experimental issues need to be addressed with more robust data which are highlighted above. However, some image analysis, quantification and classification can be amended relatively easily. For example, the live-cell imaging data should be quantified as loss of signal as discussed and RNA FISH should be used to classify XX positive cells and the XO cells can be discarded from analysis.

      - Are prior studies referenced appropriately?

      Most papers regarding Xist pull-down and biology are discussed and referenced appropriately. However, the role that La plays during development and its aberrant effects upon KD are seemingly downplayed. I would like to see more discussion of potential defects that could be caused by globally altering cellular RNA folding.

      - Are the text and figures clear and accurate?

      For the most part, the figures are clear and accurate, with the following exceptions.

      1. The Y-axis of Figure 2D is confusing. What does 0.3 as a "sum of area" equate to? 30% of the area was ES cells? This doesn't look to be the case from Fig 2C. Also, how does the intensity of the signal compare? The area may not be a good quantification due to ES cells growing in colonies.

      2. In Movies S1-7 there are boxes around certain cells, marked with "Figure 5a - c". This seems to be incorrect, as Figure 5 is currently the IF staining of polycomb marks. I assume this is in relation to Figure 4b-d?

      3. Similarly, in Movies S1-7, the intensities of Xist foci seem by eye to be similar. In the paper it is claimed that the Xist clouds that do form are lower in intensity. Do the Movies depict the same range of pixel intensities? If not, this should be amended. Similarly, Figure 7 seems to show relatively equivalent RNA signal at 0 h.

      4. In Figure 4A the data is from female XX cells; this should be highlighted to limit confusion with the male iXist data shown below in 4B-E. It would also be helpful to have the male/female icons (as in Figure 3B) for each figure that has images of cells. Currently Figures 4, 5, 7, S5 and S8 are lacking these icons.

      5. No explanation of the Flag-Ssb expression is given for Figure S2. Furthermore, is it really necessary to express Flag-Ssb? There are reasonably good antibodies available for Ssb, as this was how it was originally found in Systemic Lupus patients. Also, no data are shown on the amount of Ssb being overexpressed. This may have major implications for the validity of the RIP-qPCR analysis.

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Most of the data is presented reasonably well, but the limited robustness of the data somewhat detracts from their conclusions. I feel the certainty of their conclusion regarding Xist-specific La binding and RNA chaperone activity is still presumptive and should be rewritten unless more robust data can confirm Xist interaction. I would also suggest deciding on the nomenclature for the protein of interest and using either La or Ssb; the continued use of both through the figures and text can get a little confusing to the reader.

      Significance

      - Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Using the PBSb-PUFb system to purify Xist RNA-binding proteins was a worthwhile attempt, in contrast to previous reports that used antisense oligo purification with sequences complementary to Xist RNA. But currently the purification still needs further validation and repeats to confirm its use. A potential complementary technique could be to isolate Xist directly by using biotinylated probes against the PBSb sequence. The authors further claim the identification of a novel Xist RNA chaperone (La/Ssb) which they say facilitates XCI progression. This would be a novel finding in the field; however, the data is currently not robust enough to support this.

      - Place the work in the context of the existing literature (provide references, where appropriate).

      This work has focused on the development of a milder methodology for purifying Xist RNA during XCI. Others have published similar methodologies predominantly focusing on purifying Xist RNA directly with biotinylated probes (McHugh et al., Minajigi et al. and Chu et al.). Although this method boasts a milder purification, it seems to be low-yielding in Xist-specific proteins. Others have shown a more robust identification of bona fide Xist-binding proteins, which are currently missing in this manuscript. A recent preprint from the Plath lab has identified new factors involved in XCI during differentiation, and their tethering/rescue experiments are far more convincing than the ones shown in this manuscript https://www.biorxiv.org/content/10.1101/2020.03.09.979369v1. The candidate protein Ha et al. have identified has multiple roles in developing cells and has been shown to be important during mouse development. However, Ha et al. do not robustly show that the knockdown of Ssb causes X-linked cell mortality. Instead, as would be presumed from Ssb's essential role in many housekeeping short non-coding RNAs, the cell death upon shRNA KD seems more ubiquitous. Therefore, the link the authors are making here is relatively weak.

      - State what audience might be interested in and influenced by the reported findings.

      The audience may be interested in the novel technique and the finding of a novel Xist binding protein.

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      RNA biochemistry and developmental biology