  1. Dec 2021
  2. Nov 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewer’s comments

      We thank the three reviewers for their positive comments and constructive feedback. We have addressed the issues raised through additional experiments and text changes which have helped to improve the manuscript. Below, we address the specific points with detailed responses (reviewer comments are provided in italic).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Rodriguez-Lopez et al describes the analysis of long intergenic non-coding RNA (lincRNA) function in fission yeast using both deletion and overexpression methods. The manuscript is very well presented and provides a wealth of lincRNA functional information for the field. This work is an important advance as there is still very little known about the function of lincRNAs in both normal and other conditions. An impressive array of conditions were assessed here. With a large scale analysis like this there is really not one specific conclusion. The authors conclude that lincRNAs exert their function in specific environmental or physiological conditions. This conclusion is not a novel conclusion, it has been proposed and shown before, but this manuscript provides the experimental proof of this concept on a large scale.

      The lincRNA knock-out library was assessed using a colony size screen, a colony viability screen and cell size and cell cycle analysis. Additionally, a lincRNA over-expression library was assessed by a colony size screen. These different functional analysis methods for lincRNAs were then carried out in a wide variety of conditions to provide a very large dataset for analysis. Overall, the presentation and analysis of the data was easy to follow and informative. Some points below could be addressed to improve the manuscript.

      There were 238 protein coding gene mutants assessed in parallel, to provide functional context, which was a very promising idea. But, unfortunately, the inclusion of 104 protein coding genes of unknown function restricted the use of the protein coding genes in the integrated analysis to connect lincRNAs to a known function using guilt by association.

      Reply: Yes, the unknown coding-gene mutants certainly did not help to provide functional context through guilt by association. These mutants were included to generate functional clues for the unknown proteins and to compare phenotype hits with unknown lincRNA mutants. Nevertheless, because the known coding-gene mutants included broadly cover all high-level biological processes (GO slim), we could make several useful functional inferences for certain lincRNAs as discussed.

      The colony viability screen is not described well throughout the manuscript. Firstly, the use of phloxine B dye to determine cell viability needs to be described better when first introduced at the bottom of page 6. What exactly is this viability screen and red colour intensity indicating? Please define what the different levels of red a colony would indicate as far as viability. I assume an increase in red colour indicates more dead cells? So it is confusing that later the output of this assay is described as giving a resistant/sensitive phenotype or higher/lower viability. How can you get a higher viability from an assay that should only detect lower viability? Shouldn't this assay range from viable (no, or low red, colour) to increasing amounts of red indicating increasingly less viability? Figure 4D is also confusing with the "red" and "white" annotations. These should be changed to "lower viability" and "viable" or "not viable" and "viable".

      Reply: The colony-viability screen is described in detail in our recent paper (Kamrad et al, eLife 2020). We have now better explained how phloxine B works to determine cell viability (p. 6). The reviewer’s assumption is correct: an increase in red colour indicates more dead cells. However, all phenotypes reported are relative to wild-type cells under the same condition. Many conditions lead to a general increase in cell death, but some mutants show a lower increase in cell death compared to wild-type cells. These mutants, therefore, have a higher viability than wild-type cells, i.e. they are more resistant than wild-type under the given condition. We have tried to clarify this in the text, including the legend of Fig. 4. We agree that the ‘red’ and ‘white’ annotations in Fig. 4D could be confusing. We have now changed these to ‘low viability’ and ‘high viability’. Again, this is relative to wild-type cells.
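The wild-type-relative reading of the phloxine B assay described in this reply can be sketched as follows. This is only an illustration, not the authors' pipeline: the redness score, the function name `relative_viability` and the 5% tolerance are all assumptions made for the example.

```python
def relative_viability(mutant_redness: float,
                       wild_type_redness: float,
                       tolerance: float = 0.05) -> str:
    """Classify a mutant's viability relative to wild type in one condition.

    Redness is a hypothetical colony-level phloxine B score: a higher value
    means more dead cells. The tolerance is an arbitrary illustrative cutoff.
    """
    if wild_type_redness <= 0:
        raise ValueError("wild-type redness must be positive")
    ratio = mutant_redness / wild_type_redness
    if ratio < 1 - tolerance:
        # Less red than wild type: fewer dead cells, i.e. more resistant.
        return "high viability"
    if ratio > 1 + tolerance:
        # Redder than wild type: more dead cells, i.e. more sensitive.
        return "low viability"
    return "no difference"
```

A mutant can therefore score as "high viability" even in a condition that kills many cells, as long as it is less red than wild type under that same condition.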

      How are you sure that when generating the 113 lincRNA ectopic over-expression constructs by PCR that the sequences you cloned are correct? Simply checking for "correct insert size", as stated in the methods, is not really good practice and these constructs should be fully sequenced to be sure they contain the correct sequence and that constructs have not had mutations introduced by the PCR used for cloning. Without such sequence confirmation one cannot be completely confident that the data produced is specific for a lincRNA over-expression. Additionally, a selection of strains with the overexpression constructs should be tested by qRT-PCR and compared to a non-over-expressing strain to confirm lincRNA overexpression.

      Reply: To minimize errors during PCR amplification, we used the high-fidelity Phusion DNA polymerase, which features a >50-fold lower error rate than Taq DNA polymerase. We had confirmed the insert sequences for the first 17 lincRNAs cloned using Sanger sequencing (but did not report this in the manuscript). We have now checked additional inserts of the overexpression plasmids by Sanger sequencing in 96-well plate format using a universal forward primer upstream of the cloning site. This high-throughput sequencing produced reliable sequence data for 80 inserts, including full insert sequences for 62 plasmids and the first ~900 bp of insert sequences for 18 plasmids. Of these, only the insert for SPNCRNA.601 showed a sequence error compared to the reference genome: a T-to-C transition at position 559. This mutation could reflect either an error that occurred during cloning or a natural sequence variant among yeast strains (lincRNA sequences are much more variable than coding sequences). So, in general, the PCR cloning accurately preserved the sequence information. We have added this information in the Methods (p. 27-28). Please note that lincRNAs depend much less on primary nucleotide sequence than mRNAs, and a few nucleotide changes are highly unlikely to interfere with lincRNA function.

      Minor comments:

      Page 4, lines 19-20 - "A substantial portion of lincRNAs are actively translated (Duncan and Mata, 2014), raising the possibility that some of them act as small proteins." This sentence does not make sense, lincRNAs can't "act as" small proteins, they can only "code for" small proteins. Wording needs to be changed here.

      Reply: We agree and have changed the wording as suggested.

      Figure 1A is a nice representation but what are the grey dots? Are they all ncRNAs including lincRNAs? This needs to be stated in the legend.

      Reply: The grey dots represent all non-coding RNAs across the three S. pombe chromosomes as described by Atkinson et al., 2018. This has now been clarified in the legend.

      How many lincRNAs are there in total in pombe and what percentage did you delete? These numbers should be stated in the text.

      Reply: There are 1189 lincRNAs and we mutated ~12.6% of them. These numbers are now stated at the end of the Introduction, page 5.

      It would be nice if Supplementary Figure 1 included concentrations or amounts of the conditions used. This info is buried in a Supplementary table and would be better placed here.

      Reply: Supplemental Fig. 1 provides a simple overview of the different conditions and drugs used. For most stresses and drugs, we used multiple different doses, so the figure would become cluttered if we indicated all these concentrations, detracting from the main message. Colleagues who are interested in the concentration ranges used for specific conditions can readily obtain this information from Supplemental Dataset 1. We have now added a statement to this effect to the legend of Supplemental Fig. 1.

      Page 6, last sentence. What is a "biological repeat"? Three distinct deletion strains (ie three different deletion strains made by CRISPR) or one deletion strain used three times?

      Reply: Biological repeat means that one deletion strain was assayed three times independently, each with at least two colonies (technical repeats). In most cases, we had two or more independently generated deletion strains for each lincRNA (using the same or different gRNAs), and we performed at least three biological repeats for each strain. The numbers of independent strains for each lincRNA are provided in Supplemental Dataset 1 (sheet: lincRNA_metadata, column: n_independent_ko_mutants). The total numbers of repeats carried out for each condition after QC filtering are available in Supplemental Dataset 2 (columns: observation_count). We have clarified this on p. 7, and the details are now provided in the Methods on p. 28-29 (deletion mutants) and p. 32 (overexpression mutants).

      There is no mention in the manuscript of how other researchers can get access to the deletion strains and over-expression plasmids.

      Reply: As is usual, all strains and plasmids will be readily available upon request.

      Reviewer #1 (Significance (Required)):

      The production of lincRNA deletion strains and overexpression plasmids, and their analysis under an impressive number of conditions, provides key resources and data for the ncRNA field. This work complements nicely the analysis of protein coding gene deletion strains and provides the tools and data for future mechanistic studies of individual lincRNAs. This work would be of interest to the growing audience of ncRNA researchers in both yeast and other systems.

      Field of expertise:

      Yeast deletion strain construction and analysis, RNA functional analysis

      **Referee Cross-commenting**

      Reviewer #3 makes an important point that the stability of each lincRNA over expressed from plasmid is not known and therefore some lincRNAs may not be overexpressed as predicted. RT-qPCR would be required to assess lincRNA expression levels from the plasmids. It also appears that we both agree that it is important to determine the sequence of the cloned lincRNAs in the over expression plasmids.

      Reply: See reply in response to Reviewer 3.

      Reviewer #3 also makes an important point in his review that where it is predicted that a lincRNA deletion influences an adjacent gene in cis then the expression of that gene should be tested.

      Reply: See reply in response to Reviewer 3.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The Rodriguez-Lopez manuscript from the Bahler lab presents the phenotypic and functional profiling of lincRNAs in fission yeast. This is the first large-scale, extensive work of this nature in this model organism, and it therefore nicely complements the well-documented examples of lincRNAs already reported in S. pombe.

      The work is very solid, using seamless genome deletion and overexpression followed by colony-based assays in response to a very wide set of conditions.

      **Major comments:**

      - considering that this is a descriptive work by nature and that the experiments were properly conducted as far as I can judge, I don't have major issues with this paper.

      To me the only thing that is missing is a gametogenesis assay, for two reasons: first, several reported cases of lincRNAs in pombe critically regulate meiosis, and second, many of the analysed lincRNAs are upregulated during meiosis. Figure 6B already points to three obvious candidates. I don't think it would take too much time to look at the deletion and OE in an h90 strain and see the effect on gametogenesis for the entire set or at least the 3 candidates from Figure 6.

      If the already broad set of lincRNAs implicated in meiosis would grow, this would be another evidence that eukaryotic cell differentiation relies on non-coding RNAs even in simpler models.

      Reply: We agree that this is a meaningful analysis to add. We have now deleted the three unstudied lincRNA genes, along with the meiRNA gene, from the sub-cluster of Figure 6B in the homothallic h90 background (to allow self-mating). We have analysed meiosis and spore viability of these four deletion strains together with a wild-type h90 control strain. These experiments indicate that cell mating is normal in the deletion mutants, but meiotic progression is somewhat delayed in SPNCRNA.1154, SPNCRNA.1530 and, most strongly, meiRNA mutants (the latter has been reported before; reviewed by Yamashita 2019). Notably, we detected significant reductions in spore viability for all four deletion mutants compared to the control strain. These results point to roles of SPNCRNA.1154, SPNCRNA.1530, and SPNCRNA.335 in meiotic differentiation, as predicted by the clustering analyses. This is a nice addition to the manuscript. We now report these results on p. 23, with a new Supplemental Figure 10, and describe the experimental procedures in the Methods (p. 34-35).

      **Minor comments:**

      - A reference to the recent work of the Rougemaille lab on mamRNA is necessary

      Reply: Yes, we now cite this reference in the Introduction (p. 4).

      - a discussion of the possibility to perform large-scale genetic interaction searches (as done by Krogan for protein-coding genes) would add to the discussion of future plans

      Reply: We have added a sentence about the potential of SGA screens in the Conclusions (p. 26).

      Reviewer #2 (Significance (Required)):

      The work unambiguously shows that most of the lincRNAs analyzed exert cellular functions in specific environmental or physiological contexts. This conclusion is critical because the biological relevance of this so-called "dark matter" is still debated despite a few well-established cases. This is an important addition to the field, and the deep phenotyping work already points to some directions for analysing some of these lincRNAs in the context of cell cycle progression, metabolism or meiosis.

      **Referee Cross-commenting**

      - I agree with the issues raised by referees 1 and 3 but I am concerned about the added value of an RT-qPCR. First, this is a significant amount of work considering the large set of targets. Second, and more importantly, what you'll end up with is a fold change. What will be considered as overexpression? Which threshold? This is why I prefer a biological read-out (a phenotype) because whatever the fold change, it tells us that there is an effect. It is very likely indeed that some targets are not overexpressed because of their rapid degradation. To me, this is the drawback of any large-scale study.

      - Also, looking at the expression of the adjacent gene in the case of a cis-effect is interesting though this is likely condition-dependent (because most phenotypes appear in specific conditions). So, what would be the conclusion if there is no effect in classical rich media?

      - The sequence of the insert should be specified, I agree. Most likely, it is the sequence available from PomBase (this is what I understood) but that should be clarified indeed.

      Reply: Yes, the sequences of the inserts are available from PomBase, and we provide the primer sequences used for cloning in the Supplemental Dataset 1. We have now clarified this in the Methods (p. 27).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work from the group of Jurg Bahler, the authors take advantage of the high throughput colony-based screen approach they recently developed (Kamrad et al, eLife 2020) to perform a functional profiling analysis on a subset of 150 lincRNAs in fission yeast. Using a seamless CRISPR/Cas9-based method, they created deletion mutants for 141 lincRNAs. In addition, the authors also generated strains ectopically overexpressing 113 lincRNAs from a plasmid (under the control of the strong and inducible nmt1 promoter).

      The viability and growth of all these mutants was then assessed across benign, nutrient, drug and stress conditions (149 conditions for the deletion mutants, 47 conditions for the overexpression). For the deletion mutants, the authors also assayed in parallel mutants of 238 protein-coding genes (PCGs) covering multiple biological processes and main GO classes.

      In benign conditions, deletion of 5 and 10 lincRNAs resulted in a reduced growth phenotype (rich and minimal medium, respectively). Morphological characterization by microscopy also revealed cell size defects for 6 lincRNA mutants (2 shorter, 4 longer). In addition, 27 mutants displayed phenotypes pointing to defects in the cell cycle.

      Remarkably, the nutrient/drug/stress conditions revealed more phenotypes, with 60 of the 141 lincRNA mutants showing a growth phenotype in at least one condition, and 25 mutants showing a different viability compared to the wild-type in at least one condition.

      Also remarkable is the observation that 102/113 lincRNA overexpression strains displayed a growth phenotype in at least one condition, with 14 lincRNAs showing phenotypes in more than 10 conditions.

      The clustering analyses performed by the authors also provide functional insight for some lincRNAs.

      Overall, this is an important study, well conducted and well presented. Together, the data described by the authors are convincing and highlight that most lincRNAs would function in very particular conditions, and that deletion/inactivation and overexpression are complementary approaches for the functional characterization of lncRNAs. This has been demonstrated here, in a very elegant manner.

      I think this manuscript will be acknowledged as a pioneer work in the field.

      **A. Major comments**

      - A.1. Are the key conclusions convincing?

      In my opinion, the key conclusions of this study are convincing.

      - A.2. Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      No. The authors are careful in their claims and conclusions.

      - A.3. Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      This study is based on systematic lincRNA deletion/overexpression.

      - For the deletion strains, I could not find any information about the control of the deletions. Are the authors sure that the targeted lincRNAs were indeed properly deleted?

      Reply: Yes, we had carefully checked the correctness of the deletions using several controls as described by Rodriguez-Lopez et al. 2017. All deletion strains were checked for missing open-reading frames by PCR. For 20 strains, we also sequenced across the deletion scars. We re-checked all strains by PCR after arraying them onto the 384 plates to ensure that no errors occurred during the process. We have now specified this in the Methods (p. 27).

      - For the overexpression, there is only a control of the insert size by PCR. Sanger sequencing would have been preferable to confirm that the targeted lincRNAs were properly cloned, without any mutation. In addition, the authors did not check that the lincRNAs were indeed overexpressed (at least in the benign conditions). Is the overexpression fold similar for all the lincRNAs? Do the 14 lincRNAs showing the most consistent phenotypes in at least 10 conditions display different expression levels than the other lincRNAs?

      - A.4. Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      - Validating the deletion strains requires genomic DNA extraction and then PCR. This is repetitive and tedious, but this control is important, I think. The time needed depends on the possibility of automating the process. I think this is feasible in this lab.

      - Controlling the insert sequence into the overexpression vector requires plasmid DNA (available as it was used for PCR) and one/several primer(s), depending on the insert size. The sequencing itself is usually done by platforms.

      - Analysing lincRNA overexpression at the RNA level requires yeast cultures, RNA extraction and then RT-qPCR. Again, the time needed depends on the possibility of automating the process.

      Reply: We have now checked most overexpression constructs by Sanger sequencing of the inserts as described in response to Reviewer 1. Moreover, we have tested the overexpression levels for eight selected overexpression constructs using RT-qPCR analysis. These eight constructs cover the entire range of associated phenotype hits, including 3 lincRNAs with the highest number of phenotypes in 14 conditions, 3 with no phenotypes, and 2 with intermediate numbers of phenotypes. The RT-qPCR results show that the lincRNAs were 35- to 2200-fold overexpressed relative to the empty-vector control strain (which expresses the lincRNA at native levels). No clear pattern was evident between expression levels and phenotype hits, e.g. lincRNAs without phenotypes when overexpressed showed similar fold-changes to a lincRNA showing 13 phenotypes. We present these results on p. 21/22 and in the new Supplemental Figure 9A, and describe the experiment in the Methods (p. 28).

      As pointed out by Reviewer 2, these fold changes in expression are actually of limited value compared to the phenotype read-outs. The important result is that we detected phenotypes for over 90% of the overexpression strains, indicating that overexpression generally worked. Given that this is a large-scale study, there might be some lincRNA constructs that are faulty or are not overexpressed. It would not be realistic or meaningful to test all constructs. Any follow-on studies focusing on specific lincRNAs will need to first validate the large-scale results, as is common practice.
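For readers unfamiliar with how an RT-qPCR fold change relative to an empty-vector control is typically derived, a minimal sketch follows. It assumes the common 2^-ΔΔCt calculation with a reference gene and 100% amplification efficiency; the reply does not state which quantification method was used, and the function and parameter names are hypothetical.

```python
def fold_change_ddct(ct_target_oe: float, ct_ref_oe: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target RNA in an overexpression strain vs. a control.

    Assumes the 2^-ddCt method (an assumption, not the authors' stated method):
    each Ct is normalised to a reference gene in the same sample, then the
    overexpression strain is compared with the empty-vector control.
    """
    delta_oe = ct_target_oe - ct_ref_oe        # normalised Ct, overexpression strain
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalised Ct, empty-vector control
    ddct = delta_oe - delta_ctrl
    return 2.0 ** (-ddct)
```

Under these assumptions, the reported 35- to 2200-fold range would correspond to ΔΔCt values of roughly -5 to -11 cycles.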

      - A.5. Are the data and the methods presented in such a way that they can be reproduced?

      The methods are clearly and extensively explained. If necessary, the reader can find more details about the high-throughput colony-based screen approach in the original paper (Kamrad et al, eLife 2020); very interesting technical discussions can also be found in the reviewers' reports and in the authors' response published alongside.

      - A.6. Are the experiments adequately replicated and statistical analysis adequate?

      The experiments are replicated. However, I feel confused regarding the number of replicates used in each analysis.

      In the first part of the Results, it is mentioned that all colony-based phenotyping was performed in at least 3 independent replicates, with a median number of 9 repeats per lincRNAs. In the Methods section, I read that for the high-throughput microscopy and flow cytometry for cell-size and cell-cycle phenotypes, over 80% of the 110 lincRNA mutants screened for cellular phenotypes were assayed in at least 2 independent biological repeats. For the overexpression, I read that each strain was represented by at least 12 colonies across 3 different plates and experiments were repeated at least 3 times. Each condition was assayed in three independent biological repeats, together with control EMM2 plates, resulting in at least 36 data points per strain per condition.

      Perhaps I missed something. If not, could the authors clarify this? In addition, I suggest to indicate the number of replicates used for each lincRNA/condition/assay in Supplemental Dataset 2 (I could only find the information for the Flow Cytometry) and in Supplemental Dataset 6.

      Reply: For all colony-based phenotyping, we performed at least three biological repeats, meaning that the strains were assayed three times independently, each with at least two colonies (technical repeats). In most cases, we had two or more independently generated deletion strains for each lincRNA, and we performed at least three biological repeats for each strain (hence the higher median number of nine repeats per lincRNA). The numbers of independent deletion strains for each lincRNA are provided in Supplemental Dataset 1 (sheet: lincRNA_metadata, column: n_independent_ko_mutants). The total numbers of repeats carried out for each condition after QC filtering are available in Supplemental Dataset 2 (columns: observation_count). We have now clarified this on p. 6, and the details are provided in the Methods on p. 28-29 (for deletion mutants) and p. 32 (for overexpression mutants). For the high-throughput microscopy and flow cytometry experiments, we performed the repeats as described in the text.
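The replicate arithmetic in this exchange can be made explicit with a short sketch. The helper names are hypothetical; the numbers come from the reviewer's summary (12 colonies, 3 repeats, 36 data points) and the reply (at least three biological repeats per strain, at least two colonies each, median of nine repeats per lincRNA). The case of three independent strains is one illustrative combination that yields nine.

```python
def min_data_points(colonies_per_repeat: int, biological_repeats: int) -> int:
    """Minimum number of colony measurements per strain per condition."""
    return colonies_per_repeat * biological_repeats


def repeats_per_lincrna(independent_strains: int, repeats_per_strain: int) -> int:
    """Biological repeats pooled across independently generated strains."""
    return independent_strains * repeats_per_strain


# Overexpression screen: at least 12 colonies in each of 3 biological repeats.
overexpression_points = min_data_points(12, 3)

# Deletion screen: at least 2 colonies (technical repeats) in each of 3 repeats.
deletion_points_per_strain = min_data_points(2, 3)

# Illustrative case: three independent deletion strains, three repeats each.
pooled_repeats = repeats_per_lincrna(3, 3)
```

Here `overexpression_points` is 36, matching the figure quoted from the Methods, and `pooled_repeats` is 9, matching the reported median.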

      **B. Minor comments**

      - B.1. Specific experimental issues that are easily addressable.

      - The pattern of the SPNCRNA.1343 and SPNCRNA.989 mutants is consistent with the idea that these lincRNAs act in cis and that their deletion interferes with the expression of the adjacent tgp1 and atd1 genes, respectively. The authors could easily test by RT-qPCR or Northern Blot that the lincRNA deletion leads to the induction of the adjacent gene. Also, if the hypothesis of the authors is correct, the ectopic expression of these two lincRNAs in trans should not complement the phenotypes of the corresponding mutants. These experiments would reinforce the conclusion of the authors about the specific regulatory effect of the SPNCRNA.1343 and SPNCRNA.989 lincRNAs.

      Reply: It would actually not be as easy as suggested to obtain conclusive results in this respect. For SPNCRNA.1343 and its neighbour, tgp1, the mechanisms involved have already been shown in detail based on several mechanistic studies (Ard et al., 2014; Ard and Allshire, 2016; Garg et al., 2018; Shah et al., 2014; 2014; Yague-Sanz et al., 2020). But these studies required multiple precise genetic constructs and specialized approaches to interrogate the complex regulatory relationships between the overlapping transcripts, which can be both positive and negative. As correctly pointed out by Reviewer 2, we do not know the particular conditions where any cis-regulatory interactions take place, and a negative result would not be conclusive. We have interrogated our RNA-seq data obtained under multiple genetic and environmental conditions (Atkinson et al. 2018) to analyse the regulatory relationship between SPNCRNA.1343 and tgp1 (studied before) as well as SPNCRNA.989 and atd1 (proposed in our manuscript). Depending on the specific conditions, both of these gene pairs show positive or negative correlations in expression levels. So it is not possible to just perform the easy experiment as suggested to reach a clear conclusion.
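The point that a lincRNA and its neighbouring gene can correlate positively under some conditions and negatively under others can be illustrated with a plain Pearson correlation computed separately per group of conditions. This is a generic sketch, not the analysis performed on the RNA-seq data; the function name and any expression values passed to it are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two profiles of equal length (at least 2 samples)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)
```

Calling `pearson` once on samples from one set of conditions and once on samples from another can return coefficients of opposite sign for the same gene pair, which is why a single measurement in rich medium would not settle the question.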

      - Is there any possibility that some nutrient/drug/stress conditions interfere with the expression from the nmt1 promoter?

      Reply: This seems unlikely as this widely used promoter is known to be specifically regulated by thiamine. Consistent with this, we actually detected phenotypes for over 90% of the overexpression strains. But we cannot exclude the possibility that some conditions might interfere with nmt1 function.

      - Supplemental Figure 7 refers to unpublished data from Maria Rodriguez-Lopez. Is this still allowed?

      Reply: These are just control RNA-seq data from wild-type cells growing in rich medium. It does not seem that meaningful, but if required we could submit these data to the European Nucleotide Archive (ENA).

      - Supplemental Figure 8 shows drop assays to validate the growth phenotypes revealed by the screen for lincRNAs of clusters 1 and 3. As admitted by the authors in the text, in most cases, the effects are quite difficult to see to the naked eye. Did the authors consider the possibility to use growth curves (for the lincRNAs/conditions they would like to highlight), which might be more appropriate to visualize weak effects?

      Reply: We have tried a few experiments in liquid medium using our BioLector microfermentor. However, the doses need to be substantially changed for liquid media (in which cells typically are more sensitive than on solid media). So the situation with the altered conditions would become too confusing and could not be used as a direct validation of our results from solid media.

      - B.2. Are prior studies referenced appropriately?

      Yes. The authors could have cited the work of Huber et al (2016) Cell Rep. (PMID: 27292640) as another pioneer study where systematic lncRNA deletion was performed, even if in this case, these were antisense lncRNAs.

      Reply: Agreed, we now cite this paper in the Introduction (p. 4).

      - B.3. Are the text and figures clear and accurate?

      Overall, I found the text and figures clear.

      Reviewer #3 (Significance (Required)):

      Eukaryotic genomes produce thousands of long non-coding RNAs, including lincRNAs, which are expressed from intergenic regions and do not overlap PCGs. Several lincRNAs have been extensively studied and characterized, showing that they function in different cellular processes, such as regulation of gene expression, chromatin modification, etc. However, besides these well-documented lincRNAs, the function of most lincRNAs remains elusive. In addition, under the standard growth conditions used in labs, many of them are expressed at very low levels, and for the few cases in which it has been tested, the deletion and/or overexpression in trans often failed to produce a detectable phenotype.

      High throughput approaches for lncRNA functional profiling are currently emerging. The lab of Jurg Bahler recently developed a high throughput colony-based screen approach enabling them to quantitatively assay the growth and viability of fission yeast mutants under multiple conditions (Kamrad et al, eLife 2020). Here, they take advantage of this approach to characterize mutants of 150 lincRNAs in fission yeast, including not only deletion mutants generated using the CRISPR/Cas9 technology, but also overexpression mutants, tested in 149 and 47 growth conditions, respectively. This systematic approach allowed the authors to reveal specific phenotypes for a large fraction of the lincRNAs, emphasizing the fact that they are likely to be functional in particular nutrient/drug/stress conditions, acting in cis but also in trans.

      As I wrote in the summary above, I think that this study is important and constitutes a significant contribution in the lncRNA field.

      My field of expertise: long non-coding RNAs, yeast, genetics.

      **Referee Cross-commenting**

      I can see that reviewer #1 and I have raised the same concerns about the lack of insert sequencing for the overexpression plasmids, which is crucial to verify that the correct lincRNAs were cloned and that no mutation has been introduced by the PCR. We are also both asking for RT-qPCR controls to show that the lincRNAs are indeed overexpressed. Again, this control is very important as many long non-coding RNAs are rapidly degraded by the nuclear and/or cytoplasmic RNA decay machineries. So expressing a lincRNA from a plasmid, under the control of a strong promoter, does not guarantee increased RNA levels.

      I see that reviewer #2 is asking for a gametogenesis assay. I think it should be limited to the 3 lincRNAs which belong to the same sub-cluster as meiRNA.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this work from the group of Jurg Bahler, the authors take advantage of the high throughput colony-based screen approach they recently developed (Kamrad et al, eLife 2020) to perform a functional profiling analysis on a subset of 150 lincRNAs in fission yeast. Using a seamless CRISPR/Cas9-based method, they created deletion mutants for 141 lincRNAs. In addition, the authors also generated strains ectopically overexpressing 113 lincRNAs from a plasmid (under the control of the strong and inducible nmt1 promoter).

      The viability and growth of all these mutants were then assessed across benign, nutrient, drug and stress conditions (149 conditions for the deletion mutants, 47 conditions for the overexpression). For the deletion mutants, the authors also assayed in parallel mutants of 238 protein-coding genes (PCGs) covering multiple biological processes and main GO classes. In benign conditions, deletion of 5 and 10 lincRNAs resulted in a reduced growth phenotype (rich and minimal medium, respectively). Morphological characterization by microscopy also revealed cell size defects for 6 lincRNA mutants (2 shorter, 4 longer). In addition, 27 mutants displayed phenotypes pointing to defects in the cell cycle.

      Remarkably, the nutrient/drug/stress conditions revealed more phenotypes, with 60 of the 141 lincRNA mutants showing a growth phenotype in at least one condition, and 25 mutants showing a different viability compared to the wild-type in at least one condition. Also remarkable is the observation that 102/113 lincRNA overexpression strains displayed a growth phenotype in at least one condition, with 14 lincRNAs showing phenotypes in more than 10 conditions.

      The clustering analyses performed by the authors also provide functional insight for some lincRNAs. Overall, this is an important study, well conducted and well presented. Together, the data described by the authors are convincing and highlight that most lincRNAs would function in very particular conditions, and that deletion/inactivation and overexpression are complementary approaches for the functional characterization of lncRNAs. This has been demonstrated here in a very elegant manner. I think this manuscript will be acknowledged as a pioneering work in the field.

      A. Major comments

      • A.1. Are the key conclusions convincing? In my opinion, the key conclusions of this study are convincing.
      • A.2. Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No. The authors are careful in their claims and conclusions.
      • A.3. Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      This study is based on systematic lincRNA deletion/overexpression.

      • For the deletion strains, I could not find any information about the control of the deletions. Are the authors sure that the targeted lincRNAs were indeed properly deleted?
      • For the overexpression, there is only a control of the insert size by PCR. Sanger sequencing would have been preferable to confirm that the targeted lincRNAs were properly cloned, without any mutation. In addition, the authors did not check that the lincRNAs were indeed overexpressed (at least in the benign conditions). Is the overexpression fold similar for all the lincRNAs? Do the 14 lincRNAs showing the most consistent phenotypes in at least 10 conditions display different expression levels than the other lincRNAs?
      • A.4. Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.
      • Validating the deletion strains requires genomic DNA extraction and then PCR. This is repetitive and tedious, but this control is important, I think. The time needed depends on the possibility of automating the process. I think this is feasible in this lab.
      • Controlling the insert sequence into the overexpression vector requires plasmid DNA (available as it was used for PCR) and one/several primer(s), depending on the insert size. The sequencing itself is usually done by platforms.
      • Analysing lincRNA overexpression at the RNA level requires yeast cultures, RNA extraction and then RT-qPCR. Again, the time needed depends on the possibility of automating the process.
      • A.5. Are the data and the methods presented in such a way that they can be reproduced? The methods are clearly and extensively explained. If necessary, the reader can find more details about the high-throughput colony-based screen approach in the original paper (Kamrad et al, eLife 2020); very interesting technical discussions can also be found in the reviewers' reports and in the authors' response published alongside.
      • A.6. Are the experiments adequately replicated and statistical analysis adequate? The experiments are replicated. However, I feel confused regarding the number of replicates used in each analysis.

      In the first part of the Results, it is mentioned that all colony-based phenotyping was performed in at least 3 independent replicates, with a median number of 9 repeats per lincRNA. In the Methods section, I read that for the high-throughput microscopy and flow cytometry for cell-size and cell-cycle phenotypes, over 80% of the 110 lincRNA mutants screened for cellular phenotypes were assayed in at least 2 independent biological repeats. For the overexpression, I read that each strain was represented by at least 12 colonies across 3 different plates and experiments were repeated at least 3 times. Each condition was assayed in three independent biological repeats, together with control EMM2 plates, resulting in at least 36 data points per strain per condition.

      Perhaps I missed something. If not, could the authors clarify this? In addition, I suggest indicating the number of replicates used for each lincRNA/condition/assay in Supplemental Dataset 2 (I could only find the information for the Flow Cytometry) and in Supplemental Dataset 6.

      B. Minor comments

      • B.1. Specific experimental issues that are easily addressable.
      • The pattern of the SPNCRNA.1343 and SPNCRNA.989 mutants is consistent with the idea that these lincRNAs act in cis and that their deletion interferes with the expression of the adjacent tgp1 and atd1 genes, respectively. The authors could easily test by RT-qPCR or Northern Blot that the lincRNA deletion leads to the induction of the adjacent gene. Also, if the hypothesis of the authors is correct, the ectopic expression of these two lincRNAs in trans should not complement the phenotypes of the corresponding mutants. These experiments would reinforce the conclusion of the authors about the specific regulatory effect of the SPNCRNA.1343 and SPNCRNA.989 lincRNAs.
      • Is there any possibility that some nutrient/drug/stress conditions interfere with the expression from the nmt1 promoter?
      • Supplemental Figure 7 refers to unpublished data from Maria Rodriguez-Lopez. Is this still allowed?
      • Supplemental Figure 8 shows drop assays to validate the growth phenotypes revealed by the screen for lincRNAs of clusters 1 and 3. As admitted by the authors in the text, in most cases, the effects are quite difficult to see with the naked eye. Did the authors consider using growth curves (for the lincRNAs/conditions they would like to highlight), which might be more appropriate to visualize weak effects?
      • B.2. Are prior studies referenced appropriately? Yes. The authors could have cited the work of Huber et al (2016) Cell Rep. (PMID: 27292640) as another pioneer study where systematic lncRNA deletion was performed, even if in this case, these were antisense lncRNAs.
      • B.3. Are the text and figures clear and accurate? Overall, I found the text and figures clear.

      Significance

      Eukaryotic genomes produce thousands of long non-coding RNAs, including lincRNAs which are expressed from intergenic regions and do not overlap PCGs. Several lincRNAs have been extensively studied and characterized, showing that they function in different cellular processes, such as regulation of gene expression, chromatin modification, etc. However, besides these well-documented lincRNAs, the function of most lincRNAs remains elusive. In addition, under the standard growth conditions used in labs, many of them are expressed at very low levels, and in the few cases that have been tested, deletion and/or overexpression in trans often failed to produce a detectable phenotype.

      High throughput approaches for lncRNA functional profiling are currently emerging. The lab of Jurg Bahler recently developed a high throughput colony-based screen approach enabling them to quantitatively assay the growth and viability of fission yeast mutants under multiple conditions (Kamrad et al, eLife 2020). Here, they take advantage of this approach to characterize mutants of 150 lincRNAs in fission yeast, including not only deletion mutants generated using the CRISPR/Cas9 technology, but also overexpression mutants, tested in 149 and 47 growth conditions, respectively. This systematic approach allowed the authors to reveal specific phenotypes for a large fraction of the lincRNAs, emphasizing the fact that they are likely to be functional in particular nutrient/drug/stress conditions, acting in cis but also in trans. As I wrote in the summary above, I think that this study is important and constitutes a significant contribution in the lncRNA field.

      My field of expertise: long non-coding RNAs, yeast, genetics.

      Referee Cross-commenting

      I can see that reviewer #1 and I have raised the same concerns about the lack of insert sequencing for the overexpression plasmids, which is crucial to verify that the correct lincRNAs were cloned and that no mutation has been introduced by the PCR. We are also both asking for RT-qPCR controls to show that the lincRNAs are indeed overexpressed. Again, this control is very important as many long non-coding RNAs are rapidly degraded by the nuclear and/or cytoplasmic RNA decay machineries. So expressing a lincRNA from a plasmid, under the control of a strong promoter, does not guarantee increased RNA levels.

      I see that reviewer #2 is asking for a gametogenesis assay. I think it should be limited to the 3 lincRNAs which belong to the same sub-cluster as meiRNA.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The Rodriguez-Lopez manuscript from the Bahler lab presents the phenotypic and functional profiling of lincRNAs in fission yeast. This is the first large-scale, extensive work of this nature in this model organism and it therefore nicely complements the well-documented examples of lincRNAs already reported in S. pombe.

      The work is very solid, using seamless genome deletion and overexpression followed by colony-based assays in response to a very wide set of conditions.

      Major comments:

      • Considering that this is a descriptive work by nature and that the experiments were properly conducted as far as I can judge, I don't have major issues with this paper. To me, the only thing that is missing is a gametogenesis assay, for two reasons: first, several reported lincRNAs in pombe critically regulate meiosis, and second, many of the analysed lincRNAs are upregulated during meiosis. Figure 6B already points to three obvious candidates. I don't think it would take too much time to look at the deletion and OE in an h90 strain and see the effect on gametogenesis for the entire set, or at least for the 3 candidates from Figure 6. If the already broad set of lincRNAs implicated in meiosis were to grow, this would be further evidence that eukaryotic cell differentiation relies on non-coding RNAs even in simpler models.

      Minor comments:

      • A reference to the recent work of the Rougemaille lab on mamRNA is necessary.
      • A discussion of the possibility of performing large-scale genetic interaction searches (as done by Krogan for protein-coding genes) would add to the discussion of future plans.

      Significance

      The work unambiguously shows that most of the lincRNAs analyzed exert cellular functions in specific environmental or physiological contexts. This conclusion is critical because the biological relevance of this so-called « dark matter » is still debated despite a few well-established cases. This is an important addition to the field and the deep phenotyping work already points to some directions for analysing some of these lincRNAs in the context of cell cycle progression, metabolism or meiosis.

      Referee Cross-commenting

      • I agree with the issues raised by referees 1 and 3 but I am concerned about the added value of an RT-qPCR. First, this is a significant amount of work considering the large set of targets. Second, and more importantly, what you'll end up with is a fold change. What will be considered as overexpression? Which threshold? This is why I prefer a biological read-out (a phenotype) because whatever the fold change, it tells us that there is an effect. It is very likely indeed that some targets are not overexpressed because of their rapid degradation. To me, this is the drawback of any large-scale study.
      • Also, looking at the expression of the adjacent gene in the case of a cis-effect is interesting though this is likely condition-dependent (because most phenotypes appear in specific conditions). So, what would be the conclusion if there is no effect in classical rich media?
      • The sequence of the insert should be specified, I agree. Most likely, it is the sequence available from PomBase (this is what I understood) but that should indeed be clarified.
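To make the fold-change question above concrete, relative expression from RT-qPCR is commonly computed with the 2^-ΔΔCt calculation. The following is only a minimal sketch under stated assumptions: the function name, the Ct values, the use of a single reference gene and the 2-fold cut-off are all hypothetical illustrations, not values or methods taken from the manuscript, and the formula assumes roughly 100% amplification efficiency for both primer pairs.

```python
def fold_change(ct_target_oe, ct_ref_oe, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (overexpression strain vs. control) by 2^-ddCt.

    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_oe = ct_target_oe - ct_ref_oe          # normalise to the reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # same normalisation in the control
    dd_ct = d_ct_oe - d_ct_ctrl
    return 2.0 ** (-dd_ct)


# Illustrative Ct values only (not data from the study).
fc = fold_change(ct_target_oe=20.0, ct_ref_oe=18.0,
                 ct_target_ctrl=25.0, ct_ref_ctrl=18.0)
THRESHOLD = 2.0  # arbitrary cut-off, chosen here purely for illustration
print(fc, fc >= THRESHOLD)
```

Whatever cut-off is chosen remains a convention, which is the point raised above: the calculation yields a number, but not a biological definition of overexpression.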
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Rodriguez-Lopez et al describes the analysis of long intergenic non-coding RNA (lincRNA) function in fission yeast using both deletion and overexpression methods. The manuscript is very well presented and provides a wealth of lincRNA functional information for the field. This work is an important advance as there is still very little known about the function of lincRNAs in both normal and other conditions. An impressive array of conditions were assessed here. With a large scale analysis like this there is really not one specific conclusion. The authors conclude that lincRNAs exert their function in specific environmental or physiological conditions. This conclusion is not a novel conclusion, it has been proposed and shown before, but this manuscript provides the experimental proof of this concept on a large scale.

      The lincRNA knock-out library was assessed using a colony size screen, a colony viability screen and cell size and cell cycle analysis. Additionally, a lincRNA over-expression library was assessed by a colony size screen. These different functional analysis methods for lincRNAs were then carried out in a wide variety of conditions to provide a very large dataset for analysis. Overall, the presentation and analysis of the data was easy to follow and informative. Some points below could be addressed to improve the manuscript.

      There were 238 protein coding gene mutants assessed in parallel, to provide functional context, which was a very promising idea. But, unfortunately, the inclusion of 104 protein coding genes of unknown function restricted the use of the protein coding genes in the integrated analysis to connect lincRNAs to a known function using guilt by association.

      The colony viability screen is not described well throughout the manuscript. Firstly, the use of phloxine B dye to determine cell viability needs to be described better when first introduced at the bottom of page 6. What exactly is this viability screen and red colour intensity indicating? Please define what the different levels of red a colony would indicate as far as viability. I assume an increase in red colour indicates more dead cells? So it is confusing that later the output of this assay is described as giving a resistant/sensitive phenotype or higher/lower viability. How can you get a higher viability from an assay that should only detect lower viability? Shouldn't this assay range from viable (no, or low red, colour) to increasing amounts of red indicating increasingly less viability? Figure 4D is also confusing with the "red" and "white" annotations. These should be changed to "lower viability" and "viable" or "not viable" and "viable".

      How are you sure that when generating the 113 lincRNA ectopic over-expression constructs by PCR that the sequences you cloned are correct? Simply checking for "correct insert size", as stated in the methods, is not really good practice and these constructs should be fully sequenced to be sure they contain the correct sequence and that constructs have not had mutations introduced by the PCR used for cloning. Without such sequence confirmation one cannot be completely confident that the data produced is specific for a lincRNA over-expression. Additionally, a selection of strains with the overexpression constructs should be tested by qRT-PCR and compared to a non-over-expressing strain to confirm lincRNA overexpression.

      Minor comments:

      Page 4, lines 19-20 - "A substantial portion of lincRNAs are actively translated (Duncan and Mata, 2014), raising the possibility that some of them act as small proteins." This sentence does not make sense, lincRNAs can't "act as" small proteins, they can only "code for" small proteins. Wording needs to be changed here.

      Figure 1A is a nice representation but what are the grey dots? Are they all ncRNAs including lincRNAs? This needs to be stated in the legend.

      How many lincRNAs are there in total in pombe and what percentage did you delete? These numbers should be stated in the text.

      It would be nice if Supplementary Figure 1 included concentrations or amounts of the conditions used. This info is buried in a Supplementary table and would be better placed here.

      Page 6, last sentence. What is a "biological repeat"? Three distinct deletion strains (ie three different deletion strains made by CRISPR) or one deletion strain used three times?

      There is no mention in the manuscript of how other researchers can get access to the deletion strains and over-expression plasmids.

      Significance

      The production of lincRNA deletion strains and overexpression plasmids, and their analysis under an impressive number of conditions, provides key resources and data for the ncRNA field. This work complements nicely the analysis of protein coding gene deletion strains and provides the tools and data for future mechanistic studies of individual lincRNAs. This work would be of interest to the growing audience of ncRNA researchers in both yeast and other systems.

      Field of expertise: Yeast deletion strain construction and analysis, RNA functional analysis

      Referee Cross-commenting

      Reviewer #3 makes an important point that the stability of each lincRNA over expressed from plasmid is not known and therefore some lincRNAs may not be overexpressed as predicted. RT-qPCR would be required to assess lincRNA expression levels from the plasmids. It also appears that we both agree that it is important to determine the sequence of the cloned lincRNAs in the over expression plasmids.

      Reviewer #3 also makes an important point in his review that where it is predicted that a lincRNA deletion influences an adjacent gene in cis then the expression of that gene should be tested.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank all reviewers for their thorough assessment and constructive comments.

      For clarity, their comments have been numbered.

      Reviewer #1

      Evidence, reproducibility and clarity:

      Summary:

      Acetylation/deacetylation controls the G1/S transition in budding yeast. The lysine acetyl transferase Esa1 is here shown to play a role, in part via acetylation of the nuclear pore complex basket component Nup60, which stimulates mRNA export.

      Major comments:

      1 • Figure 1C: The curve for esa1-ts in this figure and the curve in the supplementary figure S2B are not similar: while the first shows 10% of cells budding after 60 minutes, it is about 50% after 60 min in S2B. Another helpful way of presenting the data could be the length of the G1 phase (from cytokinesis to budding) in the WT, esa1-ts, gcn5delta cells over time.

      We thank the reviewer for pointing this out. Indeed, there is some day-to-day variability in the budding kinetics of the temperature-sensitive esa1 mutant, and the text referred to one individual experiment. Therefore, we have changed the text to better reflect the observed variability (p. 7) and added a graph (supplementary Figure S2C) including all individual replicates. This shows that in spite of small differences between experiments, esa1-ts cells always bud slower and less efficiently than wild-type cells. We note that the data cannot be shown in the way suggested (time from cytokinesis to budding, presumably from individual cells) because cells in these experiments were released from a G1 block (after cytokinesis), and samples from cell cultures were imaged at time intervals (and not single cells over time). Time-lapse data of single cells is shown in figure 2E.

      2 • What is the rationale for creating the Nup60-KN mutation? Does it prevent acetylation of Nup60, at least by GCN5 and/or esa1?

      The biophysical properties of asparagine resemble those of acetylated lysine. Therefore, the Nup60-KN mutant (lysine 467 to asparagine) is expected to mimic acetylation of Nup60 K467, which was found to be acetylated in earlier studies. Supporting the conclusion that Nup60-KN is indeed an acetyl-mimic, the nup60-KN mutation partially rescues the Start and mRNA export defects of Esa1-deficient cells. We make the rationale of the Nup60-KN mutation clearer in the current version (p. 8).

      3 • Given the much stronger phenotype of the esa1-ts+GCN5 delta condition for G1/S transition as compared to esa1-ts and that GCN5 seems to strongly acetylate Nup60, I do not understand the sole focus on esa1 in the study. The fact that the Nup60-KN cells do not show G1/S transition under esa1-ts+GCN5 delta conditions in experiments presented in Fig. S3 argues that esa1-mediated acetylation of Nup60 is only one, probably minor, aspect of G1/S transition. This should be discussed in a more balanced way.

      We focus on Esa1 because this allows us to dissect the specific role of Nup60 acetylation and mRNA export during the G1/S transition. Of course, Esa1-dependent acetylation of Nup60 is not the only process controlling the G1/S transition, which is regulated at several levels. For example, the concentration of multiple Start activators and inhibitors scales differentially with cell size (PMID: 26390151, 32246903). In addition, daughter-specific factors inhibit Start through a pathway parallel to Nup60 deacetylation (Ace2/Ash1-dependent repression of Cln3 transcription; PMID: 19841732, 19841732). We discuss these studies in the current version (p. 17).

      As for the relative contribution of Esa1 and Gcn5 to the G1/S transition and mRNA export: both of these KATs have overlapping roles in promoting transcription, probably through distinct substrates (such as histone H2 for Gcn5, H4 for Esa1) and this may contribute to their role in Start. Consistent with this, deletion of GCN5 causes a minor delay in transcription of G1/S genes (Kishkevich, Sci. Rep 2019). On the other hand, gcn5 mutants have no detectable mRNA export defects, unlike esa1-ts (our Figure 3E). This suggests that whereas Gcn5 and Esa1 may have overlapping roles in transcription of G1/S genes, Esa1 is more specifically involved in mRNA export. The ability of Nup60-KN to rescue the single mutant esa1 but not the double gcn5 esa1 is consistent with this view: the transcription defects in the double mutant may be so severe as to prevent Start even in the presence of Nup60-KN. We have modified the discussion to mention these points. In addition, we will investigate the transcription defects of esa1 and gcn5 single and double mutants to test this possibility and include the results in a revised version.

      4 • Suppl: Fig 2: I miss the hat1delta+gcn5delta condition.

      We will include the budding index of the hat1 gcn5 double mutant in a revised version.

      Minor comments:

      5 • Figure legend 2C "at least 200 cells were scored": please state number of replicates

      Figure 2C shows RT-qPCR data. The reviewer probably means figure 1C, which shows the budding index of one experiment comparing wild type, esa1, gcn5 and esa1 gcn5 strains. This experiment was repeated 3 times, as is now mentioned in the figure 1 legend.

      6 • Figure 2E: X axis "impor" should be corrected to "import"

      We have corrected this.

      7 • Would Mex67 and/or Mtr2 overexpression rescue the esa1-ts and esa1-ts+GCN5 delta phenotype?

      We will include this experiment in a revised version.

      8 • Figure 4A: The size of the daughter cells in the hos3delta condition seems smaller as compared to esa1-ts. Is this true and can you comment on this? Is a premature onset of S phase observed here?

      Since Fig 4A features only wild type and hos3∆ cells, the reviewer is probably referring to esa1-ts cells shown in figure 4B. These two figure panels are not directly comparable: cells in 4A are freely cycling, whereas those in 4B were released from a mitotic arrest using nocodazole. The mitotic arrest was done in order to avoid potentially confounding effects due to inactivation of Esa1 during S phase. However, the arrest also causes daughter cells to grow larger, explaining the size differences pointed out by the reviewer. That being said, it is true that cell size and G1 duration are intimately linked and thus the reviewer question raises a relevant point. We previously showed that although hos3 daughter cells enter S phase prematurely, their size is not significantly different from wild type (Kumar et al., Figure 1d-g). Premature onset of S phase can lead to smaller cell size but this is not the case for hos3 cells, probably due to the slightly faster growth rate of the hos3∆ mutant relative to wild type specifically during S/G2/M phases (Kumar et al., Supplementary Fig. 1b).

      9 • Figure 4D: The still images in figure 2E and 4D do not correspond with the quantitation. E.g. in Fig 2E the esa1-ts cells show Whi5 export at t=81 min, which is unusually late according to the shown quantitation.

      We will modify Figures 2E-4D in a revised version to include cells that export Whi5 at times closer to the median.

      10 • Figure 4B: it is not clear why for the quantitation a different representation is chosen as compared to 4A. It would be better to show the nuclear intensities of mother/daughter as in Figure 4A.

      The reason for the different representation between figures 4A and 4B is that 4A depicts freely cycling cells and in 4B, cells were released from a nocodazole-induced mitotic arrest (as mentioned in our response to point 8). A mitotic arrest perturbs M/D size asymmetries, as daughter cells (but not mothers) continue growing during the arrest, leading to larger nuclear size. In addition, esa1-ts daughters are smaller than wt daughters in this condition, further complicating M/D asymmetries. We thought that in this case, a better metric for protein association with the NPC is the fluorescence intensity relative to a nuclear pore component. We agree that using different types of graphs is confusing, and therefore we have removed M/D comparisons from figure 4A and now represent these data as in figure 4B: the intensity of Sac3 relative to Nup49. Finally, a good control for these experiments is the quantification of total protein levels, which we have added for Sac3. We have also removed Mtr2-GFP data until our analysis of Mtr2 total levels is complete. We hope this simplifies this figure.

      11 • Figure 4D: To strengthen these results, it would be good to perform this assay with esa1-ts Nup60-KN cells as in figure 2a. The release of Whi5-GFP is expected to behave in a similar way to the WT. This would ensure that Nup60 acetylation is a pre-requisite for Whi5 release

      I’m afraid we don't understand this suggestion. Figure 4D shows time-lapse fluorescence microscopy of Whi5 nuclear export when Sac3 is recruited to the nuclear basket. Figure 2a shows western blots of Nup60 acetylation status. Therefore it is not clear how these two assays could be done in similar ways. Perhaps the reviewer refers to a different figure panel. The purpose of the suggested experiment, if we understand properly, is to test whether Nup60 acetylation is required for Whi5 export. This is the hypothesis tested in figure 2D: Whi5-GFP export is delayed in esa1-ts, and this delay is partially rescued in esa1-ts nup60-KN, which mimics acetylation. In fact, the advance in Whi5 export observed in Figure 4D upon Sac3 anchoring to NPC is similar to that observed in a nup60-KN (Figure 2E).

      12 • Page 13 "Finally, we tested whether Esa1 targets Sac3 to G1 nuclei": The effect of esa1 knockdown on Sac3 fits with the story line and the effect esa1 imposes on mRNA export. However, targeting of Sac3, which is part of a bigger complex, by esa1 is a misleading statement, given that you don't show proof of direct interactions, e.g. by immunoprecipitations.

      We meant to say “we tested whether Esa1 function promotes the localisation of Sac3 to the nuclear basket”. We agree that it is unknown whether this involves direct interactions between Sac3 and Esa1. We have changed the text to make this point clearer.

      13 • Page 18: "Nevertheless, our findings suggest that mammalian nucleoporins may represent a novel category of substrates for KATs and for the multiprotein complexes in which these enzymes reside, with important roles in gene expression." Given that there is little experimental evidence, this statement is too strong for my taste. Rather indicate that this is a possibility which needs to be tested...

      We have changed the text as suggested.

      14 • Page 3: "Nuclear pores are macromolecular assemblies composed of approximately 30-50 different Nucleoporins": it is rather approximately 30 different nucleoporins in the species so far analyzed.

      We have corrected this as suggested.

      Significance:

      The concept of acetylation/deacetylation regulation of G1/S transition in budding yeast is very appealing. The specific (and important) contribution of Esa1, especially in comparison to GCN5 and Hat1 remains unclear as well as its precise effect on Nup60. Clarifying this, also in a more balanced way of presentation of discussion, would be of interest for the field.

      My research centers around NPC function.

      Audience: experts in the nuclear structure/function fields and cell cycle regulation.

      A more detailed characterisation of the specific roles of Esa1, Gcn5 and Hat1 in the G1/S transition and mRNA export will be included in a revised version, as mentioned in our response to point 3.

      Reviewer #2

      Evidence, reproducibility and clarity:

      In this manuscript, Gomar-Alba et al. follow up on previous work from the lab that showed that the KDAC Hos3 is targeted to the bud neck and daughter cell nuclear pore complexes in budding yeast where it slows cell cycle progression by influencing gene positioning and nucleo-cytoplasmic transport. Overall, the current manuscript describes a well-conducted study that dissects the role of acetylation and deacetylation on Nup60 during the cell cycle using genetics and microscopy. The authors conclusively identify Esa1 as counteracting Hos3 in the nucleus (Figure 1) and show that part of their effect on cell cycle progression and gene expression is mediated by acetylation of Nup60 at K467 (Figure 2). They also demonstrate that this leads to a differential localization of several mRNA export factors and suggest that deacetylation of Nup60 blocks mRNA export in daughter cells. Although this work is overall carefully done, the last conclusion is still somewhat speculative.

      I have a number of minor suggestions to improve the manuscript, but only one major concern, which revolves around the role of chromatin tethering to NPCs. The authors have shown in their previous paper that this plays a role for CLN2 and it is known that active GAL1 interacts with the nuclear periphery, but in the current manuscript this aspect is largely disregarded although I think it could play a major role in the observed mRNA export phenotypes. Therefore, I think some additional experiments and controls as well as additional analysis are required to substantiate especially the results shown in figure 5.

      Major points:

      1) Figure 2: The authors claim that the mechanism by which Nup60 acetylation promotes cell cycle progression is the enhancement of mRNA export through the NPC. In Figure 2, the authors look at the expression levels of four candidate mRNAs which all show disturbed expression in esa1-ts which is not rescued by the nup60-KN mutation, but expression of the protein of one of these candidates (CLN2) is improved. In their previous paper, the same lab has shown that the CLN2 gene is tethered to the NPC in daughter cells with deacetylated Nup60 and that this is relieved in a Nup60 K467N mutant. I think it would be important here to investigate the protein levels of additional candidates that are not regulated at the level of gene localization. Is it a general effect that protein expression is higher in the nup60KN mutant?

      We agree this is an important point. To establish if Nup60-KN regulates only genes that interact with the NPC (such as CLN2), the reviewer suggests determining the cell cycle levels of proteins encoded by other G1/S genes that do not bind NPCs. The main problem with this approach is that with the exception of CLN2, the nuclear localisation of the (about 200) G1/S regulon genes is not yet known. In addition, establishing connections between mRNA and protein levels during the first cell cycle is only possible for short-lived proteins such as Cln2. For instance, amongst the G1/S genes shown in Figure 2, Cdc21 and Rnr1 have protein half-lives of 10 and 4 h, much longer than the 90-minute yeast cell cycle (PMID 25466257). We think a more direct approach to investigate the connection between gene position and mRNA synthesis / export would be to directly visualise the localisation of single mRNAs upon perturbation of the Nup60 acetylation pathway, using single mRNA labeling techniques (smFISH or PP7). We aim to do this for CLN2 and also for GAL1 (see point 2d of this reviewer). We will attempt these experiments for a revised version of our paper.
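
      The half-life argument above can be illustrated with a back-of-the-envelope calculation. The sketch below is only an illustration under simplifying assumptions that are not part of the response: pure first-order decay, no new synthesis, and no dilution by cell growth. The function name is hypothetical; the half-lives and cell-cycle length are the values quoted above.

```python
def fraction_remaining(half_life_min: float, elapsed_min: float) -> float:
    """Fraction of a protein pool left after elapsed_min.

    Assumes first-order decay only (no synthesis, no dilution by growth).
    """
    return 0.5 ** (elapsed_min / half_life_min)


CELL_CYCLE_MIN = 90  # cell-cycle length quoted in the response

# Half-lives quoted in the response, converted from hours to minutes.
half_lives_min = {"Cdc21": 10 * 60, "Rnr1": 4 * 60}

remaining = {
    name: fraction_remaining(t_half, CELL_CYCLE_MIN)
    for name, t_half in half_lives_min.items()
}

for name, frac in remaining.items():
    print(f"{name}: {frac:.0%} of the initial pool remains after one cycle")
```

      Under these assumptions most of each pre-existing protein pool would persist through a whole cycle, so a change in mRNA level within the first cycle would be hard to read out from total protein levels.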

      2) Figure 5: In figure 5, the authors investigate the expression of a different inducible RNA (GAL1) to test whether the observed effect on mRNA export is more general. Since this is a crucial point for generalizing the finding, this data needs to be presented in a more convincing manner.

      2a. GAL1 is known to be tethered to the NPC upon transcription. Whether this tethering is affected by the Nup60-KN mutant is unclear, but since Nup60 has been implicated in GAL1 tethering in the literature, this possibility is not unlikely. GAL1 therefore becomes a similar case to CLN2, where it is difficult to disentangle effects directly due to mRNA export from the effects of gene tethering on mRNA transcription and processing. Therefore, this experiment should be repeated with a system that is independent of gene tethering. For example, induction of the GAL promoter via a b-estradiol inducible VP16 transactivator does not seem to induce tethering.

      This is an excellent idea. We are not aware of studies on the localisation of the GAL1 locus induced by a VP16 transactivator, but this was investigated for the HXK1 gene. This subtelomeric gene localises to NPCs in non-glucose carbon sources, and its localisation is perturbed by VP16 transactivation in glucose (PMID: 16760983). We will investigate whether the same is true for GAL1, and if so, perform the suggested experiments.

      2b. The activation kinetics in all mutants analyzed is very different from the wildtype. Therefore, the quantification made in Figure 5C is difficult to interpret. Therefore, it might be more fair to quantify for the mutant strains at an earlier timepoint after activation when the levels are similar to the levels in the wildtype strain. E.g. in the hos3d strain at around 250 min.

      This is a good point - indeed, persistent mother/daughter asymmetry in GAL1 expression in hos3 and nup60-KN mutants could be masked by saturated levels of GFP at late time points. An alternative way to test this is to determine the time of GAL1 induction in mother and daughter cells. We have done this in wild-type and hos3 mutant cells; our results indicate that GAL1 expression occurs first in wild-type mothers and later in their daughters, whereas it is almost simultaneous in nup60-KN mother/daughter mutant pairs (as shown for a single M-D pair in the new figure 5A). In a revised version, we will include data on GAL1 expression for M-D pairs at different times after galactose addition for cells in figures 5C and 5E.

      2c. Similarly - although not as drastic - , in figure 5E, quantification should be done at a timepoint when the induction level is similar between DMSO and Rapamycin treated samples to make conclusions about differences between mother and daughter cell.

      We agree. See our response to the previous point.

      2d. The major claim of the paper is that mRNA export is inhibited by Nup60 deacetylation. In this figure, the mRNA levels need to be quantified to validate that it is not transcription that is affecting expression.

      We agree. In addition to regulating mRNA export (as suggested by the effect of Sac3 anchoring to NPCs) Nup60 deacetylation may also inhibit GAL1 transcription (directly, and/or indirectly via disruption of Gal1-based transcriptional feedback; PMID 23150580). To directly assess the role of Nup60 acetylation in GAL1 transcription and mRNA export, it would be ideal to determine the levels of GAL1 mRNA in both the nucleus and the cytoplasm, using smFISH and/or PP7 tools, in wild type and in mutants of the Nup60 acetylation pathway as we proposed to do for CLN2 (see our response to point 1 of this reviewer). These or equivalent experiments will be included in a revised version.

      3) The manuscript investigates in detail the effects of a KN mutant, however, a non-acetylatable mutant is not investigated. Is such a mutant viable?

      We have obtained a Nup60-KR mutant, which is predicted to behave as a non-acetylatable mimic, and it is viable. We will describe its phenotype in a revised version.

      Minor comments:

      4) Figure 2E: Is the rescue really specific to daughter cells? The dynamic range in the daughter cells is much higher due to the slower and more heterogenous timepoint of Whi5 export. However, zoom-in on the early timepoints after Whi5 import before the 30 min when 50% of the cells have exported Whi5, might reveal a significant increase of mother cells with shortened time to S phase entry. I suggest that the authors test this possibility. The cells shown in the image panels also suggest that the acetyl mimic might shorten mother cell time to S phase entry. If this is not the case, the authors might want to show a different example cell. Interestingly, it appears from the supplementary figure S5, that while Nup60 K467N partially rescues the export of Whi5, budding does not seem to be different to Nup60 wt. This appears to contradict the budding after alpha factor arrest shown in figure 2.

      We thank the reviewer for this suggestion. Indeed, zooming into the first 30 minutes shows a slight increase in the fraction of nup60-KN mother cells that export Whi5; however this change is not statistically significant when considering the entire cell population (p=0.6017, Mann-Whitney test). Therefore, we will replace the cell shown in figure 2E with a more representative example.

      As for figure S5, the reviewer is correct that in these experiments nup60-KN partially rescues Whi5 export (a marker of Start) but not budding (a downstream event), and this is indeed at variance with the experiment shown in figure 2B. Different experimental conditions may contribute to this apparent discrepancy: as noted in the text, the duration of G1 phase in cells synchronised with alpha factor is not directly comparable with that of freely cycling cells.

      5) Figure 3C: The authors use a truncated version of SAC3 for overexpression, since the full length is toxic (Figure S6A). I think it would be important to include this information in the main text.

      We agree, and have included this information in the main text.

      6) Figure 4B: Is there simply less Sac3 protein in the esa1-ts mutant? Although the authors address this question in figure S9, the very low expression levels of Sac3 may make this difficult to conclude from fluorescence quantification. A Western Blot would be an important control. The relative level of Sac3 still seems to be lower in esa1-ts daughter cells compared to mother cells, but no statistical test is shown.

      We are confident that the total Sac3-GFP levels are sufficient to make accurate comparisons, in both the nucleus and the entire cell. However, we will be happy to include western blot controls for Sac3 total levels in a revised version as the reviewer suggests. As for the levels of Sac3 in M vs D cells: Sac3 is indeed asymmetrically distributed in both wild-type and esa1-ts cells (p

      7) Analysis of mother daughter pairs (e.g. figure 5C): a paired t-test would be appropriate.

      We agree. Results do not change with this new analysis (in fact, p values are even lower for wild-type M-D pairs in figure 5C).
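
      Why pairing can lower the p values can be seen in a small sketch. The intensities below are hypothetical numbers invented for illustration, not data from the study, and the helper names are ours. The code computes only the t statistics with the Python standard library; a p value would additionally require the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t(mothers, daughters):
    """Paired t statistic and degrees of freedom for mother-daughter pairs."""
    diffs = [m - d for m, d in zip(mothers, daughters)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def welch_t(a, b):
    """Unpaired (Welch) t statistic, which ignores the pairing."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical GFP intensities (arbitrary units): pairs differ widely from each
# other, but within each pair the mother is consistently a little brighter.
mothers = [120, 340, 560, 210, 450, 300]
daughters = [100, 310, 540, 180, 430, 270]

t_paired, df = paired_t(mothers, daughters)
t_unpaired = welch_t(mothers, daughters)
print(f"paired t = {t_paired:.2f} (df = {df}), unpaired t = {t_unpaired:.2f}")
```

      Because the paired statistic is built from within-pair differences, the large pair-to-pair variability drops out, which is why a consistent mother-daughter difference gives a much larger t value than the unpaired comparison of the same numbers.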

      8) Figure 5A: Can some representative mother-daughter pairs be shown as images for both wt and mutant in the timelapse? It is difficult to see in 5A whether there are any mother daughter pairs.

      We have modified the figure to include clearly identifiable mother-daughter pairs, as requested.

      9) Figure 4C: Please show image of localization of Sac3-GFP-FRB +/- rapamycin to the NPC.

      We have added this.

      Significance:

      This manuscript describes an important advance in understanding the role of non-histone protein modification on the regulation of cell cycle progression and gene expression. It is a logical follow-up on a previous paper from the lab (Kumar et al. 2018) and beautifully builds on this work. It is to my knowledge the first mechanistic description of regulation of nuclear pore complex function by a post-translational modification. This will therefore be a very interesting paper for anyone interested in nuclear pore complex regulation and biology, non-histone protein acetylation, asymmetric cell division, and cell cycle regulation.

      Reviewer #3

      Evidence, reproducibility and clarity:

      The pre-print is dedicated to mRNA export and G1/S transition control in mother and daughter cells of budding yeasts through acetylation/deacetylation of nuclear pore component Nup60 (hsNup153). In particular, authors found that Esa1(hsTip60/KAT5) acetylates the basket nucleoporin Nup60, and this event promotes recruitment of mRNA export factors to the nuclear basket and export of polyA RNA to the cytosol. This export event promotes entry of cells into S phase; in particular, Nup60 is deacetylated by histone deacetylase Hos3 that displaces mRNA export complexes from the NPC and inhibits Start specifically in daughter cells.

      The manuscript is a well-designed and well-written study.

      Please, see my major and minor suggestions below:

      Major comments:

      1. P4-5. "deacetylation of the nuclear basket nucleoporin Nup60 does not affect Whi5 nuclear accumulation". I was confused by this statement because, in the previous article Kumar et al., 2018, both main text and abstract have the following phrase "nuclear basket and central channel nucleoporins establish daughter-cell-specific nuclear accumulation of the transcriptional repressor Whi5.." Could you please address this discrepancy?

      Thank you for pointing this out. We should have written: “deacetylation of Nup60 does not strongly affect Whi5 nuclear accumulation”. The Kumar et al. paper shows that deacetylation of central channel nucleoporins (such as Nup49) is important to increase accumulation of Whi5 in daughter cells, whereas deacetylation of the basket nucleoporin Nup60 plays a relatively minor role (see Kumar et al, Figure 7c). We have corrected this in the main text.

      2. Fig.2A: In addition to increased Nup60 acetylation, I noticed an overall increased level of Nup60 after overexpression of Esa1 and Gcn5. Is it a statistically significant increase in the Nup60 level? It is not mentioned in the main text or figure legend. Does the acetylation level of Nup60 influence its stability?

      We don’t know if acetylation of Nup60 affects its stability, although it is an intriguing possibility. Although it is true that Nup60 levels in the IP fraction seem to increase upon Esa1 and Gcn5 overexpression, nuclear levels of Nup60-mCherry are similar in wild-type, hos3∆ and nup60-KN cells (Supplementary Figure S11A). Therefore it is unlikely that changes in Nup60 acetylation affect its stability. We have added this information to the text.

      3. Authors determined the mRNA level of four representative genes in esa1-ts and esa1-ts nup60-KN cultures.

      3a. Do authors know if Nup60-KN expression affects the perinuclear positioning of these transcripts?

      We did not investigate the localisation of individual transcripts in this study. However, as mentioned in our replies to reviewer 2, we propose to do so for the CLN2 and GAL1 mRNAs, in order to test directly the effect of Nup60 acetylation on the positioning of specific mRNAs.

      3b.I also suggest authors investigate if Nup60-KN affects other transcripts using the RNAseq approach. Nup60-KN might improve the transcription output of other transcripts and it will be interesting to know if these transcripts share similar features.

      We agree that investigating the impact of Nup60 acetylation on mRNA synthesis genome-wide is an exciting challenge. We speculate that Nup60-KN is likely to have some effect on transcription, either directly or indirectly through perturbation of feedback regulatory loops caused by mRNA export defects (for instance, transcription of both CLN2 and GAL1 is regulated by positive feedback). However, we think that these experiments are beyond the scope of our study, which is focused on mRNA export.

      3c. Do authors know if GAL1pr:HOS3-NLS expression affects specifically G1-dependent transcripts?

      Answering this question would require RNA sequencing experiments. As mentioned in the previous point, we think these are beyond the scope of our study. That being said, it is likely that the Hos3-Nup60 pathway downregulates gene expression during G1, because Nup60 deacetylation is largely restricted to this phase. Note that this is not the same as regulating expression of the G1/S regulon specifically, because Hos3 also regulates GAL1 expression (Figure 5). We mention this important point in the discussion (p. 17).

      3d. Another interesting question will be to define if there is a group of transcripts that respond specifically to the status of Nup60 acetylation during G1/S transition. Is it possible to make ts-driven Nup60-KN expression to turn in ON/OFF? However, this question is beyond the scope of this paper.

      Thank you for this interesting suggestion. The proposed experiment is technically possible (for example, expression of Nup60-KN could be induced in G1 using a GAL1 promoter, followed by RNA sequencing). We agree that this is beyond the scope of our paper but would like to explore the question in future studies.

      4. Fig.2D It is not mentioned that Cln2 is not cycling anymore upon Nup60-KN overexpression.

      The Cln2 protein peaks at 30 minutes in this experiment, and is degraded at approximately 120 minutes. This corresponds to the slow, incomplete G1/S transition wave of the esa1-ts nup60-KN mutant, as indicated in the budding index at the bottom of the panel. We added this in the figure 2 legend. Note that Nup60-KN is not overexpressed, since the KN mutation is inserted in the endogenous gene under the control of its native promoter.

      5. Fig.2E. Arrows indicating Whi5 export timing do not match to the numbers in the main text. For example, yellow arrows indicate Whi5 export in wt strain at 30 and 78 min, but it is stated 15 and 59 min in the text. Also, do I understand right that Whi5-mCherry is not visible in the cytosol?

      See our reply to reviewer 2, point 4: we will replace the cell shown in figure 2E with a more representative example. As for Whi5-mCherry, it is visible in the cytoplasm but only weakly (since it is diluted into the larger cytoplasmic volume), and not at all in the images shown due to the overlay with the brightfield channel.

      6. Did the authors analyze where SAC3 and MTR2 are localized in hos3del, Nup60KN, and Esa-ts strains once their localization was affected in the nucleus? Is the overall Sac3 level affected in hos3del and Nup60KN strains?

      We have imaged the localisation of Sac3-GFP and Mtr2-GFP during the whole cycle using time-lapse microscopy. Our impression is that in wild type cells, their perinuclear levels increase during S phase in daughter cells, which mirrors the increase in Nup60 acetylation. In contrast, Sac3 and Mtr2 perinuclear levels seem more stable in hos3 and nup60-KN cells. We will include these analyses in a revised version. The total level of Sac3 is not affected, as shown in the updated figure 4; see our reply to reviewer 2, point 6.

      7. Fig4C. "Sac3-GFP-FRB partitioned equally to M and D nuclei, in the presence of Nup60-mCherry-FKBP and rapamycin (Figure 4C)." Sac3-GFP-FRB is slightly elevated in mother cells. Did you run a statistical test between the first and the third column on the box plot?

      Comparing the first and third columns in Fig 4C (Nup60 and Sac3 in control cells) shows that the mother cell accumulation is higher for Sac3 than for Nup60 (p

      8. P15. "GAL1 expression levels were higher in wild-type mother cells than in their daughter, and these differences were absent in cells lacking Hos3 or expressing Nup60KN". GAL1-10 promoter contains information necessary and sufficient for recruitment to the nuclear periphery (PMID: 27489341). I wonder if GAL1pr-driven transgenes of HOS3, spt10, hat1, and etc., contain DNA sequences sufficient for targeting genes to the nuclear periphery, and these genes are asymmetrically expressed in mother and daughter cells because of the presence of GAL1pr?

      We agree that these genes may be expressed at different levels in mother and daughter cells. We do not think this asymmetric expression affects our conclusions, because the phenotypes scored (growth on plates) apply to the population and not to individual cells. The one exception is figure 3D, in which mRNA nuclear accumulation is scored in single cells. Here, it remains possible that some of the variability observed corresponds to differences between mothers and daughters, in which case our measurements could underestimate the inhibitory effect of Hos3-NLS on mRNA export. However, since we cannot differentiate M and D cells in this experiment, we prefer not to speculate on this possibility in the text.

      Minor comments:

      1. Supplementary Fig. S1, it will be easy to read cell viability assays if 1A, S1A and S1B figures have the same orientation.

      We have changed the figure as suggested.

      2. Could you please clarify the difference between HOS3-NLS and GAL1pr:HOS3-NLS in the text of figure legend? P.33

      We have fixed this (figure 1 legend).

      3. P6. I recommend adding the following sentence to help clarity of the text: "To understand how NPC acetylation regulates the G1/S transition (Start), we sought to identify the lysine acetyl-transferases (KATs) counteracting the activity of the Hos3 deacetylase. Hos3 displays asymmetric distribution between mother and daughter cells in wild type Saccharomyces cerevisiae. Overexpression of a version of Hos3 fused to a nuclear localization signal (GAL1pr-HOS3-NLS) leads to targeting of Hos3 to mother and daughter cell nuclei, deacetylation of nucleoporins, and inhibition of cell proliferation (Kumar et al, 2018)."

      We thank the reviewer for this suggestion. This has been added.

      4. P8. Misspelling: Though Nup60 acetylation

      This has been fixed.

      5. FigS7. Description of polyA distribution is missing for single gcn5del strain.

      Thank you for pointing this out. This has been added.

      6. Misspelling: We conclude that Esa1 and Nup60 acetylation promotes Start, at least in part, by targeting Sac3 to the nuclear basket, where it mediates mRNA export.

      This has been fixed.

      Significance

      Authors of this pre-print overview and try to resolve a fundamental and not well-studied question about NPC acetylation status and S phase entry. This work is a logical extension of their previously published work (PMID: 29531309). However, this study for the first-time links status of NPC acetylation to mRNA export through lysine acetyl transferases. It will be interesting to address this question in mammalian cells considering interaction of basket nucleoporins with Tip60/KAT5 (PMID: 24302573).

      This work might be of interest to researchers investigating RNA export, transcription regulation, and nuclear pores.

      My fields of expertise are RNA export, nucleoporins, transcription regulation.

      I do not have expertise to evaluate yeast strains used in this study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The pre-print is dedicated to mRNA export and G1/S transition control in mother and daughter cells of budding yeasts through acetylation/deacetylation of nuclear pore component Nup60 (hsNup153). In particular, authors found that Esa1(hsTip60/KAT5) acetylates the basket nucleoporin Nup60, and this event promotes recruitment of mRNA export factors to the nuclear basket and export of polyA RNA to the cytosol. This export event promotes entry of cells into S phase; in particular, Nup60 is deacetylated by histone deacetylase Hos3 that displaces mRNA export complexes from the NPC and inhibits Start specifically in daughter cells.

      The manuscript is a well-designed and well-written study.

      Please, see my major and minor suggestions below:

      Major comments:

      1. P4-5. "deacetylation of the nuclear basket nucleoporin Nup60 does not affect Whi5 nuclear accumulation". I was confused by this statement because, in the previous article Kumar et al., 2018, both main text and abstract have the following phrase "nuclear basket and central channel nucleoporins establish daughter-cell-specific nuclear accumulation of the transcriptional repressor Whi5.." Could you please address this discrepancy?
      2. Fig.2A: In addition to increased Nup60 acetylation, I noticed an overall increased level of Nup60 after overexpression of Esa1 and Gcn5. Is it a statistically significant increase in the Nup60 level? It is not mentioned in the main text or figure legend. Does the acetylation level of Nup60 influence its stability?
      3. Authors determined the mRNA level of four representative genes in esa1-ts and esa1-ts nup60-KN cultures. Do authors know if Nup60-KN expression affects the perinuclear positioning of these transcripts? I also suggest authors investigate if Nup60-KN affects other transcripts using the RNAseq approach. Nup60-KN might improve the transcription output of other transcripts and it will be interesting to know if these transcripts share similar features. Do authors know if GAL1pr:HOS3-NLS expression affects specifically G1-dependent transcripts?

      Another interesting question will be to define if there is a group of transcripts that respond specifically to the status of Nup60 acetylation during G1/S transition. Is it possible to make ts-driven Nup60-KN expression to turn in ON/OFF? However, this question is beyond the scope of this paper.

      4. Fig.2D It is not mentioned that Cln2 is not cycling anymore upon Nup60-KN overexpression.
      5. Fig.2E. Arrows indicating Whi5 export timing do not match to the numbers in the main text. For example, yellow arrows indicate Whi5 export in wt strain at 30 and 78 min, but it is stated 15 and 59 min in the text. Also, do I understand right that Whi5-mCherry is not visible in the cytosol?
      6. Did the authors analyze where SAC3 and MTR2 are localized in hos3del, Nup60KN, and Esa-ts strains once their localization was affected in the nucleus? Is the overall Sac3 level affected in hos3del and Nup60KN strains?
      7. Fig4C. "Sac3-GFP-FRB partitioned equally to M and D nuclei, in the presence of Nup60-mCherry-FKBP and rapamycin (Figure 4C)." Sac3-GFP-FRB is slightly elevated in mother cells. Did you run a statistical test between the first and the third column on the box plot?
      8. P15. "GAL1 expression levels were higher in wild-type mother cells than in their daughter, and these differences were absent in cells lacking Hos3 or expressing Nup60KN". GAL1-10 promoter contains information necessary and sufficient for recruitment to the nuclear periphery (PMID: 27489341). I wonder if GAL1pr-driven transgenes of HOS3, spt10, hat1, and etc., contain DNA sequences sufficient for targeting genes to the nuclear periphery, and these genes are asymmetrically expressed in mother and daughter cells because of the presence of GAL1pr?

      Minor comments:

      1. Supplementary Fig. S1, it will be easy to read cell viability assays if 1A, S1A and S1B figures have the same orientation.
      2. Could you please clarify the difference between HOS3-NLS and GAL1pr:HOS3-NLS in the text of figure legend? P.33
      3. P6. I recommend adding the following sentence to help clarity of the text: "To understand how NPC acetylation regulates the G1/S transition (Start), we sought to identify the lysine acetyl-transferases (KATs) counteracting the activity of the Hos3 deacetylase. Hos3 displays asymmetric distribution between mother and daughter cells in wild type Saccharomyces cerevisiae. Overexpression of a version of Hos3 fused to a nuclear localization signal (GAL1pr-HOS3-NLS) leads to targeting of Hos3 to mother and daughter cell nuclei, deacetylation of nucleoporins, and inhibition of cell proliferation (Kumar et al, 2018)."
      4. P8. Misspelling: Though Nup60 acetylation
      5. FigS7. Description of polyA distribution is missing for single gcn5del strain.
      6. Misspelling: We conclude that Esa1 and Nup60 acetylation promotes Start, at least in part, by targeting Sac3 to the nuclear basket, where it mediates mRNA export.

      Significance

      Authors of this pre-print overview and try to resolve a fundamental and not well-studied question about NPC acetylation status and S phase entry. This work is a logical extension of their previously published work (PMID: 29531309). However, this study for the first-time links status of NPC acetylation to mRNA export through lysine acetyl transferases. It will be interesting to address this question in mammalian cells considering interaction of basket nucleoporins with Tip60/KAT5 (PMID: 24302573).

      This work might be of interest to researchers investigating RNA export, transcription regulation, and nuclear pores.

      My fields of expertise are RNA export, nucleoporins, transcription regulation.

      I do not have expertise to evaluate yeast strains used in this study.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Gomar-Alba et al. follow up on previous work from the lab that showed that the KDAC Hos3 is targeted to the bud neck and daughter cell nuclear pore complexes in budding yeast where it slows cell cycle progression by influencing gene positioning and nucleo-cytoplasmic transport. Overall, the current manuscript describes a well-conducted study that dissects the role of acetylation and deacetylation on Nup60 during the cell cycle using genetics and microscopy. The authors conclusively identify Esa1 as counteracting Hos3 in the nucleus (Figure 1) and show that part of their effect on cell cycle progression and gene expression is mediated by acetylation of Nup60 at K467 (Figure 2). They also demonstrate that this leads to a differential localization of several mRNA export factors and suggest that deacetylation of Nup60 blocks mRNA export in daughter cells. Although this work is overall carefully done, the last conclusion is still somewhat speculative.

      I have a number of minor suggestions to improve the manuscript, but only one major concern, which revolves around the role of chromatin tethering to NPCs. The authors have shown in their previous paper that this plays a role for CLN2 and it is known that active GAL1 interacts with the nuclear periphery, but in the current manuscript this aspect is largely disregarded although I think it could play a major role in the observed mRNA export phenotypes. Therefore, I think some additional experiments and controls as well as additional analysis are required to substantiate especially the results shown in figure 5.

      Major points:

      1) Figure 2: The authors claim that the mechanism by which Nup60 acetylation promotes cell cycle progression is the enhancement of mRNA export through the NPC. In Figure 2, the authors look at the expression levels of four candidate mRNAs which all show disturbed expression in esa1-ts which is not rescued by the nup60-KN mutation, but expression of the protein of one of these candidates (CLN2) is improved. In their previous paper, the same lab has shown that the CLN2 gene is tethered to the NPC in daughter cells with deacetylated Nup60 and that this is relieved in a Nup60 K467N mutant. I think it would be important here to investigate the protein levels of additional candidates that are not regulated at the level of gene localization. Is it a general effect that protein expression is higher in the nup60KN mutant?

      2) Figure 5: In figure 5, the authors investigate the expression of a different inducible RNA (GAL1) to test whether the observed effect on mRNA export is more general. Since this is a crucial point for generalizing the finding, this data needs to be presented in a more convincing manner.

      a. GAL1 is known to be tethered to the NPC upon transcription. Whether this tethering is affected by the Nup60-KN mutant is unclear, but since Nup60 has been implicated in GAL1 tethering in the literature, this possibility is not unlikely. GAL1 therefore becomes a similar case to CLN2, where it is difficult to disentangle effects directly due to mRNA export from the effects of gene tethering on mRNA transcription and processing. Therefore, this experiment should be repeated with a system that is independent of gene tethering. For example, induction of the GAL promoter via a β-estradiol-inducible VP16 transactivator does not seem to induce tethering.

      b. The activation kinetics in all mutants analyzed are very different from the wildtype, so the quantification made in Figure 5C is difficult to interpret. It might be fairer to quantify the mutant strains at an earlier timepoint after activation, when the levels are similar to the levels in the wildtype strain, e.g. at around 250 min in the hos3d strain.

      c. Similarly, although the difference is less drastic, the quantification in figure 5E should be done at a timepoint when the induction level is similar between DMSO- and rapamycin-treated samples, in order to draw conclusions about differences between mother and daughter cells.

      d. The major claim of the paper is that mRNA export is inhibited by Nup60 deacetylation. In this figure, the mRNA levels need to be quantified to validate that it is not transcription that is affecting expression.

      3) The manuscript investigates in detail the effects of a KN mutant, however, a non-acetylatable mutant is not investigated. Is such a mutant viable?

      Minor comments:

      4) Figure 2E: Is the rescue really specific to daughter cells? The dynamic range in the daughter cells is much higher due to the slower and more heterogeneous timing of Whi5 export. However, zooming in on the early timepoints after Whi5 import, before the 30 min mark at which 50% of the cells have exported Whi5, might reveal a significant increase in mother cells with a shortened time to S phase entry. I suggest that the authors test this possibility. The cells shown in the image panels also suggest that the acetyl mimic might shorten mother cell time to S phase entry. If this is not the case, the authors might want to show a different example cell. Interestingly, it appears from supplementary figure S5 that, while Nup60 K467N partially rescues the export of Whi5, budding does not seem to differ from Nup60 wt. This appears to contradict the budding after alpha factor arrest shown in figure 2.

      5) Figure 3C: The authors use a truncated version of SAC3 for overexpression, since the full length is toxic (Figure S6A). I think it would be important to include this information in the main text.

      6) Figure 4B: Is there simply less Sac3 protein in the esa1-ts mutant? Although the authors address this question in figure S9, the very low expression levels of Sac3 may make this difficult to conclude from fluorescence quantification. A Western Blot would be an important control. The relative level of Sac3 still seems to be lower in esa1-ts daughter cells compared to mother cells, but no statistical test is shown.

      7) Analysis of mother daughter pairs (e.g. figure 5C): a paired t-test would be appropriate.

      8) Figure 5A: Can some representative mother-daughter pairs be shown as images for both wt and mutant in the timelapse? It is difficult to see in 5A whether there are any mother daughter pairs.

      9) Figure 4C: Please show image of localization of Sac3-GFP-FRB +/- rapamycin to the NPC.
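      The paired test suggested in point 7 compares each daughter with its own mother, so it works on the per-pair differences rather than on two independent groups. The following is a minimal sketch of that calculation, not the authors' analysis: the function name and the intensity values are hypothetical, and only the t statistic and degrees of freedom are computed (a p-value would still need a t-distribution lookup).

```python
import math
from statistics import mean, stdev


def paired_t(mother, daughter):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    Each index is assumed to hold one mother-daughter pair.
    """
    if len(mother) != len(daughter) or len(mother) < 2:
        raise ValueError("need at least two complete mother-daughter pairs")
    # The test is run on the within-pair differences.
    diffs = [d - m for m, d in zip(mother, daughter)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t_stat = mean(diffs) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical nuclear intensities for five mother-daughter pairs.
mother = [1.00, 1.20, 0.90, 1.10, 1.05]
daughter = [0.80, 0.95, 0.70, 0.85, 0.90]
t_stat, dof = paired_t(mother, daughter)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      Pairing removes the cell-to-cell variation shared by a mother and its daughter, which an unpaired test would count as noise.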

      Significance

      This manuscript describes an important advance in understanding the role of non-histone protein modification on the regulation of cell cycle progression and gene expression. It is a logical follow-up on a previous paper from the lab (Kumar et al. 2018) and beautifully builds on this work. It is to my knowledge the first mechanistic description of regulation of nuclear pore complex function by a post-translational modification. This will therefore be a very interesting paper for anyone interested in nuclear pore complex regulation and biology, non-histone protein acetylation, asymmetric cell division, and cell cycle regulation.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Acetylation/deacetylation controls the G1/S transition in budding yeast. The lysine acetyltransferase Esa1 is here shown to play a role, in part via acetylation of the nuclear pore complex basket component Nup60, which stimulates mRNA export.

      Major comments:

      • Figure 1C: The curve for esa1-ts in this figure and the curve in supplementary figure S2B are not similar: the first shows 10% of cells budding after 60 minutes, whereas it is about 50% after 60 min in S2B. Another helpful way of presenting the data could be the length of the G1 phase (from cytokinesis to budding) in the WT, esa1-ts, and gcn5delta cells over time.

      • What is the rationale for creating the Nup60-KN mutation? Does it prevent acetylation of Nup60, at least by GCN5 and/or esa1?

      • Given the much stronger G1/S transition phenotype of the esa1-ts+GCN5 delta condition as compared to esa1-ts, and that GCN5 seems to strongly acetylate Nup60, I do not understand the sole focus on esa1 in the study. The fact that the Nup60-KN cells do not show G1/S transition under esa1-ts+GCN5 delta conditions in the experiments presented in Fig. S3 argues that esa1-mediated acetylation of Nup60 is only one, probably minor, aspect of the G1/S transition. This should be discussed in a much more balanced way.

      • Suppl. Fig. 2: The hat1delta+gcn5delta condition is missing.

      Minor comments:

      • Figure legend 2C "at least 200 cells were scored": please state number of replicates

      • Figure 2E: X axis "impor" should be corrected to "import"

      • Would Mex67 and/or Mrt2 overexpression rescue the esa1-ts and esa1-ts+GCN5 delta phenotype?

      • Figure 4A: The daughter cells in the hos3delta condition seem smaller than those in esa1-ts. Is this true, and can you comment on this? Is a premature onset of S phase observed here?

      • Figure 4D: The still images in figures 2E and 4D do not correspond with the quantitation. E.g. in Fig 2E the esa1-ts cell shows Whi5 export at t=81 min, which according to the quantitation shown is unusually late.

      • Figure 4B: it is not clear why for the quantitation a different representation is chosen as compared to 4A. It would be better to show the nuclear intensities of mother/daughter as in Figure 4A.

      • Figure 4D: To strengthen these results, it would be good to perform this assay with esa1-ts Nup60-KN cells as in figure 2A. The release of Whi5-GFP is expected to behave in a similar way to the WT. This would ensure that Nup60 acetylation is a prerequisite for Whi5 release.

      • Page 13 "Finally, we tested whether Esa1 targets Sac3 to G1 nuclei": The effect of esa1 knockdown on Sac3 fits the storyline and the effect esa1 imposes on mRNA export. However, saying that esa1 targets Sac3, which is part of a bigger complex, is misleading, given that you show no proof of direct interaction, e.g. by immunoprecipitation.

      • Page 18: "Nevertheless, our findings suggest that mammalian nucleoporins may represent a novel category of substrates for KATs and for the multiprotein complexes in which these enzymes reside, with important roles in gene expression." Given that there is little experimental evidence this statement is for my taste too strong. Rather indicate that this is a possibility which needs to be tested...

      • Page 3: "Nuclear pores are macromolecular assemblies composed of approximately 30-50 different Nucleoporins": it is rather approximately 30 different nucleoporins in the species so far analyzed.

      Significance

      The concept of acetylation/deacetylation regulation of the G1/S transition in budding yeast is very appealing. The specific (and important) contribution of Esa1, especially in comparison to GCN5 and Hat1, remains unclear, as does its precise effect on Nup60. Clarifying this, together with a more balanced presentation and discussion, would be of interest for the field.

      My research centers around NPC function.

      Audience: experts in the nuclear structure/function fields and cell cycle regulation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      I found this an exceptionally impressive manuscript. Studying the evolution of Y chromosomes has until recently been nearly impossible, and this research group have pioneered approaches that can yield reliable results in Drosophila. The study used an innovative heterochromatin-sensitive assembly pipeline on three D. simulans clade species, D. simulans, D. mauritiana and D. sechellia, which diverged less than 250 KYA, allowing comparisons with the group's previous results for the D. melanogaster Y.

      The study is both technically impressive and extremely interesting (a highly unusual combination). It includes a rich set of interesting results about these genome regions, and furthermore the results are discussed in a well-organised way, relating both to previous observations and to understanding of the genetics and evolution of Y chromosomes, illuminating all these aspects. It is a rare pleasure to read such a study. I believe that this study will inspire and be a model for future work on these chromosomes. It shows how these difficult genome regions can be studied.

      Thank you for the positive evaluation of our paper. While we did not make any specific revisions in response to these comments, we did attempt to improve the writing.

      **Major comments:**

      The conclusions are convincing. The methods are explained unusually clearly, and the reasoning from the results is convincing. When appropriate, the caveats are clearly explained. The material is clearly organised and the questions studied are well related to the results. I had a few minor comments concerning the English. Even the figures (often a major problem to understand) are very clear and helpful, with proper explanations. I have very rarely read such a good manuscript, and almost never (in a long career) found a manuscript that could be published without revision being necessary.

      Thank you for pointing out that there were minor concerns with the English. We have carefully gone through the manuscript and fixed some minor issues with the writing.

      The analysis found 58 exons missed in previous assemblies (as well as all previously known exons of the 11 canonical Y-linked genes, which are present in at least one copy across the group). FISH on mitotic chromosomes using probes for 12 Y-linked sequences was used to determine the centromere locations, and to determine gene orders and relate them to the cytological chromosome bands, demonstrating changes in satellite distribution, gene order, and centromere positions between the Y chromosomes of the D. simulans clade species. It also confirmed previous results for Y-linked ribosomal DNA genes, which are responsible for X-Y pairing in D. melanogaster males. Although 28S rDNA has been lost in D. simulans and D. sechellia (but not in D. mauritiana), the intergenic spacer (IGS) repeats between these repeats are retained on both sex chromosomes in all three species. Only sequencing can reliably reveal this, as their abundance is below the detection level by FISH in D. sechellia. The 11 canonical Y-linked genes' copy numbers vary between the species. Some duplicates are expressed and have complete open reading frames, and may therefore be functional, but most include only a subset of exons, often with duplicated exons flanking the presumed functional gene copy. Mega-introns and Y-loops were found, as already seen in Drosophila species, but this new study detects turnover in the ~2 million years separating D. melanogaster and the D. simulans clade. 49 independent duplications onto the Y chromosome were detected, including 8 not previously detected. At least half show no expression in testes, or lack open reading frames, so they are probably pseudogenes.
Testis-expressed genes may be especially likely to duplicate into the Y chromosome due to its open chromatin structure and transcriptional activity during spermatogenesis, and indeed most of the new Y-linked genes in the clade studied have likely functions in chromatin modification, cell division, and sexual reproduction. The study discovered two new gene families that have undergone amplification on D. simulans clade Y chromosomes, reaching very high copy numbers (36-146). Both these families appear to encode functional protein-coding genes and show high expression. The paper describes intriguing results that illuminate Y chromosome evolution. First, SRPK arose by an autosome-to-Y duplication of the sequence encoding the testis-specific isoform of the gene SR Protein Kinase (SRPK), after which the autosomal copy lost its testis-specific exon via a deletion. In D. melanogaster, SRPK is essential for both male and female reproduction, so the relocation of the testis-specific isoform to the Y chromosome in the D. simulans clade suggests that the change may have been advantageous by resolving sexual antagonism. The paper presents convincing evidence that the Y copy evolved under positive selection, and that gene amplification may confer advantageous increased expression in males. The second amplified gene family is also potentially related to an interesting function. Both X-linked and Y-linked duplicates are found of a gene called Ssl located on chromosome 2R. In D. simulans, the X-linked copies were previously known, and called CK2ßtes-like. In D. melanogaster, degenerated Y-linked copies are also found, with little or no expression, contrasting with complete open reading frames and high expression in testes in the D. simulans clade species, consistent with the possibility of an arms race between sex chromosome meiotic drive factors.
Other interesting analyses document higher gene conversion rates compared to the other chromosomes, and evidence that these Y chromosomes may differ in the DNA-repair mechanisms (preferentially using MMEJ instead of NHEJ), perhaps contributing to their high rates of intrachromosomal duplication and structural rearrangements. The authors relate this to evidence for turnover of Y-linked satellite sequences, with the discovery of five new Y-linked satellites, whose locations were validated using FISH. The study also documented enrichment of LTR retrotransposons on the D. simulans clade Y chromosomes relative to the rest of the genome, together with turnovers between the species.

      Reviewer #1 (Significance (Required)):

      As described above, the advances are both technical and conceptual for the field. The manuscript itself does an excellent job of placing the work in the context of the existing literature.

      • Anyone working on sex chromosomes and other non-recombining genome regions should be interested in the findings reported.

      • My field of expertise is the evolution of sex chromosomes, and the evolution of genome regions with suppressed recombination. I have experience of genomic analyses. I have less expertise in analyses of gene expression, but I understand enough about such approaches to evaluate the parts of this study that use them.

      Reviewer #2:

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes a thorough investigation of the Y chromosomes of three very closely related Drosophila species (D. simulans, D. sechellia, and D. mauritiana) which in turn are closely related to D. melanogaster. The D. melanogaster Y was analysed in a previous paper by the same group. The authors found an astonishing level of structural rearrangements (gene order, copy number, etc.), especially taking into account the short divergence time among the three species (~250 thousand years). They also suggest an explanation for this fast evolution: the Y chromosome is haploid, and hence double-strand breaks cannot be repaired by homologous recombination. Instead, it must use the less precise mechanisms of NHEJ and MMEJ. They also provide circumstantial evidence that MMEJ (which is very prone to generate large rearrangements) is the preferred mechanism of repair. As far as I know this hypothesis is new, and fits nicely with the fast structural evolution described by the authors. Finally, the authors describe two intriguing Y-linked gene families in D. simulans (Lhk and CK2ßtes-Y), one of them similar to the Stellate / Suppressor of Stellate system of D. melanogaster, which seems to be evolving as part of an X-Y meiotic drive arms race. Overall, it is a very nice piece of work. I have four criticisms that, in my opinion, should be addressed before acceptance.

      Thank you for your positive comments. We respond to your concerns point-by-point below.

      The suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ) should be better supported and explained. At line 387, the authors stated "The pattern of excess large deletions is shared in the three D. simulans clade species Y chromosomes, but is not obvious in D. melanogaster (Fig 6B). However, because all D. melanogaster Y-linked indels in our analyses are from copies of a single pseudogene (CR43975), it is difficult to compare to the larger samples in the simulans clade species (duplicates from 16 genes)." Given that D. melanogaster has many Y-linked pseudogenes (described by the authors and by other researchers, and listed in Table S6), there seems to be no reason to use a sample size of 1 in this species.

      We only used pseudogenes with large alignable regions (>300 bp) to prevent the potential bias toward small indels and increase our confidence in indel calling. As a result, we excluded most of the duplicates on the D. melanogaster Y chromosome. We now include 5 additional D. melanogaster Y-linked indels in the manuscript, however, the majority of indels in this species (36/41) are still from the same gene.

      Furthermore, given that D. melanogaster is THE model organism, it is the species that most likely will provide information to assess the "preferential MMEJ" hypothesis proposed by the authors.

      A previous paper has shown that male flies deficient in MMEJ have a strong bias toward female offspring (McKee et al. 2000), suggesting that MMEJ is necessary for successfully producing Y-bearing sperm, consistent with our hypothesis. We agree with the reviewer that careful genetic and cytological experiments in D. melanogaster could further clarify the role of MMEJ in the repair of Y-linked mutations. Even more revealing would be experiments using the simulans clade species, where we hypothesize the MMEJ bias is even more pronounced on the Y chromosome. We believe, however, that these experiments are beyond the scope of this study and should merit their own papers.

      Still on the suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ). The Y chromosome is heterochromatic, haploid and non-recombining. In order to ascribe its mutational pattern to the haploid state (and the consequent impossibility of homologous recombination repair), the authors compared it to chromosome IV (the so-called "dot chromosome"). This may not be the best choice: while chr IV lacks recombination in wild type flies, it is not typical heterochromatin. E.g., "results from genetic analyses, genomic studies, and biochemical investigations have revealed the dot chromosome to be unique, having a mixture of characteristics of euchromatin and of constitutive heterochromatin" (Riddle and Elgin, FlyBook 2018; https://doi.org/10.1534/genetics.118.301146). Given this, it seems appropriate to also compare the Y-linked pseudogenes with those from typical heterochromatin. In Drosophila, these are the regions around the centromeres ("centric heterochromatin"). There are pseudogenes there; e.g., the gene rolled is known to have partially duplicated exons.

      Thank you for the suggestion. We now include the data from pericentric heterochromatin and pseudogenes in supplemental data (see Fig 7). Both data types support our conclusion that indel size is only larger on Y chromosomes, which is consistent with the comparison between the dot chromosome and pericentric heterochromatin reported by Blumenstiel et al. 2002.

      In some passages of the ms there seems to be a confusion between new genes and pseudogenes, which should be corrected. For example, in line 261: "Most new Y-linked genes in D. melanogaster and the D. simulans clade have presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7)". Which are these "new genes"? If they are those listed in Table S6 (as other passages of the text suggest), most if not all of them are pseudogenes. If they are pseudogenes, it is not appropriate to refer to them as "new genes". The same ambiguity is present in line 263: "Y-linked duplicates of genes with these functions may be selectively beneficial, but a duplication bias could also contribute to this enrichment (...)". Pseudogenes can be selectively beneficial, but only in very special cases (e.g. gene regulation). If the authors are suggesting this, they must openly state this, and explain why. Pseudogenes are common in nearly all genomes, and should be clearly separated from genes (the latter as a shortcut for functional genes). The bar for "genes" is much higher than simple sequence similarity, including expression, evidence of purifying selection, etc., as the authors themselves applied for the two gene families they identified in D. simulans (Lhk and CK2ßtes-Y).

      Thank you for the suggestion. We now state our criteria for calling genes based on the expression and long CDS and correct the sentences that the reviewer refers to. The protein evolution rates of many Y-linked duplicates were surveyed in Tobler et al. 2017, who found that most are not under strong purifying selection. Our study supports this previous report. We think that protein evolution rate alone may not be a good indicator for functionality. Our current study does not focus on the potential function of these genes, and we think further population studies are required to get a solid conclusion. We changed the text to clarify this point: “Most new Y-linked duplications in D. melanogaster and the D. simulans clade are from genes with presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7), consistent with other Drosophila species [17, 77].” (p15 L281-284)

      The authors center their analysis on "11 canonical Y-linked genes conserved across the melanogaster group ". Why did they exclude the CG41561 gene, identified by Mahajan & Bachtrog (2017) in D. melanogaster? Given that most D. melanogaster Y-linked genes were acquired before the split from the D. simulans clade (Koerich et al Nature 2008), the same most likely is true for CG41561 (i.e., it would be Y-linked in the D. simulans clade). Indeed, computational analysis gave a strong signal of Y-linkage in D. yakuba (unpublished; I have not looked in the other species). If CG41561 is Y-linked in the simulans clade, it should be included in the present paper, for the only difference between it and the remaining "canonical genes" was that it was found later. Finally, the proper citation of the "11 canonical Y-linked genes" is Gepner and Hays PNAS 1993 and Carvalho, Koerich and Clark TIG 2009 (or the primary papers), instead of ref #55.

      Thank you for the suggestion. CG41561 is indeed a relatively young Y-linked gene because it’s not Y-linked in D. ananassae (Muller’s element E). We already have CG41561 in Table S6 and we think that it is reasonable to separate a young Y-linked gene from the others. We also fixed the reference as suggested (p5 L116).

      Other points/comments/suggestions:

      a) Possible reference mistake: line 88 "For example, 20-40% of D. melanogaster Y-linked regulatory variation (YRV) comes from differences in ribosomal DNA (rDNA) copy numbers [52, 53]." reference #53 is a mouse study, not Drosophila.

      Thank you for pointing out this error, we fixed the reference (p4 L91).

      b) Possible reference mistake: line 208 "and the genes/introns that produce Y-loops differs among species [75]". ref #75 is a paper on the D. pseudoobscura Y. Is it what the authors intended?

      Yes, our previous paper (ref 75) found that Y-loops do not originate from the kl-3, kl-5, and ORY genes in D. pseudoobscura because they don’t have large introns in this species.

      c) line 113. "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, including 58 exons missed in previous assemblies (Table S1; [55])." Please show in Table S1 which exons were missing in the previous assemblies. I guess that most if not all of these missing exons are duplicate exons (and many are likely to be pseudogenes). If they indeed are duplicate exons, the authors should make it clear in the main text, e.g., "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, plus 58 duplicated exons missed in previous assemblies."

      Thank you for the suggestion. However, the 58 exons did not include the duplicated exons. We are similarly surprised how much we will miss if we don’t assemble the Y chromosome carefully. We now mark these exons in red in Table S1 to make this point clearer.

      d) line 116 "Based on the median male-to-female coverage [22], we assigned 13.7 to 18.9 Mb of Y-linked sequences per species with N50 ranging from 0.6 to 1.2 Mb." The method (or a very similar one) was developed by Hall et al BMC Genomics 2013, which should be cited in this context.

      e) line 118: "We evaluated our methods by comparing our assignments for every 10-kb window of assembled sequences to its known chromosomal location. Our assignments have 96, 98, and 99% sensitivity and 5, 0, and 3% false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (Table S2)." The procedure is unclear. Why break the contigs into 10kb intervals, instead of treating each as a unit, assignable to Y, X or A? The latter is the usual procedure in computational identification of suspect Y-linked contigs (Carvalho and Clark Gen Res 2013; Hall et al BMC Genomics 2013). The only reason I can think of for analyzing the contigs piecewise is a suspicion of misassemblies. If this is the case, I think it is better to explain.

      Thank you for the suggestion. We did not break the contigs into 10kb intervals when we assigned the Y-linked contigs. As you suspect, our motivation for evaluating our methods and analyzing the contigs in 10kb intervals was to detect possible misassemblies. We rewrote the sentence to make this point clearer (p6 L129-132).
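      The window-level evaluation discussed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the ratio threshold, the pseudocount, the function names, and the coverage values are all hypothetical, and the false-positive rate is assumed here to mean the fraction of non-Y windows called Y-linked.

```python
def is_y_linked(male_cov, female_cov, min_ratio=5.0, pseudocount=1.0):
    """Call a window Y-linked when male coverage greatly exceeds female coverage.

    min_ratio and pseudocount are hypothetical values chosen for illustration.
    """
    return male_cov / (female_cov + pseudocount) >= min_ratio


def evaluate(windows, **kwargs):
    """Return (sensitivity, false-positive rate) over windows of known location.

    Each window is (male_cov, female_cov, known), with known in {"Y", "X", "A"}.
    """
    tp = fp = tn = fn = 0
    for male_cov, female_cov, known in windows:
        predicted_y = is_y_linked(male_cov, female_cov, **kwargs)
        truly_y = known == "Y"
        if predicted_y and truly_y:
            tp += 1
        elif predicted_y and not truly_y:
            fp += 1
        elif truly_y:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fp_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, fp_rate


# Hypothetical 10-kb windows: (male coverage, female coverage, known location).
windows = [
    (30, 0, "Y"),   # clear Y-linked signal
    (28, 1, "Y"),
    (12, 10, "Y"),  # Y window with female-like coverage: missed
    (40, 42, "A"),
    (20, 41, "X"),
    (35, 2, "A"),   # autosomal window with male-biased coverage: false positive
]
sensitivity, fp_rate = evaluate(windows)
print(f"sensitivity = {sensitivity:.2f}, false-positive rate = {fp_rate:.2f}")
```

      Scoring windows rather than whole contigs means a contig that joins Y-linked and non-Y sequence shows up as a run of discordant windows, which is the misassembly signal the response describes.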

      f) Fig. 1. It may be interesting to put a version of Fig 1 in the SI containing only the genes and the lines connecting them among species, so we can better see the inversions etc. (like the cover of Genetics, based on the paper by Schaeffer et al 2008).

      Thank you for the suggestion. We would like to make a figure like that fantastic cover image you refer to, but the repetitive nature of the Y chromosome makes it difficult to illustrate rearrangements based on alignments at the contig-level. We instead opted to update Figure 1 to better highlight the rearrangements, still based on the unique protein-coding genes which are supported by the FISH experiments.

      g) Table S6 (Y-linked pseudogenes). Several pseudogenes listed as new have been studied in detail before: vig2, Mocs2, Clbn, Bili (Carvalho et al PNAS 2015), Pka-R1, CG3618, Mst77F (Russel and Kaiser Genetics 1993; Krsticevic et al G3 2015). Note also that at least two are functional (the vig2 duplication and some Mst77 duplications).

      Thank you for the suggestion. We now include a column to indicate the potential function of Y-linked duplicates (see Table S6).

      h) line 421: "one new satellite, (AAACAT)n, originated from a DM412B transposable element, which has three tandem copies of AAACAT in its long terminal repeats." The birth of satellites from TEs has been observed before, and should be cited here. Dias et al GBE 6: 1302-1313, 2014.

      Thank you for the suggestion. We now include a sentence to cite this reference (p27 L467-468).

      i) Fig S2 shows that the coverage of PacBio reads is smaller than expected on the Y chromosome. Any explanation? This has been noticed before in D. melanogaster, and tentatively attributed to the CsCl gradient used in the DNA purification (Carvalho et al GenRes 2016). However, it seems that the CsCl DNA purification method was not used in the simulans clade species (is it correct?). Please explain in the ms, or in the SI. The issue is relevant because PacBio sequencing is widely believed to be unbiased in relation to DNA sequence composition (e.g., Ross et al Genome Biol 2013).

      Yes, we used Qiagen's Blood and Cell Culture DNA Midi Kit for DNA extraction. We suspect that the underrepresentation of Y-linked reads is driven by the presence of endoreplicated tissue in adults. Heterochromatin is underreplicated in endoreplicated cells, and thus there may simply be less heterochromatin in these tissues. Consistent with this idea, we find that all heterochromatin seems to be underrepresented in the reads, not just the Y chromosome (see Chakraborty et al. 2021; Flynn et al. 2020). We now include this discussion in the SI of our paper (see supplementary text p75).

      j) I may have missed it, but in which public repository have the assemblies been deposited?

      We link to the assemblies in Github (https://github.com/LarracuenteLab/simclade_Y) and they will also be in the Dryad Digital Repository (doi forthcoming).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Due to suppressed recombination, Y chromosomes have degenerated, undergone extensive structural rearrangements, and accumulated ampliconic gene families across species. The molecular processes and selective pressures guiding dynamic Y chromosome evolution are not well understood. In this study, Chang et al. generate updated Y assemblies of three closely related species in the D. simulans complex using long-read PacBio sequencing in combination with FISH. Despite the species having diverged only 250,000 years ago, the authors find structural rearrangements, two newly amplified gene families and evidence of positive selection across D. simulans. The authors also suggest the high level of Y duplications and deletions may be mediated by MMEJ-biased repair.

      The authors generated a valuable resource for the study of Y-chromosome evolution in Drosophila and describe Y chromosome evolution patterns found in previous Y chromosome sequencing studies, such as newly amplified genes, positive selection, and structural rearrangements. The authors' improvements to the Drosophila simulans clade Y chromosomes are commended, as assembly of the highly repetitive Y chromosome sequences is challenging. However, the manuscript is largely descriptive, its claims are largely speculative, and it lacks a clear question. There are also a number of concerns with the text and figures (see concerns below). Overall, the manuscript would be significantly improved if the authors focused on a specific question as opposed to a survey of sequence features of the Y chromosome. For example, development of the idea that MMEJ is the primary mechanism for loss of Y chromosome sequence could be a nice new twist.

      Our aim is to discover and understand the many different factors and processes that shape the evolution of Y chromosome organization and function. Because these Y chromosomes were largely unassembled, we needed to first generate the sequence assembly before we could ask specific questions. We prefer not to focus the manuscript solely on one specific topic such as MMEJ repair, as our other observations and analyses may be interesting to a wide range of scientists studying topics other than mutation and DNA repair. We are therefore choosing to present the more comprehensive story about Y chromosome evolution that we included in our original manuscript.

      We also respectfully disagree with the comment that our paper is just a descriptive survey of Y chromosomal sequence features. On the contrary, we present thorough evolutionary analyses to test hypotheses about the forces shaping the evolution of Y chromosome organization and Y-linked genes. Specifically, we use molecular evolution and phylogenetic and comparative genomics approaches to show that multi-copy gene families experience rampant gene conversion and positive selection. We posit that one simulans clade-specific Y-linked gene family has undergone subfunctionalization, potentially resolving sexual conflict, and another may be involved in meiotic drive. We also use evolutionary genomic approaches to show that the distribution of Y-linked mutations indeed suggests that Y chromosomes disproportionately use MMEJ and we propose that this unique feature may shape the evolution of Y chromosome structural organization. This is, as far as we know, a novel hypothesis. We think that follow-up studies of either hypothesis merit different papers.

      **Major concerns:**

      1. Title: The authors use "unique structure" in the title, which is a vague point. Are not Y chromosomes, or any chromosome, "unique" in some manner? Also, are there not more evolutionary processes governing the rapid divergence of the Y's?

      Thank you for raising your concern. We believe that we are justified in referring to the Y chromosome as unique among all other chromosomes in its structural properties (e.g. combination of its hemizygosity, abundant tandem repeats, large scale rearrangements, and highly amplified testis-specific genes). Because there are many properties of Y chromosomes that we believe contribute to their rapid divergence, we opted for the general phrase ‘unique structure’ to capture all of these features. Many evolutionary processes likely shape the evolution of that unique structure (e.g. Muller’s Ratchet, background selection, Hill Robertson effects; see Charlesworth and Charlesworth 2000 for a review), and these processes are well-studied, especially on newly evolved sex chromosomes. Here our focus is on evolutionarily old Y chromosomes, which may have comparatively fewer targets of purifying selection and are more likely to be shaped by positive selection (Bachtrog 2008).

      p.2, line 53-56: The authors claim that sexually antagonistic selection and regulatory evolution are causes of recombination suppression. Couldn't this statement be reversed? Recombination suppression via inversions or other rearrangements enable sexually antagonistic selection. This is a chicken or egg question, so it should be revised to have both possibilities be equal.

      Thank you for the suggestion. We think that it is unlikely that recombination suppression itself is beneficial, but for sexually antagonistic selection and regulatory evolution, recombination suppression can have short-term benefits. We rephrased this sentence to be agnostic about the direction (p2 L56).

      p.5, 118-120: Are the assemblies de novo or have they been guided based upon the D. melanogaster Y chromosome assembly? Please clarify how the authors evaluate their methods by comparing their Y-sequence assignments to known chromosomal locations.

      Thank you for the suggestion. We didn’t use D. melanogaster Y chromosome assembly to guide our assemblies. “All assemblies are generated de novo”, and thus we don’t think there is any potential bias. We first assigned Y-linked sequences using the presence of known Y-linked genes, and used this assignment to evaluate our methods. We now make the sentence clear (p5 L112).

      While the gene copy number estimates are accurate, the PacBio-based genome assemblies are still not able to accurately assemble large segmental duplications (see Evan Eichler's laboratories recent primate and human genome assemblies). A statement mentioning the concerns about accuracy of the underlying sequence and genomic architecture shown should be included in the main text. FISH provides support for the location of the contigs, but not for the accuracy of the underlying genomic architecture.

      Thank you for the suggestion. We can’t validate all Y-linked regions. We did validate the larger structural features of the assembly and only discuss the results that we are confident in. We now include sentences to address this concern (p7 L150-152).

      The authors assigned Y-linked sequences based on median male-to-female coverage. Is this method feasible for assigning ampliconic sequence to the Y given the N50 of 0.6-1.2Mb? Are the authors potentially excluding novel Y-linked ampliconic sequence?

      We validated our methods to assign contigs to a chromosome by comparing 10-kb intervals to the contigs with known chromosomal location, including the Y chromosome. Our assignments have high (96, 98, and 99%) sensitivity and low (5, 0, and 3%) false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (see Table S2). Based on these results, we think that this method is reasonable for Y-linked contigs with N50 of 0.6-1.2Mb.
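      The window-level evaluation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the ratio threshold, and the pseudocount are hypothetical, and the false-positive rate is assumed to follow the conventional definition FP / (FP + TN), which may differ from the definition used for Table S2.

```python
from statistics import median


def is_y_linked(male_cov, female_cov, min_ratio=2.0, pseudocount=1.0):
    """Call a window Y-linked when its median male/female coverage ratio is high.

    min_ratio and pseudocount are illustrative assumptions, not values
    taken from the manuscript.
    """
    ratios = [
        (m + pseudocount) / (f + pseudocount)
        for m, f in zip(male_cov, female_cov)
    ]
    return median(ratios) >= min_ratio


def sensitivity_and_fpr(calls, truth):
    """Compare per-window calls with known chromosomal locations.

    Returns (sensitivity, false-positive rate), assuming
    sensitivity = TP / (TP + FN) and FPR = FP / (FP + TN).
    """
    tp = sum(c and t for c, t in zip(calls, truth))
    fn = sum((not c) and t for c, t in zip(calls, truth))
    fp = sum(c and (not t) for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return sens, fpr
```

      In this sketch each 10-kb window would contribute one boolean call and one boolean truth label, so a contig that is partly misassembled can still be scored window by window.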

      We might exclude some novel Y-linked sequences since we only assigned ~15Mb out of a total ~40 Mb Y-linked sequences. We acknowledged this possibility, and now include a sentence to address this concern (p31 L554-556).

      Where did the rDNA sequences go in D. simulans and D. sechellia? Can they be detected on another chromosome?

      Please see Fig S5 for detailed results. We found a few copies of rDNA on the contigs of autosomes. We assembled many copies of rDNA that can’t be confidently assigned to Y chromosomes. It’s possible that they might be located on other chromosomes. Based on our FISH data (Fig S4) and previous papers, most of these non-Y-linked rDNA copies should be on the X chromosome. However, in this study, we did not make a concerted effort to assign X-linked contigs.

      Figure 2B is hard to follow and it is unclear what additional value it provides to part A. Why is expression level of specific exons important?

      Exon duplication may be an important contributor to Y-linked gene evolution: most genes have duplications and our figure shows that at least some of these duplicates are expressed. The patterns we see indicate that duplication may play different roles in genes depending on their length. For example, the duplications involving short genes (e.g., ARY) may be functional and influence protein expression, whereas duplications involving large genes (e.g. kl-2) may not influence the overall protein expression level from this gene, although the expressed duplicated exons may play some other role. We revised a sentence in the main text and added a sentence to the figure 2 legend to make this point clearer.

      Figure 3 There are many introns that contain gaps, so it is unclear how confident one can be in intron length when there are gaps.

      Indeed, we are not confident about the length of introns with gaps. Therefore, we separated these introns and showed them in different colors.

      Figure 4: What are the authors using as a common ancestor in this figure to infer duplications in the initial branch?

      We used phylogenies to infer the origin of Y-linked duplicates. Any duplications that happened earlier than the divergence between four species are listed in the branch. We also edited the legend to make this point clearer.

      p.15, paragraph 2: The authors describe a newly amplified gene, CK2Btes-Y, in D. simulans. In the first half of the paragraph the authors state that Y-linked copies are also found in D. melanogaster but have "degenerated and have little or no expression" and call them pseudogenes. Later in the paragraph, the authors state that the D. melanogaster Y-linked copies are Su(Ste), a source of piRNAs that are in conflict with X-linked Stellate. Lastly in the paragraph, the authors discuss Su(ste) as a D. melanogaster homolog of CK2Btes-Y. The logic of defining CK2Btes-Y origins is confusing. Was CK2Btes-Y independently amplified on the D. simulans Y, or were CK2BtesY and Su(Ste) amplified in a common ancestor but independently diverged?

      The amplification of CK2Btes-Y and CK2Btes-like happened in the ancestor of D. melanogaster and D. simulans (Fig S11). However, both CK2Btes-Y and CK2Btes-like became pseudogenes (D. melanogaster CK2Btes-Y is named PCKR in a previous study) in D. melanogaster. On the other hand, Ste and Su(Ste) are limited to D. melanogaster based on phylogenetic analyses (Fig 5A) and are a chimera of CK2Btes-like and NACBtes. The evolutionary history of this gene family has been detailed in other papers, except for the presence of CK2Btes-Y in the D. simulans complex, which we describe for the first time in this study. We now include a new figure (Figure 5B), a schematic of the inferred evolutionary history of sex-linked Ssl/CK2ßtes paralogs.

      Figure 5: Is each FISH signal a different gene copy?

      Yes, based on our assemblies, Lhk-1 and Lhk-2 are mostly located on different contigs. Unfortunately, we are not able to design probes that can separate Lhk-1 from Lhk-2.

      The authors suggest DNA-repair on the Y chromosome is biased towards MMEJ based on indel size and microhomologies. Is there any evidence MMEJ is responsible for variable intron length in the canonical Y-linked genes or the amplification of new gene families? Since MMEJ is error-prone, it's a more tolerable repair mechanism in pseudogenes, so their findings might be biased. Rather than comparing pseudogenes to their parent genes, they should compare chrY pseudogenes to autosomal pseudogenes. Even better would be to track MMEJ on the dot chromosome, which is known not to recombine and is highly heterochromatic like the Y chromosome.

      We did compare chrY pseudogenes to autosomal pseudogenes in our study. We also add new analyses to address other issues from reviewer 2, which are similar to your concern. We now include data from pericentric heterochromatin and pseudogenes (see Fig 7). Both data types support our conclusion that indel size is only larger on Y chromosomes. This is consistent with a report that the dot chromosome and pericentric heterochromatin have similar indel size distributions (Blumenstiel et al. 2002).

      Reviewer #3 (Significance (Required)):

      While it is a benefit to have much improved Y chromosome assemblies from the three D. simulans clade species, the gap in knowledge this manuscript is trying to address is unclear. The manuscript is almost entirely descriptive and the figures are difficult to follow.

      As stated above, we respectfully disagree with the comment that the manuscript is entirely descriptive, as we present thorough evolutionary analyses to test hypotheses about the forces shaping the evolution of Y chromosome organization and Y-linked genes. We have two guiding hypotheses about the importance of sexual antagonism and DNA repair pathways for Y chromosome evolution, and we conduct sequence analyses that support both.

      References cited in this response:

      Bachtrog D. The temporal dynamics of processes underlying Y chromosome degeneration. Genetics. 2008 Jul;179(3):1513-25. doi: 10.1534/genetics.107.084012. Epub 2008 Jun 18. PMID: 18562655; PMCID: PMC2475751.

      Blumenstiel JP, Hartl DL, Lozovsky ER. Patterns of Insertion and Deletion in Contrasting Chromatin Domains. Molecular Biology and Evolution, Volume 19, Issue 12, December 2002, Pages 2211–2225. https://doi.org/10.1093/oxfordjournals.molbev.a004045

      Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, Montooth KL, Meiklejohn CD, Larracuente AM, Emerson JJ. Evolution of genome structure in the Drosophila simulans species complex. Genome Res. 2021 Mar;31(3):380-396. doi: 10.1101/gr.263442.120. Epub 2021 Feb 9. PMID: 33563718; PMCID: PMC7919458.

      Charlesworth B, Charlesworth D. The degeneration of Y chromosomes. Philos Trans R Soc Lond B Biol Sci. 2000 Nov 29;355(1403):1563-72. doi: 10.1098/rstb.2000.0717. PMID: 11127901; PMCID: PMC1692900.

      Flynn J, Long M, Wing RA, Clark AG. Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis. Molecular Biology and Evolution, Volume 37, Issue 5, May 2020, Pages 1362–1375. https://doi.org/10.1093/molbev/msaa010

      McKee, Bruce D. et al. “On the Roles of Heterochromatin and Euchromatin in Meiosis in Drosophila: Mapping Chromosomal Pairing Sites and Testing Candidate Mutations for Effects on X–Y Nondisjunction and Meiotic Drive in Male Meiosis.” Genetica 109 (2004): 77-93.

      Tobler R, Nolte V, Schlötterer C. High rate of translocation-based gene birth on the Drosophila Y chromosome. Proc Natl Acad Sci U S A. 2017 Oct 31;114(44):11721-11726. doi: 10.1073/pnas.1706502114. Epub 2017 Oct 19. PMID: 29078298; PMCID: PMC5676891.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Due to suppressed recombination, Y chromosomes have degenerated, undergone extensive structural rearrangements, and accumulated ampliconic gene families across species. The molecular processes and selective pressures guiding dynamic Y chromosome evolution are not well understood. In this study, Chang et al. generate updated Y assemblies of three closely related species in the D. simulans complex using long-read PacBio sequencing in combination with FISH. Despite having diverged only 250,000 years ago, the authors find structural rearrangements, two newly amplified gene families and evidence of positive selection across D. simulans. The authors also suggest the high level of Y duplications and deletions may be mediated by MMEJ biased repair.

      The authors generated a valuable resource for the study of Y-chromosome evolution in Drosophila and describe Y chromosome evolution patterns found in previous Y chromosome sequencing studies, such as newly amplified genes, positive selection, and structural rearrangements. The authors' improvements to the Drosophila simulans clade Y chromosomes are commended, as assembly of the highly repetitive Y chromosome sequences is challenging. However, the manuscript is largely descriptive, its claims are largely speculative, and it lacks a clear question. There are also a number of concerns with the text and figures (see concerns below). Overall, the manuscript would be significantly improved if the authors focused on a specific question as opposed to a survey of sequence features of the Y chromosome. For example, development of the idea that MMEJ is the primary mechanism for loss of Y chromosome sequence could be a nice new twist.

      Major concerns:

      1. Title: The authors use "unique structure" in the title, which is a vague point. Are not Y chromosomes, or any chromosome, "unique" in some manner? Also, are there not more evolutionary processes governing the rapid divergence of the Y's?
      2. p.2, line 53-56: The authors claim that sexually antagonistic selection and regulatory evolution are causes of recombination suppression. Couldn't this statement be reversed? Recombination suppression via inversions or other rearrangements enable sexually antagonistic selection. This is a chicken or egg question, so it should be revised to have both possibilities be equal.
      3. p.5, 118-120: Are the assemblies de novo or have they been guided based upon the D. melanogaster Y chromosome assembly? Please clarify how the authors evaluate their methods by comparing their Y-sequence assignments to known chromosomal locations.
      4. While the gene copy number estimates are accurate, the PacBio-based genome assemblies are still not able to accurately assemble large segmental duplications (see Evan Eichler's laboratories recent primate and human genome assemblies). A statement mentioning the concerns about accuracy of the underlying sequence and genomic architecture shown should be included in the main text. FISH provides support for the location of the contigs, but not for the accuracy of the underlying genomic architecture.
      5. The authors assigned Y-linked sequences based on median male-to-female coverage. Is this method feasible for assigning ampliconic sequence to the Y given the N50 of 0.6-1.2Mb? Are the authors potentially excluding novel Y-linked ampliconic sequence?
      6. Where did the rDNA sequences go in D. simulans and D. sechellia? Can they be detected on another chromosome?
      7. Figure 2B is hard to follow and it is unclear what additional value it provides to part A. Why is expression level of specific exons important?
      8. Figure 3 There are many introns that contain gaps, so it is unclear how confident one can be in intron length when there are gaps.
      9. Figure 4: What are the authors using as a common ancestor in this figure to infer duplications in the initial branch?
      10. p.15, paragraph 2: The authors describe a newly amplified gene, CK2Btes-Y, in D. simulans. In the first half of the paragraph the authors state that Y-linked copies are also found in D. melanogaster but have "degenerated and have little or no expression" and call them pseudogenes. Later in the paragraph, the authors state that the D. melanogaster Y-linked copies are Su(Ste), a source of piRNAs that are in conflict with X-linked Stellate. Lastly in the paragraph, the authors discuss Su(ste) as a D. melanogaster homolog of CK2Btes-Y. The logic of defining CK2Btes-Y origins is confusing. Was CK2Btes-Y independently amplified on the D. simulans Y, or were CK2BtesY and Su(Ste) amplified in a common ancestor but independently diverged?
      11. Figure 5: Is each FISH signal a different gene copy?
      12. The authors suggest DNA-repair on the Y chromosome is biased towards MMEJ based on indel size and microhomologies. Is there any evidence MMEJ is responsible for variable intron length in the canonical Y-linked genes or the amplification of new gene families? Since MMEJ is error-prone, it's a more tolerable repair mechanism in pseudogenes, so their findings might be biased. Rather than comparing pseudogenes to their parent genes, they should compare chrY pseudogenes to autosomal pseudogenes. Even better would be to track MMEJ on the dot chromosome, which is known not to recombine and is highly heterochromatic like the Y chromosome.

      Significance

      While it is a benefit to have much improved Y chromosome assemblies from the three D. simulans clade species, the gap in knowledge this manuscript is trying to address is unclear. The manuscript is almost entirely descriptive and the figures are difficult to follow.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript describes a thorough investigation of the Y-chromosomes of three very closely related Drosophila species (D. simulans, D. sechellia, and D. mauritiana) which in turn are closely related to D. melanogaster. The D. melanogaster Y was analysed in a previous paper by the same group. The authors found an astonishing level of structural rearrangements (gene order, copy number, etc.), especially taking into account the short divergence time among the three species (~250 thousand years). They also suggest an explanation for this fast evolution: the Y chromosome is haploid, and hence double-strand breaks cannot be repaired by homologous recombination. Instead, it must use the less precise mechanisms of NHEJ and MMEJ. They also provide circumstantial evidence that MMEJ (which is very prone to generate large rearrangements) is the preferred mechanism of repair. As far as I know this hypothesis is new, and fits nicely with the fast structural evolution described by the authors. Finally, the authors describe two intriguing Y-linked gene families in D. simulans (Lhk and CK2ßtes-Y), one of them similar to the Stellate / Suppressor of Stellate system of D. melanogaster, which seems to be evolving as part of an X-Y meiotic drive arms race. Overall, it is a very nice piece of work. I have four criticisms that, in my opinion, should be addressed before acceptance.

      1. The suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ) should be better supported and explained. At line 387, the authors stated "The pattern of excess large deletions is shared in the three D. simulans clade species Y chromosomes, but is not obvious in D. melanogaster (Fig 6B). However, because all D. melanogaster Y-linked indels in our analyses are from copies of a single pseudogene (CR43975), it is difficult to compare to the larger samples in the simulans clade species (duplicates from 16 genes)." Given that D. melanogaster has many Y-linked pseudogenes (described by the authors and by other researchers, and listed in Table S6), there seems to be no reason to use a sample size of 1 in this species. Furthermore, given that D. melanogaster is THE model organism, it is the species that most likely will provide information to assess the "preferential MMEJ" hypothesis proposed by the authors.
      2. Still on the suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ). The Y chromosome is heterochromatic, haploid and non-recombining. In order to ascribe its mutational pattern to the haploid state (and the consequent impossibility of homologous recombination repair), the authors compared it to chromosome IV (the so-called "dot chromosome"). This may not be the best choice: while chr IV lacks recombination in wild type flies, it is not typical heterochromatin. E.g., "results from genetic analyses, genomic studies, and biochemical investigations have revealed the dot chromosome to be unique, having a mixture of characteristics of euchromatin and of constitutive heterochromatin" (Riddle and Elgin, FlyBook 2018; https://doi.org/10.1534/genetics.118.301146). Given this, it seems appropriate to also compare the Y-linked pseudogenes with those from typical heterochromatin. In Drosophila, these are the regions around the centromeres ("centric heterochromatin"). There are pseudogenes there; e.g., the gene rolled is known to have partially duplicated exons.
      3. In some passages of the ms there seems to be a confusion between new genes and pseudogenes, which should be corrected. For example, in line 261: "Most new Y-linked genes in D. melanogaster and the D. simulans clade have presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7)". Which are these "new genes"? If they are those listed in Table S6 (as other passages of the text suggest), most if not all of them are pseudogenes. If they are pseudogenes, it is not appropriate to refer to them as "new genes". The same ambiguity is present in line 263: "Y-linked duplicates of genes with these functions may be selectively beneficial, but a duplication bias could also contribute to this enrichment (...)". Pseudogenes can be selectively beneficial, but in very special cases (e.g., gene regulation). If the authors are suggesting this, they must openly state this, and explain why. Pseudogenes are common in nearly all genomes, and should be clearly separated from genes (the latter as a shortcut for functional genes). The bar for "genes" is much higher than simple sequence similarity, including expression, evidence of purifying selection, etc., as the authors themselves applied for the two gene families they identified in D. simulans (Lhk and CK2ßtes-Y).
      4. The authors center their analysis on "11 canonical Y-linked genes conserved across the melanogaster group". Why did they exclude the CG41561 gene, identified by Mahajan & Bachtrog (2017) in D. melanogaster? Given that most D. melanogaster Y-linked genes were acquired before the split from the D. simulans clade (Koerich et al Nature 2008), the same most likely is true for CG41561 (i.e., it would be Y-linked in the D. simulans clade). Indeed, computational analysis gave a strong signal of Y-linkage in D. yakuba (unpublished; I have not looked in the other species). If CG41561 is Y-linked in the simulans clade, it should be included in the present paper, for the only difference between it and the remaining "canonical genes" was that it was found later. Finally, the proper citation of the "11 canonical Y-linked genes" is Gepner and Hays PNAS 1993 and Carvalho, Koerich and Clark TIG 2009 (or the primary papers), instead of ref #55.

      Other points/comments/suggestions:

      a) Possible reference mistake: line 88 "For example, 20-40% of D. melanogaster Y-linked regulatory variation (YRV) comes from differences in ribosomal DNA (rDNA) copy numbers [52, 53]." reference #53 is a mouse study, not Drosophila.

      b) Possible reference mistake: line 208 "and the genes/introns that produce Y-loops differs among species [75]". ref #75 is a paper on the D. pseudoobscura Y. Is it what the authors intended?

      c) line 113. "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, including 58 exons missed in previous assemblies (Table S1; [55])." Please show in Table S1 which exons were missing in the previous assemblies. I guess that most if not all of these missing exons are duplicate exons (and many are likely to be pseudogenes). If they indeed are duplicate exons, the authors should make it clear in the main text, e.g., "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, plus 58 duplicated exons missed in previous assemblies."

      d) line 116 "Based on the median male-to-female coverage [22], we assigned 13.7 to 18.9 Mb of Y-linked sequences per species with N50 ranging from 0.6 to 1.2 Mb." The method (or a very similar one) was developed by Hall et al BMC Genomics 2013, which should be cited in this context.

      e) line 118: "We evaluated our methods by comparing our assignments for every 10-kb window of assembled sequences to its known chromosomal location. Our assignments have 96, 98, and 99% sensitivity and 5, 0, and 3% false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (Table S2)." The procedure is unclear. Why break the contigs into 10-kb intervals, instead of treating each as a unit, assignable to Y, X or A? The latter is the usual procedure in computational identification of suspect Y-linked contigs (Carvalho and Clark Gen Res 2013; Hall et al BMC Genomics 2013). The only reason I can think of for analyzing the contigs piecewise is a suspicion of misassemblies. If this is the case, I think it is better to explain.

      f) Fig. 1. It may be interesting to put a version of Fig 1 in the SI containing only the genes and the lines connecting them among species, so we can better see the inversions etc. (like the cover of Genetics , based on the paper by Schaeffer et al 2008).

      g) Table S6 (Y-linked pseudogenes). Several pseudogenes listed as new have been studied in detail before: vig2, Mocs2, Clbn, Bili (Carvalho et al PNAS 2015); Pka-R1, CG3618, Mst77F (Russel and Kaiser Genetics 1993; Krsticevic et al G3 2015). Note also that at least two are functional (the vig2 duplication and some Mst77 duplications).

      h) line 421: "one new satellite, (AAACAT)n, originated from a DM412B transposable element, which has three tandem copies of AAACAT in its long terminal repeats." The birth of satellites from TEs has been observed before, and should be cited here. Dias et al GBE 6: 1302-1313, 2014.

      i) Fig S2 shows that the coverage of PacBio reads is smaller than expected on the Y chromosome. Any explanation? This has been noticed before in D. melanogaster, and tentatively attributed to the CsCl gradient used in the DNA purification (Carvalho et al GenRes 2016). However, it seems that the CsCl DNA purification method was not used in the simulans clade species (is this correct?). Please explain in the ms, or in the SI. The issue is relevant because PacBio sequencing is widely believed to be unbiased in relation to DNA sequence composition (e.g., Ross et al Genome Biol 2013).

      j) I may have missed it, but in which public repository have the assemblies been deposited?

      Significance

      see above.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      I found this an exceptionally impressive manuscript. Studying the evolution of Y chromosomes has until recently been nearly impossible, and this research group has pioneered approaches that can yield reliable results in Drosophila. The study used an innovative heterochromatin-sensitive assembly pipeline on three D. simulans clade species, D. simulans, D. mauritiana and D. sechellia, which diverged less than 250 KYA, allowing comparisons with the group's previous results for the D. melanogaster Y.

      The study is both technically impressive and extremely interesting (a highly unusual combination). It includes a rich set of interesting results about these genome regions, and furthermore the results are discussed in a well-organised way, relating both to previous observations and to understanding of the genetics and evolution of Y chromosomes, illuminating all these aspects. It is a rare pleasure to read such a study. I believe that this study will inspire and be a model for future work on these chromosomes. It shows how these difficult genome regions can be studied.

      Major comments:

      The conclusions are convincing. The methods are explained unusually clearly, and the reasoning from the results is convincing. When appropriate, the caveats are clearly explained. The material is clearly organised and the questions studied are well related to the results. I had a few minor comments concerning the English. Even the figures (often a major problem to understand) are very clear and helpful, with proper explanations. I have very rarely read such a good manuscript, and almost never (in a long career) found a manuscript that could be published without revision being necessary.

      The analysis found 58 exons missed in previous assemblies (as well as all previously known exons of the 11 canonical Y-linked genes, which are present in at least one copy across the group). FISH on mitotic chromosomes using probes for 12 Y-linked sequences was used to determine the centromere locations, and to determine gene orders and relate them to the cytological chromosome bands, demonstrating changes in satellite distribution, gene order, and centromere positions between their Y chromosomes within the D. simulans clade species. It also confirmed previous results for Y-linked ribosomal DNA,genes, which are responsible for X-Y pairing in D. melanogaster males. Although 28S rDNA has been lost in D. simulans and D. sechellia (but not in D. mauritiana), the intergenic spacer (IGS) repeats between these repeats are retained on both sex chromosomes in all three species. Only sequencing can reliably reveal this, as their abundance is below the detection level by FISH in D. sechellia. The 11 canonical Y-linked genes' copy numbers vary between the species, and some duplicates are expressed and have complete open reading frames, and may therefore be functional because they, but most include only a subset of exons, often with duplicated exons flanking the the presumed functional gene copy. Mega-introns and Y-loops were found, as already seen in Drosophila species, but this new study detects turn overs in the ~2 million years separating D. melanogaster and the D. simulans clade. 49 independent duplications onto the Y chromosome were detected, including 8 not previously detected. At least half show no expression in testes, or lack open reading frames, so they are probably pseudogenes. 
Testis-expressed genes may be especially likely to duplicate into the Y chromosome due to its open chromatin structure and transcriptional activity during spermatogenesis, and indeed most of the new Y-linked genes in the clade studied have likely functions in chromatin modification, cell division, and sexual reproduction. The study discovered two new gene families that have undergone amplification on D. simulans clade Y chromosomes, reaching very high copy numbers (36-146). Both these families appear to encode functional protein-coding genes and show high expression. The paper describes intriguing results that illuminate Y chromosome evolution. First, SRPK arose by an autosome-to-Y duplication of the sequence encoding the testis-specific isoform of the gene SR Protein Kinase (SRPK), after which the autosomal copy lost its testis-specific exon via a deletion. In D. melanogaster, SRPK is essential for both male and female reproduction, so the relocation of the testis-specific isoform to the Y chromosome in the D. simulans clade suggests that the change may have been advantageous by resolving sexual antagonism. The paper presents convincing evidence that the Y copy evolved under positive selection, and that gene amplification may confer advantageous increased expression in males. The second amplified gene family is also potentially related to an interesting function. Both X-linked and Y-linked duplicates are found of a gene called Ssl located on chromosome 2R. In D. simulans, the X-linked copies were previously known, and called CK2ßtes-like. In D. melanogaster, degenerated Y-linked copies are also found, with little or no expression, contrasting with complete open reading frames and high expression in the D. simulans clade species in testes, consistent with the possibility of an arms race between sex chromosome meiotic drive factors. 
Other interesting analyses document higher gene conversion rates compared to the other chromosomes, and evidence that these Y chromosomes may differ in the DNA-repair mechanisms (preferentially using MMEJ instead of NHEJ), perhaps contributing to their high rates of intrachromosomal duplication and structural rearrangements. The authors relate this to evidence for turnover of Y-linked satellite sequences, with the discovery of five new Y-linked satellites, whose locations were validated using FISH. The study also documented enrichment of LTR retrotransposons on the D. simulans clade Y chromosomes relative to the rest of the genome, together with turnovers between the species.

      Significance

      As described above, the advances are both technical and conceptual for the field. The manuscript itself does an excellent job of placing the work in the context of the existing literature.

      • Anyone working on sex chromosomes and other non-recombining genome regions should be interested in the findings reported.

      • My field of expertise is the evolution of sex chromosomes, and the evolution of genome regions with suppressed recombination. I have experience of genomic analyses. I have less expertise in analyses of gene expression, but I understand enough about such approaches to evaluate the parts of this study that use them.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #4

      Evidence, reproducibility and clarity

      Summary

      Fusion and fission of the mitochondrial network has been one of the hottest topics in mitochondrial biology in recent years. The process is obviously necessary to allow cells to control the quality of small individual organelles, which are degraded by autophagy or mitophagy if they are not working properly. Since a healthy mitochondrial network is essential for every cell in the body, the molecular players involved in these two processes are heavily investigated. In this paper, the authors investigate the role of MTFP1 in vitro and in vivo, a protein which has been studied and named because it seemed to be an important fission factor in cultured cells.

      Surprisingly and excitingly, the authors find that mitochondrial morphology and homeostasis are not affected by knocking out this protein in the heart of mice. On the contrary, they show that this protein is a critical regulator of mitochondrial inner membrane coupling via the adenine-nucleotide-transporter (ANT). A loss of MTFP1 leads to a decline in the mitochondrial membrane potential, leading to cell death, which finally results in dilated cardiomyopathy and causes early death of the animals. Therefore, this paper gives an important mitochondrial inner membrane protein a new role which may become very important to understand the opening of the large channel (MPTP channel), which is responsible for some kinds of cell death in almost all cell types.

      Major comments:

      The conclusions are convincing, additional experiments on the molecular nature of the interaction between MTFP1 and ANT may be easily proposed by a reviewer; however, this will open a completely new line of research and should not be asked at this moment. Data and methods are presented in a perfect way, typical for the Wai lab. Statistical analysis has been performed meticulously, and there is nothing to add here. I have read the paper very carefully, but cannot find many points which should be changed.

      Minor Points:

      I must admit I hate the title, but the authors are in good company using the "genetic argument", as many others do. Mitochondrial fission process controls energetic efficiency - that is correct, but it does not prevent inflammatory cardiomyopathy and heart failure in mice. It is intact mitochondria which prevent inflammatory cardiomyopathy and heart failure, and as long as we do not know what exactly MTFP1 does, this title is misleading, although it may be considered attractive for readers. I would reformulate that and mention the new role of this protein in coupling of the mitochondrial inner membrane potential, but I leave this to the authors, of course.

      P. 2, line 45: The loss of MTFP1 promotes ... (erase "the")

      P. 12, line 321: There is clearly no indication of mitochondrial elongation, but I do see clearly in these pictures a separation between the organelles in the mutant mice in contrast to wild type, where mitochondria touch each other (Fig. 3c to d). If this is consistent, it should be mentioned.

      P. 12, line 324: I am not a true expert in fusion and fission, so wouldn't a blot showing all the OPA1 isoforms be necessary here?

      P. 13, line 341: The same argument is repeated in two sentences following each other. I suggest writing here "Our data collectively indicate that MTFP1, unlike DRP1, is not an essential fission protein, contrary to its namesake, either in vitro or in vivo.".

      P. 13, line 349: "We sought to investigate..."

      Significance

      Understanding mitochondrial dynamics (fusion and fission) and bioenergetics (which some people considered to be fully known since the 1950s) is of utmost importance for biology and biomedicine. Since this paper gives a prominent protein, which the field believes is a fission factor, a completely new role, it is a paper of high interest. As the authors state, using these mice the protein may help to understand the molecular function of the mitochondrial membrane permeability transition pore (MPTP), which is still enigmatic, but important for so many ways of cell death. The paper is therefore state of the art and at the frontline of cell biology, and the large mitochondrial community will be very interested to read the paper.

      I have been working on mitochondria for 35 years, starting with bioenergetics, switching then to mitochondrial biogenesis regulated by transcription of nuclear genes as well as the mitochondrial genome, followed by studying the consequences of mtDNA mutations, and now considering how mitochondrial dysfunction may be involved in the normal aging process. Therefore, I feel competent to critically judge the quality of this paper. I am not a molecular biologist; therefore, the molecular details of protein-protein interaction do not lie in the focus of my interest. On the contrary, I feel that sometimes too much emphasis is laid on such molecular details, while the big question - in this case, how mitochondrial membrane potential is regulated - is not addressed at all.

      Referee Cross-commenting

      I guess we all suffer from reviewers of our own papers asking for more mechanistic insight. This paper unexpectedly shows a new role for MTFP1 - which is important for the mito community - and opens the door to more mechanistic studies of how it uncouples the mitos and leads to cell death via ANT and MPTP - which is important for a very broad community.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Donnarumma et al. characterize cardiac-specific KO of Mitochondrial fission process 1 (MTFP1), a mysterious mitochondrial protein thought to be involved in mitochondrial inner membrane fission. They initially demonstrate that survival, cardiac function and respiration are diminished in the KO mouse and seek to find a mechanism. In MEF cells, surprisingly, they do not report any changes in fission, though mitochondrial morphology is altered. The authors then identify loss of MTFP1 as being damaging through exacerbation of cell death, possibly due to enhanced activation of the mitochondrial permeability transition. This is a beautiful and thorough paper. The data presented is of high quality and the conclusions are well supported by the figures. There was little to criticize in the manuscript!

      1. Although total mtDNA levels were no different, was there mtDNA release into the cytoplasm in Mtfp1 cKO? This is one possible mechanism to consider regarding the interferon response, as this would be a potent trigger for the innate immune response, as pointed out in the discussion and in PMC4409480.
      2. The authors show mitochondrial morphology in the pre-symptomatic period. What happens during DCM? Does this effect become exacerbated in the KO compared to WT?
      3. Given the cellular phenotypes seen in the ppif/Mtfp1 DKO cells, does this translate into a survival benefit in these animals? (If this data is easily available would recommend showing it, even if negative; but if entirely new crosses and 20-30 weeks of follow-up are required then it's fine to not address this question here).
      4. Methods (line 1281): It appears that only male mice were imaged from 10-34 weeks? Why only show one sex, especially as the authors note a difference in survival between males and females? Also, it is unclear why the data on female HF is relegated to the Supplement. This should be in the main manuscript side-by-side with the male data on the same scale to allow comparison of effect sizes on similar assays.

      Minor comments:
      5. Please change the title: "inflammatory cardiomyopathy" is a poorly defined term and would suggest myocarditis or inflammatory cell infiltrates, which are not shown in the manuscript. In addition, the only discussion of inflammation is through the innate immunity pathway in the RNA-seq data, with no real further follow-up.
      6. Line 39, Abstract: "ANT" needs to be in brackets/parentheses.
      7. Figure 1M: It would be good to see a higher magnification image showing fibrosis in the trichrome stain.
      8. Line 180, "gender" should more properly read "sex".
      9. At line 321, the authors state that there are no changes in mitochondrial elongation, however, Figure 3D seems to suggest that mitochondrial area is decreased in MKO cells. Is this an error or are the authors suggesting that the data in 3D is not significant? How was elongation measured?
      10. At line 335, the authors state that MTFP1 KO mitochondria were not protected from fragmentation, this is supported by the data in Figures 3G-H. However, to my eye, it appears that the mitochondria from the KO cells were far more fragmented in response to hydrogen peroxide. Is this data not significant?

      Significance

      This paper is novel in that it constitutes the first description of a mouse cardiac knockout of MTFP1, a poorly studied protein previously thought to be involved in mitochondrial fission. Previously MTFP1 has been described in knockdown cells (Aung et al. J Cell Mol Med. 2017 Dec; 21(12)) and the current paper builds upon this research. The current paper demonstrates that MTFP1 is important for cardiac function, but intriguingly, does not share the prior in vitro phenotypes related to mitochondrial fission, suggesting that it may have some other physiological function. Most of the methods shown are standard, though there are some quite novel machine learning-based analyses of imaging data. The paper is quite thorough and of relevance to a wide range of investigators interested in cardiac mitochondrial function, mitochondrial kinetics (fusion/fission), and cell death mechanisms more broadly. Our field of expertise is in cardiac mitochondrial function. The ML computational tools are very interesting, but these are not our expertise.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript entitled "Mitochondrial fission process 1 (MTFP1) controls bioenergetic efficiency and prevents inflammatory cardiomyopathy and heart failure in mice" by Donnarumma and collaborators investigates the role of MTFP1 in the heart in vivo. Mice with a cardiac-specific deletion of Mtfp1 were generated and fully characterized. Structure-function analyses were performed prior to and at the onset of the cardiomyopathy and show that homozygous Mtfp1 ko mice develop DCM progressing to heart failure and death in middle age, associated with increased fibrosis. RNAseq data revealed a severe impairment of metabolic processes including reduced oxidative phosphorylation, TCA cycle and mitochondrial gene expression. Mitochondrial respiration was significantly reduced in mitochondria isolated from Mtfp1 ko mice, while global mitochondrial proteins and activity of the Krebs cycle remained normal. Further assessment of a variety of processes revealed an increase in proton leak through ANT as the major contributor to the mitochondrial defects and cardiac dysfunction in Mtfp1 ko mice. Cardiomyocytes isolated from Mtfp1 ko mice were also more sensitive to stress-induced apoptosis and to mPTP opening. The major conclusion of the study is that, contrary to previous reports documenting a role of MTFP1 in mitochondrial fission, MTFP1 does not regulate mitochondrial morphology but rather is essential for cardiac energy balance. This is substantiated by mass spectrometry experiments which identify mitochondrial proteins of the complex I/IV and proteins regulating mPTP as MTFP1 partners.

      Overall, this is an elegant study with an impressive amount of work performed in isolated mitochondria and in vivo before and at the onset of DCM. Results are important because they challenge previous findings that established a role of MTFP1 in mitochondrial fission and therefore reveal another function of MTFP1. To rigorously establish how MTFP1 regulates cardiac bioenergetics, additional experiments are needed and are listed below. In particular, the use of wildtype mice as control is concerning because some transgenic lines of Myh6-Cre+ develop DCM. Also, experiments addressing MTFP1 as an essential fission protein should be performed in adult ventricular myocytes isolated from Mtfp1 ko mice to show consistency with experiments performed in MEFs.

      Major comments:

      One major concern is with the control mice, which appear to be wildtype (Myh6-Cre+/+ Mtfp1 LoxP/LoxP). The proper control group should be Myh6-Cretg/+. This is important because some models of Myh6-Cre+ mice develop DCM including mitochondrial dysfunction (Buerger et al., J Card Failure 2006; Hall et al., Am J Physiol Heart Circ Physiol 2011). At a minimum, the most critical assays evaluating mitochondrial function should be performed using Myh6-Cre+ as control to verify that they do not develop pathological cardiac remodeling.

      The observation that Mtfp1ko mice show a complete loss of the protein by Western blot analysis is intriguing because it suggests that Mtfp1 is only expressed in ventricular myocytes and not in the other cells populating the heart. Can you please comment on this?

      Was Seahorse analysis from ventricular myocytes isolated from Mtfp1ko performed in parallel with the analysis in MEF and U2OS cells? This should be done to establish the cell specific defects observed in cardiac mitochondria lacking Mtfp1.

      Mitochondrial morphology under normal or stress conditions was assessed in MEFs, which have characteristics very distinct from those of primary cardiac cells. The experiment using oligomycin, rotenone and CCCP should be performed in ventricular myocytes isolated from Mtfp1 ko mice, to rigorously reach the conclusion that MTFP1 is not essential for mitochondrial fission.

      Related to that, is it possible that while total levels of mitochondrial fission and fusion proteins are similar in Mtfp1 ko and wt mice, their phosphorylated forms may be different?

      Figure 4: Cell death in Mtfp1 ko and control cardiomyocytes is measured using supervised ML-assisted high throughput live-cell imaging (Cretin et al., 2021). This result should be substantiated by additional apoptosis assays.

      Cell death assays are performed by treating cardiomyocytes isolated from Mtfp1 ko and wt mice with the cardiotoxic anthracycline doxorubicin (DOX). The DOX dose of 60 µM is extremely high. Can cell death be observed at lower concentrations of DOX?

      Minor comments:

      Line 349: there is a typo. Please replace "we sought investigate whether MTFP1 loss specifically..." with "we sought to investigate whether MTFP1 loss specifically..."

      Line 417: What the authors mean is that "the modest level of over-expression did not negatively impact cardiac function in vivo (Figure S5B-C)".

      Line 490-500: this is a very long sentence. Please break it down into 2 sentences to ease the reading.

      Significance

      The role of MTFP1 has been investigated in isolated cells where conflicting results were reported in the literature. The in vivo role of MTFP1 in the heart is currently unknown. RNAseq and a panoply of approaches assessing mitochondrial structure and function, before symptomatic DCM occurs, provide important insights on early events causing the cardiomyopathy. This study is potentially conceptually innovative and could reveal a new role of MTFP1 in maintaining energy metabolism in the heart as well as in other organs.

      Referee Cross-commenting

      I have read the comments of the other 3 reviewers in detail. Like reviewers 3 and 4, I believe that the study is very well performed and provides new knowledge on the role of MTFP1 in cardiac energetics, assuming that the control mice do not develop DCM. I agree with the issues they identified. Regarding the issues raised by reviewer 1, especially concerning the lack of mechanistic insights, I actually thought that the full characterization of the Mtfp1 cko mouse model before and at the onset of the cardiomyopathy showing a strong cardiac phenotype, the RNAseq data showing alteration of metabolic genes and the detailed experiments performed in isolated mitochondria and isolated cells including rescue experiments, provide strong evidence that Mtfp1 regulates energy metabolism. That being said, I agree that direct causality could be better demonstrated by adding siRNA experiments to knock down Mtfp1 and see if it can recapitulate the adverse effects seen in Mtfp1 ko mice. This was attempted in MEFs and U2OS cells, which did not show the expected results. I would perform this experiment in cardiac cells, which is the relevant cell type to investigate underlying mechanisms. Adding causality experiments would strengthen the study even more.

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The study investigated the role of mitochondrial fission process 1 (MTFP1) on cardiac structure and function. MTFP1 deletion in the heart resulted in adult-onset dilated cardiomyopathy (DCM), reduced membrane potential, and increased non-phosphorylation-dependent respiration. MTFP1 deletion also increased the sensitivity to programmed cell death, which was accompanied by an opening of the mitochondrial permeability transition pore (mPTP) in vitro. Thus, the authors conclude that MTFP1 influences mitochondrial coupling and cell death sensitivity.

      I have the following concerns regarding the study and its main conclusions:

      Major concerns:

      1- While the study challenges previous reports regarding the role of MTFP1 in mitochondrial fission, the study is descriptive and does not provide any mechanistic insights delineating the impact of MTFP1 on cardiac energy metabolism and cell death.

      2- The significance of the RNA sequencing data is not clear, and the authors need to put these changes in context and explain how these changes may fit in the study context. It is also not clear why the authors decided to only comment on the changes in Nppa and Nppb levels?

      3- It is not clear how MTFP1 influences bioenergetic efficiency, and the authors do not provide any evidence to suggest that this might be the case.

      4- In Figure 2F, there is a decrease in the expression of ATP5A complex in the cMKO mitochondria, which could explain the changes in state 4 respiration and membrane potential. The authors need to delineate how MTFP1 could influence the activity of the ATP5 complex.

      5- In Figure S4, the authors should report the baseline measurements of LV function and structure pre-doxorubicin treatment to ensure no significant difference in these parameters occurred prior to the treatment protocol.

      6- How does MTFP1 modify PTP activity? More work is needed to characterize this effect.

      7- Co-immunoprecipitation data in figure S5 are confusing and have no clear significance. Therefore, the authors need to discuss the significance of these changes and how they might be relevant in the study context.

      Minor:

      • Line 222, "wholesale" > whole cell

      Significance

      Significance: This study challenges existing dogma, although the data are not strong enough to make this challenge convincing.

      Referee Cross-commenting

      I have read the comments of the other 3 reviewers, and I agree with their comments. This is an interesting study, that if adequately revised would make an important contribution to the literature.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1: **Major comments:**

      The authors state that all the RNA and contaminating DNA was validated and verified with nanodrop and BioAnalyzer which is the correct and accepted approach. However, the following concerns arise with testing reaction efficiency and data analysis:

      Comment 1.

      For reaction efficiency, the standard curves for each reference gene and gene of interest target should be included in the supplemental data. A four-point standard curve is the bare minimum to assess reaction efficiency and raises concerns about the data quality. The unknown samples being tested should also be plotted on the corresponding standard curves to assess their efficiency.

      Response:

      We have indeed calculated primer efficiencies by serial dilution and performed a four-point standard curve wherever possible. In other cases, at least a three-point dilution curve was performed to assess primer efficiency. To have a more extensive range of Cq values in the standard curve, the dilution series was done with 10-fold serial dilutions, as indicated in the materials and methods section under the heading “Amplification Efficiencies”. This provided a range of about 6.6 cycles (three-point dilution) and about 9.9 cycles (four-point dilution) for the primers tested. If the Cq values of the 4th dilution fell beyond the detection range of the machines (above 29 cycles) or closer to the No-RT Control, only the first three dilutions were taken into consideration. We have now included the standard curve for all the genes in the metadata/source data and updated the Figshare DOI. All sample Cq values were within this standard curve as mentioned in the materials and methods section and they have been disclosed already in the metadata files. The raw qPCR Cq output for all references and targets for both datasets can be retrieved from the data file in Figshare. Moreover, we will also add a new sentence in the methods section clarifying the standard curve dilutions and data availability.
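
      For readers less familiar with how amplification efficiency is read off a dilution series, the calculation can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Cq values are made up, and the function names are hypothetical. It assumes the usual relation E = 10^(-1/slope) - 1 for a regression of Cq on log10(dilution), and it also shows why two and three 10-fold steps span roughly 6.6 and 9.9 cycles at 100% efficiency.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(log10_dilutions, cq_values):
    """Efficiency from the standard-curve slope: E = 10^(-1/slope) - 1."""
    slope, _ = fit_line(log10_dilutions, cq_values)
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical four-point, 10-fold dilution series (illustrative values only).
log10_dilutions = [0, -1, -2, -3]
cq_values = [18.0, 21.4, 24.7, 28.1]

efficiency = amplification_efficiency(log10_dilutions, cq_values)

# At 100% efficiency one 10-fold dilution step costs log2(10) cycles.
cycles_per_tenfold = math.log2(10)
three_point_span = 2 * cycles_per_tenfold  # two steps between three points
four_point_span = 3 * cycles_per_tenfold   # three steps between four points

print(f"efficiency: {efficiency:.1%}")
print(f"three-point span: {three_point_span:.2f} cycles")
print(f"four-point span: {four_point_span:.2f} cycles")
```

      With these illustrative values the fitted slope is close to the ideal value of about -3.32 cycles per 10-fold dilution, so the estimated efficiency falls inside a 95% to 105% acceptance window.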

      Comment 2. The statement starting on line 510: "The WT experimental group was omitted from this analysis as it was used as the experimental calibrator for differential expression. The mean Fold Change of the WT group is always at 1 regardless of the gene/method in question and therefore it is redundant to test for statistical significance of the WT fold change levels across different methods for each gene." indicates that data analysis was not performed in a rigorous and generally accepted manner. Please check the analysis with that described in: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1

      The generally accepted methodology for relative, normalized qPCR data analysis is well described in Figure 5 of that article. qPCR statistical analysis should be performed on the log transformed expression results also well described in that paper.

      Response:

      We apologize for any lack of clarity on this line. We have always compared the Control group to the Test group while performing statistical analysis, as shown in Figure 3 and Figure 6. This is a fundamental point of any study and we have strictly adhered to this. The highlighted statement pertains to Supplementary Figures 1 & 2, where we compare the fold changes of the “Test” groups between qPCR and RNA-Seq in both datasets. Comparing the Control groups with one another between these methods is redundant, as the mean fold change of the control groups is always 1 because we are measuring relative expression. Thus, we cannot perform any meaningful statistical testing between the control groups between RNA-Seq and qPCR, regardless of the method employed for testing.
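
      The point about the calibrator group can be illustrated with a small sketch of relative quantification by the commonly used 2^(-ΔΔCt) calculation. Everything here is an assumption made for illustration: the Cq values are invented, the function names are hypothetical, and the calibrator is taken to be the mean ΔCt of the control group. Under that choice the geometric mean of the control fold changes is 1 by construction, which is why comparing control groups across methods carries no information.

```python
import math


def delta_ct(target_cq, reference_cq):
    """Per-sample dCt: target Cq minus reference-gene Cq."""
    return [t - r for t, r in zip(target_cq, reference_cq)]


def fold_changes(sample_dct, calibrator_dct):
    """Per-sample fold change 2^-(dCt - calibrator dCt), assuming 100% efficiency."""
    return [2.0 ** -(d - calibrator_dct) for d in sample_dct]


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Invented Cq values for three biological replicates per group.
control_target = [24.1, 23.8, 24.4]
control_reference = [18.0, 17.9, 18.2]
test_target = [22.0, 22.3, 21.9]
test_reference = [18.1, 18.0, 17.9]

control_dct = delta_ct(control_target, control_reference)
test_dct = delta_ct(test_target, test_reference)

# Assumption: the calibrator is the mean dCt of the control group.
calibrator = sum(control_dct) / len(control_dct)

control_fc = fold_changes(control_dct, calibrator)
test_fc = fold_changes(test_dct, calibrator)

print("control geometric-mean fold change:", round(geometric_mean(control_fc), 6))
print("test geometric-mean fold change:", round(geometric_mean(test_fc), 6))
```

      Note that only the geometric mean of the control group is pinned to 1 under this calibration; the arithmetic mean of the individual control fold changes can deviate slightly from 1 when the replicates scatter.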

      Furthermore, the use of the 2^(-ΔΔCt) method for relative expression is in strict adherence to the initial papers describing this method (Livak and Schmittgen 2001, Schmittgen and Livak, 2008), which is again recapitulated in the article that you have cited. This can be seen in the metadata where the Excel files that were used for calculating differential expression for all samples and datasets can be accessed. However, we would like to remark that we use more stringent criteria for primer validation (Efficiency between 95% and 105% as opposed to between 90% and 110% as mentioned in the paper). Moreover, the statistical testing and data representation prescribed in Figure 5 of the article that you have mentioned are not well founded for the following reasons:

      • We cannot perform parametric t-tests using low sample sizes. Furthermore, we cannot test for data normality using the few data points employed in standard qPCR assays. Thus, neither our qPCR assays nor the ones used in the mentioned article have enough samples to perform a t-test. Hence, we have used a non-parametric, rank-based Mann-Whitney test for testing statistical significance in our study, as it is more apt for such low sample sizes and distributions.
      • The article proposes data representation with the mean and SEM or 95% Confidence Intervals (CI). We would like to kindly remark that SEM and CI are sampling parameters that arise when we perform sampling of data points from a larger population. In our study, we have always shown all the data points (biological replicates) for each experimental group. Hence, we can only show the distribution around the mean with the standard deviations (SD) and not with SEM or CI. We have not performed any sampling whatsoever nor has the study mentioned by the reviewer.
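
      As a rough illustration of how a rank-based test behaves at the group sizes typical of qPCR, the following sketch computes an exact two-sided Mann-Whitney p-value by enumerating every way of splitting the pooled values into two groups. It is a didactic sketch under stated assumptions, not the analysis used in the study: the function names are hypothetical and the input values are arbitrary.

```python
from itertools import combinations


def u_statistic(group_a, group_b):
    """Mann-Whitney U for group_a: pairs where a > b, ties counted as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def exact_two_sided_p(group_a, group_b):
    """Exact two-sided p-value by enumerating all group relabelings."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_total = len(pooled)
    centre = n_a * len(group_b) / 2.0  # expected U under the null hypothesis
    observed = abs(u_statistic(group_a, group_b) - centre)
    extreme = 0
    total = 0
    for idx in combinations(range(n_total), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        # Count relabelings at least as far from the centre as the observed U.
        if abs(u_statistic(a, b) - centre) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Completely separated groups give the smallest attainable p-value.
p_three_vs_three = exact_two_sided_p([1, 2, 3], [4, 5, 6])
p_four_vs_four = exact_two_sided_p([1, 2, 3, 4], [5, 6, 7, 8])

print("n = 3 per group:", p_three_vs_three)
print("n = 4 per group:", round(p_four_vs_four, 4))
```

      The enumeration makes the attainable p-values explicit: with three replicates per group the smallest possible two-sided p-value is 2/20, and with four per group it is 2/70, so the number of biological replicates bounds what the test can report.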

        Comment 3. The authors used Normfinder to assess reference gene stability. Since Normfinder uses a particular algorithm for assessing stability, it is recommended to assess stability using a combination of these "stability calculators" including: GeNorm, NormFinder and BestKeeper. This is described in Table 1 of: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1. This will give a much more reliable perspective on the ranking of reference genes by their stability.

      Response:

      The method used in our study for reference gene validation is a combination of CV, NormFinder and statistical testing of raw expression profiles. In our previous study (Sundaram et al 2019), we have categorically shown that using a combination of different existing methods such as geNorm, NormFinder and BestKeeper and comparing their ranks results in a suboptimal choice of reference genes. This is because geNorm ranks genes with similar expression patterns as stable even if they vary significantly among groups. BestKeeper calculates variation based on Cq values, which are exponents, while the expression levels are calculated in the linear scale (2^Cq). NormFinder stability scores are influenced by the presence of genes with significant overall variation. More evidence backed up with data can be found in our previous study, where we have clearly shown that combining these methods and calculating an overall rank (as proposed in the article you have mentioned) is not the best strategy. Hence, we devised the approach used in the present study, which has been previously validated and published (Sundaram et al 2019, PLoS ONE) and was designed taking into account the advantages and disadvantages of the different existing approaches.
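
      The distinction between variation on the Cq scale and variation on the linear expression scale can be made concrete with a short sketch. It is only an illustration of that scale difference, not a reimplementation of BestKeeper, NormFinder or the authors' workflow: the Cq values are invented, the function names are hypothetical, and 100% amplification efficiency is assumed when converting Cq to relative quantities.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def linear_quantities(cq_values):
    """Relative quantities proportional to 2^(-Cq), scaled to the lowest Cq.

    Assumes 100% amplification efficiency (a doubling per cycle).
    """
    lowest = min(cq_values)
    return [2.0 ** -(cq - lowest) for cq in cq_values]


# Invented Cq values for one candidate reference gene across four samples.
cq = [20.0, 20.5, 21.0, 20.2]

cv_cq = coefficient_of_variation(cq)
cv_linear = coefficient_of_variation(linear_quantities(cq))

print(f"CV on the Cq scale:     {cv_cq:.3f}")
print(f"CV on the linear scale: {cv_linear:.3f}")
```

      A spread of one cycle looks negligible as a fraction of a Cq of about 20, yet it corresponds to a two-fold difference in template, so the same gene appears far more variable once the values are expressed on the linear scale.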

      Comment 4. Finally, since many currently studied targets for relative gene expression are low expressed, it would be important to also examine three differentially expressed targets in the Cq range of 29 to 32. Yes, the variability will be higher, but these data will give a more realistic test of reference gene stability.

      Response:

      The target genes used in the study range from about 12 cycles to about 29 cycles (both datasets included, please refer to the source data/metadata). This falls well within the standard curves of all these genes, as mentioned earlier. The stability of the reference genes has been shown with absolute parameters such as the coefficient of variation and the NormFinder S scores (Tables 1, 2, 3 & 4). Although we are not opposed to adding more target genes, we fail to see how adding target genes with Cq values above 29 cycles would reflect on the stability of reference genes. The variability that will be observed is a mere reflection of the variability of Cq values of the target genes in the Cq range of 29 – 32 as it approaches the detection limits of qPCR assays. The Cq values of the best reference genes would still remain the same. Therefore, this exercise cannot test the “stability” of the reference genes but only demonstrate the limit of qPCR detection (which is already well known). We would also like to remark that we have used No-RT controls in our qPCR assays, which exhibit a signal (different dissociation peak) in this Cq range for some genes, and hence this is not a signal that arises from the cDNA. Therefore, we do not consider values above 29 cycles to be reliable in our qPCR setup, and we switch to droplet digital PCR for such low-expressed genes in our studies.

      Reviewer #2: **Summary + Minor Comments** Reference gene selection is one of the most critical steps in gene expression analysis using qPCR. The authors compared data quality using references selected based on RNA-Seq or using panel of often used reference genes. The manuscript is well prepared and easy to understand. Figures are nice and clear. I do not have major comments, but rather a few suggestions to make the manuscript more advanced. Since it is based on already available data or a few more expression measurements could be easily added, I would suggest to include total RNA factor, some rRNA and mtRNA as potential references. It will be interesting to compare their stability and effect on results of other targeted genes.

      In discussion, authors suggested that: "stable reference genes for qPCR data normalisation can be obtained from any random set of candidates provided the statistical approach of reference gene validation is sound and consistent". I do not think the word random in many sentences is appropriate. Panel of reference genes used in this study contains many known stable genes and that does not look random to me. I would rephrase these sentences. Usually panels of reference genes (for human and mouse are commercially available and contains several genes used in study) are composed of genes coding various biological processes to ensure that some of them will be stably expressed in experiments.

      Response:

      We understand the reviewer’s perspective on the use of the words “random reference genes”. We have replaced it with the words “conventional reference genes” throughout the manuscript.

      Regarding the addition of other RNA species as reference genes, we would like to clarify that we have used only protein coding transcripts (encoded by nuclear genes) as reference genes as all our target genes also belong to the same RNA category. This was done in accordance with the MIQE guidelines for qPCR data publication (Bustin et al 2009, DOI: 10.1373/clinchem.2008.112797) which states that rRNA should not be used for mRNA target gene normalization. This is because the vast majority of RNA from total RNA extraction is rRNA and only about 1% - 5% is mRNA. Thus, it is advisable to normalize mRNA targets with mRNA reference genes as it serves as a control for the extraction and RT PCR protocol. This argument can also be extended to other RNA species either in type or in origin (mtRNA). Regarding the total RNA factor, we have always used the same quantity of total RNA from all samples for RT-PCR as mentioned in the materials and methods section.

      Reviewer #3

      **Summary + Minor Comments**

      The aim of this study was to demonstrate that the statistical approach to determine the best reference genes from randomly selected "standard" reference genes might be more sufficient than employing reference genes as indicated by RNA-Seq.

      In a previous study they established a qPCR data normalization workflow, after comparing several statistical approaches for the assessment of reference gene stability. In this study they apply this workflow to compare "random" reference genes with preselected references genes based on RNA-Seq data. They test their hypothesis in two different experimental setups, varying sample material and methodology. After establishing the most "stable" reference genes, the suitability of these genes for normalization was put on trial by investigating their ability to normalize differential expression of target genes. These results were compared to one another and to fold-changes computed from RNA-Seq. The results indicate that as stated in the title of the study, "RNA-Seq is not required to determine stable reference genes for qPCR normalization", since both approaches render similar results. Potential pitfalls when selecting genes from RNA-seq data are discussed and an integration of influencing factors is suggested.

      The key conclusions of the study are convincing and well-supported by the experiments conducted, which are realistic in terms of time and resources. Data and methods are presented articulate and are reproducible. Experiments are adequately replicated and statistical analysis is adequate. The manuscript is well written, tables and figures provided are sound and corroborate a better understanding of the presented results. Minor changes would be:

      Figure 1, 2, 3, 4, 5, 6: in the figure are uppercase letters, in the figure legend are lowercase letters, please adjust that.

      p10 line 347: I understand what is meant, with "using the NF as the reference gene", however, stating again that the combined NF of the two most stable ref genes was used here, would make it clearer. P11 line 355f: the first sentences here are negligible, as already stated elsewhere P30 line 777: The last sentence is not clear to me.

      Response:

      All minor concerns have been addressed in the revised manuscript as follows:

      1. Figure 1, 2, 3, 4, 5, 6: in the figure are uppercase letters, in the figure legend are lowercase letters, please adjust that – Has been modified
      2. p10 line 347: I understand what is meant, with "using the NF as the reference gene", however, stating again that the combined NF of the two most stable ref genes was used here, would make it clearer. – Has been modified
      3. P11 line 355f: the first sentences here are negligible, as already stated elsewhere – Have been removed
      4. P30 line 777: The last sentence is not clear to me.

        We wanted to say that our study aptly addressed the strongest hurdle in performing reliable qPCR assays, which is the choice of good reference genes. This choice is not dependent on RNA-SEQ results. We have modified this sentence for better clarity.
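      For readers unfamiliar with the "combined NF of the two most stable ref genes" mentioned in point 2, a minimal sketch follows. It assumes the normalization factor is the geometric mean of the reference genes' relative quantities, which is a common convention; the response above does not state the exact formula used in the manuscript. All function names and Cq values are hypothetical.

```python
import math


def relative_quantity(cq, calibrator_cq, efficiency=2.0):
    """Relative quantity of one gene versus a calibrator sample.

    efficiency=2.0 assumes perfect doubling per cycle (an assumption).
    """
    return efficiency ** (calibrator_cq - cq)


def normalization_factor(reference_quantities):
    """Geometric mean of the reference genes' relative quantities."""
    logs = [math.log(q) for q in reference_quantities]
    return math.exp(sum(logs) / len(logs))


def normalized_expression(target_cq, target_cal_cq, ref_cqs, ref_cal_cqs):
    """Target relative quantity divided by the normalization factor."""
    target_rq = relative_quantity(target_cq, target_cal_cq)
    ref_rqs = [relative_quantity(cq, cal) for cq, cal in zip(ref_cqs, ref_cal_cqs)]
    return target_rq / normalization_factor(ref_rqs)


# Hypothetical sample: the target appears 2 cycles earlier than in the
# calibrator, while both reference genes are unchanged.
fold = normalized_expression(22.0, 24.0, [18.0, 20.0], [18.0, 20.0])
print(f"Normalized fold change: {fold:.2f}")
```

      Using a geometric rather than arithmetic mean keeps a single highly expressed reference gene from dominating the factor.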

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      The aim of this study was to demonstrate that the statistical approach to determine the best reference genes from randomly selected "standard" reference genes might be more sufficient than employing reference genes as indicated by RNA-Seq.

      In a previous study they established a qPCR data normalization workflow, after comparing several statistical approaches for the assessment of reference gene stability. In this study they apply this workflow to compare "random" reference genes with preselected references genes based on RNA-Seq data. They test their hypothesis in two different experimental setups, varying sample material and methodology. After establishing the most "stable" reference genes, the suitability of these genes for normalization was put on trial by investigating their ability to normalize differential expression of target genes. These results were compared to one another and to fold-changes computed from RNA-Seq. The results indicate that as stated in the title of the study, "RNA-Seq is not required to determine stable reference genes for qPCR normalization", since both approaches render similar results. Potential pitfalls when selecting genes from RNA-seq data are discussed and an integration of influencing factors is suggested.

      The key conclusions of the study are convincing and well-supported by the experiments conducted, which are realistic in terms of time and resources. Data and methods are presented articulate and are reproducible. Experiments are adequately replicated and statistical analysis is adequate. The manuscript is well written, tables and figures provided are sound and corroborate a better understanding of the presented results. Minor changes would be:

      Figure 1, 2, 3, 4, 5, 6: in the figure are uppercase letters, in the figure legend are lowercase letters, please adjust that.

      p10 line 347: I understand what is meant, with "using the NF as the reference gene", however, stating again that the combined NF of the two most stable ref genes was used here, would make it clearer. P11 line 355f: the first sentences here are negligible, as already stated elsewhere P30 line 777: The last sentence is not clear to me.

      Significance

      In the last years the necessity of stable reference genes for the normalization of qPCR data has become more and more apparent, since it has been shown that selecting the most "popular" genes might not always lead to correct expression profiles, since depending on the experimental setup, significant variation can occur. Numerous studies exist, validating potential reference genes, employing several well-established statistical approaches (geNorm, NormFinder etc.) and more recently based on RNA-Seq data. RNA-Seq is definitely accompanied by more work effort and higher costs. Therefore employing the "simpler" approach, obtaining the same results, might be beneficial for scientists establishing a new qPCR protocol, in particular in times when working cost-effectively is a prerequisite in most laboratories.

      The authors performed a thorough analysis of the two approaches compared in this study. By investigating two entirely different experimental set-ups with a similar outcome, they nicely substantiate their findings. Furthermore, by investigating differential expression of target genes, for both experimental setups, they put their results to the test, convincingly corroborating their results.

      This manuscript is well-written, experiments are thoroughly performed, the findings are convincing and it clearly is an important contribution for the scientific community.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Reference gene selection is one of the most critical steps in gene expression analysis using qPCR. The authors compared data quality using references selected based on RNA-Seq or using panel of often used reference genes. The manuscript is well prepared and easy to understand. Figures are nice and clear. I do not have major comments, but rather a few suggestions to make the manuscript more advanced. Since it is based on already available data or a few more expression measurements could be easily added, I would suggest to include total RNA factor, some rRNA and mtRNA as potential references. It will be interesting to compare their stability and effect on results of other targeted genes.

      In discussion, authors suggested that: "stable reference genes for qPCR data normalisation can be obtained from any random set of candidates provided the statistical approach of reference gene validation is sound and consistent". I do not think the word random in many sentences is appropriate. Panel of reference genes used in this study contains many known stable genes and that does not look random to me. I would rephrase these sentences. Usually panels of reference genes (for human and mouse are commercially available and contains several genes used in study) are composed of genes coding various biological processes to ensure that some of them will be stably expressed in experiments.

      Significance

      Good reference gene selection is needed for most of experiments, where quantities and qualities of samples are not identical. Unfortunately, every experiment has other stable and reliable reference genes. Validation can be time consuming and expensive. RNA-Seq experiments covering broad spectrum of biological samples are potentially a way for faster identification of unknown stable genes, which could be used for normalization in qPCR. Authors compared effectivity of reference genes selected based on RNA-Seq and using panel of potential reference genes. I like their comparison, but do not fully agree with "random" selection.

      I am not aware of other study comparing quality of qPCR references from RNA-Seq or preselected genes. I think the manuscript will be appreciated by technically or methodically oriented readers (gene expression area).

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      This article contrasts RNAseq and random selection to assess reference genes for relative gene expression. The study was well contrived with a solid experimental design.

      Major comments:

      The authors state that all the RNA and contaminating DNA was validated and verified with nanodrop and BioAnalyzer which is the correct and accepted approach. However, the following concerns arise with testing reaction efficiency and data analysis:

      1. For reaction efficiency, the standard curves for each reference gene and gene of interest target should be included in the supplemental data. A four point standard curve is the bare minimum to assess reaction efficiency and raises concerns about the data quality. The unknown samples being tested should also be plotted on the corresponding standard curves to assess their efficiency.
      2. The statement starting on line 510: "The WT experimental group was omitted from this analysis as it was used as the experimental calibrator for differential expression. The mean Fold Change of the WT group is always at 1 regardless of the gene/method in question and therefore it is redundant to test for statistical significance of the WT fold change levels across different methods for each gene." indicates that data analysis was not performed in a rigorous and generally accepted manner. Please check the analysis with that described in: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1

      The generally accepted methodology for relative, normalized qPCR data analysis is well described in Figure 5 of that article. qPCR statistical analysis should be performed on the log transformed expression results also well described in that paper.
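      As a minimal sketch of what statistics on log-transformed expression can look like (an illustration only, not the procedure from the cited article or the manuscript), the example below computes log2 relative expression per replicate and a Welch t statistic between groups. The Cq values and function names are hypothetical, and 100% efficiency is assumed. A p-value is omitted because the Python standard library has no t-distribution CDF.

```python
import math
import statistics


def log2_relative_expression(target_cq, ref_cq, calibrator_delta_cq):
    """log2 fold change of one replicate, i.e. minus delta-delta-Cq."""
    delta_cq = target_cq - ref_cq
    return -(delta_cq - calibrator_delta_cq)


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with unequal variances."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical Cq values: three biological replicates per group.
wt_target, wt_ref = [25.0, 25.2, 24.8], [18.0, 18.1, 17.9]
ko_target, ko_ref = [23.0, 23.3, 22.9], [18.0, 18.2, 17.9]

# The calibrator is the mean delta-Cq of the WT group.
calibrator = statistics.mean(t - r for t, r in zip(wt_target, wt_ref))

wt_log2 = [log2_relative_expression(t, r, calibrator) for t, r in zip(wt_target, wt_ref)]
ko_log2 = [log2_relative_expression(t, r, calibrator) for t, r in zip(ko_target, ko_ref)]

t_stat = welch_t(ko_log2, wt_log2)
print(f"WT mean log2 expression: {statistics.mean(wt_log2):.3f}")
print(f"KO mean log2 expression: {statistics.mean(ko_log2):.3f}")
print(f"Welch t statistic:       {t_stat:.2f}")
```

      The WT group has a mean log2 expression of zero by construction, but its replicates still carry variance, and that variance enters the test.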

      The authors used Normfinder to assess reference gene stability. Since Normfinder uses a particular algorithm for assessing stability, it is recommended to assess stability using a combination of these "stability calculators" including: GeNorm, NormFinder and BestKeeper. This is described in Table 1 of: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1. This will give a much more reliable perspective on the ranking of reference genes by their stability.

      Finally, since many currently studied targets for relative gene expression are expressed at low levels, it would be important to also examine three differentially expressed targets in the Cq range of 29 to 32. Yes the variability will be higher but these data will give a more realistic test of reference gene stability.

      Significance

      This article will be useful for all labs conducting gene expression experiments. It also uncovers additional contrasts between qPCR and RNA seq which are helpful in choosing the appropriate technology for given experiments.

      Referee Cross-commenting

      I agree with the other reviewers comments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Hello, we wrote our review before seeing that you have special formatting requirements. We're just going to post our review in its entirety rather than rewrite it based on these suggestions. It encompasses the above content, it's just not formatted in the suggested order. We hope that's OK!

      **Full review:** This manuscript makes a strong case for the evolvability of multicellular size via selection for settling rate in the Ichthyosporea. The use of an experimental evolution framework to assess the evolvability of multicellular phenotypes, using sedimentation rate as a selective pressure, extends the previous work of others into a new domain within the Holozoa and the closest living relatives of animals. The natural, ecological significance of selection for sedimentation rate is a novel idea, and the connection between sedimentation rate and multicellular evolution in natural as opposed to contrived experimental circumstances is an interesting idea. The results are striking and well supported, with laboratory evolution rapidly adjusting both the cellular composition and the multicellular phenotypes of the organisms involved in ways that are well explained. This is an important result that brings the laboratory study of the evolution of multicellularity forward, into a different branch of the tree of life, and shows its broad applicability.

      Sequencing of evolved lines adds significantly to the completeness of the story. While the causal role of these mutations in the production of the observed multicellular phenotypes is not demonstrated via manipulation or breeding, this is quite understandable in the light of the unusual model organism and the observed homologies and role of the genes involved. While this is largely clear from a reading, we believe the manuscript would benefit from a brief analysis of the numerical enrichment of genes with homologs involved in cytokinesis, cell membrane composition, and cell cycle control relative to the null hypothesis of genes picked randomly from the genome. If this is beyond the scope of this research in an unusual model organism with many poorly annotated genes, then a slightly expanded verbal discussion of the potential roles of the apparent functions of these genes in the evolution of multicellular clumping would be an appropriate substitute.

      We wholeheartedly recommend the publication of this manuscript with a number of minor revisions, which while not affecting the main conclusions or points of the manuscript will clarify important points, adjust small errors, and point the reader at relevant literature and concepts.
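      One way to frame the enrichment analysis suggested above is a one-sided hypergeometric test against random draws from the genome. The sketch below is illustrative only: the gene counts are placeholders rather than values from the manuscript, and the function name is hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, draws, observed):
    """P(X >= observed) when `draws` genes are picked at random, without
    replacement, from `population` genes of which `annotated` belong to
    the category of interest."""
    total = comb(population, draws)
    upper = min(annotated, draws)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Placeholder numbers (NOT from the manuscript): 20,000 genes in the
# genome, 400 annotated as cytokinesis-related, 30 mutated genes,
# 5 of which carry the annotation.
p_value = hypergeom_tail(population=20_000, annotated=400, draws=30, observed=5)
print(f"One-sided enrichment p-value: {p_value:.2e}")
```

      In a poorly annotated genome the size of the annotated category is itself uncertain, so a result of this kind would be indicative rather than conclusive.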

      ANSWER: We would like to heartily thank the reviewers for their appreciation of our work.

      **Major points:** none. **Minor points:** Line 79 - is sedimentation rate really invariably associated with multicellularization? Active swimming would seem to prevent this.

      ANSWER: We meant to refer to the fact that all published examples of the emergence of multicellularity from unicellular ancestors have been accompanied by an increased sedimentation rate. Active swimming alone would just increase the diffusion rate of cells and not counteract the effects of increased size and density; such an active mechanism would also require directionality away from the tendency to sediment. A more passive mechanism, whereby a genetic variant, or cell cycle transition, which simultaneously causes a relative decrease in density while increasing cell size, leaving the net sedimentation rate the same as the ancestor, while conceivable, has not been observed in the literature. We changed the text from “invariably” to “frequently” at line 80 to emphasize how this is an empirical observation.

      Line 164 - the precise phenotype in the evolution experiment being referred to is unclear without further context, with the ordering of paragraphs possibly needing a little work.

      ANSWER: We tightened the paragraphs and merged both; the sentence containing “this phenotype” was removed.

      Line 178 - is sorting them into three classes informative? Are there different mutations associated with these, or is it just visual clumping on the numberline? Perhaps not a useful classification, but the existence of great variation is an important point to get across. A more useful classification might be those that increase sedimentation with large density changes versus exclusively by clumping.

      ANSWER: We agree with this argument and ultimately decided to remove the visual classification. We revised the text and figures accordingly.

      Line 254 - excess cellular density is referred to interchangeably with density, when these are very different figures. This continues in line 269, and in the figure legends of Figure 4.

      ANSWER: We fixed this.

      Line 341 - the role of the RCC1 homolog in other organisms could be expanded on in slightly more detail. Similarly, other mutations in this same section known to affect cytokinesis could have potential mechanisms for affecting clumping commented upon, especially given the cell membrane results in the figures.

      ANSWER: We share the reviewer’s enthusiasm about some of these mutations. We, however, try to be very conservative about what each gene or protein could be doing. Indeed, the absence of genetic tools does not allow us to directly test the effect of each mutation. We added a couple of extra sentences about RCC1 as well as about cytokinetic proteins and their potential role in clumping phenotypes.

      Line 387 - awkward formatting or sentence structure, with dashes and commas.

      ANSWER: We fixed the sentence structure.

      Line 395 - this cellular process, or this evolutionary process of selection for faster settling?

      ANSWER: We revised this appropriately.

      Line 408 - per unit volume

      ANSWER: Fixed.

      Line 425 - the idea of clumpiness as ancestral is quickly put forward and dismissed within a single sentence. This could be explored in slightly more detail as an option, before concluding that what is clear is that the phenotype is easy to change.

      ANSWER: We agree that it would be interesting to pursue the ecological role and distribution of clumping and cell cycle phenotypes for other species in the Ichthyosporea genus. We could propose alternative scenarios of which trait came or went first and test this hypothesis by calculating the correlation of the presence or absence of the trait with the branch lengths and branching patterns of phylogenetic trees we have built using genome sequences. However, for our dataset, this would nonetheless remain a fragile correlation consisting of five data points. We do not feel such speculation is helpful for the text.

      However, because two reviewers have mentioned or suggested in this direction, we expanded the discussion and annotated the tips of the species tree in figure 5 with the traits of interest. The result shows that S. gastrica, S. tapetis and S. nootkatensis species exhibit clumpiness as a trait. However, the data is not enough to resolve whether the traits are “derived” or “ancestral”.

      Line 437 - sedimentation as a highly variable trait, or a highly evolvable trait?

      ANSWER: Evolvable trait. We fixed it in the text.

      Figure 1G, 1H: We are fairly certain that the logarithmic scale of DNA content and coenocyte volume are mislabeled. The scale that is labeled log2 in 1G in the legend goes up by factors of 2 rather than single digits. The axis is obviously logarithmic, and the log2 in the legend is superfluous and misleading. Similarly, in 1H a scale labeled as log10 goes from 1 to 30, which on a logarithmic scale would be a sphere approximately 100 kilometers wide. The numbers can remain, but the legend should remove the log10.

      ANSWER: Fixed. It is indeed a log scale. We made sure to remove the confusing log2 and log10 from figure and legend.

      **General:** Were there any head to head competitions performed? Not suggesting you need to, but it's a nice way to directly examine fitness consequences of multicellularity, and is commonly done in the field. If you have done this it wasn't clear to us.

      ANSWER: We now included a fitness experiment previously performed using the clumpy S01 and S03 in a head-to-head competition with the Ancestor (AN). The results are shown in Figure 2E and Figure 2 – figure supplement 1D. The results reflect how the fast-sedimenting clumpy phenotype is highly advantageous in our experimental evolution selection procedure but deleterious in the absence of selection.

      Reviewer #1 (Significance (Required)): see the above comments about writing the review before realizing there were specific formatting suggestions. I hope you understand us not wanting to re-write the review having already written it once.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): The present work adds to the growing literature on sedimentation rate as a major player in the evolution of multicellularity. Via rigorous experimentation, the authors convincingly show that they can select for increased sedimentation rate and identify two mechanisms underlying this increase: incomplete cellular separation leading to multicellular groups and increases in cellular density. They also show surprising natural variation in sedimentation and argue that, along with similar evidence from other organisms, their findings cement the likely major role of sedimentation and go farther by revealing the tight genetic control that it is under.

      Reviewer #3 (Significance (Required)): This is a very significant study because it illuminates processes and underlying mechanisms that could have played a major role in the transition to multicellularity. Their result will likely greatly influence the conceptual and theoretical thinking and will foster additional empirical directions. My only quibble with the manuscript is that I wished for a bit more ecological context and grounding of the main findings: in that respect, both the abstract and the last paragraph of the discussion leave me wanting and occasionally puzzled. If maintaining buoyancy is such a strong selective pressure and the variation in sedimentation rate is such a challenge to it, then I think explaining a bit more exactly why sedimentation would evolve, why so much variation would exist, etc. would be really helpful to the more naive reader. Just a bit further elaboration on selective pressures (even presumed ones and even if speculative) would be helpful to put the picture together.

      ANSWER: We would like to thank Reviewer #3 for his/her comments. We do believe that extensive ecological context is highly relevant. Throughout the manuscript, we strived to be conservative in the way we describe both our model system and its experimental and natural settings, perhaps to a fault, but we now do offer an evolutionary model that tries to shed light on the phenotypic evolution of the various species through different routes (Fig. 5H). To elaborate more on the rationale behind this strategy, we offer the following aspects:

      1. we are investigating a sizeable, but still very limited, number of six Sphaeroforma species. Therefore, we feel that explaining what trait may be considered ancestral is speculative based on the known species tree (we revised our Discussion in this regard and updated figure 5A).
      2. our knowledge about the ecological niches of Sphaeroforma species is limited. We avoid extensive speculation, and while inference of the potential ecological context is part of the scope of this study, we relied on an experimental approach to tackle our questions, rather than ecological observation or computational modeling.
      • throughout the text we aimed to avoid taking a strong stance on the “adaptiveness” of the traits which we are measuring. This is because, depending on the model specification and parameters, ecological models could be made for or against whether the cellular traits of size and density, and their effects on the higher-level trait of sedimentation rate, might be adaptive “in the wild”.

      We hope that future studies will be able to tackle any open questions on the understanding of the ecology of ichthyosporeans, hopefully benefitting from our inferred evolutionary insights in this study.

      **A more minor point:** I remember seeing a talk by Will Ratcliff a while back in which he showed that in S cerevisiae they also see the two mechanisms of increased sedimentation: increased cellular size and clumping. Yet, I didn't see a reference to that work in the context of the cell density mechanism discussion and wondered why.

      ANSWER: We do believe we have cited the relevant papers from the Ratcliff lab. To be clear, we observed two separate physical mechanisms for fast sedimentation:

      1. by cell-clumping (increasing size),
      2. by increasing the number of nuclei per unit volume (increasing density).

      To our knowledge the 1st mechanism was indeed observed in snowflake yeasts (for which we referenced all relevant studies), whereas the 2nd, which we believe might be specific to multinucleated cells, while a conceivable variable affected by mutations in the organisms from these studies, has not been measured to our knowledge. We added a new model figure (Figure 5H) to hopefully better get this message across.
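      The two mechanisms listed above, size and density, map onto the two terms of Stokes' law for a small sphere settling at low Reynolds number. The sketch below is a textbook illustration under that assumption and is not the density-estimation procedure from the manuscript's methods. The radius, densities and viscosity are placeholder values, and the function name is hypothetical.

```python
def stokes_velocity(radius_m, particle_density, fluid_density=1025.0,
                    viscosity=1.0e-3, gravity=9.81):
    """Terminal settling velocity (m/s) of a sphere under Stokes' law.

    v = 2 * r^2 * (rho_particle - rho_fluid) * g / (9 * mu)
    Densities in kg/m^3, viscosity in Pa*s. The defaults roughly
    describe seawater and are assumptions, not measured values.
    """
    excess_density = particle_density - fluid_density
    return 2.0 * radius_m ** 2 * excess_density * gravity / (9.0 * viscosity)


# Placeholder ancestor-like cell: 10 um radius, 50 kg/m^3 excess density.
base = stokes_velocity(10e-6, 1075.0)
# Mechanism 1: doubling the effective radius (clumping).
bigger = stokes_velocity(20e-6, 1075.0)
# Mechanism 2: doubling the excess density at unchanged size.
denser = stokes_velocity(10e-6, 1125.0)

print(f"baseline: {base:.2e} m/s")
print(f"size x2:  {bigger / base:.1f}x faster")
print(f"excess density x2: {denser / base:.1f}x faster")
```

      Velocity scales with the square of the radius but only linearly with excess density, which is one reason the distinction between density and excess density raised by Reviewer #1 matters.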


      Reviewer #4 (Evidence, reproducibility and clarity (Required)): In this study Dudin et al. explored the variability of sedimentation rates in members of the Sphaeroforma genus and found that sedimentation rates are very variable between different isolates as well as during the life cycle of each isolate. Following this observation Dudin et al. evolved S. arctica under a regime favoring fast settling objects. After a few hundred generations they observed that most lineages increased their sedimentation rate. Characterization of some of these evolved populations suggests two distinct mechanisms allowing fast sedimentation: cluster formation by non-separation of cells post-cellularization and increase in object density. By sequencing the evolved lines Dudin et al. were able to identify that several mutations have been under the effect of positive selection and that some of the mutations relate to mechanisms involved in cell separation and cellularization.

      ANSWER: We dearly thank Reviewer #4 for his/her time and efforts.

      **Major comments:**

      • Line 143, I don't understand how figure 1G shows that "nuclear division cycles were periodic...".

      ANSWER: From previously published results (Ondracka et al 2018 & Dudin et al 2021), we know that nuclear divisions in S. arctica are strictly synchronized and occur within defined time-intervals. As can be seen in Figure 1G, DNA content doubles with a constant interval of about 9 hrs. Likewise, this phenomenon is clearly depicted in Figure 4F and Figure S4H. These results, combined with results shown in Figure 1F, demonstrate that division cycles are still periodic in our experimental setting and are not occurring asynchronously, as no odd number of nuclei per cell was observed.

      • When characterizing the evolved lines, the authors display (and measure?) separately the size and the sedimentation rate, but don't directly compare them. If the statement that density plays a role in the sedimentation rate of S4 and S9 but not S1, then correlation between size and sedimentation should be similar between AN and S1 and changed in S4 and S9. It would be nice to see these relationships and the correlations.

      ANSWER: We do indeed measure the size and the sedimentation rate of each fast-settling mutant separately. This is shown in figure 1C, where sedimentation rate is plotted against cell size for our dataset and the older Smayda (1973) data. Further, both measurements feed directly into the estimation of cellular density in Figures 4C and S4D (explained extensively in the methods). Cellular density estimations show the correlations and relationships between S1 and AN as well as between S4 and S9.

      • Line 288: "surviving 780 generations of passaging for all 10 isolates" what data is this referring to?

      ANSWER: This refers to growing cultures in the lab of fast-settling mutants with tens of passages done without any selection. These growing cultures maintained their clumping phenotypes even without a constant selection, suggesting they are due to a genetic modification. We are unsure about how to answer reviewer #4 as this is the data we are mentioning. We have, however, changed “surviving” to “persisting for”, and hope this clarifies the sentence.

      • The weakest aspect of the paper is that there is neither a statistical argument (with a single anecdotal exception), from seeing the same genes or pathways mutated in parallel experiments, nor an experimental reconstruction that argues that any of the observed mutations were selected as opposed to being neutral mutations that hitch-hiked with adaptive mutations. One strongly suspects that some of the observed mutations were selected, but from the available data, it is impossible to know which were selected and which were hitch-hiking.

      ANSWER: We agree that our draft did not elaborate in depth on whether mutations were drivers or passengers, a point also raised by another reviewer. To be fair, however, there are several important considerations to make.

      First, and most importantly, we do offer an unprecedented look into the genetic underpinnings of this novel model organism, and demonstrate highly parallel phenotypic evolution in response to selection. The molecular genetic signal reflects this finding given a skewed dN/dS-ratio > 1. While the precise molecular changes are not as easy to interpret, molecular parallelism at the level of genes is not a prerequisite for directional selection in repeat lineages, especially given the complex genomic architecture of S. arctica.

      Second, while we did not emphasize this, the results from our bioinformatic analyses are rather unique. We are dealing with a non-standard model organism with a highly intriguing placement in the tree of life, but with a large genome of >140 Mbp. This is 1-2 orders of magnitude larger than the genomes of other single-celled model systems used in evolution experiments, including E. coli or S. cerevisiae. Unlike the latter two, this organism’s genome contains extensive intergenic and intronic sequence, as well as a high amount of (simple sequence) duplication. Hence, the analyses of the resequencing data were a major effort, and it took an extensive amount of time to identify the mutations.

      Third, there are no genetic tools that would allow us to either perform molecular genetics or crossing with S. arctica as of now. This will change in the future, and in this event, our comprehensive list of target genes will be hopefully valuable to the field and beyond.

      • Even if the authors knew which mutations were selected, it is not possible to say if the mutations that have been selected are directly advantageous in the settling regime, they could be due to adaptation to lab conditions and higher temperatures, etc. Having a control evolution experiment with no settling selection would be required to reach the conclusion that the mutants were selected for faster sedimentation.

      ANSWER: We agree that a “no-selection” control experiment would have been helpful for the molecular interpretation. However, the clumping phenotype has never been observed over many generations of passaging in any of the labs culturing these organisms, including at different temperatures (we made sure to specify this in the text). As such, we argue that any adaptation to laboratory conditions must have happened before we conducted our selection experiment. Given that the molecular signals were unique (with one exception), we have reason to believe that the highly controlled nature of the experiment, with a constant environment throughout, at least did not bias the molecular signals toward extensive genetic parallelism.


      **Minor comments:**

      • Line 164, the authors write "this phenotype", it is unclear what phenotype is referred to as.

      ANSWER: Fixed

      • Line 187: the authors use the word "radius" in the text, while using "perimeter" in the figure.

      ANSWER: Fixed

      • Line 224: Is the use of the expression "incomplete detachment between daughter and mother cell" appropriate given that all cells emerge from a multinucleated cell?

      ANSWER: Fixed – “incomplete detachment between cells.”

      • Line 151, typo, the "with" should be removed.

      ANSWER: We believe the reviewer wanted to point out the “with” in line 251, which we fixed.

      • The intro about changes in ecology is nice but does not make sense given the rest of the paper, I would add it to the discussion.

      ANSWER: We beg to differ with Reviewer #4 here, as the water column distribution of plankton in marine environments is one of the key aspects of our paper and is a critical parameter in models of water body ecology.

      • Line 399 "increase their cell size by increasing cell-cell adhesion post-cellularization" the first use of "cell" is misleading because the objects are now a collection of cells rather than a single cell.

      ANSWER: Fixed

      Reviewer #4 (Significance (Required)): Most of the findings made in this study have been obtained in previous studies done with more genetically tractable organisms, however this is the first time that such experimental evolution was made on a unicellular non-model system organism closely related to animals. The significance of the work is reduced by the failure to produce evidence to answer two critical questions about the observed mutations: 1) were they selected during the experiment or did they hitch-hike with other selected mutations, and 2) if they were selected, were they selected because they led to faster sedimentation or some other aspect of the conditions in which they were passaged. It would take serious effort to perform additional experiments to address these questions and thus the authors are likely to be better off explaining that their work is unable to answer the questions and thus they are speculating about both the causality of the mutants and the nature of the advantage they conferred.


      ANSWER: We beg to differ with the reviewer’s argument.

      We believe that our study demonstrates heritable phenotypic changes for an evolvable, ecologically relevant trait, and their tight cellular regulation. We identify and carefully quantify how two cellular growth phenotypes (the nuclear division rate and cell size control) can vary heritably and independently of one another, and together directly shape variation in a critical ecological parameter of a marine organism. Therefore, in addition to the fact that the work was performed in an emerging model marine organism, this work provides fundamental “novel” insight into cellular trait evolution more generally.

      Our results do not depend upon knowing the exact genetic mutations or molecular mechanisms which have caused these phenotypic changes. Nor, as the reviewer implies, do we claim to have identified particular mutations that were selected, or their effects on particular cellular phenotypes. We do, however, provide a large amount of evidence that the changes are likely genetic. With our sequencing effort, we find a strong, statistically significant, molecular signal of adaptation in the lineages (dN/dS > 1), and we publish a curated list of affected genes which are potentially causative for the phenotypes we observe.

      Because we did not observe frequently recurrent mutations, as most directed (and cancer, antimicrobial resistance, etc.) evolution studies find, our results suggest that there is a large mutational target size affecting the phenotype of interest, reflecting its potentially broad genetic and molecular control mechanisms. We view these results as a great strength of the study, and consider this result in and of itself “novel”. Furthermore, we have now added a statistical genetic analysis that quantifies the heritability of traits, that is, what proportion of the variance in phenotype is due to an individual’s inherited state (Figure 1 – figure supplement 1A). The results show that heritability exceeds 95% across phenotypes and, across the entire dataset, H exceeded 99% of the total phenotypic variance (ANOVA F = 1118 on 252 and 735 DF, p = 0). This means that for a typical individual genotype in a given environment, we could predict its average phenotypic measurement with >97% accuracy.
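
      To make the notion of "proportion of the variance due to inherited state" concrete, here is a small sketch that treats it as the between-group share of the total sum of squares in a one-way ANOVA. That reading of H, and both function names, are assumptions for illustration; the exact estimator used for the supplement figure is not stated in this reply. The second function uses the standard identity that recovers this share from an F statistic and its degrees of freedom.

```python
def variance_explained(groups):
    """Between-group share of the total sum of squares (SS_between / SS_total).

    `groups` is a list of lists, one list of replicate measurements per genotype.
    """
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


def variance_explained_from_f(f_stat, df_between, df_within):
    """Same quantity recovered from a one-way ANOVA F statistic.

    Since F = (SS_between / df_between) / (SS_within / df_within),
    SS_between / SS_total = F * df_between / (F * df_between + df_within).
    """
    return f_stat * df_between / (f_stat * df_between + df_within)


# Toy data: three genotypes, three replicates each.
toy = [[1.0, 1.2, 0.8], [5.0, 5.1, 4.9], [9.0, 9.2, 8.8]]
share_toy = variance_explained(toy)

# Applied to the statistic quoted above (F = 1118 on 252 and 735 DF).
share_reported = variance_explained_from_f(1118, 252, 735)
```

      Under this reading, the quoted F statistic corresponds to a share of roughly 0.997, which is consistent with the statement that H exceeded 99% of the total phenotypic variance.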

      The fact that we do not conclusively identify which particular mutations are causative does not obviate the overwhelming evidence that heritable changes occurred in our samples, leading to repeated phenotypic convergence affecting the trait of sedimentation rate. We believe these phenotypic changes, and our quantification of their magnitude, to be a “novel” and “significant” contribution to the literature on cellular trait evolution, ecology, and multicellularity.





    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this study Dudin et al. explored the variability of sedimentation rates in members of the Sphaeroforma genus and found that sedimentation rates are very variable between different isolates as well as during the life cycle of each isolate. Following this observation Dudin et al. evolved S. arctica under a regime favoring fast settling objects. After a few hundred generations they observed that most lineages increased their sedimentation rate. Characterization of some of these evolved populations suggests two distinct mechanisms allowing fast sedimentation: cluster formation by non-separation of cells post-cellularization and increase in object density. By sequencing the evolved lines Dudin et al. were able to identify that several mutations have been under positive selection and that some of the mutations relate to mechanisms involved in cell separation and cellularization.

      Major comments:

      • Line 143, I don't understand how figure 1G shows that "nuclear division cycles were periodic...".
      • When characterizing the evolved lines, the authors display (and measure?) separately the size and the sedimentation rate, but don't directly compare them. If the statement that density plays a role in the sedimentation rate of S4 and S9 but not S1 holds, then correlation between size and sedimentation should be similar between AN and S1 and changed in S4 and S9. It would be nice to see these relationships and the correlations.
      • Line 288: "surviving 780 generations of passaging for all 10 isolates" what data is this referring to?
      • The weakest aspect of the paper is that there is neither a statistical argument (with a single anecdotal exception), from seeing the same genes or pathways mutated in parallel experiments, nor an experimental reconstruction that argues that any of the observed mutations were selected as opposed to being neutral mutations that hitch-hiked with adaptive mutations. One strongly suspects that some of the observed mutations were selected, but from the available data, it is impossible to know which were selected and which were hitch-hiking.
      • Even if the authors knew which mutations were selected, it is not possible to say if the mutations that have been selected are directly advantageous in the settling regime, they could be due to adaptation to lab conditions and higher temperatures, etc. Having a control evolution experiment with no settling selection would be required to reach the conclusion that the mutants were selected for faster sedimentation.

      Minor comments:

      • Line 164, the authors write "this phenotype", it is unclear what phenotype is referred to as.
      • Line 187: the authors use the word "radius" in the text, while using "perimeter" in the figure.
      • Line 224: Is the use of the expression "incomplete detachment between daughter and mother cell" appropriate given that all cells emerge from a multinucleated cell?
      • Line 151, typo, the "with" should be removed.
      • The intro about changes in ecology is nice but does not make sense given the rest of the paper, I would add it to the discussion.
      • Line 399 "increase their cell size by increasing cell-cell adhesion post-cellularization" the first use of "cell" is misleading because the objects are now a collection of cells rather than a single cell.

      Significance

      Most of the findings made in this study have been obtained in previous studies done with more genetically tractable organisms, however this is the first time that such experimental evolution was made on a unicellular non-model system organism closely related to animals. The significance of the work is reduced by the failure to produce evidence to answer two critical questions about the observed mutations: 1) were they selected during the experiment or did they hitch-hike with other selected mutations, and 2) if they were selected, were they selected because they led to faster sedimentation or some other aspect of the conditions in which they were passaged. It would take serious effort to perform additional experiments to address these questions and thus the authors are likely to be better off explaining that their work is unable to answer the questions and thus they are speculating about both the causality of the mutants and the nature of the advantage they conferred.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The present work adds to the growing literature on sedimentation rate as a major player in the evolution of multicellularity. Via rigorous experimentation, the authors convincingly show that they can select for increased sedimentation rate and identify two mechanisms underlying this increase: incomplete cellular separation leading to multicellular groups and increases in cellular density. They also show surprising natural variation in sedimentation and argue that, along with similar evidence from other organisms, their findings cement the likely major role of sedimentation and go further by revealing the tight genetic control that it is under.

      Significance

      This is a very significant study because it illuminates processes and underlying mechanisms that could have played a major role in the transition to multicellularity. Their results will likely greatly influence conceptual and theoretical thinking and will foster additional empirical directions. My only quibble with the manuscript is that I wished for a bit more ecological context and grounding of the main findings: in that respect, both the abstract and the last paragraph of the discussion leave me wanting and occasionally puzzled. If maintaining buoyancy is such a strong selective pressure and the variation in sedimentation rate is such a challenge to it, then I think explaining a bit more exactly why sedimentation would evolve, why so much variation would exist, etc. would be really helpful to the more naive reader. Just a bit of further elaboration on selective pressures (even presumed ones and even if speculative) would be helpful to put the picture together.

      A more minor point:

      I remember seeing a talk by Will Ratcliff a while back in which he showed that in S. cerevisiae they also see the two mechanisms of increased sedimentation: increased cellular size and clumping. Yet, I didn't see a reference to that work in the context of the cell density mechanism discussion and wondered why.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Hello, we wrote our review before seeing that you have special formatting requirements. We're just going to post our review in its entirety rather than rewrite it based on these suggestions. It encompasses the above content, it's just not formatted in the suggested order. We hope that's OK!

      Full review:

      This manuscript makes a strong case for the evolvability of multicellular size via selection for settling rate in the Ichthyosporea. The use of an experimental evolution framework to assess the evolvability of multicellular phenotypes, using sedimentation rate as a selective pressure, extends the previous work of others into a new domain within the holozoans and the closest living relatives of animals. The natural, ecological significance of selection for sedimentation rate is a novel idea, and the connection between sedimentation rate and multicellular evolution in natural as opposed to contrived experimental circumstances is an interesting one. The results are striking and well supported, with laboratory evolution rapidly adjusting both the cellular composition and the multicellular phenotypes of the organisms involved in ways that are well explained. This is an important result that brings the laboratory study of the evolution of multicellularity forward into a different branch of the tree of life, showing its broad applicability.

      Sequencing of evolved lines adds significantly to the completeness of the story. While the causal role of these mutations in the production of the observed multicellular phenotypes is not demonstrated via manipulation or breeding, this is quite understandable in the light of the unusual model organism and the observed homologies and role of the genes involved. While this is largely clear from a reading, we believe the manuscript would benefit from a brief analysis of the numerical enrichment of genes with homologs involved in cytokinesis, cell membrane composition, and cell cycle control relative to the null hypothesis of genes picked randomly from the genome. If this is beyond the scope of this research in an unusual model organism with many poorly annotated genes, then a slightly expanded verbal discussion of the potential roles of the apparent functions of these genes in the evolution of multicellular clumping would be an appropriate substitute.

      We wholeheartedly recommend the publication of this manuscript with a number of minor revisions, which while not affecting the main conclusions or points of the manuscript will clarify important points, adjust small errors, and point the reader at relevant literature and concepts.

      Major points:

      none.

      Minor points:

      Line 79 - is sedimentation rate really invariably associated with multicellularization? Active swimming would seem to prevent this.

      Line 164 - the precise phenotype in the evolution experiment being referred to is unclear without further context, with the ordering of paragraphs possibly needing a little work.

      Line 178 - is sorting them into three classes informative? Are there different mutations associated with these, or is it just visual clumping on the numberline? Perhaps not a useful classification, but the existence of great variation is an important point to get across. A more useful classification might be those that increase sedimentation with large density changes versus exclusively by clumping.

      Line 254 - excess cellular density is referred to interchangeably with density, when these are very different figures. This continues in line 269, and in the figure legends of Figure 4.

      Line 341 - the role of the RCC1 homolog in other organisms could be expanded on in slightly more detail. Similarly, other mutations in this same section known to affect cytokinesis could have potential mechanisms for affecting clumping commented upon, especially given the cell membrane results in the figures.

      Line 387 - awkward formatting or sentence structure, with dashes and commas.

      Line 395 - this cellular process, or this evolutionary process of selection for faster settling?

      Line 408 - per unit volume

      Line 425 - the idea of clumpiness as ancestral is quickly put forward and dismissed within a single sentence. This could be explored in slightly more detail as an option, before concluding that what is clear is that the phenotype is easy to change.

      Line 437 - sedimentation as a highly variable trait, or a highly evolvable trait?

      Figure 1G, 1H: We are fairly certain that the logarithmic scale of DNA content and coenocyte volume are mislabeled. The scale that is labeled log2 in 1G in the legend goes up by factors of 2 rather than single digits. The axis is obviously logarithmic, and the log2 in the legend is superfluous and misleading. Similarly, in 1H a scale labeled as log10 goes from 1 to 30, which on a logarithmic scale would be a sphere approximately 100 kilometers wide. The numbers can remain, but the legend should remove the log10.

      General:

      Were there any head to head competitions performed? Not suggesting you need to, but it's a nice way to directly examine fitness consequences of multicellularity, and is commonly done in the field. If you have done this it wasn't clear to us.

      Significance

      see the above comments about writing the review before realizing there were specific formatting suggestions. I hope you understand us not wanting to re-write the review having already written it once.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      General Statements

      We thank the reviewers for their thoughtful, constructive, and highly actionable critique. The reviewers mentioned that “the experiments presented are well-designed, the methods well-implemented, and communication of the authors' findings is clear and concise”. We are happy to hear that “figure presentation and manuscript layout are top notch and... these data are easy to read and interpret”.

      We appreciate the reviewers’ suggestions for improving the interpretability of the morphodynamic representation and address each of the reviewers’ comments (typeset in blue) in the document below.

      Description of the planned revisions

      Reviewer # 1 (major points)

      * The Trajectory Feature Vectors (TFVs) are averaged over time - this seems to lose a lot of the salient information in the trajectories themselves, resulting in the low(ish) accuracy of the GMM. Could a Hidden Markov Model trained on the trajectories in state space help to identify/classify those trajectories that change their morphology/motion over time?

      Thanks for the suggestion. We did recognize that averaging will smooth the dynamics in each cell trajectory and reduce diversity of phenotypes. On the other hand, the temporal smoothing serves to reduce the noise, especially when the cells have reached steady state dynamics after being stimulated with pro- or anti-inflammatory cytokines. Our experiments were constructed to probe steady state dynamics and therefore we opted to use temporal smoothing.

      It is possible to identify rare transitions even with some temporal smoothing.

      In our analysis of rare transitions (Fig. 4C), we extracted long trajectories and split them into segments (10-15 frames, 1.5-2 hours each). By applying a Gaussian Mixture Model (GMM) to each segment, we obtained a sequence of states along the full trajectory, from which state transitions were identified.

      During the revision, we will employ a Hidden Markov Model (HMM) to model state transitions in the latent shape space, as suggested by the reviewer, to detect rare transitions. Our expectation is that an HMM will be able to identify more transition events due to its higher time resolution (per frame instead of per segment), though it may also be more affected by unexpected imaging artifacts and noise.
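
      The segment-based procedure described above can be sketched roughly as follows. This is a simplified illustration, not the pipeline's code: the function names are hypothetical, each segment is summarized by its mean feature vector, and a nearest-center rule stands in for evaluating a fitted GMM (which would normally come from a library such as scikit-learn).

```python
def split_into_segments(trajectory, segment_len):
    """Cut a list of per-frame feature vectors into consecutive segments.

    A trailing partial segment, if any, is dropped.
    """
    last_start = len(trajectory) - segment_len
    return [trajectory[i:i + segment_len] for i in range(0, last_start + 1, segment_len)]


def mean_vector(segment):
    """Average the per-frame vectors of one segment (temporal smoothing)."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def assign_state(vector, centers):
    """Stand-in for a GMM prediction: index of the nearest state center."""
    dists = [sum((v - c) ** 2 for v, c in zip(vector, center)) for center in centers]
    return dists.index(min(dists))


def find_transitions(states):
    """Return (segment_index, from_state, to_state) wherever the state changes."""
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(states, states[1:]), start=1)
            if a != b]


# Toy 1-D trajectory: 4 frames near state 0, 4 near state 1, 4 back near state 0.
trajectory = [[0.0]] * 4 + [[1.0]] * 4 + [[0.1]] * 4
centers = [[0.0], [1.0]]  # hypothetical state centers
segments = split_into_segments(trajectory, 4)
states = [assign_state(mean_vector(s), centers) for s in segments]
transitions = find_transitions(states)
```

      Because states are assigned per segment, an excursion shorter than one segment is averaged away, which is the time-resolution limitation that a per-frame HMM is expected to relax.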

      Reviewer # 1 (minor points)

      Could the authors provide some example images showing interpolation of each PC using the generative decoder?

      Thanks for the suggestion. However, the discrete nature of the latent codebook of the VQ-VAE makes it challenging to use interpolation as a proxy for the utility of the representation. A possible link between interpolation abilities and the usefulness of representations learned by autoencoders has been explored in this paper by Berthelot et al. As Berthelot et al. note, “We perform interpolation in the VQ-VAE by interpolating continuous latents, mapping them to their nearest codebook entries, and decoding the result. Assuming a sufficiently large codebook, a semantically “smooth” interpolation may be possible. On the lines task, we found that this procedure produced poor interpolations. Ultimately, many entries of the codebook were mapped to unrealistic datapoints, and the interpolations resembled those of the baseline autoencoder.”
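
      The quoted procedure (interpolate continuous latents, then snap each one to its nearest codebook entry) can be illustrated with a toy sketch. It is deliberately reduced to a single latent vector and a two-entry codebook, with the decoder omitted; the function names and values are hypothetical and are not taken from either paper's code.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])))


def interpolate_quantized(z_start, z_end, codebook, steps):
    """Linearly interpolate between two continuous latents, quantizing each step.

    Returns the codebook index chosen at each of `steps` evenly spaced points
    (steps must be at least 2).
    """
    indices = []
    for s in range(steps):
        t = s / (steps - 1)
        z = [(1 - t) * a + t * b for a, b in zip(z_start, z_end)]
        indices.append(nearest_code(z, codebook))
    return indices


# Toy codebook with two entries; interpolate from one entry to the other.
codebook = [[0.0, 0.0], [1.0, 1.0]]
path = interpolate_quantized([0.0, 0.0], [1.0, 1.0], codebook, 4)
```

      In this toy case the quantized path is piecewise constant: it stays on the first entry and then jumps to the second, rather than passing through intermediate shapes. A larger codebook would give more steps, but each is still a jump between discrete entries.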

      Reviewer # 2 (major points)

      -It's unclear what the effect of speed is on the final state determination. TFVs were composed of auto-encoder-based features (PCs from latent space) and speed of the cells. Would the states be very different without speed as part of the TFVs or with TFVs consisting only of speed features? Please quantify and discuss.

      Thanks for your comment. We agree that speed of the cell is a main factor that contributes to the clustering, though shape features (from VQ-VAE) do contribute (Fig. 3B, histograms) to discrimination of cell states. In the revision, we will perform the clustering analysis with only shape features and compare with current results of Fig. 4.

      Reviewer # 3 (major points)

      1. Temporal consistency regularization

      In the authors' framework, models are regularized to minimize the l2 norm between embeddings of adjacent timepoints.

      This approach is conceptually well-motivated, but could have some unintended effects.

      For instance, some cells may make a rapid state transition such that state(t-1) = A, state(t) = B, state(t+1) = A'.

      In these cases, a regularized model may best minimize the joint loss by returning an embedding at time t that interpolates between state A and A', rather than returning an embedding that reflects the true distinct state B.

      The work would be strengthened if the authors analyzed the impact of this regularization term on the detection of rapid state transitions that occur for only a few frames (e.g. when cells that exhibit filopodial motility "jump" in an actin/myosin contraction).

      This might be accomplished through experiments scanning different regularization hyperparameters on some of the authors' real data, fitting models on temporally downsampled versions of the real data where "slow" multi-timestep transitions now occur in a few timesteps, or perhaps using simulations where rapid state transitions are known to occur.

      Even if the regularization does have some negative impacts, it does not argue against the utility of the general approach, but it is important for users to understand the constraints on downstream applications.

      In our revision, we will evaluate the optimal matching loss for our dataset by training the model with a series of temporal matching loss weights. With this computational experiment, we will illustrate the trade-offs introduced by the relative strengths of matching and reconstruction losses.

      Our expectation is that with a very high matching loss weight, the embeddings (latent vectors) of the frames of the same trajectory will collapse regardless of morphology. For a relatively wide range of matching loss weights, rank relations between transition pairs ([A->B] + [B->A'] >> [A->A']) should be preserved, from which the rare transitions can be robustly identified. In our experiments, most cells had reached steady-state morphodynamics when imaged, i.e., the matching loss between two adjacent frames arises primarily from variations in background/noise. Fast transitions are “rare” in our data. Numerically, fast transitions therefore contribute little to the matching loss during training, so the distances between their latent representations are not strongly minimized. In other words, if B is a morphologically different state from A/A', the model is driven more by the reconstruction loss due to the morphological difference than by temporal smoothness across three consecutive frames.
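
      A small sketch may help make the two quantities in this argument concrete: the temporal matching term added to the reconstruction loss, and the rank relation used to flag a transient excursion. The function names, the use of an unsquared L2 distance, and the threshold `ratio` are illustrative assumptions, not the model's actual training code.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_loss(embeddings):
    """Mean L2 distance between embeddings of adjacent frames in one trajectory."""
    pairs = list(zip(embeddings, embeddings[1:]))
    return sum(l2_distance(a, b) for a, b in pairs) / len(pairs)


def total_loss(reconstruction_loss, embeddings, matching_weight):
    """Joint objective: reconstruction term plus weighted temporal matching term."""
    return reconstruction_loss + matching_weight * matching_loss(embeddings)


def is_transient_excursion(z_prev, z_mid, z_next, ratio=3.0):
    """Check the rank relation d(A,B) + d(B,A') >> d(A,A').

    `ratio` is a hypothetical threshold for how much longer the detour through
    the middle frame must be than the direct step.
    """
    detour = l2_distance(z_prev, z_mid) + l2_distance(z_mid, z_next)
    direct = l2_distance(z_prev, z_next)
    return detour > ratio * direct


# A cell that jumps to a distinct state B for one frame and returns near A.
jump = is_transient_excursion([0.0, 0.0], [3.0, 4.0], [0.1, 0.0])
# A cell drifting steadily in one direction.
drift = is_transient_excursion([0.0, 0.0], [1.0, 0.0], [2.0, 0.0])
```

      Sweeping `matching_weight` over a range and checking whether known excursions still satisfy this relation is one way to probe the trade-off described above.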

      2. Baseline comparisons

      The authors evaluate their method by assessing the correlation of embedding PCs with heuristic features (Fig. 2C,D + supp.), variation of embedding PCs across cell treatment groups (Fig. 3), and qualitative interpretation of embedding trajectories.

      In the supplement, the authors compare their VQ-VAE approach to VAEs and AAEs and chose to use a VQ-VAE based on lower reconstruction error and higher PC/heuristic feature correlation.

      However, the authors do not compare their method to much simpler baseline approaches to this problem.

      Existing literature suggests that heuristic features of cell shape and motion (similar to those the authors use to evaluate the relevance of their embeddings) are sufficient to perform many of the same tasks a VQ-VAE is used for in this work.

      For instance, in Fig. 3 it appears that a simple analysis of cell centroid speed recovers much of the same information as the complex VQ-VAE embeddings.

      In Fig. 2 - Supp. 6, it appears that after regressing out many heuristic features of cell geometry, the latent space largely explains cell non-autonomous information about the background environment, suggesting the heuristic features are largely sufficient.

      To demonstrate the usefulness of their deep modeling approach relative to simple baselines, the authors should compare against existing heuristics and embeddings of heuristics (e.g. PCA) using some of the tasks shown for the VQ-VAE (recovery of perturbation state, state transition detection, qualitative trajectory analysis, discrimination of cell types).

      Heuristics might include those already calculated here, or a more comprehensive set as cited in the Introduction.

      The authors may also consider comparing against baselines that don't include time information for some of their tasks (e.g. recovery of perturbation state could arguably be achieved with CNNs that are either ignorant of the timestep or use simple temporal conditioning, without including trajectory information).

      If these features are sufficient for many of the same tasks performed in this work, the authors should provide a clear argument for readers as to why the unsupervised VQ-VAE approach may be preferable (e.g. ability to recover potentially unknown cell changes, for which no heuristic exists).

      The VQ-VAE doesn't need to be superior along every axis to hold merit, but the work would be strengthened if the authors could show clear superiority along some dimension.

      Thanks for your comments. We agree that through our exploration, specific heuristic features are found to be correlated with latent shape features. We did not start with heuristic features, but instead identified them after observing how cell morphology changes along the principal components of the latent shape space. Discovering the heuristic shape features that describe the variation in shape space, in our view, reinforces the value of self-supervised learning of complex cellular morphologies.

      We’d argue that the dynamorph pipeline complements heuristic approaches: it enables discovery of cell states through unbiased encoding and clustering, and the correlation of learned features with heuristic features enables interpretation of the cell state/data distribution more quantitatively than using either approach in isolation. Our argument is further reinforced by the related work (e.g., Zaritsky et al. and others mentioned in the introduction) on self-supervised learning of cell shape and interpretation of its latent space.

      More specifically, self-supervised learning with temporal matching generates unbiased and smooth encodings for cell morphologies, from which we identified the rank correlations between top PCs and certain geometric properties. However, this does not indicate that the set of heuristics chosen a priori will be equally descriptive of the shape distribution. For example, optical density of cells (phase) is a heuristic feature that has not been used in previous studies, which we recognized after sampling the PCs of shape space. Further identification of such correlations is by itself an interesting discovery enabled by self-supervised learning.

      In the current manuscript, we compared learned latent features (PCA on VQ-VAE latent embeddings) against a simple baseline (top PCs of raw images) and showed superior performance, which already illustrates the advantage of self-supervised learning in denoising data and extracting the key sources of diversity. In the revision, we will compare PCs of multiple heuristic features (e.g., cell size) with latent features to further strengthen the above point.

      Reviewer # 3 (minor points)

      For Fig. 4 - supp 1 -- isn't it expected that the GMM cluster of a vector can be predicted from the vector? The GMM clusters were derived from the vectors to begin with, so this seems like a bit of a circular analysis. If I'm missing something, this figure might benefit from more exposition.

      Thanks for your question. The original purpose of this confusion matrix was to parallel Fig. 3 - supp 2, showing that the GMM generated distinct cell states that describe the population better than the perturbation conditions do. The confusion matrix itself is trivial, so we will evaluate how to make this point more precisely during the revision.

      For Fig. 4 - Supp 3, the authors should consider changing the "state" and "cluster" colors on the embedding projections so that they do not match. As presented, it appears as if the states and clusters were co-assayed and linked by some experimental label, when in fact the State 1::Cluster1, State 2::Cluster 2 relationship is just inferred.

      Thanks for your comment. We will change the color scheme for Fig. 4 - supp 3 in the revision to avoid confusion.

      * Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer # 1 (major points)

      * The temporal matching to enforce a smooth latent space representation is interesting. The authors mention that they mask out surrounding cells with a median pixel value. Have the authors considered using a pixel weighting in the reconstruction/matching loss to differentiate foreground/background? Also, does this affect detection of any fast (or indeed rare) transitions in the trajectories?

      Thanks for your comment and question. Yes, we incorporated a pixel weighting strategy during training. In addition to masking out surrounding cells, we used a smoothed and enlarged version of each cell's segmentation mask to emphasize accurate reconstruction of the center cell in each patch and to reduce the influence of surrounding cells, artifacts, and background fluctuations. The matching loss is computed from latent vectors, which are themselves indirectly affected by the pixel weighting.

      A more detailed description of the weighting strategy will be added to the methods section. The code for our weighting strategy can be found at: https://github.com/czbiohub/dynamorph/blob/b3321f4368002707fbe39d727bc5c23bd5e7e199/HiddenStateExtractor/vq_vae_supp.py#L287
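
As a rough illustration of the weighting described in this reply (not the code at the linked file), the sketch below builds a weight map from a binary segmentation mask by enlarging and then smoothing it with plain NumPy, and uses it in a weighted mean-squared reconstruction error. The function names, the 3x3 dilation and box-blur kernels, the iteration counts, and the background weight floor are all assumptions made for illustration.

```python
import numpy as np


def _neighborhood_stack(img):
    """Return the nine 3x3-shifted copies of a 2D array (edge-padded)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])


def weight_map(mask, dilate_iters=2, blur_iters=2, floor=0.1):
    """Enlarge and smooth a binary cell mask into per-pixel loss weights.

    All defaults are hypothetical, not values taken from the manuscript.
    """
    w = mask.astype(float)
    for _ in range(dilate_iters):  # enlarge: 3x3 max filter (binary dilation)
        w = _neighborhood_stack(w).max(axis=0)
    for _ in range(blur_iters):    # smooth: 3x3 box blur
        w = _neighborhood_stack(w).mean(axis=0)
    return floor + (1.0 - floor) * w  # background keeps a small nonzero weight


def weighted_mse(x, x_hat, weights):
    """Weighted mean-squared reconstruction error."""
    return float(np.sum(weights * (x - x_hat) ** 2) / np.sum(weights))


# Toy 16x16 patch with a 4x4 "center cell" (made-up data)
mask = np.zeros((16, 16), dtype=bool)
mask[6:10, 6:10] = True
weights = weight_map(mask)

target = np.zeros((16, 16))
error_on_cell = target.copy()
error_on_cell[8, 8] = 1.0        # reconstruction error inside the cell
error_on_background = target.copy()
error_on_background[0, 0] = 1.0  # same-sized error far from the cell
```

With these weights, an error of the same magnitude contributes more to the loss when it falls on the center cell than when it falls on distant background.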

      Reviewer # 1 (minor points)

      I was a little confused by the labels given to the PCs, as they seem to vary between figures. For example, In Fig2, PC1 and PC2 are Size and Peak Retardance, but in Fig3 they are referred to as Size and Cell Density (which could be interpreted as the number of cells per unit area). Could the authors clarify these in the captions?

      We have clarified the text to distinguish between cell density (population) and optical density (phase).

      The authors note that single-cell tracking is of vital importance. This should be elaborated upon. Also - could the VQ-VAE encodings be used to help track linking in cases of high density?

      We added a clearer reference to the methods section containing details of the tracking procedure. Additionally, we clarified in the discussion that the methods used for segmenting and tracking cells can be refined for high-density cultures. Since we rely on the tracks to compute the temporal matching loss and regularize the VQ-VAE encodings (shape space) during training, the encodings cannot be used to refine tracking in high-density populations.

      Reviewer # 2 (major points)

      -'Cell state' in the field of cell biology has been operationally defined in so many different ways and with so many different types of measurement data, that 'cell state' is becoming a somewhat vacuous term. This is not only a problem of this paper but a challenge for the field. In this case, cells are clustered using a Gaussian mixture model on the first few principal components of the latent space coefficients as well as speed, both averaged across the frames of cell tracks. This is fine and descriptive, but it's unclear whether this definition of 'cell state' is easily applied to other datasets and how this definition can be operationalized for hypothesis generation and experimentation. For other datasets, e.g. other cell types and other processes, such as differentiation, where e.g. tracking and segmentation may be more difficult and images would look quite different, can one still apply the same approach towards describing cell states? One could state that this definition of cell state is very specific to the dataset and therefore not generally useful. How would the authors respond to such a statement?

      This is an excellent point. We agree that the meaning of a “cell state” or a “cell type” can depend on the context. Cell state can be rigorously described in terms of measurements of the cells, and recent developments of new cell probing techniques, including imaging modalities and single-cell genomics keep adding to the growing list of the features that can be measured. Time-lapse imaging is high dimensional and therefore admits multiple definitions of cell state. Our use of the terms ‘latent shape space’ and ‘trajectory feature vectors’ clarifies how we define the cell state. Given the increasingly wider use of live cell imaging for biological studies and drug discovery, both of these descriptors of cell state are valuable. In the current manuscript, we focus on a combination of morphodynamic features, including but not limited to the cell shape, size, and speed. We use these features to cluster cells in an unbiased manner to detect morpho-dynamic “states” unique for this particular culture system. Our approach can be generalized to other cell culture systems, such as cell differentiation, where cell architecture evolves substantially.

      To clarify this point, we add the following text in the manuscript:

      Line 85: “The meaning of a "cell state" can vary with the physiological and methodological context. In this work, we refer to "morphodynamic states" as a combination of morphological and temporal features. From the trajectory of cells in the latent shape space, we identified transitions among morphodynamic states of single cells. The same approach enabled detection of transitions in the morphodynamic states of cells as a result of immunogenic perturbations.”

      In the discussion:

      Line 333: “Our work formalizes an analytical approach for data-driven discovery of morphodynamic cell states based on quantitative shape and motion descriptors. A cell state can be rigorously described in terms of measurements of the cells, and recent developments in measurement techniques, including imaging modalities and single-cell genomics, keep adding to the growing list of features that can be measured. Time-lapse imaging is high dimensional and therefore admits multiple definitions of a cell state.”

      -It's unclear to the reviewer whether the training data (unperturbed microglia) are close enough to the test data (perturbed microglia) such that application of the trained model to the test data makes sense. The authors provide reconstruction loss numbers, but they are difficult to interpret. Can the authors create plots of the unperturbed microglia cells and perturbed microglia cells in the latent space and show overlap, or in other ways show that training data and test data are close enough for this application?

      Thank you for pointing out the lack of clarity in generalizability of the model. We trained the model on control, untreated microglia acquired during one experiment, and then applied it to a separate dataset acquired during another experiment that included perturbed and control microglia. The reconstructions shown in Fig. 2 are from the test dataset that was not used during training. The quality of reconstructions supports that the shape space of the training set is representative of the shape space of the larger test set. We will add a density plot in the supplementary figures showing the overlapping latent space distribution of unperturbed (training dataset) and perturbed (test dataset) microglia.

      We now include the revised sentence in the manuscript to clarify the results:

      Line 132: “Comparison of reconstructed shapes from the test set and training set, along with the analysis of the shape space described in the next section, shows that our self-supervised model trained on the training dataset generalizes well between independent experiments and can be used to compare cell state changes between control microglia and cells treated with multiple perturbations.”

      -Only a small amount of intensity variation is explained; 17% using the first 4 PC components which are mainly used in the analyses. This seems like a very low number. There is a lot of variation in the intensity images that is not explained by the autoencoder. The autoencoder seems to be doing a bad job. At the same time, the downstream analyses using the latent space are insightful and sensible. Can the authors provide more explanation?

      Thanks for your question. We would first like to clarify that the autoencoder (VQ-VAE) used in this work follows the design of the original reference, which does not compress the input very strongly. Given the latent space size (16x16x16), it is understandable that the top 4 PCs capture a relatively small portion of the variance. The fact that cell shape cannot be described with a few principal components is likely due to: a) the diversity of microglia morphology, and b) the diversity of modalities used to train the model.
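
For readers who want to check how much variance the leading PCs capture in their own embeddings, the explained-variance ratio can be computed directly from the singular values of the centered data. The sketch below uses a small synthetic array as a stand-in for flattened VQ-VAE latents (which would have 16x16x16 = 4096 features per patch); the array shape, the random data, and the function name are placeholders, not values from this work.

```python
import numpy as np


def explained_variance_ratio(features):
    """Fraction of total variance captured by each principal component."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Singular values of the centered data give the PC variances (up to a constant)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in: 200 "patches" with 64 latent features each
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 64))

ratio = explained_variance_ratio(latents)
top4_share = float(ratio[:4].sum())  # share of variance in the first 4 PCs
```

When variance is spread over many directions, as in this isotropic toy data, the first few PCs necessarily capture only a small share of the total.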

      We now include the following text in the manuscript: Line 158: “The high variance of the shape space of microglia can be due to more complex shapes of microglia, such as diversity of protrusions, sub-cellular structures and variations in cell optical density, location of nuclei in migrating cells, etc. As we mentioned above, the inclusion of several imaging channels (brightfield, phase, and retardance) increases the performance of the model, possibly by increasing the diversity of morphological information encoded in our input data.”

      As you note, the downstream analyses from the learned latent space are insightful, e.g., we do detect substantial changes in top PCs upon perturbations. This supports our view that the shape space of microglia as encoded by our data is intrinsically high dimensional and the transients in the shape space are informative.

      Reviewer # 2 (minor points)

      -The motivation for GMMs over k-means is unclear. K-means clustering leads to spatial separation between clusters (states), since all cells/tracks that are closest to their cluster mean are by definition further away from the means of other clusters. This is not the case with the more flexible GMMs; e.g. they allow one to have a smaller cluster (with small variance components) inside of a larger cluster (with large variance). The latter scenario seems undesirable for interpretation in terms of states.

      Thanks for your comments. The major reason for choosing GMMs over K-means clustering is that a GMM allows different prior distributions for different perturbations. In practice, K-means would generate clusters regardless of perturbation conditions, while a GMM enables a finer separation of states that are very likely correlated with perturbations. We agree that GMMs have the caveats you mention. In our analyses, we did not notice issues such as the ‘nesting of components’ that you described.
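
The "nesting" scenario raised in the comment can be made concrete with a one-dimensional toy mixture in which a narrow component sits inside a broad one. The sketch below uses hand-picked, hypothetical parameters (not fitted to any data from this work) and assigns each point to the component with the highest weighted density; nearest-mean assignment, as in k-means, is shown for contrast.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical two-component mixture: (weight, mean, std)
COMPONENTS = [
    (0.5, 0.0, 0.5),  # component 0: narrow
    (0.5, 0.0, 3.0),  # component 1: broad, same mean
]


def gmm_assign(x, components=COMPONENTS):
    """Index of the component with the highest posterior responsibility."""
    scores = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    return scores.index(max(scores))


def nearest_mean_assign(x, means):
    """k-means-style assignment: index of the closest cluster mean."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


points = (-4.0, 0.5, 4.0)
gmm_labels = [gmm_assign(x) for x in points]  # -> [1, 0, 1]
kmeans_labels = [nearest_mean_assign(x, (-2.0, 2.0)) for x in points]
```

In the toy mixture, the broad component claims points on both sides of the narrow one, so its assignment region is not contiguous, whereas nearest-mean assignment in one dimension always yields contiguous intervals.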

      -Related to the previous point, 'self-supervised' sounds nice, but it's still optimizing towards something, in this case explaining the variation in input intensity images. A lot of the variation in the intensity images may not be of interest for the biological investigation of shape and dynamics. Did the authors uncover that indeed some of the latent dimensions are encoding other aspects of the images which may be less related to the biology and more to image properties/artifacts/biases?

      We agree with your assessment. Precisely for the reasons you point out, we use temporal regularization to counter the dependence of the learned representation on non-biological variations in the data, a point also recognized by reviewer #3. We now clarify in the manuscript that not all latent features represent the biology of the cells; some represent features of the instrument and the experiment. We report this for the top few PCs of the latent representation and provide the code so that interested readers can discover what the other PCs report.

      -The original images are 3D (5 z-planes). The analyzed images were 2D. The reviewer missed how the authors went from 3D to 2D. And since cells are 3D, can the authors describe what they gained by going to 2D and what they potentially lost?

      We added additional text to the methods subsection describing the Dynamorph Pipeline (line 590):

      “The input data for both semantic segmentation and VQ-VAE models are 2D-images of computed phase and retardance that measure integrated optical density and anisotropy across the depth of the cell. The raw collected data is 3-dimensional (5 z-slices acquired in multiple polarization channels). The 2D phase is computed from the full stack of brightfield images via deconvolution. The retardance is computed from an average of the intensities across the 5 z-slices. Subsequent model training is more tractable with 2D data instead of 3D, while capturing the cell architecture across the depth.”

      Reviewer # 3 (major points)

      Cell state transition interpretation

      In line 278, the authors propose that the unbalanced nature of transitions such that p(1 -> 2) >> p(2 -> 1) must represent some difference in timescales across the transitions because "cell states should have reached equilibrium after several days in culture at the time of the imaging experiments".

      This logic is unclear to me for two reasons.

      * If the population obeys detailed balance (e.g. transitions have equal frequency), then observed transitions should be balanced on a reasonably long time window, even if individual transitions occur on different timescales.

      * The assumption that cell states are balanced after a few days in culture is at odds with a few different aspects of the biology. Cell density and nutrient availability are continually changing in the dish, so culture conditions are non-stationary. Imaging apparatuses also commonly impact the cell biology of imaged samples due to imperfect incubation, etc.

      It seems likelier that these data represent an unbalanced transition due to the non-stationary nature of the culture system.

      Given the authors' emphasis on the value of measuring these transitions, the work would be strengthened by a more careful interpretation of these results, additional analysis details (e.g. how large are most state transitions? are these mostly small shifts "over the border" in state space, or large jumps?), and an attempt at biological interpretation of the observed phenomenon.

      The authors' RNA-seq data may be helpful in this latter regard.

      This is an excellent point. We agree that the cell culture conditions, including nutrient availability, accumulation of metabolites, and imaging-induced changes, constantly introduce new variations to the system. In an attempt to mitigate these dynamic changes, we maintained cells in culture for six days before starting the experiment. To avoid cell stimulation due to freshly added nutrients and growth factors from the culture media, we consistently exchanged the media and performed cytokine treatments 24 hours before each imaging experiment. Each imaging round was started after the cells had been allowed to equilibrate to the environmental chamber for at least one hour. Despite these efforts, we agree with the reviewer that the conditions cannot be considered fully stationary. We removed the sentence “Given that cell states should have reached equilibrium after several days in culture at the time of the imaging experiments, these results suggest that the transitions from state 2 to state 1 occur at a different time scale (i.e., much slower)” and changed the text to reflect this point:

      Line 294:

      “In our analysis, transition events are very rare among cells treated with IFN beta, while the most frequent cell transitions were observed among cells treated with GBM supernatant. One possible explanation for this imbalance is that IFN-treated cells represent a single polarization axis, while a heterogeneous cell signaling milieu derived from cancer cells provides conflicting pro- and anti-inflammatory signals, instructing cells to transition between the states. While both directions of transitions were observed within the imaging period, cells in state-1 are more likely to transition to state-2 than vice versa within the chosen time frame. This imbalance between the rates of state transitions correlates with the higher state-2/state-1 ratio in GBM and control environment and may explain the longitudinal accumulation of cells in a more activated state under these culture conditions.”

      1. Single cell RNA-seq analysis

      The authors performed a very interesting experiment where they profiled the same cell population using both timelapse imaging and single cell RNA-seq.

      The authors argue that the global structure of the state space resolved by each modality is analogous, but this seems a bit of a stretch to me.

      The behavior state space is unimodal (bifurcated into two states by GMM clustering), while the mRNA-seq space has several distinct clusters.

      The argument that these states are analogous would be significantly strengthened by biological interpretation of the RNA-seq data.

      Do the mRNA profiles exhibit differentially expressed genes that might explain differences in behavior in the cell behavior states?

      The analyses in Fig. 4 - Supp 4 are suggestive that "State 1" contains interferon-responsive cells and not control cells, but broader conclusions don't appear well supported by current analyses.

      We agree with the reviewer’s comment that the analogy between molecular cell states defined with scRNAseq analysis and morphodynamic cell states defined with dynamorph needs to be clarified. In our current work, the correlative measurement of morphodynamics and transcriptome was exploratory and relied on population statistics measured with each modality. More detailed studies linking morphodynamic states to the single cell transcriptomics, such as Patch-Seq or laser microdissection, are needed to decisively link morphodynamics and molecular programs underlying these phenotypes.

      Single cell transcriptomics simultaneously measures thousands of mRNA species in individual cells. Therefore, it can provide a nuanced interpretation of the molecular states of each population, as can be seen in the more granular separation of sub-states in scRNAseq clustering. For example, Cluster 1-2 was defined by high expression of interferon response genes, and predictably, this cluster was primarily derived from the cells treated with IFNb. Interferon exposure induces morphological changes associated with increased cell perimeter, which reports ramification of microglia plasma membrane (Aw et al., PMID: 33183319). It was also shown that infections with neurotropic viruses, which lead to an interferon response, also lead to decreased velocity and distance traveled for cultured microglia cells (Fekete et al., PMID: 30027450). These observations are in direct agreement with our morphodynamic analysis demonstrating a higher proportion of cells in State 1, characterized by lower cell velocity. Interestingly, scRNAseq analysis also identified a population of cells with high expression of cell cycle genes (Cluster 1-3), which would also be predicted to have a slower speed and potentially larger cell body. These results point to the fact that different molecular states may underlie very similar morphodynamic states.

      We now provide a revised statement to reflect the above.

      Line 290: “We further compared the detected morphodynamic states with scRNA measurements of the same cell populations. Interestingly, the separation of cells in state-1 and state-2 from control and IFN group parallels the clusters identified with cell transcriptome, suggesting that correlative analysis of gene expression and morphodynamics can reveal molecular programs underlying these phenotypes. In our preliminary analysis, scRNAseq revealed a greater degree of granularity in each of the cell populations, such as cluster 1 of the scRNAseq separating into three additional subclusters. Cluster 1-2 was defined by high expression of interferon response genes, and predictably, this cluster was primarily derived from the cells treated with IFNb. Interferon exposure induces morphological changes associated with increased cell perimeter, which reports ramification of microglia membrane (Aw et al., 2020). It was also shown that infections with neurotropic viruses, which lead to an interferon response, also lead to decreased velocity and distance traveled for cultured microglia cells (Fekete et al., 2018). These observations are in direct agreement with the higher proportion of cells in State 1, characterized by lower cell velocity. Interestingly, scRNAseq analysis also identified a population of cells with high expression of cell cycle genes (Cluster 1-3), which would also be predicted to have a slower speed and potentially larger cell body. These results point to the fact that different molecular states may underlie very similar morphodynamic states. Correlative single-cell measurements of morphodynamic states and single cell transcriptomics, such as Patch-Seq or laser microdissection, are needed to decisively link morphodynamics and molecular programs underlying these phenotypes.”

      Reviewer # 3 (minor points)

      1. Check grammar. Some articles are missing and some subject-verb agreements are mismatched. e.g. line 624 "we regularized [the] latent space", line 713 "after both loss[es] achieved".

      Thanks for pointing this out. We have thoroughly checked the grammar and corrected typos in this submission.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary

      Here, the authors present Dynamorph, an unsupervised learning framework for timelapse cell microscopy data built on VQ-VAEs.

      The authors apply this method to the analysis of microglial cell behavior under a series of perturbation conditions.

      Methodologically, the primary contributions of this work are the introduction of a temporal consistency regularization penalty on the latent space of a VQ-VAE model for application to timeseries data and the introduction of a "temporal feature vector"-ization procedure to summarize complex temporal trajectories in a single low-dimensional vector for analysis. Biologically, the primary contributions are the demonstration that microglial responses to different perturbogens and dynamics state transitions can be resolved by transmitted light microscopy.
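
To make the "temporal feature vector" idea concrete, here is a minimal sketch that follows the description given in these reports: per-frame latent PCs and speed are averaged across the frames of a track to give a single low-dimensional vector. The input layout, the function name, and the use of centroid displacement per frame as "speed" are assumptions for illustration and may differ from the Dynamorph implementation.

```python
import math


def trajectory_feature_vector(pcs_per_frame, centroids):
    """Summarize one cell track as [mean PC_1, ..., mean PC_k, mean speed].

    pcs_per_frame: list of per-frame PC coordinates (one list per frame)
    centroids: list of (x, y) cell centroid positions, one per frame
    """
    n_frames = len(pcs_per_frame)
    if n_frames == 0 or n_frames != len(centroids):
        raise ValueError("need one PC vector and one centroid per frame")
    n_pcs = len(pcs_per_frame[0])
    mean_pcs = [sum(frame[k] for frame in pcs_per_frame) / n_frames
                for k in range(n_pcs)]
    # Speed as centroid displacement between consecutive frames (assumed definition)
    steps = [math.dist(p, q) for p, q in zip(centroids, centroids[1:])]
    mean_speed = sum(steps) / len(steps) if steps else 0.0
    return mean_pcs + [mean_speed]


# Made-up two-frame track with two PCs per frame
vector = trajectory_feature_vector(
    pcs_per_frame=[[1.0, 2.0], [3.0, 4.0]],
    centroids=[(0.0, 0.0), (3.0, 4.0)],
)  # -> [2.0, 3.0, 5.0]
```

Vectors of this kind, one per track, are what a clustering model such as a GMM would take as input.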

      Overall, the experiments presented are well-designed, the methods well-implemented, and communication of the authors' findings is clear and concise.

      However, there are unaddressed potential caveats to the proposed framework and the manuscript fails to compare the proposed method to any existing baselines, such that the particular strengths and weaknesses of the method are unclear to readers.

      Major Points

      1. Temporal consistency regularization

      In the authors' framework, models are regularized to minimize the l2 norm between embeddings of adjacent timepoints. This approach is conceptually well-motivated, but could have some unintended effects.

      For instance, some cells may make a rapid state transition such that state(t-1) = A, state(t) = B, state(t+1) = A'. In these cases, a regularized model may best minimize the joint loss by returning an embedding at time t that interpolates between state A and A', rather than returning an embedding that reflects the true distinct state B.

      The work would be strengthened if the authors analyzed the impact of this regularization term on the detection of rapid state transitions that occur for only a few frames (e.g. when cells that exhibit filopodial motility "jump" in an actin/myosin contraction). This might be accomplished through experiments scanning different regularization hyperparameters on some of the authors' real data, fitting models on temporally downsampled versions of the real data where "slow" multi-timestep transitions now occur in a few timesteps, or perhaps using simulations where rapid state transitions are known to occur.

      Even if the regularization does have some negative impacts, it does not argue against the utility of the general approach, but it is important for users to understand the constraints on downstream applications.
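
A minimal sketch of the regularizer discussed in this point, assuming the penalty is the mean squared l2 distance between embeddings of adjacent frames within one track (the exact form and weighting used in Dynamorph may differ). The toy track below, with made-up 2D embeddings, illustrates the concern: a one-frame excursion A, B, A' is penalized far more heavily than an embedding at time t that interpolates between A and A'.

```python
def temporal_matching_loss(track):
    """Mean squared l2 distance between embeddings of adjacent frames."""
    if len(track) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(track, track[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, curr))
    return total / (len(track) - 1)


# Hypothetical 2D embeddings for three consecutive frames
state_a = (0.0, 0.0)
state_b = (3.0, 4.0)        # distinct state visited for a single frame
state_a_prime = (0.2, 0.0)

true_track = [state_a, state_b, state_a_prime]
interpolated_track = [state_a, (0.1, 0.0), state_a_prime]

loss_true = temporal_matching_loss(true_track)
loss_interpolated = temporal_matching_loss(interpolated_track)
```

Because the interpolated track has a much lower penalty, a strongly regularized model has an incentive to smooth over the brief visit to state B unless the reconstruction term outweighs it.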

      2. Baseline comparisons

      The authors evaluate their method by assessing the correlation of embedding PCs with heuristic features (Fig. 2C,D + supp.), variation of embedding PCs across cell treatment groups (Fig. 3), and qualitative interpretation of embedding trajectories. In the supplement, the authors compare their VQ-VAE approach to VAEs and AAEs and chose to use a VQ-VAE based on lower reconstruction error and higher PC/heuristic feature correlation.

      However, the authors do not compare their method to much simpler baseline approaches to this problem. Existing literature suggests that heuristic features of cell shape and motion (similar to those the authors use to evaluate the relevance of their embeddings) are sufficient to perform many of the same tasks a VQ-VAE is used for in this work. For instance, in Fig. 3 it appears that a simple analysis of cell centroid speed recovers much of the same information as the complex VQ-VAE embeddings. In Fig. 2 - Supp. 6, it appears that after regressing out many heuristic features of cell geometry, the latent space largely explains cell non-autonomous information about the background environment, suggesting the heuristic features are largely sufficient.

      To demonstrate the usefulness of their deep modeling approach relative to simple baselines, the authors should compare against existing heuristics and embeddings of heuristics (e.g. PCA) using some of the tasks shown for the VQ-VAE (recovery of perturbation state, state transition detection, qualitative trajectory analysis, discrimination of cell types). Heuristics might include those already calculated here, or a more comprehensive set as cited in the Introduction. The authors may also consider comparing against baselines that don't include time information for some of their tasks (e.g. recovery of perturbation state could arguably be achieved with CNNs either ignorant of the timestep or with simple temporal conditioning, not including trajectory information).

      If these features are sufficient for many of the same tasks performed in this work, the authors should provide a clear argument for readers as to why the unsupervised VQ-VAE approach may be preferable (e.g. ability to recover potentially unknown cell changes, for which no heuristic exists). The VQ-VAE doesn't need to be superior along every axis to hold merit, but the work would be strengthened if the authors could show clear superiority along some dimension.

      3. Cell state transition interpretation

      In line 278, the authors propose that the unbalanced nature of transitions such that p(1 -> 2) >> p(2 -> 1) must represent some difference in timescales across the transitions because "cell states should have reached equilibrium after several days in culture at the time of the imaging experiments". This logic is unclear to me for two reasons.

      • If the population obeys detailed balance (e.g. transitions have equal frequency), then observed transitions should be balanced on a reasonably long time window, even if individual transitions occur on different timescales.
      • The assumption that cell states are balanced after a few days in culture is at odds with a few different aspects of the biology. Cell density and nutrient availability are continually changing in the dish, so culture conditions are non-stationary. Imaging apparatuses also commonly impact the cell biology of imaged samples due to imperfect incubation, etc.

      It seems likelier that these data represent an unbalanced transition due to the non-stationary nature of the culture system. Given the authors' emphasis on the value of measuring these transitions, the work would be strengthened by a more careful interpretation of these results, additional analysis details (e.g. how large are most state transitions? are these mostly small shifts "over the border" in state space, or large jumps?), and an attempt at biological interpretation of the observed phenomenon. The authors' RNA-seq data may be helpful in this latter regard.
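
The balance question raised here can be checked directly from per-frame state labels. The sketch below, which uses made-up label sequences rather than data from the manuscript, counts the observed i -> j transitions across tracks and reports the net flux between two states; under detailed balance, the two directions would be expected to occur at roughly equal frequency over a sufficiently long window.

```python
from collections import Counter


def count_transitions(tracks):
    """Count state changes (prev, curr) between consecutive frames of each track."""
    counts = Counter()
    for labels in tracks:
        for prev, curr in zip(labels, labels[1:]):
            if prev != curr:
                counts[(prev, curr)] += 1
    return counts


def net_flux(counts, a, b):
    """Number of a -> b transitions minus number of b -> a transitions."""
    return counts[(a, b)] - counts[(b, a)]


# Hypothetical per-frame state labels for four tracked cells
tracks = [
    [1, 1, 2, 2, 2],
    [1, 2, 2, 1, 2],
    [2, 2, 2, 2, 2],
    [1, 1, 1, 2, 2],
]
counts = count_transitions(tracks)
flux_1_to_2 = net_flux(counts, 1, 2)
```

A net flux that stays far from zero as the observation window grows would be consistent with a non-stationary population rather than a difference in timescales alone.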

      4. Single cell RNA-seq analysis

      The authors performed a very interesting experiment where they profiled the same cell population using both timelapse imaging and single cell RNA-seq. The authors argue that the global structure of the state space resolved by each modality is analogous, but this seems a bit of a stretch to me. The behavior state space is unimodal (bifurcated into two states by GMM clustering), while the mRNA-seq space has several distinct clusters.

      The argument that these states are analogous would be significantly strengthened by biological interpretation of the RNA-seq data. Do the mRNA profiles exhibit differentially expressed genes that might explain differences in behavior in the cell behavior states? The analyses in Fig. 4 - Supp 4 are suggestive that "State 1" contains interferon-responsive cells and not control cells, but broader conclusions don't appear well supported by current analyses.

      Minor Points

      1. Check grammar. Some articles are missing and some subject-verb agreements are mismatched. e.g. line 624 "we regularized [the] latent space", line 713 "after both loss[es] achieved".
      2. For Fig. 4 - supp 1 -- isn't it expected that the GMM cluster of a vector can be predicted from the vector? The GMM clusters were derived from the vectors to begin with, so this seems like a bit of a circular analysis. If I'm missing something, this figure might benefit from more exposition.
      3. For Fig. 4 - Supp 3, the authors should consider changing the "state" and "cluster" colors on the embedding projections so that they do not match. As presented, it appears as if the states and clusters were co-assayed and linked by some experimental label, when in fact the State 1::Cluster1, State 2::Cluster 2 relationship is just inferred.

      Positive comments

      1. Figure presentation and manuscript layout are top notch. Thanks to the authors for making these data easy to read and interpret.

      Significance

      See above.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2.

      -The authors describe Dynamorph, a deep-learning-based autoencoder that represents live-cell microscopy image data of motile microglia, in unperturbed and perturbed situations, in an interpretable latent space. Using Dynamorph, the authors identify and describe 'morphodynamic' states of the microglia.

      Major comments:

      Are the key conclusions convincing?

      -Yes, the methodology, observations and conclusions are clearly explained and convincing.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      -'Cell state' in the field of cell biology has been operationally defined in so many different ways and with so many different types of measurement data, that 'cell state' is becoming a somewhat vacuous term. This is not only a problem of this paper but a challenge for the field. In this case, cells are clustered using a Gaussian mixture model on the first few principal components of the latent space coefficients as well as speed, both averaged across the frames of cell tracks. This is fine and descriptive, but it's unclear whether this definition of 'cell state' is easily applied to other datasets and how this definition can be operationalized for hypothesis generation and experimentation. For other datasets, e.g. other cell types and other processes, such as differentiation, where e.g. tracking and segmentation may be more difficult and images would look quite different, can one still apply the same approach towards describing cell states? One could state that this definition of cell state is very specific to the dataset and therefore not generally useful. How would the authors respond to such a statement?

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation.

      -It's unclear to the reviewer whether the training data (unperturbed microglia) are close enough to the test data (perturbed microglia) such that application of the trained model to the test data makes sense. The authors provide reconstruction loss numbers, but they are difficult to interpret. Can the authors create plots of the unperturbed and perturbed microglia cells in the latent space and show overlap, or in other ways show that training data and test data are close enough for this application?

      -It's unclear what the effect of speed is on the final state determination. TFVs were composed of auto-encoder-based features (PCs from latent space) and speed of the cells. Would the states be very different without speed as part of the TFVs or with TFVs consisting only of speed features? Please quantify and discuss.

      -Only a small amount of intensity variation is explained: 17% using the first 4 PCs, which are mainly used in the analyses. This seems like a very low number. There is a lot of variation in the intensity images that is not explained by the autoencoder. The autoencoder seems to be doing a bad job. At the same time, the downstream analyses using the latent space are insightful and sensible. Can the authors provide more explanation?

      -Related to the previous point, 'self-supervised' sounds nice, but it's still optimizing towards something, in this case explaining the variation in input intensity images. A lot of the variation in the intensity images may not be of interest for the biological investigation of shape and dynamics. Did the authors uncover that indeed some of the latent dimensions are encoding other aspects of the images which may be less related to the biology and more to image properties/artifacts/biases?

      Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments.

      -These are computational experiments based on already existing data/results/code. It should be relatively straightforward to do these additional computational experiments. Careful analysis and interpretation require time.

      Are the data and the methods presented in such a way that they can be reproduced?

      -The methods are described with sufficient detail. The complicated experimental and computational processes seem reproducible to a decent extent. The code is captured in Github repos. The reviewer did not attempt to reproduce computational results. The reviewer did not check whether the available data meet FAIR requirements.

      Are the experiments adequately replicated and statistical analysis adequate?

      -Yes, and there is lots of useful supplementary material which helps with interpretation of the results.

      Minor comments:

      Specific experimental issues that are easily addressable.

      -The motivation for GMMs over k-means is unclear. K-means clustering leads to spatial separation between clusters (states), since all cells/tracks that are closest to their cluster mean are by definition further away from the means of other clusters. This is not the case with the more flexible GMMs; e.g. they allow one to have a smaller cluster (with small variance components) inside of a larger cluster (with large variance). The latter scenario seems undesirable for interpretation in terms of states.
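      The nesting scenario described in this comment can be illustrated with a toy one-dimensional example. This is a minimal sketch with made-up mixture parameters and cluster centers, not the authors' fitted model: two Gaussian components share a mean but differ in variance, so the wide component claims points on both sides of the narrow one, whereas nearest-center (k-means-style) assignment always produces contiguous regions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a 1D normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def gmm_assign(x, components):
    """Hard GMM assignment: index of the component with the largest weighted density."""
    scores = [w * normal_pdf(x, m, sd) for (w, m, sd) in components]
    return scores.index(max(scores))

def kmeans_assign(x, centers):
    """k-means-style assignment: index of the nearest center."""
    dists = [abs(x - c) for c in centers]
    return dists.index(min(dists))

# Hypothetical mixture (weight, mean, sd): a narrow component nested in a wide one.
components = [(0.5, 0.0, 0.5), (0.5, 0.0, 5.0)]
# Hypothetical k-means centers.
centers = [-3.0, 3.0]

xs = [-6.0, -0.1, 6.0]
gmm_labels = [gmm_assign(x, components) for x in xs]
kmeans_labels = [kmeans_assign(x, centers) for x in xs]
print(gmm_labels)     # [1, 0, 1]: the wide component surrounds the narrow one
print(kmeans_labels)  # [0, 0, 1]: each cluster occupies one contiguous region
```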

      -The original images are 3D (5 z-planes). The analyzed images were 2D. The reviewer missed how the authors went from 3D to 2D. And since cells are 3D, can the authors describe what they gained by going to 2D and what they potentially lost?

      Are prior studies referenced appropriately?

      -Yes, citations are ample and relevant.

      Are the text and figures clear and accurate?

      -Yes, the figures are informative.

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      -No specific suggestions

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      -This is a technological/computational advance using a large integrative (experimental+computational) approach.

      Place the work in the context of the existing literature (provide references, where appropriate).

      -The authors have done an excellent job at this.

      State what audience might be interested in and influenced by the reported findings.

      -Cell biologists, brain researchers, computer vision computational biologists

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      -Cell biology, cancer biology, systems biology, machine learning, statistics, data integration

      -Brain biology aspects (biological significance of the findings on morphodynamic microglial states) are difficult to assess for the reviewer

      Referee Cross-commenting

      Comments by Reviewer #1 look great and useful. I think they are in line with my comments. I think this manuscript would benefit from a reviewer that could comment on the biological significance. The review reports are skewed towards questions and remarks about the computational approach.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors use a combination of quantitative phase microscopy and machine learning to determine the state space of microglia cells. The key conclusions are that a VQ-VAE is able to capture a compact latent representation of the cell morphology and, combined with motion features, can predict state changes in single cell trajectories and discriminate between perturbations.

      Major comments:

      Overall - I very much enjoyed reading the manuscript. The work has been carefully performed and the results are interesting.

      • The temporal matching to enforce a smooth latent space representation is interesting. The authors mention that they mask out surrounding cells with a median pixel value. Have the authors considered using a pixel weighting in the reconstruction/matching loss to differentiate foreground/background? Also, does this affect detection of any fast (or indeed rare) transitions in the trajectories?
      • The Trajectory Feature Vectors (TFVs) are averaged over time - this seems to lose a lot of the salient information in the trajectories themselves, resulting in the low(ish) accuracy of the GMM. Could a Hidden Markov Model trained on the trajectories in state space help to identify/classify those trajectories that change their morphology/motion over time?
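      To make the Hidden Markov Model suggestion concrete, the following is a minimal sketch of Viterbi decoding for a two-state discrete HMM. All probabilities and the binary per-frame observation coding (0 and 1 as hypothetical low and high motility bins) are illustrative assumptions, not values from the manuscript. Unlike a time-averaged feature vector, the decoded path keeps the frame at which a trajectory changes state.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    n_states = len(start_p)
    log = math.log
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor state for ending in state s at this frame.
            best = max(range(n_states), key=lambda r: prev[r] + log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: "sticky" states with noisy per-frame observations.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.8, 0.2], [0.2, 0.8]]

switching = viterbi([0, 0, 0, 1, 1, 1], start_p, trans_p, emit_p)
noisy = viterbi([0, 0, 1, 0, 0], start_p, trans_p, emit_p)
print(switching)  # [0, 0, 0, 1, 1, 1]: a genuine transition is retained
print(noisy)      # [0, 0, 0, 0, 0]: a single outlier frame is smoothed away
```

      In practice the transition and emission parameters would be estimated from the trajectories (for example with Baum-Welch) rather than fixed by hand.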

      Minor comments:

      • Could the authors provide some example images showing interpolation of each PC using the generative decoder?
      • I was a little confused by the labels given to the PCs, as they seem to vary between figures. For example, in Fig. 2, PC1 and PC2 are Size and Peak Retardance, but in Fig. 3 they are referred to as Size and Cell Density (which could be interpreted as the number of cells per unit area). Could the authors clarify these in the captions?
      • The authors note that single-cell tracking is of vital importance. This should be elaborated upon. Also - could the VQ-VAE encodings be used to help track linking in cases of high density?
      • I was pleased to see the full source code available!

      Significance

      Nature and significance:

      This is a significant, mostly technical piece of work, that explores a complex new area of science -- using ML and large datasets to gain insight into biological systems. There are significant challenges, not least that interpreting ML models can be challenging.

      Existing literature/context:

      There have been relatively few examples of using self-supervised learning to gain insight into these complex datasets. Much of the work has concentrated on learning morphological descriptors. The present work starts to introduce the time dimension more explicitly.

      Target Audience:

      Broadly applicable to those studying cell biology, microscopy and machine learning.

      My expertise:

      ML applied to microscopy data. Single cell tracking.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      The authors do not wish to provide a response at this time.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The authors examine an important and challenging question in current biology: the role of RNA in the structural maintenance of nuclear and cytoplasmic membrane-less organelles including stress granules, processing bodies, the nucleolus, Cajal bodies, and nuclear speckles. Furthermore, the authors explored super-enhancer complexes involved in the regulation of gene expression. The authors used RNase L, an interferon-induced ribonuclease which, upon activation in the cytoplasm or targeting to the nucleus, degrades all RNAs within the cell. They then took a quantitative approach to analyze the effect of RNA degradation on disassembly or reorganization of membrane-less organelles. Interestingly, the authors observed that RNAs present within nuclear organelles are susceptible to RNase L degradation, leading to their disassembly. In contrast, super-enhancer-containing eRNAs are largely unaffected.

      Major concerns

      Many studied organelles are challenging to see in many of the figures. Thus this reviewer encourages the authors to present clearer insets at higher magnification to illustrate what is being quantified, and then show that quantification in the central figure next to the immunofluorescent images.

      The extent of degradation of specific RNAs after induction of RNase L should be analyzed by qRT-PCR and quantified for several assemblies. This will justify observations provided by microscopy on an individual cell basis.

      The main issue regards the connection between RNA and its role in the formation and structural integrity of nuclear organelles. There is consensus that these nuclear assemblies are built on specific nascent transcripts which act as a nucleation scaffold. If specific RNA synthesis is impaired, these assemblies collapse. The authors should discuss this. It would be relevant to mention two experimental works on this topic, DOI: 10.1038/ncb2140 and DOI: 10.1038/ncb2157.

      The study is limited to observed macroscopic changes in the appearance of assemblies. The authors must dig deeper and provide more conclusive results by colocalizing several components of these assemblies. It has been documented that the visualization of a selective marker for a specific assembly is not enough to prove either its functionality/dysfunctionality or the level of its disassembly. For example, in Figure 4A the authors should more convincingly visualize the nascent 47/45S pre-rRNA transcript to demonstrate that the nucleolus is built on ongoing pre-rRNA synthesis, as reflected by the tripartite nucleolar substructures. The loss of the GC component after rRNA depletion should be better presented with NPM1 colocalization.

      In Figure 4C, D the authors used the term "coilin assemblies". That's confusing for a reader. The Cajal body likely undergoes a structural rearrangement after activation of RNase L, which cannot be justified only by the presence of rearranged coilin foci. The authors should colocalize them with at least one or two functional markers.

      Enhancer RNAs likely play a role in gene control rather than acting as a nucleation element to build nuclear assemblies. This should be discussed in the explanation of observed differences between MED1 foci and other assemblies.

      Significance

      Understanding changes in the nuclear and cellular organization that accompany and drive changes in the formation and maintenance of cellular structures is an essential and not well-understood topic. Thus, this manuscript is relevant. However, the presented data in this paper are based on a limited approach, and particularly their interpretation and presentation could be substantially improved. Consequently, the conclusions are not convincingly supported by published data. However, some open questions need to be addressed. Specific criticisms are outlined above.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Decker et al. "RNA is required for the maintenance of multiple cytoplasmic and nuclear membrane-less organelles" investigates the structural role of RNA in membraneless organelles. The authors show that degradation of RNA in transient or constitutive membraneless organelles results in the altered formation and structure of many but not all organelles studied. The main assay is the activation of RNAseL activity by dsRNA which then destroys mRNA in the cell. The collected data leads the authors to highlight the possible roles of RNA in membraneless organelle formation and categorize the organelles: some relying more on the RNA-RNA interactions while others on protein-RNA or protein-protein interactions. The manuscript is well written and the data is sound.

      Major comments:

      The authors study the maintenance of organelles by RNA. For the transient ones, like stress granules (SG), it would be very interesting to see the formation/clearance kinetics with and without RNA. It would also be useful to trigger formation with something other than dsRNA. The idea is that if RNA is needed for SG maintenance, then the clearance kinetics with RNA would differ from those with depleted RNA.

      The experiments were done in cells. It is known that core components of the organelles can form granule-like structures in vitro without RNA. If it is possible to show that RNA presence improves the integrity in vitro, that would support the authors' claim. One example would be studying SG maintenance with and without RNAseL using the previously developed SG extraction protocol.

      Minor comments:

      In Figure 1a, it is not clear if the smaller granules are different from SGs as mentioned in the text; maybe using additional markers can make it clearer. Figures 3 and 4 require quantification.

      Significance

      This is a solid paper that advances our understanding of membraneless organelle formation and dynamics. This field is of high general interest for the broader scientific community. My expertise is in the field of membraneless organelles.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In the paper entitled "RNA is required for the maintenance of multiple cytoplasmic and nuclear membrane-less organelles", Decker et al set out to test the role of RNA in maintaining the integrity of a variety of biomolecular condensates. To do this, they assess how multiple different assemblies in the cytoplasm and nucleus retain their structural integrity following RNAseL activation. They identified many condensates which are solubilized and have protein components redistributed following RNAseL activation and presumably subsequent RNA digestion. These experiments largely concur with previous findings from RNAseA treatment. The implication is that RNA rather than protein is the essential organizing component for most tested condensates. The manuscript is well written, and the data are convincing. It is my judgement that this is worthy of publication following a few additional experiments/clarifications.

      1. The authors identify condensates which are sensitive to or refractory to RNAseL. It would be good if the authors more conclusively eliminated the possibility that remaining condensates contain specific residual antiviral RNAs and that this is the reason why these condensates remain intact. Are any of these condensates enriched in anti-viral RNAs like IFNbeta following polyIC treatment by FISH, for example (PMID: 31494035)?
      2. Is there a particular protein feature, charge, IDR-types etc. which is common to solubilized versus not solubilized groups? What about dissolved and novel formed assemblies? A simple table comparing protein features in the three groups would suffice, with particular emphasis on RNA binding domains PMID: 32243832 and intrinsic disordered regions PMID: 24773235.
      3. Demonstrate that the RNAseL treatment is reversible (i.e. withdraw polyIC, particularly for a protein that ends up in a novel assembly) or remove the word maintenance from the title.
      4. Control for RNA-dependence of the activity. Try to dissolve a non-RNA dependent/enriched condensate with RNAseL. SPOP mutations (PMID: 30244836) might be interesting as both SPOP and RNAseL loss of function mutations (PMID: 11799394) are associated with prostate cancer.
      5. A caveat is that certain regions of condensates enriched in RNA may not be accessible to RNAseL protein. A way to address this might be to attempt to directly target the enzyme to a compartment that is deemed refractory to the activity (and inferred to not require RNA) via an inducible system (i.e. FKBP12/FK506).
      6. Overall, this paper would be greatly enhanced by including a more extensive discussion on the basic biological implications for these findings. Why are some condensates RNA dependent? What function(s) are common to these condensates? How does disruption of this lead to disease?

      Significance

      This work addresses the neglected role of RNA in structuring condensates throughout the cell. Despite the prevalence of RNA in many condensates and the enrichment of RNA-binding proteins in condensates, there is still a highly limited understanding of the structural roles RNA plays in their assembly, as most work has been protein/IDR-centric. This work seeks to systematically assess the RNA-dependence of the assemblies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      The authors do not wish to provide a response at this time.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In mice, failures in conducting meiosis during spermatogenesis can be rescued by injecting prophase I male chromosomes into oocytes, to allow them to undergo the two meiotic divisions within the oocyte, together with the chromosomes of the oocyte. However, segregations are highly error prone and rarely lead to a live birth when the resulting embryos are reimplanted into foster mothers. In this study, the authors show that segregation errors in meiosis I oocytes harboring both male and female chromosomes mainly affect the male chromosome set. Most errors are due to precocious segregation of sister chromatids in unpaired male chromosomes (univalents). A delay in alignment of male chromosomes compared to female chromosomes was also observed. Reducing the volume of the oocyte cytoplasm to half leads to a significant reduction in the errors occurring, and hence, a significant increase in successful birth after re-implantation. Excitingly, with this technique, live births were obtained from male mice with a spermatogenic arrest phenotype.

      Main points:

      1) The authors conclude that halving the oocyte cell size helps proper segregation of male meiosis I chromosomes in the cytoplasm of meiosis I oocytes. It is also possible that the experimental procedure involved in removing half of the cytoplasm is promoting proper segregation for some unknown reason. The authors should include a condition where half of the cytoplasm is aspirated but then put back again, so oocytes have the same volume as before but the cytoplasm underwent the same treatment as in the halved oocytes. Also, increasing the cytoplasm volume of the oocyte should not lead to a better segregation of male chromosomes but make things worse; have the authors checked for that?

      2) The authors mention that male chromosomes align with a delay, compared to the female chromosomes. Does this delay depend on activation of error correction, or the spindle assembly checkpoint? Is it possible that dilution of factors required for checkpoint control, and hence for assuring proper chromosome segregation, is the reason for error-prone segregation in oocytes harboring twice the amount of chromosomes? If yes, have the authors stained for SAC proteins at the kinetochores? Maybe slight overexpression of SAC proteins would be sufficient to rescue male meiotic divisions in the oocyte; have the authors tested this hypothesis?

      3) The authors state that male chromosomes have a hard time segregating in the huge cytoplasm of the oocytes. Maybe it is not the fact that the chromosomes came from a male pronucleus, but just a matter of doubling the chromosomes that have to be segregated in the oocyte cytoplasm. How do male chromosomes behave in enucleated oocytes undergoing meiosis I? Conversely, if female chromosomes coming from another oocyte are injected into the recipient oocyte instead of male chromosomes, are those segregating correctly, or are the delay in chromosome alignment and the error rate comparable to the situation when the additional chromosome set comes from the male?

      4) In the rescue of mice with spermatogenic arrest the authors find aneuploidies of sex chromosomes in the offspring, not of autosomes. To the best of my knowledge, autosome aneuploidies are not viable in the mouse, hence this result does not indicate that sex chromosomes are the main source of aneuploidies. Nevertheless, it is attractive to speculate that aneuploidies are mainly due to sex chromosomes, because the oocyte is not prepared to segregate a male sex-chromosome bivalent. The authors should determine whether the segregation errors in meiosis I in oocytes harboring the additional male chromosome set concern mainly the male sex chromosomes, by doing FISH analysis after meiosis I.

      Significance

      This study is very interesting and of high significance, and very well executed. I think the study can go much further as far as mechanistic insights are concerned, only requiring techniques and tools that the authors have at their disposition.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Previously, the team has shown that a primary spermatocyte nucleus can undergo meiosis when transplanted into immature oocytes, and later obtained normal mice from the fertilized oocytes (Zygotes 1997, PMID: 9276513; PNAS 1998, PMID: 9576931). However, the efficiency was quite low (~ 1%) due to chromosome aberration, and the approach was thus not feasible for basic/clinical research applications. In this study, Ogonuki et al. extrapolated from a recent study showing that reduction of the ooplasm ameliorates chromosome segregation errors during meiosis (Dev Cell 2017, PMID: 28486131), injected spermatocyte nuclei into half-sized GV oocytes, and succeeded in obtaining live murine pups at a high rate (the birth rate improved from 1% with full-sized oocytes to 19% with half-sized oocytes). Further, through detailed observation with high-resolution 3D live imaging, the authors clarified that the misalignment of paternal chromosomes could be ameliorated by reducing the volume of ooplasm. Finally, the authors applied this technology and obtained live pups from azoospermic mice, suggesting a potential application in human infertility treatment.

      Major comments:

      This is a great study combining the expertise on both sperm and oocytes. The experiments are well designed and performed. The key conclusions are convincing.

      Line 228. The authors claimed that all the pups born following the injection of wild-type or mutant spermatocytes grew into fertile adults.

      Because the authors tested 3 males from wt spermatocytes (line 197), the above sentence should be rephrased.

      The authors found one XXY male among the three male mice from wt spermatocytes. Was the XYY male mouse fully fertile without XY/XYY mosaicism?

      How many females and males were obtained from wt spermatocytes?

      Minor comments:

      The authors clearly showed that the technique can be applied to rescue spermatogenic arrest. Readers would appreciate it if the authors included any unsuccessful cases.

      To prevent sex-chromosome aberration, are there any potential markers for selecting the most developed spermatocytes?

      Significance

      One in six couples suffers from infertility, and 70-90% of male infertility cases are related to defects in spermatogenesis. Clinically, intracytoplasmic injection of sperm is common, but it is not applicable to men who lack haploid germ cells. Injection of primary spermatocyte nuclei can give pups, but the efficiency was poor (~1%, PNAS 1998, PMID: 9576931). In the present study, by using halved oocytes as recipients, the authors improved the efficiency from 1% to 19%. With this great improvement, they further obtained healthy fertile offspring from male mice genetically lacking haploid cells. This approach opens a window for infertile patients suffering from spermatogenic arrest.

      The reviewer's field of expertise: knockout mice, male infertility, spermatogenesis, sperm function, fertilization.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Ogonuki et al developed a new technique using primary spermatocyte-injected oocytes for offspring production. They examined chromosome segregation errors in biparental meiosis using spermatocyte-injected oocytes. They showed that artificially reducing ooplasmic volume rescued highly error-prone chromosome segregation by preventing sister separation in biparental meiosis. Their live-imaging analysis demonstrated that erroneous chromosome segregation derived from univalent-like chromosomes followed by predivision of sister chromatids during prometaphase I in biparental meiosis. They showed that the birth rate was improved using halved oocytes. Furthermore, they showed that production of offspring was successful using spermatocytes from azoospermic mice.

      Overall, the data are convincing and the manuscript addresses important questions. The data were produced at a technically high level. The presented data are sufficient to support the conclusions of the authors, and further provide significant insight into the application to production of offspring for azoospermic animals. Thus, the manuscript should be of interest to the field and deserves publication, provided the authors address the following minor concerns.

      Fig1A, Line 117 This is an amazing experiment to set up biparental meiosis using spermatocyte nuclei. Since spermatocytes are in different stages during progression through meiotic prophase, some of them (late pachytene) should yield crossovers but others (before mid-pachytene) are yet to complete recombination. Thus, whether donor paternal chromosomes have bivalents or univalents depends on which stage the spermatocytes were derived from. The authors should describe how spermatocytes were picked up for injection and whether they used a particular stage of spermatocytes.

      Line 159-160 The authors stated that paternal chromosomes are susceptible to errors in ooplasm-hosted biparental meiosis. This is a nice demonstration that traces the origin of separated chromatids. In the right graph of Fig2C, 1 to 2 paternal chromosomes showed misalignment. It is unclear whether premature separation is biased to any particular paternal chromosome, e.g. XY. The authors should discuss this further.

      Line 176-177 The authors stated that most errors were preceded by premature separation of bivalent chromosomes into univalent-like structures. This implies that premature separation of bivalent chromosomes happens prior to anaphase onset. Does this depend on spindle force? Or is cohesion intrinsically fragile in donor spermatocyte chromosomes? The authors should discuss this further.

      Fig3E, The authors depicted that in normal-sized oocytes, univalent-like chromosomes undergo predivision at anaphase. This is somewhat too simplified, because Fig3B shows that a certain population exhibits nondisjunction. This model and description should be corrected to fit the data they demonstrated. If sister segregation at anaphase is predominant, I wonder what happens to sister kinetochore mono-orientation and sister centromeric protection in such univalent-like chromosomes. It would be nice to show the centromeric proteins MEIKIN and SGO2 in donor spermatocyte chromosomes versus those of the oocyte to examine centromeric cohesion. The authors should clarify this issue.

      Line296-294 What do the authors mean by the sentence "It is known that sex chromosomes are prepared to undergo meiosis later than autosomes."?

      Significance

      The manuscript is of biological significance for the field of reproduction in two major respects. First, the authors addressed the mechanism of erroneous chromosome segregation in biparental meiosis. Second, they showed that biparental meiosis using spermatocyte-injected oocytes can be applied to produce offspring from azoospermic mice, which would have great impact on the field of reproductive biology. The data were produced with a high level of technical skill.

      Referee Cross-commenting

      I agree with the point raised in Reviewer #3's Main point 2. It would be better to see SAC proteins.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      The comments of the reviewers were highly insightful and enabled us to greatly improve the quality of our manuscript. We provided point-by-point responses to each of the reviewers’ comments. Revisions in the text are highlighted in yellow. We hope that the revisions in the manuscript and our accompanying responses will be sufficient to make our manuscript suitable for publication.

      2. Point-by-point description of the revisions

      Reviewer #1

      - The authors provide no rationale for using the PTI score to measure the protein-coding potential of transcripts. The only attempt to justify this measure is given in the methods: "The definition of PTI score is motivated by our hypothetical concept that translation of pPTI is limited by alternate competing sPTIs." (lines 426-427, page 20). What the PTI score measures is the dominance of the largest predicted ORF over the predicted ORFs, in terms of length. It is not clear why there would be competition for translation of putative ORFs for genuine protein-coding transcripts. An alternative hypothesis, briefly touched upon in the discussion (lines 318-320) is that translation of non-functional ORFs could give rise to the production of toxic proteins, in addition to being costly in terms of energy. The authors should provide the reasoning behind the PTI score and should explain the biological mechanisms that may underlie differences between coding and non-coding transcripts.

      Thank you for your comment. We previously identified a de novo gene, NCYM, and showed that its protein has a biochemical function (Suenaga et al 2014; Suenaga et al 2020). However, NCYM was previously registered as a non-coding RNA in the public database, and the established predictors for protein-coding potential, coding potential assessment tool (CPAT), showed a coding probability of NCYM of 0.022, labeling it as a noncoding RNA (new Supplementary Figure 1B). Therefore, we sought to identify a new indicator for coding potential, comparing NCYM with a small subset of coding and non-coding RNAs to determine whether NCYM has sequence features that would allow it to be registered as a coding transcript (data not shown). We found that predicted ORFs, other than major ORFs, seem to be short in coding RNAs. In addition, it has been reported that upstream ORFs inhibit the translation of major ORFs (Calvo et al 2009). Therefore, we hypothesized that the predicted ORFs may reduce the translation of major ORFs, thereby becoming short in the coding transcripts, including NCYM, during evolution. The term ORF refers to an RNA sequence that is translated into an actual product; however, the biological significance of non-translating, predicted ORFs has been largely ignored and remains to be characterized. Therefore, we defined a PTI as an RNA sequence from the start codon sequence to the end codon sequence and did not assume that it would result in a translated product. Thus, PTI can be defined even in genuine non-coding RNAs. The major ORFs are often the longest PTIs (hereafter, primary PTIs or pPTIs) in coding transcripts. Thus, to investigate the importance of pPTIs relative to other PTIs (hereafter, secondary PTIs, or sPTIs) for the evolution of coding genes, we defined a PTI score as the occupancy of the pPTI length to the total PTI length (Figure 1A–B) and assumed that the PTI score was high in coding transcripts. 
      This is the rationale for using the PTI score as an indicator of protein-coding potential, and it is now included in the revised manuscript (lines 92-115, pages 5-6).
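      The definition above (pPTI length divided by the total length of all PTIs) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a PTI runs from an ATG to the first in-frame stop codon, that only the three forward frames are scanned, that nested start codons sharing a stop are counted once, and that the pPTI itself is included in the total. The function names `find_ptis` and `pti_score` are hypothetical.

```python
# Illustrative sketch only; the scanning rules below are assumptions,
# not the procedure described in the manuscript's Methods.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_ptis(seq):
    """Return (start, end) spans of candidate PTIs in the three forward frames.

    A span begins at the first ATG after the previous in-frame stop codon
    and ends at the next in-frame stop codon (stop included).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    spans = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                spans.append((start, i + 3))
                start = None
    return spans


def pti_score(seq):
    """Length of the longest PTI (pPTI) divided by the summed length of all PTIs.

    Returns None when the sequence contains no complete PTI.
    """
    lengths = [end - start for start, end in find_ptis(seq)]
    if not lengths:
        return None
    return max(lengths) / sum(lengths)


if __name__ == "__main__":
    # One 12-nt PTI in frame 0 and one 6-nt PTI in frame 1.
    print(pti_score("ATGAAAAAATAGCATGTAA"))
```

      Under these assumptions, a transcript with a single PTI scores 1, and every additional sPTI lowers the score in proportion to its length.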

      To examine the biological mechanism underlying the difference between coding and noncoding RNAs, we investigated the relationship between translation and PTI scores. We chose a dataset of noncoding RNAs that are translated into small proteins, derived from the databases SmProt and sORF.org. Based on ribosome profiling and mass spectrometry data, the databases include noncoding RNAs that encode small proteins (less than 100 residues) as well as mRNAs that have extra-small ORFs in addition to major ORFs. The SmProt database divides these small ORFs into three categories: upstream (uORF), small (sORF), and downstream (dORF). The definitions are based on their locations: uORFs and dORFs are located in 5’ and 3’ UTRs, respectively, and sORFs overlap with major ORFs using different reading frames (new Figure 2B). We first calculated PTI scores of lincRNAs encoding small proteins and found that the distribution of these lincRNAs shifted to higher PTI scores compared with the distribution of all lincRNAs (new Figure 2A). Therefore, lincRNA translation is correlated with higher PTI scores. Next, we examined whether PTI scores were associated with the translation occupancy of major ORFs in coding RNAs. We calculated PTI scores in mRNAs with uORFs, sORFs, or dORFs and found that the distribution of mRNAs encoding such small proteins shifted to lower PTI scores (new Figure 2C). Similar data were obtained from the sORF.org dataset (Supplementary Figure 5). These data support the idea that the PTI score is related to the occupancy of the major ORF during translation. These results are now included in the results of the revised manuscript (lines 241-271, pages 12-13).

      Translation of small proteins from noncoding RNAs seems to inhibit noncoding functions because of ribosome binding and subsequent translation. On the other hand, translation of sPTIs in coding RNAs seems to inhibit the translation of major ORFs because of competing translations (Calvo et al 2009). At the same time, however, the translation of such proteins may have the advantage of producing new functional proteins/regulatory mechanisms during evolution. Therefore, the right and left shifts of the PTI score that we observed for noncoding and coding RNAs, respectively, seem to be slightly deleterious or beneficial. As further discussed in the responses below, the overlap of distributions of PTI scores between coding and noncoding transcripts was negatively correlated with the effective population size of the species. Therefore, as nearly neutral theory predicts, mutations causing such slightly deleterious/beneficial effects of translation in coding and noncoding transcripts seem to be fixed in species with small effective population sizes (including humans) by genetic drift (Kimura 1968, 1983; Ohta 1992). Clearly, PTI scores are related to translation of PTIs, and their distributions suggest a mechanism for producing bifunctional RNAs that are simultaneously coding and noncoding. The discussion has now been included in the revised manuscript (lines 487-503, pp 23-24).

      Calvo SE, Pagliarini DJ, Mootha VK. Upstream open reading frames cause widespread reduction in protein expression and are polymorphic among humans. Proc Natl Acad Sci U S A. 2009 May 5;106(18):7507-12. doi: 10.1073/pnas.0810916106. Epub 2009 Apr 16. PMID: 19372376; PMCID: PMC2669787.

      Kimura M. 1968. Evolutionary rate at the molecular level Nature. 217(5129):624-6. PMID: 5637732. https://doi.org/10.1038/217624a0

      Kimura, M. (1983). Neutral Theory of Molecular Evolution Cambridge: Cambridge University Press. https://doi.org/10.1093/obo/9780199941728-0132

      Ohta T. 1992. The Nearly Neutral Theory of Molecular Evolution. Annu Rev Ecol Syst. 23:263-86.

      - The presence of ORFs in transcripts has long been used as a predictor of their protein-coding potential. For example, the ORF size and the ORF coverage are part of the set of predictors implemented in CPAT (Wang et al., 2013). The PTI score is necessarily related to these methods, yet no comparison is provided. If the PTI score is to be used as a measure to classify transcripts as coding or non-coding, its performance should be compared to other classifiers, including those that use the presence of ORFs as a predictor (e.g., CPAT) but not only (e.g., PhyloCSF, based on the pattern of sequence evolution).

      Thank you for your comment. As you noted, our reasons for using the PTI score were not clearly described in the original manuscript and are now included in the Results section (lines 92-115, page 5-6). As mentioned in response to comment 1, CPAT was not able to predict NCYM as a coding transcript (Supplementary Figure 1B). Furthermore, we intended to use this new concept to identify the RNA sequence elements that determine protein-coding potential, but did not intend to use the score as a classifier of coding or non-coding RNAs. Many studies have identified bifunctional RNAs that are simultaneously coding and noncoding (Li and Liu 2019; Huang Y et al. 2021). Moreover, neutrally evolving peptides are encoded by small ORFs of noncoding RNAs, possibly contributing to the evolutionary origin of new functional proteins (Ruiz-Orera et al. 2014). Therefore, we argue that such dichotomous classification is often misleading, by unconsciously ignoring ncRNAs that encode functional or nonfunctional small proteins. Additionally, this approach has several technical problems. For a training set for use with such a classification, we need a dataset of genuine noncoding RNAs. However, it is quite difficult to define such noncoding RNAs without bias, for example, for cell or tissue types, including cancer or normal cells/tissues. Increasing evidence has shown peptide translation from known noncoding RNAs (Li and Liu 2019; Huang Y et al. 2021); moreover, some of these peptides are specific to the cellular context (Dohka et al 2021). Therefore, we cannot be certain that we are identifying genuine noncoding RNAs from the datasets from ribosome profiling or mass spectrometry, which neither cover all cell/tissue types nor all physiological contexts.

      We agree with you in that we need to compare PTI scores with other indicators of coding potential, such as transcript length, ORF size, and ORF coverage. ORFs of less than 100 residues have been used to define noncoding RNAs; thus, such RNAs necessarily have shorter ORF sizes relative to coding RNAs. Therefore, we calculated these indicators by focusing on noncoding RNAs that encode proteins, but not coding RNAs (new Supplementary Figure 4). The PTI score distribution shifted to the right for lincRNAs encoding small proteins, indicating that the PTI score is related to translation (new Figure 2C). In contrast, the distributions of transcript length, ORF size, and ORF coverage did not shift higher for noncoding RNAs encoding small proteins (new Supplementary Figure 4), although a slight shift to higher ORF coverage was found. Therefore, we argue that the PTI score is a better indicator of translation than transcript length, ORF size or ORF coverage. These results are now included in the results of the revised manuscript (lines 241-255, page 12).

      - The authors compare the observed PTI score distributions with the PTI scores from random or shuffled sequences. They conclude that the PTI scores do not depend on transcript lengths but on transcript sequences (lines 122-123). However, this is not true for non-coding RNAs, for which the observed and randomized distributions are very similar. The relationship between transcript length and PTI scores should be analyzed into more detail. Are the annotated non-coding transcripts with high PTI scores particular in terms of length?

      We analyzed the length of high-PTI-score transcripts compared to all lncRNA transcripts. The average high-PTI-score with high coding potential (0.6 PTI score −29), consistent with the distribution of transcript length in lincRNAs translating small proteins (new Supplementary Figure 4C). Therefore, the high PTI scores are not simply due to the larger ORF size derived from longer transcript length, but also because of the occupancy of pPTI among all PTIs. The occupancy of pPTI can be estimated by ORF coverage or PTI score, and we can easily see that transcript length (the denominator of ORF coverage) correlates with the sum of the lengths of all PTIs (the denominator of the PTI score). Thus, we need to clarify which indicators have more biological significance in terms of gene evolution. Higher PTI scores in noncoding RNAs cause overlap of the coding and noncoding transcripts in eukaryotes, especially in multicellular eukaryotes (new Figure 4 and 5). The overlaps of PTI score distributions between coding and noncoding RNAs (Opti) were positively and negatively correlated with mutation rate and effective population size, respectively, and approximated by logarithmic or exponential relationships (new Figure 6). Because the inverse of the effective population size defines the strength of genetic drift relative to the strength of selection, the overlaps quantified by Opti seem to be derived from genetic drift. These results clearly suggest that the observed PTI score distribution of noncoding RNAs is not random. In contrast, ORF coverage (Ocov) showed a weaker relationship with mutation rates and effective population sizes (new Supplementary Figure 8 and 9). These results suggest that ORF coverage is less related to gene evolution than PTI score, with the weak relationship seemingly indirectly derived from the correlation with the PTI score. We have now included these results in the revised manuscript (lines 306-322, page 15).
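      The contrast between the two denominators mentioned above can be written out explicitly. This is only a notational sketch: the symbols are introduced here for illustration, and it assumes that the longest ORF coincides with the pPTI and that the pPTI is counted among the n PTIs of the transcript.

```latex
\[
\text{PTI score} = \frac{L_{\mathrm{pPTI}}}{\sum_{i=1}^{n} L_{i}},
\qquad
\text{ORF coverage} = \frac{L_{\mathrm{pPTI}}}{L_{\mathrm{transcript}}}
\]
```

      Here $L_{i}$ is the length of the $i$-th PTI and $L_{\mathrm{transcript}}$ is the full transcript length. The two measures share a numerator, so any difference in their behavior comes from whether the pPTI is normalized by the other PTIs or by the whole transcript.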

      - The authors discuss in depth the correlation between PTI scores and PTI-based protein-coding potential measures (e.g., section "PTI scores correlate with protein coding potential in humans and mice", starting line 125; section "Relationship between the PTI score and protein-coding potential", starting line 243). Given that the protein-coding potential is directly derived from the PTI score distributions for coding and non-coding transcripts, it is not surprising that the two should be correlated. The significance of observing a linear or a sigmoid relationship is not clearly explained.

      As you noted, the protein-coding potential was directly derived from the PTI score distributions. Therefore, if the distribution for coding RNA shows a higher or lower PTI score compared to that of noncoding RNA, the protein-coding potential is expected to be positively or negatively correlated with the PTI score. If the distributions of coding and noncoding RNA significantly overlapped (Opti > 0.7), the protein-coding potential became constant and was not correlated with the PTI score (new Figure 7 and new Supplementary Figure 10). Thus, the PTI score is not always positively correlated with the protein-coding potential.

      We had divided the species into three groups; the sigmoidal group, the linear group, or others based on the intercept and slope in the linear approximation, but considering the fit of the linear approximation, there is no essential difference between the sigmoidal and linear groups. Therefore, in the revised text, we classify the species into two groups: linear and constant (new Figure 7 and Supplementary Figure 10). We have now replaced the figures and added a new interpretation of the results in the revised manuscript (lines 341-353, pages 16-17).

      - The authors use the entire set of annotated coding and non-coding transcripts to assess the distribution of PTI scores and to define the protein-coding potential. Traditionally, for methods that aim to classify transcripts as coding or non-coding, this is done using "bona fide" coding and non-coding transcripts, which are used as training sets. The efficiency of the method can then be evaluated using a test set of transcripts. This aspect is lacking here and should be implemented.

      As we wrote in response to your comment 2, we aimed to examine which RNA sequence elements define a genuine coding RNA, not to develop a classifier of coding and noncoding RNAs. Technically, “bona fide” coding and noncoding RNAs cannot be rigorously defined, given the possible existence of unidentified bifunctional RNAs in the testing sets; more traditional approaches often disregard such possibilities.

      - The comparisons among species are likely biased by the quality of lncRNA annotations in non-model organisms - cf. high variations among primates, which are likely driven by the annotation quality and depth.

      As written in the response to comment 3, the variation of PTI score distribution in lncRNA is not random, and overlaps with the distribution of coding RNA are negatively correlated with effective population size (new Figure 6). In addition, we found that the tissue-specific expression of lncRNA influences the PTI score distribution in multicellular eukaryotes (new Figure 8 C and D and new Supplementary Figure 11 and 12). Therefore, the variation is caused, at least in part, by the specificity of gene expression, and it thus contains biological significance. These results are now included in the revised manuscript (lines 383-402, pages 18-19).

      Based on these results, we expect that the lncRNA annotations derived from the two major databases, Ensembl and RefSeq, are well curated and of sufficient quality to compare the PTI score distributions. Realistically, no other database catalogs a large number of curated lncRNAs from various species. However, we also expect that recent progress in whole genome sequencing and transcriptome analysis of vertebrates may improve the annotation of lncRNAs, including those of non-model organisms, and provide more ideal datasets for comparisons among species.

      - The differences among bacteria, archaea and eukaryotes should be discussed into more depth. In bacteria, the genuine ORF is well defined by the presence of translation signals (e.g., Shine-Dalgarno sequence). Other factors are also at work in both prokaryotes and eukaryotes, including RNA secondary structures. The relationship between these factors and the PTI score should be discussed.

      The Shine–Dalgarno sequence in bacteria and the Kozak sequence in eukaryotes have been identified as important regulatory elements for ribosome binding, but these sequences are not essential for all coding RNAs, and their significance is not well characterized, especially in noncoding RNAs that are translated. Recent research has sought to identify the determinants that regulate ribosome binding to lncRNAs using 99 characteristics, including the weight of each base at the −6 to +1 positions relative to the start codon (Kozak-like sequence) or RNA secondary structure (Zeng et al 2018). They found that transcript length is a stronger indicator than either of these characteristics for ribosome binding in human lncRNAs. Because the PTI score is a better indicator for translation of lincRNAs than transcript length (new Supplementary Figure 4C), we would argue that Kozak sequences and RNA secondary structures are not reliable indicators for ribosome binding of lncRNAs, and their significance should be limited to more specific transcript classes. Furthermore, Hata et al. recently showed that the Kozak sequence is a negative regulator of de novo gene birth in plants (Hata et al. 2021). Therefore, these sequence characteristics seem to evolve after the birth of coding transcripts and are not generally involved in new coding gene origination from noncoding RNAs.

      Zeng C, Hamada M. 2018. Identifying sequence features that drive ribosomal association for lncRNAs BMC Genomics. 19(Suppl 10):906. PMID: 30598103; PMCID: PMC6311901. https://doi.org/10.1186/s12864-018-5275-8

      Hata T, Satoh S, Takada N, Matsuo M, Obokata J. 2021. Kozak sequence acts as a negative regulator of de novo transcription initiation of newborn coding sequences in the plant genome. Mol Biol Evol. 38:2791-2803. PMID: 33705557; PMCID: PMC8233501. https://doi.org/10.1093/molbev/msab069

      - From an evolutionary perspective, the effective population size (Ne) is also likely related to the "quality" of the ORFs. An analysis of Ne vs. the PTI score distributions would be an interesting addition to this manuscript.

      We appreciate this comment. We now include an analysis of the relationship between Ne and PTI scores by defining an indicator of the extent of overlap in the PTI score distributions between coding and noncoding transcripts. This overlap score was calculated based on PTI scores or ORF coverage and named Opti or Ocov, respectively. Opti showed positive and negative correlations with mutation rate (Up) and effective population size (Ne), respectively (new Figure 6A), suggesting that the overlap of PTI score distributions is related to slightly deleterious or beneficial mutations fixed in populations due to genetic drift. Furthermore, using the relationship between Ne and Opti, we calculated the minimum effective population size to be approximately 1000, which is consistent with results from conservation biology (Frankham et al. 2014). Indeed, species at risk of extinction had significantly higher Opti than species with little risk of extinction (left panel, new Figure 6B). In addition, Opti was higher for species with decreasing population sizes than for those with stable population sizes (right panel, new Figure 6B). These results are now included in the revised manuscript (lines 323-332, pages 15-16).
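      One way to quantify the overlap of two score distributions is sketched below. The exact formula for Opti is not given in this response, so the histogram overlap coefficient used here (the sum over bins of the smaller of the two normalized frequencies) is an assumption, as are the function names `histogram` and `overlap_score` and the choice of 20 bins.

```python
# Illustrative sketch only; the overlap coefficient below is an assumed
# stand-in for Opti, whose definition is given in the manuscript itself.
def histogram(scores, bins=20):
    """Normalized histogram of scores that lie in the interval [0, 1]."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * bins
    for s in scores:
        idx = min(int(s * bins), bins - 1)  # a score of 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def overlap_score(coding, noncoding, bins=20):
    """Overlap coefficient of two distributions: 0 = disjoint, 1 = identical."""
    p = histogram(coding, bins)
    q = histogram(noncoding, bins)
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    coding_scores = [0.92, 0.88, 0.75, 0.95]      # made-up example values
    noncoding_scores = [0.12, 0.30, 0.76, 0.41]   # made-up example values
    print(overlap_score(coding_scores, noncoding_scores))
```

      With such a measure, well-separated coding and noncoding distributions give a value near 0, and distributions that cannot be told apart give a value near 1.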

      Frankham R, Bradshaw CJA. 2014. Genetics in conservation management: Revised recommendations for the 50/500 rules, Red List criteria and population viability analyses, Biological Conservation, 170:56-63, https://doi.org/10.1016/j.biocon.2013.12.036

      Reviewer #1 (Significance (Required)):

      This manuscript is lacking in novelty and is not well positioned in the field. If the aim of this work is to provide a method to classify transcripts as coding or noncoding, the authors should provide detailed comparisons with existing methods (see above). If the aim is to understand what defines a genuine protein-coding transcript, then the biological mechanisms should be better described and the comparisons among species and among functional categories of genes should be further developed. The idea of using the "dominance" of the largest ORF compared to the other predicted ORFs is interesting, and provides a new element compared to existing methods that rely exclusively on ORF length and ORF coverage. I would recommend that the authors develop this idea further and discuss the advantages of using the ORF dominance compared to just the ORF length or coverage.

      Thank you for your comment. To address this, we have revised the description of our aim, which is to investigate what defines a genuine protein-coding transcript. In pursuing this aim, we found that the extent of overlap of the PTI score distributions between coding and noncoding transcripts is negatively correlated with effective population size. In addition, we have added characterizations of functional categories of high-PTI-score lncRNAs in mice (new Supplementary Tables 6 to 8) and C. elegans (new Supplementary Tables 9, 10, and 11). Comparison of ORF size and coverage with the PTI score showed that the PTI score is a better indicator of lncRNA translation than these indicators and has biological significance in molecular evolution, because the overlap of its distributions correlates clearly with mutation rate and effective population size. These results and related descriptions are now included in the revised manuscript (lines 323-332, pages 15-16; lines 210-218, pages 10-11).

      **Referee Cross-commenting**

      I fully agree with Reviewer 2's remarks. In particular, adding ribosome profiling analyses is an excellent idea and could substantially improve the manuscript.

      We investigated the PTI scores in lncRNAs that are translated, using ribosome profiling data, and found that PTI scores correlated with translation (lines 241-271, pages 12-13). Thank you for this excellent suggestion.

      Reviewer 2

      **Major comments:**

      - some validation of their predictions of coding potential would be good to add. There are plenty of ribosome profiling experiments out there for some of the studied organisms (human, mouse, E. coli) that could be used to show that indeed some of the non-coding RNAs are misclassified and have ribosome density across the predicted open reading frames.

      Thank you for your comment. As noted in our response to Reviewer 1 above, we calculated the PTI scores of translated lncRNAs from the two databases and found that the PTI score correlates with translation of both coding and noncoding RNAs (new Figure 2 and new Supplementary Figures 4 and 5). As noted above, such translation seems to produce slightly deleterious/beneficial effects, thereby becoming fixed in species with smaller effective population sizes by genetic drift. These results and related discussion are now included in the revised manuscript (lines 241-271, pages 12-13; lines 323-332, pages 15-16; lines 487-503, page 23-24).

      - the manuscript is at times difficult to follow and the implication of the statements may not be immediately clear to the readers, particularly those without formal training in bioinformatic methods; even in the abstract. Some examples: "The relationship between the PTI score and protein-coding potential was sigmoidal in most eukaryotes; however, it was linear passing through the origin in three distinct eutherian lineages, including humans". Here it is not clear what this means (without reading the paper) - and even after reading the paper the importance of noting the sigmoidal vs linear relationship of PTI vs. protein-coding potential is unclear. I would encourage the authors to double-check that they provide a clear interpretation of their results, with readers unschooled in proper statistics in mind.

      Thank you for these comments. As we noted in response to comment 4 of Reviewer 1, considering the fit of the linear approximation, there was no essential difference between the sigmoidal and linear groups. Therefore, in the revised manuscript, we classify the species into two groups: linear and constant (new Figure 7 and Supplementary Figure 10). We also propose and diagram a new gene birth model to help readers understand our interpretations more easily (Figure 9). These results and discussion are now included in the revised manuscript (lines 341-353, pages 16-17; lines 514-538, pages 24-25).

      - For the definition of PTI and protein-coding potential the authors refer to the Materials and Methods. I would encourage to explain in plain terms in the results section 1.) how they decided on this particular formalization and 2.) explain clearly what this means.

      Thank you for your suggestion. We have included a concise definition in the revised text in plain terms (lines 107-115, page 5-6; lines 144-146, page 7).

      - The definition of protein-coding potential appears to be dependent on the database classification of a transcript as either coding or non-coding. Particularly for organisms with complex transcriptomes, databases may not contain the proper information - what are the implications for their protein-coding potential score?

      Organisms with complex transcriptomes, such as multicellular organisms, present difficulties in classifying coding vs. noncoding transcripts because RNAs classified as noncoding based on proteomic data from a subset of cell types may encode functional proteins in other cell types for which proteomic data are not available. To examine whether cell types affect the PTI distribution of coding and noncoding transcripts, we analyzed transcriptomic data from five mammals (human, mouse, rat, macaque, and opossum) and found that the PTI score distributions were similar in most cell or tissue types for noncoding transcripts (new Figure 8C and Supplementary Figure 11). However, PTI score distributions for noncoding RNA in mature testes showed a rightward shift for all five species (new Figure 8C and Supplementary Figure 11).

      Furthermore, we found that tissue specificity of RNA expression was correlated with PTI score (new Figure 8D and new Supplementary Figure 12 and 13), with more specific expression associated with higher PTI scores in all five species, with the majority of the tissue-specific expression in mature testis. Therefore, the mature testis is a special tissue that expresses noncoding RNAs with high coding potentials. These results support the hypothesis that the testis is a special organ for new gene origination (Kaessmann 2010). We have added these results and discussion to the revised manuscript (lines 383-402, pages 18-19; lines 427-434, pages 20-21; lines 435-445, page 21).

      Kaessmann H. 2010. Origins, evolution, and phenotypic impact of new genes. Genome Res, 20:1313-26. Epub 2010 Jul 22. PMID: 20651121; PMCID: PMC2945180. https://doi.org/10.1101/gr.101386.109

      - The authors completely ignore plants - would it make sense to expand their analysis to this branch of the tree of life?

      In Supplementary Figure 5 of our original manuscript (new Supplementary Figure 7), we have included the PTI score distributions from plants. We also present their overlapping scores (Opti) in the revised manuscript.

      Reviewer 2 (Significance (Required)):

      The manuscript presents an elegant way to predict protein-coding and non-coding RNAs, which may be very relevant to the study of organisms with complex transcriptomes. The audience for the manuscript at the moment may be more limited to scientists trained and working in the field of bioinformatics, but with some integration of transcriptomics and ribosome profiling data, as well as an effort to make the results accessible to scientists not trained in bioinformatics, this manuscript may be relevant and of interest to researchers working on the biology of long non-coding RNAs and translation in general. My expertise: systems biology of RNA binding proteins, transcriptomics, RNA biology.

      **Referee Cross-commenting**

      I fully agree with my co-reviewer regarding additional analyses to strengthen the manuscript.

      Thank you for these comments. We analyzed noncoding RNAs using ribosome profiling data and transcriptomes in different tissues. We found that high PTI scores correlated with translation of noncoding RNAs, and that such high-PTI-score noncoding RNAs were specifically expressed in mature testes. Because the effective population size was inversely correlated with the overlap of PTI score distributions, the slightly deleterious or beneficial mutations in germ cells of mature testes seem to generate high-PTI-score noncoding RNAs as candidates for new coding genes in the next generation. This idea is consistent with the hypothesis that new coding transcripts are derived from noncoding transcripts expressed in spermatocytes and spermatids in mature testes. In addition, we found that human noncoding transcripts with high PTI scores tended to be involved in transcriptional regulation, and the target gene of MYCN was significantly enriched as the original gene. A recent study showed that binding sites for transcription factors, including MYCN, are mutational hotspots in human spermatogonia (Kaiser et al. 2021). Therefore, the PTI score offers an opportunity to integrate the concept of gene birth with classical molecular evolutionary theory, thereby contributing to our understanding of evolution.

      Kaiser VB et al. 2021. Mutational bias in spermatogonia impacts the anatomy of regulatory sites in the human genome. Genome Res. Epub ahead of print. PMID: 34417209. https://doi.org/10.1101/gr.275407.121

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In the manuscript "Potentially translated sequences determine protein-coding potential of RNAs in cellular organisms" Suenaga and colleagues analyze the available transcriptomes from 100 prokaryotes and eukaryotes, as well as >100 viruses to understand whether transcripts tend to be translated or not. They develop a potentially translated island score (PTI) that combines the number and length of open reading frames in a transcript. From there they develop a protein-coding potential score that combines PTI with database information on coding and non-coding transcripts in various organisms and that in some sense predicts whether a transcript would fall in the coding or non-coding category. The main takeaway appears to be that in prokaryotes PTIs and protein-coding potential strongly differentiate coding and non-coding transcripts, while in eukaryotes these differences appear to be more fluid. The manuscript presents an interesting bioinformatic analysis of coding properties across the phylogenetic field and may represent an interesting resource. The audience for the manuscript at the moment may be more limited to scientists trained and working in the field of bioinformatics, but with some integration of transcriptomics and ribosome profiling data, as well as an effort to make the results accessible to scientists not trained in bioinformatics, this manuscript may be relevant and of interest to researchers working on the biology of long non-coding RNAs and translation in general.

      Major comments:

      • some validation of their predictions of coding potential would be good to add. There are plenty of ribosome profiling experiments out there for some of the studied organisms (human, mouse, E. coli) that could be used to show that indeed some of the non-coding RNAs are misclassified and have ribosome density across the predicted open reading frames.
      • the manuscript is at times difficult to follow and the implication of the statements may not be immediately clear to the readers, particularly those without formal training in bioinformatic methods; even in the abstract. Some examples: "The relationship between the PTI score and protein-coding potential was sigmoidal in most eukaryotes; however, it was linear passing through the origin in three distinct eutherian lineages, including humans". Here it is not clear what this means (without reading the paper) - and even after reading the paper the importance of noting the sigmoidal vs linear relationship of PTI vs. protein-coding potential is unclear. I would encourage the authors to double-check that they provide a clear interpretation of their results, with readers unschooled in proper statistics in mind.
      • For the definition of PTI and protein-coding potential the authors refer to the Materials and Methods. I would encourage the authors to explain in plain terms in the results section 1.) how they decided on this particular formalization and 2.) what this means.
      • The definition of protein-coding potential appears to be dependent on database classification of a transcript as either coding or non-coding. Particularly for organisms with complex transcriptomes, databases may not contain the proper information - what are the implications for their protein-coding potential score?
      • The authors completely ignore plants - would it make sense to expand their analysis to this branch of the tree of life?

      Significance

      The manuscript presents an elegant way to predict protein-coding and non-coding RNAs, which may be very relevant to the study of organisms with complex transcriptomes.

      The audience for the manuscript at the moment may be more limited to scientists trained and working in the field of bioinformatics, but with some integration of transcriptomics and ribosome profiling data, as well as an effort to make the results accessible to scientists not trained in bioinformatics, this manuscript may be relevant and of interest to researchers working on the biology of long non-coding RNAs and translation in general.

      My expertise: systems biology of RNA binding proteins, transcriptomics, RNA biology.

      Referee Cross-commenting

      I fully agree with my co-reviewer regarding additional analyses to strengthen the manuscript.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      The manuscript submitted by Suenaga and co-authors presents a method to evaluate the protein-coding potential of transcripts. This method is based on an index that they name the PTI (potentially translated island) score, which represents the ratio between the length of the largest predicted ORF and the sum of all the predicted ORF lengths, for each transcript. The authors compare PTI score distributions between transcripts classified as protein-coding and as non-coding in public nucleotide databases, for a wide range of species, including bacteria, archaea, eukaryotes and viruses. They derive from this comparison a measure of the protein-coding potential of transcripts. To validate this approach, the authors evaluated the distributions of Ka/Ks values for transcripts annotated as coding or non-coding, in various classes of PTI-based protein-coding potential. The main finding of the manuscript stems from the comparison among species: the authors find that bacteria and archaea have narrow, non-overlapping PTI distributions for coding and non-coding transcripts, while eukaryotes have broader and more overlapping PTI distributions.
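
      To make the ratio described above concrete, here is a minimal sketch that divides the longest ORF length by the summed length of all ORFs in a transcript. The ORF definition (ATG to the first in-frame stop codon, three forward frames, no nested ORFs) and the function names `find_orf_lengths` and `pti_score` are assumptions made for illustration; the manuscript's Materials and Methods define the actual procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_lengths(seq):
    """Return ORF lengths (in nucleotides) found in the three forward frames.

    Assumption: an ORF runs from an ATG to the first in-frame stop codon
    (stop included), and scanning resumes after that stop codon.
    ORFs without a stop codon are ignored.
    """
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append(i + 3 - start)
                start = None
    return lengths


def pti_score(seq):
    """Longest ORF length divided by the sum of all ORF lengths (0.0 if none)."""
    lengths = find_orf_lengths(seq)
    if not lengths:
        return 0.0
    return max(lengths) / sum(lengths)
```

      Under these assumptions, a transcript with a single ORF scores 1.0, and the score falls as competing ORFs take up a larger share of the total ORF length.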

      Major comments

      • The authors provide no rationale for using the PTI score to measure the protein-coding potential of transcripts. The only attempt to justify this measure is given in the methods: "The definition of PTI score is motivated by our hypothetical concept that translation of pPTI is limited by alternate competing sPTIs." (lines 426-427, page 20). What the PTI score measures is the dominance of the largest predicted ORF over the other predicted ORFs, in terms of length. It is not clear why there would be competition for translation of putative ORFs for genuine protein-coding transcripts. An alternative hypothesis, briefly touched upon in the discussion (lines 318-320), is that translation of non-functional ORFs could give rise to the production of toxic proteins, in addition to being costly in terms of energy. The authors should provide the reasoning behind the PTI score and should explain the biological mechanisms that may underlie differences between coding and non-coding transcripts.
      • The presence of ORFs in transcripts has long been used as a predictor of their protein-coding potential. For example, the ORF size and the ORF coverage are part of the set of predictors implemented in CPAT (Wang et al., 2013). The PTI score is necessarily related to these methods, yet no comparison is provided. If the PTI score is to be used as a measure to classify transcripts as coding or non-coding, its performance should be compared to other classifiers, including those that use the presence of ORFs as a predictor (e.g., CPAT) but not only (e.g., PhyloCSF, based on the pattern of sequence evolution).
      • The authors compare the observed PTI score distributions with the PTI scores from random or shuffled sequences. They conclude that the PTI scores do not depend on transcript lengths but on transcript sequences (lines 122-123). However, this is not true for non-coding RNAs, for which the observed and randomized distributions are very similar. The relationship between transcript length and PTI scores should be analyzed in more detail. Are the annotated non-coding transcripts with high PTI scores particular in terms of length?
      • The authors discuss in depth the correlation between PTI scores and PTI-based protein-coding potential measures (e.g., section "PTI scores correlate with protein-coding potential in humans and mice", starting line 125; section "Relationship between the PTI score and protein-coding potential", starting line 243). Given that the protein-coding potential is directly derived from the PTI score distributions for coding and non-coding transcripts, it is not surprising that the two should be correlated. The significance of observing a linear or a sigmoid relationship is not clearly explained.
      • The authors use the entire set of annotated coding and non-coding transcripts to assess the distribution of PTI scores and to define the protein-coding potential. Traditionally, for methods that aim to classify transcripts as coding or non-coding, this is done using "bona fide" coding and non-coding transcripts, which are used as training sets. The efficiency of the method can then be evaluated using a test set of transcripts. This aspect is lacking here and should be implemented.
      • The comparisons among species are likely biased by the quality of lncRNA annotations in non-model organisms - cf. high variations among primates, which are likely driven by the annotation quality and depth.
      • The differences among bacteria, archaea and eukaryotes should be discussed in more depth. In bacteria, the genuine ORF is well defined by the presence of translation signals (e.g., Shine-Dalgarno sequence). Other factors are also at work in both prokaryotes and eukaryotes, including RNA secondary structures. The relationship between these factors and the PTI score should be discussed.
      • From an evolutionary perspective, the effective population size (Ne) is also likely related to the "quality" of the ORFs. An analysis of Ne vs. the PTI score distributions would be an interesting addition to this manuscript.

      Significance

      This manuscript is lacking in novelty and is not well positioned in the field. If the aim of this work is to provide a method to classify transcripts as coding or non-coding, the authors should provide detailed comparisons with existing methods (see above). If the aim is to understand what defines a genuine protein-coding transcript, then the biological mechanisms should be better described and the comparisons among species and among functional categories of genes should be further developed. The idea of using the "dominance" of the largest ORF compared to the other predicted ORFs is interesting, and provides a new element compared to existing methods that rely exclusively on ORF length and ORF coverage. I would recommend that the authors develop this idea further and discuss the advantages of using the ORF dominance compared to just the ORF length or coverage.

      Referee Cross-commenting

      I fully agree with Reviewer 2's remarks. In particular, adding ribosome profiling analyses is an excellent idea and could substantially improve the manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2021-01041 Corresponding author(s): Gregory P. Way, PhD

      1. General Statements

      On behalf of the authors, I’d like to thank the Review Commons team for sending our manuscript out for review. I’d also like to thank the three anonymous reviewers for providing valuable feedback that will improve the clarity, focus, and analysis interpretation presented in our manuscript.

      To prompt the editorial team, our paper provides two well-controlled innovations:

      We are the first to train variational autoencoders (VAEs) on classical image features extracted from Cell Painting images. VAEs are commonplace in, and have contributed major discoveries to, other biomedical data types (e.g. transcriptomics), but they have been underexplored in morphology data. In our paper, we trained and optimized three different VAE variants using Cell Painting readouts and compared these variants against shuffled data, against PCA (a linear dimensionality reduction algorithm commonly used as a VAE control), and against L1000 (mRNA) readouts from the same perturbations. We found that cell morphology VAEs train with different settings than gene expression data, and that they generate interpretable latent spaces that depend on the chosen VAE variant.

      We tested special VAE properties to predict polypharmacology cell states in a novel way. Polypharmacology is a major reason why drugs fail to reach the bedside. Off-target effects cause unintended toxicity, and lead to adverse clinical events. In our paper, we used VAE latent space arithmetic (LSA) to predict polypharmacology cell states; in other words, what cells might look like if we perturbed them with a compound that had two mechanisms of action (MOA). We compared our results to shuffled data, PCA, and to LSA performed with VAEs trained using L1000 readouts. We found that cell morphology and gene expression provide complementary information, and that we could predict some polypharmacology cell states robustly, while others were more difficult to predict.
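
      As a rough illustration of what latent space arithmetic can look like, the sketch below adds the mean latent vectors of two single-MOA groups and subtracts the mean of a control group. This specific formula, the control group, and the names `mean_vector` and `lsa_predict` are assumptions for illustration and are not taken from the manuscript; in practice the resulting latent vector would also be passed through the VAE decoder, which is omitted here.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def lsa_predict(latent_a, latent_b, latent_control):
    """Assumed arithmetic: mean(A) + mean(B) - mean(control).

    Each argument is a list of latent vectors (one per profile).
    Returns the predicted latent vector for the A|B combination.
    """
    z_a = mean_vector(latent_a)
    z_b = mean_vector(latent_b)
    z_c = mean_vector(latent_control)
    return [a + b - c for a, b, c in zip(z_a, z_b, z_c)]
```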

      We found value in all of the reviewer comments. We intend to conduct all but four of the proposed analyses to supplement our aforementioned innovations.

      In the following revision plan, we include all reviewer comments exactly as they were written. The reviewers often had overlapping suggestions. In these cases, we grouped together similar reviewer comments and responded to them once.

      We include three sections: 1) A description of the revisions we plan to conduct in the near future; 2) A description of changes we have already made; and 3) A description and rationale of changes we will not pursue.

      Lastly, we would like to highlight that all reviewers provided positive feedback in their reviews. They discussed our paper as “conceptually and technically unique” and were positive about our methods section, stating that we did a “good job making everything available and reproducible”. Our methods section is complete, and we provide a fully reproducible and versioned github repository. We will release a second version of our github repository when we complete our revision plan to maintain clarity for our submitted version and the peer-reviewed version.

      2. Description of the planned revisions

      2.1. Address UMAP interpretability to provide a deeper description of MOA performance

      Reviewer 1: Instead of using UMAP embedding, it would be better to compare reconstruction error or show a reconstructed image with the original image to claim that models reliably approximate the underlying morphology data.

      Reviewer 1: Rather than just stating that the VAE's did not span the original data distribution and saying beta-VAE performed best by eye, some simple metrics can be drawn to analyze the overlap in data for a more direct and quantified comparison. Researchers should also explain what part of the data is not being captured here. Some analysis of what the original uncaptured UMAP represents is important in understanding the limitations of the VAEs' capacity.

      Reviewer 2: The authors compare generation performance based on UMAP. In the UMAP space, data tend to cluster together even though they might be far from each other in the feature space. I would like to see more quantitative metrics on how well these methods capture morphology distributions. You can compute metrics like MMD distance, Kullback-Leibler (KL) divergence, earth mover's distance, or a simple classifier trained on actual MoA classes tested on generated data.

      We agree with the reviewers that evaluating reconstruction loss in addition to providing the UMAP coordinates would improve understanding of VAE limitations and enable a better comparison of VAE performance. We will analyze reconstruction loss across models and include these data as a new supplementary figure, which will enable direct comparisons across models and across different MOAs.
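
      A per-MOA reconstruction loss comparison of the kind described above could be sketched as follows. Mean squared error is used as the loss and `per_moa_reconstruction_loss` is a hypothetical helper; both are assumptions for illustration, not the loss or code used in the manuscript.

```python
from collections import defaultdict


def per_moa_reconstruction_loss(originals, reconstructions, moa_labels):
    """Average the per-profile mean squared error within each MOA label.

    originals, reconstructions: lists of equal-length feature vectors.
    moa_labels: one MOA label per profile.
    """
    loss_sums = defaultdict(float)
    counts = defaultdict(int)
    for x, x_hat, moa in zip(originals, reconstructions, moa_labels):
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        loss_sums[moa] += mse
        counts[moa] += 1
    return {moa: loss_sums[moa] / counts[moa] for moa in loss_sums}
```

      Running this once per model yields tables that can be compared directly across models and across MOAs.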

      We also agree that UMAP interpretation can be misleading. While currently state-of-the-art, UMAP has mathematical limitations that prevent interpretation of global data structures. However, there are emerging tools, including a new dimensionality reduction algorithm, called PaCMAP, which aims to preserve both local and global structure (Wang et al, 2021). We will explore this tool to determine, both mathematically and empirically, which is most appropriate for our dataset by cross-referencing the visualization with our added supplementary figure describing per-MOA reconstruction loss.

      We would also like to emphasize that we trained our VAEs using CellProfiler readouts from Cell Painting images and not the raw Cell Painting images themselves. As this was one of our primary innovations, this detail is extremely important. Therefore, we have improved clarity and added emphasis to this point in the manuscript introduction and discussion (see section 3).

      2.2. More specific comparisons of MOA predictions to shuffled data and improved description of MOA label accuracy

      Reviewer 1: It is difficult to know what the clear threshold for successful performance is in figures like Figure 7 and SFigure 9, but by and large, it appears that the majority of predicted combination MOAs were not successful. Without either A) the ability to adequately predict almost all combinations from individual profiles that were used in training or B) an explanation, prior to analysis, of which combinations can be predicted, it is difficult to see this method being used since the combinatorial predictions are more likely not informative.

      Reviewer 1: The researchers justify the poor performance compared to shuffled data by saying that A) MOA annotations are noisy and unreliable and B) the MOAs may only manifest in other modalities like what was seen in the L1000 vs morphology predictability. While these might be true, knowing this the researchers should make an effort to clean and de-noise their data and select MOAs that are well-known and reliable, as well as selecting MOAs for which we have a known morphological or genetic reaction.

      Reviewer 3: Figure 6 is missing error bars (standard deviation of the L2 distance) and, as such, is hard to draw conclusions from.

      We thank the reviewers for raising this concern. We agree that it is critical, and we appreciate the opportunity to address it.

      All three of these comments relate to being unable to draw conclusions from our results when most A∩B predictions appear to have no difference from shuffled controls. Therefore, to address this comment, we will update our LSA evaluation to compare each MOA to a matched set of randomly shuffled data. Specifically, in our existing comparison, we realized a methodological fallacy in how we're displaying these data shuffles. We should be comparing specific MOA combinations to their corresponding shuffled results instead of comparing all to all, which will artificially decrease performance when there are polypharmacology predictions that fail to recapitulate the ground truth cell states.
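
      The matched comparison described above could look something like the following sketch, in which each MOA combination is scored only against shuffled predictions generated for that same combination. The dictionary layout and the function names are hypothetical, and the summary statistic (the fraction of shuffled predictions at least as close to ground truth as the real one) is one possible choice rather than the planned analysis.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def matched_shuffle_comparison(predicted, ground_truth, shuffled_predictions):
    """Compare each MOA combination with its own shuffled baseline.

    predicted, ground_truth: dict mapping MOA combination -> profile vector.
    shuffled_predictions: dict mapping MOA combination -> list of vectors
        predicted from shuffled data for that same combination.
    """
    results = {}
    for moa, truth in ground_truth.items():
        real = l2_distance(predicted[moa], truth)
        null = [l2_distance(s, truth) for s in shuffled_predictions[moa]]
        # Fraction of shuffled predictions that are at least as close as the real one.
        fraction = sum(d <= real for d in null) / len(null)
        results[moa] = {"real_distance": real, "fraction_shuffled_as_close": fraction}
    return results
```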

      We have connected with Paul Clemons, the Senior Director of Computational Chemical Biology Research at the Broad Institute of MIT and Harvard, who has informed us that the Drug Repurposing Hub annotations are among the most well documented. Therefore, while we know that biological annotations are often incomplete, our original text overemphasized the amount of noise contributed by inaccurate labels. We therefore added the following sentence to the discussion to clarify this important point:

      “However, the Drug Repurposing Hub MOA annotations are among the most well-documented resources, so other factors like different dose concentration and non-additive effects contribute to weak LSA performance for some compound combinations (Corsello et al, 2017).”

      We will also update our supplementary figure to account for specific MOA shuffling and include additional text comparing Cell Painting and L1000 showing which MOAs perform best in which modality.

      2.3. More detailed evaluation of MOA performance across drug variance and drug classes

      Reviewer 1: With the small number of combinations that are successfully predicted, to build confidence in the performance, it would be necessary to explain the reason for the differences in performance. Further experimentations should be done looking into any relationship between the type of MOAs (and their features) and the resulting A|B predictability. Looking at Figure 7, the top-performing combinations are comprised entirely of inhibitor MOAs. If the noisiness of the data is a factor, there should be some measurable correlation between feature noisiness and variation and the resulting A|B predictability from LSA.

      We agree with the reviewer that further experimentation would be helpful to gain confidence in our LSA performance. We plan to perform two different analyses to address this question. First, we will compare profile reproducibility (median pairwise correlations among MOAs) to MOA predictability. This will provide insight into the relationship between MOA measurement variance and performance. Second, we will split MOAs by category (e.g. inhibitor, activator) and test if there are significant performance differences between categories across VAE models in both L1000 and Cell Painting data. This will tell us if there are certain trends in the type of MOAs we’re able to predict. If there are, this would be useful knowledge since it could suggest that certain types of MOAs are associated with a more consistent cell state.
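
      One way to compute a reproducibility score of the kind mentioned in the first analysis is sketched below: the median Pearson correlation over all pairs of profiles that share an MOA. The function names are hypothetical, and the choice of Pearson correlation over all profile pairs is an assumption about how the metric would be defined.

```python
import math
from itertools import combinations
from statistics import median


def pearson(u, v):
    """Pearson correlation of two equal-length vectors with nonzero variance."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def median_pairwise_correlation(profiles):
    """Median Pearson correlation over all pairs of profiles for one MOA."""
    return median(pearson(a, b) for a, b in combinations(profiles, 2))
```

      The resulting per-MOA score can then be plotted or correlated against per-MOA LSA performance.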

      2.4. Higher confidence in LSA overfitting assessment

      Reviewer 1: To show that the methodology works well on unseen data, researchers withheld the top 5 performing A|B MOAs (SFig 9) and showed they were still well predicted. This is not the most compelling demonstration since the data to be held out was selected with bias as the top-performing samples. It would be much more interesting to withhold an MOA that was near or only somewhat above the margin of acceptability and see how many holdouts affected the predictability of those more susceptible data points. From my best interpretation, the hold-out experiment also only held out the combination MOA groups from training. It would be better if single MOAs (for example A) which were a part of a combination of MOA (A|B) were also held out to see if predictability suffered as a result and if generalizability did extend to cells with unseen MOAs (not just cells which had already highly performing combinations of seen MOAs).

      We believe our original analysis was extremely compelling. Even when we removed the top MOAs from training, we were still able to capture their combination polypharmacology cell states through LSA. We find this similar to removing all pictures of sunglasses in an image corpus of human faces, but still being able to reliably infer pictures of people wearing sunglasses. Specifically, this tells us that our model is learning some fundamental data generating function that our top performing MOAs tap into regardless of whether they are present in training.

      However, we agree with the reviewer that withholding intermediate-performing MOAs would also be informative, but for a separate reason. Unlike the best predicted MOAs, the intermediate MOAs are likely more susceptible to changes in the training data, so it would be interesting to determine if intermediate MOAs’ performance is a result of overfitting instead of truly learning aspects of the data generating function. We plan to perform this new analysis and add the results to Supplementary Figure 8 as a subpanel and add a full description of the approach to the appropriate methods subsection.

      2.5. Additional metrics to evaluate LSA predictions to provide more confident interpretation

      Reviewer 2: The predictions are evaluated using L2 distances, which I find not that informative. I would like to see other metrics (correlation or L1 or distribution distances in previous comments)

      We agree with the reviewer that using more than one metric would be helpful because oftentimes a single metric does not tell a complete story. We will add a panel to the LSA supplementary figure (Supplementary Figure 7), using Pearson correlation instead. While L2 distances will tell us how close predictions are to ground truth, Pearson correlations will tell us how consistently, on average, we are able to predict feature direction.
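
      The toy example below illustrates why the two metrics can disagree. The vectors are made up for illustration and are not data from the manuscript: one prediction is far from the ground truth in L2 distance but moves every feature in the right direction, while the other is closer in L2 distance but reverses the direction.

```python
import math


def l2_distance(pred, truth):
    """Euclidean distance between a predicted and a ground-truth profile."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)))


def pearson(pred, truth):
    """Pearson correlation between a predicted and a ground-truth profile."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth))
    return cov / (sd_p * sd_t)


truth = [1.0, 2.0, 3.0]
offset_pred = [11.0, 12.0, 13.0]   # right direction, large constant offset
flipped_pred = [3.0, 2.0, 1.0]     # closer in L2, but direction reversed

for name, pred in [("offset", offset_pred), ("flipped", flipped_pred)]:
    print(name, round(l2_distance(pred, truth), 2), round(pearson(pred, truth), 2))
```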

      2.6. Adding a performance-driven feature level analysis to categorize per-feature modeling ability

      Reviewer 2: I would like to see feature-level analysis, which features are well predicted and which ones are more challenging to predict?

      We agree with the reviewer that feature level analysis would be interesting to study. We believe that understanding which features are easy and hard to model could give insight into why certain MOAs (which could be associated with more signal in certain Cell Painting features) are predicted better than others.

      However, we are concerned that it is difficult to have an objective measurement of which features are easier to model because features that have less variation might be easier to model. So, we will analyze the correlation between individual feature reconstruction loss vs. feature variance across profiles. We will color-code the points to represent feature groups or channels. This analysis will not only demonstrate the relationship between feature variance and modeling ability, but also provide insight into the difficulty of modeling individual CellProfiler features.
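
      The per-feature quantities for this analysis could be assembled along the lines of the sketch below, which pairs each feature's variance across profiles with its mean squared reconstruction error. The feature names, the use of population variance, and the helper `feature_loss_and_variance` are assumptions for illustration.

```python
from statistics import pvariance


def feature_loss_and_variance(originals, reconstructions, feature_names):
    """Return (feature name, variance across profiles, reconstruction MSE) per feature.

    originals, reconstructions: lists of profiles, each a list of feature values
    ordered as in feature_names.
    """
    rows = []
    for j, name in enumerate(feature_names):
        column = [profile[j] for profile in originals]
        column_hat = [profile[j] for profile in reconstructions]
        mse = sum((a - b) ** 2 for a, b in zip(column, column_hat)) / len(column)
        rows.append((name, pvariance(column), mse))
    return rows
```

      Each returned row corresponds to one point in a variance-versus-loss scatter plot, which can then be colored by feature group or channel.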

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      3.1. Documenting positive feedback as provided by the three reviewers

      Reviewer 1: With access to the dataset, the posted GitHub, and documentation in the paper, I believe that the experiments are reproducible.

      Reviewer 1: The experiments are adequately replicated statistically for conventions of deep learning.

      Reviewer 1: This paper proposes a conceptually and technically unique proposal in terms of application, taking existing technologies of VAEs and LSA and, as far as I know, using them in a novel area of application (predicting and simulating combination MOAs for compound treatments). If this work is shown to work more broadly and effectively, is seen through to its completion, and is eventually successfully implemented, it will help to evaluate the effects of drugs used in combination on gene expression and cell morphology. An audience in the realm of biological deep learning applications as well as an audience working in compound and drug testing would be interested in the results of this paper. Authors successfully place their work within the context of existing literature, referencing the numerous VAE applications that they build off of and fit into the field of (Lafarge et al, 2018; Ternes et al, 2021, etc...), citing the applications of LSA in the computer vision community (Radford et al, 2015, Goldsborough et al, 2017), and discussing the biological context that they are working in (Chandrasekaran et al, 2021).

      Reviewer 2: The main novelty of the work is applying VAEs on cell painting data to predict drug perturbations. The final use case could be guiding experimental design by predicting unseen data. However, the authors do not show such an example and use case, which is understandable due to the need for further experiments to validate computational results and may not be the main focus of this paper. The authors did a good job of citing existing methods and relevant work. The potential audience could be the computational biology and applied machine learning community.

      Reviewer 3: The manuscript is beautifully written in a crystal clear manner. The authors have made a visible effort towards making their work understandable. The methods section is clear and comprehensive. All experiments are rigorously conducted and the validation procedures are sound. The conclusions of the paper are convincing and most of them are well supported by the data. Both the data and the code required to reproduce this work are freely available. Overall, the article is of high quality and relevance to several scientific communities.

      We thank the reviewers for their encouraging remarks and overall positive sentiment. As early-career researchers, we feel empowered by these words.

      3.2. Moved Figure 2 to supplement and removed Figure 5

      Reviewer 1: Fig 2 is not informative so it can go to supplementary.

      Reviewer 2: I liked the paper's GitHub repo, the authors did a good job making everything available and reproducible. As a suggestion, you can move the learning curves into the supplementary figures because they might not be the most exciting piece of info for the non-technical reader.

      Reviewer 3: I would suggest removing Figure 5 (or moving it to the supplementary) as it revisits the content of Figure 1 and does not bring much extra information.

      We agree that Figure 2 might not be informative to a non-technical reader, so we have accepted this suggestion by both reviewers 1 and 2, and we have moved Figure 2 to supplementary.

      We agree with the reviewer and have removed Figure 5.

      3.3. Clarified our data source as CellProfiler readouts, not raw Cell Painting images

      Reviewer 1: In Fig 4, it would be useful to show a few sample representative images with respect to CellProfiler feature groups.

      Reviewer 1: In Figure 6, what does "original input space" mean? Does it mean the raw pixel image? As the researchers already extracted CellProfiler feature groups, it would be interesting to compare mean L2 distance based on CellProfiler features as a baseline, to see whether the VAE improves performance compared to handcrafted features.

      Reviewer 3: While what "morphological readouts" concretely mean becomes clearer later on in the paper, it would be useful to give a couple of examples early on when introducing the considered datasets.

      We thank the reviewer for these suggestions, which bring to light a common source of confusion, which we must alleviate. We are working with CellProfiler readouts (features extracted using classical algorithms) of the Cell Painting images and not the images themselves. We have made several edits throughout the manuscript to improve clarity and remove this confusion, including the introduction, in which we clearly state our model input data:

      “Because of the success of VAEs on these various datasets, we sought to determine if VAEs could also be trained using cell morphology readouts (rather than directly on images), and further, to carry out arithmetic to predict novel treatment outcomes. We derive the cell morphology readouts using CellProfiler (McQuin et al, 2018), which measures the size, structure, texture, and intensity of cells, and use these readouts to train all models.”

      This decision comes with tradeoffs: The benefit of using CellProfiler readouts instead of images is that they are more manageable but we might lose some information. We more thoroughly discuss this important tradeoff in the discussion section:

      “We determined that VAEs can be trained on cell morphology readouts rather than directly using the cell images from which they were derived. This decision comes with various trade-offs. Compared to cell images, cell morphology readouts as extracted by image analysis tools (e.g. CellProfiler) are a more manageable data type; the data are smaller, easier to distribute, substantially less expensive to analyze and store, and faster to train (McQuin et al, 2018). However, it is likely some biological information is lost, because these tools might fail to measure all morphology signals. The so-called image-based profiling pipeline also loses information, by nature of aggregating inherently single-cell data to bulk consensus signatures (Caicedo et al, 2017).”

      3.4. Clarified future directions to infer cell health readouts from simulated polypharmacology cell states

      Reviewer 1: Authors also make the claim that they can infer toxicity and simulate the mechanism of how two compounds might react. This is a claim that would not be supported even if the method were able to successfully predict morphology or gene profiles. Drug interaction and toxicity are quite complex and go beyond just morphology and expression. VAEs predicting a small set of features would not be able to capture information beyond the readouts, especially when dealing with potentially unseen compounds for which toxicity is not yet known. For example, two compounds might produce a morphology that appears similar to other safe compounds but has other factors that contribute to toxicity. Further, here they show no evidence of toxicity or interaction analysis.

      The reviewer is correct that such a claim is unsupported by our research. Our message was actually that inferring toxicity could be a potential future application of our work. Specifically, for example, we can apply orthogonal models of cell toxicity that we previously derived using other data (Way et al, 2021a) to our inferred polypharmacology cell states. We thank this reviewer for noticing our lack of clarity, and we have made changes in the discussion to make it clear that inferring toxicity is something we may do in the future and is not something that is discussed in the manuscript:

      “In the future, by predicting cell states of inferred polypharmacology, we can also infer toxicity using orthogonal models (e.g. Way et al. 2021) and simulate the mechanisms of how two compounds might interact.”

      3.5. Clarified our method of splitting data and noted how a future analysis will assess the extent of overfitting

      Reviewer 2: Could authors outline detailed data splits? Which MoA are in train and which are held out from training? As I understood, there were samples from MoAs that were supposed to be predicted in the calculation of LSA? Generally, the predicted MoA should not be seen during training and not in LSA calculation.

      We now more explicitly detail how we split our data in the methods:

      “As input into our machine learning models, we split the data into an 80% training, 10% validation, and 10% test set, stratified by plate for Cell Painting and stratified by cell line for L1000. In effect, this procedure evenly distributes compounds and MOAs across data splits.”
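
      As a rough illustration of the split quoted above, the following is a minimal sketch and not the authors' code. It assumes each profile is a dict carrying a hypothetical `plate` key (a cell line key would play the same role for L1000), and it uses a hypothetical helper, `stratified_split`, that shuffles and divides each stratum separately so that every split contains profiles from every stratum.

```python
import random
from collections import defaultdict


def stratified_split(records, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test sets, stratified by `key`.

    Hypothetical helper: each stratum (e.g. a plate or a cell line) is
    shuffled and divided on its own, so every split sees every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    train, valid, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_valid = round(fractions[1] * len(group))
        train.extend(group[:n_train])
        valid.extend(group[n_train:n_train + n_valid])
        # Whatever remains after train and validation goes to the test set.
        test.extend(group[n_train + n_valid:])
    return train, valid, test


# Toy data (made up): 3 plates with 20 profiles each.
profiles = [{"plate": f"plate_{p}", "profile_id": i}
            for p in range(3) for i in range(20)]
train, valid, test = stratified_split(profiles, key="plate")
```

      Stratifying this way balances plates (or cell lines) across splits; it does not by itself hold any MOA out of training, which is the separate concern the reviewer raises.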

      We also thank the reviewer for this comment, which raises an important concern about whether we are overfitting to the data. We have explained in the manuscript that, because of a lack of data, MOAs were repeated in training and LSA. However, we believe overfitting does not play a large role in model performance. Our hold-5-out experiment shows that our models predict the same MOAs irrespective of whether they were in the training data, indicating that we did not overfit to the distribution of certain MOAs.

      Reviewer 1 also suggested that we repeat the hold-5-out experiment on A∩Bs that were barely predicted. Once we have done so, we will be able to explicitly demonstrate the extent of overfitting.

      3.6. Introduced acronyms when they first appear in the manuscript

      Reviewer 3: The Kullback-Leibler divergence is properly introduced in the methods part, but not at all in the introduction (it directly appears as "the KL divergence"). To enhance readability, it would be better to fully spell it before using the acronym, and maybe give a one-sentence intuition of what it is about before pointing out to the methods part for more details.

      We thank the reviewer for bringing this to our attention. We have carefully reviewed the entire manuscript and now spell out each acronym where it first appears.

      3.7. Fixed minor text changes

      Reviewer 3: In Figure 1, I would recommend changing "compression algorithms" to "dimension reduction algorithm" or "embedding algorithm". In a compression setting, I would expect the focus to be on the number of bits of information each method requires (or the dimension of the resulting embedding) to encode the data while guaranteeing a certain quality threshold. This is obviously not the case here as the dimension of the embedding is fixed and the focus is on exploring how the embedding is constructed (eg how much it decorrelates the different features, etc) - which may be misleading.

      Reviewer 3: I recommend using "A n B" or "A & B" or "(A, B)" to denote the combination of two independent modes of action A and B. The current notation "A | B" overloads the statistical "A given B" which appears in the VAE loss and is therefore misleading.

      We agree with the reviewer, and aim to minimize all sources of potential confusion. We have made the change in the figure.

      We also agree that our current notation can be confusing. We have updated all instances of “A|B” with “A ∩ B”.

      3.8. Added hypothesis of MMD-VAE oscillations to supplementary figure legend

      Reviewer 3: Do the authors have a hypothesis of what may be causing MMD-VAE to oscillate during validation when data are shuffled? This seems to be the case on two of the three considered datasets (Figure 2 and SuppFigure 1) and is not observed for the other models. Including a few sentences on that in the text would be interesting.

      We believe a major reason is that the optimal MMD-VAE had a much higher regularization term, which puts a greater emphasis on forming normal latent distributions, than the optimal β-VAE or vanilla VAE. Forcing the VAE to encode a shuffled distribution into a normally distributed latent distribution would be difficult to do consistently across different randomly shuffled data subsets, and therefore might cause oscillations in the training curve across epochs when the penalty for that term is high. As these observations may interest some readers, we have incorporated this explanation into the supplementary figure legend (which is where this figure is shown):

      “Forcing the VAE to consistently encode a shuffled distribution into a normally distributed latent distribution would be difficult, and therefore might cause oscillations in the training curve across epochs.”

      3.9. Explained our selection of VAE variants

      Reviewer 3: The different types of considered VAE and their differences are very clearly introduced. It may however be good to motivate a bit more the focus on beta-VAE and MMD-VAE among all the possible VAE models. This is partly done through examples in the second paragraph of page 2, but could be elaborated further.

      We thank the reviewer for their encouraging remarks. We have edited the manuscript’s introduction to explain why we chose these two variants out of all the possible choices:

      “We trained vanilla-VAEs, β-VAEs, and MMD-VAEs only, and not other VAE variants and other generative model architectures, such as generative adversarial networks (GANs), because these VAE variants are known to facilitate latent space interpretability.”

      4. Description of analyses that authors prefer not to carry out

      4.1. We will not explore additional latent space dimensions in more detail, as this is out of scope

      Reviewer 1: As both reconstructed and simulated data did not span the full original data distribution, it might be better to look at reconstruction error and increase the dimension of latent space.

      We thank the reviewer for bringing up this important point. Our VAE loss function consists of the sum of reconstruction error and some form of KL divergence. Specifically, this reviewer is suggesting that if we only minimize reconstruction error (or focus more on reconstruction over KLD by lowering beta), a higher latent dimension would result in better overall reconstruction. This is true, but doing so would have negative consequences. While we would perhaps get the UMAPs to show the full data distribution, the UMAPs are not our focus; predicting polypharmacology through LSA is. We found that when we place a higher focus on the reconstruction term, we get more feature entanglement, as indicated by lower performance when simulating data and overlapping feature contribution per latent feature. Because simulating data should logically require less disentanglement than performing LSA, LSA requires even higher regularization (and hence a lower focus on reconstruction) than the level that was optimal for simulating data.

      Essentially, while the reviewer's suggestions would improve reconstruction and the UMAPs, following them would likely worsen LSA performance, which is the main focus of the project. Also, increasing the latent dimension without changing beta would likely cause little to no change: because beta encourages disentanglement, the newly added dimensions would have little variation and encode little information that wasn’t already encoded.
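
      For readers less familiar with the trade-off described above, the objective can be written out in standard β-VAE notation. This is a sketch under stated assumptions rather than the manuscript's own formula: the symbols (encoder q_φ, decoder p_θ, prior p(z), weight β) are the conventional ones and are not taken from the rebuttal.

```latex
\mathcal{L}_{\beta\text{-VAE}}(x)
  = \underbrace{-\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction error}}
  + \beta \, \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{\text{regularization}},
  \qquad p(z) = \mathcal{N}(0, I)
```

      Setting β = 1 recovers the vanilla VAE. Lowering β shifts weight toward reconstruction, and raising it shifts weight toward a latent code that matches the prior, which is the trade-off discussed above. In MMD-VAE-style models the regularizer is typically a maximum mean discrepancy between the aggregate latent distribution and the prior instead of the per-sample KL term.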

      We have also previously explored the concept of toggling the latent dimensions in a separate project (Way et al, 2020). We are very interested in this area of research in general, and any additional analyses (beyond hyperparameter optimization) deserve a much deeper dive than we can provide in this paper.

      Lastly, we intend to include a deeper description and analysis of reconstruction loss across models, datasets, and MOAs, as was suggested by a previous reviewer comment (see section 2.1 above).

      4.2. We will not review Gaussian distribution assumptions of the VAE as we feel it is not informative

      Reviewer 1: By looking at SFigure 6, I am wondering whether latent distribution actually met gaussian distribution (assumption of VAE). It may show skew distribution as some of latent features shows low contribution.

      This reviewer’s comment is interesting, but we do not believe it would change the findings of our study. Suppose we find that the latent dimensions aren’t normally distributed. This wouldn’t change much; a Gaussian latent distribution is not critical for performing LSA. We need the latent code to be disentangled, but having normally distributed latent features doesn't necessarily mean that we have good disentanglement (see https://towardsdatascience.com/what-a-disentangled-net-we-weave-representation-learning-in-vaes-pt-1-9e5dbc205bd1).

      4.3. In this paper, we will not train or compare conditional VAEs nor cycle GANs

      Reviewer 2: While the authors provided a comparison between vanilla VAE, MMD-VAE, and β-VAE, there are other methods capable of doing similar tasks (data simulation, counterfactual predictions). I would like to see a comparison with those methods, such as conditional VAE (https://papers.nips.cc/paper/2015/hash/8d55a249e6baa5c06772297520da2051-Abstract.html; CVAE + MMD: https://academic.oup.com/bioinformatics/article/36/Supplement_2/i610/6055927?login=true) or cycle GANs (https://arxiv.org/abs/1703.10593).

      While such comparisons would be interesting, they are not the main focus of the manuscript, which is to benchmark the use of VAEs in cell morphology readouts and to predict polypharmacology.

      We think that CVAE would not be appropriate for our study. In a CVAE, the encoder and decoder are both conditioned on some variable. In our situation, where we are predicting the cell states of different MOAs, it would make most sense to condition on the MOA. However, because we’re using the MOA labels in our LSA experiment, conditioning on them is likely to bias our results and not be effective for MOAs outside the conditioning.

      For cycle GANs, a separate study in our lab has found that training on these data is extremely difficult. Our lab has not published this yet, but once we better understand cycleGAN behavior in these data, it will require a separate paper in which we compare performance and dissect model properties in much greater detail.

      Nevertheless, we have added citations to multi-modal approaches like cycle GANs (see section 4.4) as they will point a reader to useful resources for future directions.

      4.4. We will not be comparing with multi-modal integration, but we clarified our focus on Cell Painting VAE novelty and added multi-modal citations

      Reviewer 1: Researchers found that the optimal VAE architectures were very different between morphology and gene expression, suggesting that the lessons learned training gene expression VAEs might not necessarily translate to morphology. It would be interesting to compare the result with multimodal integration as baseline (i.e., Seurat).

      Our focus in this paper was to train and benchmark different variational autoencoder (VAE) architectures using Cell Painting data and to demonstrate an important, unsolved application in predicting polypharmacology that we show is now possible for a subset of compounds. It was a natural and useful extension to compare Cell Painting VAE performance with L1000 VAE performance, especially since our data set contained equivalent drug perturbations. We feel that any extension including multi-modal data integration will distract focus away from the Cell Painting VAE novelty, and requires a much deeper dive beyond the scope of our current manuscript.

      Additionally, there have been other, more in-depth and very recent multi-modal data integration efforts using the same or similar datasets (Caicedo et al, 2021; Haghighi et al, 2021). In a separate paper that we just recently submitted, we also dive much deeper to answer the question of how the two modalities complement one another in various ways and for various tasks (Way et al, 2021b). These two papers already provide a deeper and more informative exploration of Cell Painting and L1000 data integration.

      Therefore, because multi-modal data integration, while certainly interesting, will distract from the Cell Painting VAE novelty and is redundant with other recent publications, we feel it is beyond the scope of this current paper.

      Nevertheless, multi-modal data integration is important to mention, so we add it to the discussion. Specifically, we discuss how multi-modal data integration might help with predicting polypharmacology in the future and include pertinent citations so that we, or another reader, might be able to follow-up in the future. The new section reads:

      “Because we had access to the same perturbations with L1000 readouts, we were able to compare cell morphology and gene expression results. We found that both models capture complementary information when predicting polypharmacology, which is a similar observation to recent work comparing the different technologies’ information content (Way et al, 2021). We did not explore multi-modal data integration in this project; this has been explored in more detail in other recent publications (Caicedo et al, 2021; Haghighi et al, 2021). However, using multi-modal data integration with models like CycleGAN or other style transfer algorithms might provide more confidence in our ability to predict polypharmacology in the future (Zhu et al, 2017).”

      5. References

      Caicedo JC, Cooper S, Heigwer F, Warchal S, Qiu P, Molnar C, Vasilevich AS, Barry JD, Bansal HS, Kraus O, et al (2017) Data-analysis strategies for image-based cell profiling. Nat Methods 14: 849–863

      Caicedo JC, Moshkov N, Becker T, Yang K, Horvath P, Dancik V, Wagner BK, Clemons PA, Singh S & Carpenter AE (2021) Predicting compound activity from phenotypic profiles and chemical structures. bioRxiv: 2020.12.15.422887

      Corsello SM, Bittker JA, Liu Z, Gould J, McCarren P, Hirschman JE, Johnston SE, Vrcic A, Wong B, Khan M, et al (2017) The Drug Repurposing Hub: a next-generation drug library and information resource. Nat Med 23: 405–408

      Haghighi M, Singh S, Caicedo J & Carpenter A (2021) High-Dimensional Gene Expression and Morphology Profiles of Cells across 28,000 Genetic and Chemical Perturbations. bioRxiv: 2021.09.08.459417

      McQuin C, Goodman A, Chernyshev V, Kamentsky L, Cimini BA, Karhohs KW, Doan M, Ding L, Rafelski SM, Thirstrup D, et al (2018) CellProfiler 3.0: Next-generation image processing for biology. PLoS Biol 16: e2005970

      Wang Y, Huang H, Rudin C & Shaposhnik Y (2021) Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization. J Mach Learn Res 22: 1–73

      Way GP, Kost-Alimova M, Shibue T, Harrington WF, Gill S, Piccioni F, Becker T, Shafqat-Abbasi H, Hahn WC, Carpenter AE, et al (2021a) Predicting cell health phenotypes using image-based morphology profiling. Mol Biol Cell 32: 995–1005

      Way GP, Natoli T, Adeboye A, Litichevskiy L, Yang A, Lu X, Caicedo JC, Cimini BA, Karhohs K, Logan DJ, et al (2021b) Morphology and gene expression profiling provide complementary information for mapping cell state. bioRxiv: 2021.10.21.465335

      Way GP, Zietz M, Rubinetti V, Himmelstein DS & Greene CS (2020) Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol 21: 109

      Zhu J-Y, Park T, Isola P & Efros AA (2017) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv [csCV]

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this paper, the authors explore the use of VAE to learn low-dimensional representations of morphological features of cells. They demonstrate that the representations learned by the different VAE models considered accurately model the distribution of features in real data and can be complemented by other biological readouts such as gene expression. Additionally, the structure of the learned feature space appears to be sufficient to generate accurate predictions relying on latent space arithmetic - for instance allowing to predict the morphology of samples subjected to two perturbations knowing the morphology of samples affected by either of these perturbations in isolation.

      Comments:

      The manuscript is beautifully written in a crystal clear manner. The authors have made a visible effort towards making their work understandable. The methods section is clear and comprehensive. All experiments are rigorously conducted and the validation procedures are sound. The conclusions of the paper are convincing and most of them are well supported by the data. Both the data and the code required to reproduce this work are freely available.

      Overall, the article is of high quality and relevance to several scientific communities. I only have a couple of minor comments that I think could help improve it further:

      • The Kullback-Leibler divergence is properly introduced in the methods part, but not at all in the introduction (it directly appears as "the KL divergence"). To enhance readability, it would be better to fully spell it before using the acronym, and maybe give a one-sentence intuition of what it is about before pointing out to the methods part for more details.
      • While what "morphological readouts" concretely mean becomes clearer later on in the paper, it would be useful to give a couple of examples early on when introducing the considered datasets.
      • The different types of considered VAE and their differences are very clearly introduced. It may however be good to motivate a bit more the focus on beta-VAE and MMD-VAE among all the possible VAE models. This is partly done through examples in the second paragraph of page 2, but could be elaborated further.
      • In Figure 1, I would recommend changing "compression algorithms" to "dimension reduction algorithm" or "embedding algorithm". In a compression setting, I would expect the focus to be on the number of bits of information each method requires (or the dimension of the resulting embedding) to encode the data while guaranteeing a certain quality threshold. This is obviously not the case here as the dimension of the embedding is fixed and the focus is on exploring how the embedding is constructed (eg how much it decorrelates the different features, etc) - which may be misleading.
      • Do the authors have a hypothesis of what may be causing MMD-VAE to oscillate during validation when data are shuffled? This seems to be the case on two of the three considered datasets (Figure 2 and SuppFigure 1) and is not observed for the other models. Including a few sentences on that in the text would be interesting.
      • I recommend using "A n B" or "A & B" or "(A, B)" to denote the combination of two independent modes of action A and B. The current notation "A | B" overloads the statistical "A given B" which appears in the VAE loss and is therefore misleading.
      • I would suggest removing Figure 5 (or moving it to the supplementary) as it revisits the content of Figure 1 and does not bring much extra information.
      • Figure 6 is missing error bars (standard deviation of the L2 distance) and, as such, is hard to draw conclusions from.

      Significance

      Nature and significance:

      This work does not hold new conceptual or technical contributions per se as it focuses on showcasing the use of existing techniques established in other fields (eg in the context of natural image processing for latent space arithmetics) to biological data analysis. That said, popularizing successful methodologies beyond the scientific community where they have been developed, as done in this work, is immensely valuable. As such, the approach presented in the paper is likely to inspire and enable many other studies and is therefore a significant contribution (especially so thanks to the code availability!)

      Comparison to existing published knowledge:

      While a bunch of published works use VAEs on biological data, I am not aware of existing ones that study the relative merit of the representations obtained with different VAE models as done here and explore their use in a generative setting with latent space arithmetics. As such, this work is novel and distinguishes itself from existing published knowledge.

      Audience:

      This work is likely to be of interest to life scientists with an enthusiasm for state-of-the-art data analysis techniques. Because the paper is clearly written and makes very few assumptions of prior expert knowledge, it is also likely to be a good entry point to the wider VAE/generative models literature for non-experts. I also believe that this manuscript can be of interest to computer scientists and machine learning researchers as it presents a concrete example of the use of published methods in the context of biological data analysis.

      My expertise:

      Computer vision and machine learning. I do not feel qualified to assess the clinical relevance of this work.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Ler et al. propose a series of VAE-based methods to predict compound polypharmacology for Cell Painting data. They first learn a latent space and try to answer the following counterfactual:

      How would the cell morphology or gene expression of a cell perturbed with Drug A change if it were perturbed with Drug A and B (A+B), given that we have the measurements for drug A and drug B? They address the problem by doing latent space arithmetic (LSA) and decoding the predicted morphology measurements. They first train different VAE models to compare the training stability and simulation performance by sampling from the latent space. Further analysis of the learned latent space is used to deconvolve the latent space into feature space. I like the application of LSA+VAE on Cell Painting datasets, which is the main novelty of the paper. However, I have some major comments and concerns:

      Major comments:

      While the authors provided a comparison between vanilla VAE, MMD-VAE, and β-VAE, there are other methods capable of doing similar tasks (data simulation, counterfactual predictions). I would like to see a comparison with those methods, such as conditional VAE (https://papers.nips.cc/paper/2015/hash/8d55a249e6baa5c06772297520da2051-Abstract.html; CVAE + MMD: https://academic.oup.com/bioinformatics/article/36/Supplement_2/i610/6055927?login=true) or cycle GANs (https://arxiv.org/abs/1703.10593). The authors compare generation performance based on UMAP. In the UMAP space, data tend to cluster together even though they might be far from each other in the feature space. I would like to see more quantitative metrics on how well these methods capture morphology distributions. You can compute metrics like MMD distance, Kullback-Leibler (KL) divergence, earth mover's distance, or a simple classifier trained on actual MoA classes tested on generated data.

      The predictions are evaluated using L2 distances, which I find not that informative. I would like to see other metrics (correlation, L1, or the distribution distances in the previous comment). I would also like to see a feature-level analysis: which features are well predicted and which ones are more challenging to predict?
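
      To make one of the suggested distribution metrics concrete, here is a minimal sketch of a squared maximum mean discrepancy (MMD) estimate between real and generated profiles. It is an illustration only, not code from the paper: the Gaussian kernel, the fixed `bandwidth`, the biased (V-statistic) estimator, and the toy two-feature profiles are all assumptions.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(real, generated, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(real, real)
            + mean_kernel(generated, generated)
            - 2.0 * mean_kernel(real, generated))


# Toy profiles with two made-up features each.
real = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
close = [[0.05, 0.0], [0.0, 0.05], [-0.05, -0.05]]
far = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]

mmd_same = mmd_squared(real, real)    # identical samples
mmd_close = mmd_squared(real, close)  # generated data near the real data
mmd_far = mmd_squared(real, far)      # generated data far from the real data
```

      Unlike a UMAP overlay, this score is computed in the original feature space, so samples that merely look close after embedding do not inflate it. In practice the bandwidth would need tuning to the feature scale.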
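
      The alternative metrics mentioned here are simple to compute alongside L2. The sketch below is illustrative only: the feature names and values are made up, and `per_feature_abs_error` is a hypothetical helper that ranks features from best to worst predicted.

```python
import math


def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_correlation(u, v):
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def per_feature_abs_error(predicted, observed, feature_names):
    """Return (feature, absolute error) pairs, best-predicted feature first."""
    errors = {name: abs(p - o)
              for name, p, o in zip(feature_names, predicted, observed)}
    return sorted(errors.items(), key=lambda item: item[1])


# Hypothetical feature names and values, for illustration only.
feature_names = ["area", "texture", "intensity"]
predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.5, 5.0]

l1 = l1_distance(predicted, observed)
l2 = l2_distance(predicted, observed)
r = pearson_correlation(predicted, observed)
ranked = per_feature_abs_error(predicted, observed, feature_names)
```

      Reporting correlation next to L1 or L2 separates errors of scale from errors of pattern, and the per-feature ranking addresses the question of which features are hardest to predict.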

      • Could authors outline detailed data splits? Which MoA are in train and which are held out from training? As I understood, there were samples from MoAs that were supposed to be predicted in the calculation of LSA? Generally, the predicted MoA should not be seen during training and not in LSA calculation.

      Minor comments:

      I liked the paper's GitHub repo; the authors did a good job making everything available and reproducible. As a suggestion, you can move the learning curves into the supplementary figures because they might not be the most exciting piece of information for the non-technical reader.

      Significance

      The main novelty of the work is applying VAEs on cell painting data to predict drug perturbations. The final use case could be guiding experimental design by predicting unseen data. However, the authors do not show such an example and use case which is understandable due to the need for doing further experiments to validate computational results and maybe not the main focus of this paper.

      • The authors did a good job of citing existing methods and relevant
      • The potential audience could be the computational biology and applied machine learning community.
      • My expertise is in computational biology and machine learning.
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Researchers used two primary data modalities (L1000 sequencing data, and Cell Painting morphology features) for cell data perturbed by a series of compounds, each with labeled (individual and combination) mechanisms of action. Using several VAEs and ML methods, they evaluated their ability to encode interpretable latent spaces (evaluated by shifting latent features by +/- 3 standard deviations and checking the contribution of features to the latent space) and adequately reconstruct the input data. Using the constructed latent spaces and labeled MOAs, researchers performed latent space arithmetic, to remove base DMSO features and add features of individual MOAs to produce the features of combination MOAs (evaluated by the significance of difference to shuffled data). Researchers found that MMD-VAE encoded the most information and that VAEs successfully simulated morphology and gene expression features. They found that the optimal VAE architectures were very different between morphology and gene expression. Researchers found that VAEs were able to use individual MOA profiles to simulate some combination MOA profiles with varied success.
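
      The latent space arithmetic summarized above can be sketched in a few lines. This is one plausible reading of the description (add the two single-MOA latent centroids and remove the DMSO centroid once), not the authors' implementation; the function names and the two-dimensional toy latent vectors are hypothetical, and the decoding step through the trained VAE is omitted.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def predict_combination_latent(z_a, z_b, z_dmso):
    """Latent space arithmetic: add both single-MOA centroids and subtract
    the DMSO (control) centroid once, so the baseline is not counted twice."""
    return [a + b - d for a, b, d in zip(z_a, z_b, z_dmso)]


# Toy latent codes (made up) for control, MOA A and MOA B samples.
dmso_latents = [[0.0, 1.0], [1.0, 0.0]]
moa_a_latents = [[1.0, 0.5], [2.0, 0.5]]
moa_b_latents = [[0.5, 2.0], [0.5, 3.0]]

z_dmso = mean_vector(dmso_latents)   # [0.5, 0.5]
z_a = mean_vector(moa_a_latents)     # [1.5, 0.5]
z_b = mean_vector(moa_b_latents)     # [0.5, 2.5]

# Predicted latent code for the A and B combination; in a full pipeline this
# vector would be passed through the trained decoder to obtain features.
z_ab = predict_combination_latent(z_a, z_b, z_dmso)
```

      The predicted features can then be compared against measured combination profiles and against predictions made from shuffled data, which is the evaluation the summary describes.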

      Comments:

      • Researchers found that the optimal VAE architectures were very different between morphology and gene expression, suggesting that the lessons learned training gene expression VAEs might not necessarily translate to morphology. It would be interesting to compare the result with multimodal integration as baseline (i.e., Seurat).

      -Instead of using UMAP embedding, it would be better to compare reconstruction error or show a reconstructed image with the original image to claim that models reliably approximate the underlying morphology data. As both reconstructed and simulated data did not span the full original data distribution, it might be better to look at reconstruction error and increase the dimension of latent space.

      -Fig 2 is not informative so it can go to supplementary.

      -In Fig 4, it would be useful to show a few sample representative images with respect to CellProfiler feature groups.

      -By looking at SFigure 6, I am wondering whether latent distribution actually met gaussian distribution (assumption of VAE). It may show skew distribution as some of latent features shows low contribution.

      -Figure 6: what does original input space mean? Does it mean the raw pixel image? As the researchers already extracted CellProfiler feature groups, it would be interesting to compare the mean L2 distance based on CellProfiler features as a baseline, to see whether the VAE improves performance (compared to handcrafted features).

      -It is difficult to know what the clear threshold for successful performance is in figures like Figure 7 and SFigure 9, but by and large, it appears that the majority of predicted combination MOAs were not successful. Without either A) the ability to adequately predict most combinations from individual profiles that were used in training or B) an explanation, prior to analysis, of which combinations can be predicted, it is difficult to see this method being used, since the combinatorial predictions are more likely than not to be uninformative.

      -Authors also make the claim that they can infer toxicity and simulate the mechanism of how two compounds might react. This is a claim that would not be supported even if the method were able to successfully predict morphology or gene profiles. Drug interaction and toxicity are quite complex and go beyond just morphology and expression. VAEs predicting a small set of features would not be able to capture information beyond the readouts, especially when dealing with potentially unseen compounds for which toxicity is not yet known. For example, two compounds might produce a morphology that appears similar to other safe compounds but has other factors that contribute to toxicity. Further, here they show no evidence of toxicity or interaction analysis.

      -The researchers justify the poor performance compared to shuffled data by saying that A) MOA annotations are noisy and unreliable and B) the MOAs may only manifest in other modalities, as was seen in the L1000 vs morphology predictability. While these might be true, knowing this, the researchers should make an effort to clean and de-noise their data and select MOAs that are well known and reliable, as well as MOAs for which we have a known morphological or genetic reaction.

      -With the small number of combinations that are successfully predicted, to build confidence in the performance, it would be necessary to explain the reason for the differences in performance. Further experimentations should be done looking into any relationship between the type of MOAs (and their features) and the resulting A|B predictability. Looking at Figure 7, the top-performing combinations are comprised entirely of inhibitor MOAs. If the noisiness of the data is a factor, there should be some measurable correlation between feature noisiness and variation and the resulting A|B predictability from LSA.

      -To show that the methodology works well on unseen data, researchers withheld the top 5 performing A|B MOAs (SFig 9) and showed they were still well predicted. This is not the most compelling demonstration since the data to be held out was selected with bias as the top-performing samples. It would be much more interesting to withhold an MOA that was near or only somewhat above the margin of acceptability and see how many holdouts affected the predictability of those more susceptible data points. From my best interpretation, the hold-out experiment also only held out the combination MOA groups from training. It would be better if single MOAs (for example A) which were a part of a combination of MOA (A|B) were also held out to see if predictability suffered as a result and if generalizability did extend to cells with unseen MOAs (not just cells which had already highly performing combinations of seen MOAs).

      -Rather than just stating that the VAEs did not span the original data distribution and saying that the beta-VAE performed best by eye, the researchers could compute some simple metrics of the overlap in the data for a more direct and quantified comparison. Researchers should also explain what part of the data is not being captured here. Some analysis of what the uncaptured region of the original UMAP represents is important in understanding the limitations of the VAEs' capacity.

      -My suggestions are realistic and feasible. The recommended tests and validations would cost no additional money (outside of researcher labor and re-training on the existing GPUs), as my recommendations are simply further analysis and training on the same data. The time needed would depend on the time required to train the VAE models, but since 2-layer VAEs are relatively small by deep learning standards, the time to train and analyze through existing pipelines should be minimal. This is confirmed by their GitHub code, where Jupyter notebooks show that models can be trained in a few minutes.

      -With access to the dataset, the posted GitHub, and documentation in the paper, I believe that the experiments are reproducible.

      -The experiments are adequately replicated statistically for conventions of deep learning.

      Significance

      My expertise is in developing and applying deep learning and VAEs to single-cell imaging and expression data. There is no part of this paper that I do not have sufficient expertise to evaluate.

      This paper makes a conceptually and technically unique proposal in terms of application, taking the existing technologies of VAEs and LSA and, as far as I know, using them in a novel area of application (predicting and simulating combination MOAs for compound treatments). If this work is shown to work more broadly and effectively, is seen through to its completion, and is eventually successfully implemented, it will help to evaluate the effects of drugs used in combination on gene expression and cell morphology. An audience in the realm of biological deep learning applications, as well as an audience working in compound and drug testing, would be interested in the results of this paper. The authors successfully place their work within the context of existing literature, referencing the numerous VAE applications that they build on and fit in with (Lafarge et al, 2018; Ternes et al, 2021, etc...), citing the applications of LSA in the computer vision community (Radford et al, 2015, Goldsborough et al, 2017), and discussing the biological context that they are working in (Chandrasekaran et al, 2021).

  3. Oct 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** The manuscript submitted by Djekidel et al entitled: "CovidExpress: an interactive portal for intuitive investigation on SARS-CoV-2 related transcriptomes" reports on a new web portal to search and analyze RNAseq data related to SARS-CoV-2 infections. The authors downloaded and reprocessed data of more than 40 different studies, which is available on the web portal along with all available meta data. The web portal allows to perform numerous differential expression and gene set enrichment analyses on the data and provides publication ready figures. Because of batch effects that could not be removed, the authors do not recommend to analyze data across studies at this point. The authors conclude that the web portal is unique and will allow scientists to rapidly analyze gene expression signatures related to SARS-CoV-2 infections with the potential to make new discoveries. **Major comments:** Based on the scientific literature, the web portal seems to be an unprecedented resource to search and analyze SARS-CoV-2-related RNAseq data and as such would certainly be a useful resource for the SARS-CoV-2 scientific community. The authors argue that new discoveries are possible by using their web portal in providing use cases. However, the section detailing the analyses the authors did to generate new hypotheses about genes potentially relevant in SARS-CoV-2 infections are very difficult to follow and without more guidance very difficult to reproduce with the web portal. It would require substantial expert knowledge in RNAseq data analysis without more information being provided. 
It also seems that key candidate genes identified by their analyses have all been studied or identified to be related to SARS-CoV-2 infections, so it is somewhat unclear whether new hypotheses can be generated by the reanalysis of RNAseq datasets, especially because combining the data from different studies is currently not recommended by the authors. The manuscript would benefit from providing fewer use cases but for each of them providing more information on how the portal and which studies were used to generate them and which findings were not described in the publication of the used studies. Some observations in the manuscript are not substantiated with significance calculations (see below). At times, the English writing (grammar) should be improved.

      We thank the reviewer for the positive comments. We suspect the reviewer concluded that substantial expert knowledge in RNAseq data analysis is needed because video tutorials were lacking. We have now posted several video tutorials, and more will be added later in response to users’ feedback. We believe this will help ease the reviewer’s concern.

      Regarding whether new hypotheses can be generated: we apologize if this was not clear. For all the case studies and for “CovidExpress Reveals Insights and Potential Discoveries”, our portal has provided information not reported in the original publications, as listed below:

      1. Case study #1: The original publication employed a multiomics approach to find genes that predict ICU versus non-ICU status. However, it is not obvious which genes were selected mainly because of their expression levels and which because of other data that were included (e.g., mass spectrometry data). Our portal allows users to quickly check expression levels and find that SESN2 does not show strong expression differences.
      2. Case study #2: We replaced this case study with bacterial-susceptibility genes to show that such questions can be quickly asked and answered using our portal. Such an investigation has not been reported before.
      3. FURIN’s function has been well linked to SARS-CoV-2. However, all the reports we could find focused on the furin cleavage sites of SARS-CoV-2 or on whether FURIN is expressed in SARS-CoV-2-sensitive tissues. That SARS-CoV-2 infection can up-regulate FURIN expression has never been reported before. The study that published the data did not mention FURIN at all. We made this discovery simply by using the CovidExpress portal to find differentially expressed genes and overlapping them with the literature-based gene list (Supplementary Table S2); we believe users can make more discoveries by selecting different data.
      4. Searching for OASL AND "SARS-CoV-2" on PubMed returns only 5 results, indicating that OASL is under-studied. None of these results indicates that OASL can be up-regulated in both SARS-CoV-2-infected lung and Rhinovirus-infected nasal tissue in humans.

      We are not sure whether we have correctly understood the reviewer’s suggestion of “fewer use cases.” We therefore have not removed any use cases; instead, we have provided more details to help users understand which discoveries not reported by the original studies we made using CovidExpress, and how we made them.

      Finally, the manuscript has undergone substantial scientific editing to improve the grammar.

      **Minor comments:**

      Page 6 last sentence: The statement of this sentence is very much what one would expect. It remains unclear whether the authors mean this as a result to validate the processing of the RNAseq data or as a new discovery. Please, clarify.

      We apologize for the confusion. We intended this statement to be a result confirming what we had expected. We have now amended the text to make this point clearer.

      Figure 3A: The violin plots are so tiny that it is impossible to see any trends. It is also difficult to understand which categories one should compare with each other. If there is anything significant to observe, please, add a statistical test and better guide the reader.

      We agree with the reviewer; therefore, we have removed this figure from the paper. The goal of this figure was to demonstrate how to use violin plots for exploratory analysis; however, in this case, the violin plot did not show a clear trend. By using more filtering and other plots (e.g., Figure 3B-C), we believe we now provide better insight.

      Figure 3C: A legend for the color scale is missing. The signal (I guess expression amounts) for SESN2 seems very weak and the same between ICU and non-ICU samples. What is the significance for assigning this gene to the group of genes being upregulated in ICU samples? Also contrary to what the authors state on page 8, SESN2 does not seem to be highly expressed in ICU samples, however, without knowing what the colors represent (fold changes or absolute expression values?) this is somewhat speculative.

      We thank the reviewer for bringing this to our attention. We have now added a legend for the color scale in the revised figure. In Figures 3A-C, we are showcasing how an exploratory analysis can be performed using CovidExpress. As an example, we investigated the expression of the top 20 genes identified by the random forest classifier of Overmyer et al., 2021, as predictors of ICU and non-ICU cases. In the original Overmyer et al. paper, only the general performance metrics of the models are presented (Fig. 6c-g), but the authors do not show the expression patterns of the top predictors. Hence, we demonstrate how CovidExpress can be used to further investigate some questions not explored in the original paper. SESN2 was listed as a top predictor; however, its expression did not vary between ICU and non-ICU samples, as was also observed by the reviewer. We suspect SESN2 was a top predictor due to other data the Overmyer et al. paper included, such as mass spectrometry data. Our statement about SESN2 was not accurately reflected in the figure; therefore, we have rewritten this section to make it clearer.

      Page 9 first sentence: Please, specify what you mean by "starting list". Furthermore, in this paragraph, how do your results compare to the results from the study that you re-analyze here?

      We thank the reviewer for the question. By “starting list,” we meant the top genes from the Overmyer et al., 2021, article as predictors of ICU and non-ICU cases. We have now rewritten this section to make it clearer. We did not expect our results to differ from their data. Our goal was to ask which of their top predictors (by multi-omics data) show a difference in gene expression. When we downloaded their TPM values from their GEO records, the values were very similar overall (see below).

      Figure 3F: Please add labels to your axes and is there a particular reason why in a correlation plot like this one, the y and x axis are not shown with the same range and why does the y axis not start at 0?

      We thank the reviewer for this helpful comment. Our reasoning for presenting the figure in this way is that different genes can have very different expression levels but still be correlated. For example, if gene A expressed 1, 5, and 10 in samples 1,2, and 3, while gene B expressed 100, 500, and 1000 for samples 1, 2, and 3, then their range would be very different but still perfectly correlated (see panel A below). If we draw the x- and y-axes using the same range, this correlation will not be visually obvious (see panel B below).

      This comparison is different from the correlation plots that compare the expression of one gene in different samples. We apologize for the confusion and to avoid misleading readers, we have enlarged the gene names in the Figure labels to ensure that readers notice their differences. We have also added an option to the correlation plot on our portal so that users can choose the optimal format (see below).
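
      The arithmetic behind the gene A / gene B example can be checked directly. The following is a minimal sketch, not the portal's code: `pearson_r` is a hypothetical helper written here for illustration, and the values are the toy numbers from the response above. It shows that the Pearson correlation is unchanged when one gene's expression is rescaled, which is why two genes with very different ranges can still be perfectly correlated.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations, up to a common factor of 1/n.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy values from the response: gene B is gene A multiplied by 100.
gene_a = [1, 5, 10]        # expression of gene A in samples 1, 2, 3
gene_b = [100, 500, 1000]  # expression of gene B in the same samples

r = pearson_r(gene_a, gene_b)
print(f"Pearson r between gene A and gene B: {r:.6f}")
```

      Because the coefficient depends only on each gene's deviations from its own mean, relative to its own spread, the choice of axis range changes how the scatter plot looks but not the correlation itself.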

      Page 9 second last sentence: It remains unclear which kind of analysis the authors intend to do here and what the starting question is. Please, try to rewrite with less technical terms (i.e. what do you mean by "precalculated contrasts"). In line with this, it remains unclear what Figure 3I is supposed to show. Please, provide some more information to readers who are not RNAseq analysis experts.

      We thank the reviewer for this suggestion. To avoid any misleading claims, we followed Reviewer #2’s suggestion and replaced the coagulation gene list with a filtered gene list from the “Coronavirus disease - COVID-19” KEGG pathway (hsa05171) to showcase how to identify experiments in which this gene signature is enriched or depleted. We also replaced the related figures and text with new results and rewrote this section to avoid using technical terms.

      Figure 3J is somewhat confusing. Why is the mean expression range indicated from 0 to 1 and why are all genes apparently having a mean expression of 1?

      We thank the reviewer for this question. Because the levels of expression of different genes can vary greatly, in Figure 3J (new Figure 3A and 3I), we normalized the mean expression levels of the genes to their maximum values across groups to improve the visualization. We have now made this clearer in the figure, legend, and text.
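
      The normalization described above can be sketched as follows. This is an illustration under stated assumptions rather than the portal's implementation: `normalize_to_max`, the gene names, and the numbers are all hypothetical. Each gene's per-group mean is divided by that gene's largest group mean, so every gene reaches exactly 1 in the group where it is highest.

```python
def normalize_to_max(mean_expr):
    """Scale each gene's per-group means so that its largest group mean is 1.

    mean_expr maps gene -> {group: mean expression}.
    """
    scaled = {}
    for gene, by_group in mean_expr.items():
        peak = max(by_group.values())
        scaled[gene] = {
            # Guard against genes that are not expressed in any group.
            group: (value / peak if peak > 0 else 0.0)
            for group, value in by_group.items()
        }
    return scaled


# Hypothetical toy means; these are not values from any study.
mean_expr = {
    "GENE_A": {"ICU": 200.0, "non-ICU": 50.0},
    "GENE_B": {"ICU": 2.0, "non-ICU": 4.0},
}

scaled = normalize_to_max(mean_expr)
for gene, by_group in scaled.items():
    print(gene, by_group)
```

      With this scaling, a highly expressed gene and a weakly expressed gene share one color scale from 0 to 1, which also explains why every gene shows a value of 1 in at least one group.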

      Page 10 line 5-6. Are you referring to coagulation markers here or general expression patterns? In case of the latter, how does this statement fit to the paragraph about analyzing expression patterns of coagulation markers? Please, specify. And in line with this, are the highlighted genes in Figure 3K coagulation markers? If not, what is the relevance of these to make the point that one can use the portal to investigate the role of coagulation markers in SARS-CoV-2 infections?

      As mentioned above, to avoid any misleading claims, we followed Reviewer #2’s suggestion and replaced the coagulation gene list with a filtered gene list from the “Coronavirus disease - COVID-19” KEGG pathway (hsa05171). This revision enables us to show how to identify experiments in which this gene signature is enriched or depleted. We have now replaced these figures and text with new results.

      The appearance of describing batch effects and attempts to remove them from the studies was somewhat surprising on page 10 as I would expect this kind of results rather earlier in the results section before describing use cases of the data. You may consider changing the order of your results for a better flow.

      We apologize for the confusion. However, we want to make it clear that the analysis before page 10 did not involve “batch effect”; all analyses were performed within each study. Thus, it is not necessary to change the order in which the results are presented. Also, based on Reviewer #2’s comments, we did not accurately use the term “batch effect,” because “batch effects are purely due to technical differences.” We have now revised the corresponding text to make this point clearer.

      Page 11, second paragraph. Please, explain briefly what the silhouette score is supposed to reflect and thus how Figure S4G should be interpreted. The difference of both bars in Figure S4G is very marginal and thus, does not seem to support the statement of the authors that the ssGSEA scores-based projection is better unless you perform a significance test or I misunderstood. Please, clarify.

      We thank the reviewer for this suggestion. We have now added an explanation of the silhouette score in the manuscript. Briefly, a silhouette score is a metric of the degree of separability of a cluster from its nearest neighboring cluster. For a given sample, let $a$ be the mean intra-cluster distance and $b$ be the mean distance to the nearest cluster. The silhouette score (sil) is calculated as follows:

      $$\mathrm{sil} = \frac{b - a}{\max(a, b)}$$

      The silhouette score ranges between -1 and 1. A value near 1 means that the clusters are well separated, and a value near -1 means that the clusters are intermingled. Using a Wilcoxon rank test, we showed that using ssGSEA scores significantly improves the separability of global GTEx tissues (in Figure S4G; p=8.75e-26).
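
      To make the definition concrete, here is a minimal sketch of the per-sample silhouette score using the standard formula, sil = (b - a) / max(a, b). It is not the authors' code: the function name, the toy 2-D points, and the cluster labels are assumptions made for illustration, and Euclidean distance is assumed because the response does not state which distance was used. The sketch requires at least two clusters.

```python
import math
from collections import defaultdict


def silhouette_scores(points, labels):
    """Per-sample silhouette score, sil = (b - a) / max(a, b).

    a: mean distance from the sample to the other members of its own cluster.
    b: smallest mean distance from the sample to the members of another cluster.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for single-sample clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical toy projection: two well-separated groups of samples.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = ["tissue_1", "tissue_1", "tissue_2", "tissue_2"]

scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
print([round(s, 3) for s in scores], round(mean_score, 3))
```

      In this toy case every sample is much closer to its own group than to the other one, so all scores are close to 1. Comparing two projections then amounts to comparing their per-sample scores, for example with a paired test as described above.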

      Page 11, third paragraph: Figure 4B, to the best of my understanding, does not support the claim that samples clustered less according to study cohorts using the ssGSEA approach. Please, quantify the effect and test for significance or better explain.

      We apologize for the confusion. We quantified the separability between cohorts (GSE ids) by using the silhouette score. In Figure S4H (panel A below), we show that the TPM-based PCA leads to more separation by studies than does the Covid contrast ssGSEA scores in which the separation between studies is less prominent (p-value=0.0045, paired Wilcoxon test).

      For the analyses described starting on page 12 it remains largely unclear whether they were conducted across studies or within studies and which studies were used. This section until the end of the results would especially benefit from providing more information on how the analyses were performed, either in the results or in the methods section.

      We apologize for the confusion. The goal of the analysis on page 12 and the corresponding Figure 4G was to identify genes whose expression increased in both SARS-CoV-2–infected lung and rhinovirus-infected nasal tissue. Hence, we did a log2(fold-change) vs log2(fold-change) comparison. The log2(fold-change) values were calculated independently for each study. Because we compared values by using the same ranking metric, the cross-sample comparison was possible, as shown in Figure 4G. We have now added more details to the Methods section to clarify this point.
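
      The idea of comparing fold changes that are computed separately within each study can be sketched as below. This is only an illustration: the response does not state how the fold changes were estimated, so the simple ratio of group means, the pseudocount of 1, the cutoff of 1 log2 unit, the gene names, and all numbers are assumptions introduced here.

```python
import math


def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 ratio of mean infected to mean control expression within one study."""
    mean_infected = sum(infected) / len(infected)
    mean_control = sum(control) / len(control)
    return math.log2((mean_infected + pseudocount) / (mean_control + pseudocount))


def classify(lfc_lung, lfc_nasal, cutoff=1.0):
    """Label a gene by the study or studies in which it passes the cutoff."""
    up_lung = lfc_lung >= cutoff
    up_nasal = lfc_nasal >= cutoff
    if up_lung and up_nasal:
        return "both"
    if up_lung:
        return "SARS-CoV-2 only"
    if up_nasal:
        return "Rhinovirus only"
    return "neither"


# Hypothetical toy data: gene -> (infected replicates, control replicates).
lung_study = {
    "GENE_A": ([31.0, 31.0], [7.0, 7.0]),
    "GENE_B": ([15.0, 15.0], [3.0, 3.0]),
    "GENE_C": ([7.0, 7.0], [7.0, 7.0]),
}
nasal_study = {
    "GENE_A": ([15.0, 15.0], [3.0, 3.0]),
    "GENE_B": ([3.0, 3.0], [3.0, 3.0]),
    "GENE_C": ([31.0, 31.0], [7.0, 7.0]),
}

calls = {}
for gene in lung_study:
    # Each fold change uses only samples from its own study.
    lfc_lung = log2_fold_change(*lung_study[gene])
    lfc_nasal = log2_fold_change(*nasal_study[gene])
    calls[gene] = classify(lfc_lung, lfc_nasal)
print(calls)
```

      Because every fold change is a within-study contrast, absolute expression differences between the two studies cancel out before the two axes are compared. A real analysis would also apply a significance filter, which this sketch omits.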

      Figures 4J and 4K miss axis labels and since we look at correlations, the figures could be redrawn using the same ranges on x and y axis.

      We thank the reviewer for this suggestion. We have now added axes labels to the new figures. However, we have not used the same range on the x and y axes because they depict expression levels of different genes. For example, if gene A is expressed 1, 5, and 10 in samples 1, 2, and 3, while gene B is expressed 100, 500 and 1000 for samples 1, 2, and 3, their range would be very different but still perfectly correlated (panel A below). If we draw x and y axes using the same range, this correlation will not be visually obvious (panel B below).

      This comparison is different from the correlation plots that compare the expression of one gene in different samples. We apologize for the confusion and to avoid misleading readers, we have enlarged the gene names in Figure labels to ensure that readers notice they are different genes. We have also added an option to the correlation plot on our portal so that users can choose the optimal format (see below).

      Page 14 line 5: Is this the right figure reference here to Figure 4G? If yes, then it is unclear how Figure 4G supports the statement in this sentence. Please, clarify.

      We apologize for the confusion. In Figure 4G, we labeled several important genes and used different colors to indicate whether the gene was regulated by SARS-CoV-2 only (purple), Rhinovirus only (black), or both (red). FURIN was significantly upregulated only by SARS-CoV-2. The data in Figure 4G were from GSE160435 (“SARS-CoV-2 infection of primary human lung epithelium for COVID-19 modeling and drug discovery”); that study used lung organoid alveolar type 2 (AT2) cells as the model. We think this confusion was caused by our failure to provide the details about the GSE160435 study. We have now amended the manuscript to include these details in the Methods section to avoid confusion. We also enlarged the gene labels in the figure to make them more visible. In the manuscript, we have changed the text from “our results found FURIN gene was also upregulated in SARS-CoV-2–infected lung organoid alveolar type 2 cells (Figure 4G, Supplementary Table S3).” to “We found that FURIN was upregulated in SARS-CoV-2-infected lung organoid alveolar type 2 cells (Figure 4G, Supplementary Table S4) (Mulay, Konda et al., 2021), it has reported that TGF-β signaling could also regulates FURIN (Blanchette, Rivard et al., 2001). Our gene enrichment analysis also found TGF-β signaling enriched only for up-regulated genes in SARS-CoV-2-infected lung cells (FDR correct p=7.58E-05, Supplementary Table S4), these observations implicated a positive feedback mechanism only for SARS-CoV-2-infected lung but not RV-infected nasal cells.”

      Figure 2 is of too low resolution. Many details cannot be read. Please, provide a higher resolution figure.

      We apologize for the inconvenience. However, we did not expect the reader to read the details in Figure 2, as it is just an overview of the CovidExpress portal. The aim is to give the reader an impression of the functions that CovidExpress offers.

      Reviewer #1 (Significance (Required)):

      Providing a single platform for the analysis of SARS-CoV-2-related RNAseq data is certainly of high value to the scientific community. However, as the portal and manuscript are currently presented, for scientists that are not RNAseq analysis specialists, more guidance would be required to understand and use correctly the functionalities of the portal. Unfortunately, because batch effects could not be removed from the studies, the authors, correctly, do not recommend to combine data from different studies for analyses, however, this likely will also limit the potential of the resource to make new discoveries beyond what the original studies have already published. As indicated above, the authors could support their claim by comparing their findings with findings published from the studies they reanalyzed. The portal is only of use to scientists studying SARS-CoV-2. I am not an expert in RNAseq data analysis and thus cannot comment on the technicalities, especially the processing of the RNAseq datasets.

      We thank the reviewer for the positive comments. We apologize for the confusion and acknowledge that we should not describe our effort using the term “batch effect.” As described by Reviewer #2 (and we agree), batch effect should be used only to indicate a purely technical difference in the same biological system; for example, differences in experiments performed on different days or by different lab personnel. Thus, we cannot correct for “batch effect” by using CovidExpress. We hope that the reviewer realizes that what we did was correct for the effect caused by differences in software and parameters across the studies. For example, in our approach, the DEGs from GSE155518 and GSE160435 (both primary lung alveolar AT2 cells; both from Mulay et al., Cell Reports, 2021) were significantly correlated (panel A below; p = 1.36e-24, F-test). 
However, when we downloaded the TPM values from their GEO records, GSE155518 appeared to have a genome-wide decrease in expression in the SARS-CoV-2–infected samples (panel B below). We suspect that this is because the expression of the virus itself was also considered in their data processing. Thus, using the processed data directly without carefully reviewing the methods might lead to false hypotheses.

      Finally, researchers can make new discoveries, such as our OASL and FURIN findings, by using the many other features that CovidExpress provides.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Djekidel and colleagues describe a web portal to explore several SARS-CoV-2 related datasets. The authors applied a uniform reprocessing pipeline to the diverse RNA-seq datasets and integrated them into a cellxgene-based interface. The major strengths of the manuscript are the scale of the compiled data, with over one thousand samples included, and the data portal itself, which has useful visualization and analysis functions, including GSEA and DEG analysis. My primary concerns with the study are centered on the analysis examples that are presented and their interpretation, as well as the user interface for the data portal. **Major Comments:**

      1. The literature analysis feels out of place and is not informative (Fig 1E), as the conclusions that can be drawn from literature mining are minimal. In evidence of this, the authors highlight that CRP is a top-studied "gene" and later voice their interest in how CRP is not a differentially expressed gene (pg6). This illustrates the problems with the literature-based analysis, since in the context of COVID-19, CRP is a common blood laboratory measurement that is used as a general marker of inflammation. Transcription of CRP is essentially exclusively in hepatocytes as an acute phase reactant (see GTEx portal for helpful reference), and would therefore not be expected to be found in the various datasets collected by the authors. The one exception might be liver RNA-seq samples from COVID-19 patients, but I do not think these are available in the current collection. I would therefore suggest to remove the literature analysis parts from the manuscript.

      We thank the reviewer for sharing knowledge about CRP. As discussed in our manuscript, we agree that not all top genes from literature-based analysis were expected to be included in RNA-seq analysis. We apologize for the confusion, and we have amended our description to make this point clearer. However, we still believe that literature-based analyses are very useful in the following aspects:

      1. This type of analysis bridges the gap between data-driven research and hypothesis-driven research. For example, we found many genes in our meta-analysis, but it is not feasible to describe the functions of all of them. Thus, in Figure 1F, we color-coded genes in red if they also appeared as top genes in the literature-based analysis and read related manuscripts to build confidence that the meta-analysis is useful. Then we expanded our review to more top genes and found more interesting evidence (Supplementary Table S2, “TopGenesbyDifferentialAnalysis” tab).
      2. Literature-based analyses also reduce the time researchers spend prioritizing their investigations. For example, in our comparison of SARS-CoV-2–infected lung and Rhinovirus-infected nasal tissue, we found >2000 genes upregulated only in SARS-CoV-2–infected lung but not in Rhinovirus-infected nasal cells. It is not easy to derive a hypothesis from so many genes. When we overlapped the gene list with literature-based analysis, FURIN popped up as the most well-studied gene, and we did not find any report that mentioned that SARS-CoV-2 can regulate FURIN. This raised our interest and led to a suggested mechanism in which SARS-CoV-2 could evolve to induce FURIN expression and gain superior infectivity. FURIN’s upregulation is significant but not among the top genes, in terms of fold change (>2-fold change, FDR p th by fold change). Thus, without the literature-based analysis, this observation could have easily been neglected.
      3. Such analyses help researchers to prime their hypotheses for novel findings. For example, in our comparison between SARS-CoV-2–infected lung and Rhinovirus-infected nasal tissues (Figure 4G, Supplementary Figure 5D and E), we found many upregulated genes, but OASL was not in our literature-based analysis, which indicated that it is under-studied and worth highlighting. We hope the reviewer will agree that we should retain the literature-based analysis in our paper. These analyses were not meant to be conclusive but rather a way to prioritize investigations. Finally, we removed CRP from Fig 1E and the main text to avoid confusion.
      1. The data portal, implemented through cellxgene, is accessible for non-programmers to use. However, it is very easy to end up with an "Unexpected HTTP response 400, BAD REQUEST" error, with essentially no description of the cause of the error or how to rectify it. When this occurs (and in my experience it occurs very frequently), this also forces the user to refresh the page entirely, losing any progress they may have made. I see that the authors describe this error in their FAQ page, but their answer is not very intuitive and I was unsure of what they meant: "This happens because the samples you selected doesn't contain all "Group by" you want compare for each "Split by" group. You could confirm using the "Diff. groups" buttons.".

      We apologize for the confusion. This excellent point made by the reviewer required an improvement in the software engineering, which we have now completed. We have figured out how to avoid this error and have run thorough tests to ensure that it does not appear anymore. We also added a gitter chat channel to our landing page, so that users can report if they encounter this or other errors.

      I would therefore ask that the authors provide more detailed tutorials (ideally step-by-step) on common analyses that users will want to perform, hopefully minimizing the amount of frustration that users will encounter.

      We thank the reviewer for this suggestion. We have uploaded several video tutorials to our landing page and will gradually add more. We also added a gitter chat channel, so users can ask questions, report bugs, or suggest new studies to include in the portal.

      1. Selection of samples is not very quick or intuitive. If I wanted to select only the samples from one specific GEO accession, I had to resort to individually checking the boxes of the sample IDs that I wanted. If I instead selected the GEO accession under the samples source ID, then used the "Subset to currently selected samples" button, I invariably got the HTTP error 400 message. Of course, this may simply reflect my lack of familiarity with cellxgene; I would nevertheless encourage the authors to improve the FAQ to include a step-by-step example for how to do common analyses/procedures.

      We apologize for the confusion. To select an individual GEO accession, users can simply tick the box beside “Samples Source ID.”

      All boxes under “Samples Source ID” are then cleared, which allows you to select only the one you want. We have also uploaded video tutorials to help users learn how to navigate the portal.

      We apologize for the “HTTP error 400” messages. We found that, because of a back-end cache mechanism, users who encountered that message once would then encounter it frequently. We have now improved the portal from the software-engineering side. In our recent tests of the latest version, this error no longer appears. We also added a gitter chat channel on our landing page so that users can report encountering this or other errors.

      1. The second case study, centered on coagulation genes, is misguided. Alteration of coagulation lab values in severe COVID-19 patients is reflecting the general inflammatory state of these patients, and would not be expected to manifest on the transcriptional level in infected cells/tissues. Coagulation labs are measuring the functional status of the coagulation cascade, which is far-removed from the direct transcription of the corresponding genes - proteolytic processing of clotting factors, etc. As with CRP (see above comment), most clotting factors are transcribed almost exclusively in the liver (check GTEx portal); I would not expect upregulation of coagulation factors in lung cell lines/organoids/cultures etc after infection with SARS-CoV-2. I would recommend the authors to pick a different gene ontology set for a case study, as the current one focusing on coagulation is confusing in a pathophysiologic sense.

      We thank the reviewer for this suggestion. To avoid any misleading claims, we have replaced the coagulation gene list with a filtered gene list from the “Coronavirus disease - COVID-19” KEGG pathway (hsa05171) to showcase how to identify experiments in which this gene signature is enriched or depleted. We also replaced Figures 3G-J with new results.

      1. The two large clusters of blood-derived samples vs other tissues is not surprising and the authors' interpretation is confusing. The authors write that "the COVID-19 signature was not able to overcome the tissue specificity and that immune cells might respond to SARS-CoV-2 differently." This should be immediately obvious given the pathophysiology of COVID-19 infection; the cell types that are directly infected by SARS-CoV-2 will of course have a distinct response compared to the circulating blood cells of COVID-19 patients, which are responding by mounting an immune response. There is no reason to expect a priori that the DEGs in the directly infected lung cells would be similar to that of immune cells that are mounting a response against the virus.

      We thank the reviewer for these comments. We agree that it should be obvious that directly infected lung cells would differ from immune cells. However, this has never been shown in a large dataset. Also, it is not obvious whether all of the other tissues would respond to SARS-CoV-2 differently. Thus, we believe it is important to present this overview. We have amended the description to deliver a clearer message, as follows: “This confirmed immune cells respond to SARS-CoV-2 differently from other tissues also suggested the response of most other tissues might sharing similar features.”

      1. The authors devote considerable space in the manuscript to exploring "batch effects" and trying to minimize them (pg10-11 Fig 4A-D, Fig S4). However, given that the compiled datasets are from entirely different experimental and biological systems (e.g. in vitro infection vs patient infection, different cell lines, timepoints after virus exposure, diverse tissues, varying disease severity), it is inappropriate to simply refer to all of these differences as "batch effects" alone. Usually, the term "batch effect" would refer to the same biological experiment/system (i.e. A549 cells infected with CoV vs control), but performed on different days or by different lab personnel - in other words, batch effects are purely due to technical differences. This term clearly does not apply when comparing samples from entirely different cell lines, or tissues, etc, and the authors should not keep describing these differences as batch effects that should be "corrected" out.

      We thank the reviewer for the insight. We apologize for the confusion caused by using the phrase “batch effect correction” to describe our approach. We agree that the differences between studies should not be referred to as “batch effects” and have now amended the descriptions to avoid confusion.

      Indeed, the authors themselves state that the main point of their "batch effect correction" efforts is only for PCA visualization. I therefore feel this section contributes very little to the overall manuscript, especially given the authors' own recommendation that all analyses should be performed on individual datasets (which I certainly agree with). I assume that the authors were required to provide some sort of dimensional reduction projection for the cellxgene browser, but this is more a quirk in their choice of platform for the web portal. Thus, this section of the manuscript should be deemphasized.

      We thank the reviewer for these comments and again apologize for the confusion caused by our use of the term “batch effect correction” to describe our approach. However, we believe these parts of the paper should be retained for the following reasons:

      • In practice, sample mislabeling can happen. PCA or simple clustering approaches are very useful for drawing researchers’ attention to such cases, so that they can check for possible sample mislabeling.
      • Even within a study, one sample can be an outlier due to low or unequal sample quality. Removing outliers would help boost the significance of real findings. Without our approach, it would be harder for users to notice and remove outliers from their investigations.
      • Finally, these efforts are useful for generating hypotheses. For example, although we collected a lot of data, it is not feasible for us to read all the details in all the manuscripts published. We observed a similarity between SARS-CoV-2–infected lung samples and Rhinovirus–infected nasal samples by exploring our portal’s capabilities (Figure 3E-F). Then we read the manuscripts in which those data were published and found that our discovery was consistent with the original studies’ results. We believe these efforts are essential to help researchers generate or refine their hypotheses. As we update the database with more samples, this approach will become increasingly powerful.

      1. Given the limitations of any combined multi-dataset analyses, one very useful feature would be to conduct "meta-analyses" across multiple datasets. For instance, it would be informative to find which genes are commonly DEGs in user-selected comparisons, calculated separately for each dataset and then cross-referenced across the relevant/user-selected datasets.

      We thank the reviewer for this comment. Indeed, we agree that “meta-analyses” are useful and have now compiled Supplementary Table S2 and Figure 1F to demonstrate the commonly regulated genes. To enable user-selected comparisons across studies on our portal, we need to design a thoughtful user interface. Otherwise, the results from our portal could easily cause fatal misinterpretation. For example, GSE154613 includes samples like DMSO, Drug, SARS-CoV-2, and DMSO+SARS-CoV-2. If a user simply selected to compare SARS-CoV-2 versus Control, the results would be SARS-CoV-2 and DMSO+SARS-CoV-2 versus DMSO and Drug. Such functions need time to design and implement; therefore, we will consider this suggestion for further development of our portal.
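
      The grouping pitfall described in this response can be illustrated with a minimal Python sketch. The four treatment labels follow the GSE154613 example above; the sample IDs and both helper functions are hypothetical and are not part of the CovidExpress portal.

```python
# Hypothetical sample sheet: the four treatment labels mirror the
# GSE154613 example above; the sample IDs are made up for illustration.
samples = {
    "s1": "DMSO",
    "s2": "Drug",
    "s3": "SARS-CoV-2",
    "s4": "DMSO+SARS-CoV-2",
}


def naive_split(samples):
    """Split only on infection status, ignoring the DMSO/Drug background."""
    infected = sorted(t for t in samples.values() if "SARS-CoV-2" in t)
    control = sorted(t for t in samples.values() if "SARS-CoV-2" not in t)
    return infected, control


def matched_split(samples, background):
    """Compare infected vs. uninfected samples that share one background."""
    infected = sorted(
        t for t in samples.values() if t == f"{background}+SARS-CoV-2"
    )
    control = sorted(t for t in samples.values() if t == background)
    return infected, control


naive = naive_split(samples)
matched = matched_split(samples, "DMSO")
print("naive:  ", naive)    # mixes treatments on both sides of the contrast
print("matched:", matched)  # one background, infection is the only difference
```

      The naive split pools SARS-CoV-2 with DMSO+SARS-CoV-2 and compares them against DMSO and Drug together, which is the misleading contrast the response warns about. A matched split requires the user to state the background explicitly.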

      **Minor comments:**

      1. Fig S1G, color legend should be added (I understand that these colors are the same from S1H).

      We thank the reviewer for the comment. We have now added information about the colors in the figure legend.

      1. Mouseover text for trackPlot on the data portal is incorrect (it says the heatmap text instead).

      We thank the reviewer for this comment. We have now corrected this bug.

      1. Abstract should be revised to describe only the 1093 final remaining RNA-seq samples after filtering/QC steps.

      We thank the reviewer for this comment. We have now amended the Abstract to include this information.

      1. Text in many figures is too small to be legible. I would suggest pt 6 font minimum for all figure text, including the various statistics in the figure panels.

      We thank the reviewer for this comment. We have now amended the font sizes and will provide high-resolution figures in revision.

      1. Are the DE analyses in Fig 1F specifically limited to control vs SARS-CoV-2/COVID-19 comparisons? Many of the samples included in this study are from other respiratory infections (labeled "other" in Fig 1B).

      We thank the reviewer for the question. Figure 1F was not originally limited to control vs SARS-CoV-2/COVID-19 comparisons, because we thought that control vs virus, drug vs mock, or differences between time points would also be interesting. If we narrow the analysis to contrasts only between control vs SARS-CoV-2/COVID-19, Figure 1F would still look similar (as below) because the genes in that comparison comprise the largest share of genes included in the original graphic.

      In the end, we replaced Figure 1F to avoid confusion and added more details in the Methods.

      1. The word cloud format is not conducive for understanding or interpretation. It would be much more informative to simply have a barplot or similar to clearly indicate the relative "abundance" of a given gene among all 315 DE analyses.

      We thank the reviewer for this comment but respectfully disagree with this point. Visualization of the relative “abundance” of genes with word clouds is a relatively novel concept in computational biology. However, we believe that, in this case, it has certain advantages over visualization using traditional bar plots, for example. The word cloud format allows us to highlight genes relative to their importance, with the word “importance” being used here in the sense of combined metrics from DEGs, as shown in Figure 1F, or the frequency with which genes are mentioned/discussed in various literature sources, as shown in Figure 1E. For this purpose, the exact values will most likely not be important for most users/readers. With a word cloud visualization, readers can easily discern the top genes and use them in the exploration of their own data or the CovidExpress portal. However, if users want to analyze raw values, we provide in Supplementary Table S3 a full list of all genes and gene sets that can be downloaded from our landing page (section “CovidExpress Expression Data Download”) in GMT format. Also, when we visualized the ranks of genes by using bar plots as the reviewer suggested, the results were much harder to read (as shown in the bar graph below) than simply looking at the raw data in supplementary tables.
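
      For readers who prefer raw counts to either visualization, the reviewer’s notion of a gene’s relative “abundance” across DE analyses can be sketched as a simple tally. This is only an illustration: the contrast names and DEG lists below are invented placeholders rather than results from the portal, and plain frequency is used as a stand-in for the combined metrics mentioned in the response.

```python
from collections import Counter

# Invented placeholder DEG lists, one per DE analysis (contrast).
# These are NOT results from CovidExpress; they only show the tally logic.
deg_lists = {
    "contrast_A": ["OASL", "TNF", "PPARGC1A"],
    "contrast_B": ["OASL", "PPARGC1A"],
    "contrast_C": ["OASL", "FURIN"],
}


def gene_frequency(deg_lists):
    """Count in how many analyses each gene is reported as a DEG."""
    counts = Counter()
    for genes in deg_lists.values():
        counts.update(set(genes))  # set(): count a gene once per analysis
    return counts


freq = gene_frequency(deg_lists)
# Sort by descending count, then alphabetically to make ties deterministic.
top = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
for gene, n in top:
    print(f"{gene}\t{n}")
```

      A table like this can feed either a bar plot or a word cloud, so the choice of visualization does not affect the underlying ranking.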

      1. Claims of increased/decreased dataset separability should have statistical analysis on the silhouette score boxplots (Fig S4G-I).

      We thank the reviewer for the reminder. We have added statistical tests (Wilcoxon rank test) to the silhouette score boxplots referred to by the reviewer.

      1. Regarding Fig 4E-F - what are the key genes that contribute to PC1, and how do they relate to the DEGs in Fig 4G?

      We thank the reviewer for this question and apologize for the confusion. In Figure 4E-F, the PCA was based on ssGSEA scores: each sample has a score for each gene set, not for individual genes. Thus, the top contributors to PC1 were gene sets that were upregulated or downregulated in certain contrasts. On the portal’s landing page, we provide detailed results for the top gene sets (for the ssGSEA approach) and genes (for the TPM approach) that contributed to the various PCs (“Clustering Results for Reviewing and Download” section). This allows users to download and further explore these data.

      1. Statistics describing the relation between OASL and TNF/PPARGC1A should be included to justify the authors' statements. This could be correlation, mutual information, regression, etc.

      We thank the reviewer for this suggestion, and we have updated Figures 4J-K to show the correlation values and corresponding F-statistics. The Pearson correlation between OASL and TNF was significant (Pearson correlation = 0.75, p-value = 6.85e-72), whereas the correlation between OASL and PPARGC1A had a negative slope but was weak and did not reach conventional significance (Pearson correlation = -0.08, p-value = 0.12), supporting our statement only to a certain degree. We have now updated the corresponding text in the manuscript.
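
      As a reminder of what these statistics measure, the following minimal sketch computes a Pearson correlation and its t-statistic in plain Python. The expression values are toy numbers chosen for illustration, not data from the portal, and the function names are hypothetical. In simple linear regression with one predictor, the F-statistic equals the square of this t-statistic; converting either to a p-value additionally requires the corresponding reference distribution, which is omitted here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for testing r != 0 with n samples (n - 2 d.o.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy expression values only; not taken from CovidExpress.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.1, 5.9, 8.2, 9.8]   # rises with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

r_ab = pearson_r(gene_a, gene_b)
r_ac = pearson_r(gene_a, gene_c)
t_ab = t_statistic(r_ab, len(gene_a))
print(f"r(a, b) = {r_ab:.3f}, t = {t_ab:.1f}, F = t^2 = {t_ab ** 2:.1f}")
print(f"r(a, c) = {r_ac:.3f}")
```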

      1. There are several studies now that have performed scRNA-seq on the lung resident and peripheral immune cells of COVID-19 patients. To more definitively tie in their analyses in Fig 4J-K/Fig S5D-E (to affirm "its important role in the innate immune response in lungs"), the authors should assess whether OASL is upregulated in the lung macrophages of COVID-19 patients vs controls.

      We thank the reviewer for this suggestion. Indeed, Liao, et al. recently reported “BALFs of patients with severe/critical COVID-19 infection contained higher proportions of macrophages and neutrophils and lower proportions of mDCs, pDCs, and T cells than those with moderate infection.” (Nature Medicine, 2020, https://doi.org/10.1038/s41591-020-0901-9). They further refined macrophage data into subclusters and reported top enriched GO terms as “response to virus” (group 1), “type I interferon signaling pathway” (group 2), “neutrophile degranulation” (group 3), and “cytoplasmic translational initiation” (group 4). When we investigated their data, we found that group 1 and group 2 both identified OASL as a marker gene, indicating that OASL might respond to the virus and contribute to type I interferon signaling. Furthermore, another data set (from Ren et al., Cell, 2021, https://dx.doi.org/10.1016%2Fj.cell.2021.01.053) showed several clusters in patients with severe COVID-19 (left panel below) that were enriched for OASL expression (right panel below).

      We have now added these observations to strengthen our hypothesis about the role of OASL.

      1. The visualization and analysis functions in the data portal appear to work reasonably well out of the box. However, the download buttons for plots did not work in my hands. I realized that a workaround is to right click -> "Save image as" (which then downloads a .svg file), but this is not ideal and should be fixed to improve usability. I had tested the data portal on both Firefox and Edge browsers, using a Windows 10 PC.

      We agree with the reviewer. Due to some technical issues with the figure javascript plugin, the download feature does not work unless the figure is saved as a file on the server side. To avoid any security issues, we tried to minimize new file generations, hence, for the moment we have disabled this feature. Users can still download high-resolution .svg figures by using the right-click -> “save image as.” This information is now included in the FAQ section on the portal’s landing page.

      Reviewer #2 (Significance (Required)):

      The data portal appears to have useful analysis and visualization features, and the data collection appears to be quite comprehensive. I would strongly encourage the authors to continue collecting datasets as they become available and further improving the usability of the portal. As noted in the above comments, I think there is potential for their cellxgene-based browser to be useful to non-computational biologists, but at present, the data portal is not as simple to use as it should be. With further efforts to developing step-by-step tutorials for common analysis/visualization tasks, more informative case studies, and the other revisions suggested above, this study could be a valuable resource for the community. Of note, this review is written from the perspective of a primary wet-lab biologist with extensive bioinformatics experience but limited web development expertise.

      We thank the reviewer for the positive comments. We understand the importance of data updating. Our plan is to complete quarterly updates once this manuscript has been accepted or when 10 new studies have been either collected by us or suggested by users. This information is also now included in the FAQs of the portal’s landing page. We have also uploaded several tutorial videos to the landing page and will gradually add more. We also added a gitter chat channel, so users can ask questions, report bugs, or suggest new studies to add to the database.

      **Referee Cross-commenting**

      I agree with the comments of the other reviewers.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The ongoing COVID-19 pandemic is a big threat to human health. The researchers have conducted studies to explore the gene expression regulations of human cells responding to COVID-19 infection. A website that integrating those datasets and providing user-friendly tools for gene expression analysis is a valuable resource for the COVID-19 study community. The authors collected published RNASeq datasets and developed a database and an interactive portal for users to investigate the gene expression of SARS-CoV-2 related samples. This website would be of great value for the SARS-CoV-2 research community if the batch normalization problems are solved.

      **Major comments:**

      1) The major concern of CovidExpress is the batch effects from different studies. As the authors have shown and mentioned in their discussion that "For the current release, we strongly suggest investigators to perform gene expression comparison within individual study." This limits the usage of CovidExpress as integrating analysis from multiple datasets of different studies is the key value and purpose of CovidExpress.

      We thank the reviewer for the comment. Reviewer #2 reminded us, and we agree, that differences between studies should not be considered “batch effects.” We apologize for the confusion. The GSEA function provided in the portal does not suffer from batch effects, because all the pre-ranked lists of genes are based on contrasts from the same studies. Although we cannot correct for the differences between studies, we did correct for the effects caused by differences in the software and parameters used. For example, in our approach, the DEGs from GSE155518 and GSE160435 (both studies of primary lung alveolar AT2 cells from Mulay et al., Cell Reports, 2021) were significantly correlated (below panel A, p-value = 1.36e-24, F-test). However, if we simply download the TPM values from their GEO records, GSE155518 appears to show a genome-wide decrease in expression in SARS-CoV-2–infected samples (below panel B). These errors might lead to false hypotheses.

      2) The authors should include experimental protocols as one key parameter in the description and further integrating analysis of different datasets. As the authors showed that QuantSeq is a 3' sequencing protocol of RNA sequencing. However, it is not convincing to me that simply excluding QuantSeq samples is the ideal solution for downstream integrating analysis as QuantSeq has been shown that it has pretty good correlations with normal RNASeq methods in gene quantifications. It is interesting that there are 21.2% of samples were biased toward intronic reads. What protocol differences or experimental variations would explain the biases?

      We thank the reviewer for the comment and apologize for not being clearer. One of our main goals in reprocessing all samples was to correct for batch effects related to pipeline processing; that is, we tried to reduce the effects introduced by the use of different software or parameters. QuantSeq and similar protocols are heavily biased toward the 3’ UTR; thus, the software and parameters used for RNA-seq data are not suitable for them. We agree that the downstream results from QuantSeq correlate well with those from RNA-seq (we observed a correlation of ~0.75 when comparing the log2 fold-changes from QuantSeq with those from RNA-seq). However, we could not confirm that QuantSeq always correlates well with RNA-seq in terms of individual quantification. For example, Jarvis et al. recently reported a correlation of only ~0.35 between QuantSeq and RNA-seq (https://doi.org/10.3389/fgene.2020.562445). Theoretically, the correlation would be weaker for genes with a small 3’ UTR. Thus, we will not include QuantSeq data in this portal. However, if we collect enough studies in the future, we will consider setting up a separate portal just for QuantSeq, using a pipeline optimized for protocols biased toward the 3’ UTR.

      For the 21.2% of samples that were biased toward intronic reads, we believe the bias reflects differences in the kits used. For example, of the 162 samples with “BASE_INTRON (%)” >30% (Supplementary Table S1) that passed QC, 76 samples were total RNA obtained using the SMARTer kit and 36 were total RNA obtained using the Trio kit. Given that we have 105 samples of total RNA derived using the SMARTer kit and 38 samples of total RNA derived using the Trio kit, we conclude that the Trio kit was more biased toward introns, and the SMARTer kit was also strongly biased. This finding is consistent with those of others who have reported the bias of the SMARTer kit (Song et al., https://doi.org/10.1186/s12864-018-5066-2). Users can find these results in our Supplementary Table S1. We have also uploaded the protocol information to our portal.
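
      The proportions behind this conclusion can be worked out directly from the counts quoted in the response. The short sketch below does only that arithmetic; the dictionary layout and variable names are illustrative, and it assumes that the per-kit totals (105 and 38) are the appropriate denominators for the intron-biased counts (76 and 36).

```python
# Counts quoted in the response above: samples with BASE_INTRON (%) > 30%
# and the total number of total-RNA samples per library kit.
kits = {
    "SMARTer": {"intron_biased": 76, "total": 105},
    "Trio": {"intron_biased": 36, "total": 38},
}

# Fraction of each kit's samples that were biased toward intronic reads.
fractions = {
    kit: counts["intron_biased"] / counts["total"]
    for kit, counts in kits.items()
}

for kit, frac in fractions.items():
    print(f"{kit}: {frac:.1%} of samples biased toward intronic reads")
```

      Under that assumption, roughly 72% of the SMARTer samples and roughly 95% of the Trio samples exceed the 30% intronic threshold, which matches the ordering stated in the response.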

      3) How do the authors plan to update and maintain CovidExpress?

      We thank the reviewer for this question. We understand the importance of data updating. Our plan is to update the database quarterly once this manuscript has been accepted or when 10 new studies have been collected by us or suggested by users. We have added this information to the FAQs on the portal’s landing page. We also understand the importance of maintaining the service for a feasible amount of time for research. Therefore, we will keep the server activated for at least 2 years after the WHO announces that COVID-19 is no longer a global pandemic. We will also ensure that, even after we take down the server, scientists with programming skills will be able to create local servers based on the data provided on CovidExpress.

      **Minor comments:**

      1) Some texts in figures are not readable. For example, Fig2B, 2C, 2D, 2E.

      We thank the reviewer for this comment. We have now increased the font sizes and provided high-resolution figures in revision.

      2) The authors could use Videos to demonstrate how to use CovidExpress on the website as they have shown in Fig3.

      We thank the reviewer for this suggestion. We have uploaded several video tutorials to the landing page and will gradually add more. We also added a gitter chat channel so that users can ask questions, report bugs, or suggest new studies to include in the database.

      Reviewer #3 (Significance (Required)):

      The ongoing COVID-19 pandemic is a big threat to human health. Many molecular and cellular questions related to COVID-19 pathophysiology remain unclear and many researchers have conducted studies to explore the gene expression regulations of human cells responding to COVID-19 infection. However, there is no database/website that integrating all RNASeq data to provide user-friendly tools for gene expression analysis for COVID-19 researchers. The authors collected the published RNASeq datasets and developed a database and an interactive portal, named CovidExpress, to allow users to investigate the gene expressions response to COVID-19 infection. CovidExpress is a valuable resource for the COVID-19 study community once the batch normalization problems are solved. The users who came up with ideas about the regulation of COVID-19 response could use the system to test their hypothesis, without experience in bioinformatics and RNASeq data analysis. This will be more important when more RNASeq data from samples with different tissues, cell lines, and conditions are integrated into the database.

      We thank the reviewer for the positive comments. We apologize for the confusion and acknowledge that we should not describe our effort using the term “batch effect.” As described by Reviewer #2 (and we agree), batch effect should be used only to indicate a purely technical difference in the same biological system; for example, differences in experiments performed on different days or by different lab personnel. Thus, we cannot correct for “batch effect” by using CovidExpress. We hope that the reviewer realizes that what we did was correct for the effect caused by differences in software and parameters across the studies. For example, in our approach, the DEGs from GSE155518 and GSE160435 (both studies of primary lung alveolar AT2 cells from Mulay et al., Cell Reports, 2021) were significantly correlated (panel A below; p = 1.36e-24, F-test). However, when we downloaded the TPM values from their GEO records, GSE155518 appeared to have a genome-wide decrease in the expression of SARS-CoV-2–infected samples (panel B below).

      Thus, using the processed data directly without carefully reviewing the methods might lead to false hypotheses. Finally, researchers can make new discoveries, such as our OASL and FURIN findings, by using the many other features that CovidExpress provides.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The ongoing COVID-19 pandemic is a big threat to human health. The researchers have conducted studies to explore the gene expression regulations of human cells responding to COVID-19 infection. A website that integrating those datasets and providing user-friendly tools for gene expression analysis is a valuable resource for the COVID-19 study community. The authors collected published RNASeq datasets and developed a database and an interactive portal for users to investigate the gene expression of SARS-CoV-2 related samples. This website would be of great value for the SARS-CoV-2 research community if the batch normalization problems are solved.

      Major comments:

      1) The major concern of CovidExpress is the batch effects from different studies. As the authors have shown and mentioned in their discussion that "For the current release, we strongly suggest investigators to perform gene expression comparison within individual study." This limits the usage of CovidExpress as integrating analysis from multiple datasets of different studies is the key value and purpose of CovidExpress.

      2) The authors should include experimental protocols as one key parameter in the description and further integrating analysis of different datasets. As the authors showed that QuantSeq is a 3' sequencing protocol of RNA sequencing. However, it is not convincing to me that simply excluding QuantSeq samples is the ideal solution for downstream integrating analysis as QuantSeq has been shown that it has pretty good correlations with normal RNASeq methods in gene quantifications. It is interesting that there are 21.2% of samples were biased toward intronic reads. What protocol differences or experimental variations would explain the biases?

      3) How do the authors plan to update and maintain CovidExpress?

      Minor comments:

      1) Some texts in figures are not readable. For example, Fig2B, 2C, 2D, 2E.

      2) The authors could use Videos to demonstrate how to use CovidExpress on the website as they have shown in Fig3.

      Significance

      The ongoing COVID-19 pandemic is a big threat to human health. Many molecular and cellular questions related to COVID-19 pathophysiology remain unclear and many researchers have conducted studies to explore the gene expression regulations of human cells responding to COVID-19 infection. However, there is no database/website that integrating all RNASeq data to provide user-friendly tools for gene expression analysis for COVID-19 researchers. The authors collected the published RNASeq datasets and developed a database and an interactive portal, named CovidExpress, to allow users to investigate the gene expressions response to COVID-19 infection. CovidExpress is a valuable resource for the COVID-19 study community once the batch normalization problems are solved. The users who came up with ideas about the regulation of COVID-19 response could use the system to test their hypothesis, without experience in bioinformatics and RNASeq data analysis. This will be more important when more RNASeq data from samples with different tissues, cell lines, and conditions are integrated into the database.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Djekidel and colleagues describe a web portal to explore several SARS-CoV-2 related datasets. The authors applied a uniform reprocessing pipeline to the diverse RNA-seq datasets and integrated them into a cellxgene-based interface. The major strengths of the manuscript are the scale of the compiled data, with over one thousand samples included, and the data portal itself, which has useful visualization and analysis functions, including GSEA and DEG analysis. My primary concerns with the study are centered on the analysis examples that are presented and their interpretation, as well as the user interface for the data portal.

      Major Comments:

      1. The literature analysis feels out of place and is not informative (Fig 1E), as the conclusions that can be drawn from literature mining are minimal. In evidence of this, the authors highlight that CRP is a top-studied "gene" and later voice their interest in how CRP is not a differentially expressed gene (pg6). This illustrates the problems with the literature-based analysis, since in the context of COVID-19, CRP is a common blood laboratory measurement that is used as a general marker of inflammation. Transcription of CRP is essentially exclusively in hepatocytes as an acute phase reactant (see GTEx portal for helpful reference), and would therefore not be expected to be found in the various datasets collected by the authors. The one exception might be liver RNA-seq samples from COVID-19 patients, but I do not think these are available in the current collection. I would therefore suggest to remove the literature analysis parts from the manuscript.
      2. The data portal, implemented through cellxgene, is accessible for non-programmers to use. However, it is very easy to end up with an "Unexpected HTTP response 400, BAD REQUEST" error, with essentially no description of the cause of the error or how to rectify it. When this occurs (and in my experience it occurs very frequently), this also forces the user to refresh the page entirely, losing any progress they may have made. I see that the authors describe this error in their FAQ page, but their answer is not very intuitive and I was unsure of what they meant: "This happens because the samples you selected doesn't contain all "Group by" you want compare for each "Split by" group. You could confirm using the "Diff. groups" buttons.".

      I would therefore ask that the authors provide more detailed tutorials (ideally step-by-step) on common analyses that users will want to perform, hopefully minimizing the amount of frustration that users will encounter.

      1. Selection of samples is not very quick or intuitive. If I wanted to select only the samples from one specific GEO accession, I had to resort to individually checking the boxes of the sample IDs that I wanted. If I instead selected the GEO accession under the samples source ID, then used the "Subset to currently selected samples" button, I invariably got the HTTP error 400 message. Of course, this may simply reflect my lack of familiarity with cellxgene; I would nevertheless encourage the authors to improve the FAQ to include a step-by-step example for how to do common analyses/procedures.
      2. The second case study, centered on coagulation genes, is misguided. Alteration of coagulation lab values in severe COVID-19 patients is reflecting the general inflammatory state of these patients, and would not be expected to manifest on the transcriptional level in infected cells/tissues. Coagulation labs are measuring the functional status of the coagulation cascade, which is far-removed from the direct transcription of the corresponding genes - proteolytic processing of clotting factors, etc. As with CRP (see above comment), most clotting factors are transcribed almost exclusively in the liver (check GTEx portal); I would not expect upregulation of coagulation factors in lung cell lines/organoids/cultures etc after infection with SARS-CoV-2. I would recommend the authors to pick a different gene ontology set for a case study, as the current one focusing on coagulation is confusing in a pathophysiologic sense.
      3. The two large clusters of blood-derived samples vs other tissues is not surprising and the authors' interpretation is confusing. The authors write that "the COVID-19 signature was not able to overcome the tissue specificity and that immune cells might respond to SARS-CoV-2 differently." This should be immediately obvious given the pathophysiology of COVID-19 infection; the cell types that are directly infected by SARS-CoV-2 will of course have a distinct response compared to the circulating blood cells of COVID-19 patients, which are responding by mounting an immune response. There is no reason to expect a priori that the DEGs in the directly infected lung cells would be similar to that of immune cells that are mounting a response against the virus.
      4. The authors devote considerable space in the manuscript to exploring "batch effects" and trying to minimize them (pg10-11 Fig 4A-D, Fig S4). However, given that the compiled datasets are from entirely different experimental and biological systems (e.g. in vitro infection vs patient infection, different cell lines, timepoints after virus exposure, diverse tissues, varying disease severity), it is inappropriate to simply refer to all of these differences as "batch effects" alone. Usually, the term "batch effect" would refer to the same biological experiment/system (i.e. A549 cells infected with CoV vs control), but performed on different days or by different lab personnel - in other words, batch effects are purely due to technical differences. This term clearly does not apply when comparing samples from entirely different cell lines, or tissues, etc, and the authors should not keep describing these differences as batch effects that should be "corrected" out.

      Indeed, the authors themselves state that the main point of their "batch effect correction" efforts is only for PCA visualization. I therefore feel this section contributes very little to the overall manuscript, especially given the authors' own recommendation that all analyses should be performed on individual datasets (which I certainly agree with). I assume that the authors were required to provide some sort of dimensional reduction projection for the cellxgene browser, but this is more a quirk in their choice of platform for the web portal. Thus, this section of the manuscript should be deemphasized.

      1. Given the limitations of any combined multi-dataset analyses, one very useful feature would be to conduct "meta-analyses" across multiple datasets. For instance, it would be informative to find which genes are commonly DEGs in user-selected comparisons, calculated separately for each dataset and then cross-referenced across the relevant/user-selected datasets.
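
      A minimal sketch of the cross-referencing suggested in this comment, assuming differential expression has already been run separately within each dataset. The function name `recurrent_degs`, the dataset labels, and the gene names are hypothetical placeholders for illustration; they are not portal output or part of any existing API.

```python
from collections import Counter


def recurrent_degs(degs_by_dataset, min_datasets=2):
    """Return genes called DE in at least `min_datasets` datasets.

    `degs_by_dataset` maps a dataset label to the set of genes called
    differentially expressed within that dataset alone, so expression
    values are never pooled across studies.
    """
    counts = Counter()
    for genes in degs_by_dataset.values():
        counts.update(set(genes))  # count each gene once per dataset
    hits = {g: n for g, n in counts.items() if n >= min_datasets}
    # Most recurrent first, ties broken alphabetically
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-dataset DEG calls (placeholders, not real results)
example = {
    "dataset_A": {"GENE1", "GENE2", "GENE3"},
    "dataset_B": {"GENE2", "GENE3"},
    "dataset_C": {"GENE3", "GENE4"},
}
print(recurrent_degs(example))  # [('GENE3', 3), ('GENE2', 2)]
```

      Counting recurrence of per-dataset calls, rather than merging expression matrices, keeps each comparison within a single study, which is consistent with the authors' own recommendation against pooled analyses.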

      Minor comments:

      1. Fig S1G, color legend should be added (I understand that these colors are the same from S1H).
      2. Mouseover text for trackPlot on the data portal is incorrect (it says the heatmap text instead).
      3. Abstract should be revised to describe only the 1093 final remaining RNA-seq samples after filtering/QC steps.
      4. Text in many figures is too small to be legible. I would suggest pt 6 font minimum for all figure text, including the various statistics in the figure panels.
      5. Are the DE analyses in Fig 1F specifically limited to control vs SARS-CoV-2/COVID-19 comparisons? Many of the samples included in this study are from other respiratory infections (labeled "other" in Fig 1B).
      6. The word cloud format is not conducive to understanding or interpretation. It would be much more informative to simply have a barplot or similar to clearly indicate the relative "abundance" of a given gene among all 315 DE analyses.
      7. Claims of increased/decreased dataset separability should have statistical analysis on the silhouette score boxplots (Fig S4G-I).
      8. Regarding Fig 4E-F - what are the key genes that contribute to PC1, and how do they relate to the DEGs in Fig 4G?
      9. Statistics describing the relation between OASL and TNF/PPARGC1A should be included to justify the authors' statements. This could be correlation, mutual information, regression, etc.
      10. There are several studies now that have performed scRNA-seq on the lung resident and peripheral immune cells of COVID-19 patients. To more definitively tie in their analyses in Fig 4J-K/Fig S5D-E (to affirm "its important role in the innate immune response in lungs"), the authors should assess whether OASL is upregulated in the lung macrophages of COVID-19 patients vs controls.
      11. The visualization and analysis functions in the data portal appear to work reasonably well out of the box. However, the download buttons for plots did not work in my hands. I realized that a workaround is to right click -> "Save image as" (which then downloads a .svg file), but this is not ideal and should be fixed to improve usability. I had tested the data portal on both Firefox and Edge browsers, using a Windows 10 PC.
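
      As one illustration of the kind of statistic requested in minor comment 9, the sketch below computes a Pearson correlation coefficient between two expression vectors. The function name `pearson_r` and the input values are hypothetical and chosen only for illustration; a significance test on the coefficient would be an additional step that is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for two genes across five samples
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
print(round(pearson_r(gene_a, gene_b), 3))
```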

      Significance

      The data portal appears to have useful analysis and visualization features, and the data collection appears to be quite comprehensive. I would strongly encourage the authors to continue collecting datasets as they become available and to further improve the usability of the portal. As noted in the above comments, I think there is potential for their cellxgene-based browser to be useful to non-computational biologists, but at present, the data portal is not as simple to use as it should be. With further efforts to develop step-by-step tutorials for common analysis/visualization tasks, more informative case studies, and the other revisions suggested above, this study could be a valuable resource for the community. Of note, this review is written from the perspective of a primary wet-lab biologist with extensive bioinformatics experience but limited web development expertise.

      Referee Cross-commenting

      I agree with the comments of the other reviewers.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The manuscript submitted by Djekidel et al entitled: "CovidExpress: an interactive portal for intuitive investigation on SARS-CoV-2 related transcriptomes" reports on a new web portal to search and analyze RNAseq data related to SARS-CoV-2 infections. The authors downloaded and reprocessed data from more than 40 different studies, which are available on the web portal along with all available metadata. The web portal allows users to perform numerous differential expression and gene set enrichment analyses on the data and provides publication-ready figures. Because of batch effects that could not be removed, the authors do not recommend analyzing data across studies at this point. The authors conclude that the web portal is unique and will allow scientists to rapidly analyze gene expression signatures related to SARS-CoV-2 infections with the potential to make new discoveries.

      Major comments:

      Based on the scientific literature, the web portal seems to be an unprecedented resource to search and analyze SARS-CoV-2-related RNAseq data and as such would certainly be a useful resource for the SARS-CoV-2 scientific community. The authors argue, by providing use cases, that new discoveries are possible with their web portal. However, the section detailing the analyses the authors did to generate new hypotheses about genes potentially relevant in SARS-CoV-2 infections is very difficult to follow and, without more guidance, very difficult to reproduce with the web portal. Without more information being provided, it would require substantial expert knowledge in RNAseq data analysis. It also seems that the key candidate genes identified by their analyses have all been previously studied or identified as related to SARS-CoV-2 infections, so it is somewhat unclear whether new hypotheses can be generated by the reanalysis of RNAseq datasets, especially because combining the data from different studies is currently not recommended by the authors. The manuscript would benefit from providing fewer use cases but, for each of them, more information on how the portal was used, which studies were used, and which findings were not described in the publications of those studies. Some observations in the manuscript are not substantiated with significance calculations (see below). At times, the English writing (grammar) should be improved.

      Minor comments:

      Page 6 last sentence: The statement of this sentence is very much what one would expect. It remains unclear whether the authors mean this as a result to validate the processing of the RNAseq data or as a new discovery. Please, clarify.

      Figure 3A: The violin plots are so tiny that it is impossible to see any trends. It is also difficult to understand which categories one should compare with each other. If there is anything significant to observe, please, add a statistical test and better guide the reader.

      Figure 3C: A legend for the color scale is missing. The signal (I guess expression amounts) for SESN2 seems very weak and the same between ICU and non-ICU samples. What is the significance for assigning this gene to the group of genes being upregulated in ICU samples? Also contrary to what the authors state on page 8, SESN2 does not seem to be highly expressed in ICU samples, however, without knowing what the colors represent (fold changes or absolute expression values?) this is somewhat speculative.

      Page 9 first sentence: Please, specify what you mean by "starting list". Furthermore, in this paragraph, how do your results compare to the results from the study that you re-analyze here?

      Figure 3F: Please add labels to your axes. Is there a particular reason why, in a correlation plot like this one, the y and x axes are not shown with the same range, and why the y axis does not start at 0?

      Page 9 second last sentence: It remains unclear which kind of analysis the authors intend to do here and what the starting question is. Please, try to rewrite with less technical terms (i.e. what do you mean by "precalculated contrasts"). In line with this, it remains unclear what Figure 3I is supposed to show. Please, provide some more information to readers who are not RNAseq analysis experts.

      Figure 3J is somewhat confusing. Why is the mean expression range indicated from 0 to 1 and why are all genes apparently having a mean expression of 1? Page 10 line 5-6. Are you referring to coagulation markers here or general expression patterns? In case of the latter, how does this statement fit to the paragraph about analyzing expression patterns of coagulation markers? Please, specify. And in line with this, are the highlighted genes in Figure 3K coagulation markers? If not, what is the relevance of these to make the point that one can use the portal to investigate the role of coagulation markers in SARS-CoV-2 infections?

      The appearance of describing batch effects and attempts to remove them from the studies was somewhat surprising on page 10 as I would expect this kind of results rather earlier in the results section before describing use cases of the data. You may consider changing the order of your results for a better flow. Page 11, second paragraph. Please, explain briefly what the silhouette score is supposed to reflect and thus how Figure S4G should be interpreted. The difference of both bars in Figure S4G is very marginal and thus, does not seem to support the statement of the authors that the ssGSEA scores-based projection is better unless you perform a significance test or I misunderstood. Please, clarify.
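
      For reference, the standard definition of the silhouette score is given below. This is a general textbook formulation, not taken from the manuscript; it assumes that the grouping label used for Figure S4G is the study of origin, which the text reviewed here does not state explicitly.

```latex
% a(i): mean distance from sample i to the other samples in its own group C_i
% b(i): smallest mean distance from sample i to the samples of any other group C_k
\[
  a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\ j \neq i} d(i, j),
  \qquad
  b(i) = \min_{k \neq i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j),
\]
\[
  s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1 .
\]
```

      Under that assumption, values near 1 indicate samples that sit tightly within their own study and far from all others, values near 0 indicate overlapping studies, and negative values indicate samples closer to another study than to their own. A lower score would therefore correspond to weaker separation by study.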

      Page 11, third paragraph: Figure 4B, to the best of my understanding, does not support the claim that samples clustered less according to study cohorts using the ssGSEA approach. Please, quantify the effect and test for significance or better explain.

      For the analyses described starting on page 12 it remains largely unclear whether they were conducted across studies or within studies and which studies were used. This section until the end of the results would especially benefit from providing more information on how the analyses were performed, either in the results or in the methods section.

      Figures 4J and 4K are missing axis labels and, since we are looking at correlations, the figures could be redrawn using the same ranges on the x and y axes.

      Page 14 line 5: Is this the right figure reference here to Figure 4G? If yes, then it is unclear how Figure 4G supports the statement in this sentence. Please, clarify. Figure 2 is of too low resolution. Many details cannot be read. Please, provide a higher resolution figure.

      Significance

      Providing a single platform for the analysis of SARS-CoV-2-related RNAseq data is certainly of high value to the scientific community. However, as the portal and manuscript are currently presented, scientists who are not RNAseq analysis specialists would require more guidance to understand and correctly use the functionalities of the portal. Unfortunately, because batch effects could not be removed from the studies, the authors, correctly, do not recommend combining data from different studies for analyses. This will likely also limit the potential of the resource to make new discoveries beyond what the original studies have already published. As indicated above, the authors could support their claim by comparing their findings with findings published from the studies they reanalyzed. The portal is only of use to scientists studying SARS-CoV-2. I am not an expert in RNAseq data analysis and thus cannot comment on the technicalities, especially the processing of the RNAseq datasets.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2021-01024

      Corresponding author(s): Martin Spiess

      1. Description of the planned revisions — point-by-point response


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Apart from the default constitutive pathway for protein secretion, some specialized cells (e.g., neuroendocrine cells, exocrine cells, peptidergic neurons and mast cells) exhibit an additional regulated secretory pathway, in which peptide hormones are stored in a highly concentrated, ordered manner inside the electron-opaque "dense core" of secretory granules for long durations until secretagogue-mediated burst release. Although the general sorting receptor for packaging hormones in secretory granules is not yet identified, self-aggregation in the trans-Golgi network (TGN) is a common shared property of peptide hormones and is a well-accepted potential sorting mechanism. Here the authors have hypothesized that the cysteine-containing small disulphide loop (CC loop), which is abundant in several hormone precursors, acts as an aggregation mediator in the TGN for sorting into secretory granules. They have tested the aggregation propensity of a misfolded reporter protein, NPΔ, in the ER by attaching the CC loop segment of different hormones, which promoted the pathological aggregation in the endoplasmic reticulum (ER) of mutant provasopressin in the case of diabetes insipidus. Immunofluorescence and immunogold electron microscopy revealed accumulation of aggregates in the ER when NPΔ fused to CC loops of different hormonal origin was transiently transfected in COS-1 fibroblasts and Neuro-2a neuroblastoma cells. The authors have also shown that small disulphide loop-mediated functional aggregation in the TGN can sort a constitutively secreted protein, α1-protease inhibitor, into secretory granules. The rerouting capacity of the CC loop was tested in stably expressing AtT-20 cell lines by confirming localization to CgA-positive secretory granules as well as by studying BaCl2-mediated stimulated secretion and by testing secretory granule-specific lubrol insolubility.

      **Major comments:**

      The study is highly impressive, and the results fully support the CC loop mediated hormone sorting hypothesis. However, it would be nice if the authors characterize the nature of the CC-loop mediated aggregates as hormones are reported to be stored inside secretory granules as functional amyloid (Maji et al., 2009). The mechanistic reason behind the small disulfide loop mediated aggregation was not explained in the paper. Authors may propose the probable molecular reasons behind CC loop mediated aggregation to completely justify their hypothesis.

      Although the hypothesis and the experimental results are highly impressive, the authors may consider adding the following experiments.

      The authors replaced CC-loop by the proline/glycine repeat sequence (Pro1) as a negative control which was previously reported to abolish aggregation as well. However, the authors may completely delete the small loop forming segment, CCv, and may check the status of His-tagged fused neurophysin II (NPΔ) segment as an additional negative control.

      We plan to use a NP∆ construct completely lacking any N-terminal extension as a further negative control, as proposed by the reviewer.

      To find the ultrastructure authors have done immunogold assay with anti-His antibody which indicated different CC loop mediated ER aggregation. Since the amyloid-like fibril nature of pro-vasopressin mutant mediated ER aggregates was previously reported (Beuret et al., 2017), authors must check the nature of the CC loop mediated ER aggregates with amyloid specific antibody.

      We will test staining ER aggregates of our CC loop–NP∆ constructs with anti-amyloid antibodies. A caveat is that CC loops cannot form a classical cross-β structure (strict β-sheets) because of the ring closure – which is why we suggest their aggregation to be "amyloid-like". These structures may not be recognized by anti-amyloid antibodies.

      Since hormones are known to form reversible functional amyloid during their storage inside secretory granule, authors may consider characterizing the nature of the aggregates formed by CC loop fused constitutive protein in AtT-20 cell line by immunostaining, immunoprecipitation and dot blot assay using amyloid specific antibody.

      Endogenous AtT20 granules are expected to be positive for amyloid stains or antibodies anyway (if the size and mass of the granules is sufficient for detection; Maji et al. used pituitary tissue and purified granules).

      **Minor comments:**

      In the quantification study (Figure 2C) CCc and CCr showed almost similar ER aggregates (around 40%). But authors have commented that all constructs except CCc produce statistically significant increases in cells compared to background. Authors must clarify the statement.

      CCc also produced an increase, but it was not statistically significant (p = 0.08). We will change the sentence to: "It confirmed the ability of all constructs to produce an increase of cells with aggregates above background in COS-1 cells (Figure 2C), although not statistically significant for CCc (p = 0.08)."

      In lubrol insolubility assay, the otherwise constitutively secreted protein A1Pimyc (negative control) showed 23% insolubility. The authors explained the observation by commenting about trapping of the protein inside granule aggregate. But CCv and CCa fused proteins showed a very slight increase (around 30%). Only CCc construct showed more than 40% insolubility. If the trapping of constitutive protein may result in 23% insolubility, all the insolubility data except CCc is not satisfactory to claim as secretory granular content of aggregated protein. The authors must explain that.

      Lubrol insolubility is an empirical assay with high specificity for Golgi/post-Golgi forms, but with a relatively high background that we suggest to be due to trapping. Interpretation is based on statistical analysis of several independent experiments. It supports the conclusion of the other assays from an independent angle.

      We present the data of the paired t-test

      The authors have satisfactorily referenced prior studies in the field. However, authors may consider adding the following papers as they are directly connected with the hypothesis. The sorting of POMC hormone into secretory granules by disulphide loop was previously studied (Cool et al., 1995). The N-terminal loop segment was also previously used to reroute a constitutive protein, chloramphenicol acetyltransferase (Tam and Peng, 1993). S. K. Maji and coworkers had previously shown, using cyclic somatostatin as a model peptide, that the disulphide bond maintains the native reversible functional amyloid structure relevant to hormone storage inside secretory granules, whereas disulphide bond disruption led to rapid irreversible amyloid aggregation (Anoop et al., 2014).

      We will be happy to add these references (Anoop et al., 2014, is already discussed in the text).

      Authors must check grammar and may reconstruct a few sentences where sentence construction seems complicated.

      We will go through the text to improve readability.

      Reviewer #1 (Significance (Required)):

      This manuscript makes a significant contribution to fundamental research knowledge of hormone sorting mechanisms. Although constitutive and regulated secretory pathways have long been known, the exact sorting mechanism is not yet elucidated. No common receptor has yet been identified for recruiting regulated secretory proteins into secretory granules.

      Aggregation in the TGN is a well-accepted mechanism for sorting. However, the triggering factor for aggregation is not yet known. This study has shed light on a novel hypothesis, which has considered intramolecular disulfide bond mediated small CC loop in hormone may act as aggregation mediator. Since many regulated secretory proteins contain the short disulphide loop, the hypothesis proposed in the manuscript is interesting.

      It has been confirmed that the TGN is the last compartment common to both regulated and constitutive pathways (Kelly, 1985). No sorting mechanism is required for the constitutive pathway, as this is the default mechanism, whereas a regulated secretory pathway requires a specific sorting mechanism for proteins to be efficiently packaged in the secretory granules. There are two popular hypotheses about protein sorting in regulated secretory pathways: "sorting for entry" and "sorting for retention" (Blázquez and Shennan, 2000). In "sorting for entry", hormones destined for the regulated secretory pathway start to form aggregates in the TGN-specific environment, excluding other proteins destined for the constitutive pathway. Arvan and Castle proposed the second mechanism, as some hormones, like proinsulin, are initially packaged with lysosomal enzymes in immature secretory granules (ISG) (Arvan and Castle, 1998). With time, however, they start to aggregate and lysosomal enzymes are removed from ISG by small constitutive-like vesicles. Although aggregation is an essential sorting criterion in both mechanisms, the molecular events that lead to aggregation are not yet elucidated. TGN-specific environmental conditions including pH (around 6.5), divalent metal ions (Zn2+, Cu2+), and glycosaminoglycans (GAGs) have the potential to trigger aggregation (Dannies, 2012). Though each hormone has aggregation-prone regions in its amino acid sequence, there is no common amino acid sequence responsible for aggregation. The authors of this manuscript have pointed out an interesting observation: many hormones contain small disulfide loops which are exposed due to their presence at the N or C terminus or close to the processing site. Based on this observation, they hypothesized that the CC loop may act as an aggregation driver for hormone sorting. In-cell studies with CC constructs from different hormones successfully rerouted a constitutively secreted protein into the regulated pathway, which supports their novel hypothesis.

      However, the hypothesis raises some questions to be answered regarding the molecular mechanism of CC loop mediated aggregation. Why does the CC loop promote aggregation? Do the amino acid sequence and the size of the loop play a role in aggregation? The granular structures shown in the manuscript for different CC loops have different sizes and shapes (Figure 2 and 3). What is the reason for the structural heterogeneity of the CC loop mediated dense core? Since the authors have shown CC loop mediated aggregation both in functional and in disease-associated aggregation, a very important aspect to address would be the structure-function relationship of the aggregates. Since the authors have rightly pointed out that not all hormones or prohormones contain a CC loop, another curious question would be about the sorting mechanism of those without a CC loop. The best part of the study is that it has tried to explain the well-established aggregation mediated sorting mechanism from a new perspective, which leaves room for many questions to be addressed by further research.

      These are very valid questions, but beyond the scope of this study in which we address the contribution of CC loops in a cellular context. This is a novel extension to published in vitro studies, where a few CC loop proteins (vasopressin, oxytocin, somatostatin-14) have already been shown to enable amyloid(-like) aggregation in vitro.

      From this study, the audience will learn about the role of small disulphide loops in functional and disease-associated protein/peptide aggregation. The audience will also get an idea of the sorting mechanism in the regulated secretory pathway. Based on my expertise in protein aggregation related to human diseases and hormone storage, I see this manuscript as a fantastic addition to the understanding of the secretory granule biogenesis of hormones, with storage and subsequent release.

      References:

      Maji, Samir K., et al. "Functional amyloids as natural storage of peptide hormones in pituitary secretory granules." Science 325.5938 (2009): 328-332.

      Beuret, Nicole, et al. "Amyloid-like aggregation of provasopressin in diabetes insipidus and secretory granule sorting." BMC Biology 15.1 (2017): 1-14.

      Cool, David R., et al. "Identification of the sorting signal motif within pro-opiomelanocortin for the regulated secretory pathway." Journal of Biological Chemistry 270.15 (1995): 8723-8729.

      Tam, W. W., K. I. Andreasson, and Y. Peng Loh. "The amino-terminal sequence of pro-opiomelanocortin directs intracellular targeting to the regulated secretory pathway." European Journal of Cell Biology 62.2 (1993): 294-306.

      Anoop, Arunagiri, et al. "Elucidating the role of disulfide bond on amyloid formation and fibril reversibility of somatostatin-14: relevance to its storage and secretion." Journal of Biological Chemistry 289.24 (2014): 16884-16903.

      Kelly, Regis B. "Pathways of protein secretion in eukaryotes." Science 230.4721 (1985): 25-32.

      Blázquez, Mercedes, and Kathleen I. Shennan. "Basic mechanisms of secretion: sorting into the regulated secretory pathway." Biochemistry and Cell Biology 78.3 (2000): 181-191.

      Arvan, Peter, and David Castle. "Sorting and storage during secretory granule biogenesis: looking backward and looking forward." Biochemical Journal 332.3 (1998): 593-610.

      Dannies, Priscilla S. "Prolactin and growth hormone aggregates in secretory granules: the need to understand the structure of the aggregate." Endocrine Reviews 33.2 (2012): 254-270.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This manuscript by Reck and colleagues aims to determine the importance of short disulfide loops for correct sorting to, and release from, secretory granules. They utilize hybrid secretory proteins in which sequences encoding disulfide loops from different hormones are cloned in frame with the same secretory peptide, and assess how the presence of the disulfide loop affects the ability of the protein to aggregate in the ER and to get sorted for secretion. By immunofluorescence analysis they show that the presence of a disulfide loop increases the ability of the peptide hormone to form aggregates in the ER, and these observations are confirmed by immunogold-EM. Importantly, aggregate formation is seen both in professional secretory (N-2a) and non-secretory (COS-1) cells. Using immunofluorescence and quantitative immunoblotting, they also show that the ability of the secretory proteins to aggregate coincides with increased localization to secretory granules and with increased release from cells in response to stimuli.

      The results from this study are interesting and suggest that small disulfide loops may be an important part of the cargo sorting mechanism in secretory cells, and perhaps also a cause of sorting defects in certain diseases. The study is overall well conducted and worthy of publication after revision.

      **Major comments:**

      1) It is unclear to me what the relationship between the CC-loop and amyloid is. They are not involved in the formation of fibrils and amyloid, yet the authors conclude that they support the amyloid hypothesis of granule biogenesis. This must be clarified.

      Maji et al. (2009) concluded in their Science paper that secretory granules of the pituitary are made of functional amyloids formed by the protein hormones themselves. Evidence for this is that many purified protein hormones formed fibrillar aggregates in vitro with amyloid characteristics. Among the hormones analyzed were 4 CC loop-containing ones: vasopressin, oxytocin, somatostatin-14 (these are just the CC loop segments of the respective precursors), and full-length prolactin (199 aa, containing an N- and a C-terminal CC loop). Amyloid formation of somatostatin-14 was further analyzed in vitro with and without the disulfide bond by Anoop et al. (2014). On the tissue level, it was only shown that granules are stained by amyloid dyes (Maji et al., 2009). Our own lab found that folding-deficient mutant forms of provasopressin formed fibrillar aggregates in vitro (Birk et al., 2009) and in the ER of expressing cells (Birk et al., 2009; Beuret et al., 2011). These ER aggregates likely represent mislocalized amyloid formation that normally happens at the TGN for granule sorting.

      In the present study, we therefore tested the role of different CC loops in cells with respect to (1) inducing ER aggregation of a folding-incompetent reporter and (2) inducing granule sorting of a folded constitutive cargo protein. Unfortunately, the ER aggregates were all very compact and did not reveal fibrillarity. However, secretory granules, which contain functional amyloids, similarly do not have a fibrillar appearance.

      In this study, we do not directly provide evidence for the amyloid (or rather amyloid-like) character of aggregation. The concept of granules consisting of functional amyloids of peptide hormones was the starting point for our analysis. Our results are in line with the functional amyloid hypothesis and thus provide first functional support for it.

      2) What is the actual function of the CC-loops? The authors show that the loops promote aggregation of cargo proteins, yet the mechanism behind this is unclear. For example, would the proteins used in this study be able to aggregate in vitro (i.e. the CC-loop enable aggregation) or do they require some co-factor/chaperone? It would also be good if the authors could clarify or explain why some CC-loops cause aggregation and others not.

      Maji et al. (2009) showed for 3 different CC loops (vasopressin, oxytocin and somatostatin-14) that they aggregate in an amyloid-like form in vitro in purified form in the absence of chaperones or other protein cofactors. Anoop et al. (2014) analyzed in vitro amyloid formation of somatostatin-14 with and without disulfide bond in more detail. The proposed function is aggregation of the hormone into secretory granules as functional amyloids, which is supported by the finding that secretory granules are positive for amyloids.

      In the present study, we tested a variety of CC loops for aggregation in cells rather than in vitro. Many proteins and peptides have been shown to be able to form amyloids in vitro. The hallmark of pathological or functional amyloids is that they are still able to do it in living cells despite the presence of chaperones, whose function is to generally prevent aggregation.

      We found all CC loops to have the ability to mediate ER aggregation and granule sorting, although to different extents. The differences are likely due to their intrinsic potency and/or the way they are presented by the reporter proteins, since we used the same rather short linkers.

      We plan to go through the manuscript text to make our points clearer.

      3) The MS data in table 2 is very confusing, since half of the data points are missing. It is also not clear what the numbers in the table represent and if they are from a single experiment or multiple. As it is presented now, and as I interpret it, these results do not give support to the conclusion that CC loops form disulfide bonds. Since this is an important conclusion from the paper, these experiments need to be clarified, repeated or a different experimental approach used.

      Thanks to this comment, we realize that Table II may have presented the results in a confusing way, giving the impression that a lot of data are missing, while in fact the values were measured to be 0. To improve this, we will write 0 instead of – to indicate that no signal could be detected for a particular peptide. In addition, we will move the missing results for CCpN-NP∆ into the figure legend to avoid confusion. In the legend, we will also note that the intensities detected by mass spectrometry differ strongly for different peptides. One experiment is shown, because the numbers for peak areas inherently differ between experiments. We will revise the text to make the experiment clearer.

      Proposed new Table II:

      Table II. Cysteines of CC loops are oxidized in secreted reporter fusion proteins.

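      | | nonreduced + IAA | reduced + IAA | Diagnostic peptide* |
      |---|---|---|---|
      | CCv disulf | 1637 | 10 | CYFQNCPR↓ |
      | CCv 2xmod | 0 | 696 | CYFQNCPR↓ |
      | CCa disulf | 4 | 0 | ↓CNTATCATQTGEDPQGDAAQK↓ |
      | CCa 2xmod | 0 | 23 | ↓CNTATCATQTGEDPQGDAAQK↓ |
      | CCc disulf | 6 | 0 | ↓CGNLSTCMLGTTGEDPQGDAAQK↓ |
      | CCc 2xmod | 0 | 32 | ↓CGNLSTCMLGTTGEDPQGDAAQK↓ |
      | CCr disulf | 570 | 152 | ↓CSRLYTACVYHK↓ |
      | CCr 2xmod | 0 | 246 | ↓CSRLYTACVYHK↓ |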

      CC loop fusion proteins with A1Pimyc were immunoprecipitated from the media of producing AtT20 cell lines and reduced with TCEP or not before treatment with iodoacetic acid (+IAA). Samples were analyzed by mass spectrometry for the expected peptide masses; the peak areas, normalized to the intensity of the peptide LQHLENELTHDIITK within A1Pi, are shown in arbitrary units. It should be noted that intensities detected by mass spectrometry differ strongly by peptide. *CC loop sequences are shown in green with red cysteines, the N-terminal sequence of A1Pi in blue, linker sequence in black. CCv-, CCa-, and CCc-NP∆ containing samples were digested with trypsin, CCr- and CCpN-NP∆ containing samples with Lys-C. The peptides for CCpN-NP∆ (↓LPICPGGAARCQVTTGEDPQGDAAQK↓, disulfide bonded or carbamidomethylated) could not be detected.
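
      One way to read Table II is to compare each peptide species only with itself across the two treatments, since the legend warns that intensities differ strongly between peptides. The short sketch below transcribes the values from the table and checks, per construct, whether the disulfide-bonded species decreases after reduction while the doubly modified species appears only after reduction. The helper name and the three criteria are assumptions made for illustration; they are not the authors' stated analysis.

```python
# Normalized peak areas transcribed from the proposed Table II (arbitrary units).
# Layout: construct -> species -> (nonreduced + IAA, reduced + IAA).
TABLE_II = {
    "CCv": {"disulf": (1637, 10), "2xmod": (0, 696)},
    "CCa": {"disulf": (4, 0), "2xmod": (0, 23)},
    "CCc": {"disulf": (6, 0), "2xmod": (0, 32)},
    "CCr": {"disulf": (570, 152), "2xmod": (0, 246)},
}


def consistent_with_disulfide(species):
    """Hypothetical helper: True if the pattern fits an oxidized CC loop.

    Each comparison stays within one peptide species, so differences in
    ionization efficiency between peptides do not enter the check.
    """
    disulf_nonred, disulf_red = species["disulf"]
    mod_nonred, mod_red = species["2xmod"]
    return (
        disulf_nonred > disulf_red  # disulfide signal drops after TCEP reduction
        and mod_nonred == 0         # no doubly modified peptide without reduction
        and mod_red > 0             # doubly modified peptide appears after reduction
    )


summary = {name: consistent_with_disulfide(s) for name, s in TABLE_II.items()}
for name, ok in summary.items():
    print(name, ok)
```

      With the values as listed, all four constructs meet these three criteria. The check says nothing about absolute amounts, and CCpN is absent because its peptides could not be detected.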

      4) As the authors state, it is well-known that the concentration of proteins in the ER will influence the ability to aggregate. In figure 1 and 2, the authors use transient overexpression to assess the ability of different CC-loops to induce aggregation in the ER. How were these results normalized to expression levels of the proteins? In later experiments the authors instead use stable cell lines expressing similar amounts of the different proteins. However, in these cells there is no obvious aggregation in the ER (see figure 4). It therefore becomes unclear what the role of ER aggregation for sorting to granules is.

      The ER aggregation experiments were not normalized for expression levels. Plasmids were identical except for the short CC loop segments and produced similar transfection efficiencies. Stable cell lines with useful expression levels of CC-NP∆ could not be obtained, most likely because expression of mutant proteins inhibits growth.

      To analyze granule sorting, we expressed CC fusion proteins with rapidly folding A1Pi as a reporter that does not accumulate in the ER. Stable cell lines were important to select clones with moderate and very similar expression levels.

      5) What is the basal secretion of the different proteins, i.e. how much goes through the constitutive secretory pathway and how much goes through the regulated secretory pathway? The authors should show the resting secretion (before BaCl2 addition) for all conditions tested instead of just the change in relation to control (i.e. the way data is presented now it is not possible to tell whether BaCl2 stimulation actually cause an increased release of the peptides).

      The experiment is done by comparing resting secretion (– lanes) with BaCl2 stimulated secretion (+ lanes) in Fig. 5A and C. Stimulated secretion is calculated as a ratio of resting secretion / stimulated secretion (after normalization for cell number and supernatant loading).

      6) Lastly, the importance of CC-loops for the sorting of native peptides is unclear. The authors should test the importance of these loops for aggregation, sorting and secretion of a non-hybrid hormone with naturally occurring CC-loops (and a mutated version lacking the loop). This is important, since it is so far only shown that loops can affect the secretion of non-biologically relevant hybrid hormones.

      In our previous study, Beuret et al. (2017), we analyzed the segments contributing to ER aggregation of folding-incompetent mutant provasopressins and to granule sorting for folding-competent mutants of provasopressins by self-aggregation at the TGN. We found separate protein segments – vasopressin (=CCv) and the glycopeptide – to contribute to aggregation in both localizations. Our study is a follow-up on the finding for vasopressin, expanding to other CC loops found in peptide hormones. Our results show that CC loops in general have the ability to aggregate and contribute to granule sorting.

      As exemplified by provasopressin, the CC loop may not be the only contributor. Preliminary experiments suggest the same for growth hormone. The detailed analysis of the aggregating sequences in one or more prohormones is clearly beyond the scope of our study.

      **Minor comments:**

      1) Stated that the 2x CC-loop constructs showed a positive effect in the cases of CCv and CCr, but this is not evaluated statistically.

      We will add the statistics to the respective figures.

      2) Explain the abbreviation POMC

      We will add the full name to the text.

      3) Figure 6D. Paired Student's t-test is not appropriate for determining significance when data is not paired (unpaired t-tests used throughout the rest of the paper).

      Only in the lubrol insolubility experiment did we find considerable shifts between experiments (particularly obvious for the yellow experiment). Instead of normalizing to the control construct, we used the paired t-test. However, using the unpaired t-test does not produce fundamentally different significance. If required, we will change the figure as suggested.

      Figure 6D using unpaired t-test: [Figure]
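
      The rationale for a paired test when whole experiments shift together can be illustrated with a small calculation. The sketch below uses invented numbers (they are not data from this study) in which both constructs rise and fall together from experiment to experiment, and computes the paired and the pooled-variance unpaired Student's t statistics with the Python standard library only. It is meant to show the general effect of pairing and does not reproduce or predict the outcome for Figure 6D.

```python
import math
import statistics

# Hypothetical percent-insolubility values from three experiments.
# These numbers are invented for illustration; they are NOT data from the study.
control = [20.0, 25.0, 35.0]
construct = [27.0, 33.0, 44.0]


def paired_t(a, b):
    """t statistic of the per-experiment differences (b - a)."""
    diffs = [y - x for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def unpaired_t(a, b):
    """Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / standard_error


t_paired = paired_t(control, construct)
t_unpaired = unpaired_t(control, construct)
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

      In this invented example the between-experiment shift inflates the pooled variance, so the unpaired statistic is much smaller than the paired one even though the mean difference is identical. How large that gap is in practice depends entirely on the real data.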

      Reviewer #2 (Significance (Required)):

      The work in this paper builds on previous work from the same group and reinforces the notion that peptide aggregation is an important part of the sorting process that controls efficient delivery of certain proteins to nascent secretory granules, and suggests that short loops formed by disulfide bridges between closely apposed cysteine residues may be part of this sorting mechanism. The paper is of general cell biological interest, but perhaps of special interest to researchers working on professional secretory cells and mechanisms of secretory protein sorting and secretion. My own research focuses on stimulus-secretion coupling pathways in secretory cells and we primarily use live cell imaging approaches to visualize different steps of secretory granule biogenesis and release.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Since the small disulfide loop of the nonapeptide vasopressin has been previously demonstrated to play a role in the self-aggregation and secretory granule targeting of vasopressin precursor (Beuret et al., 2017), and as several other peptide hormones contain small disulfide loops, Reck and colleagues investigate in this study the requirement of small disulfide loops coming from four additional peptide hormones for the self-aggregation and secretory granule targeting of their precursors. Then, they studied the aggregation role of small disulfide loops in the ER and the TGN of two cell lines, COS1 and Neuro-2a. Using confocal and TEM, an aggregation has indeed been observed, although to different extents depending on the cell line. When fused to a constitutively secreted reporter protein, these disulfide loops induced their sorting into secretory granules, increased the stimulated secretion and Lubrol insolubility in endocrine AtT20 cells. All these results led the authors to hypothesize that small disulfide loops may act as a general device for peptide hormone aggregation and sorting, and therefore for secretory granule biogenesis.

      **Major comments:**

      The authors demonstrated the ability of small disulfide loops of peptide hormones to induce peptide precursor aggregation in ER using confocal microscopy, in COS1 and Neuro-2a cell lines, with a higher extent in COS1 cells. The authors have to moderate this conclusion and to include in their interpretation that distinct results may be due to the distinct secretory phenotype of these two cell lines: COS1 are epithelial cells, i.e. with a unique constitutive secretory pathway, while Neuro-2a as well as AtT20 cells also possess a regulated secretory pathway. Thus, the differences could be explained by the distinct molecular mechanisms involved in the formation of constitutive vesicles or secretory granules, and therefore aggregation and/or sorting processes could be distinct in the two cell types. We can also suggest to remove COS1-related results, to avoid hasty conclusions.

      As suggested, we will amend the text to point out that the two cell lines differ with respect to regulated secretion and to explain why they were used. COS-1 and Neuro-2a cells were previously used by Birk et al. (2009) to study ER aggregation of disease mutants of provasopressin. COS-1 cells were used because they are large with an extensive ER suitable for immunofluorescence microscopy. Neuro-2a cells are of neuroendocrine origin and thus more comparable to the cell types where ER aggregation of disease mutants of provasopressin or growth hormone was observed. However, the presence or absence of a regulated pathway has no relevance for ER aggregation experiments, since the different pathways diverge only at the TGN.

      The data and the methods can be reproduced and the experiments are adequately replicated, using timely statistical analysis.

      **Minor comments:**

      • Figure 3: to complete TEM study, the concomitant use of an ER specific antibody would definitely demonstrate that small disulfide loop-containing aggregates are linked to ER compartment.

      In our previous study Birk et al. (2009), we performed double-immunogold staining for provasopressin mutants and calreticulin to confirm aggregation in the ER. This anti-calreticulin antibody is unfortunately not commercially available anymore and other antibodies we tested were not suitable for immuno-EM. Instead, we colocalized PDI with CC-NP∆ constructs for immunofluorescence microscopy. Colocalization is so extensive that we believe EM confirmation to be unnecessary.

      • Along abstract, introduction and discussion sections, the authors should avoid to conclude on the role of small disulfide loops on secretory granule biogenesis, but rather limit their conclusion on prohormone aggregation and targeting. Indeed, the present study did not highlight any direct molecular / physical link between disulfide loops and TGN membrane to drive secretory granule formation.

      Granule biogenesis involves a number of processes including interaction of cargo components with the membrane and of the actomyosin complex with the forming buds, but also self-aggregation of cargo as functional amyloids. However, we will reword our statements in the Abstract avoiding the term "granule biogenesis".

      Reviewer #3 (Significance (Required)):

      • This study highlights small disulfide loops as novel signals for self-aggregating and secretory granule sorting of prohormone precursors in cells with a regulated secretory pathway. These results help to understand the molecular mechanism driving peptide hormone secretion, a physiological process which is crucial for interorgan communication and functional synchronization. Moreover, their previous study revealed that vasopressin small disulfide loop is involved in toxic unfolded mutant aggregation in the ER (Beuret et al., 2017), which highlights the clinical potential of the work.
      • Audience that might be interested in and influenced by the reported findings: cell biologists interested in cell trafficking, peptide hormone secretion
      • My field of expertise: secretory granule biogenesis, hormone sorting, secretory cells, neurosecretion.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      The manuscript has not yet been revised.

      3. Description of analyses that authors prefer not to carry out

      As indicated in the point-by-point response above, we consider additional analyses of in vitro aggregation with purified proteins to be beyond the scope of our study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Since the small disulfide loop of the nonapeptide vasopressin has been previously demonstrated to play a role in the self-aggregation and secretory granule targeting of vasopressin precursor (Beuret et al., 2017), and as several other peptide hormones contain small disulfide loops, Reck and colleagues investigate in this study the requirement of small disulfide loops coming from four additional peptide hormones for the self-aggregation and secretory granule targeting of their precursors. Then, they studied the aggregation role of small disulfide loops in the ER and the TGN of two cell lines, COS1 and Neuro-2a. Using confocal and TEM, an aggregation has indeed been observed, although to different extents depending on the cell line. When fused to a constitutively secreted reporter protein, these disulfide loops induced their sorting into secretory granules, increased the stimulated secretion and Lubrol insolubility in endocrine AtT20 cells. All these results led the authors to hypothesize that small disulfide loops may act as a general device for peptide hormone aggregation and sorting, and therefore for secretory granule biogenesis.

      Major comments:

      The authors demonstrated the ability of small disulfide loops of peptide hormones to induce peptide precursor aggregation in ER using confocal microscopy, in COS1 and Neuro-2a cell lines, with a higher extent in COS1 cells. The authors have to moderate this conclusion and to include in their interpretation that distinct results may be due to the distinct secretory phenotype of these two cell lines: COS1 are epithelial cells, i.e. with a unique constitutive secretory pathway, while Neuro-2a as well as AtT20 cells also possess a regulated secretory pathway. Thus, the differences could be explained by the distinct molecular mechanisms involved in the formation of constitutive vesicles or secretory granules, and therefore aggregation and/or sorting processes could be distinct in the two cell types. We can also suggest to remove COS1-related results, to avoid hasty conclusions.

      The data and the methods can be reproduced and the experiments are adequately replicated, using timely statistical analysis.

      Minor comments:

      • Figure 3: to complete TEM study, the concomitant use of an ER specific antibody would definitely demonstrate that small disulfide loop-containing aggregates are linked to ER compartment.
      • Along abstract, introduction and discussion sections, the authors should avoid to conclude on the role of small disulfide loops on secretory granule biogenesis, but rather limit their conclusion on prohormone aggregation and targeting. Indeed, the present study did not highlight any direct molecular / physical link between disulfide loops and TGN membrane to drive secretory granule formation.

      Significance

      • This study highlights small disulfide loops as novel signals for self-aggregating and secretory granule sorting of prohormone precursors in cells with a regulated secretory pathway. These results help to understand the molecular mechanism driving peptide hormone secretion, a physiological process which is crucial for interorgan communication and functional synchronization. Moreover, their previous study revealed that vasopressin small disulfide loop is involved in toxic unfolded mutant aggregation in the ER (Beuret et al., 2017), which highlights the clinical potential of the work.
        • Audience that might be interested in and influenced by the reported findings: cell biologists interested in cell trafficking, peptide hormone secretion
        • My field of expertise: secretory granule biogenesis, hormone sorting, secretory cells, neurosecretion.
    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript by Reck and colleagues aims at determining the importance of short disulfide loops for the correct sorting to, and release from, secretory granules. They utilize hybrid secretory proteins where sequences encoding disulfide loops from different hormones are cloned in frame with the same secretory peptide, and assess how the presence of the disulfide loop affects the ability of the protein to aggregate in the ER and to get sorted for secretion. By immunofluorescence analysis they show that the presence of a disulfide loop increases the ability of the peptide hormone to form aggregates in the ER, and these observations are confirmed by immunogold-EM. Importantly, aggregate formation is seen both in professional secretory (N-2a) and non-secretory (COS-1) cells. Using immunofluorescence and quantitative immunoblotting, they also show that the ability to aggregate the secretory proteins coincides with increased localization to secretory granules and with increased release from cells in response to stimuli.

      The results from this study are interesting and suggest that small disulfide loops may be an important part of the cargo sorting mechanism in secretory cells, and perhaps also a cause of sorting defects in certain diseases. The study is overall well conducted and worthy of publication after revision.

      Major comments:

      1) It is unclear to me what the relationship between the CC-loop and amyloid is. They are not involved in the formation of fibrils and amyloid, yet the authors conclude that they support the amyloid hypothesis of granule biogenesis. This must be clarified.

      2) What is the actual function of the CC-loops? The authors show that the loops promote aggregation of cargo proteins, yet the mechanism behind this is unclear. For example, would the proteins used in this study be able to aggregate in vitro (i.e. the CC-loop enable aggregation) or do they require some co-factor/chaperone? It would also be good if the authors could clarify or explain why some CC-loops cause aggregation and others not.

      3) The MS data in table 2 is very confusing, since half of the data points are missing. It is also not clear what the numbers in the table represent and if they are from a single experiment or multiple. As it is presented now, and as I interpret it, these results do not give support to the conclusion that CC loops form disulfide bonds. Since this is an important conclusion from the paper, these experiments need to be clarified, repeated or a different experimental approach used.

      4) As the authors state, it is well-known that the concentration of proteins in the ER will influence the ability to aggregate. In figure 1 and 2, the authors use transient overexpression to assess the ability of different CC-loops to induce aggregation in the ER. How were these results normalized to expression levels of the proteins? In later experiments the authors instead use stable cell lines expressing similar amounts of the different proteins. However, in these cells there is no obvious aggregation in the ER (see figure 4). It therefore becomes unclear what the role of ER aggregation for sorting to granules is.

      5) What is the basal secretion of the different proteins, i.e. how much goes through the constitutive secretory pathway and how much goes through the regulated secretory pathway? The authors should show the resting secretion (before BaCl2 addition) for all conditions tested instead of just the change in relation to control (i.e. the way data is presented now it is not possible to tell whether BaCl2 stimulation actually cause an increased release of the peptides).

      6) Lastly, the importance of CC-loops for the sorting of native peptides is unclear. The authors should test the importance of these loops for aggregation, sorting and secretion of a non-hybrid hormone with naturally occurring CC-loops (and a mutated version lacking the loop). This is important, since it is so far only shown that loops can affect the secretion of non-biologically relevant hybrid hormones.

      Minor comments:

      1) Stated that the 2x CC-loop constructs showed a positive effect in the cases of CCv and CCr, but this is not evaluated statistically.

      2) Explain the abbreviation POMC

      3) Figure 6D. Paired Student's t-test is not appropriate for determining significance when data is not paired (unpaired t-tests used throughout the rest of the paper).

      Significance

      The work in this paper builds on previous work from the same group and reinforces the notion that peptide aggregation is an important part of the sorting process that controls efficient delivery of certain proteins to nascent secretory granules, and suggests that short loops formed by disulfide bridges between closely apposed cysteine residues may be part of this sorting mechanism. The paper is of general cell biological interest, but perhaps of special interest to researchers working on professional secretory cells and mechanisms of secretory protein sorting and secretion. My own research focuses on stimulus-secretion coupling pathways in secretory cells and we primarily use live cell imaging approaches to visualize different steps of secretory granule biogenesis and release.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Apart from the default constitutive pathway for protein secretion, some specialized cells (e.g., neuroendocrine cells, exocrine cells, peptidergic neurons and mast cells) exhibit an additional regulated secretory pathway, in which peptide hormones are stored in a highly concentrated, ordered manner inside the electron-opaque "dense core" of secretory granules for long periods until secretagogue-mediated burst release. Although a general sorting receptor for packaging hormones into secretory granules has not yet been identified, self-aggregation in the trans-Golgi network (TGN) is a shared property of peptide hormones and a well-accepted potential sorting mechanism. Here the authors hypothesize that cysteine-containing small disulphide loops (CC loops), which are abundant in several hormone precursors, act as aggregation mediators in the TGN for sorting into secretory granules. They tested the aggregation propensity of a misfolded reporter protein, NPΔ, in the endoplasmic reticulum (ER) by attaching the CC loop segments of different hormones, which promoted the pathological aggregation in the ER of mutant provasopressin in the case of diabetes insipidus. Immunofluorescence and immunogold electron microscopy revealed accumulation of aggregates in the ER when NPΔ fused to CC loops of different hormonal origin was transiently expressed in COS-1 fibroblasts and Neuro-2a neuroblastoma cells. The authors have also shown that small disulphide loop-mediated functional aggregation in the TGN can sort a constitutively secreted protein, α1-protease inhibitor, into secretory granules. The rerouting capacity of the CC loops was tested in stably expressing AtT-20 cell lines by confirming their colocalization with CgA-positive secretory granules, by studying BaCl2-stimulated secretion, and by testing secretory granule-specific lubrol insolubility.

      Major comments:

      The study is highly impressive, and the results fully support the CC loop mediated hormone sorting hypothesis. However, it would be nice if the authors characterize the nature of the CC-loop mediated aggregates as hormones are reported to be stored inside secretory granules as functional amyloid (Maji et al., 2009). The mechanistic reason behind the small disulfide loop mediated aggregation was not explained in the paper. Authors may propose the probable molecular reasons behind CC loop mediated aggregation to completely justify their hypothesis.

      Although the hypothesis and the experimental results are highly impressive, the authors may consider adding the following experiments.

      The authors replaced CC-loop by the proline/glycine repeat sequence (Pro1) as a negative control which was previously reported to abolish aggregation as well. However, the authors may completely delete the small loop forming segment, CCv, and may check the status of His-tagged fused neurophysin II (NPΔ) segment as an additional negative control.

      To find the ultrastructure authors have done immunogold assay with anti-His antibody which indicated different CC loop mediated ER aggregation. Since the amyloid-like fibril nature of pro-vasopressin mutant mediated ER aggregates was previously reported (Beuret et al., 2017), authors must check the nature of the CC loop mediated ER aggregates with amyloid specific antibody. Since hormones are known to form reversible functional amyloid during their storage inside secretory granule, authors may consider characterizing the nature of the aggregates formed by CC loop fused constitutive protein in AtT-20 cell line by immunostaining, immunoprecipitation and dot blot assay using amyloid specific antibody.

      Minor comments:

      In the quantification study (Figure 2C) CCc and CCr showed almost similar ER aggregates (around 40%). But authors have commented that all constructs except CCc produce statistically significant increases in cells compared to background. Authors must clarify the statement.

      In the lubrol insolubility assay, the otherwise constitutively secreted protein A1Pimyc (negative control) showed 23% insolubility. The authors explained this observation by trapping of the protein inside granule aggregates. But the CCv and CCa fused proteins showed only a very slight increase (around 30%). Only the CCc construct showed more than 40% insolubility. If trapping of a constitutive protein can result in 23% insolubility, the insolubility data for all constructs except CCc are not sufficient to claim that the aggregated protein is secretory granule content. The authors must explain that.

      The authors have satisfactorily referenced prior studies in the field. However, the authors may consider adding the following papers as they are directly connected with the hypothesis. The sorting of the POMC hormone into secretory granules by a disulphide loop was previously studied (Cool et al., 1995). The N-terminal loop segment was also previously used to reroute a constitutive protein, chloramphenicol acetyltransferase (Tam and Peng, 1993). S. K. Maji and coworkers had previously shown, using cyclic somatostatin as a model peptide, that the disulphide bond maintains the native reversible functional amyloid structure relevant to hormone storage inside secretory granules, whereas disulphide bond disruption led to rapid irreversible amyloid aggregation (Anoop et al., 2014).

      Authors must check grammar and may reconstruct a few sentences where sentence construction seems complicated.

      Significance

      This manuscript has a significant contribution to enrich academia with fundamental research knowledge of hormone sorting mechanisms. Although constitutive and regulated secretory pathways are known for long times, the exact sorting mechanism is not yet elucidated. There is no common receptor identified yet for recruiting regulated secretary proteins inside the secretory granules.

      Aggregation in the TGN is a well-accepted mechanism for sorting. However, the triggering factor for aggregation is not yet known. This study has shed light on a novel hypothesis, which has considered intramolecular disulfide bond mediated small CC loop in hormone may act as aggregation mediator. Since many regulated secretory proteins contain the short disulphide loop, the hypothesis proposed in the manuscript is interesting.

      It has been confirmed that the TGN is the last compartment common to both the regulated and constitutive pathways (Kelly, 1985). No sorting mechanism is required for the constitutive pathway, as this is the default route, whereas the regulated secretory pathway requires a specific sorting mechanism for efficient packaging into secretory granules. There are two popular hypotheses about protein sorting in regulated secretory pathways: "sorting for entry" and "sorting for retention" (Blázquez and Kathleen, 2000). In "sorting for entry", hormones destined for the regulated secretory pathway start to form aggregates in the TGN-specific environment, excluding other proteins destined for the constitutive pathway. Arvan and Castle proposed the second mechanism, as some hormones, like proinsulin, are initially packaged with lysosomal enzymes in immature secretory granules (ISG) (Arvan and Castle, 1998). With time, however, they start to aggregate and the lysosomal enzymes are removed from ISG by small constitutive-like vesicles. Although aggregation is an essential sorting criterion in both mechanisms, the molecular events that lead to aggregation are not yet elucidated. TGN-specific environmental conditions, including pH (around 6.5), divalent metal ions (Zn2+, Cu2+) and glycosaminoglycans (GAGs), have the potential to trigger aggregation (Dannies, Priscilla S, 2012). Though each hormone has aggregation-prone regions in its amino acid sequence, there is no common amino acid sequence responsible for aggregation. The authors of this manuscript have pointed out the interesting observation that many hormones contain small disulfide loops which are exposed due to their presence at the N or C terminus or close to the processing site. Based on this observation, they hypothesized that CC loops may act as aggregation drivers for hormone sorting. The in-cell study with CC constructs from different hormones successfully rerouted a constitutively secreted protein into the regulated pathway, which supports their novel hypothesis.

      However, the hypothesis raises some questions regarding the molecular mechanism of CC loop-mediated aggregation. Why does the CC loop promote aggregation? Do the amino acid sequence and the size of the loop play a role in aggregation? The granular structures shown in the manuscript for different CC loops have different sizes and shapes (Figures 2 and 3). What is the reason for the structural heterogeneity of the CC loop-mediated dense core? Since the authors have shown CC loop-mediated aggregation in both functional and disease-associated aggregation, a very important aspect to address would be the structure-function relationship of the aggregates. Since the authors have rightly pointed out that not all hormones or prohormones contain a CC loop, another question concerns the sorting mechanism of those without one. The best part of the study is that it tries to explain the well-established aggregation-mediated sorting mechanism from a new perspective, which leaves room for many questions to be addressed by further research.

      From this study, the audience will learn about the role of the small disulfide loop in functional and disease-associated protein/peptide aggregation. The audience will also get an idea of the sorting mechanism in the regulated secretory pathway. Based on my expertise in protein aggregation related to human diseases and hormone storage, I see this manuscript as a fantastic addition to our understanding of the biogenesis of hormone secretory granules, with storage and subsequent release.

      References:

      Maji, Samir K., et al. "Functional amyloids as natural storage of peptide hormones in pituitary secretory granules." Science 325.5938 (2009): 328-332.
      Beuret, Nicole, et al. "Amyloid-like aggregation of provasopressin in diabetes insipidus and secretory granule sorting." BMC Biology 15.1 (2017): 1-14.
      Cool, David R., et al. "Identification of the sorting signal motif within pro-opiomelanocortin for the regulated secretory pathway." Journal of Biological Chemistry 270.15 (1995): 8723-8729.
      Tam, W. W., K. I. Andreasson, and Y. Peng Loh. "The amino-terminal sequence of pro-opiomelanocortin directs intracellular targeting to the regulated secretory pathway." European Journal of Cell Biology 62.2 (1993): 294-306.
      Anoop, Arunagiri, et al. "Elucidating the role of disulfide bond on amyloid formation and fibril reversibility of somatostatin-14: relevance to its storage and secretion." Journal of Biological Chemistry 289.24 (2014): 16884-16903.
      Kelly, Regis B. "Pathways of protein secretion in eukaryotes." Science 230.4721 (1985): 25-32.
      Blázquez, Mercedes, and Kathleen I. Shennan. "Basic mechanisms of secretion: sorting into the regulated secretory pathway." Biochemistry and Cell Biology 78.3 (2000): 181-191.
      Arvan, Peter, and David Castle. "Sorting and storage during secretory granule biogenesis: looking backward and looking forward." Biochemical Journal 332.3 (1998): 593-610.
      Dannies, Priscilla S. "Prolactin and growth hormone aggregates in secretory granules: the need to understand the structure of the aggregate." Endocrine Reviews 33.2 (2012): 254-270.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      FULL REVISION

      Manuscript number: RC-2021-00934

      Corresponding author(s): Seiya, Mizuno

      General Statements

      We would like to thank all the reviewers for their comments on improving the manuscript. We are encouraged by the overall positive responses from the reviewers. According to the reviewers’ comments, we have further refined our manuscript. We are confident that we have addressed all the reviewers’ comments and suggestions by incorporating them into the revised manuscript. We highlighted the changed text in the manuscript in red. The point-by-point responses to all comments follow.

      Point-by-point description of the revisions

      Reviewer 1:

      The study by Akihiro and colleagues describes the development of a multiplex genotyping method for detecting CRISPR gene-editing alleles using nanopore sequencing and a machine learning program. The method is based on long-range PCR amplification of the intended target loci from gene-edited animals, followed by nanopore sequencing. A PCR index is introduced into the sample pooling system before sequencing, thus allowing sequencing of up to 100 samples in one flow cell. The study developed a machine learning program for allele binning, analysis, and presentation. To demonstrate the applicability of the method, the study validated it for the detection of point mutations, deletions, and flox insertions. The study has in principle provided sufficient investigation and data to demonstrate the validity of the method. All the figures are very nicely and clearly presented. However, there are a few concerns that should be taken into consideration.

      We appreciate the constructive and important comments from the reviewer.

      Reviewer 1_Comment #1:

      Many previously reported unintended structural variations caused by CRISPR off-targets are typically much larger deletions, insertions, or translocations occurring outside the target sites. The current study is more for targeted allele genotyping. The use of the term structural variation (SV) throughout the study should be thoroughly reconsidered.

      SV typically refers to genomic variation of approximately 1 kb and above. What the study describes still falls within indel types instead. Thus, comparing DAJIN with NanoSV and Sniffles on reads with 50-, 100-, and 200-base deletions is not appropriate.

      The detection of SV alleles throughout the study most likely results from minor indel alleles and sequencing errors. In Figure 2b, BC32, the WT mouse also contains a proportion of SV alleles, which is apparently caused by sequencing error. Such SVs, which are not related to CRISPR gene editing, are also seen in other genotyping results, e.g. Figure 3a, Figure 4b, Figure 5c, and Figure 6b.

      Another factor that contributes to the SVs is PCR error from the method.

      Thank you very much for your comments. We agree that structural variation traditionally referred to genomic alterations larger than 1 kb in length. Although the application of sequencing technology has expanded the spectrum of structural variation to include smaller events >50 bp in length (PMID: 21358748, PMID: 26432246), there is no common understanding of what to call genomic rearrangements >50 bp in length that arise through genome editing. We renamed the unexpected mutation reads longer than approximately 50 bp “Large rearrangements (LAR)”. We changed the description of the names of the reads that DAJIN annotates in the Methods (Page 6, Line 205) and Results sections (Page 8, Line 249), as well as throughout the rest of the manuscript.

      Although we believe most of the LAR alleles are real alleles generated through genomic rearrangements (Fig. 3b&3c, S12, and S16), we recognize that minor fractions of the LAR alleles, including those observed in WT mice, are composed of reads with a high sequencing error rate. Visualized BAM files and consensus sequences can serve as indicators of the annotation results, informing users of DAJIN that minor alleles similar in proportion to the one in the WT sample may be artificial. We also cannot exclude the possibility that LAR alleles include those generated through PCR error. ‘Pseudo-LoxP’ alleles could be generated if PCR products that include LoxP on one side but not the other act as PCR primers that anneal to the WT allele in the next PCR step (Page 12, Line 425-427). Recently developed methods may address these limitations. We added a description in the Discussion section (Page 17-18, Line 608-620).

      Reviewer 1_Comment #2:

      The reason that the current method detects more than two alleles from one animal is probably chimerism of the animal. However, when looking at the BAM files and figures presented in Figure 1b, 2c, 3b, 3d, 4c, as well as those in the Supplementary figures, there seems to be more than one allele (indel reads of different sizes) in one category.

      For example, in Figure 2c, mouse BC12, the alignment is not fully consistent between all alleles and the allele 1 and allele 2 presented. For allele 1, which is called SV, there are reads with indels of different sizes. For allele 2, which is called intended PM, some reads are a hybrid of deletion and the intended substitution.

      Thank you for checking the data in detail. As the reviewer pointed out, some of the reads in each allele showed indels of different sizes. We think these indel mutations are due to nanopore sequencing errors. Although the error rate of nanopore sequencing has improved, an error rate of 5% has been reported for 1D sequencing with R9.4 flow cells, the same flow cells used in our study (DOI: 10.1002/wfs2.1323). In this study, DAJIN mitigated the nanopore sequencing errors by calculating the MIDS score (Fig. S7), but the visualization using the BAM file showed the raw reads, including the sequencing errors. For this reason, a single allele appears to include different indel alleles.

      To evaluate this point, we performed Sanger sequencing and found no hybrid sequences containing indel mutations, only the intended point mutation, in BC12 allele 2 (Fig. 2d). The results of Sanger sequencing suggest that the indel mutations visualized in the BAM file were due to nanopore sequencing errors. To clarify this point, we updated the description in the Discussion section (Page 15-16, Line 528-548).

      Reviewer 1_Comment #3:

      What is the advantage of the current method compared with the one previously reported by Bi et al., 2020, Genome Biology?

      Thank you for pointing this out. We believe that one of the advantages of IDM-seq, developed by Bi et al., is that it performs quantitative analysis by correcting PCR bias via Unique Molecular Identifiers (UMIs). However, when multiple samples are processed simultaneously, preparing primers for the UMIs is impractical in terms of cost and workability. While IDM-seq has the advantage of quantifying the precise amount of each allele in a single sample, DAJIN is more suitable for primary and comprehensive analysis of multiple genome-edited samples. We have described these points in the Discussion section (Page 15, Line 509-513).

      Reviewer 1_Comment #4:

      The reported machine learning method is developed for calling the different alleles. Have the authors compared DAJIN with, e.g., NanoCaller, which was developed for calling SNPs and small indels based on a DNN?

      We are thankful to the referee for bringing the comparison with NanoCaller to our attention. We ran NanoCaller and found that it detected the point mutation better than Medaka and Clair. However, because NanoCaller could not detect the LAR (formerly labelled as “SV”) alleles, it incorrectly reported the genotype of BC25 as 'point mutation', not 'LAR with point mutation'. We added the results of NanoCaller in Table S9 and described these points in the Results section (Page 10, Line 338-339).

      Reviewer 1_Comment #5:

      Apart from genotyping, many CRISPR studies performed in cells focus on profiling the indels in a pool of edited cells. Detecting different indel types in such samples and conditions would broaden the applicability of the method. Current methods, such as TIDE/ICE, NGS-based amplicon sequencing, and IDAA, can only detect smaller indels. DAJIN will add the advantage of detecting longer indels for such applications.

      Thank you very much for your comments. We added a description of the application of DAJIN in the Discussion section (Page 17, Line 592-596).

      Reviewer #1 Significance :

      Although similar methods have been reported for genotyping CRISPR editing outcomes, the current study introduces PCR barcoding and, in particular, a bioinformatic toolbox for allele binning and calculation, contributing a useful tool to the field. The study has demonstrated multiple applications, showing its broad applicability.

      Reviewer 2:

      CRISPR nucleases typically generate DNA double-strand breaks (DSBs) at the target site, which typically produce small insertions and deletions (indels), enabling precise gene knockout or knock-in. However, the accompanying DNA DSBs often induce unwanted large deletions or chromosomal translocations. Thus, assessing such large variations as well as small indels is crucial in the genome editing field. In this manuscript, the authors developed a long-range assessment tool, named Determine Allele mutations and Judge Intended genotype by Nanopore sequencer (DAJIN), using a long-read sequencer, Nanopore sequencing. Overall, the topic will be interesting for broad readers and this tool looks technologically sound. I would suggest a few comments that may strengthen this manuscript, as follows.

      We are grateful for the referee’s valuable suggestions to improve our manuscript.

      Reviewer 2_Comment #1:

      Another key study is missing from this manuscript. Recently, a tool with a similar concept to DAJIN was published in Nat Methods, which also uses long-read sequencers, Nanopore or PacBio [PMID: 33432244]. It is necessary to describe the benefits of DAJIN over the previous study.

      Thank you for pointing this out. Our method has an advantage over those utilizing unique molecular identifiers (UMIs) in its automatic identification and classification of genomic rearrangements including unexpected mutations in multiple samples obtained under different editing conditions (different target loci). As per our response to the Reviewer #1_Comment #3, one of the disadvantages of UMIs is the cost. More accessible methods of routine assessment of on-target genome editing outcomes are required, as well as unbiased assessment of editing products (PMID: 32643177). We showed in the manuscript that the machine-learning-based model could bypass molecular tagging to provide a feasible approach for routine assessment of genome editing outcomes. DAJIN will make a very significant contribution to speeding up and improving the accuracy of this experimental process.

      We agree that the approach reported by Karst et al. has certainly contributed to the generation of highly accurate single-molecule consensus sequences. Analysis of a small portion of samples using UMI-based methods may compensate for the limitations of DAJIN, such as PCR bias and/or PCR-mediated recombination, as you described in your comment #6. We added a description in the Discussion section (Page 15, Line 509-513; Page 17, Line 615-618).

      Reviewer 2_Comment #2:

      In Figure 1a, the authors used barcoding, but detailed information is not present in the main text. The length and context information need to be described in the main text.

      We thank the reviewer for these comments. According to the comments, we illustrated the process of PCR-based barcoding in Fig. 1a. Besides, we described the length of barcodes at "Library preparation and nanopore sequencing" in the Methods section (Page 4, Line 137 & 140).

      Reviewer 2_Comment #3:

      The term "SV (structural variation)" over "Single-nucleotide variant (SNV)" seems ambiguous. Does "SV" include large deletion and chromosomal translocation? In this manuscript, I guess that SNV indicates small indels, whereas SV indicates large indels. The detailed definition is needed for better understanding.

      Thank you very much for your comments. We intended to classify and label large genomic rearrangements, including large deletions and chromosomal translocations, as “SV (structural variation)”. We agree that structural variation traditionally referred to genomic alterations larger than 1 kb in length. Although the application of sequencing technology has expanded the spectrum of structural variation to include smaller events >50 bp in length (PMID: 21358748, PMID: 26432246), there is no common understanding of what to call genomic rearrangements >50 bp in length that arise through genome editing. We renamed the unexpected mutation reads longer than approximately 50 bp “Large rearrangements (LAR)”. We changed the description of the names of the reads that DAJIN annotates in the Methods (Page 6, Line 205) and Results sections (Page 8, Line 249), as well as throughout the rest of the manuscript.

      Reviewer 2_Comment #4:

      In Figure 2, IGV exhibits several SNVs (i.e., random errors) in each query sequence, which might be due to the low accuracy of Nanopore sequencing. I understand that DAJIN makes a consensus sequence based on those long-read sequences. But I wonder how DAJIN pinpoints the point mutation (PM) so exactly.

      Thank you for pointing this out. As you mentioned, the low accuracy of Nanopore long-read sequencing makes PM detection difficult. We tackled the issue and partly solved it by (i) calculating the MIDS score (Fig. S7), (ii) reducing the dimensionality of the data by principal component analysis (PCA), and (iii) setting proper parameters for HDBSCAN.

      DAJIN converts ACGT nucleotide information to MIDS (Match, Insertion, Deletion, and Substitution) (Fig. S6). Subsequently, DAJIN subtracts the relative frequency of MIDS between a control and a sample. We call the subtracted relative frequency the 'MIDS score' (Fig. S7). The subtraction mitigates the sequencing errors because the error patterns are similar between a sample and a control. We next perform clustering using the MIDS score. DAJIN compresses the score by PCA and extracts five dimensions. The dimension reduction may help mitigate sequencing errors because the sequencing errors have lower scores than true mutations. Subsequently, DAJIN performs HDBSCAN, a density-based clustering method. HDBSCAN has a parameter, 'min_cluster_size', that sets the minimum number of samples in a cluster. DAJIN searches values of this parameter in the range of 10% to 40% of reads and selects the one that returns the most frequent number of clusters. This means DAJIN ignores minor clusters that contain less than 10% of reads. We set this criterion because sequencing errors often produce such minor clusters.
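      As an illustration only, and not DAJIN's actual code, the MIDS score subtraction described above can be sketched in a few lines of Python. The sketch assumes that every read has already been converted to an equal-length string over the letters M, I, D, and S, and that the score is computed as sample minus control; the function names `relative_frequency` and `mids_score` and the toy reads are hypothetical.

```python
from collections import Counter

MIDS = "MIDS"  # Match, Insertion, Deletion, Substitution


def relative_frequency(reads):
    """Per-position relative frequency of each MIDS symbol.

    `reads` is a list of equal-length strings over M/I/D/S
    (an assumed, simplified encoding of aligned reads).
    """
    n_reads = len(reads)
    freqs = []
    for pos in range(len(reads[0])):
        counts = Counter(read[pos] for read in reads)
        freqs.append({symbol: counts[symbol] / n_reads for symbol in MIDS})
    return freqs


def mids_score(sample_reads, control_reads):
    """Subtract control frequencies from sample frequencies, position by position.

    The direction (sample minus control) is an assumption of this sketch.
    """
    sample = relative_frequency(sample_reads)
    control = relative_frequency(control_reads)
    return [
        {symbol: s[symbol] - c[symbol] for symbol in MIDS}
        for s, c in zip(sample, control)
    ]


# Toy data: both sets share a deletion "error" at position 2,
# while only the sample carries a substitution at position 1.
control = ["MMMM", "MMDM", "MMMM", "MMMM"]
sample = ["MSMM", "MSDM", "MSMM", "MSMM"]
for position, score in enumerate(mids_score(sample, control)):
    print(position, score)
```

      In this toy case the shared deletion cancels out while the substitution that is present only in the sample keeps a large score, which is the effect the subtraction is meant to achieve.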

      In summary, we consider that the MIDS score, PCA, and the parameter setting of HDBSCAN support DAJIN's accurate PM detection. To clarify this point, we updated the description in the Methods section (Page 7, Line 217-225).
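      The 'min_cluster_size' search can likewise be sketched under stated assumptions. The code below is not taken from DAJIN: the clustering step is abstracted as a caller-supplied function `count_clusters` (hypothetical) that returns the number of clusters found for a given 'min_cluster_size', and the number of candidate values and the tie-breaking rule (the smallest qualifying value wins) are illustrative choices.

```python
from collections import Counter


def choose_min_cluster_size(n_reads, count_clusters, steps=10):
    """Pick a min_cluster_size between 10% and 40% of the reads.

    `count_clusters(size)` is a placeholder for running a density-based
    clustering (e.g. HDBSCAN) with min_cluster_size=size and returning
    how many clusters it found.
    """
    lo = max(2, n_reads * 10 // 100)   # 10% of reads, at least 2
    hi = max(lo, n_reads * 40 // 100)  # 40% of reads
    step = max(1, (hi - lo) // steps)  # assumed search granularity
    sizes = list(range(lo, hi + 1, step))

    cluster_counts = [count_clusters(size) for size in sizes]
    # The "most frequent cluster number" across the searched values.
    most_common, _ = Counter(cluster_counts).most_common(1)[0]

    # Return the smallest size that yields that cluster number (assumption).
    for size, n_clusters in zip(sizes, cluster_counts):
        if n_clusters == most_common:
            return size, most_common


# Stand-in for a real clustering call: small sizes split off a minor cluster.
def fake_count(size):
    return 3 if size < 15 else 2


print(choose_min_cluster_size(100, fake_count))
```

      In a real pipeline, `count_clusters` would wrap the clustering of the PCA-compressed MIDS scores; it is kept abstract here so that the sketch does not depend on any particular library.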

      Reviewer 2_Comment #5:

      On page 9, the authors also used next-generation sequencing (NGS). I guess this NGS indicates Illumina-based short-read sequencing. A clearer definition is necessary here.

      We thank the referee for bringing this lack of clarity to our attention. Following the reviewer's comment, we changed 'NGS' to 'Illumina-based short-read next-generation sequencing' or 'short-read NGS' throughout the text.

      Reviewer 2_Comment #5-1:

      Whereas DAJIN could report SVs, PM, and WT, the NGS could not capture SVs. Could you write the reason here? I guess that the short-read sequences including SVs might be discarded during the alignment process, which would mean that this is a software limitation rather than a limitation of NGS itself.

      Thank you for pointing this out. In this study, we performed the short-read NGS analysis by paired-end sequencing (2 x 151 bases) of PCR amplicons about 200 bp in length. We consider that the main reason short-read NGS could not capture LAR (formerly labelled as “SV”) is the PCR process. Allele 2 in BC20, BC25, and BC26 of the Tyr point mutation had a large deletion that included the primer annealing sites, which makes it impossible to obtain a PCR amplicon of this allele. In addition, allele 1 in BC25 had insertions of about 60-70 bp. The insertion might make it difficult to amplify the full length of this allele because of the limited number of cycles in short-read NGS.

      To examine whether the short-read sequencing reads were discarded during the alignment process, we calculated the mapping percentages of BC20, BC25, and BC26 and found that 97-99% of reads were successfully aligned to the mm10 reference genome. We think this result can support our hypothesis. We added the results in Table S10 and described the points in the Results section (Page 10, Line 329-332).

      Reviewer 2_Comment #6:

      Basically, DAJIN amplifies the target region using PCR; thus, PCR bias (e.g. unequal amplification according to different lengths) should be considered. The authors should address it. Moreover, it would be better to describe the limitations of the current DAJIN in the Discussion section.

      Thank you very much for your comments. PCR amplification of genomic DNA is essential in the method described in the manuscript. As we have described in a paragraph of the Discussion section (Page 17, Line 597-601), we recognize that PCR bias is an unavoidable limitation. We also cannot exclude the possibility that large rearrangements (‘LAR’, formerly labeled as ‘SV’) include alleles generated through PCR and/or sequencing error. ‘Pseudo-LoxP’ alleles could be generated if PCR products that include LoxP on one side but not the other act as PCR primers that anneal to the WT allele in the next PCR step (Page 17, Line 608-613). We recognize that minor fractions of the ‘LAR’ alleles, including those observed in WT mice, are composed of reads with a high sequencing error rate. Recently developed methods, including the one you kindly mentioned in comment #1, may address these limitations. We added a description in the Discussion section (Page 17-18, Line 615-618).

      Reviewer #2 Significance:

      Overall, the topic will be interesting for broad readers

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      General comments

      CRISPR nucleases typically generate DNA double-strand breaks (DSBs) at the target site, which typically produce small insertions and deletions (indels), enabling precise gene knockout or knock-in. However, the accompanying DNA DSBs often induce unwanted large deletions or chromosomal translocations. Thus, assessing such large variations as well as small indels is crucial in the genome editing field. In this manuscript, the authors developed a long-range assessment tool, named Determine Allele mutations and Judge Intended genotype by Nanopore sequencer (DAJIN), using a long-read sequencer, Nanopore sequencing. Overall, the topic will be interesting for broad readers and this tool looks technologically sound. I would suggest a few comments that may strengthen this manuscript, as follows.

      Specific Comments:

      1. Another key study is missing from this manuscript. Recently, a tool with a similar concept to DAJIN was published in Nat Methods, which also uses long-read sequencers, Nanopore or PacBio [PMID: 33432244]. It is necessary to describe the benefits of DAJIN over the previous study.
      2. In Figure 1a, the authors used barcoding, but detailed information is not present in the main text. The length and context information need to be described in the main text.
      3. The term "SV (structural variation)" over "Single-nucleotide variant (SNV)" seems ambiguous. Does "SV" include large deletion and chromosomal translocation? In this manuscript, I guess that SNV indicates small indels, whereas SV indicates large indels. A detailed definition is needed for better understanding.
      4. In Figure 2, IGV exhibits several SNVs (i.e., random errors) in each query sequence, which might be due to the low accuracy of Nanopore sequencing. I understand that DAJIN makes a consensus sequence based on those long-read sequences. But I wonder how DAJIN pinpoints the point mutation (PM) so exactly.
      5. On page 9, the authors also used next-generation sequencing (NGS). I guess this NGS indicates Illumina-based short-read sequencing. A clearer definition is necessary here.
        • 5-1. Whereas DAJIN could report SVs, PM, and WT, the NGS could not capture SVs. Could you write the reason here? I guess that the short-read sequences including SVs might be discarded during the alignment process, which would mean that this is a software limitation rather than a limitation of NGS itself.
      6. Basically, DAJIN amplifies the target region using PCR; thus, PCR bias (e.g. unequal amplification according to different lengths) should be considered. The authors should address it. Moreover, it would be better to describe the limitations of the current DAJIN in the Discussion section.

      Significance

      Overall, the topic will be interesting for broad readers

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      The study by Akihiro and colleagues describes the development of a multiplex genotyping method for detecting CRISPR gene-editing alleles using nanopore sequencing and a machine learning program. The method is based on long-range PCR amplification of the intended target loci from gene-edited animals, followed by nanopore sequencing. A PCR index is introduced into the sample pooling system before sequencing, thus allowing sequencing of up to 100 samples in one flow cell. The study developed a machine learning program for allele binning, analysis, and presentation. To demonstrate the applicability of the method, the study validated it for the detection of point mutations, deletions, and flox insertions. The study has in principle provided sufficient investigation and data to demonstrate the validity of the method. All the figures are very nicely and clearly presented. However, there are a few concerns that should be taken into consideration.

      1. Many previously reported unintended structural variations caused by CRISPR off-targets are typically much larger deletions, insertions, or translocations occurring outside the target sites. The current study is more for targeted allele genotyping. The use of the term structural variation (SV) throughout the study should be thoroughly reconsidered.

      SV typically refers to genomic variation of approximately 1 kb and above. What the study describes still falls within indel types instead. Thus, comparing DAJIN with NanoSV and Sniffles on reads with 50-, 100-, and 200-base deletions is not appropriate.

      The detection of SV alleles throughout the study most likely results from minor indel alleles and sequencing errors. In Figure 2b, BC32, the WT mouse also contains a proportion of SV alleles, which is apparently caused by sequencing error. Such SVs, which are not related to CRISPR gene editing, are also seen in other genotyping results, e.g. Figure 3a, Figure 4b, Figure 5c, and Figure 6b.

      Another factor that contributes to the SVs is PCR error from the method.

      2. The reason that the current method detects more than two alleles from one animal is probably chimerism of the animal. However, when looking at the BAM files and figures presented in Figure 1b, 2c, 3b, 3d, 4c, as well as those in the Supplementary figures, there seems to be more than one allele (indel reads of different sizes) in one category.

      For example, in Figure 2c, mouse BC12, the alignment is not fully consistent between all alleles and the allele 1 and allele 2 presented. For allele 1, which is called SV, there are reads with indels of different sizes. For allele 2, which is called intended PM, some reads are a hybrid of deletion and the intended substitution.

      3. What is the advantage of the current method compared with the one previously reported by Bi et al., 2020, Genome Biology?

      4. The reported machine learning method is developed for calling the different alleles. Have the authors compared DAJIN with, e.g., NanoCaller, which was developed for calling SNPs and small indels based on a DNN?

      5. Apart from genotyping, many CRISPR studies performed in cells focus on profiling the indels in a pool of edited cells. Detecting different indel types in such samples and conditions would broaden the applicability of the method. Current methods, such as TIDE/ICE, NGS-based amplicon sequencing, and IDAA, can only detect smaller indels. DAJIN will add the advantage of detecting longer indels for such applications.

      Significance

      Although similar methods have been reported for genotyping CRISPR editing outcomes, the current study introduces PCR barcoding and, in particular, a bioinformatic toolbox for allele binning and calculation, contributing a useful tool to the field. The study has demonstrated multiple applications, showing its broad applicability.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: #RC-2021-00992

      Corresponding author(s): Parisa Kakanj and Maria Leptin

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors use the fruit fly as a model to understand the role and regulation of autophagy in epidermal integrity during development and wound healing. They discover that hyperactivation of autophagy via overexpression of Atg1 leads to disruption of epithelial organization and junctional protein localization, and to syncytium formation. In addition, these epidermal defects were found to be dependent on TORC1, as knockdown or inhibition of TORC1 antagonists resulted in similar epidermal defects, which could be rescued by knockdown of Atg1 or Atg5. Wound healing in fruit fly epidermis is known to induce cell fusion, and here the authors show that syncytium formation is dependent on autophagy. GFP-Atg8a autophagosomes were found to accumulate in cells adjacent to the wound site, but Atg1-induced syncytium formation was dispensable for wound repair. However, the authors found that hyperactivation of autophagy prior to injury slowed wound closure. This may be due to defects in actomyosin organization or another developmental defect the authors observed in the epidermis. Overall, the key conclusions of this study are convincing, but the experiments would be strengthened by validation of all the RNAi strains used as well as a demonstration that the epidermal barrier remains intact as described.

      **Major Comments**

      1. This study uses a number of UAS-RNAi strains as well as dominant negative and overexpression transgenes. There is no validation that these genetic perturbations work as expected.

        Almost all of the lines we use have been extensively used and validated by others as documented in the literature. We append a table (below, page 14) with these references. It would be close to impossible for us to show their tissue specific efficacy in the larval epidermis because it is extremely difficult to obtain clean dissections of epidermis without contamination from other tissues (muscles, nerves, etc.), and we believe we can rely on the known validation of most of the lines. It is true that some of the lines are less well characterised, and we comment on those below, and will eliminate our speculation on their effects in the manuscript.

      In fact, the authors state on pg 5 that RNAi to Atg6, Atg7, and Atg12 may be less effective, but do not verify the knockdown efficiency to the gene of interest (i.e. Atg5 RNAi knock downs Atg5 transcript or protein).

      The Atg12 and Atg7 RNAi lines have been shown (PMID: 25882046) by quantitative RT-PCR to effectively reduce RNA levels in the midgut during larval to pupal transition. We will therefore have to eliminate our speculation that the weak effect in the epidermis may be due to ineffective knock-down. Rather, it seems that these components are accessory but not necessarily essential for the completion of autophagy, as also observed by others (PMID: 25882046, PMID: 1805642, PMID: 23599123, PMID: 15296714, PMID: 23873149, PMID: 23406899).

      This is particularly important as authors use a single UAS-rictor RNAi strain to conclude that autophagy is dependent on TORC1 and not TORC2. If rictor RNAi is also weak or ineffective then this conclusion would be erroneous.

      The function of rictor has been validated by classic genetics: Animals homozygous for deletions of rictor show no defects throughout their normal life cycle (Hietakangas and Cohen, 2007). We have also shown that epidermis of homozygous rictor∆1 larvae (marked with Src-GFP, DsNuc-Red2) shows no abnormalities in cell shapes or cell membranes. We include an image here.

      **Figure A | Effect of rictor deletion on the epidermis.** a,b, Fluorescence micrographs of larval epidermis expressing the indicated markers in a larva homozygous for a rictor deletion (rictorEY08986, also named rictor∆1). a, Lower magnification showing the entire width of larval segments A3 or A4. n=16-18 larvae each genotype. Scale bars: a, 50 μm; b, 20 μm.

      A major conclusion of this study is that autophagy remodels the lateral cell membranes and not the basal or apical, so the membrane integrity remains intact. This is described and shown in Fig S3a, but it is hard to see that the apical membrane is intact. It would be helpful if authors could show a true membrane marker, such as UAS-CD8mGFP to see if there is a continuous membrane.

      We will include new experiments with this marker.

      Alternatively, is there a barrier assay that could help demonstrate that syncytium formation does not disrupt epithelial integrity?

      This follows from the fluorescence recovery we performed (Supplementary Video 13), where we observe rapid diffusion between areas in the epidermis, but never any leakage of fluorescence in the y-axis into the body cavity. We will emphasize this more clearly in the text.

      This could be performed in the fly gut, using the smurf assay (Rera M et al. 2011), since the authors also describe (pg 9) a similar role for autophagy in disruption of epithelial lateral membranes.

      We had done a smurf assay, and observed no leakage from the gut, but didn’t document this at the time because of difficulties during the period of Covid restrictions of accessing a dissecting scope/camera set up in a lab outside our own. We will try to repeat this now in the hope that with current reduced restrictions we can record the result.

      Is autophagy dependent syncytium formation cell autonomous?

      Our clonal analysis in wound healing addresses this point (Figure 2; text page 5 and 6). Clones of GFP-expressing cells neighbouring a wound share their cytoplasmic contents with their neighbours during wound closure. However, a clonal cell that is Atg5-deficient in a wild-type background does not share its content with the neighbouring cells. This shows that for a cell to participate in syncytium formation, the cell itself has to be competent to perform autophagy. We will expand the explanation of this point in the text.

      The A58-Gal4 driver is not cell-type specific, as authors describe (pg 9) similar effects in trachea, salivary glands, and intestine, and it is unclear if effects are due to disruption of autophagy in epidermal cells or general disruption in the fly's physiology. The authors should determine, using a more restrictive Gal4 driver, whether syncytium formation is due to activation of autophagy in the epidermal cells or another cell type (trachea, salivary glands, or intestine).

      We apologize if our phrasing of ‘ectodermal’ led to the impression that A58-Gal4 is cell-type specific. A58 also drives expression in the tracheal system, as all other available epidermal drivers do. A58 expression in the salivary gland is presumably due to the origin of the Gal4 construct, which like many other Gal4 drivers (e.g. NP1-Gal4) includes salivary gland specific enhancers (PMID: 8223268 and PMID: 12324947). A58 is not active in the gut, and for the experiments in the gut we used the NP1 driver. We will rephrase the text in the paper to avoid confusion. There is no driver that is absolutely restricted to the epidermis.

      Alternatively, if no other Gal4 is available for the larval epidermis then authors could at least show, using an enterocyte driver (NP1-Gal4), that overexpression of Atg1 is sufficient to induce syncytium formation, and examine its effect on gut barrier integrity.

      We did do this experiment but didn’t include the images because of the large number of figures we already had. We now show them here. As mentioned above, barrier integrity is not compromised. We can also provide images of the phenotype in tracheal cells.

      **Figure B | Effect of uncontrolled autophagy on enterocytes and salivary glands.** Larval gut or salivary glands expressing the indicated markers and overexpression (Tsc1,2 or Atg1S) or RNAi (raptori) constructs using the NP1-Gal4 driver. Images are from live imaging of gut or salivary gland of 6 to 11 larvae for each genotype. Scale bars, 20 μm.

      In Fig 8, authors nicely show that Atg1 RNAi can rescue Tor RNAi and raptor RNAi, but what about the reverse? Is overexpression of Tor sufficient to inhibit the effect of Atg1 overexpression and reduce autophagy-induced syncytium formation?

      Overexpression of Tor would affect both TORC1 and TORC2. We have done this experiment using a UAS-Torwt construct but found that it leads to excessive autophagy rather than suppression, consistent with similar results reported by others (PMID: 12324961 and PMID: 15186745). This approach can therefore not be used to do the proposed experiment. Instead, one could use downregulation of the Tor inhibitor TSC1, which acts on TORC1, and which we have shown to reduce autophagosome formation in wound healing (Fig. 1d). Another option is to overexpress the TORC1-specific activator Rheb (PMID: 12893813, PMID: 17208179 and PMID: 31422886). We will set up the experiments with these constructs in the hope that they will yield interpretable results.

      **Minor comments:**

      1. Check spelling of abbreviations, Sqh is often misspelled Shq in figures

        We will correct them. Thanks for alerting us.

      The order of images in Figure 3 should match the description in the text (pg. 6).

      We would prefer to retain the current order because it is then consistent with all the other figures. Re-writing the text to reflect this order would make it less clear.

      Atg1W is described in text, but not shown in Fig 3a-c. Also, upstream activators of TORC1 are described first, but shown last in this Figure, making it difficult to follow.

      We will now only mention Atg1W later in the text where we also show it in a figure.

      Fig7a should show junctional effect of Atg1W alone and in combination with Atg5i which is used in 7b.

      We had left this out to save space, but we will now include these data.

      It is unclear why authors switched to this weak overexpression for this photobleaching assay when Atg1S was predominantly used in the rest of the study.

      The reason we used Atg1W in this particular experiment is that we had it on a chromosome where it was recombined with GFP, which made it genetically much easier to use for FLIP experiments. However, perhaps these constructs merit some discussion. Atg1W and Atg1S were originally called “weak” and “strong” based on studies in other tissues and other stages (PMID: 33253201). However, we found that in the epidermis their effects are practically indistinguishable, as judged by TEM (Fig. 3d,e; Fig. 5e,f; Suppl. Fig. 5a,b and Suppl. Fig. 6b,c) and all markers we used in confocal analyses (which we will include). Thus, to avoid confusion, we will change the nomenclature we use in our paper to the neutral Atg1GS and Atg16B.

      Reviewer #1 (Significance (Required)):

      This study elucidates the role and regulation of TORC1 and autophagy in epithelial membrane remodeling. This is important work that is significant to both developmental and wound healing research. Many cell types become multinucleate during differentiation, aging, and wound healing and here the authors find a novel role for autophagy in remodeling lateral cellular junctions to facilitate syncytium formation.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their present manuscript Kakanj and colleagues show that during epithelial wound healing autophagy pathway controls plasma membrane integrity and homeostasis. Furthermore, elevated autophagic activity is sufficient to induce syncytium formation, which is essential for wound closure and healing. Authors used the epidermis of fruit fly larvae as model to study wound healing and video microscopy to examine this process. The methodology is well established, since authors already published a related study in 2016 using similar tools.

      The findings presented here are interesting and promising, the quality of most experiments are satisfactory, the confocal images/videos are excellent and I truly appreciate that authors used electron microscopy to support some of their claims. Findings are well presented and the text is well written and easy to read.

      Overall, my opinion is very positive about this manuscript.

      I believe most of the findings are very well supported, but I have some suggestions which may help strengthen the authors' points.

      1) Authors used the GFP-Atg8a reporter to follow autophagy during wound healing. While I also believe that the GFP-Atg8a dots appearing after wounding represent autophagic vesicles, GFP-Atg8a has certain limitations. First: Atg8a (or LC3 in mammals) is removed from the outer surface of autophagosomes by Atg4, and the Atg8a trapped inside the autophagosomes will be degraded in the autolysosomal lumen. Thus Atg8a does not always localize to autolysosomes; in fact, Atg8a immunostaining mostly labels autophagosomes (and phagophores) but not autolysosomes in insect cells. Accordingly, the GFP-Atg8a reporter is also subject to autolysosomal degradation, and furthermore most of the GFP signal is quenched in the acidic lumen of autolysosomes, since GFP loses fluorescence at lower pH. Nevertheless, if lysosomal degradation proceeds normally, GFP-Atg8 will be degraded completely. Thus, some of the autolysosomes cannot be detected using this reporter. For this purpose mCherry-Atg8a reporters can be used, since mCherry is more resistant than GFP and thus accumulates inside lysosomes and retains its fluorescence in acidic environments.

      This is a good suggestion and we had done these experiments. However, the red fluorophores have a serious problem in that they all tend to form small aggregates or puncta – not in all tissues and at all stages, but this is a very widespread phenomenon, and is even observed in in vitro experiments (own observations). This makes quantification of vesicles or other small structures such as autophagosomes completely impossible. Nevertheless, here are a few figures from our analyses. While some of the spots clearly appear to be autophagosomes, as judged by their positions, they cannot be objectively distinguished from the other spots.

      **Figure C | Autophagy during epidermal wound healing.** Time-lapse series of single-cell wound healing in larva expressing mCherry-Atg8a (black) to mark autophagosomes and autolysosomes (A58>mCherry-Atg8a). a, z-projections of a time-lapse series. b, Higher magnification of the areas marked by magenta boxes in (a). n=11 larvae. Each frame is a merge of 57 planes spaced 0.28 μm apart. Scale bars: a, 20 μm; b, 10 μm.

      However, I still believe that for video microscopy GFP-Atg8a was a perfect choice; I just suggest confirming the appearance of autophagosomes after wounding by other means: for instance, immunostaining of the epidermis against Atg8a after wounding (120 min) should confirm the presence of autophagosomes. There are a few specific antibodies available that work in flies, which are listed in the reviews of Nagy (PMID: 25481477) or more recently in Lorincz (PMID: 28704946).

      This is technically a huge challenge. We would have to induce a single cell wound, then filet and fix the epidermis, during which it rolls up and often destroys the area of interest. If it doesn’t, then the prep can be flattened out, but it still can be very difficult to find the wound in the prep.

      2) One of the major claims of the authors is that elevated autophagy leads to the breakdown or removal of lateral plasma membranes to promote syncytium formation. It is clearly seen on the confocal or EM images that lateral membranes disappear after wounding. However, it is also suggested that the lateral plasma membrane material is incorporated into autophagosomes or that the plasma membrane is a potential membrane source of autophagosome formation. I believe this is the least supported claim of the manuscript since no direct evidence for this is presented. This is based on BODIPY staining only, namely that BODIPY-positive vesicles accumulate inside the cells. If this is indeed the case, plasma membrane components should be detected in autophagic vesicles. Thus, I recommend co-staining membrane components with autophagic markers.

      This is indeed the clear next step, and we did a number of experiments along those lines, but they were once again compromised by the problem with the mCherry aggregates. This made the interpretation in the unwounded epidermis with artificially upregulated autophagy impossible. However, experiments with naturally upregulated autophagy in wound healing yielded results that are consistent with plasma membrane components being associated with autophagosomes (with the caveat that not every red dot we see represents an autophagosome). We have just repeated some of these using the septate junction marker FasIII and have obtained some beautiful movies that show FasIII labelled membrane (green) being surrounded by mCherry spots, and as the membrane begins to dissociate, the mCherry spots turn from red to yellow. We have included stills from results of these analyses here and will include them in a new figure in the revised manuscript.

      **Figure D | Colocalization of Atg8a and the septate junction component FasIII during epidermal wound healing.** a, Time-lapse series of single-cell wound healing in a larva expressing mCherry-Atg8a (red) (A58>mCherry-Atg8a) and endogenously tagged FasIII (GFP gene trap; green), a transmembrane component of septate junctions. b, Higher magnification of the time-lapse marked by magenta boxes in (a). n=11 larvae. a,b, Each frame is a merge of 68 planes spaced 0.28 μm apart. Scale bars: a,b, 20 μm.

      However, if authors observe no colocalization of plasma membrane components with autophagy markers, I still believe this study is worth publishing. I would like to recommend the review of Ungermann and Reggiori (PMID: 29966469) in which the trafficking of Atg9 is discussed,

      Yes, indeed. And there is in fact now a further paper that goes in a similar direction (PMID: 34257406). We had left this out because we did not have direct data on Atg9, but will be happy to include it in the discussion in which we cite the paper that shows that Drosophila Atg9 is localized on the lateral plasma membrane in nurse cells, and loss of it leads to syncytium formation.

      since the source of autophagosomal Atg9 is in part the plasma membrane in mammalian cells. Therefore, these findings may strengthen the authors' claims.

      **Minor points:**

      Figure 2A: I believe authors wanted to use the word 'maintaining' not mating in their scheme.

      Indeed. Thanks for alerting us.

      Discussion: Authors suggest that another function of autophagy in the cells surrounding the wound may be to clear up debris, as in planarian and other cell types autophagy is activated in healthy cells, which simultaneously phagocytose cell debris. Honestly, I do not believe that this is the case here. Some of the Atg proteins are indeed required for phagocytosis during LC3-associated phagocytosis (LAP) (see: PMID: 30787029), but LAP is independent from Atg1

      Good point, we will include this in the discussion.

      and if LAP happened in the cells surrounding the wound, then GFP-Atg8a positive phagosomes would appear in those cells. However, it is clearly not the case here.

      Reviewer #2 (Significance (Required)):

      I highly recommend this manuscript to be uploaded to a relevant journal and I believe the findings presented here will be interesting for biologists specialized in regeneration and readers from the autophagy fields alike.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The larval epidermis of Drosophila is a prime model for studying wound healing by combining live imaging with cellular, genetic and molecular analysis of the processes involved. Autophagy is known to be activated and necessary for efficient wound healing in animal models through secretion of cytokines and clearance of bacteria. This manuscript implicates autophagy in cellular syncytium formation during wound healing. Live imaging demonstrates autophagy activation in cells surrounding the wound. Inhibition of autophagy by RNAi against atg1 or atg5, required for autophagy initiation and autophagosome formation, had no effect on the rate of constriction and closing of the wound site. However, elegant live imaging demonstrates that autophagy is required cell autonomously for cell fusion, a normal process during wound healing in flies. Autophagy can also be instructive for cell fusion. Strong induction of autophagy by TORC1 inhibition, TSC1/2 overexpression or Atg1 overexpression induces cell fusion that is genetically dependent on atg5, a gene acting downstream of atg1 in autophagosome formation. As treatment with Chloroquine, a chemical inhibiting autophagosome fusion to the lysosome and lysosomal breakdown, showed no effect, the authors suggest that later steps of autophagy are not involved. Live imaging with a selection of fluorescently tagged markers of apical, lateral and basolateral membrane domains, combined with electron microscopy, shows clearly that lateral membranes are disrupted and removed within the epithelium. During this process, large membranous vesicles "drift" away from the plasma membrane. Whether these vesicles relate to autophagy is not addressed. In addition to the effect on cell fusion, strong autophagy induction also leads to autophagy within the nucleus, chromatin condensation and distortion of the nuclear membrane. The manuscript is well written and easy to follow. Figure panels and data are clearly presented. All experiments are well described throughout and skillfully executed with appropriate controls and statistical analysis. It remains unknown what induces autophagy in response to wounding. It also remains unclear whether autophagy deconstructs or engulfs parts of the plasma membrane, or if parts of the autophagy machinery have additional roles in plasma membrane fusion.

      **Major comments:**

      • Are the key conclusions convincing? -Conclusions are generally balanced and convincing.

      -I have seldom seen a paper so well written, presented and balanced on first pass. Hence my experimental suggestions are few.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? -Claims are well founded.
      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation.

        -The inhibition of autophagy is performed using knockdown of two genes: atg1 (a part of the ULK1 kinase complex), acting in autophagy initiation, and atg5, required for autophagosome formation. Genes acting later in the autophagy process, such as in autophagosome closure, fusion with the lysosome or degradation, were not analyzed. In the abstract, the authors state "Proper functioning of TORC1 is needed to prevent autophagy from destroying the larval epidermis which depends on membrane isolation and phagophore expansion, but not fusion of autophagosomes to lysosomes". As far as I can see, the last statement on fusion derives from experiments with Chloroquine. Although frequently used for qualitative experiments, CQ is not suited for conclusive experiments. Without genetic experiments targeting genes for autophagosome-lysosome fusion such as snap29, stx17, vamp7, this statement is in my mind not well supported.

      We agree this would strengthen our findings, and we had indeed ordered these strains from the Bloomington stock collection. However, they were dead on arrival and both our labs in Heidelberg and Cologne currently have major problems with shipments from Bloomington and German customs. Other colleagues whom we asked did not have them available either. We will continue to search for appropriate constructs, but even if we find them and they arrive alive, and are processed by customs within a reasonable time, it will take many weeks to establish and then expand them and subsequently do the multi-generation crosses to obtain the stocks with all the relevant drivers and markers to set up the experiment. Three months is the absolute lower limit provided everything works according to plan, and first time round 6 months is a more realistic assumption. We hope that the referees and the editors agree that while this is a desirable experiment, it is not essential for the publication of the other results we present.

      • Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments. -Given the expertise of the authors, these experiments should be easy to perform within 3 months.

        • Are the data and the methods presented in such a way that they can be reproduced?

        • The manuscript is well written and an excellent example of how methods and experiments should be presented. Methods, tools and experiments are all well described.

        • Are the experiments adequately replicated and statistical analysis adequate? -Replicates and statistics are adequate and customary for the type of analysis performed.

        **Minor comments:**

      • Specific experimental issues that are easily addressable. Figure 3 h. The live imaging documents the striking disappearance of lateral cell membranes using SRC-GFP. In 3h, large vesicle formation and movement towards the cell interior is shown. How frequent is this?

      This can only be seen clearly in experiments with time-controlled (Gal80ts) induction of autophagy where we can observe the process unfolding. We see these structures very frequently, but there is great variability in morphology, and the structures are not always captured clearly in the plane of imaging. We here provide further examples.

      **Figure E | Autophagy in unwounded epidermis.** a-c, Three additional examples showing apparent extrusions from lateral membranes after induction of autophagy (same experiment as in Figure 3h). Time-lapse series of epidermal cells expressing Src-GFP and Atg1S. Transgene expression is induced at the end of the second larval instar, live imaging started 6 h later (t=0) and continued for an additional 6 hours. a-c, Src-GFP containing material appears to be taken out of and eventually detached from lateral cell membranes (arrows).

      Is this believed to be the mechanism of lateral membrane removal?

      We would of course like to believe that, but we have no proof, and would therefore only be able to speculate.

      If so, is it dependent on the autophagy machinery? Are these vesicles positive for autophagy markers?

      Some autophagy markers have indeed been reported to be associated with the plasma membrane (e.g. Atg9, Atg16), but a conclusive study, while highly desirable, in our view goes beyond the scope of this study.

      Resolving this issue may lift the conclusions of the paper. Using 3xCherry-Atg8 together with SRC-GFP, this should be possible.

      We are intrigued by this suggestion and will be setting up the necessary crosses to do the experiments. However, it will take a long time to generate the necessary stocks (see genetics described below), and we will then again encounter the problem with the mCherry aggregates (see response to referee #2). We are curious about the outcome, but we do not think it will be reasonable to promise as part of this revision that we will be able to provide conclusive results in the foreseeable future. Along with the many other things to do, this may just have to become part of a future paper, especially if there turn out to be other problems to be solved along the way, for example having to make an infrared Atg8 construct (using a fluorophore such as mIFP or mKate, with which we have had much better experience in other contexts).

      Using CQ, the authors should be able to detect plasma membrane and junctional components in autophagosomes or autolysosomes (by confocal and electron microscopy) as degradation is inhibited. This should help to distinguish whether lateral membranes are engulfed and digested or if cells simply fuse, by using a part of the autophagy machinery.

      We have many interesting EM images on which we have had extensive discussions with Paolo Ronchi and Yannick Schwab at the EMBL (whom we embarrassingly forgot to acknowledge in our manuscript, which will now be corrected), and one of the authors of this paper (BM) is an expert in EM imaging of the larval epidermis. It was agreed that some structures could indeed be interpreted as autophagosomes with content resembling junctional material. However, in the absence of absolute proof, we did not include them in the paper. We now show them here.

      **Figure F | Autophagosomes with junctional material in unwounded epidermis.** Transmission electron micrographs of sections through the epidermis of a larva with elevated autophagy (A58>Atg1S) at two different magnifications. Arrows mark the autophagosomal membrane with content resembling junctional material.

      The authors state that strong autophagy activation also leads to syncytium formation of tracheal cells, salivary glands and gut EC cells. Representative images in a supplementary figure would be useful for future reference.

      See response to other comments above (response to referee #1). We have added some images in this document (Figure B) and will be happy to add additional ones in the revised manuscript.

      • Are prior studies referenced appropriately? -Yes. Key literature and findings are cited and discussed.

      • Are the text and figures clear and accurate? -Yes

        • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      -See suggested experiments above.

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. -The findings clearly document a role of autophagy in syncytium formation in the physiological process of wounding. This has parallels to muscle syncytium formation, but has to my knowledge not been demonstrated in any other cell type to be performed by autophagy. Moreover, the authors show that strong autophagy induction can lead to fusion of epithelial cells. This may have relevance for processes and diseases where polyploidy is observed.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      • State what audience might be interested in and influenced by the reported findings. -The data are very strong and the demonstration that autophagy controls syncytium formation outside of muscle development is surprising and significant. It is of interest to the field of cell biology and development in general and the autophagy field in particular. It will also be of interest for the medical field that deals with multinuclear phenotypes, such as cancer.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. -Development, cell signaling, autophagy, vesicle trafficking.

      Table 2 | Fly stocks used in experiments

      Transgenes

      Stock ID

      Source

      Publications using this construct

      Reference

      UAS-GFP-Kuk

      (UAS-GFP-KukEY07696(w+))

      Jörg Großhans

      PMID: 16421189

      https://flybase.org/reports/FBal0161312

      29

      UAS-Atg1i

      (UAS-Atg1RNAi)

      V # 16133

      (GD7149)

      PMID: 19363474

      PMID: 31995752

      PMID: 32032548

      PMID: 32915229

      https://flybase.org/reports/FBtp0034071.html

      UAS-Atg5i

      (UAS-Atg5RNAi)

      V # 104461

      (KK108904)

      PMID: 31995752

      PMID: 32032548

      https://flybase.org/reports/FBtp0046851.html

      UAS-Atg6i

      (UAS-Atg6RNAi)

      V # 110197

      (KK102460)

      PMID: 28581519

      PMID: 23599123

      PMID: 27542914

      PMID: 25644700

      Dissertation of Philipp Trachte, Abb. 23. https://refubium.fu-berlin.de/handle/fub188/27709

      Dissertation of Sirena Soriano Rodríguez. https://roderic.uv.es/bitstream/handle/10550/50749/Tesis%20SSoriano.pdf?sequence=1

      UAS-Atg7i

      (UAS-Atg7RNAi)

      V # 45558

      (GD11671)

      PMID: 25882046

      PMID: 31995752

      PMID: 32032548

      PMID: 23599123

      https://flybase.org/reports/FBtp0025106.html

      UAS-Atg12i

      (UAS-Atg12RNAi)

      V # 29791

      (GD15230)

      PMID: 25882046

      PMID: 17568747

      PMID: 31995752

      https://flybase.org/reports/FBtp0027770.html

      UAS-TSC1,2

      (UAS-TSC1, AUS-TSC2)

      Iswar K. Hariharan

      PMID: 15296714

      PMID: 11348592

      64

      UAS-TSC1i

      (UAS-TSC1RNAi)

      V # 22252

      (GD11836)

      PMID: 23144631

      PMID: 29144896

      PMID: 29456138

      https://flybase.org/reports/FBtp0025266.html

      UAS-Tori

      (UAS-TorRNAi)

      BL # 33951

      Nobert Perrimon

      PMID: 25882046

      PMID: 26395483

      https://flybase.org/reports/FBtp0065159.html

      65

      UAS-TORDN

      (UAS-TORTED)

      BL # 7013

      Thomas P. Neufeld

      PMID: 15296714

      PMID: 29144896

      https://flybase.org/reports/FBtp0016360.html

      66

      UAS-raptori

      (UAS-raptorRNAi)

      BL # 34814

      Nobert Perrimon

      PMID: 25882046

      PMID: 31048465

      https://flybase.org/reports/FBtp0068814.html

      65

      UAS-raptori-2

      (UAS-raptorRNAi)

      BL # 41912

      Nobert Perrimon

      PMID: 32097403

      https://flybase.org/reports/FBtp0081336.html

      65

      UAS-rictori

      (UAS-rictorRNAi)

      BL # 36699

      Nobert Perrimon

      PMID: 25882046

      https://flybase.org/reports/FBtp0070835.html

      65

      UAS-Atg1S

      (UAS-Atg16B)

      Thomas P. Neufeld

      PMID: 33253201

      https://flybase.org/reports/FBtp0041043.html

      67

      UAS-Atg1W, UAS-GFP

      (UAS-Atg1GS10797)

      Thomas P. Neufeld

      PMID: 33253201

      https://flybase.org/reports/FBal0216676.html

      67

      UAS-S6Ki

      (UAS-S6KRNAi)

      BL # 41895

      Norbert Perrimon

      PMID: 25284370

      https://flybase.org/reports/FBtp0080798.html

      65

      UAS-SqaKA

      (UAS-SqaT279A/CyO)

      Guang-Chao Chen

      PMID: 21169990

      https://flybase.org/reports/FBtp0071419

      30

      UAS-RhoAi

      (UAS-RhoARNAi)

      V # 12734

      (GD4726)

      PMID: 23853710

      PMID: 33789114

      https://flybase.org/reports/FBtp0031970.html

      UAS-Roki

      (UAS-RokRNAi)

      V # 104675

      (KK107802)

      PMID: 24995985

      PMID: 33789114

      https://flybase.org/reports/FBtp0046110.html

      UAS-RhebAV4

      BL # 9690

      Fuyuhiko Tamanoi

      PMID: 31909714

      PMID: 28829944

      https://flybase.org/reports/FBal0141561.html

      69

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The larval epidermis of Drosophila is a prime model for studying wound healing by combining live imaging with cellular, genetic and molecular analysis of the processes involved. Autophagy is known to be activated and necessary for efficient wound healing in animal models through secretion of cytokines and clearance of bacteria. This manuscript implicates autophagy in cellular syncytium formation during wound healing. Live imaging demonstrates autophagy activation in cells surrounding the wound. Inhibition of autophagy by RNAi against atg1 or atg5, which are required for autophagy initiation and autophagosome formation, had no effect on the rate of constriction and closing of the wound site. However, elegant live imaging demonstrates that autophagy is required cell autonomously for cell fusion, a normal process during wound healing in flies. Autophagy can also be instructive for cell fusion. Strong induction of autophagy by TORC1 inhibition, TSC1/2 overexpression or Atg1 overexpression induces cell fusion that is genetically dependent on atg5, a gene acting downstream of atg1 in autophagosome formation. As treatment with chloroquine, a chemical that inhibits autophagosome fusion with the lysosome and lysosomal breakdown, showed no effect, the authors suggest that later steps of autophagy are not involved. Live imaging with a selection of fluorescently tagged cellular markers of apical, lateral and basolateral membrane domains, combined with electron microscopy, shows clearly that lateral membranes are disrupted and removed within the epithelium. During this process, large membranous vesicles "drift" away from the plasma membrane. Whether these vesicles relate to autophagy is not addressed. In addition to the effect on cell fusion, strong autophagy induction also leads to autophagy within the nucleus, chromatin condensation and distortion of the nuclear membrane. The manuscript is well written and easy to follow. Figure panels and data are clearly presented. All experiments are well described throughout and skillfully executed with appropriate controls and statistical analysis. It remains unknown what induces autophagy in response to wounding. It also remains unclear whether autophagy deconstructs or engulfs parts of the plasma membrane, or whether parts of the autophagy machinery have additional roles in plasma membrane fusion.

      Major comments:

      • Are the key conclusions convincing? -Conclusions are generally balanced and convincing. -I have seldom seen a paper so well written, presented and balanced by first pass. Hence my experimental suggestions are few.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? -Claims are well founded.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation.

      -The inhibition of autophagy is performed using knockdown of two genes: atg1, which acts in autophagy initiation (as part of the ULK1 kinase complex), and atg5, which is required for autophagosome formation. Genes acting later in the autophagy process, such as those required for autophagosome closure, fusion with the lysosome or degradation, were not analyzed. In the abstract, the authors state "Proper functioning of TORC1 is needed to prevent autophagy from destroying the larval epidermis which depends on membrane isolation and phagophore expansion, but not fusion of autophagosomes to lysosomes". As far as I can see, the last statement on fusion derives from experiments with chloroquine (CQ). Although frequently used for qualitative experiments, CQ is not suited for conclusive experiments. Without genetic experiments targeting genes for autophagosome-lysosome fusion such as snap29, stx17 or vamp7, this statement is in my mind not well supported.

      • Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments. -Given the expertise of the authors, these experiments should be easy to perform within 3 months.

      • Are the data and the methods presented in such a way that they can be reproduced?

      • The manuscript is well written and an excellent example of how methods and experiments should be presented. Methods, tools and experiments are all well described.

      • Are the experiments adequately replicated and statistical analysis adequate? -Replicates and statistics are adequate and customary for the type of analysis performed.

      Minor comments:

      • Specific experimental issues that are easily addressable. Figure 3h. The live imaging documents the striking disappearance of lateral cell membranes using SRC-GFP. In 3h, large vesicle formation and movement towards the cell interior is shown. How frequent is this? Is this believed to be the mechanism of lateral membrane removal? If so, is it dependent on the autophagy machinery? Are these vesicles positive for autophagy markers? Resolving this issue may strengthen the conclusions of the paper. Using 3xCherry-Atg8 together with SRC-GFP, this should be possible.

      Using CQ, the authors should be able to detect plasma membrane and junctional components in autophagosomes or autolysosomes (by confocal and electron microscopy), as degradation is inhibited. This should help to distinguish whether lateral membranes are engulfed and digested or whether cells simply fuse by using a part of the autophagy machinery.

      The authors state that strong autophagy activation also leads to syncytium formation of tracheal cells, salivary glands and gut EC cells. Representative images in a supplementary figure would be useful for future reference.

      • Are prior studies referenced appropriately?

      -Yes. Key literature and findings are cited and discussed.

      • Are the text and figures clear and accurate?

      -Yes

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      -See suggested experiments above.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      -The findings clearly document a role of autophagy in syncytium formation in the physiological process of wounding. This has parallels to muscle syncytium formation, but has to my knowledge not been demonstrated in any other cell type to be performed by autophagy. Moreover, the authors show that strong autophagy induction can lead to fusion of epithelial cells. This may have relevance for processes and diseases where polyploidy is observed.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      • State what audience might be interested in and influenced by the reported findings. -The data are very strong and the demonstration that autophagy controls syncytium formation outside of muscle development is surprising and significant. It is of interest to the field of cell biology and development in general and the autophagy field in particular. It will also be of interest for the medical field that deals with multinuclear phenotypes, such as cancer.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      -Development, cell signaling, autophagy, vesicle trafficking.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In their present manuscript, Kakanj and colleagues show that during epithelial wound healing the autophagy pathway controls plasma membrane integrity and homeostasis. Furthermore, elevated autophagic activity is sufficient to induce syncytium formation, which is essential for wound closure and healing. The authors used the epidermis of fruit fly larvae as a model to study wound healing and video microscopy to examine this process. The methodology is well established, since the authors already published a related study in 2016 using similar tools.

      The findings presented here are interesting and promising, the quality of most experiments is satisfactory, the confocal images/videos are excellent and I truly appreciate that the authors used electron microscopy to support some of their claims. Findings are well presented and the text is well written and easy to read.

      Overall, my opinion is very positive about this manuscript.

      I believe most of the findings are very well supported, but I have some suggestions which may help strengthen the authors' points.

      1) The authors used the GFP-Atg8a reporter to follow autophagy during wound healing. While I also believe that the GFP-Atg8a dots appearing after wounding represent autophagic vesicles, GFP-Atg8a has certain limitations. First, Atg8a (or LC3 in mammals) is removed from the outer surface of autophagosomes by Atg4, and the Atg8a trapped inside the autophagosomes is degraded in the autolysosomal lumen. Thus Atg8a does not always localize to autolysosomes; in fact, Atg8a immunostaining mostly labels autophagosomes (and phagophores) but not autolysosomes in insect cells. Accordingly, the GFP-Atg8a reporter is also subject to autolysosomal degradation, and furthermore most of the GFP signal is quenched in the acidic lumen of autolysosomes, since GFP loses fluorescence at lower pH. In any case, if lysosomal degradation proceeds normally, GFP-Atg8 will be degraded completely. Thus, some of the autolysosomes cannot be detected using this reporter. For this purpose mCherry-Atg8a reporters can be used, since mCherry is more resistant than GFP, accumulates inside lysosomes and retains its fluorescence in acidic environments. I still believe that GFP-Atg8a was a perfect choice for video microscopy; I just suggest confirming the appearance of autophagosomes after wounding by other means. For instance, immunostaining of the epidermis against Atg8a after wounding (120 min) should confirm the presence of autophagosomes. A few specific antibodies that work in flies are available; they are listed in the reviews of Nagy (PMID: 25481477) or, more recently, Lorincz (PMID: 28704946).

      2) One of the major claims of the authors is that elevated autophagy leads to the breakdown or removal of lateral plasma membranes to promote syncytium formation. It is clearly seen on the confocal and EM images that lateral membranes disappear after wounding. However, it is also suggested that the lateral plasma membrane material is incorporated into autophagosomes, or that the plasma membrane is a potential membrane source for autophagosome formation. I believe this is the least supported claim of the manuscript, since no direct evidence for it is presented. It is based only on BODIPY staining, which shows that BODIPY-positive vesicles accumulate inside the cells. If this is indeed the case, plasma membrane components should be detected in autophagic vesicles. Thus, I recommend co-staining membrane components with autophagic markers. However, even if the authors observe no colocalization of plasma membrane components with autophagy markers, I still believe this study is worth publishing. I would like to recommend the review of Ungermann and Reggiori (PMID: 29966469), in which the trafficking of Atg9 is discussed, since the source of autophagosomal Atg9 is in part the plasma membrane in mammalian cells. These findings may therefore strengthen the authors' claims.

      Minor points:

      Figure 2A: I believe the authors wanted to use the word 'maintaining', not 'mating', in their scheme.

      Discussion: The authors suggest that another function of autophagy in the cells surrounding the wound may be to clear up debris, as in planarians and other cell types autophagy is activated in healthy cells, which simultaneously phagocytose cell debris. Honestly, I do not believe that this is the case here. Some of the Atg proteins are indeed required for phagocytosis during LC3-associated phagocytosis (LAP) (see PMID: 30787029), but LAP is independent of Atg1, and if LAP happened in the cells surrounding the wound, then GFP-Atg8a-positive phagosomes would appear in those cells. However, this is clearly not the case here.

      Significance

      I highly recommend this manuscript to be uploaded to a relevant journal and I believe the findings presented here will be interesting for biologists specialized in regeneration and readers from the autophagy fields alike.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In this study, the authors use the fruit fly as a model to understand the role and regulation of autophagy in epidermal integrity during development and wound healing. They discover that hyperactivation of autophagy via overexpression of Atg1 leads to disruption of epithelial organization, junctional protein localization, and syncytium formation. In addition, these epidermal defects were found to be dependent on TORC1 as knockdown or inhibition of TORC1 antagonists resulted in similar epidermal defects which could be rescued by knockdown of Atg1 or Atg5. Wound healing in fruit fly epidermis is known to induce cell fusion and here the authors show that syncytium formation is dependent on autophagy. GFP-Atg8a autophagosomes were found to accumulate in cells adjacent to the wound site, but Atg1-induced syncytium formation was dispensable for wound repair. However, the authors found that hyperactivation of autophagy prior to injury slowed wound closure. This may be due to defects in actomyosin organization or another developmental defect the authors observed in the epidermis. Overall, the key conclusions of this study are convincing, but the experiments would be strengthened by validation of all the RNAi strains used as well as demonstration that the epidermal barrier remains intact as described.

      Major Comments

      1. This study uses a number of UAS-RNAi strains as well as dominant negative and overexpression transgenes. There is no validation that these genetic perturbations work as expected. In fact, the authors state on pg 5 that RNAi to Atg6, Atg7, and Atg12 may be less effective, but do not verify the knockdown efficiency for the gene of interest (i.e. that Atg5 RNAi knocks down Atg5 transcript or protein). This is particularly important as the authors use a single UAS-rictor RNAi strain to conclude that autophagy is dependent on TORC1 and not TORC2. If rictor RNAi is also weak or ineffective, then this conclusion would be erroneous.
      2. A major conclusion of this study is that autophagy remodels the lateral cell membranes and not the basal or apical, so the membrane integrity remains intact. This is described and shown in Fig S3a, but it is hard to see that the apical membrane is intact. It would be helpful if authors could show a true membrane marker, such as UAS-CD8mGFP to see if there is a continuous membrane. Alternatively, is there a barrier assay that could help demonstrate that syncytium formation does not disrupt epithelial integrity? This could be performed in the fly gut, using the smurf assay (Rera M et al. 2011), since the authors also describe (pg 9) a similar role for autophagy in disruption of epithelial lateral membranes.
      3. Is autophagy-dependent syncytium formation cell autonomous? The A58-Gal is not cell-type specific, as the authors describe (pg 9) similar effects in trachea, salivary glands, and intestine, and it is unclear whether the effects are due to disruption of autophagy in epidermal cells or to a general disruption of the fly's physiology. The authors should determine, using a more restrictive Gal driver, whether syncytium formation is due to activation of autophagy in the epidermal cells or in another cell type (trachea, salivary glands, or intestine). Alternatively, if no other Gal4 is available for the larval epidermis, then the authors could at least show, using an enterocyte driver (NP1-Gal4), that overexpression of Atg1 is sufficient to induce syncytium formation and what its effect on gut barrier integrity is.
      4. In Fig 8, the authors nicely show that Atg1 RNAi can rescue Tor RNAi and raptor RNAi, but what about the reverse? Is overexpression of Tor sufficient to inhibit the overexpression of Atg1 and reduce autophagy-induced syncytium formation?

      Minor comments:

      1. Check the spelling of abbreviations; Sqh is often misspelled Shq in figures.
      2. The order of images in Figure 3 should match the description in the text (pg. 6). AtgW is described in text, but not shown in Fig 3a-c. Also, upstream activators of TORC1 are described first, but shown last in this Figure, making it difficult to follow.
      3. Fig 7a should show the junctional effect of Atg1W alone and in combination with Atg5i, which is used in 7b. It is unclear why the authors switched to this weak overexpression for this photobleaching assay when Atg1S was predominantly used in the rest of the study.

      Significance

      This study elucidates the role and regulation of TORC1 and autophagy in epithelial membrane remodeling. This is important work that is significant to both developmental and wound healing research. Many cell types become multinucleate during differentiation, aging, and wound healing, and here the authors find a novel role for autophagy in remodeling lateral cellular junctions to facilitate syncytium formation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript is interesting and well presented. The authors propose the use of an antifibrotic drug to attenuate resistance to RTK inhibitors.

      Specific comments

      • It is not entirely clear how Nintedanib decreases tumour growth. It may be due to its effect on resistant melanoma cells as proposed, but it could also be due to the effect on CAFs. This should be at least discussed.

      The reviewer asks about a potential effect of Nintedanib on CAFs in our mouse model. While we show that Nintedanib has a direct action on melanoma cells in vitro, the in vivo situation can indeed be more complex. We agree that we cannot rule out the possibility that its therapeutic efficacy could be attributed in part to inhibition of CAFs, knowing that BRAF inhibitors have been shown to activate CAFs in melanoma, generating a host-tumor niche that can mediate therapeutic escape. However, addressing the contribution of CAFs in vivo is challenging and would represent an entirely new study. As requested by the reviewer, we have discussed this important issue and added 3 new references (see discussion section lines 377-381).

      • A potential caveat is that the drug used is non-specific, as it also blocks PDGFR signalling. Hyperactivation of RTKs is a mechanism of BRAFi resistance and, for example, in Figure 1J they see that BIBF1120/Nintedanib has a significant effect on BRAFi-resistant cells, which may indicate that the growth inhibition seen in allografts could be a combination of an "anti-fibrotic" role and its own activity inhibiting the survival of resistant cells. This needs to be considered.

      We thank the reviewer for raising this interesting issue. Nintedanib was chosen due to its inhibitory action on extracellular matrix deposition and as an example of a rapidly available drug to be exploited therapeutically to increase the effect of targeted therapy and delay the emergence of therapy-resistant cells. We recognize that a possible disadvantage of Nintedanib could be its multi-targeted nature (e.g. PDGFR (α and β), FGFR-1, -2, -3, -4 and VEGFR-1, -2, -3 as well as Src, Lck or Lyn), but it is one of the only approved molecules for the treatment of fibroproliferative diseases. Upregulation of PDGFRβ/AKT signaling was previously shown to contribute to acquired resistance in M238R (Shi et al. Cancer Res. 2011;71:5067-74; Nazarian et al. Nature. 2010;468:973-7). Our in vitro results indicate that Nintedanib inhibits survival of these resistant cells along with a decrease in their myofibroblast-like dedifferentiated phenotype (Fig. 1 I-J).

      To address the reviewer's comment, we have now examined the contribution of PDGFRβ inhibition to Nintedanib's effects on resistant cells. We have performed experiments on M238R using the selective PDGFR inhibitor CP673451 in comparison with Nintedanib (please see results section lines 120-127 and new Supplementary Fig. S1F-H). The data show that selective inhibition of the PDGFR pathway attenuates the myofibroblast-like signature typical of resistant cells to a similar degree as Nintedanib and affects melanoma cell viability (new Supplementary Fig. S1G-H). However, administration of CP673451 was less efficient than Nintedanib in inducing a phenotype switch toward a more differentiated phenotype (new Supplementary Fig. S1G). To further confirm the involvement of the RTK pathway in the phenotype observed, we analyzed the tyrosine phosphorylation status of EGFR, PDGFR and FGFR (another RTK inhibited by Nintedanib) and activation of AKT in M238R melanoma cells upon treatment with Nintedanib or CP673451 (new Supplementary Fig. S1F and additional results for the reviewers). Nintedanib had no effect on FGFR tyrosine phosphorylation and slightly decreased pEGFR levels. However, we found that the two inhibitors showed similar efficiency in decreasing phospho-PDGFRβ and phospho-AKT levels (Supplementary Fig. S1F). The results section has been modified according to these new results (lines 126-127).

      Altogether these data suggest that inhibition of PDGFR signaling likely plays a prominent role in the efficacy of Nintedanib in vitro on M238R survival. Thus, as proposed by the reviewer, we can predict that the growth inhibition induced by Nintedanib seen in vivo could be a combination of its "anti-fibrotic" action and PDGFR inhibitory activity inhibiting the survival of resistant cells. It is important to note that, compared to Nintedanib, inhibition of PDGFR/AKT signaling by the CP673451 compound is not sufficient to direct melanoma cells to a more differentiated state. This is now discussed in the manuscript (Discussion section lines 404-405).

      • Does the viability decrease in BRAFi-sensitive cells? For instance, in the parental cells?

      This information was already addressed in the manuscript. As shown in Supplemental Fig. S1D, Nintedanib had no effect on BRAFi-sensitive M238P viability. We have also confirmed this result using a crystal violet viability assay on M238P and UACC62 cells treated with different doses of BIBF1120.

      • Figure 1 b-e, in vivo and in vivo experiments. How many animals were used? Collagen decrease is not quantified (statistics missing).

      We apologize for this omission and have now added the number of animals in the legend of Fig.1 (n = 6). We have also performed statistics for collagen quantification and included this analysis in Fig.1F (see lines 720/723). We also provide to the referee the detailed statistical analysis of mature collagen fibers between the different treatment groups.

      • The title is not accurate. "prevent" resistance in melanoma is an overestimation because the cells do become resistant, albeit later.

      We agree with the reviewer and we have modified the title accordingly. The new title is now: “Blockade of pro-fibrotic response mediated by the miR-143/-145 cluster prevents targeted therapy-induced phenotypic plasticity and delays resistance in melanoma”.

      Reviewer #1 (Significance):

      As the authors discussed, they and others have previously studied the contribution of ECM and stromal remodelling to resistance to targeted therapies in melanoma. Previous data from E. Sahai's lab show that BRAFi activate CAFs and increase the production and remodelling of the extracellular matrix, but in this work, they look at a cell-autonomous mechanism mediated by miRs that promotes fibrosis and propose the use of an antifibrotic drug to attenuate resistance to RTK inhibitors.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this very interesting study, Diazzi and colleagues show that during adaptation to MAPK-targeted therapy (MAPKi), melanoma cells upregulate a miRNA profibrotic cluster (miR-143, -145), which drives a phenotypic switch towards a drug resistant undifferentiated mesenchymal-like state. From the miRNA targets, authors identify FSCN1 as a gene that needs to be downregulated during adaptation to MAPKi by the miRNAs, since FSCN1 ablation promotes the drug resistant phenotype. Importantly, authors show in a preclinical mouse melanoma model that the anti-fibrotic drug nintedanib (BIBF) improves response to MAPKi and delays onset of resistance.

      The study conclusions are convincing and the data are adequately replicated and presented, authors should be commended for having the manuscript in such good shape. However, there are a few issues that authors should clarify/expand.

      We sincerely thank the reviewer for his/her careful review and constructive comments.

      1. The study starts with the in vivo YUMM1.7 model and combination BRAFi+MEKi, and then authors use this combination in many in vitro experiments. However, when studying resistant lines, only BRAFi-resistant and -sensitive pairs were used. I would suggest including more validation of the upregulation of the miRNA and the fibrotic genes on BRAFi+MEKi-resistant lines, and this could be easily gathered from published transcriptomes of several BRAFi+MEKi-resistant melanoma lines from Roger Lo's lab (Song et al 2017 Cancer Discov, including M238, M229, M249 used by the authors). To complement this approach, miRNA expression could be evaluated in large collections of melanoma cell lines classified as more or less undifferentiated (correlating with more or less resistance) as in Tsoi 2018 Cancer Cell and Verfaillie 2015 Nat Commun.

      We thank the reviewer for these interesting suggestions. We have performed several analyses, summarized below:

      • First, we have analyzed the expression of the miRNA-143/-145 cluster and pro-fibrotic signature by qPCR in A375 parental and BRAFi/MEKi double resistant melanoma cell lines described in Shen et al. Nat Commun. 2019;10:5713. We observed the upregulation of both mature miRNAs along with a pro-fibrotic signature in several A375 DR clones compared to parental cells. This new result is described in the results section (lines 147-150) and shown in new Supplementary Fig. S2B. In addition, we have included in the results section the important information that the undifferentiated/mesenchymal-like BRAFi-resistant M229R and M238R cells used in our work also displayed cross-resistance to MEKi (results section, line 112 and 1 new reference).

      • Second, as recommended, we have also fully (re)analyzed the mentioned studies and associated datasets. We provide a summary of the different studies, including sample numbers, study design, platform used and accession.

      A general observation is that unfortunately, none of these published studies provided an available small RNA-seq dataset, which thus does not allow quantifying the expression levels of mature miRNAs. However, some interesting observations have been uncovered from these datasets, confirming at least in part some of our data:

      i) The dataset from Song et al. 2017 compared 18 isogenic parental versus resistant cell lines. Two subsets of resistant cells were identified, with MAPK addiction (Ra) or Resistance with MAPK redundancy (Rr). The expression of the pri-miR-143/145 precursor, named MIR143HG, was detected in these cells and was found significantly upregulated in Rr cell lines compared to parental cells. Of note, MIR143HG was also part of the Rr specific signature associated with a mesenchymal phenotype. This interesting observation is now discussed in the manuscript (Discussion section, lines 392-394).

      ii) The dataset from Tsoi et al. 2018 focused on transcriptome analysis of 53 human melanoma cell lines including paired acquired resistance sublines established from patient biopsies. Unfortunately, MIR143HG expression is not detected in this dataset, probably due to a limited sequencing depth. Interestingly, we found that FSCN1 expression was decreased in most mesenchymal-like resistant cell lines compared to their parental counterpart. These data cannot be added in the manuscript since we cannot correlate the expression of the miRNAs with their target.

      iii) The dataset from Verfaillie et al. 2015 revealed transcriptomic analyses on 11 short-term cultures derived from patient biopsies before therapy and gave access to RNA-seq data of tumors with a proliferative or an invasive phenotype. MIR143HG is not detected and FSCN1 expression does not appear to be associated with a specific phenotype. We have performed qPCR-based expression of miR-143-3p and miR-145-5p in some of these short-term cultures, confirming that miR-143/-145 expression is not associated with a specific phenotype in therapy naïve melanoma cells (results for referees, see below). Expression of miR-143-3p and miR-145-5p in each short-term culture was compared to the average expression of the analyzed miRNA in the proliferative short-term cultures. These results are consistent with the findings of our study describing that expression of the miR-143/145 cluster is triggered by the inhibition of the BRAF oncogenic pathway.

      Related to this, the clinical relevance would increase if findings were validated using patient samples, for example, from published transcriptomes (Hugo 2015 Cell, Song 2017 Cancer Discov, Wagle 2014 Cancer Discov...) or even from TCGA, which could be used to identify if patients with high miRNA have worse prognosis.

      We agree with the reviewer about the importance of providing clinical data supporting our observations. We have carefully analyzed all these profiling studies and provide below a summary.

      Overall, these studies have several limitations: i) as underlined above, expression of the miRNA cluster is specifically induced in response to therapy and is absent or barely detectable in tumors at diagnosis; ii) no small RNA-seq datasets are available yet; iii) melanoma tumors are highly heterogeneous and invaded with stroma, especially CAFs and vessels that also express these miRNAs. We have looked at the expression of the MIR143HG precursor in these datasets and it was not present, probably due to low to medium sequencing depths in these clinical studies.

      We have also carefully explored TCGA datasets to look at possible association between prognosis and mature / precursor miRNA as well as miRNA target (FSCN1) expression in skin cutaneous melanoma (SKCM) using the tools developed by Anaya et al. 2016, PeerJ Computer Science 2:e67. Cox regression models and Kaplan-Meier analysis (using different percentiles) did not show any association of our candidates with survival on a cohort of 459 SKCM patients (median survival of 2.4 years).
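      To make the percentile-based Kaplan-Meier comparison concrete, here is a minimal sketch of the estimator and of a percentile split of patients by expression. It is a generic textbook implementation with invented inputs, not the tool of Anaya et al. and not the analysis run on the SKCM cohort; all function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


def split_by_percentile(expression, percentile):
    """Flag samples whose expression is above the given percentile cutoff."""
    ranked = sorted(expression)
    k = int(round(percentile / 100 * (len(ranked) - 1)))
    cutoff = ranked[k]
    return [x > cutoff for x in expression]


# Illustrative numbers only.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
high = split_by_percentile([5, 1, 3, 2, 4], 50)
```

      In practice one curve is computed for the "high" group and one for the "low" group, and the two are compared with a log-rank test or a Cox model; repeating the split at several percentiles corresponds to the "different percentiles" mentioned above.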

      Finally, during the revision process, we obtained access, for research purposes, to 9 relapsed melanoma biopsies from the Dermatology Department of Nice University Hospital (CHU), collected following treatment with targeted therapies, immunotherapies or a combination of them. In these biopsies, we analyzed the expression of fibrotic/mesenchymal genes, FSCN1 and the miR-143/145 cluster compared to the mean expression of the same genes/miRNAs in therapy-naïve patient-derived xenografts (MEL003, MEL006, MEL015, MEL047). Our first results indicate that relapsed tumors acquire a strong fibrotic signature, which is associated with increased expression of the miR-143/-145 cluster and decreased expression of FSCN1 (8 out of 9 patients).

      These results are encouraging and represent a good indicator for further clinical validation but are not solid enough to be incorporated in the manuscript. Overall, validation of our hypotheses in patient samples would require an entire new and highly complex clinical study comparing tumors at diagnosis with relapsed tumors after targeted therapies and ideally processed using single-cell RNA-seq and/or RNA FISH to take into account the stromal compartment.

      • While blocking the miRNA improves BRAFi response (Fig.3H), it is not clear that this combination would overcome resistance (using resistant lines), although authors show that BIBF does overcome resistance (Fig.1J). This also applies to line 277 "… mirroring the effect of miR143/145 ASOs, forced expression of FSCN1 in M238R cells decreased viability in the presence of BRAFi (Fig.5H)." However, the miRNA ASOs were used in parental cells (Fig.3H).

      To address the reviewer’s comment, we have conducted new experiments in resistant melanoma cells using two approaches to simultaneously silence the 2 mature miRNAs: i) an ASO-directed RNase H degradation of the miR-143/145 precursor, as described by Plaisance et al., JACC Basic Transl Sci. 2016, 1:472-493 to knock down the pri-miRNA in cardiomyocytes, and ii) a combination of the 2 anti-miR ASOs. Unfortunately, the first approach failed to efficiently inhibit the expression of mature miR-143-3p and miR-145-5p, suggesting that the miR-143/145 cluster has a different precursor gene in melanoma than the one described in cardiomyocytes.

      Concerning the second approach, as expected, the 2 anti-miRs ASOs as well as the combination of the 2 ASOs efficiently targeted the mature miRNAs (new Supplementary Fig.S6C). Inhibition of miR-145-5p alone and combined inhibition of the two miRNAs significantly affected the viability of BRAFi resistant melanoma cells (M238R) in the absence of BRAFi (new Supplementary Fig.S6D) in a similar way as Nintedanib/BIBF (Fig. 1J).

      • Analysis of cytoskeletal changes. Text (lines 284-287) is missing references, regarding "…morphological changes with cells assuming flattened spindle-like shape" and "..function of FSCN1 in F-actin microfilaments reorganization...".

      We apologize for these omissions and have added the relevant references in the text (lines 305/306).

      Besides, authors say that transient overexpression of miRNAs reproduced these morphological changes as shown by F-actin staining. These would have benefited from including also side-by-side comparison of BRAFi treatment on these cell lines. To my knowledge, these melanoma lines (M238, M229, etc) have not been characterized in that regard (F-actin, focal adhesions). In Nazarian et al 2010, only brightfield pictures are shown in a supplementary figure.

      The same applies to YAP and especially MRTF activation upon miRNA overexpression, and whether this mirrors what BRAFi does to YAP and MRTF. In Misek et al 2020 and Kim et al 2015 YAP and MRTF were shown to be more enriched in the nucleus in resistant than in parental cells. Kim et al also show in time course experiments that there is significantly higher nuclear YAP after 7-14 days of BRAFi treatment. In the present manuscript, authors seemed to have assessed nuclear YAP/MRTF after 72h miRNA overexpression. Does it mirror MAPKi?

      As suggested by the reviewer, we have compared side-by-side the effect of oncogenic MAPK pathway inhibition to the effect of miR-143 or miR-145 overexpression on cytoskeleton and focal adhesion dynamics as well as YAP and MRTFA nuclear translocation in M238P, M229P and UACC62P melanoma cells. These analyses clearly show that transient overexpression of miR-143-3p or miR-145-5p mirrors the effects of BRAF or BRAF/MEK inhibition after 3 days on mechanopathways and acto-myosin remodeling. We thank the referee for this comment, which is helpful for the interpretation of the data. The new additional panels have been included in new Fig. 6B-D, new Fig. 7B-D, new Supplementary Fig. S10B-D and new Supplementary Fig. S11C-D.

      Regarding the decreased proliferation/survival after miRNA overexpression, is it truly slow cycling and not combined with some cell death? Table S1 has a "cell death of tumor cell lines" theme after miRNA overexpression.

      Following the reviewer's suggestion, Annexin V/DAPI staining has been performed in M238P cells upon transient overexpression of miR-143 or miR-145. No significant cell death was observed (new Supplementary Fig. S4D). Detailed statistical analysis and quantification of the experiment are provided. Staurosporine (Stauro) treatment was used as a positive control of cell death induction.

      Related to this, in Supp. Fig.4C the effect on the cell cycle is very small, is this significant? It is unclear when the cell cycle was assessed after miRNA overexpression (72h?), it could be a matter of timing. According to Fig.3E, there is a reduction in growth from 60-72h onwards.

      As suggested by the reviewer, we performed cell cycle analysis at a later time point after transfection (96 hours) (new Supplementary Fig. S4C). We observed a significant accumulation of melanoma cells in G0/G1 phase upon miR-143 or miR-145 overexpression and a significant decrease in the percentage of cells in S phase. Detailed statistical analysis of the described experiment is provided.

      Statistics. While multiple comparison tests were used, most graphs have asterisks on top of some bars, and it is unclear what is being compared with what. For example, Fig.2B has asterisks on top of BRAFi+MEKi group, does it mean it is significant vs vehicle group? In this and other similar cases (1J, 2C, S1B and others), a comparison against the combination group (BRAFiMEKi+BIBF) is also relevant. This should be revised throughout manuscript.

      As recommended by the reviewer, statistical analyses have been modified in the mentioned figures: Fig. 1J (lines 732/733), Fig. 2B (lines 745/746), Fig. 2C (lines 749/750) and Fig. S1B (see new figures and lines 251/252 of Supplementary materials).

      **Minor:**

      - For all the studies using stable cell lines, authors should state how long after transduction and selection experiments were performed.

      As recommended, we have now added this information (see lines 8-12 of Supplementary materials).

      - Authors only show single miRNA overexpression or inhibition. However, both miRNA are upregulated upon MAPKi. Did authors try the double overexpression or blockade?

      As suggested by the reviewer, we tested the double blockade in M238P and 1205Lu cells treated with MAPK inhibitors. Results are presented in new Fig. 3B, 3D, 3H and Supplementary Fig. S6A-B. Overall, combined inhibition of the two miRNAs had an effect comparable to or more significant than single miRNA inhibition, depending on the cellular parameter analyzed.

      Concerning the double overexpression, we had already tested lentivirus-mediated stable overexpression of the two miRNAs in two melanoma cell lines. Results are presented in Supplementary Fig. S5A-F and confirm the functional effects observed with single miRNA overexpression.

      - For the 1205Lu xenograft experiment, authors should also show the tumour growth curves, and explain how long treatment was and when miRNA expression was analysed (endpoint?). In addition, why in 5A there are only 3 dots (mice?) per group, while in 5B there are more (6-7 in control, 4-5 in BRAFi)?

      We apologize for this omission. We have added, at line 270 of the manuscript, the reference to the previous study in which the experiment is described. miRNA expression was analyzed in tumors at the endpoint of the experiment, i.e. 2 weeks after the start of Vemurafenib treatment. Moreover, we repeated the analysis of FSCN1 and miR-143/145 expression with the same number of mice (n = 6); please see new Fig. 5A.

      - In a few graphs, the axis legend should give more information. For example, Fig.2 says Fold change, and it should be Fold change expression, or similar; Fig.4G fold change FSCN mRNA expression; Fig. S2 log2 expression (resistant/par), S5A...

      We have corrected this and modified y-axis legends in the corresponding figures.

      - Fig.1E-G and S1B. Is this at endpoint for each group?

      Yes, it is as stated in the materials and methods section.

      - Fig.3H and S6B. how long were these experiments?

      Experiments shown in Fig. 3H and Fig. S6B were carried out for 72 h. This information has been included in the legend of the corresponding figures.

      - Fig.7B and D. Why the MRTFA signal in miR-neg and siCTRL is so different? Same for UACC in S11A vs s11D.

      We apologize for this inaccuracy. We have revised the figures to show more representative pictures (new Figs. 7B, 7D and S11A, S11D and new Fig. 6C).

      • Fig.5C and 5E. FSCN1 knockdown in 5C is very efficient, while not so much in 5E. However, effects on MITF, AXL etc in 5C are quite impressive. are these knockdowns representative?

      We again apologize for this inaccuracy. We performed a new experiment and we are now showing a more representative FSCN1 knockdown in new Fig. 5E.

      - Fig.6-7 legend. When mentioning scale bar, it reads uM, should it be um?

      We have corrected this mistake.

      • Fig.7A. In the graph, the "YAP nuclear enrichment", do the numbers represent the nuclear/cytoplasm ratio?

      Yes, numbers represent the nuclear/cytoplasm ratio. This information was added in the legend of the corresponding figures.
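      As a rough illustration of what a nuclear/cytoplasm ratio means in practice, the sketch below divides the mean nuclear intensity by the mean cytoplasmic intensity for one cell. It assumes the pixel intensities have already been assigned to each compartment (for example by segmentation); the function name and the numbers are hypothetical and do not reflect the authors' image analysis workflow.

```python
from statistics import mean


def nuclear_enrichment(nuclear_pixels, cytoplasmic_pixels):
    """Ratio of mean nuclear to mean cytoplasmic signal for one cell."""
    cyto_mean = mean(cytoplasmic_pixels)
    if cyto_mean == 0:
        raise ValueError("cytoplasmic signal is zero; ratio is undefined")
    return mean(nuclear_pixels) / cyto_mean


# Illustrative intensities only.
ratio = nuclear_enrichment([10, 20, 30], [5, 15])
```

      A ratio above 1 indicates that the protein signal is enriched in the nucleus relative to the cytoplasm.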

      - When showing migration and a picture (Fig.3F, 5D, S4D, S5E...), the blue over dark background is difficult to see, using greyscale or a brighter pseudocolour would help

      We thank the reviewer for this useful suggestion. We have done this and used the gray scale to improve the quality of the pictures.

      Reviewer #2 (Significance):

      These findings have important preclinical implications, since the study proposes a biomarker of resistance (profibrotic signature) and importantly, a potential new therapy to delay MAPKi resistance in melanoma (BIBF). It could also apply to other BRAF-mutant cancers and diseases presenting with fibrosis.

      Field of expertise: melanoma, drug resistance, cytoskeleton

      Reviewer #3:

      Major comments:

      The manuscript is well written, data are convincing, well presented and supportive of the conclusions.

      We thank the reviewer for his/her interest in our study and supportive comments.

      **Minor points that may be improved:**

      - The expression of miR-143/145 increases in melanoma cell lines treated with BRAFi and/or MEKi for 72h (Fig. 2B, Supp. Fig. 2B-F), and also after the development of resistance to MAPK-targeted therapies (Fig. 2A, Supp. Fig. 2A). The transient overexpression of miRs in therapy-naive cells leads to cell de-differentiation toward a mesenchymal/MAPK resistant state. On the other hand, these cells become more sensitive to BRAFi treatment when combined with LNA-mediated inhibition of miRs activity. It would be important to determine if the same occurs also in resistant cells, or whether, once MAPKi-resistance is established, cells are no longer sensitive to miRs blockade.

      The answer to this point is the same as for point 2 raised by Reviewer #2.

      Following the reviewers' suggestion, we have conducted new experiments in resistant melanoma cells using two approaches to simultaneously silence the 2 mature miRNAs: i) an ASO-directed RNase H degradation of the miR-143/145 precursor, as described by Plaisance et al., JACC Basic Transl Sci. 2016, 1:472-493 to knock down the pri-miRNA in cardiomyocytes, and ii) a combination of the 2 anti-miR ASOs. Unfortunately, the first approach failed to efficiently inhibit the expression of mature miR-143-3p and miR-145-5p, suggesting that the miR-143/145 cluster has a different precursor gene in melanoma than the one described in cardiomyocytes.

      Concerning the second approach, as expected, the 2 anti-miRs ASOs as well as the combination of the 2 ASOs efficiently targeted the mature miRNAs (Supplementary Fig.S6C). Inhibition of miR-145-5p alone and combined inhibition of the two miRNAs significantly affected the viability of BRAFi resistant melanoma cells (M238R) in the absence of BRAFi (new Supplementary Fig.S6D) in a similar way as BIBF (Fig. 1J).

      - In 2 out of 4 melanoma PDX samples naïve/resistant to combo BRAFi/MEKi therapy, the expression level of miR-143/145 cluster correlates with the de-differentiated transcriptomic profile of resistant tumor. How is Fascin1 expression in these samples?

      The reviewer legitimately asks about the expression level of the miR-143/-145 target FSCN1 in the PDX samples used in the study. Expression of FSCN1 in PDX resistant vs naïve samples has been assessed by RT-qPCR. Results are provided. We observed decreased expression of FSCN1 in only 1 out of the 2 samples showing increased miR-143/145 expression. This can be due to the heterogeneity of the subpopulations composing the tumor sample. It would have been interesting and probably more informative to test FSCN1 expression also at protein level since often miRNA molecular targets are inhibited at translation level but unfortunately we did not have the access to protein extracts corresponding to these samples.

      - The clinical relevance of the data could be strongly improved by assessing the expression of the miRs cluster and of its target Fascin1 in resistant subsets of patients, comparing their expression to patients before treatment, making use of available datasets.

      We agree with the reviewer about the importance of providing clinical data supporting our observations. We have carefully analyzed all available profiling studies and datasets and provide below a summary.

      Overall, these studies have several limitations: i) as demonstrated in our study, expression of the miRNA cluster is specifically induced in response to therapy and is not present (or barely) in tumors at diagnosis; ii) no small RNA-seq datasets are available yet; iii) melanoma tumors are highly heterogeneous and invaded with stroma, especially CAFs and vessels that also express these miRNAs. We have looked at the expression of the MIR143HG precursor in these datasets and it was not present, probably due to low to medium sequencing depths in these clinical studies.

      We have also carefully explored TCGA datasets to look at possible association between prognosis and mature / precursor miRNA as well as miRNA target (FSCN1) expression in skin cutaneous melanoma (SKCM) using the tools developed by Anaya et al. 2016 PeerJ Computer Science 2:e67. Cox regression models and Kaplan-Meier analysis (using different percentiles) did not show any association of our candidates with survival on a cohort of 459 SKCM patients (median survival of 2.4 years, see Kaplan plots below).

      Finally, during the revision process, we obtained access, for research purposes, to 9 relapsed melanoma samples from the Dermatology Department of Nice University Hospital (CHU), collected following treatment with targeted therapies, immunotherapies or a combination of them. In these samples, we analyzed the expression of fibrotic/mesenchymal genes, FSCN1 and the miR-143/145 cluster compared to the mean expression of the same genes/miRNAs in therapy-naïve patient-derived xenografts (MEL003, MEL006, MEL015, MEL047). Our results indicate that relapsed tumors acquire a strong fibrotic signature, which is associated with increased expression of the miR-143/145 cluster and decreased expression of FSCN1 (8 out of 9 patients).

      This represents a good indicator for further clinical validation but is not solid enough to be incorporated in the manuscript. Overall, validation of our hypotheses in patient samples would require an entire new and highly complex clinical study comparing tumors at diagnosis with relapsed tumors after targeted therapies and ideally processed using single-cell RNA-seq and/or RNA FISH to take into account the stromal compartment.

      Minor comments:

      - Fig. 4C, lower legend: M238P not M238S.

      We apologize for this mistake and corrected it.

      Reviewer #3 (Significance):

      **Nature and significance of the advances:**

      The findings not only suggest the combination therapy with the anti-fibrotic drug Nintedanib to be effective in enhancing MAPKi treatment in melanoma, reducing the development of resistance, but identify the molecular mechanism via the induction of the miR-143/145 cluster and the effects on the target Fascin1.

      **Compare to existing knowledge**

      These two miRNAs have been shown to have both oncogenic and oncosuppressor activities and have already been involved in EMT induction. The findings add yet one more piece to the puzzle.

      **Audience** This manuscript is not only of interest for oncology researchers but also of general interest for the understanding of fundamental biological processes and their effects on cancer therapy.

      **Your expertise**

      Molecular biologist and cancer research, transcriptional control of tumor transformation and progression including EMT, microRNAs -143/145

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In the present work Diazzi and co-authors describe the mechanism through which the anti-fibrotic drug Nintedanib potentiates MAPK-targeted therapy efficacy in melanoma cells. Nintedanib prevents the MAPK-induced pro-fibrotic response and is associated with loss of miR-143/-145 cluster expression. These miRs promote melanoma cells de-differentiation towards a pro-fibrotic mesenchymal-like state that correlates with resistance to MAPK inhibitors. Looking for miR-143/-145 targets responsible for this phenotype switch, the authors identified Fascin1 as a crucial regulator of cytoskeleton dynamics and mechanopathways.

      Major comments:

      The manuscript is well written, data are convincing, well presented and supportive of the conclusions.

      Minor points that may be improved:

      • The expression of miR-143/145 increases in melanoma cell lines treated with BRAFi and/or MEKi for 72h (Fig. 2B, Supp. Fig. 2B-F), and also after the development of resistance to MAPK-targeted therapies (Fig. 2A, Supp. Fig. 2A). The transient overexpression of miRs in therapy-naive cells leads to cell de-differentiation toward a mesenchymal/MAPK resistant state. On the other hand, these cells become more sensitive to BRAFi treatment when combined with LNA-mediated inhibition of miRs activity. It would be important to determine if the same occurs also in resistant cells, or whether, once MAPKi-resistance is established, cells are no longer sensitive to miRs blockade.
      • In 2 out of 4 melanoma PDX samples naïve/resistant to combo BRAFi/MEKi therapy, the expression level of miR-143/145 cluster correlates with the de-differentiated transcriptomic profile of resistant tumor. How is Fascin1 expression in these samples?
      • The clinical relevance of the data could be strongly improved by assessing the expression of the miRs cluster and of its target Fascin1 in resistant subsets of patients, comparing their expression to patients before treatment, making use of available datasets.

      Minor comments:

      • Fig. 4C, lower legend: M238P not M238S

      Significance

      Nature and significance of the advances:

      The findings not only suggest the combination therapy with the anti-fibrotic drug Nintedanib to be effective in enhancing MAPKi treatment in melanoma, reducing the development of resistance, but identify the molecular mechanism via the induction of the miR-143/145 cluster and the effects on the target Fascin1.

      Compare to existing knowledge

      These two miRNAs have been shown to have both oncogenic and oncosuppressor activities and have already been involved in EMT induction. The findings add yet one more piece to the puzzle.

      Audience

      This manuscript is not only of interest for oncology researchers but also of general interest for the understanding of fundamental biological processes and their effects on cancer therapy.

      Your expertise

      Molecular biologist and cancer research, transcriptional control of tumor transformation and progression including EMT, microRNAs -143/145

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this very interesting study, Diazzi and colleagues show that during adaptation to MAPK-targeted therapy (MAPKi), melanoma cells upregulate a miRNA profibrotic cluster (miR-143, -145), which drives a phenotypic switch towards a drug resistant undifferentiated mesenchymal-like state. From the miRNA targets, authors identify FSCN1 as a gene that needs to be downregulated during adaptation to MAPKi by the miRNAs, since FSCN1 ablation promotes the drug resistant phenotype. Importantly, authors show in a preclinical mouse melanoma model that the anti-fibrotic drug nintedanib (BIBF) improves response to MAPKi and delays onset of resistance.

      The study conclusions are convincing and the data are adequately replicated and presented, authors should be commended for having the manuscript in such good shape. However, there are a few issues that authors should clarify/expand.

      1. The study starts with the in vivo YUMM1.7 model and combination BRAFi+MEKi, and then authors use this combination in many in vitro experiments. However, when studying resistant lines, only BRAFi-resistant and -sensitive pairs were used. I would suggest including more validation of the upregulation of the miRNA and the fibrotic genes on BRAFi+MEKi-resistant lines, and this could be easily gathered from published transcriptomes of several BRAFi+MEKi-resistant melanoma lines from Roger Lo's lab (Song et al 2017 Cancer Discov, including M238, M229, M249 used by the authors). To complement this approach, miRNA expression could be evaluated in large collections of melanoma cell lines classified as more or less undifferentiated (correlating with more or less resistance) as in Tsoi 2018 Cancer Cell and Verfaillie 2015 Nat Commun.

      Related to this, the clinical relevance would increase if findings were validated using patient samples, for example, from published transcriptomes (Hugo 2015 Cell, Song 2017 Cancer Discov, Wagle 2014 Cancer Discov...) or even from TCGA, which could be used to identify if patients with high miRNA have worse prognosis.

      2. While blocking the miRNA improves BRAFi response (Fig.3H), it is not clear that this combination would overcome resistance (using resistant lines), although authors show that BIBF does overcome resistance (Fig.1J). This also applies to line 277 ".. mirroring the effect of miR143/145 ASOs, forced expression of FSCN1 in M238R cells decreased viability in the presence of BRAFi (Fig.5H)." However, the miRNA ASOs were used in parental cells (Fig.3H).
      3. Analysis of cytoskeletal changes. Text (lines 284-287) is missing references, regarding "..morphological changes with cells assuming flattened spindle-like shape" and "..function of FSCN1 in F-actin microfilaments reorganization..". Besides, authors say that transient overexpression of miRNAs reproduced these morphological changes as shown by F-actin staining. These would have benefited from including also side-by-side comparison of BRAFi treatment on these cell lines. To my knowledge, these melanoma lines (M238, M229, etc) have not been characterized in that regard (F-actin, focal adhesions). In Nazarian et al 2010, only brightfield pictures are shown in a supplementary figure. The same applies to YAP and especially MRTF activation upon miRNA overexpression, and whether this mirrors what BRAFi does to YAP and MRTF. In Misek et al 2020 and Kim et al 2015 YAP and MRTF were shown to be more enriched in the nucleus in resistant than in parental cells. Kim et al also show in time course experiments that there is significantly higher nuclear YAP after 7-14 days of BRAFi treatment. In the present manuscript, authors seemed to have assessed nuclear YAP/MRTF after 72h miRNA overexpression. Does it mirror MAPKi?
      4. Regarding the decreased proliferation/survival after miRNA overexpression, is it truly slow cycling and not combined with some cell death? Table S1 has a "cell death of tumor cell lines" theme after miRNA overexpression.

      Related to this, in Supp. Fig.4C the effect on the cell cycle is very small, is this significant? It is unclear when the cell cycle was assessed after miRNA overexpression (72h?), it could be a matter of timing. According to Fig.3E, there is a reduction in growth from 60-72h onwards.

      5. Statistics. While multiple comparison tests were used, most graphs have asterisks on top of some bars, and it is unclear what is being compared with what. For example, Fig.2B has asterisks on top of BRAFi+MEKi group, does it mean it is significant vs vehicle group? In this and other similar cases (1J, 2C, S1B and others), a comparison against the combination group (BRAFiMEKi+BIBF) is also relevant. This should be revised throughout manuscript.

      Minor:

      -For all the studies using stable cell lines, authors should state how long after transduction and selection experiments were performed.

      -Authors only show single miRNA overexpression or inhibition. However, both miRNA are upregulated upon MAPKi. Did authors try the double overexpression or blockade?

      -For the 1205Lu xenograft experiment, authors should also show the tumour growth curves, and explain how long treatment was and when miRNA expression was analysed (endpoint?). In addition, why in 5A there are only 3 dots (mice?) per group, while in 5B there are more (6-7 in control, 4-5 in BRAFi)?

      -In a few graphs, the axis legend should give more information. For example, Fig.2 says Fold change, and it should be Fold change expression, or similar; Fig.4G fold change FSCN mRNA expression; Fig. S2 log2 expression (resistant/par), S5A...

      -Fig.1E-G and S1B. Is this at endpoint for each group?

      -Fig.3H and S6B. how long were these experiments?

      -Fig.7B and D. Why the MRTFA signal in miR-neg and siCTRL is so different? Same for UACC in S11A vs s11D.

      -Fig.5C and 5E. FSCN1 knockdown in 5C is very efficient, while not so much in 5E. However, effects on MITF, AXL etc in 5C are quite impressive. are these knockdowns representative?

      -Fig.6-7 legend. When mentioning scale bar, it reads uM, should it be um?

      -Fig.7A. In the graph, the "YAP nuclear enrichment", do the numbers represent the nuclear/cytoplasm ratio?

      -When showing migration and a picture (Fig.3F, 5D, S4D, S5E...), the blue over dark background is difficult to see, using greyscale or a brighter pseudocolour would help.

      Significance

      These findings have important preclinical implications, since the study proposes a biomarker of resistance (profibrotic signature) and importantly, a potential new therapy to delay MAPKi resistance in melanoma (BIBF). It could also apply to other BRAF-mutant cancers and diseases presenting with fibrosis.

      Field of expertise: melanoma, drug resistance, cytoskeleton

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript is interesting and well presented. The authors propose the use of an antifibrotic drug to attenuate resistance to RTK inhibitors.

      Specific comments

      1. It is not entirely clear how Nintedanib decreases tumour growth. It may be due to its effect on resistant melanoma cells as proposed, but it could also be due to the effect on CAFs. This should be at least discussed.
      2. A potential caveat is that the drug used is non-specific as it also blocks PDGFR signalling. Hyperactivation of RTKs is a mechanism of BRAFi resistance and for example in Figure 1J, they see that BIBF1120/Nintedanib has a significant effect on BRAFi-resistant cells, which may indicate that the growth inhibition seen in allografts could be a combination of an "anti-fibrotic" role and its own activity inhibiting the survival of resistant cells. This needs to be considered.
      3. Does the viability decrease in BRAFi-sensitive cells? For instance, in the parental cells.
      4. Figure 1 b-e, in vivo and in vivo experiments. How many animals were used? Collagen decrease is not quantified (statistics missing).
      5. The title is not accurate. "prevent" resistance in melanoma is an overestimation because the cells do become resistant, albeit later.

      Significance

      As the authors discussed, they and others have previously studied the contribution of ECM and stromal remodelling to resistance to targeted therapies in melanoma. Previous data from E. Sahai's lab show that BRAFi activate CAFs and increase the production and remodelling of the extracellular matrix, but in this work, they look at a cell-autonomous mechanism mediated by miRs that promotes fibrosis and propose the use of an antifibrotic drug to attenuate resistance to RTK inhibitors.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We thank the reviewers for their critical comments and suggestions. We are glad that the reviewers appreciated the quality of the data and the novel findings connecting the secretory trafficking machinery with extracellular matrix-related signaling.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Jung et al reports on an interesting finding that focal adhesion signaling regulates the expression of Sec23A and thereby regulates COPII-dependent trafficking. The data presented are mostly solid and the finding itself is highly novel, as it tackles an area of secretory trafficking that remains poorly understood, namely the connection between the ECM and secretion.

      I will list all my comments below, covering both technical and conceptual topics:

      **Technical issues:**

      1-The authors should provide a better description of how they designed this siRNA library. What were the inclusion criteria for these 378 genes? I might have missed it, but I could not find this information easily.

      Reply: The library has been designed in-house based on gene annotations and literature to include cytoskeleton structural proteins, motor proteins, and other associated and regulatory proteins. We will add this information in the Materials and Methods section.

      2-Figure 2: I know this is challenging for EM images, but is there a way the authors could quantify these data? How many images were looked at? What was the average width of ER cisternae?

      Reply: We will provide image quantifications and statistics.

      3-Figure 4: I think that the characterization of the FA phenotype is a bit underdeveloped. There is no quantification of these data. Is the size of FA changing? Is the number of FA per cell changing? Is the length of FAs changing? I think that more work is needed to increase the confidence in these data.

      I could also not easily see what type of cells these are. A better description of this experiment is also required. Also, how many cells were analyzed. I think it is important that this experiment is done with a sufficient number of cells to increase the confidence in the data.

      Reply: We agree with the reviewer that our observations regarding the focal adhesion (FA) phenotype will benefit from image quantification and we intend to include this in the revised manuscript. All FA experiments were performed on HeLa cells. We will update the materials and methods sections to better describe this experiment.

      **Conceptual issues:**

      1-The finding that focal adhesion signaling negatively affects ER-export is surprising, because cancer cells that grow on stiff substrates have more focal adhesions and are more invasive and migratory. Both migration and invasion are expected to depend on ER-export. Although the authors did not formally test Sec23A expression under different stiffnesses, I would expect that stiff substrates would lower Sec23A expression and thereby negatively affect ER-export. It would certainly increase the breadth of this work to include data like this and to also discuss this highly surprising finding. However, it is of course the decision of the authors and the editors to decide whether such an experiment would benefit the entire story.

      Reply: In this work, we have shown that cells plated on ECM or Matrigel have decreased SEC23A expression compared to control cells. We have also shown that inhibition of FA kinase leads to an increase in SEC23A expression (Figure 5). Whether this translates into a change in ER transport is a fair point that we will address in the revision. Regarding stiffness, we have done a preliminary experiment that shows that cells plated on a soft synthetic substrate have less SEC23A than cells plated on plastic. This is in line with our ECM experiments because Matrigel and fibroblast-derived ECM are softer than plastic.

      2-The authors postulate that this novel mechanism could be part of a feedback loop. If this were the case one would expect the acute effect of FA to increase ER-export (or secretion) and the negative feedback will then reduce secretion. However, the acute effect of FA is not addressed in this manuscript. In order to postulate a feedback loop, the authors would need to test the individual nodes of this loop.

      Reply: The question appears to be whether an acute effect on FA would affect the expression of SEC23A and therefore ER transport. If by the acute effect the reviewer means a pharmacological manipulation, we have shown that upon treatment with the FAK inhibitor the expression of SEC23A increases (Fig 5A). Whether this increase in SEC23A expression translates into a corresponding increase in ER transport remains to be seen. This will be tested in our revised manuscript as mentioned above in reply to point # 1.

      Our data encouraged us to propose a hypothetical feedback loop that would connect the deposition of ECM through the expression of SEC23A. We will have more data to support (or reject) this idea once we do the transport experiments as mentioned above. However, we think that a full characterization of this hypothetical loop by testing individual nodes is beyond the scope of this manuscript.

      Reviewer #1 (Significance (Required)):

      I think that the basic finding of this manuscript is highly novel, by showing the impact of the ECM and focal adhesions on COPII-dependent trafficking. I think that this will not only appeal to people from the trafficking community, but also to people working on cell migration and on mechanobiology. The work in its current form does not require much extra effort (max. 3 months). However, if the authors decide to increase the breadth of the data, they would require 3-6 months.

      Reply: We thank reviewer #1 for the comments. We also believe that this story will appeal to a broader audience and would help to bridge the gap between membrane trafficking and mechanobiology communities.

      **Referees cross-commenting**

      I went through the comments of the two other reviewers and agree with their verdict. Some extra work on the characterization of the early secretory pathway would be good. Both reviewers provided a nice catalogue of possible experiments to choose from.

      Reply: We have characterized the early secretory pathway in terms of ER exit sites, Beta-COP, and Golgi morphology (FIG. 2B-H and S1A-B). Together, these data strongly characterize the nature of ER-block. Moreover, the finding that our interactors affect the expression of SEC23A allows us to explain mechanistically why an ER transport block occurs. This is further strengthened by the rescue experiments (FIG. 3F). We believe that further characterization of the secretory pathway will not contribute substantially to the main message of this manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Jung et al., which is based on a targeted siRNA screen, demonstrates regulation of SEC23A (component of the SEC23 complex of the COP coat) levels at the transcriptional level downstream of focal adhesion signaling. Using siRNA-mediated downregulation, the authors were able to identify proteins which either increased or decreased traffic of VSVG through the secretory pathway when combined with downregulation of either SEC23A or SEC23B. The authors have focused on a group of SEC23B functional interactors whose downregulation increases the size of focal adhesions and also downregulates SEC23A levels, thus providing an explanation for reduced secretory traffic. The authors further show that plating cells on fibronectin or Matrigel, which activate focal adhesion kinase signaling, also results in downregulation of SEC23A transcript levels. The screen is conducted in a well-controlled manner for the most part, with a clear explanation of the analysis routines, and the data presentation is of very good quality. The most important results have been validated by more than one experimental strategy, which lends substantial confidence to the findings. The results also open further avenues for understanding the transcriptional regulation in different physiological and disease contexts.

      There are certain issues which the authors should address with regard to controls, and some observations that conflict with published results on the focal adhesion size phenotypes associated with downregulating these proteins. Additionally, the authors do not tie up the loose ends by monitoring secretory traffic in cells grown on different matrices, yet they include it in the model. Addressing or explaining these issues could improve this manuscript, and the model may have to be tweaked a bit.

      **Major comments:**

      1) I wonder why the authors only used control siRNA in their screen when the effects are scored in a double-knockdown setting, in combination with mild knockdown of SEC23A and SEC23B, to obtain functional interactors. Control siRNA in combination with SEC23A and SEC23B should have been two ideal negative controls in the screen. Nevertheless, in the data presented in Figure 1E and the whole of Figure 2, control siRNA in combination with SEC23B siRNA would have been the ideal control to show that the combination does not induce any trafficking defects which could impact the findings of the study. Hence, some of the data presented in these figures should include the siControl + SEC23B siRNA combination as a control.

      Reply: There seems to be a misunderstanding. In the screen, the negative controls are only used as a reference, as the scoring is based on a 5×5 matrix centered on the siRNA of interest. This is done to overcome possible plate effects and to normalize data across different biological replicates. As seen in Figure 1B, the negative controls (Control siRNA, Control siRNA + SEC23A siRNA, or Control siRNA + SEC23B siRNA) are very close to 0 (but not exactly 0), as they were not used in the normalization process. It is important to mention that all single knockdowns also contain our control siRNA to keep the same final siRNA concentration in single and double knockdowns. In Fig 1E we will include the images from Control + SEC23A siRNA and Control + SEC23B siRNA as a reference. For Figure 2, all panels except 2A and 2H have the single knockdowns as controls.

      2) What is the identity of the post-ER structures which the authors refer to in Figure 2A? Could the images represent VSVG concentrated at ER exit sites? The authors should stain with markers for ERES to see if the VSVG puncta colocalize with them.
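
      The neighbourhood-based scoring described in this reply can be illustrated with a minimal sketch. The reply states only that scoring uses a 5×5 matrix centred on the siRNA of interest; the choice of the median as the reference and of a simple difference as the score are assumptions made here for illustration, and the function name `local_score` is hypothetical rather than taken from the authors' pipeline.

```python
from statistics import median


def local_score(plate, row, col, half_window=2):
    """Score one well against the wells surrounding it on the plate.

    plate: list of rows, each a list of raw readouts (floats), one per well.
    half_window=2 gives a 5x5 window; the window is clipped at plate edges.
    ASSUMPTION: reference = median of the neighbours, score = difference.
    """
    n_rows, n_cols = len(plate), len(plate[0])
    neighbours = [
        plate[r][c]
        for r in range(max(0, row - half_window), min(n_rows, row + half_window + 1))
        for c in range(max(0, col - half_window), min(n_cols, col + half_window + 1))
        if (r, c) != (row, col)  # exclude the well being scored
    ]
    return plate[row][col] - median(neighbours)


# Hypothetical 5x5 plate region: uniform readout of 2.0, one inhibited well.
example_plate = [[2.0] * 5 for _ in range(5)]
example_plate[2][2] = 0.5
centre_score = local_score(example_plate, 2, 2)
```

      Under these assumptions a well that behaves like its neighbours scores close to 0, which is consistent with the reply's remark that negative controls sit near, but not exactly at, 0 when they are not themselves used for normalization.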

      Reply: We have done the experiment, and indeed these structures colocalize with an ER exit site marker (SEC31A). We intend to include this data into the revised manuscript. Our observations are in agreement with what is known in the literature about VSVG transport.

      3) Based on RNA sequencing results, the authors chose to follow up on SEC23A levels against a background of siRNA knockdown of components (like MACF1, ROCK1, FERMT2 etc.) which regulate focal adhesions in cells, and show that there is a reduction in both transcript and protein levels of SEC23A. In the images shown in Figure 2B and Figure 2C, levels of SEC31A and β-Cop1 are reduced. The authors should test using qPCR and western blots whether there is a downregulation of SEC31A, β-Cop1 and SEC23B in siRNA knockdowns of MACF1, ROCK1, FERMT2 etc. It would provide new insights if there were a co-regulation of secretory machinery to modulate the secretory traffic in response to focal adhesion-based signaling.

      Reply: Our transcriptomics data (FIG 3C and Table 5) shows that SEC31A and COPB1 mRNAs are not altered upon any of the knockdowns. For SEC23B, we observed only a slight decrease in ROCK1 knockdown. This data suggests that a co-regulation of the secretory machinery might not be present. Instead, the curation of secretory pathway genes in our transcriptome data shows that SEC23A is the only commonly differentially expressed gene.

      4) The most major concern in this manuscript relates to the results presented in Figure 4C. The authors show that in response to all the knockdowns, they see more focal adhesions as monitored by Vinculin staining, and from this, along with the experiments with cells plated on Matrigel and Fibronectin, they arrive at the conclusion that increased focal adhesion signaling downregulates SEC23A levels, which presumably modulates secretory traffic. I am not an expert on focal adhesions, but based on my understanding of the literature on that topic, downregulation of ROCK1 and FERMT2 disrupts focal adhesions (see Theodosiou et al., Elife, 2016 or Lock et al., Plos One, 2012 for example). How do the authors explain their results in siRNA knockdown of ROCK1 and FERMT2, which leads to an increased size of focal adhesions and seems contradictory to the published results? To clarify these results, the authors should test phosphorylation of FAK in their siRNA backgrounds, which is another readout of focal adhesion signaling.

      The experiments from cells grown on Fibronectin and Matrigel favor the argument which authors put forth, but authors may have to tweak the model a bit based on FAK phosphorylation and FAK signaling in context of above-mentioned knockdowns.

      Reply: Based on the images for vinculin staining, in our current manuscript we propose that changes in FAs occur upon knocking down our interactors. In our revised manuscript we will provide a more robust quantitative assessment of those changes (change in number, size, or intensity) as mentioned in our reply to Reviewer #1.

      As for the discrepancies in the FA phenotype upon depletion of ROCK1 and FERMT2, we want to point out that this effect depends on the cell type used. For instance, the papers listed by the reviewer here use fibroblasts and keratinocytes respectively, while we have used HeLa Kyoto cells, which are epithelial in nature. Another example is that while in fibroblasts depletion of FERMT2 leads to a rounded morphology and almost an absence of FAs (Theodosiou et al., Elife, 2016), in podocytes (Qu et al., JCS, 2011) it leads to fewer FAs but an increase in their size. Nonetheless, this is a very keen observation from the reviewer and we will address this point in the discussion of our revised manuscript.

      5) What happens to VSVG traffic or RUSH-Cadherin traffic when cells are plated on Matrigel and Fibronectin? A reduction in secretory traffic of these cargoes is an important experiment which is missing to close the loop and validate the model presented. The authors must test this either with cells grown on matrix alone or in combination with siRNA to SEC23B. The authors should also monitor ERES and transport carriers in this background.

      Reply: We agree with the reviewer and intend to perform these experiments.

      6) This is not such a major issue, but it would be good to see a comparison of SEC23A levels in the siRNA knockdown condition with those when cells are grown on different substrates and in ROCK1 and FERMT2 knockdowns (blots of which the authors already have in this manuscript).

      Reply: We will assess the level of SEC23A at the protein level for cells plated on Matrigel or fibroblast-derived ECM.

      **Minor comments:**

      1) Scale bars are missing in EM images in Figure 2H.

      Reply: We will add the scale bars in our EM images.

      2) Show molecular weight markers in Western blots in main figure 3E and supplementary figure S1E.

      Reply: We will add molecular weight markers in our Western blots.

      Reviewer #2 (Significance (Required)):

      I have looked at the manuscript through the lens of a cell biologist, as that is predominantly my area of expertise. In that respect I find the screen conducted by the authors particularly interesting, as they aim to connect how extracellular cues regulate the secretory pathway. A screen seems justified as there is no comprehensive understanding linking the two above-mentioned processes. The authors have done a functional interaction screen and analyzed a lot of images to identify candidates which either increase or decrease secretory traffic in combination with SEC23A and SEC23B. Such a functional screen has helped the authors identify candidates which were otherwise missed in single siRNA knockdowns in their previous work from 2012. This definitely opens up interesting avenues to test the candidates identified in the screen in different physiological contexts and in disease, as well as the transcriptional program connecting focal adhesion signaling with the regulation of components governing secretion. Such functional interaction screens could also be employed to identify crosstalk of different cellular processes with the regulation of the secretory pathway at the ER as well as at the Golgi apparatus.

      Reply: We thank reviewer #2 for the comments. As we mentioned in our reply to reviewer #1, we strongly believe that these results will encourage further research at the crossroads of membrane trafficking and mechanobiology.

      **Referees cross-commenting**

      I agree with the comments from both the referees that the manuscript is very interesting and most experiments are well controlled, but the quantification of the focal adhesion phenotype in knockdowns needs to be done in an extensive manner, and secretion phenotypes need to be measured upon plating cells on different matrices to validate the model presented.

      Reply: These two experiments will be included in our revision.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The authors use a synchronized cargo release assay following codepletion of either Sec23 paralog with cytoskeletal and associated proteins to identify potential functional interactions between COPII trafficking and the cytoskeleton. This screen yields a number of Sec23b functionally interacting molecules that stall cargo trafficking to various degrees within the secretory pathway upon codepletion, and in the case of MACF1 reduce ERES number despite not physically interacting. Depletion of the majority of the identified Sec23b functional interactors alone surprisingly caused the downregulation of Sec23a at the mRNA and protein levels, and cargo trafficking could be partially or fully rescued by Sec23a overexpression depending on the codepleted cytoskeletal factor. RNA-seq enrichment analysis and imaging of a focal adhesion marker suggest that genes involved in cell adhesion were differentially regulated following depletion of the cytoskeletal functional interactors. Finally, the authors show that Sec23a expression levels are reduced when cells are cultured on dishes with high amounts of ECM to induce focal adhesions, and that inhibition of focal adhesion kinase can rescue Sec23a expression levels.

      **Major comments**

      #1 The authors successfully implicate a group of cytoskeletal proteins and their actions at focal adhesions in negatively regulating Sec23a expression levels and COPII trafficking. This description of a shared, novel mode of COPII transcriptional regulation by cytoskeletal factors is convincingly shown to be at least a contributor to the delayed trafficking in the presence of focal adhesions. In general, the data are reproducible and use appropriate statistical analysis. However, a more robust description of the architecture of early secretory pathway would be beneficial, especially in the case of MACF1 codepletion which cannot be fully rescued by Sec23a-YFP overexpression. In contrast, trafficking during codepletion of FERMT2 is fully rescued by Sec23a-YFP despite both MACF1 and FERMT2 showing similar loss of Sec23a mRNA levels upon codepletion. This data suggests that while the trafficking delay in FERMT2 codepletion might be exclusively due to reduced Sec23a expression levels, there are likely additional causes for the trafficking delay observed in MACF1 codepletion.

      Reply: We thank the reviewer for the appreciation of our results and the importance they might bear for the field. The reviewer has very neatly highlighted that each of our interactor hits might have roles in the secretory pathway beyond the ER or independent of the expression levels of SEC23A. This phenomenon could also explain the differential rescue of the arrival of VSVG at the plasma membrane upon SEC23A overexpression in FERMT2 and MACF1 knockdowns (FIG 3F). For instance, MACF1 has been implicated in Golgi to plasma membrane transport as well (Kakinuma et al. Exp. Cell Res. 2004, Burgo et al. Dev. Cell 2012). So one possibility is that SEC23A overexpression rescues only ER to Golgi transport, while a defect between the Golgi and the plasma membrane that is independent of SEC23A expression levels remains unrescued, resulting in reduced rescue in the case of MACF1 compared to FERMT2. To support this, in our revised manuscript, we will provide example images from the experiment.

      Nonetheless, we agree that these are very important observations from Reviewer #3 and warrant a detailed discussion in the light of other interactors as well, which we intend to highlight in our revised manuscript.

      #2 While there is indeed a reduction in the number of ERESs following MACF1 codepletion, the authors report an even more dramatic reduction in 'transport intermediates / cell' as marked by COPI. However, as recent cryo-EM analysis of ERESs has definitively shown, COPI exists stably at ERGIC membranes (1). Thus, an alternative possibility for the more dramatic reduction of COPI sites compared to Sec31a sites in Figures 2B-E is that ERGIC membranes are destabilized following MACF1 codepletion in a manner independent of Sec23a expression, and this destabilization compounds with reduced ERES number to ultimately delay trafficking. To more directly determine whether ERGIC membrane stability is regulated by MACF1, the authors should compare COPI and ERGIC-53 staining among MACF1 codepleted and FERMT2 codepleted cells with and without Sec23a-YFP overexpressed to levels that rescue cargo trafficking. If Sec23a-YFP restores the number of ERGIC puncta marked by these stains in FERMT2 but not MACF1 codepleted cells, it would suggest a role for MACF1 in forming or stabilizing ERGIC membranes, which are known to associate with microtubules and WHAMM, an actin nucleator. Additionally, it would be useful to costain COPII with COPI or ERGIC-53 in control, MACF1 depleted, MACF1 codepleted, and MACF1 codepleted and Sec23a-YFP rescued cells to determine their colocalization. COPII and ERGIC membranes should be almost entirely coupled and juxtaposed in control cells and may be decoupled upon loss of MACF1 if it plays a role in ERGIC membrane localization and stability. These proposed experiments are relevant because ERGIC membranes are sites of COPII cargo delivery and changes in ERGIC stability or localization would suggest an additional mechanism for cytoskeletal regulation of COPII trafficking. These immunofluorescence studies should be straightforward and completed in a few weeks.

      Reply: Although a possible additional role of MACF1 in the organisation of the early secretory pathway, stability of ERES, etc., independent of the expression of SEC23A is interesting on its own, we believe that an extensive characterization of these possible roles/pathways as proposed by the reviewer is beyond the scope of this manuscript.

      #3 The choice to use VSVG and E-Cadherin for the synchronized release assays unfortunately convolutes interpreting the 'transport ratios' used by the authors to compare the effects of the various codepletions. Each protein progresses beyond the Golgi during secretion, and the authors choose to calculate the ratio of cargo intensity at the plasma membrane normalized to the total cellular cargo. This means that the synchronized release assays and calculated 'transport ratios' assay not only ER to Golgi trafficking, but also trafficking from the Golgi to the plasma membrane. In instances where Sec23a-YFP overexpression does not fully rescue the codepletion, it is possible that additional trafficking delays occur during Golgi to plasma membrane trafficking that cause the 'transport score' to decrease. Thus, the 'transport score' as the authors calculate it is needlessly nonspecific to COPII trafficking and should not be used to compare the codepletions for COPII functional interactors.

      Reply: We agree that the “transport score” used here and in our previous genome-wide screen (Simpson et al., Nat. Cell Biol. 2012) does not allow us to distinguish between the individual substeps in the transport of VSVG from the ER to the plasma membrane. However, as we see in Fig 1E, the proteins that we have decided to follow in more detail in this study do have a clear ER transport block phenotype (except for CRKL). So for 6 out of 7 of these proteins, the images clearly show that the decrease in the “transport score” is due to decreased ER to Golgi transport.
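
      The limitation discussed in this exchange can be made concrete with a small sketch. It assumes, following the reviewer's description, that the transport ratio is the cargo intensity at the plasma membrane divided by the total cellular cargo intensity; the per-compartment breakdown, the intensity values, and the function name `transport_ratio` are hypothetical and serve only to show why a single ratio cannot locate the step at which cargo is delayed.

```python
def transport_ratio(er, golgi, plasma_membrane):
    """Fraction of total cellular cargo that has reached the plasma membrane.

    Arguments are integrated cargo intensities per compartment (arbitrary units).
    ASSUMPTION: total cargo is the sum of these three compartments.
    """
    total = er + golgi + plasma_membrane
    if total <= 0:
        raise ValueError("total cargo intensity must be positive")
    return plasma_membrane / total


# Two hypothetical cells with the same surface signal but different blocks:
er_block = transport_ratio(er=80.0, golgi=0.0, plasma_membrane=20.0)
golgi_block = transport_ratio(er=0.0, golgi=80.0, plasma_membrane=20.0)
same_ratio = er_block == golgi_block
```

      Both hypothetical cells yield the same ratio even though cargo is retained in different compartments, which is why the ratio has to be read together with the images (as the reply argues for Fig 1E) or complemented by a compartment-specific readout (as the reviewer proposes).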

      #4 To mitigate unwanted contributions of post-COPII trafficking events from altering 'transport scores,' the authors should use a cargo for synchronized release assays that does not progress past the Golgi such as α-Mannosidase II and quantify a ratio of the perinuclear cargo signal to whole cell signal. Ideally, the screen would be repeated with a more appropriate cargo generating new 'transport scores' for the full list of cytoskeletal proteins. However, this may not be feasible, and as such 'transport scores' based on a Golgi resident protein should at least be produced for the 7 Sec23b functional interactors featured in this manuscript. These Golgi 'transport scores' would add much needed quantification of ER to Golgi transport delays that currently can only be inferred from the representative images in Figure 1E, which unfortunately show significant heterogeneity among cells from the same image. The authors should also explicitly state that any 'transport score' from a synchronous release assay using a cargo destined for the plasma membrane will take into account trafficking rate changes due not only to COPII, but also COPI from the ERGIC to the Golgi, and transport carriers departing from the TGN. These synchronized release assays would likely take between a few weeks to a few months depending on their ability to automate image analysis.

      Reply: We consider that having a “Golgi transport score” won't add any new information as the proteins that we have chosen to follow are the ones that show a strong ER-block phenotype. However, we agree that such a “Golgi score” would indeed be useful if one would like to study other interactors, for instance, the ones that induce transport acceleration.

      Also, we don't expect all cells to behave similarly, as the level of knockdown might be slightly different or because of cell-to-cell variability. Even in control conditions (no knockdown), this heterogeneity is evident. As suggested by the reviewer, in our revised manuscript we will explicitly state that a change in the transport scores could mean a change in any sub-step of the transport from the ER to the PM in our assay.

      **Minor comments**

      It would be useful for the authors to quantify the number of focal adhesions present from Vinculin stains from Figure 4C and 5C instead of just showing representative images. It would be interesting to determine if there is a meaningful relationship between focal adhesion number induced by the codepletions or tissue culture coating and Sec23a expression levels like in Figure 3D. Generally, the figures, text, and references were appropriate.

      Reply: As also pointed out by the other reviewers, we will quantify the FA changes.

      Reviewer #3 (Significance (Required)):

      In recent years, significant effort has been devoted to elucidating mechanisms by which COPII trafficking is modulated in response to cellular cues. These studies have revealed that changes in nutrient availability, growth factors, ER stress, autophagy, and T-cell activation all cause changes in COPII trafficking via unique gene expression, splicing, or post-translational control (2-7). This work elucidates a novel mechanism of transcriptional control driven by focal adhesions. Additionally, it provides a number of potentially useful Sec23a and Sec23b functional interactors among cytoskeletal factors for further study. These unexplored factors may have unique mechanisms of COPII regulation that could contribute to our understanding of ER export modulation. Altogether, this and similar works are building an increasingly complex set of regulatory pathways that when integrated ultimately dictate COPII trafficking kinetics.

      The reported findings are not only relevant to those who study COPII trafficking, but also other fields where secretion is studied in the context of the ECM. This work would suggest that secretion of factors involved in crosstalk between cells, including in tumors, is likely to be controlled by the interactions of cells with ECM.

      Reply: We thank reviewer #3 for the comments, for the insightful discussion about the limitations of our assay, which we will highlight in the revised manuscript, and in general for the insight into the regulation of the early secretory pathway. Furthermore, their explicit summary of how our study could bridge COPII trafficking, ECM signaling and the relevance to various pathophysiologies is highly appreciated.

      Expertise keywords: cell biology, light microscopy, membrane trafficking

      References

      1.Weigel A V., Chang CL, Shtengel G, Xu CS, Hoffman DP, Freeman M, et al. ER-to-Golgi protein delivery through an interwoven, tubular network extending from ER. Cell. 2021 Apr;184(9):2412-2429.e16.

      2.Farhan, H., Wendeler, M. W., Mitrovic, S., Fava, E., Silberberg, Y., Sharan, R., Zerial, M., & Hauri, H. P. (2010). MAPK signaling to the early secretory pathway revealed by kinase/phosphatase functional screening. Journal of Cell Biology, 189(6), 997-1011.

      3.Zacharogianni, M., Kondylis, V., Tang, Y., Farhan, H., Xanthakis, D., Fuchs, F., Boutros, M., & Rabouille, C. (2011). ERK7 is a negative regulator of protein secretion in response to amino-acid starvation by modulating Sec16 membrane association. EMBO Journal, 30(18), 3684-3700.

      4.Lillmann, K.D., V. Reiterer, F. Baschieri, J. Hoffmann, V. Millarte, M.A. Hauser, A. Mazza, N. Atias, D.F. Legler, R. Sharan, et al 2015. Regulation of Sec16 levels and dynamics links proliferation and secretion. J. Cell Sci. 128:670-682.

      5.Liu, L., Cai, J., Wang, H., Liang, X., Zhou, Q., Ding, C., Zhu, Y., Fu, T., Guo, Q., Xu, Z., Xiao, L., Liu, J., Yin, Y., Fang, L., Xue, B., Wang, Y., Meng, Z. X., He, A., Li, J. L., ... Gan, Z. (2019). Coupling of COPII vesicle trafficking to nutrient availability by the IRE1α-XBP1s axis. Proceedings of the National Academy of Sciences of the United States of America, 116(24), 11776-11785.

      6.Jeong, Y.-T., Simoneschi, D., Keegan, S., Melville, D., Adler, N. S., Saraf, A., Florens, L., Washburn, M. P., Cavasotto, C. N., Fenyö, D., Cuervo, A. M., Rossi, M., & Pagano, M. (2018). The ULK1-FBXW5-SEC23B nexus controls autophagy. ELife, 1-25.

      7.Wilhelmi, I., Kanski, R., Neumann, A., Herdt, O., Hoff, F., Jacob, R., Preußner, M., & Heyd, F. (2016). Sec16 alternative splicing dynamically controls COPII transport efficiency. Nature Communications, 7, 12347. https://doi.org/10.1038/ncomms12347

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      4. Description of analyses that authors prefer not to carry out

      Reviewer #3 suggested that we robustly characterise the early secretory pathway in response to the depletion of our interactors, for instance, the role of MACF1 in the organization and the stability of ERES. This view is also supported by reviewer #1. However, in our revised manuscript we would like to focus more on the novel aspect of our study (as highlighted by all the reviewers), namely how ECM signaling and changes in FAs affect SEC23A and possibly ER transport. For this, we would like to present a more quantitative outlook of the FA phenotype and concentrate on the transport experiments. The reason for not delving into a more extensive characterization of the early secretory pathway is that these experiments are very interesting on their own, and merit a separate study that would deconvolve in detail the individual trafficking steps and their relation to SEC23A expression, ERES stability, and ECM signaling.

      Reviewer #2 suggested that to better characterize the FA phenotype and solve the apparent discrepancies between our data and the literature, we could test FAK phosphorylation. As we mentioned in our reply to this point, we think that most of the discrepancies arise from the different cell types used. Nevertheless, we agree that a quantitative approach is needed for a better characterisation of FA phenotype, therefore we intend to perform quantification of the vinculin stainings.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      The authors use a synchronized cargo release assay following codepletion of either Sec23 paralog with cytoskeletal and associated proteins to identify potential functional interactions between COPII trafficking and the cytoskeleton. This screen yields a number of Sec23b functionally interacting molecules that stall cargo trafficking to various degrees within the secretory pathway upon codepletion, and in the case of MACF1 reduce ERES number despite not physically interacting. Depletion of the majority of the identified Sec23b functional interactors alone surprisingly caused the downregulation of Sec23a at the mRNA and protein levels, and cargo trafficking could be partially or fully rescued by Sec23a overexpression depending on the codepleted cytoskeletal factor. RNA-seq enrichment analysis and imaging of a focal adhesion marker suggest that genes involved in cell adhesion were differentially regulated following depletion of the cytoskeletal functional interactors. Finally, the authors show that Sec23a expression levels are reduced when cells are cultured on dishes with high amounts of ECM to induce focal adhesions, and that inhibition of focal adhesion kinase can rescue Sec23a expression levels.

      Major comments

      The authors successfully implicate a group of cytoskeletal proteins and their actions at focal adhesions in negatively regulating Sec23a expression levels and COPII trafficking. This description of a shared, novel mode of COPII transcriptional regulation by cytoskeletal factors is convincingly shown to be at least a contributor to the delayed trafficking in the presence of focal adhesions. In general, the data are reproducible and use appropriate statistical analysis. However, a more robust description of the architecture of the early secretory pathway would be beneficial, especially in the case of MACF1 codepletion, which cannot be fully rescued by Sec23a-YFP overexpression. In contrast, trafficking during codepletion of FERMT2 is fully rescued by Sec23a-YFP despite both MACF1 and FERMT2 showing similar loss of Sec23a mRNA levels upon codepletion. These data suggest that while the trafficking delay in FERMT2 codepletion might be exclusively due to reduced Sec23a expression levels, there are likely additional causes for the trafficking delay observed in MACF1 codepletion.

      While there is indeed a reduction in the number of ERESs following MACF1 codepletion, the authors report an even more dramatic reduction in 'transport intermediates / cell' as marked by COPI. However, as recent cryo-EM analysis of ERESs has definitively shown, COPI exists stably at ERGIC membranes (1). Thus, an alternative possibility for the more dramatic reduction of COPI sites compared to Sec31a sites in Figures 2B-E is that ERGIC membranes are destabilized following MACF1 codepletion in a manner independent of Sec23a expression, and this destabilization compounds with reduced ERES number to ultimately delay trafficking. To more directly determine whether ERGIC membrane stability is regulated by MACF1, the authors should compare COPI and ERGIC-53 staining among MACF1 codepleted and FERMT2 codepleted cells with and without Sec23a-YFP overexpressed to levels that rescue cargo trafficking. If Sec23a-YFP restores the number of ERGIC puncta marked by these stains in FERMT2 but not MACF1 codepleted cells, it would suggest a role for MACF1 in forming or stabilizing ERGIC membranes, which are known to associate with microtubules and WHAMM, an actin nucleator. Additionally, it would be useful to costain COPII with COPI or ERGIC-53 in control, MACF1 depleted, MACF1 codepleted, and MACF1 codepleted and Sec23a-YFP rescued cells to determine their colocalization. COPII and ERGIC membranes should be almost entirely coupled and juxtaposed in control cells and may be decoupled upon loss of MACF1 if it plays a role in ERGIC membrane localization and stability. These proposed experiments are relevant because ERGIC membranes are sites of COPII cargo delivery and changes in ERGIC stability or localization would suggest an additional mechanism for cytoskeletal regulation of COPII trafficking. These immunofluorescence studies should be straightforward and completed in a few weeks.

      The choice to use VSVG and E-Cadherin for the synchronized release assays unfortunately complicates the interpretation of the 'transport ratios' used by the authors to compare the effects of the various codepletions. Each protein progresses beyond the Golgi during secretion, and the authors choose to calculate the ratio of cargo intensity at the plasma membrane normalized to the total cellular cargo. This means that the synchronized release assays and calculated 'transport ratios' assay not only ER to Golgi trafficking, but also trafficking from the Golgi to the plasma membrane. In instances where Sec23a-YFP overexpression does not fully rescue the codepletion, it is possible that additional trafficking delays occur during Golgi to plasma membrane trafficking that cause the 'transport score' to decrease. Thus, the 'transport score' as the authors calculate it is needlessly nonspecific to COPII trafficking and should not be used to compare the codepletions for COPII functional interactors.

      To prevent post-COPII trafficking events from altering 'transport scores,' the authors should use a cargo for synchronized release assays that does not progress past the Golgi, such as α-Mannosidase II, and quantify a ratio of the perinuclear cargo signal to whole cell signal. Ideally, the screen would be repeated with a more appropriate cargo generating new 'transport scores' for the full list of cytoskeletal proteins. However, this may not be feasible, and as such 'transport scores' based on a Golgi resident protein should at least be produced for the 7 Sec23b functional interactors featured in this manuscript. These Golgi 'transport scores' would add much needed quantification of ER to Golgi transport delays that currently can only be inferred from the representative images in Figure 1E, which unfortunately show significant heterogeneity among cells from the same image. The authors should also explicitly state that any 'transport score' from a synchronous release assay using a cargo destined for the plasma membrane will take into account trafficking rate changes due not only to COPII, but also COPI from the ERGIC to the Golgi, and transport carriers departing from the TGN. These synchronized release assays would likely take between a few weeks and a few months, depending on the authors' ability to automate image analysis.
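
      The ratio discussed above can be illustrated with a minimal sketch. It assumes that per-cell intensities have already been measured by image segmentation; the function names and the example numbers are hypothetical and are not taken from the manuscript. The same function covers both the plasma membrane ratio used by the authors and the proposed perinuclear (Golgi) ratio, depending on which compartment signal is passed in.

```python
from statistics import mean


def transport_ratio(compartment_intensity, total_intensity):
    """Fraction of the cell's cargo signal found in the compartment of interest."""
    if total_intensity <= 0:
        raise ValueError("total cellular intensity must be positive")
    return compartment_intensity / total_intensity


def mean_ratio(cells):
    """Average the per-cell ratio over (compartment, whole-cell) intensity pairs."""
    return mean(transport_ratio(c, t) for c, t in cells)


# Hypothetical measurements: (perinuclear cargo signal, whole-cell cargo signal)
control = [(60.0, 100.0), (45.0, 90.0)]
codepleted = [(20.0, 100.0), (27.0, 90.0)]

control_score = mean_ratio(control)
codepleted_score = mean_ratio(codepleted)
```

      Averaging per-cell ratios, rather than pooling intensities, keeps cells with different expression levels from dominating the score; whether this matches the authors' pipeline is an assumption.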

      Minor comments

      It would be useful for the authors to quantify the number of focal adhesions present from Vinculin stains from Figure 4C and 5C instead of just showing representative images. It would be interesting to determine if there is a meaningful relationship between focal adhesion number induced by the codepletions or tissue culture coating and Sec23a expression levels like in Figure 3D. Generally, the figures, text, and references were appropriate.

      Significance

      In recent years, significant effort has been devoted to elucidating mechanisms by which COPII trafficking is modulated in response to cellular cues. These studies have revealed that changes in nutrient availability, growth factors, ER stress, autophagy, and T-cell activation all cause changes in COPII trafficking via unique gene expression, splicing, or post-translational control (2-7). This work elucidates a novel mechanism of transcriptional control driven by focal adhesions. Additionally, it provides a number of potentially useful Sec23a and Sec23b functional interactors among cytoskeletal factors for further study. These unexplored factors may have unique mechanisms of COPII regulation that could contribute to our understanding of ER export modulation. Altogether, this and similar works are building an increasingly complex set of regulatory pathways that, when integrated, ultimately dictate COPII trafficking kinetics.

      The reported findings are not only relevant to those who study COPII trafficking, but also other fields where secretion is studied in the context of the ECM. This work would suggest that secretion of factors involved in crosstalk between cells, including in tumors, is likely to be controlled by the interactions of cells with ECM.

      Expertise keywords: cell biology, light microscopy, membrane trafficking

      References

      1.Weigel A V., Chang CL, Shtengel G, Xu CS, Hoffman DP, Freeman M, et al. ER-to-Golgi protein delivery through an interwoven, tubular network extending from ER. Cell. 2021 Apr;184(9):2412-2429.e16.

      2.Farhan, H., Wendeler, M. W., Mitrovic, S., Fava, E., Silberberg, Y., Sharan, R., Zerial, M., & Hauri, H. P. (2010). MAPK signaling to the early secretory pathway revealed by kinase/phosphatase functional screening. Journal of Cell Biology, 189(6), 997-1011.

      3.Zacharogianni, M., Kondylis, V., Tang, Y., Farhan, H., Xanthakis, D., Fuchs, F., Boutros, M., & Rabouille, C. (2011). ERK7 is a negative regulator of protein secretion in response to amino-acid starvation by modulating Sec16 membrane association. EMBO Journal, 30(18), 3684-3700.

      4.Tillmann, K.D., V. Reiterer, F. Baschieri, J. Hoffmann, V. Millarte, M.A. Hauser, A. Mazza, N. Atias, D.F. Legler, R. Sharan, et al. 2015. Regulation of Sec16 levels and dynamics links proliferation and secretion. J. Cell Sci. 128:670-682.

      5.Liu, L., Cai, J., Wang, H., Liang, X., Zhou, Q., Ding, C., Zhu, Y., Fu, T., Guo, Q., Xu, Z., Xiao, L., Liu, J., Yin, Y., Fang, L., Xue, B., Wang, Y., Meng, Z. X., He, A., Li, J. L., ... Gan, Z. (2019). Coupling of COPII vesicle trafficking to nutrient availability by the IRE1α-XBP1s axis. Proceedings of the National Academy of Sciences of the United States of America, 116(24), 11776-11785.

      6.Jeong, Y.-T., Simoneschi, D., Keegan, S., Melville, D., Adler, N. S., Saraf, A., Florens, L., Washburn, M. P., Cavasotto, C. N., Fenyö, D., Cuervo, A. M., Rossi, M., & Pagano, M. (2018). The ULK1-FBXW5-SEC23B nexus controls autophagy. ELife, 1-25.

      7.Wilhelmi, I., Kanski, R., Neumann, A., Herdt, O., Hoff, F., Jacob, R., Preußner, M., & Heyd, F. (2016). Sec16 alternative splicing dynamically controls COPII transport efficiency. Nature Communications, 7, 12347. https://doi.org/10.1038/ncomms12347

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Jung et al, which is based on a targeted siRNA screen, demonstrates regulation of SEC23A (component of the SEC23 complex of the COP coat) levels at the transcriptional level downstream of focal adhesion signaling. Using siRNA-mediated downregulation, the authors were able to identify proteins which either increased or decreased traffic of VSVG through the secretory pathway when combined with downregulation of either SEC23A or SEC23B. The authors have focused on a group of SEC23B functional interactors, downregulation of which increases the size of focal adhesions and also downregulates SEC23A levels, thus providing an explanation for reduced secretory traffic. The authors further show that plating cells on fibronectin or Matrigel, which activate focal adhesion kinase signaling, also results in downregulation of SEC23A transcript levels. The screen is conducted in a well-controlled manner for the most part, with a clear explanation of the analysis routines, and the data presentation is of very good quality. The most important results have been validated by more than one experimental strategy, which lends substantial confidence to the findings. The results also open further avenues for understanding the transcriptional regulation in different physiological and disease contexts.

      There are certain issues which the authors should address with regard to controls and some observations that conflict with published results with respect to the phenotypes associated with downregulating proteins on focal adhesion size. Additionally, the authors do not tie up the loose ends by monitoring secretory traffic in cells grown on different matrices, yet include it in the model. Addressing or explaining these issues could improve this manuscript, and the model may have to be tweaked a bit.

      Major comments:

      1) I wonder why the authors only used siRNA control in their screen when the effects are scored in a double-knockdown context, in combination with mild knockdown of SEC23A and SEC23B, to get functional interactors. Control siRNA in combination with SEC23A and SEC23B should have been two ideal negative controls in the screen. Nevertheless, in the data presented in Figure 1E and the whole of Figure 2, using control siRNA in combination with SEC23B siRNA would have been the ideal control to show that the combination does not induce any trafficking defects which could impact the findings of the study. Hence, some of the data presented in these figures should have the sicontrol+SEC23B siRNA combination as a control.

      2) What is the identity of the post-ER structures which the authors refer to in Figure 2A? Could the images represent VSVG concentrated at ER exit sites? The authors should stain with markers for ERES to see if the VSVG puncta colocalize with them.

      3) Based on RNA sequencing results, the authors chose to follow up on SEC23A levels in the background of siRNA knockdown of components (like MACF1, ROCK1, FERMT2 etc.) which regulate focal adhesions in cells and show that there is a reduction in both transcript and protein levels of SEC23A. In the images shown in Figure 2B and Figure 2C, levels of SEC31A and β-Cop1 are reduced. The authors should test using qPCR and western blots whether there is a downregulation of SEC31A, β-Cop1 and SEC23B in siRNA knockdowns of MACF1, ROCK1, FERMT2 etc. It would provide new insights if there were a co-regulation of secretory machinery to modulate the secretory traffic in response to focal adhesion based signaling.

      4) The most major concern in this manuscript relates to the results presented in Figure 4C. The authors show that in response to all the knockdowns, they see more focal adhesions as monitored by Vinculin staining, and from this, along with the experiments with cells plated on Matrigel and Fibronectin, they arrive at the conclusion that increased focal adhesion signaling downregulates SEC23A levels, which presumably modulates secretory traffic. I am not an expert on focal adhesions, but based on my understanding of the literature on that topic, downregulation of ROCK1 or FERMT2 disrupts focal adhesions. (See: Theodosiou et. al., Elife, 2016 or Lock et. al., Plos One, 2012 for example). How do the authors explain their results in siRNA knockdown of ROCK1 and FERMT2, which leads to an increased size of focal adhesions and seems contradictory to the published results? To clarify these results, the authors should test phosphorylation of FAK in their siRNA backgrounds, which is another readout of focal adhesion signaling. The experiments from cells grown on Fibronectin and Matrigel favor the argument which the authors put forth, but the authors may have to tweak the model a bit based on FAK phosphorylation and FAK signaling in the context of the above-mentioned knockdowns.

      5) What happens to VSVG traffic or RUSH-Cadherin traffic when cells are plated on Matrigel and Fibronectin? Reduction in secretory traffic of these is an important experiment which is missing to close the loop and validate the model presented. The authors must perform these experiments either with cells grown on matrix alone or in combination with siRNA to SEC23B. The authors should also monitor ERES and transport carriers in this background.

      6) This is not such a major issue, but it would be good to see a comparison of SEC23A levels in the siRNA knockdown condition with those when cells are grown on different substrates and in ROCK1, FERMT2 knockdowns (blots of which the authors already have in this manuscript).

      Minor comments:

      1) Scale bars are missing in EM images in Figure 2H.

      2) Show molecular weight markers in Western blots in main figure 3E and supplementary figure S1E.

      Significance

      I have looked at the manuscript through the lens of a cell biologist, as that is predominantly my area of expertise. In that respect I find the screen conducted by the authors particularly interesting, as they aim to connect how extracellular cues regulate the secretory pathway. A screen seems justified as there is no comprehensive understanding linking the two above-mentioned processes. The authors have done a functional interaction screen and analyzed a lot of images to identify candidates which either increase or decrease secretory traffic in combination with SEC23A and SEC23B. Such a functional screen has helped the authors identify candidates which were otherwise missed in single siRNA knockdowns in their previous work from 2012. This definitely opens up interesting avenues to test the candidates identified in the screen in different physiological contexts and in disease, as well as the transcriptional program connecting focal adhesion signaling with the regulation of components governing secretion. Such functional interaction screens could also be employed to identify crosstalk of different cellular processes with the regulation of the secretory pathway at the ER as well as at the Golgi apparatus.

      Referees cross-commenting

      I agree with the comments from both the referees that the manuscript is very interesting and most experiments are well controlled, but the quantification of the focal adhesion phenotype in knockdowns needs to be done in an extensive manner, and secretion phenotypes need to be measured upon plating cells on different matrices to validate the model presented.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Jung et al reports on an interesting finding that focal adhesion signaling regulates the expression of Sec23A and thereby regulates COPII-dependent trafficking. The data presented are mostly solid and the finding itself is highly novel, as it tackles an area of secretory trafficking that remains poorly understood, namely the connection between the ECM and secretion.

      I will list below all the comments that I have, mixing both technical and conceptual topics:

      Technical issues:

      1-The authors should provide a better description of how they designed this siRNA library. What were the inclusion criteria for these 378 genes? I might have missed it, but I could not find this information easily.

      2-Figure 2: I know this is challenging for EM images, but is there a way the authors could quantify these data? How many images were looked at? What was the average width of ER cisternae?

      3-Figure 4: I think that the characterization of the FA phenotype is a bit underdeveloped. There is no quantification of these data. Is the size of FAs changing? Is the number of FAs per cell changing? Is the length of FAs changing? I think that more work is needed to increase the confidence in these data. I could also not easily see what type of cells these are. A better description of this experiment is also required. Also, how many cells were analyzed? I think it is important that this experiment is done with a sufficient number of cells to increase the confidence in the data.
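
      As an illustration of the kind of quantification requested here, the following minimal sketch counts focal adhesions and measures their areas in a thresholded (binary) vinculin image. The tiny mask, the function name, and the use of 4-connectivity are assumptions made for the example; a real analysis would start from segmented microscopy images and would likely rely on an established image-analysis library.

```python
from collections import deque


def adhesion_areas(mask):
    """Return the pixel area of each 4-connected foreground object in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill over one connected object
                seen[r][c] = True
                queue = deque([(r, c)])
                area = 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas


# Hypothetical thresholded vinculin signal for one cell (1 = adhesion pixel)
mask = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
]

areas = adhesion_areas(mask)
adhesion_count = len(areas)
mean_area = sum(areas) / adhesion_count
```

      Number per cell and mean area come directly from the list of areas; length would need an additional shape measurement per object, which is not shown.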

      Conceptual issues:

      1-The finding that focal adhesion signaling negatively affects ER-export is surprising, because cancer cells that grow on stiff substrates have more focal adhesions and are more invasive and migratory. Both migration and invasion are expected to depend on ER-export. Although the authors did not formally test Sec23A expression under different stiffnesses, I would expect that stiff substrates would lower Sec23A expression and thereby negatively affect ER-export. It would certainly increase the breadth of this work to include data like this and to also discuss this highly surprising finding. However, it is of course the decision of the authors and the editors to decide whether such an experiment would benefit the entire story.

      2-The authors postulate that this novel mechanism could be part of a feedback loop. If this were the case one would expect the acute effect of FA to increase ER-export (or secretion) and the negative feedback will then reduce secretion. However, the acute effect of FA is not addressed in this manuscript. In order to postulate a feedback loop, the authors would need to test the individual nodes of this loop.

      Significance

      I think that the basic finding of this manuscript is highly novel, by showing the impact of the ECM and focal adhesions on COPII-dependent trafficking. I think that this will not only appeal to people from the trafficking community, but also to people working on cell migration and on mechanobiology. The work in its current form does not require much extra effort (max. 3 months). However, if the authors decide to increase the breadth of data, they would require 3-6 months.

      Referees cross-commenting

      I went through the comments of the two other reviewers and agree with their verdict. Some extra work on the characterization of the early secretory pathway would be good. Both reviewers provided a nice catalogue of possible experiments to choose from.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We want to thank all three reviewers for their positive feedback, constructive comments, and suggestions for clarity and improvement. We are delighted to find their consensus that the manuscript represents a contribution to the field.

      Accordingly, we made changes in the text (all highlighted in blue in the revised manuscript) and added a new figure as detailed in the point-by-point response.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors describe results of the comprehensive analysis of the prevalence and functionality of intrinsically disordered regions of the pathogen-encoded signaling receptor Tir, which serves as an illustrative example of the bacterial effector proteins secreted by Attaching and Effacing (A/E) pathogens. This is an interesting and important study that represents an impressive amount of data generated computationally and using a broad spectrum of biophysical techniques. The work serves as a model of a well-designed and perfectly conducted study, where intriguing conclusions are based on the results of comprehensive experiments. The manuscript is well-written and concise, and I had real pleasure reading it. The text and figures are clear and accurate.

      We thank the Reviewer for these positive comments on our work.

      Although, in general, prior studies are referenced appropriately, the authors should mention that the pre-formed structural elements they found in Tir are in line with the concept of "PreSMos" (pre-structured motifs) previously introduced and described in several important studies from the laboratory of Kyou-Hoon Han.

      We thank the Reviewer for this suggestion. We have added a sentence to acknowledge the presence of “PreSMos” in the target-free state of Tir as putative signatures for target-binding, referring to a review article summarizing several local structural elements in unbound IDPs:

      “This supports the presence of pre-structured motifs (PreSMos) as pre-existing signatures for target binding and function within target-free Tir (72).”

      Please, note that we decided to keep this discussion to a minimum, as we cannot rule out the contribution of the induced fit model to the binding mechanism (i.e., disorder-to-order transition upon binding).

      Reviewer #1 (Significance (Required)):

      Solid evidence is provided that structural disorder and short linear motifs represent common features of A/E pathogen effectors. In fact, using a set of bioinformatics tools, the authors first show that although prokaryotic proteins typically contain significantly less intrinsic disorder than eukaryotic proteins, A/E pathogen effectors are as disordered as eukaryotic proteins. Using the translocated intimin receptor (Tir) as a subject of focused study, the authors then utilized a number of biophysical techniques to draw an impressive picture of disorder-based functionality. This study clearly represents a major advancement in the field of functional intrinsic disorder in general and in disorder-based functionality of proteins expressed by pathogenic bacteria. This work adds significantly to the field and will have a noticeable impact.

      Again, reading this manuscript was a real joy. Finally, this work perfectly fits in the area of my expertise, since for the past 25 years or so I have been working on the different aspects of intrinsically disordered proteins.

      Thank you for this encouraging assessment.

      **Referee Cross-commenting**

      I agree with the amended recommendation of reviewer #3 to add in the manuscript EPEC O127.

      According to the suggestion of Reviewer #3, we have now included EPEC O127:H6 in the manuscript.

      I completely agree with comments of reviewer #2 and partially agree with reviewer #3. In my view, comparison of various strains as references for EPEC represents an interesting but independent project. It can be recommended to the authors as one of the potential future developments of their work.

      Thanks for the suggestion. We are pursuing that line of research.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The general impression is that this is an excellent study that establishes

      The C-terminal intracellular region of Tir called C-Tir spanning residues 338 to 550 is largely disordered, however, observe helical structural elements involved with lipid interactions; multi-phosphorylation. The intracellular N-terminal part of Tir called N-Tir spanning residues 1 to 233 is also partially disordered but include a folded domain that is shown to assemble into a dimer

      The only major concern is that no SDS-PAGE gels or size exclusion chromatograms have been included to verify the purity and monodispersity of the various constructs worked on. In particular, the SAXS and CD measurements are highly sensitive to purity and the level of degradation, as IDPs are notorious for being difficult to handle in solution. It would strengthen the arguments made based that

      We produced N-Tir and C-Tir as fusion proteins with a cleavable N-terminal thioredoxin tag (Trx-His6) and C-terminal Strep-tag. The latter allowed us to purify them via Strep-tag affinity chromatography as indicated by SDS-PAGE (please see Fig. S1).

      We agree with the Reviewer that even small amounts of impurities (i.e., higher oligomers/degradation) can interfere with the data analysis and make interpretation of the resulting data difficult and potentially misleading. To avoid such problems, all samples were purified to a monodisperse form by size-exclusion chromatography (SEC) before any biophysical study.

      Following the Reviewer's suggestion, we added a new supplementary figure (Fig. S5) showing the SEC-SAXS chromatogram profiles of C-Tir, N-Tir, and NS-Tir. Briefly, in the inline SEC-SAXS experiment, the sample elutes from an HPLC system directly and continuously into a BioSAXS flow cell for subsequent X-ray interrogation. Under our experimental conditions, C-Tir elutes as a single peak with Rg-values and mass compatible with a disordered monomeric protein, providing an excellent fit to the experimental SAXS curves. For N-Tir and NS-Tir, by SEC-SAXS, we separated the dimer from small amounts of high-order oligomers to yield the experimental SAXS curves of the pure dimers.

      “Fig. S5. SEC-SAXS chromatograms of (A) C-Tir, (B) N-Tir, and (C) NS-Tir. Each panel shows normalized total scattering intensity I(s), over the entire s range, from each frame acquired along the elution volume and the respective Rg-value (black circles). The flat variation of Rg reflects a pure monodisperse sample. The column type for size exclusion chromatography and sample concentrations are on the top left of each panel. For reference, the retention volume for monomeric BSA (66.4 kDa) is displayed by red triangles.”

      **Minor Comments**

      Read through the manuscript to remove passages with spoken language

      We thank the Reviewer for this suggestion. We went through the manuscript and improved the writing to reduce passages with spoken language.

      Line 263, "To do so", should be removed

      Line 290 "Our data thus" replaced with "this"

      We have amended the manuscript accordingly.

      Line 292 "lipid bilayers that might potentially fine-tune Tir's activity in the host cell." Weak sentence and the word fine-tune is slang. Rewrite the sentence. The interaction with lipids is fascinating!

      Thanks for the suggestion. The sentence has now been changed to “**This shows that C-Tir can undergo multivalent and tunable electrostatic interaction with lipid bilayers via pre-structured elements, suggesting that membrane-protein interplay at the intracellular side might control the activity and interactions of Tir in host cells.**”

      We also reinforce this fascinating message in the abstract by adding the sentence: “Membrane affinity is residue-specific and modulated by lipid composition, suggesting a previously unrecognized mechanism for interaction with the host.”

      Line 192 "In figure Fig. 3A," remove the Fig

      Fixed.

      Line 326, "In a similar fashion," is redundant. Rewrite the sentences below.

      We have modified the sentence as follows: “We evaluated whether the N-terminal cytosolic region of Tir (N-Tir; Fig S1) was also intrinsically disordered ...”

      Line 342 add spaces between digit and SI unit "52kDa" there are more cases of this.

      Thank you for pointing this out. This has now been corrected to 52 kDa.

      Reviewer #2 (Significance (Required)):

      I expect this study to have broad relevance to microbiologists working with the intimin and translocated intimin receptor, in particular the lipid interaction is likely to be followed up by the community.

      We thank the reviewer for this comment. Indeed, we believe that further studies on Tir's lipid-binding ability as a novel molecular strategy in host-pathogen interactions will potentially provide new insights into virulence, transmembrane signaling in general, and disorder-mediated functions.

      **Referee Cross-commenting**

      What reviewer 3 suggested in the comments sounds like added value and should be included.

      I agree with reviewer 1, that the strain comparison potentially is beyond the scope presented in this manuscript.

      We have now included EPEC O127:H6 in the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This interesting manuscript looks at the structure of the Nter and Cter of the effector Tir from enteropathogenic E. coli. The authors confirmed a previous study highlighting the "disordered" part of the Cter. However, the extended experimental work (NMR, small-angle X-ray scattering and CD spectroscopy) from this study also reveals the connection between different areas of Tir and its implication during Tir phosphorylation and its interactions with SH2 domain.

      We thank the Reviewer for this positive remark. Indeed, in our work, we highlight the structural features of the SH2-mediated interaction between Tir and host SHP-1 protein, and we also show that C-Tir is capable of lipid interaction via pre-structured motifs and that N-Tir is disordered but assembled into a dimer. Overall, we provide an updated and wide picture of Tir's intracellular side that goes beyond the scrutiny of previously described disorder features.

      **Major Comments:**

      The authors used E2348/69 (O127:H7) strain as a reference for EPEC. However, this strain has the fewest effectors of all the EPEC sequences and may overestimate the PDR in EPEC. It would be wiser to use a strain like B171 as a reference for EPEC to be able to conclude "Disordered Proteins (PDR) with long disordered regions occur in EPEC effectors similar to the human proteome". I believe that the PDR in EPEC is similar to EHEC and CR. I do not have any major concern for the rest of the work.

      We thank the Reviewer for this comment. To clarify, we have replaced “EPEC” with “EPEC O127:H6” in the text and figures.

      We also added a paragraph at the beginning of the Discussion section to acknowledge that our prediction analysis concerns EPEC O127:H6 and two additional representative A/E bacteria strains:

      “Among the enteropathogenic Escherichia coli strains, EPEC O127:H6 (E2348/69) is commonly used as a prototype strain to study EPEC biology, genetics, and virulence (69). Here, we have determined the structural disorder propensity of EPEC O127:H6 sequences and two additional representatives of A/E bacteria: EHEC O157:H7 and CR ICC168.”

      Finally, the Reviewer suggests including EPEC strain B171 (serotype O111:NM) in our analysis. We agree that considering additional strains would be of value; however, we believe that this is beyond the scope of this manuscript, which mainly focuses on the characterization of the structural features of the E2348/69 Tir effector. We are currently working on a broader comparative analysis among different Escherichia coli pathogenic strains, including B171, and we hope to share our findings with the community in the near future.

      **Minor comments**

      Statistical problem: the Mann-Whitney U test (Wilcoxon rank-sum test) is a comparison of two independent samples, with the underlying assumption that the data are normally distributed or that the samples are sufficiently large. It is not certain that either of these assumptions holds. In addition, the effectors are part of the whole proteome. Can both groups then be considered independent?

      We thank the Reviewer for this remark, which allows us to clarify the choice of this particular test. The Mann-Whitney U test is a non-parametric test that compares two samples, with the alternative hypothesis being that one of the two samples is stochastically greater than the other. Because it is a non-parametric test, the samples are not required to be normally distributed, as they are for Student's t-test.

      Regarding the independence of the samples, when comparing the effectors collections to their corresponding proteomes, we did exclude the effectors sequences from the latter. We have clarified this point in the Supplementary Material and Methods section.
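      The comparison described in the two paragraphs above can be sketched as follows. This is only an illustrative sketch, not the authors' actual pipeline: the function names, the protein identifiers and the disorder scores are hypothetical, and the p-value uses the normal approximation with a tie correction (no continuity correction), which is an assumption about how such a test might be run.

```python
from math import erfc, sqrt


def split_independent(scores_by_protein, effector_ids):
    """Split per-protein scores into effectors and the remaining proteome.

    Effectors are removed from the background so the two samples share no
    protein, which is what makes them independent.
    """
    effectors = [s for pid, s in scores_by_protein.items() if pid in effector_ids]
    background = [s for pid, s in scores_by_protein.items() if pid not in effector_ids]
    return effectors, background


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - n1 * n2 / 2) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))


# Hypothetical fraction-of-disordered-residues scores, for illustration only.
scores = {"p1": 0.05, "p2": 0.10, "p3": 0.12, "p4": 0.08, "e1": 0.45, "e2": 0.38}
effectors, background = split_independent(scores, {"e1", "e2"})
u, p = mann_whitney_u(effectors, background)
print(f"U = {u}, p = {p:.3f}")
```

      No normality assumption enters the calculation: only the ranks of the pooled values are used. With very small samples an exact p-value would be preferable to the normal approximation used in this sketch.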

      Line 120 and 442: O127 not H127

      Thank you for pointing this out. It has now been corrected to O127.

      Line 212: positions 409 or 405?

      Yes, it should be 405. Thank you.

      Reviewer #3 (Significance (Required)):

      **Nature and significance:**

      Tir plays a major role during EPEC infection. It is a signalling platform that has been reported to interact with multiple proteins. Whereas the extracellular part has been well characterised and crystallised, the intracellular part has so far proven difficult to study. Over the last decade, no progress has been made to explain how Tir works. This manuscript provides interesting information that sheds some light on how the protein could work.

      **Existing literature:**

      The last research manuscript trying to highlight the structural function of Tir dates from 2007 (PMC1896257). This study is far more extended and in depth than any other previous work done.

      **Audience:**

      The audience is probably limited to researchers working in the field of cellular microbiology and on the functions associated with bacterial effectors in the host. This study could also be a useful tool to identify new effectors based on their "disorder".

      We thank the Reviewer for recognizing the importance of this study. We agree that our work highlights the pivotal role of disordered regions in bacterial effectors, thus enabling a better understanding of the molecular mechanisms used by pathogens to subvert the host-cell processes. We indeed believe that our work can stimulate further research on the characterization of intrinsically disordered effectors, and also beyond the cellular microbiology field, in order to gain a broader knowledge on the molecular dialogue at the host-pathogen interface, which is essential to design better therapeutic strategies.

      **Expertise:**

      I have been working on A/E pathogens for the last 15 years with a particular interest in Tir signalling. My domain of expertise is more in relation to cell signalling than crystallography or structural study.

      **Referee Cross-commenting**

      I agree with both reviewers. My comment about EPEC is more about the conclusion for some of the figures. I don't think they should conclude for the whole EPEC. The Tir variation among EHEC O157:H7 is low, but it is far more diverse for EPEC. Simply adding in the manuscript EPEC O127 should be enough.

      We thank the Reviewer for this comment. As mentioned above, we now state in the manuscript, in both Results and Discussion sections, that we used E2348/69 as a representative strain for EPEC.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This interesting manuscript looks at the structure of the Nter and Cter of the effector Tir from enteropathogenic E. coli. The authors confirmed a previous study highlighting the "disordered" part of the Cter. However, the extended experimental work (NMR, small-angle X-ray scattering and CD spectroscopy) from this study also reveals the connection between different areas of Tir and its implication during Tir phosphorylation and its interactions with SH2 domains.

      Major Comments:

      The authors used E2348/69 (O127:H7) strain as a reference for EPEC. However, this strain has the fewest effectors of all the EPEC sequences and may overestimate the PDR in EPEC. It would be wiser to use a strain like B171 as a reference for EPEC to be able to conclude "Disordered Proteins (PDR) with long disordered regions occur in EPEC effectors similar to the human proteome". I believe that the PDR in EPEC is similar to EHEC and CR. I do not have any major concern for the rest of the work.

      Minor comments

      Statistical problem: the Mann-Whitney U test (Wilcoxon rank-sum test) is a comparison of two independent samples, with the underlying assumption that the data are normally distributed or that the samples are sufficiently large. It is not certain that either of these assumptions holds. In addition, the effectors are part of the whole proteome. Can both groups then be considered independent?

      Line 120 and 442: O127 not H127

      Line 212: positions 409 or 405?

      Significance

      Nature and significance:

      Tir plays a major role during EPEC infection. It is a signalling platform that has been reported to interact with multiple proteins. Whereas the extracellular part has been well characterised and crystallised, the intracellular part has so far proven difficult to study. Over the last decade, no progress has been made to explain how Tir works. This manuscript provides interesting information that sheds some light on how the protein could work.

      Existing literature:

      The last research manuscript trying to highlight the structural function of Tir dates from 2007 (PMC1896257). This study is far more extended and in depth than any other previous work done.

      Audience:

      The audience is probably limited to researchers working in the field of cellular microbiology and on the functions associated with bacterial effectors in the host. This study could also be a useful tool to identify new effectors based on their "disorder".

      Expertise:

      I have been working on A/E pathogens for the last 15 years with a particular interest in Tir signalling. My domain of expertise is more in relation to cell signalling than crystallography or structural study.

      Referee Cross-commenting

      I agree with both reviewers. My comment about EPEC is more about the conclusion for some of the figures. I don't think they should conclude for the whole EPEC. The Tir variation among EHEC O157:H7 is low, but it is far more diverse for EPEC. Simply adding in the manuscript EPEC O127 should be enough.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The general impression is that this is an excellent study. It establishes that the C-terminal intracellular region of Tir, called C-Tir and spanning residues 338 to 550, is largely disordered but contains helical structural elements involved in lipid interactions and multi-phosphorylation. The intracellular N-terminal part of Tir, called N-Tir and spanning residues 1 to 233, is also partially disordered but includes a folded domain that is shown to assemble into a dimer.

      The only major concern is that no SDS-PAGE gels or size exclusion chromatograms have been included to verify the purity and monodispersity of the various constructs worked on. In particular, the SAXS and CD measurements are highly sensitive to purity and to the level of degradation, as IDPs are notorious for being difficult to handle in solution. It would strengthen the arguments made based that

      Minor Comments

      Read through the manuscript to remove passages with spoken language

      Line 263, "To do so", should be removed

      Line 290 "Our data thus" replaced with "this"

      Line 292 "lipid bilayers that might potentially fine-tune Tir's activity in the host cell." Weak sentence and the word fine-tune is slang. Rewrite the sentence. The interaction with lipids is fascinating!

      Line 192 "In figure Fig. 3A," remove the Fig

      Line 326, "In a similar fashion," is redundant. Rewrite the sentences below.

      Line 342 add spaces between digit and SI unit "52kDa" there are more cases of this.

      Significance

      I expect this study to have broad relevance to microbiologists working with the intimin and translocated intimin receptor, in particular the lipid interaction is likely to be followed up by the community.

      Referee Cross-commenting

      What reviewer 3 suggested in the comments sounds like added value and should be included.

      I agree with reviewer 1, that the strain comparison potentially is beyond the scope presented in this manuscript.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors describe results of the comprehensive analysis of the prevalence and functionality of intrinsically disordered regions of the pathogen-encoded signaling receptor Tir, which serves as an illustrative example of the bacterial effector proteins secreted by Attaching and Effacing (A/E) pathogens. This is an interesting and important study that represents impressive amount of data generated computationally and using a broad spectrum of biophysical techniques. The work serves as a model of the well-designed and perfectly conducted study, where intriguing conclusions are based on the results of the comprehensive experiments. The manuscript is well-written and concise, and I have a real pleasure reading it. The text and figures are clear and accurate.

      Although, in general, prior studies are referenced appropriately, the authors should mention that the pre-formed structural elements they found in Tir are in line with the concept of "PreSMos" (pre-structured motifs) previously introduced and described in several important studies from the laboratory of Kyou-Hoon Han.

      Significance

      Solid evidence is provided that structural disorder and short linear motifs represent common features of A/E pathogen effectors. In fact, using a set of bioinformatics tools, the authors first show that although prokaryotic proteins typically contain significantly less intrinsic disorder than eukaryotic proteins, A/E pathogen effectors are as disordered as eukaryotic proteins. Using the translocated intimin receptor (Tir) as a subject of focused study, the authors then utilized a number of biophysical techniques to draw an impressive picture of disorder-based functionality. This study clearly represents a major advancement in the field of functional intrinsic disorder in general and in disorder-based functionality of proteins expressed by pathogenic bacteria. This work adds significantly to the field and will have a noticeable impact.

      Again, reading this manuscript was a real joy. Finally, this work perfectly fits in the area of my expertise, since for the past 25 years or so I have been working on different aspects of intrinsically disordered proteins.

      Referee Cross-commenting

      I agree with the amended recommendation of reviewer #3 to add in the manuscript EPEC O127.

      I completely agree with comments of reviewer #2 and partially agree with reviewer #3. In my view, comparison of various strains as references for EPEC represents an interesting but independent project. It can be recommended to the authors as one of the potential future developments of their work.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Author Response to Reviewer Comments

      Review Commons

      Manuscript number: RC-2021-00979

      Corresponding author(s): Horvitz, H Robert

      Reviewer #1:

      Major comments: The manuscript is very well written and results have been very clearly presented. The key conclusions drawn by the authors are convincing. However, one of the claims by the authors is not supported by the data. In lines 206-215 the authors discuss experiments where they visualized the morphology of the AIAs in ctbp-1 mutants where ctbp-1 expression is restored temporally in the L4-young adult stage using a heat-shock promoter construct. The authors conclude that "ctbp-1 can act ... in older worms to maintain aspects of AIA morphology in a manner similar to AIA gene expression." However, the data presented in Fig. 3I-L show no statistically significant difference between ctbp-1 mutants and mutants with the HS-construct, either with and without heat shock. Thus, although there seems to be some effect of the heat shock, this is not significant and thus does not support the conclusion of the authors. In addition, an important control is missing. How does the heat shock affect the morphology of AIAs in wt or ctbp-1 animals, without the hs-construct?

      We agree with this comment and have updated the manuscript to clarify that the suggestion that CTBP-1 acts to prevent further disruption of AIA morphology is speculative. We will conduct the suggested control experiment and include the results in a revised version of the manuscript.

      Apart from the above, all strong claims by the authors are valid. In addition, the authors suggest a mechanism, where CTBP-1 regulates the function of the EGL-13 transcription factor in AIA and that overexpression of CEH-28 in AIA contributes to the olfactory adaptation defect observed in the ctbp-1 mutant animals. These mechanistic speculations could be relatively easily strengthened by two additional experiments. One, does ctbp-1 loss of function affect egl-13 expression? The model presented in Fig 8 suggests that egl-13 expression levels are not affected, but from the data in the paper it is not even clear if egl-13 is expressed in AIA. Whether egl-13 is expressed in AIA, and if its expression levels are affected by mutation of ctbp-1 could be tested using egl-13::gfp expressing animals.

      This is an excellent suggestion and experiments we had been attempting already. We will include findings from these experiments once they are complete in a revised version of the manuscript.

      Two, does overexpression of ceh-28 cause an olfactory adaptation defect? This could be tested by cell specific overexpression of ceh-28 in AIA.

      This is also a great suggestion. We will conduct this experiment and include the findings in a revised manuscript.

      The data and the methods have been presented in such a way that they can be reproduced. I do have some doubts with regard to the statistical analysis. The authors report that statistical analysis involved unpaired t-tests. But as all results involve the analysis of data from 3-5 different strains, a multiple sample analysis should be used. To correct for the number of samples, one should first use an ANOVA to test for statistical differences, followed by a post hoc analysis to identify those that are significantly different.

      We agree with this criticism. We have replaced instances of multiple sample analyses with a one-way ANOVA test followed by Tukey’s multiple test correction. The current version of the manuscript reflects these changes in figures, figure legends and in the Materials and Methods.
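      As an illustration of the first step of the analysis described above, the omnibus F statistic of a one-way ANOVA can be computed as in the sketch below. The group values are hypothetical and are not data from the manuscript. The sketch stops at the F statistic and its degrees of freedom; the p-value and the Tukey post hoc comparisons need the F and studentized-range distributions, which would normally come from a statistics package and are not shown here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per strain."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three strains, for illustration only.
strain_a = [0.60, 0.55, 0.65]
strain_b = [0.10, 0.05, 0.15]
strain_c = [0.40, 0.35, 0.45]
f_stat, df_b, df_w = one_way_anova_f([strain_a, strain_b, strain_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A single omnibus test followed by a corrected post hoc comparison keeps the family-wise error rate under control, which repeated unpaired t-tests across 3-5 strains do not.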

      Reviewer #2:

      **Major comments:**

      1. The paper is well written and figures are clearly organized. The authors made suitable conclusions based on the data provided. Materials and methods are appropriately described for reproducibility.

      We agree and are currently attempting such experiments. Meaningful results from these experiments will be included in a revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Major comments:**

      • The key conclusions of this manuscript are highly convincing and are supported by multiple mutant alleles and rescue experiments.

      • There are certain claims in the manuscript that need to be clarified (detailed below).

      • No additional experiments are essential to support the claims of the paper.

      - Most of the data and the methods are presented well; however, a Table listing genes identified in the AIA-specific RNA Seq is required. The GEO accession number has been made available for the RNA Sequencing data; however, listing the genes identified would aid the reader. Were ctbp-1 and egl-13 shown to be expressed in the AIAs using this approach?

      We have included such a table, replacing Fig. S6 (which previously showed only ceh-28 expression) with a table listing expression of all confirmed hits from the scRNA-Seq experiment. ctbp-1 and egl-13 were also found to be expressed in the AIA neurons in this scRNA-Seq experiment.

      - No evidence is presented that EGL-13 is expressed in the AIAs?

      As noted above, the scRNA-Seq experiment showed egl-13 expression in the AIAs. We also will assay egl-13 expression in the AIAs using a GFP reporter and include the results in a revised manuscript.

      - Can the authors comment and include in the manuscript information regarding whether the promoters of AIA-expressed genes that are regulated by EGL-13 contain EGL-13 binding sites? Also, are the promoters of AIA-expressed genes not regulated by EGL-13 missing these sites?

      We have added such information to the manuscript. Briefly, our analysis identified no promising candidates for EGL-13 binding sites in the promoter regions of either ceh-28 or acbp-6, suggesting that regulation of these by EGL-13 is likely indirect. Further, no previous work has indicated that either of these genes is regulated directly by EGL-13, although in the case of acbp-6 little is known about this gene or the ways in which it is regulated. However, the claim that EGL-13 regulates expression of acbp-6 and ceh-28 indirectly is speculative and is not a conclusion of this current work.

      - Experiments and statistical analysis are adequate.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Saul et al. found that the CTBP-1 transcriptional co-repressor acts cell-autonomously to maintain aspects of AIA neuronal fate, morphology and function. They found that CTBP-1 utilizes the Sox transcription factor EGL-13 to transcriptionally repress specific genes in the AIA neurons. This work proposes that CTBP-1 and other co-repressors play critical roles in selectively maintaining or repressing expression of specific genes.

      Major comments:

      • The key conclusions of this manuscript are highly convincing and are supported by multiple mutant alleles and rescue experiments.
      • There are certain claims in the manuscript that need to be clarified (detailed below).
      • No additional experiments are essential to support the claims of the paper.
      • Most of the data and the methods are presented well; however, a Table listing genes identified in the AIA-specific RNA Seq is required. The GEO accession number has been made available for the RNA Sequencing data; however, listing the genes identified would aid the reader. Were ctbp-1 and egl-13 shown to be expressed in the AIAs using this approach?
      • No evidence is presented that EGL-13 is expressed in the AIAs?
      • Can the authors comment and include in the manuscript information regarding whether the promoters of AIA-expressed genes that are regulated by EGL-13 contain EGL-13 binding sites? Also, are the promoters of AIA-expressed genes not regulated by EGL-13 missing these sites?
      • Experiments and statistical analysis are adequate.

      Minor comments:

      I list below a number of changes and typographical errors that will improve the manuscript.

      Page 11 Line 235 - the authors state that ctbp-1 L4s have an increased attraction to butanone. As the chemotaxis index is 0 for the ctbp1- mutant compared to -0.5 in WT, I understand what the authors mean here, but the statement of "increased attraction" suggests that ctbp1- mutants are attracted to butanone when they are actually ambivalent to it.

      Page 12 Line 248 - change functioning to functional

      Page 18 Line 397 - it would be helpful to the reader if the authors referred back to the ctbp1- mutant data (Figure 5) for comparison in Fig 7D.

      Page 19 Line 404 - remove the word causally

      Page 19 Line 414 "However, while conditioned ctbp-1 ceh-28 double mutants appeared similar to both the wild type and ctbp-1 single mutants at the L1 stage (Fig. 7I-J), these double mutants displayed an intermediate phenotype between wild-type and ctbp-1 animals for adaptation at the L4 larval stage (Fig. 7K-L)."

      This sentence is confusing as the ctbp-1 ceh-28 phenotype is not significantly different from that of the ctbp-1 single mutant.

      Page 50 Line 1001 - change mlg-1 to mgl-1

      Figure 7A-C - please label with the genotype examined.

      Significance

      • This work identifies a function for the transcriptional corepressor CTBP-1 in controlling the expression of a subset of genes in the AIA neurons. It suggests that CTBP-1 may play a similar role in controlling subsets of gene expression in diverse neuronal classes. This would be interesting to examine in single cell sequencing experiments of all C. elegans neurons.
        • This work adds to the literature that describes CTBP-1 functions in the C. elegans nervous system. It also speculates that other transcriptional co-repressors play similar functions in other cells and tissues in other organisms.
        • An audience with interests in cell fate determination and the function of specific gene regulatory modules that control subsets of genes within a cell.
        • My field of expertise is C. elegans neurobiology (axon guidance and cell fate) and I am therefore well-qualified to review this manuscript.

      Referee Cross-commenting

      Comments from other reviewers are fair. I am happy with the overall conclusions.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, the authors identified several mutations from a forward genetic screen in the transcriptional corepressor gene ctbp-1 that cause misexpression of an M4 neuronal marker in the two AIA interneurons in C. elegans. ctbp-1 mutant AIA neurons also display a defect in morphology and sensory function. The penetrance and severity of these defects in gene expression, morphology, and function progressively increase with age. Their data suggest that ctbp-1 acts cell-autonomously and in older worms to maintain gene expression, morphology, and function in AIA neurons. Single-cell RNA sequencing was performed to identify changes in AIA transcriptional profiles between wild type and ctbp-1 mutants. Using the data from AIA transcriptional profiles, they showed that ctbp-1 mutant AIA neurons lose the expression of two genes characteristic of the adult AIA while misexpressing at least two genes uncharacteristic of AIA. Taken together, their findings demonstrate that ctbp-1 acts to maintain the AIA identity at the level of gene expression, morphology, and function, while ctbp-1 does not act to establish the AIA cell identity. Furthermore, the authors identified a few mutations of a SOX family transcription factor gene egl-13 from a forward genetic screen that suppress the ctbp-1 mutant phenotype. The authors conclude from their results that ctbp-1 maintains AIA function and some aspects of AIA gene expression by antagonizing egl-13 function and that ctbp-1 maintains AIA morphology through pathways independent of egl-13.

      Major comments:

      1. The paper is well written and figures are clearly organized. The authors made suitable conclusions based on the data provided. Materials and methods are appropriately described for reproducibility.
      2. It would strengthen the model (Figure 8) by testing physical interaction between CTBP-1 and EGL-13 in AIA using BiFC.

      Minor comments:

      1. The authors mentioned a previous finding that the mammalian ortholog of EGL-13, SOX6, interacts with the mammalian ortholog of CTBP-1, CtBP2. The authors should also discuss the function of interacting SOX6 and CTBP-1 in mammalian systems.
      2. It would be good to increase the font size of some figures and tables for easier reading.

      Significance

      This study identifies roles of conserved transcriptional corepressor CTBP-1 and a SOX family transcription factor gene egl-13 from unbiased forward genetic screens in the maintenance of AIA interneurons in C. elegans.

      Since CTBP-1 and EGL-13 have mammalian orthologs, although the roles of their mammalian orthologs were not discussed, this study may have broad implications for development in a range of organisms.

      The findings of this study will be of interest to a broad audience in the field of developmental biology, particularly in transcriptional regulation of cell identity maintenance.

      I have expertise in transcriptional regulation of sensory neuron diversification using C. elegans as a model. I am comfortable about evaluating this manuscript.

      Referee Cross-commenting

      I agree with the comments from reviewers 1 and 3.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Saul et al identify the transcriptional corepressor ctbp-1 as a regulator of ceh-28::gfp expression in the AIA neurons of the nematode C. elegans. They find 18 independent mutants in this gene, including several presumptive null alleles. Using cell specific rescue and temporal expression of the ctbp-1 gene, the authors show that ctbp-1 acts cell autonomously in AIA to regulate ceh-28 expression, and can do so in young adult animals. Next, using various reporters they show that the AIAs do not transdifferentiate to M4-like cells, but the AIAs do show morphological defects, which increase with age of the animal. Using behavioral experiments the authors next determine the functionality of the AIAs in the ctbp-1 mutant animals. They find that loss of ctbp-1 in AIA affects the function of the AIAs and that ctbp-1 does so in young adult animals. The authors conclude the characterization of the AIAs of ctbp-1 mutant animals by identifying several other genes whose expression is misregulated in ctbp-1 animals, using a single cell RNAseq experiment, confirmed using gfp-fusion constructs. These experiments identify one other gene, acbp-6, that is misexpressed in the AIAs of L4 ctbp-1 animals and 2 genes, sra-11 and glr-2, that are normally expressed in AIA, but not in ctbp-1 animals.

      To find out how ctbp-1 regulates gene expression in AIA, the authors perform a genetic suppressor screen and show that loss of function of egl-13 suppresses the ceh-28::gfp misexpression in AIA in ctbp-1 mutants. They show that egl-13 functions cell-autonomously in the AIAs. They find it does not suppress the morphological defects of the AIAs in ctbp-1 mutants, but it does suppress the effect of ctbp-1 loss of function on olfactory adaptation. In addition, mutation of egl-13 suppressed the misexpression of acbp-6, but not that of sra-11 and glr-2. Finally, the authors show that the olfactory adaptation defect observed in ctbp-1 mutant animals can be partially suppressed by inactivating ceh-28 suggesting that the behavioral defect is caused in part by overexpression of ceh-28.

      The manuscript is very well written and results have been very clearly presented. The key conclusions drawn by the authors are convincing. However, one of the claims by the authors is not supported by the data. In lines 206-215 the authors discuss experiments where they visualized the morphology of the AIAs in ctbp-1 mutants where ctbp-1 expression is restored temporally in the L4-young adult stage using a heat-shock promoter construct. The authors conclude that "ctbp-1 can act ... in older worms to maintain aspects of AIA morphology in a manner similar to AIA gene expression." However, the data presented in Fig. 3I-L show no statistically significant difference between ctbp-1 mutants and mutants with the HS-construct, either with and without heat shock. Thus, although there seems to be some effect of the heat shock, this is not significant and thus does not support the conclusion of the authors. In addition, an important control is missing. How does the heat shock affect the morphology of AIAs in wt or ctbp-1 animals, without the hs-construct?

      Apart from the above, all strong claims by the authors are valid. In addition, the authors suggest a mechanism, where CTBP-1 regulates the function of the EGL-13 transcription factor in AIA and that overexpression of CEH-28 in AIA contributes to the olfactory adaptation defect observed in the ctbp-1 mutant animals. These mechanistic speculations could be relatively easily strengthened by two additional experiments. One, does ctbp-1 loss of function affect egl-13 expression? The model presented in Fig 8 suggests that egl-13 expression levels are not affected, but from the data in the paper it is not even clear if egl-13 is expressed in AIA. Whether egl-13 is expressed in AIA, and if its expression levels are affected by mutation of ctbp-1 could be tested using egl-13::gfp expressing animals.

      Two, does overexpression of ceh-28 cause an olfactory adaptation defect? This could be tested by cell specific overexpression of ceh-28 in AIA.

      These are relatively simple experiments that would not take much time or investments, but would strengthen or clarify the model presented.

      The data and the methods have been presented in such a way that they can be reproduced. I do have some doubts with regard to the statistical analysis. The authors report that statistical analysis involved unpaired t-tests. But as all results involve the analysis of data from 3-5 different strains, a multiple sample analysis should be used. To correct for the number of samples, one should first use an ANOVA to test for statistical differences, followed by a post hoc analysis to identify those that are significantly different.
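      The omnibus test suggested here can be illustrated with a minimal sketch. It computes the one-way ANOVA F statistic by hand for three hypothetical strains; the strain names and measurements are invented for illustration and are not data from the manuscript. Converting F to a p-value needs the F distribution (for example, `scipy.stats.f_oneway` returns both), and a post hoc test such as Tukey's HSD would then identify which pairs differ.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (strains)
    n = len(all_values)   # total number of observations

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three strains (illustration only)
strains = {
    "wild_type": [1.0, 2.0, 3.0],
    "mutant_a": [2.0, 3.0, 4.0],
    "mutant_b": [5.0, 6.0, 7.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(strains.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      Running a single omnibus test first keeps the family-wise error rate under control, which repeated unpaired t-tests across 3-5 strains do not.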

      Minor comments:

      Page 7, in the heat shock rescue experiment the authors conclude that ctbp-1 acts "in older worms" to prevent expression of ceh-28 in AIA. "Older" is quite unspecific. Please be specific, i.e. in L4-young adult animals. The same applies to various other phrases where "older" worms are mentioned. Line 229, the authors state that animals were "briefly starved". Please be precise and indicate how long the animals were starved.

      Significance

      Most studies that address cell fate, focus on the first phase where cell fate is determined. How cell fate is maintained is far less well understood. This manuscript convincingly identifies two transcription regulators that are important for cell fate maintenance, both a transcriptional repressor and an activator. The manuscript provides first clues as to how this process functions, and as such provides important conceptual insights. These not only apply to the worm, C. elegans, but as these are strongly conserved proteins, probably also provide a firm basis for our understanding of cell fate maintenance mechanisms in higher organisms including mammals. In addition, this study reports an excellent model that can be used to further unravel this mechanism. As such, I expect that this manuscript will be of interest to a broad range of scientists, interested in cell fate determination and maintenance and transcriptional control.

      My expertise lies in C. elegans behavior, where we focus on identification of the molecular and cellular mechanisms that allow C. elegans to respond to its environment even in changing circumstances. In addition, we study the mechanisms of cell fate determination and maintenance in C. elegans sensory neurons.

      Referee Cross-commenting

      I agree with the comments of reviewers 2 and 3.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      The authors developed a novel primer/probe set for detection of subgenomic (sgE) transcripts of SARS-CoV-2, with the aim of developing a system that may predict the presence of infectious virus in patient samples. After studying the specificity and sensitivity of their system, they compared it with already validated/published systems for diagnosis of SARS-CoV-2 infection. Interestingly, they also studied the effect of the conditions of isolation. They showed Vero E6 expressing TMPRSS2 (Vero E6-TMPRSS2) to be more sensitive to infection than Vero E6, allowing a higher number of isolations from patient samples. They also showed their system to be more sensitive than a previously published sgE system as well as a negative-strand RNA assay, but less sensitive than the WHO/Charité primer/probe set. Notably, all samples containing infectious particles (successful virus isolation on Vero E6-TMPRSS2) were detected with their primer/probe system, in contrast to the other tested sgE assay. They showed the negative-strand assay to be unlikely to detect viral genetic material in samples that nevertheless contain infectious particles.

      **Major comments:**

      Are the key conclusions convincing?

      I salute the intention of the authors to try to fix cut-off values for infectious patients, but I would be more careful with the assertion that "using a total viral RNA Ct cut-off of >31 or specifically testing for sgRNA can serve as an effective rule-out test for viral infectivity". It is true that in this study, virus was not isolated from any of the samples below a Ct of 31 or negative in the developed sgE assay, but all those assays were done in cell culture. We do not know how transmission could occur for those samples from human to human. Being able to fix a cut-off Ct value for a defined PCR/RT-PCR system would be a great improvement for SARS-CoV-2-infected patients having to stay in quarantine. It is even more important for Ebola-positive patients in Africa, who have to stay in quarantine in precarious conditions under tents, in warm temperatures and without privacy for long periods because they are still positive by RT-PCR. Unfortunately, fixing those values would require a very high number of experiments, including animal experiments.

      We appreciate the reviewer’s acknowledgment of the significance of this issue. We agree that in vivo animal experiments to more precisely determine the lowest infectious or transmissible dose would be valuable. But such experiments are outside the scope of the current study. To acknowledge the reviewer’s important point regarding the unavoidable limitations of cell culture systems, we have modified the abstract (line 51) to say “an effective rule out test for the presence of culturable virus,” a conclusion that is fully supported by our data.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      No

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Yes.

      Are the data and the methods presented in such a way that they can be reproduced?

      Kinetic of SARS-CoV-2 (figure 2):

      The method is not detailed in the Methods section and is not clear in the figure legend. When supernatants are collected, is all of the supernatant removed, or only an aliquot? If an aliquot, do you replace it with new medium?

      We apologize for this omission and have included the requested details in the methods. We seeded a separate well for each time point and collected the entire supernatant for a given time point, rather than replacing media. We added the following text to the methods section (lines 402-412): “Viral growth kinetics were measured in Vero E6 or Vero E6 TMPRSS2 cells at an MOI of 0.001. Separate wells were seeded for each time point, and growth curves were conducted in technical duplicates for each biological experiment. Supernatants and cell lysates were collected twice daily 1 & 2 dpi, and again on 3, 4, 7 and 8 dpi (Vero E6 TMPRSS2 cells were harvested for the final time at day 7 due to faster growth kinetics in this cell type). For each time point, the supernatant was removed and clarified to remove cellular debris, before being split into separate aliquots for RNA extraction (mixed 1:1 with AVE lysis buffer) and viral titration (by focus assay). Dead cells/debris that was pelleted after clarifying supernatants was combined with cells scraped from each well into PBS and spun again to obtain a pellet of all cell material from each timepoint. This pellet was then lysed in AVE viral lysis buffer for RNA extraction.”

      Stability of infectious SARS-CoV-2:

      I am very surprised by your results on the stability of cultured virus, given that we observed a decrease in SARS-CoV-2 titer in our lab after freezing/thawing steps. Do you freeze cell supernatant directly or do you prepare your samples another way? Please state this in the Methods section.

      We measured the stability after freeze/thaw for our normal high concentration viral stocks. Our viral stocks are grown in DMEM with 10% FBS, 1% HEPES, 1% pen/strep, and clarified before use. It is possible that lab-lab variation in the media components or HEPES concentration used to prepare viral stocks explains the differences seen in our work vs the reviewer’s lab. We have added the following additional detail to the methods section (lines 415-418) of the manuscript to clarify how these experiments were performed: “High concentration viral stocks (prepared as above in DMEM, 10% FBS, 1% HEPES, 1% pen/strep) were used to measure viral stability over time and after multiple freeze-thaw cycles. Stocks were stored at the indicated temperatures in the dark and aliquots were removed at the indicated days or after each freeze-thaw cycle for measuring infectious virus by focus assay.”

      Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      **Minor comments:**

      Specific experimental issues that are easily addressable.

      Figure 2C and D: Instead of Ct values in cells, it would be more relevant to normalize these results to an endogenous gene and present the results as fold change relative to mock-infected cells. You state that the level of RNA declines and then stays stable over time, but you also note that there is CPE. If you have fewer cells but the same level of viral RNA, it means the RNA level in the surviving cells has increased.

      We have measured the GAPDH level in these cells over time, and that data is included as gray lines in Fig 2 C&D (see new figure 2). As we are combining the cell pellet from clarified supernatants with the cells that remain adherent to the dish for each harvested timepoint we expect to be harvesting the majority of cells/cell debris for each time point. The levels of GAPDH remain broadly similar over the viral growth curve, with no drop in RNA levels.
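      The normalization discussed in this exchange is commonly expressed with the 2^-ΔΔCt method. The sketch below is illustrative only: it assumes 100% amplification efficiency for both targets, and all Ct values and the choice of calibrator are hypothetical rather than taken from the study. Because viral RNA is normally undetectable in mock-infected cells, the calibrator here is assumed to be an early infection time point.

```python
def fold_change(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Relative expression by the 2^-ddCt method (assumes 100% PCR efficiency).

    ct_target / ct_reference: Ct of the viral target and the endogenous
    control (e.g. GAPDH) in the sample of interest.
    *_calibrator: the same two Ct values in the calibrator sample.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values (illustration only)
late_vs_early = fold_change(
    ct_target=20.0, ct_reference=18.0,                        # later time point
    ct_target_calibrator=25.0, ct_reference_calibrator=18.0,  # early time point
)
print(f"Viral RNA fold change relative to calibrator: {late_vs_early:.0f}")
```

      A stable endogenous control Ct across the time course, as reported for GAPDH in the response above, is what makes this kind of normalization meaningful.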

      It would have been interesting to have the results of isolation at different time points of treatment for patient samples (figure 3A and B) to see if the virus is stable in samples.

      We have access to only limited volume (several hundred µl) of residual patient sample which would make it technically challenging to compare multiple days of storage conditions/ temperatures. Unfortunately, we do not have any remaining sample volume for the specimens used in this study, and so we are unable to perform additional isolations at other times/temperatures. While we agree this would be an interesting line of future inquiry, we feel it is outside the scope of the current study.

      Are prior studies referenced appropriately? Yes

      Are the text and figures clear and accurate?

      Yes.

      Line 140: "this delay in virus and RNA production". You do not talk about RNA yet...

      We have removed “and RNA” from this sentence and replaced with “infectious virus production”.

      Line 156 to 163: sgE RNA detected in cell free supernatant. Can't it come from lysed cells?

      We have replaced “cell-free” with “clarified”.

      Line 167: "...virus in cell culture time course experiment in TMPRSS2 expressing cells (fig.2)"

      We have modified this text to read according to the Reviewer’s suggestion.

      Line 258: Fig 6A and B

      We have added the missing reference to Fig 6B as requested.

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No

      Reviewer #1 (Significance (Required)):

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This new primer/probe system will contribute to the accurate diagnosis of SARS-CoV-2. The comparison with the existing methods is relevant to highlight the strengths and weaknesses of each system. Comparison of isolation of SARS-CoV-2 on commonly used Vero E6 with Vero E6-TMPRSS2 will lead to a great improvement of the isolation method for SARS-CoV-2.

      We appreciate the Reviewer’s assessment of the significance of our study and the improvement in our isolation method compared to the existing standard of using Vero E6 cells.

      Place the work in the context of the existing literature (provide references, where appropriate).

      Properly done in the introduction of the paper.

      State what audience might be interested in and influenced by the reported findings.

      Diagnostic laboratories

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Virology, Molecular Biology, cell biology

      Not enough expertise to evaluate ROC data/analysis

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Bruce et al present a new RT-PCR assay with primer sets that specifically detect sgE RNA from SARS-CoV2 samples. The authors compare this assay to other diagnostic assays in an effort to identify assays capable of correlating RNA detection with culturable virus (i.e. infectious virus). While this new assay identified 100% of culturable isolates, only 56% of isolates testing positive actually had culturable virus. Compared with other assays, the WHO total E RNA assay had better parameters when used at a cutoff Ct value of 31 (PPV of 61%). Overall, this manuscript provides a novel primer probe set for RT-PCR diagnostic assay and conducted comparisons with other assays on the same clinical samples. There are some areas that the authors should address prior to publication.
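      The percentages quoted in this summary are standard diagnostic metrics computed against culture as the reference. A minimal sketch follows; the counts are hypothetical, chosen only so that they reproduce a sensitivity of 100% and a PPV of 56%, and they are not the study's actual sample numbers.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic metrics for an RNA assay scored against virus culture.

    tp: assay positive, culturable      fp: assay positive, not culturable
    fn: assay negative, culturable      tn: assay negative, not culturable
    """
    return {
        "sensitivity": tp / (tp + fn),  # culturable samples the assay detects
        "specificity": tn / (tn + fp),  # non-culturable samples the assay clears
        "ppv": tp / (tp + fp),          # assay positives that are culturable
        "npv": tn / (tn + fn),          # assay negatives that are non-culturable
    }


# Hypothetical counts (illustration only, not the study's data)
metrics = diagnostic_metrics(tp=14, fp=11, fn=0, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

      With no false negatives the negative predictive value is 1.0, which is why an assay with a modest PPV can still serve as a rule-out test for culturable virus.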

      **Major comments:**

      The authors repeatedly tout VeroE6 TMPRSS2 cells as supporting higher viral infection. Therefore, the authors should address why one clinical isolate (E16) was culturable in VeroE6 but not VeroE6 TMPRSS2. Was this experiment repeated multiple times? What are the reasons for this discrepancy?

      We did not have sufficient residual sample volume to repeat isolation attempts of any clinical specimen, so we are limited to a single data point for each cell line. It is possible that this sample had levels of infectious virus at the limit of detection, and stochastic probability meant infectious virus was only present in the aliquot used to infect the Vero E6 (rather than Vero E6-TMPRSS2) cells. It is also possible that viral adaptation/evolution occurred in the VeroE6 well that allowed this virus to successfully grow, but we do not have sequencing data or remaining nucleic acids to test this theory.

      The authors' argument at lines 166-169 is not supported by the data in Fig. 2. The levels of viral RNA between VeroE6 and VeroE6 TMPRSS2 appear to show similar trends in the supernatant across the time course but the infectious viral levels are dramatically different. This discordance between FFU levels and RNA levels cannot be explained by instability of viral particles alone. Have the authors looked into differences in viral particles produced from these two cell lines? The authors should collect virus particles from these two cell lines and conduct the stability experiment in Fig 2D to directly test the hypothesis that indeed the drop seen in FFU in VeroE6 TMPRSS2 is due to instability.

      We apologize for the confusion. We did not intend to make claims about differences in particle stability as a result of the cell line used for viral production, but rather to highlight a general observation that RNA was more stable than infectious virus. This is more obvious in the TMPRSS2 cell line, as replication is faster and more synchronized than in Vero E6 cells (the TMPRSS2 cells are largely dead by day 4, whereas infection progresses more slowly in Vero E6 cells so that new virions continue to be produced during the measured time period). We have added clarifying text at lines 167-169, “We observed that SARS-CoV-2 RNA species persist for much longer than infectious virus in cell culture time course experiments, a feature that was most obvious in Vero E6 TMPRSS2 cells due to their viral kinetics but is likely not cell specific (Fig 2).”

      The evidence for the packaging of sgE RNA into virions is weak. GAPDH detection by PCR is not a proof that the concentration process did not pellet RNA nonspecifically. First, the authors should provide ample information about viral isolation process at line 379 including rotor, centrifuge and speed utilized. In addition, ribosomes typically stay intact following viral lysis (and can be found in supernatant after release from dead cells). Actively translating ribosomes can contain sgE RNA as well. The authors should consider detecting ribosomal RNAs in their samples to rule out the possibility of contaminating ribosomes. In addition, the authors should strongly consider repeating the experiment with high EDTA concentration to break up ribosomes and only pellet virions.

      We have added additional experimental details (rotor, centrifuge and speed) describing how the viral concentration step was performed (line 389-394), “Viral RNA (courtesy of David Bauer, The Francis Crick Institute, UK) from concentrated SARS-CoV-2 (England02 strain, B lineage ‘Wuhan-like’) was obtained by clarifying viral supernatants (2 x 4000 rpm for 30 mins at 4°C in a Beckman Allegra X-30R centrifuge with a SX4400 rotor), overlaying clarified media onto a 30% sucrose/PBS cushion (1/4th tube volume) and concentrating by ultracentrifugation in a Beckman ultra XPN-90 centrifuge with SW32TI rotor for 90 min at 25,500 rpm at 4°C. Pellets were then resuspended in buffer and extracted with TRIzol LS.” We thank the reviewer for their suggestion of including an additional control, and we have added an 18S primer-probe set (see new Figure 8). This data, while not as pronounced as the GAPDH control, suggests that the ultracentrifugation step has removed significant amounts of 18S RNA (though the clarified supernatants retain similar amounts of 18S RNA as the cells, suggesting that clarification alone is not sufficient to remove contaminating ribosomes). While we agree that repeating the ultracentrifuge concentration with high concentrations of EDTA is an interesting line of inquiry we feel it is outside the scope of this manuscript (and we face additional technical restrictions to pursue this as we currently lack access to an ultracentrifuge at BSL-3). We have updated the discussion to include the possibility of residual ribosome-protected fragments of sgE as a potential alternative interpretation (line 350-352).

      **Minor comments:**

      At line 197, the authors refer to "viruses" with lower levels of SARS-CoV2 RNA. This is incorrect and should be changed to "isolates" as the SARS-CoV2 virus particle does not package variable amounts of genomic RNA.

      We have changed this to “clinical specimens” for clarity.

      The authors' statement on lines 210-212 does not seem to be supported clearly by Fig. 5. The authors should consider including trendlines as well as other analyses that help show the correlation between viral RNA vs FFU. In addition, the authors should label the Y-axis clearly for Fig. 5.

      We have added clarifying labels to both the X and Y axes. Due to the limited sample volume we were unable to directly measure the infectious titers from the clinical samples used in this study, and thus the FFU/mL represents the titer post-isolation while the CT represents the amount of RNA pre-isolation. Nonetheless, we do see broad trends (i.e., the colored dots are generally arranged in rainbow order from left to right, though we agree there is variation within this trend). We have also modified the text at lines 212-217 to reflect the reviewer’s concern: “Greater initial viral RNA levels were broadly associated with faster viral growth in both cell lines (seen in the progression of colors from left to right), however we saw significant variation within these trends. Our data suggests that when standard SARS-CoV-2 RNA RT-PCR values are the only available data for patient or population-level viral loads, they are useful in gauging the presence of infectious virus in patient NP samples (Fig 5).”

      The authors should expand on the methodology for creating ROC curves at line 467.

      We have included the following text in the methods section for ROC curve analysis:

      “ROC curves were generated using R and plotted with the ggplot2 package [43]. For each potential scoring marker (CT_e, CT_sge1, CT_sge2, neg_e), samples were ordered by that marker, followed by culturable status. The false-positive rate was calculated as the cumulative count of culturable samples (after ordering by marker intensity) divided by the total count of culturable samples; the true positive rate was calculated as the cumulative count of non-culturable samples (after ordering) divided by the total count of non-culturable samples. The false positive rate was plotted on the X axis of the ROC curves and the true positive rate on the Y axis.”
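      The cumulative-count construction described in the quoted methods can be sketched in a few lines. This is a Python illustration, not the authors' R code. It assumes the conventional labeling in which culturable samples are the positive class and a lower Ct predicts culturable virus, which may differ from the labeling used in the quoted text; the Ct values and culture outcomes are hypothetical.

```python
def roc_points(ct_values, culturable):
    """ROC points for the rule 'Ct <= threshold predicts culturable virus'.

    Samples are ordered by Ct, and cumulative counts of culturable
    (true positive) and non-culturable (false positive) samples are
    converted to rates at each distinct Ct threshold.
    """
    pairs = sorted(zip(ct_values, culturable))  # ascending Ct
    n_pos = sum(culturable)
    n_neg = len(culturable) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (ct, is_culturable) in enumerate(pairs):
        if is_culturable:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this Ct value
        if i + 1 == len(pairs) or pairs[i + 1][0] != ct:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical Ct values and culture outcomes (illustration only)
ct = [18.0, 22.0, 25.0, 29.0, 33.0, 36.0]
grew_in_culture = [True, True, False, True, False, False]

points = roc_points(ct, grew_in_culture)
area = auc(points)
print(points)
print(f"AUC = {area:.3f}")
```

      Each point is a (false positive rate, true positive rate) pair, matching the X and Y axes named in the methods text.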

      Reviewer #2 (Significance (Required)):

      This study is significant because it assesses the utility of several clinical assays for the measurement of viral RNA and correlating it with culturable virus. This is important in the field because it helps to identify methods whereby infectivity can be predicted from a simple diagnostic test. This is important to know as a virologist working in the SARS-CoV2 field. It is also important from a public health perspective to better define quarantine requirements for persons testing positive. While the study provided a new primer probe set, it appears that the already available WHO total E RNA assay is superior in predicting infectivity and this study provides further evidence to support this notion.

      We appreciate the Reviewer’s assessment that this study is significant and provides information of high interest to SARS-CoV-2 virologists that also has important public health implications.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). The authors developed a novel primer/probe set for detection of subgenomic (sgE) transcripts of SARS-CoV-2, with the aim of developing a system that may predict the presence of infectious virus in patient samples. After studying the specificity and sensitivity of their system, they compared it with already validated/published systems for diagnosis of SARS-CoV-2 infection. Interestingly, they also studied the effect of the conditions of isolation. They showed Vero E6 expressing TMPRSS2 (Vero E6-TMPRSS2) to be more sensitive to infection than Vero E6, allowing a higher number of isolations from patient samples. They also showed their system to be more sensitive than a previously published sgE system as well as a negative-strand RNA assay, but less sensitive than the WHO/Charité primer/probe set. Notably, all samples containing infectious particles (successful virus isolation on Vero E6-TMPRSS2) were detected with their primer/probe system, in contrast to the other tested sgE assay. They showed the negative-strand assay to be unlikely to detect viral genetic material in samples that nevertheless contain infectious particles.

      **Major comments:**

      Are the key conclusions convincing?

      I salute the intention of the authors to try to fix cut-off values for infectious patients, but I would be more careful with the assertion that "using a total viral RNA Ct cut-off of >31 or specifically testing for sgRNA can serve as an effective rule-out test for viral infectivity". It is true that in this study, virus was not isolated from any of the samples below a Ct of 31 or negative in the developed sgE assay, but all those assays were done in cell culture. We do not know how transmission could occur for those samples from human to human. Being able to fix a cut-off Ct value for a defined PCR/RT-PCR system would be a great improvement for SARS-CoV-2-infected patients having to stay in quarantine. It is even more important for Ebola-positive patients in Africa, who have to stay in quarantine in precarious conditions under tents, in warm temperatures and without privacy for long periods because they are still positive by RT-PCR. Unfortunately, fixing those values would require a very high number of experiments, including animal experiments.

      We appreciate the reviewer’s acknowledgment of the significance of this issue. We agree that in vivo animal experiments to more precisely determine the lowest infectious or transmissible dose would be valuable. But such experiments are outside the scope of the current study. To acknowledge the reviewer’s important point regarding the unavoidable limitations of cell culture systems, we have modified the abstract (line 51) to say “an effective rule out test for the presence of culturable virus,” a conclusion that is fully supported by our data.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. No

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Yes.

      Are the data and the methods presented in such a way that they can be reproduced?

      Kinetic of SARS-CoV-2 (figure 2): The method is not detailed in the Methods section and is not clear in the figure legend. When supernatants are collected, is all of the supernatant removed, or only an aliquot? If an aliquot, do you replace it with new medium?

      We apologize for this omission and have included the requested details in the methods. We seeded a separate well for each time point and collected the entire supernatant for a given time point, rather than replacing media. We added the following text to the methods section (lines 402-412): “Viral growth kinetics were measured in Vero E6 or Vero E6 TMPRSS2 cells at an MOI of 0.001. Separate wells were seeded for each time point, and growth curves were conducted in technical duplicates for each biological experiment. Supernatants and cell lysates were collected twice daily 1 & 2 dpi, and again on 3, 4, 7 and 8 dpi (Vero E6 TMPRSS2 cells were harvested for the final time at day 7 due to faster growth kinetics in this cell type). For each time point, the supernatant was removed and clarified to remove cellular debris, before being split into separate aliquots for RNA extraction (mixed 1:1 with AVE lysis buffer) and viral titration (by focus assay). Dead cells/debris that was pelleted after clarifying supernatants was combined with cells scraped from each well into PBS and spun again to obtain a pellet of all cell material from each timepoint. This pellet was then lysed in AVE viral lysis buffer for RNA extraction.”

      Stability of infectious SARS-CoV-2: I am very surprised by your results on the stability of cultured virus, given that we observed a decrease in SARS-CoV-2 titer in our lab after freezing/thawing steps. Do you freeze cell supernatant directly or do you prepare your samples another way? Please state this in the Methods section.

      We measured the stability after freeze/thaw for our normal high concentration viral stocks. Our viral stocks are grown in DMEM with 10% FBS, 1% HEPES, 1% pen/strep, and clarified before use. It is possible that lab-lab variation in the media components or HEPES concentration used to prepare viral stocks explains the differences seen in our work vs the reviewer’s lab. We have added the following additional detail to the methods section (lines 415-418) of the manuscript to clarify how these experiments were performed: “High concentration viral stocks (prepared as above in DMEM, 10% FBS, 1% HEPES, 1% pen/strep) were used to measure viral stability over time and after multiple freeze-thaw cycles. Stocks were stored at the indicated temperatures in the dark and aliquots were removed at the indicated days or after each freeze-thaw cycle for measuring infectious virus by focus assay.”

      Are the experiments adequately replicated and statistical analysis adequate? Yes

      **Minor comments:**

      Specific experimental issues that are easily addressable.

      Figure 2C and D: Instead of Ct values in cells, it would be more relevant to normalize these results to an endogenous gene and present the results as fold change relative to mock-infected cells. You state that the level of RNA declines and then stays stable over time, but you also note that there is CPE. If you have fewer cells but the same level of viral RNA, it means the RNA level in the surviving cells has increased.

      We have measured the GAPDH level in these cells over time, and that data is included as gray lines in Fig 2 C&D (see updated figure). As we are combining the cell pellet from clarified supernatants with the cells that remain adherent to the dish for each harvested timepoint we expect to be harvesting the majority of cells/cell debris for each time point. The levels of GAPDH remain broadly similar over the viral growth curve, with no drop in RNA levels.

      It would have been interesting to have the results of isolation at different time-point of treatment for patient samples (figure 3A and B) to see if the virus is stable in samples

      We have access to only limited volume (several hundred µl) of residual patient sample which would make it technically challenging to compare multiple days of storage conditions/ temperatures. Unfortunately, we do not have any remaining sample volume for the specimens used in this study, and so we are unable to perform additional isolations at other times/temperatures. While we agree this would be an interesting line of future inquiry, we feel it is outside the scope of the current study.

      Are prior studies referenced appropriately? Yes

      Are the text and figures clear and accurate? Yes.

      Line 140: "this delay in virus and RNA production". You do not talk about RNA yet...

      We have removed “and RNA” from this sentence and replaced with “infectious virus production”.

      Line 156 to 163: sgE RNA detected in cell free supernatant. Can't it come from lysed cells?

      We have replaced “cell-free” with “clarified”.

      Line 167: "...virus in cell culture time course experiment in TMPRSS2 expressing cells (fig.2)"

      We have modified this text to read according to the Reviewer’s suggestion.

      Line 258: Fig 6A and B

      We have added the missing reference to Fig 6B as requested.

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No

      Reviewer #1 (Significance (Required)):

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This new primer/probe system will contribute to the accurate diagnosis of SARS-CoV-2. The comparison with the existing methods is relevant to highlight the strengths and weaknesses of each system. Comparison of isolation of SARS-CoV-2 on commonly used Vero E6 with Vero E6-TMPRSS2 will lead to a great improvement of the isolation method for SARS-CoV-2.

      We appreciate the Reviewer’s assessment of the significance of our study and the improvement in our isolation method compared to the existing standard of using Vero E6 cells.

      Place the work in the context of the existing literature (provide references, where appropriate). Properly done in the introduction of the paper.

      State what audience might be interested in and influenced by the reported findings. Diagnostic laboratories

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Virology, molecular biology, cell biology. Not enough expertise to evaluate ROC data/analysis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Bruce et al present a new RT-PCR assay with primer sets that specifically detect sgE RNA from SARS-CoV2 samples. The authors compare this assay to other diagnostic assays in an effort to identify assays capable of correlating RNA detection with culturable virus (i.e. infectious virus). While this new assay identified 100% of culturable isolates, only 56% of isolates testing positive actually had culturable virus. Compared with other assays, the WHO total E RNA assay had better parameters when used at a cutoff Ct value of 31 (PPV of 61%). Overall, this manuscript provides a novel primer probe set for RT-PCR diagnostic assay and conducted comparisons with other assays on the same clinical samples. There are some areas that the authors should address prior to publication.
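
      The two figures quoted in this summary correspond to sensitivity (the share of culturable samples the assay flags) and positive predictive value (the share of assay-positive samples that are culturable). A minimal sketch of both calculations follows; the function names and the counts are illustrative assumptions chosen only to reproduce the quoted percentages, and they are not the study's actual sample counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of culturable samples that the assay called positive."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of assay-positive samples that were actually culturable."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (NOT from the manuscript): 14 culturable samples, all
# detected, plus 11 assay-positive samples that could not be cultured.
example_sensitivity = sensitivity(true_pos=14, false_neg=0)
example_ppv = positive_predictive_value(true_pos=14, false_pos=11)
```

      With these made-up counts the assay misses no culturable sample, yet only 14 of the 25 positives are culturable, which is the pattern the reviewer describes.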

      **Major comments:**

      The authors repeatedly tout VeroE6 TMPRSS2 cells as supporting higher viral infection. Therefore, the authors should address why one clinical isolate (E16) was culturable in VeroE6 but not VeroE6 TMPRSS2. Was this experiment repeated multiple times? What are the reasons for this discrepancy?

      We did not have sufficient residual sample volume to repeat isolation attempts of any clinical specimen, so we are limited to a single data point for each cell line. It is possible that this sample had levels of infectious virus at the limit of detection, and stochastic probability meant infectious virus was only present in the aliquot used to infect the Vero E6 (rather than Vero E6-TMPRSS2) cells. It is also possible that viral adaptation/evolution occurred in the VeroE6 well that allowed this virus to successfully grow, but we do not have sequencing data or remaining nucleic acids to test this theory.

      The authors' argument at lines 166-169 is not supported by the data in Fig. 2. The levels of viral RNA between VeroE6 and VeroE6 TMPRSS2 appear to show similar trends in the supernatant across the time course but the infectious viral levels are dramatically different. This discordance between FFU levels and RNA levels cannot be explained by instability of viral particles alone. Have the authors looked into differences in viral particles produced from these two cell lines? The authors should collect virus particles from these two cell lines and conduct the stability experiment in Fig 2D to directly test the hypothesis that indeed the drop seen in FFU in VeroE6 TMPRSS2 is due to instability.

      We apologize for the confusion. We did not intend to make claims about differences in particle stability as a result of the cell line used for viral production, but rather to highlight a general observation that RNA was more stable than infectious virus. This is more obvious in the TMPRSS2 cell line, as replication is faster and more synchronized than in Vero E6 cells (the TMPRSS2 cells are largely dead by day 4, whereas infection progresses more slowly in Vero E6 cells so that new virions continue to be produced during the measured time period). We have added clarifying text at lines 167-169: “We observed that SARS-CoV-2 RNA species persist for much longer than infectious virus in cell culture time course experiments, a feature that was most obvious in Vero E6-TMPRSS2 cells due to their viral kinetics but is likely not cell specific (Fig 2).”

      The evidence for the packaging of sgE RNA into virions is weak. GAPDH detection by PCR is not proof that the concentration process did not pellet RNA nonspecifically. First, the authors should provide ample information about the viral isolation process at line 379, including the rotor, centrifuge and speed used. In addition, ribosomes typically stay intact following viral lysis (and can be found in supernatant after release from dead cells). Actively translating ribosomes can contain sgE RNA as well. The authors should consider detecting ribosomal RNAs in their samples to rule out the possibility of contaminating ribosomes. In addition, the authors should strongly consider repeating the experiment with high EDTA concentration to break up ribosomes and only pellet virions.

      We have added additional experimental details (rotor, centrifuge and speed) describing how the viral concentration step was performed (lines 389-394): “Viral RNA (courtesy of David Bauer, The Francis Crick Institute, UK) from concentrated SARS-CoV-2 (England02 strain, B lineage ‘Wuhan-like’) was obtained by clarifying viral supernatants (2 x 4000 rpm for 30 mins at 4°C in a Beckman Allegra X-30R centrifuge with a SX4400 rotor), overlaying clarified media onto a 30% sucrose/PBS cushion (1/4th tube volume) and concentrating by ultracentrifugation in a Beckman ultra XPN-90 centrifuge with SW32TI rotor for 90 min at 25,500 rpm at 4°C. Pellets were then resuspended in buffer and extracted with TRIzol LS.” We thank the reviewer for their suggestion of including an additional control, and we have added an 18S primer-probe set (see new Figure 8). These data, while not as pronounced as the GAPDH control, suggest that the ultracentrifugation step has removed significant amounts of 18S RNA (though the clarified supernatants retain similar amounts of 18S RNA as the cells, suggesting that clarification alone is not sufficient to remove contaminating ribosomes). While we agree that repeating the ultracentrifuge concentration with high concentrations of EDTA is an interesting line of inquiry, we feel it is outside the scope of this manuscript (and we face additional technical restrictions in pursuing this, as we currently lack access to an ultracentrifuge at BSL-3). We have updated the discussion to include the possibility of residual ribosome-protected fragments of sgE as a potential alternative interpretation (lines 350-352).

      **Minor comments:**

      At line 197, the authors refer to "viruses" with lower levels of SARS-CoV2 RNA. This is incorrect and should be changed to "isolates" as the SARS-CoV2 virus particle does not package variable amounts of genomic RNA.

      We have changed this to “clinical specimens” for clarity.

      The authors' statement on lines 210-212 does not seem to be supported clearly by Fig. 5. The authors should consider including trendlines as well as other analyses that help show the correlation between viral RNA vs FFU. In addition, the authors should label the Y-axis clearly for Fig. 5.

      We have added clarifying labels to both the X and Y axes. Due to the limited sample volume we were unable to directly measure the infectious titers from the clinical samples used in this study, and thus the FFU/mL represents the titer post-isolation while the CT represents the amount of RNA pre-isolation. Nonetheless, we do see broad trends (i.e., the colored dots are generally arranged in rainbow order from left to right, though we agree there is variation within this trend). We have also modified the text at lines 212-217 to reflect the reviewer’s concern: “Greater initial viral RNA levels were broadly associated with faster viral growth in both cell lines (seen in the progression of colors from left to right); however, we saw significant variation within these trends. Our data suggest that when standard SARS-CoV-2 RNA RT-PCR values are the only available data for patient or population-level viral loads, they are useful in gauging the presence of infectious virus in patient NP samples (Fig 5).”

      The authors should expand on the methodology for creating ROC curves at line 467.

      We have included the following text in the methods section for ROC curve analysis:

      “ROC curves were generated using R [43]. For each potential scoring marker (CT_e, CT_sge1, CT_sge2, neg_e), samples were ordered by that marker, followed by culturable status. The false-positive rate was calculated as the cumulative count of culturable samples (after ordering by marker intensity) divided by the total count of culturable samples; the true positive rate was calculated as the cumulative count of non-culturable samples (after ordering) divided by the total count of non-culturable samples. The false positive rate was plotted on the X axis of the ROC curves and the true positive rate on the Y axis.”
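
      The ordering-and-cumulative-count procedure quoted above can be sketched as follows. This is a Python illustration rather than the authors' R code. It follows the quoted convention in which the X axis tracks culturable samples and the Y axis tracks non-culturable samples, so non-culturable is effectively the positive class. The function name, the choice to sort from high to low Ct, and the toy values are all assumptions.

```python
def roc_points(marker_values, culturable, descending=True):
    """Build ROC points by sorting samples and accumulating counts.

    marker_values: one score per sample (for example a Ct value).
    culturable:    one bool per sample, True if virus was isolated.
    descending:    assumed sort direction (high Ct first); the quoted
                   methods text does not state the direction.
    Assumes at least one culturable and one non-culturable sample.
    """
    order = sorted(
        range(len(marker_values)),
        key=lambda i: (marker_values[i], culturable[i]),
        reverse=descending,
    )
    total_culturable = sum(1 for c in culturable if c)
    total_non_culturable = len(culturable) - total_culturable

    cum_culturable = 0
    cum_non_culturable = 0
    points = [(0.0, 0.0)]
    for i in order:
        if culturable[i]:
            cum_culturable += 1
        else:
            cum_non_culturable += 1
        # x: cumulative culturable fraction, y: cumulative non-culturable fraction
        points.append(
            (cum_culturable / total_culturable,
             cum_non_culturable / total_non_culturable)
        )
    return points


# Toy values (hypothetical, not study data): two high-Ct samples that did
# not grow in culture and two low-Ct samples that did.
toy_points = roc_points([35, 33, 30, 25], [False, False, True, True])
```

      For this perfectly separated toy input the curve climbs straight up the Y axis before moving right, which is the shape an ideal rule-out marker would give.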

      Reviewer #2 (Significance (Required)):

      This study is significant because it assesses the utility of several clinical assays for the measurement of viral RNA and correlating it with culturable virus. This is important in the field because it helps to identify methods whereby infectivity can be predicted from a simple diagnostic test. This is important to know as a virologist working in the SARS-CoV2 field. It is also important from a public health perspective to better define quarantine requirements for persons testing positive. While the study provided a new primer probe set, it appears that the already available WHO total E RNA assay is superior in predicting infectivity and this study provides further evidence to support this notion.

      We appreciate the Reviewer’s assessment that this study is significant and provides information of high interest to SARS-CoV-2 virologists that also has important public health implications.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Bruce et al present a new RT-PCR assay with primer sets that specifically detect sgE RNA from SARS-CoV2 samples. The authors compare this assay to other diagnostic assays in an effort to identify assays capable of correlating RNA detection with culturable virus (i.e. infectious virus). While this new assay identified 100% of culturable isolates, only 56% of isolates testing positive actually had culturable virus. Compared with other assays, the WHO total E RNA assay had better parameters when used at a cutoff Ct value of 31 (PPV of 61%). Overall, this manuscript provides a novel primer probe set for RT-PCR diagnostic assay and conducted comparisons with other assays on the same clinical samples. There are some areas that the authors should address prior to publication.

      Major comments:

      -The authors repeatedly tout VeroE6 TMPRSS2 cells as supporting higher viral infection. Therefore, the authors should address why one clinical isolate (E16) was culturable in VeroE6 but not VeroE6 TMPRSS2. Was this experiment repeated multiple times? What are the reasons for this discrepancy?

      -The authors' argument at lines 166-169 is not supported by the data in Fig. 2. The levels of viral RNA between VeroE6 and VeroE6 TMPRSS2 appear to show similar trends in the supernatant across the time course but the infectious viral levels are dramatically different. This discordance between FFU levels and RNA levels cannot be explained by instability of viral particles alone. Have the authors looked into differences in viral particles produced from these two cell lines? The authors should collect virus particles from these two cell lines and conduct the stability experiment in Fig 2D to directly test the hypothesis that indeed the drop seen in FFU in VeroE6 TMPRSS2 is due to instability.

      -The evidence for the packaging of sgE RNA into virions is weak. GAPDH detection by PCR is not proof that the concentration process did not pellet RNA nonspecifically. First, the authors should provide ample information about the viral isolation process at line 379, including the rotor, centrifuge and speed used. In addition, ribosomes typically stay intact following viral lysis (and can be found in supernatant after release from dead cells). Actively translating ribosomes can contain sgE RNA as well. The authors should consider detecting ribosomal RNAs in their samples to rule out the possibility of contaminating ribosomes. In addition, the authors should strongly consider repeating the experiment with high EDTA concentration to break up ribosomes and only pellet virions.

      Minor comments:

      -At line 197, the authors refer to "viruses" with lower levels of SARS-CoV2 RNA. This is incorrect and should be changed to "isolates" as the SARS-CoV2 virus particle does not package variable amounts of genomic RNA.

      -The authors' statement on lines 210-212 does not seem to be supported clearly by Fig. 5. The authors should consider including trendlines as well as other analyses that help show the correlation between viral RNA vs FFU. In addition, the authors should label the Y-axis clearly for Fig. 5.

      -The authors should expand on the methodology for creating ROC curves at line 467.

      Significance

      This study is significant because it assesses the utility of several clinical assays for the measurement of viral RNA and correlating it with culturable virus. This is important in the field because it helps to identify methods whereby infectivity can be predicted from a simple diagnostic test. This is important to know as a virologist working in the SARS-CoV2 field. It is also important from a public health perspective to better define quarantine requirements for persons testing positive. While the study provided a new primer probe set, it appears that the already available WHO total E RNA assay is superior in predicting infectivity and this study provides further evidence to support this notion.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). The authors developed a novel primer/probe set for detection of subgenomic (sgE) transcripts of SARS-CoV-2, with the aim of developing a system that may predict the presence of infectious virus in patient samples. After studying the specificity and sensitivity of their system, they compared it with already validated/published systems for the diagnosis of SARS-CoV-2 infection. Interestingly, they also studied the effect of the isolation conditions. They showed Vero E6 cells expressing TMPRSS2 (Vero E6-TMPRSS2) to be more sensitive to infection than Vero E6, allowing a higher number of isolations from patient samples. They also showed their system to be more sensitive than a previously published sgE system and than a negative-strand RNA assay, but less sensitive than the WHO/Charité primer/probe set. Nevertheless, all samples containing infectious particles (successful virus isolation on Vero E6-TMPRSS2) were detected with their primer/probe system, in contrast to the other tested sgE assay. They showed the negative-strand assay to be unlikely to detect viral genetic material in samples that nevertheless contain infectious particles.

      Major comments:

      -Are the key conclusions convincing?

      I salute the intention of the authors to try to fix cut-off values for infectious patients, but I would be more careful with the assertion that "using a total viral RNA Ct cut-off of >31 or specifically testing for sgRNA can serve as an effective rule-out test for viral infectivity". It is true that in this study, virus was not isolated from any of the samples with a Ct above 31 or negative in the developed sgE assay, but all those assays are done in cell culture. We do not know how transmission from human to human could occur for those samples. Being able to fix a cut-off Ct value for a defined PCR/RT-PCR system would be a great improvement for SARS-CoV-2 infected patients having to stay in quarantine. It is even more important for Ebola-positive patients in Africa, who have to stay in quarantine in precarious conditions, under tents, in warm temperatures and without privacy, for long periods because they are still positive by RT-PCR. Unfortunately, fixing those values would need a very high number of experiments, including animal experiments.

      -Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No

      -Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. No

      -Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Yes.

      -Are the data and the methods presented in such a way that they can be reproduced?

      -Kinetics of SARS-CoV-2 (Figure 2): The method is not detailed in the Methods section and is not clear in the figure legend. When supernatants are collected, is all of the supernatant removed, or only an aliquot? If an aliquot, do you replace it with new medium?

      -Stability of infectious SARS-CoV-2: I am very surprised by your results on the stability of cultured virus, knowing that we observed a decrease of SARS-CoV-2 titer in our lab after freezing/thawing steps. Do you freeze cell supernatant directly or do you prepare your samples another way? Please state this in the Methods section.

      -Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.

      Figure 2C and D: Instead of Ct values in cells, it would be more relevant to normalize these results to an endogenous gene and present the results as fold change relative to mock-infected cells. You state that the level of RNA declines and then stays stable over time, but you also note that there is CPE. If you have fewer cells but the same level of viral RNA, it means the RNA level in living cells has increased. It would also have been interesting to have the results of isolation at different time points of treatment for patient samples (Figure 3A and B) to see if the virus is stable in samples.
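
      The normalization suggested here is commonly done with the comparative Ct (2^-ΔΔCt) calculation, sketched below. The function name and the Ct values are hypothetical, amplification efficiency is assumed to be close to 100% for both targets, and the control sample is assumed to give a measurable Ct for the viral target (which may not hold for truly mock-infected cells).

```python
def fold_change(ct_target_sample: float, ct_reference_sample: float,
                ct_target_control: float, ct_reference_control: float) -> float:
    """Relative quantity of the target RNA versus a control sample.

    Each Ct is first normalized to an endogenous reference gene (delta Ct),
    then the control's delta Ct is subtracted (delta-delta Ct). One Ct unit
    is treated as a two-fold difference in template.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: viral target at Ct 20 in the sample and Ct 25 in
# the control, with the reference gene at Ct 18 in both.
example_fold = fold_change(20.0, 18.0, 25.0, 18.0)
```

      Because the reference gene tracks the amount of cellular RNA in each extract, this form of reporting would separate a change in viral RNA per cell from a change in the number of surviving cells.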

      -Are prior studies referenced appropriately? Yes

      -Are the text and figures clear and accurate? Yes.

      Line 140: "this delay in virus and RNA production". You do not talk about RNA yet...

      Line 156 to 163: sgE RNA detected in cell free supernatant. Can't it come from lysed cells?

      Line 167: "...virus in cell culture time course experiment in TMPRSS2 expressing cells (fig.2)"

      Line 258: Fig 6A and B

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No

      Significance

      -Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This new primer/probe system will contribute to the accurate diagnosis of SARS-CoV-2. The comparison with the existing methods is relevant to highlight the strengths and weaknesses of each system. Comparison of isolation of SARS-CoV-2 on commonly used Vero E6 with Vero E6-TMPRSS2 will lead to a great improvement of the isolation method for SARS-CoV-2.

      -Place the work in the context of the existing literature (provide references, where appropriate). Properly done in the introduction of the paper.

      -State what audience might be interested in and influenced by the reported findings. Diagnostic laboratories

      -Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Virology, molecular biology, cell biology. Not enough expertise to evaluate ROC data/analysis.

  4. Sep 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewer for their input. Our response to their comments is in the attached preliminary revision plan.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      • Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2.

      This manuscript provides a detailed and very clear description of multiSero, an open-source multiplex-ELISA platform for analyzing antibody responses to SARS-CoV-2 infection. This tool is a very promising step towards fully open-source multiplex testing. Using terrific visualizations, the different steps involved in measuring the antibody levels are carefully explained. It starts with a clear explanation of the principle of printed antigen arrays and of the use of the newly developed open-source software Pysero to analyse the colorimetric signal of each spot associated with a different antigen. The colorimetric signal was read using both a commercial reader and an inexpensive, open plate reader. The comparison between the two showed that the open plate reader is as good as the commercial reader.

      Major comments:

      • Are the key conclusions convincing? • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. • Are the data and the methods presented in such a way that they can be reproduced? • Are the experiments adequately replicated and statistical analysis adequate?

      The authors provide a new method to measure antibody levels. A comparison with an existing ELISA for anti-spike IgG would be worthwhile.

      A gradient boosting tree was used to combine the signal from multiple antigens. However, this did not lead to any notable improvements in classification performance. This result could be due to a property of the data or the algorithm. Two things that would be very useful here would be to plot the data (e.g. anti-Spike vs anti-N) and use a much simpler algorithm such as a logistic regression.
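
      As an illustration of the simpler baseline suggested above, the sketch below fits a two-antigen logistic regression by plain gradient descent using only the Python standard library. The OD values are synthetic toy numbers, not data from the manuscript, and the function names, learning rate and number of epochs are arbitrary assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x) -> int:
    """Return 1 (seropositive) if the fitted probability is at least 0.5."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic (anti-Spike OD, anti-N OD) pairs: first four negative, last four positive.
toy_ods = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.2],
           [1.2, 0.9], [0.9, 1.4], [1.5, 1.1], [1.0, 0.8]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_weights, toy_bias = fit_logistic(toy_ods, toy_labels)
```

      With only two inputs, the fitted weights can be read directly as the relative contribution of each antigen, which makes it easier to tell whether a lack of improvement comes from the data or from the algorithm.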

      The performance of the tool is based on one positive and one negative pool. As the authors mention, antibody levels are highly dependent on severity and time since infection. The performance of the classifier therefore strongly depends on the characteristics of the positive pool. Providing additional information on this pool, if possible, would improve the manuscript. If not, I think this should be mentioned as a shortcoming in the discussion. Possibly, a serum panel with more asymptomatic infections or a longer time since infection would result in poorer performance from the classifier.

      Related to the point above is what is written in line 248-249. The direction of the performance of the tool with additional samples depends on the characteristics (time since infection, age, severity) of the currently used samples and the samples to be added. The assumption that the performance can only increase is in my opinion not correct.

      The authors compared three normalization methods to circumvent using a standard curve. The normalization of ODs by the mean of anti-IgG Fc ODs is most promising, as shown in Fig. S5. A comparison between this normalization method and using a standard curve is not given. It would be worthwhile to look at the distribution of a serum panel from different plates, in relative antibody units as well as normalized ODs. Is the antibody distribution captured by normalized ODs as good as that of relative antibody concentrations derived from the standard dilution?
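
      One possible reading of "normalization of ODs by the mean of anti-IgG Fc ODs" is a per-well division by the mean of the anti-IgG Fc control spots. The sketch below illustrates that reading only; the function name, the per-well scope and the OD values are assumptions and may differ from what the authors implemented.

```python
from statistics import mean


def normalize_well(antigen_ods: dict, fc_control_ods: list) -> dict:
    """Divide each antigen OD in a well by the mean anti-IgG Fc OD of that well."""
    control_mean = mean(fc_control_ods)
    return {antigen: od / control_mean for antigen, od in antigen_ods.items()}


# Hypothetical ODs for one well (not data from the manuscript).
example_normalized = normalize_well(
    {"Spike": 0.75, "N": 0.3},
    fc_control_ods=[1.0, 2.0],
)
```

      Comparing such normalized values against relative antibody units from a standard curve, across plates, is the check the reviewer asks for.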

      In the abstract, the reader is told that the multiSero tool could be used with up to 48 antigens. I assume that at this number of antigens, the use of duplicate/triplicate antigens is no longer possible? Also, would the layout and spacing of the antigen array with more antigens introduce more experimental artifacts such as comets and debris?

      In Fig. S3 and lines 146/147, the authors state that they find that the presence of comets does not cause observable bias or variance. This strikes me as rather subjective, and my impression of Fig. S3 B3 is that there is some bias due to comets?

      Minor comments:

      • Specific experimental issues that are easily addressable. • Are prior studies referenced appropriately? • Are the text and figures clear and accurate? • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      An important reason for developing the multiSero tool, according to the authors, is the deployment of high-content, multiplex serology platforms across the world, and this paper makes a huge step towards this goal. The main hurdle to implementing the multiSero tool in low-resource settings is its dependency on printed antigen arrays, which are produced by machines costing around $100,000-300,000, as mentioned by the authors. The authors also realize this and acknowledge this bottleneck in the discussion. I think it would be good to elaborate a little further on why this is a limitation. Is it that users have less freedom in what they want to test because they depend on the producer of the printed 96-well plates?

      In line 47, I suppose the word "are" is missing.

      The overall language use is very clear. An improvement in my opinion would be to replace words such as "cognate" (line 46) and "in lieu of" (line 227) by easier alternatives, such as "associated" and "instead of".

      Comets and debris are first mentioned in line 129/130 but require more explanation. What is meant by comets only became clear to me after reading the discussion. I would use the explanation mentioned in line 250/250 right after the first mention of comets. What debris means remains unclear to me.

      In line 174, I suppose that the word "points" should be "line".

      Pysero sometimes starts with a capital P, sometimes with a lower case p, see for example line 108 and 109.

      The authors describe using a standard curve as labor-intensive (line 190); I find this too strongly put.

      In lines 281-283 the authors mention they are unaware of examples of classifiers distinguishing positive from negative samples based on more than one antigen. An example could be the classification of cholera using 2-6 antigens by Azman et al., 2019, in Sci. Transl. Med.

      In Fig S1 the Nautilus plate reader is shown. The costs of this reader are estimated to be less than $1500. These are the costs without the motorized

      Significance

      With this manuscript, the authors show that multiplex serology platforms can become more accessible to low- and medium-income countries through their development of a new open-source tool. The next step is to use this multiSero tool in a low-resource setting.

      A specific audience potentially interested would be computational biologists involved in the analysis, visualization, and interpretation of the results of such techniques, and microbiologists quantifying and measuring antibodies. A broader audience that could be interested would be infectious disease epidemiologists, especially those who are involved in serosurveillance and are keen to pick up new methods to potentially improve epidemiological descriptions of immunity to several infections in low- and medium-resource settings.

      My field of expertise is limited to field epidemiology and sero-epidemiology. Techniques such as the detection of spots and registering grids with multiSero are outside of my expertise. The construction of the Nautilus reader is new to me, and it is therefore hard for me to assess how easy it would be to set up such a system in low-resource settings. I also feel my expertise regarding the choice of classifiers is limited, as I have not used gradient boosting before. Further, I am not an expert in the field of new developments in multiplex assays and am therefore not up to date with the latest literature in this field.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      There is a need for multiplex serological tests, and ELISA is the most applicable platform. However, current ELISA-based multiplex serological tests are heavily dependent on expensive and sophisticated instruments and software, which hinders their wide application. To address this challenge, the authors propose an integrated platform for multiplex serological testing that incorporates open-source instruments and newly developed analysis software. SARS-CoV-2 was used as the example to test the platform. Overall, this study is technically oriented. Its major content is the establishment and optimization of the platform. The aim is focused and clear, and the design of the experiments is comprehensive. The conclusions are supported by the data.

      Major comments:

      • Are the key conclusions convincing? Yes
      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No
      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. No
      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. N/A
      • Are the data and the methods presented in such a way that they can be reproduced? Yes
      • Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.
      • Are prior studies referenced appropriately? Yes
      • Are the text and figures clear and accurate? Yes
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This study is technically centered. The major contribution is the development of an ELISA-based platform for multiplex serological testing. The authors intended to make their platform applicable in resource-limited regions. However, the problem is that the current platform is still too complicated for wide application in the real world. For a platform to be widely applicable, especially in poor regions, it needs to meet several key requirements: 1. Low cost; 2. Standardized; 3. Simple (as few operations as possible). The major focus of this study is the first requirement, and the other two were barely touched. But even "low cost" alone is valuable and worth publication. The reviewer suggests that the authors modify the manuscript to better reflect this fact.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      The existing literature is well referenced.

      • State what audience might be interested in and influenced by the reported findings.

      Researchers who are interested in assay development.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Protein microarray technology. Assay development. SARS-CoV-2 antibody response analysis.

      The reviewer is not familiar with the software part.

      Other specific points:

      1. The authors mentioned that the multiplex serological test could be applied to differentiate infection from vaccination. In the case of SARS-CoV-2, how would this be possible if there is no specific biomarker?
      2. Have the authors also tested IgM?
      3. To simplify the normalization, the authors tried several strategies, but none worked well. These results need further explanation. Are there any other strategies that could be attempted?
      4. What's the definition of the "background"?
      5. What is the rationale for selecting the two concentrations? Would more concentrations be better?
      6. The authors stated "open source analysis tools can be adapted for multiplexed detection of pathogens by printing pathogen-specific antibodies, instead of antigens". This is true, however, highly specific antibodies are required.
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      This section is optional. Insert here any general statements you wish to make about the goal of the study or about the reviews.

      We thank the reviewers for their constructive and helpful comments on our manuscript and are delighted to find their consensus that the manuscript represents an important contribution to the field. We provide a detailed response to specific points below. In addition, we propose to include new data showing that our method can be applied to experimentally infected lung tissue. Namely, we show highly sensitive detection of SARS-CoV-2 RNA in infected hamster lung sections.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.

      Reviewer #1 Major comments:

      The authors used approaches provided in FISH-quant (Mueller et al, Nat Methods 2013) and big-fish. However, these tools to analyze RNA aggregates were not designed and validated for such massive aggregations as observed with SARS-CoV-2. They were developed for cases such as transcription sites with much smaller aggregations, with a few tens to a hundred molecules. With a regular spot detection approach, usually a few thousand spots can be detected in a cell (e.g. King et al, J Virol 2018), but this also depends on the microscope used and the available cellular volume. Higher RNA concentrations cannot be resolved with a standard approach, because RNA spots start to overlap. Decomposing RNA aggregations can help but will not work reliably for the high RNA densities observed for SARS-CoV-2, especially at later infection time-points. The tools will then no longer provide accurate estimates. To my knowledge, there is currently no accurate quantification method for such massive RNA levels in smFISH. What has been done in the past is to use cellular intensity as an approximation and perform calibrations with cells having lower and thus still resolvable RNA counts (Raj et al., PLoS Biology; https://doi.org/10.1371/journal.pbio.0040309.sg003). The authors proposed three expression regimes (partially resistant, permissive, and super permissive). My concerns here apply mainly to the category super-permissive, where an accurate estimation can't be performed. Here a more cautious quantification should be applied. To a lesser extent, this will also apply to some of the quantifications of gRNAs per factory, with counts exceeding 100s of molecules. As mentioned above, this does not affect any of the conclusions, but would reflect more accurately what kind of reliable information can be drawn from such experiments.

      We agree with the reviewer that approaches like FISH-quant and Big-FISH cannot reliably quantify RNA spots with high spatial density such as our examples of “super-permissive” cells. Single molecule quantitation of such cases is likely to underestimate RNA expression as noted by us and King et al 2018 (doi: 10.1128/JVI.02241-17). Therefore, we integrated the combined smFISH signal intensity within entire cellular volumes and compared it to the median intensity of single molecules in cells with lower infection density. We will (i) revise the methods and results sections to explain more carefully and explicitly the quantification of RNA in super-permissive cells, and (ii) provide a calibration plot for the quantitation as previously reported (Raj et al 2006, doi: 10.1371/journal.pbio.0040309).
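      The intensity-based estimate described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the explicit background term and the through-origin calibration fit are assumptions made here for illustration only.

```python
import numpy as np


def estimate_rna_count(cell_intensity, background, single_molecule_intensities):
    """Approximate RNA copies in one cell from its integrated smFISH signal.

    The integrated, background-subtracted intensity of the whole cell is
    divided by the median integrated intensity of reference single molecules
    measured in cells with resolvable spots (hypothetical inputs).
    """
    reference = float(np.median(single_molecule_intensities))
    if reference <= 0:
        raise ValueError("reference single-molecule intensity must be positive")
    signal = max(float(cell_intensity) - float(background), 0.0)
    return signal / reference


def calibration_slope(spot_counts, integrated_intensities):
    """Least-squares slope through the origin (intensity per counted molecule).

    Intended for cells where spots can still be counted directly, so that the
    intensity-based estimate can be checked against the spot count.
    """
    counts = np.asarray(spot_counts, dtype=float)
    intensities = np.asarray(integrated_intensities, dtype=float)
    return float(np.dot(counts, intensities) / np.dot(counts, counts))
```

      In such a scheme, agreement between the fitted slope and the median single-molecule intensity in countable cells would support extrapolating to cells where individual spots can no longer be resolved.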

      We agree that high local RNA density has the potential to interfere with quantification of gRNAs within viral factories. We have used the “cluster.decomposition()” function of Big-FISH to quantify viral factories, which is conceptually similar to the “Integrated intensity” mode of FISH-quant. Applying this algorithm to non-super permissive cells allows us to use the mean intensity of a reference single-molecule spot to estimate the number of molecules in a cluster. We are confident such estimates are reliable in the majority of viral factories, which contain less than or equal to 200 single gRNA molecules. We will revise the methods section to clarify this method of analysis.

      Reviewer #1 Minor comments:

      1.Page 6; the authors state that "smFISH identifies ... cellular distribution .... within ER-like membranous structures". However, the authors didn't directly show such a localization, could they provide an experiment with an ER stain?

      This text was based on previous light microscopy and EM studies that reported SARS-CoV-2 RNA in ER-derived membranes (termed Double Membrane Vesicles - DMVs) or co-localisation of anti-dsRNA (J2) with ER-markers (Cortese et al 2020; Hackstadt et al 2021; Mendonca et al 2021)*. We propose to clarify the text on page 6 including the citation of these publications and to tone down our claim that the virus is located in ER-like membranous structures.

      *Cortese et al 2020, doi: 10.1016/j.chom.2020.11.003

      Hackstadt et al 2021, doi: 10.3390/v13091798

      Mendonca et al 2021, doi: 10.1038/s41467-021-24887-y

      2.It might be worthwhile pointing out that the probe-sets can be used in different host organisms (Vero - African green monkey; human cell lines).

      We propose to revise the text to emphasise more clearly the applicability of SARS-CoV-2 probes for the study of many different host organisms.

      3.I really liked the experiment where the authors showed absence of signal when infecting with another virus, and the elegant control with the J2 antibody. Maybe the authors could explain more clearly that they used a different coronavirus and that, based on their sequence alignment, no/little signal would be expected.

      Thank you for this supportive comment. We plan to follow the reviewer’s suggestion and expand our explanation of the rationale of this experiment in the text.

      7.The experiment with the isolated virions shows nicely that the smFISH approach has single-virus sensitivity. Did the authors compare the intensity of these isolated virions with the signal in Fig 1B? This might be a question of personal taste, but to me, this section might actually fit better in the first paragraph of page 4/5, where the authors describe single virions in cells.

      Thank you for the interesting question. We have not performed a direct comparison of the spot intensities of intracellular genomic RNA molecules and those from the isolated virions, because isolated SARS-CoV-2 requires poly-L-lysine coating for the coverslip attachment while our infection strategy utilises cells growing on uncoated glass. Nonetheless, the isolated virion spot intensities follow a unimodal distribution, and their shape approximates to the point-spread function of the microscope. Since spots at 2 hpi are largely derived from non-replicative viral genomes and they are measured in the intracellular environment with the same background (autofluorescence), they are a better ‘single RNA molecule’ reference.

      We also thank the reviewer for suggesting rearranging the text section. To address this point we plan to move the relevant text to the second paragraph of the Results section.

      8.Page 6. The authors state "+ORF-N and +ORF-S single labelled spots, corresponding to sgRNAs, were more uniformly distributed throughout the cytoplasm than dual labelled gRNA". This is difficult to appreciate from the image. Is this something the authors could quantify, e.g. with the metrics proposed by Stueland et al, Scientific Reports 2019?

      To address this point, we plan to: (i) present an alternative image illustrating a clearer example of differential spatial localisation of gRNA and sgRNA, and (ii) perform quantification of spatial dispersion indices for gRNA and sgRNA using the suggested method for our revision.

      9.Page 6. The authors perform a FISH/IF experiment including a co-localization analysis, where a "limited overlap" with sgRNAs was observed. I was wondering if this overlap could actually be simply due to rather high density of the sgRNAs. Maybe a control analysis by slightly changing the RNA positions could provide insight here, and give a threshold for what's to be expected randomly at a given RNA density.

      The reviewer’s comment is correct, in that a high density of sgRNAs and nucleocapsid protein could lead to signal overlap due to chance. This is why we excluded “super-permissive” cells from this analysis. Our co-localisation data showed that gRNA spots had a bimodal nucleocapsid immunofluorescence intensity distribution (data not shown), suggesting nucleocapsid-associated and “free” gRNAs, providing a threshold for this analysis. Nevertheless, we agree with the reviewer that the analysis of randomly positioned transcripts of the same density would provide a valuable control. In our revised MS we will include: (i) a random distribution analysis comparing the overlap between sgRNA and nucleocapsid in the “Observed” and a “Randomised” simulation, and (ii) a plot showing a full distribution of co-localised nucleocapsid immunofluorescence intensity for both genomic and sub-genomic viral RNAs.
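      The "Observed" versus "Randomised" comparison proposed above can be sketched as follows. This is a simplified 2D illustration under stated assumptions, not the authors' analysis: the masks, function names and the uniform resampling of spot positions within the cell are all hypothetical choices.

```python
import numpy as np


def overlap_fraction(spots, protein_mask):
    """Fraction of spots (row, col pixel coordinates) lying on protein-positive pixels."""
    spots = np.asarray(spots, dtype=int)
    return float(protein_mask[spots[:, 0], spots[:, 1]].mean())


def randomised_overlap(n_spots, cell_mask, protein_mask, n_iter=1000, seed=0):
    """Null distribution of overlap for spots placed uniformly inside the cell.

    Each iteration draws n_spots pixel positions from the cell mask, keeping
    the spot density of the real cell, and records the overlap fraction.
    """
    rng = np.random.default_rng(seed)
    candidates = np.argwhere(cell_mask)
    fractions = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(len(candidates), size=n_spots, replace=True)
        fractions[i] = overlap_fraction(candidates[idx], protein_mask)
    return fractions


def empirical_p_value(observed, null_fractions):
    """One-sided p-value: chance of an overlap at least as large as observed."""
    null_fractions = np.asarray(null_fractions)
    return float((np.sum(null_fractions >= observed) + 1) / (len(null_fractions) + 1))
```

      An observed overlap that falls within the bulk of the randomised distribution would be consistent with chance co-localisation at that RNA density.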

      10.I don't fully follow the argument about stability on page 8. The authors also see an increase in the RNA levels. Couldn't this increase compensate for loss of RNA due to degradation? Would it be possible to perform an experiment at very high remdesivir concentrations, which would block transcription?

      Remdesivir is a nucleoside analogue that inhibits viral RNA polymerase activity. While this drug inhibits viral replication, the inhibition is incomplete and using higher concentrations results in cellular toxicity. At the present time there are no stronger polymerase inhibitors available, so these experiments are the best approximation possible to assess viral RNA stability. We propose to revise the text to discuss the limitations of Remdesivir for modelling RNA stability.

      12.How did the authors define/detect replication factories? I couldn't find information about this in the methods.

      This is a good point raised by both the reviewers. Please see [Reviewer 2 General comment #1] for our response.

      Reviewer #2 General comments:

      1.The authors' definition of viral factories, in part as foci with at least 4 gRNA molecules, comes across as arbitrary. Perhaps a clearer explanation of this cutoff would be helpful to the readers' understanding of this definition. Additionally, confirmation of the functionality of such factories by immunofluorescence with anti-RdRp, for example, in addition to identifying staining of gRNAs and (-) sense viral RNAs at each focus could provide valuable support to the authors' conclusions.

      We thank both reviewers for requesting further information on our explanation of viral factories. We defined viral factories as smFISH signals with spatially extended foci that exceed the size of the point spread function of the microscope and the intensity of a reference single molecule. We then filtered these candidate factories by comparing the radius of the signal foci with EM-measured radii of double-membrane vesicles and single-membrane vesicles formed by SARS-CoV-2 (150 nm pre-8hpi and 200 nm post-8hpi) (Cortese et al 2020; Mendonca et al 2021). Our terminology encompasses both replication and viral assembly sites. The threshold of 4 genomic RNA molecules was selected as a technical threshold to limit an over-estimation of viral factories at later timepoints. For our spinning-disk confocal imaging system, we found that a threshold of 3-7 RNA molecules provided satisfactory results. We propose to revise both the Results and Methods sections to clarify our rationale for defining and quantifying viral factories.

      As the reviewer mentioned, we have shown a partial overlap of positive sense genomic RNAs with negative sense genomic RNAs (Figure 2D, S2C), suggesting these viral factories represent double membrane vesicles. The use of antibodies against the viral polymerase (nsp12) is also a possibility to detect replication centres. However, replication centres are not the only ‘viral factories’ as there are also double-membrane structures where viral particles assemble (Mendonca et al 2021) and they, in principle, lack negative sense RNA and replication machinery, so neither smFISH probes against the negative strand nor a nsp12 antibody will comprehensively detect viral factories. We appreciate the valuable suggestion, but the classification of viral factories into replication and assembly sites would be challenging due to reagent availability and is beyond the scope of this manuscript.
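      One way to read the definition above is as a grouping of decomposed gRNA positions by distance, keeping only groups that reach the molecule threshold. The sketch below illustrates that reading; it is a simplified single-linkage grouping written for this note, not the authors' implementation or the Big-FISH API, and treating exactly 8 hpi as "post-8hpi" is an assumption.

```python
import numpy as np


def factory_radius_nm(hpi):
    """Radius quoted in the response: 150 nm before 8 hpi, 200 nm afterwards.

    Assigning exactly 8 hpi to the larger radius is an assumption of this sketch.
    """
    return 150.0 if hpi < 8 else 200.0


def find_factories(coords_nm, hpi, min_molecules=4):
    """Group molecule coordinates (in nm) into candidate viral factories.

    Molecules closer than the time-dependent radius are linked, linked
    molecules form a group, and groups with at least min_molecules members
    are returned as sorted lists of molecule indices.
    """
    coords = np.asarray(coords_nm, dtype=float)
    n = len(coords)
    if n == 0:
        return []
    radius = factory_radius_nm(hpi)
    # Pairwise Euclidean distances between all molecules.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    linked = dists <= radius
    unvisited = set(range(n))
    factories = []
    while unvisited:
        stack = [unvisited.pop()]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            neighbours = [int(j) for j in np.flatnonzero(linked[i]) if int(j) in unvisited]
            unvisited.difference_update(neighbours)
            stack.extend(neighbours)
        if len(group) >= min_molecules:
            factories.append(sorted(group))
    return factories
```

      Raising min_molecules within the 3-7 range mentioned above would make the call more conservative at later timepoints, where dense signal could otherwise inflate the number of factories.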

      2.The random distribution of super-permissive cells in each cell line was demonstrated early in the infection, primarily at 8 hpi. The authors do not show how this pattern changes over time (8, 10, 12, 16, 24 hpi, for example). Do clusters of super-permissive cells appear at later time points, or does the pattern of 'highly' infected cells remain random for each virus? Any strain-specific differences identified from such patterns may be important for understanding infection progression. Finally, the authors do acknowledge this point, but it cannot be overstated that these data were taken from cell culture systems that have limited similarities to the human respiratory epithelium. A better model for such studies might be primary cultured human bronchial epithelial cells, but of course, these cells are not as readily accessible as the cell lines used in this manuscript.

      We share the same view that the presence and the spatial distribution of “super-permissive” cells can provide unique insights into SARS-CoV-2 infection dynamics. Our findings suggest that even at 24 hours post infection (hpi), not all cells become “super-permissive” and the culture maintains a heterogenous population of “partially resistant”, “permissive” and “super-permissive” cells (Figure 3C, S3C-D). We agree with the reviewer that the spatial distribution of “super-permissive” cells at later timepoints is of interest. To address this point, we plan to: (i) analyse the spatial distribution of “super-permissive” cells at 24 hpi, and (ii) compare the distribution of “super-permissive” cells at 24 hpi between VIC and B.1.1.7 strains.

      We appreciate the comment on the limitations of the cell culture systems to the human respiratory tract. However, Calu-3 and A549-ACE2 lung epithelial cells have been used in many studies over the last year and we feel it is important to publish single cell quantitation with these models to enable comparison with the published literature. We believe our results provide valuable information on the intrinsic nature of host cell susceptibility to support viral replication. During the review of this manuscript, we applied our smFISH probes to detect SARS-CoV-2 RNA in infected Golden Syrian hamster lung sections, which show an uneven distribution of infected cells. While the identification and spatial characterisation of susceptible cell types in the lung are beyond the scope of this manuscript, we are excited to include this data in our revised paper to demonstrate the utility of this sensitive approach to track spatiotemporal viral infection dynamics.

      3.The difference in early replication kinetics between the VIC and B.1.1.7 strains is an exciting finding that may have implications for clinical outcomes and transmissibility of these viruses. However, the authors did not clearly demonstrate how these differences in RNA production correlate to infectious viral load released from these cells (in bulk) at each time point. An explanation of this omission would be helpful.

      We will provide data on the level of infectious virus secreted from VIC and B.1.1.7 infected cells at all time points in the revised paper.

      In my opinion, findings related to specific cell lines are of much less importance (and are much less biologically relevant) than identification of replicative differences among strains. Such differences could be used, in part, to aid prediction of the transmissibility of VOC, for example. I think this point gets a bit 'lost in the weeds' of the rest of the paper.

      To address this comment, we will revise text on the differential replication kinetics of the SARS-CoV-2 strains to make this more prominent in our paper.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      Reviewer #1 Minor comments:

      4.I might have missed this, but the authors could also mention the positive control data for Calu-3 cells infected with SARS-CoV-2. One thing I was wondering: why did the authors use two different cell lines for this experiment?

      To address this point, we have added a sentence about a positive control visualising SARS-CoV-2 in Calu-3 cells using our probe set (page 5 – line 17).

      The experiments with HCoV-229E were done in Huh-7.5 cells because SARS-CoV-2 and HCoV-229E have distinct cell preferences. Using the J2 antibody we show that the levels of the dsRNA derived from viral replication are similar in the two cell lines and with the two viruses. Therefore, the lack of smFISH signal in HCoV-229E infected cells supports the high specificity of the probe set.

      5.Fig 1E. Would be nice to have the intensity scale for all time-points to permit a comparison of image intensities along the different time-points.

      6.Fig 3B. Would be important to have intensity scale bars to judge the signal intensities across the different time-points.

      The fluorescence intensity scale in Figure 1E is applicable to all timepoints, except for the lower panel at 24 hpi, which was intended to show wider dynamic contrast range. To address this point, we have provided intensity scales for all time-points studied in this figure and also Figure 3B.

      11.Fig 3C. maybe indicate the two groups with dashed lines.

      We have added a dashed line at the 10^2 mark in Figure 3C to visually differentiate “partially resistant” and “permissive” cells.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In "Absolute quantitation of individual SARS-CoV-2 RNA molecules: a new paradigm for infection dynamics and variant differences", Lee and colleagues adapt fluorescence in situ hybridization (FISH) to track viral RNAs at the single-molecule level, illustrating heterogeneity during the infection process with potential for significant clinical implications. The authors have meticulously demonstrated use of this approach to investigate the kinetics of early infections, as well as infection heterogeneity between the original and variant strains. Most notably, the authors have identified differences in early infection kinetics between an early strain and more transmissible variant.

      General Comments:

      1.The authors' definition of viral factories, in part as foci with at least 4 gRNA molecules, comes across as arbitrary. Perhaps a clearer explanation of this cutoff would be helpful to the readers' understanding of this definition. Additionally, confirmation of the functionality of such factories by immunofluorescence with anti-RdRp, for example, in addition to identifying staining of gRNAs and (-) sense viral RNAs at each focus could provide valuable support to the authors' conclusions.

      2.The random distribution of super-permissive cells in each cell line was demonstrated early in the infection, primarily at 8 hpi. The authors do not show how this pattern changes over time (8, 10, 12, 16, 24 hpi, for example). Do clusters of super-permissive cells appear at later time points, or does the pattern of 'highly' infected cells remain random for each virus? Any strain-specific differences identified from such patterns may be important for understanding infection progression. Finally, the authors do acknowledge this point, but it cannot be overstated that these data were taken from cell culture systems that have limited similarities to the human respiratory epithelium. A better model for such studies might be primary cultured human bronchial epithelial cells, but of course, these cells are not as readily accessible as the cell lines used in this manuscript.

      3.The difference in early replication kinetics between the VIC and B.1.1.7 strains is an exciting finding that may have implications for clinical outcomes and transmissibility of these viruses. However, the authors did not clearly demonstrate how these differences in RNA production correlate to infectious viral load released from these cells (in bulk) at each time point. An explanation of this omission would be helpful.

      Significance

      Adaptation of RNA-based imaging to understand viral infection cycles is critical to the development of antivirals and other mitigation strategies, highlighting the significance of this work. This manuscript represents an almost herculean effort to identify viral replication dynamics using a series of thoughtful and well-controlled experiments. This paper is likely to be valuable to the field, and will serve as a launch pad for future studies in the role of viral RNA production in SARS-CoV-2 infection, clinical outcomes, and transmissibility.

      Expertise keywords: influenza virus, virus transmission, oligonucleotide-based imaging and therapeutics

      I do not have significant experience with quantitation of fluorescence imaging and signal co-localization in cell images.

      Referees cross-commenting

      Reviewer 1's comments regarding the application of smFISH and RNA quantitation are very helpful and address some key limitations of the research presented in this manuscript. I agree that the experiments are well thought out and include appropriate controls. I think the reviewer's comments and concerns are fair and that it would be appropriate to ask the authors to address their points.

      However, my primary concern remains with the biology and focus of the manuscript. In my opinion, findings related to specific cell lines are of much less importance (and are much less biologically relevant) than identification of replicative differences among strains. Such differences could be used, in part, to aid prediction of the transmissibility of VOC, for example. I think this point gets a bit 'lost in the weeds' of the rest of the paper.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors use single-molecule FISH (smFISH) to study the early time-points of SARS-CoV-2 infection/replication. By targeting genomic and sub-genomic RNAs, they can decipher different stages during the infection cycle, and identify different cell populations with distinct behavior. By applying both smFISH and IF with the J2 antibody recognizing dsRNA, the authors nicely demonstrate how smFISH is more sensitive, especially during early infection when viral RNA levels are still relatively low. The investigation of the two SARS-CoV-2 strains is well thought through and provides evidence that these strains have similar viral uptake and infection rates, but differ in the replication kinetics, opening the door for future investigations. The paper is a pleasure to read and the authors provide a wealth of controls that not only convincingly illustrate the specificity of their approach but also how it provides unique information, complementing both IF and sequencing-based approaches. The provided methods are explained in detail and will allow users to quickly get started. The paper provides not only very interesting biological insights, but also nicely illustrates how smFISH can be used to study infection by providing unique information.

      Major comments:

      The key conclusions were convincingly presented and, as far as I can judge as a biophysicist with limited experience in SARS-CoV-2 biology, backed up with adequate controls and analysis. In general, the authors provide exemplary validations to illustrate the specificity of their approach. RNA detection and single-molecule sensitivity is validated in several experiments, by the "standard" probe-splitting approach, where a dual-color labeling of the same RNA is performed, but also by RNAse and Remdesivir treatment. Further, the authors show the specificity of their smFISH probes by applying them to another coronavirus (HCoV-229E), where no signal was detected. Further, the authors provide very detailed methods, which should make it easy for other researchers to apply these methods in their own research, and also reproduce the results. The imaging data is nicely complemented with quantitative analysis where needed and the provided plots are both adequately chosen and visually pleasing.

      However, I have one major concern about the RNA abundance analysis. While this comment concerns some of the analysis, it does not question the obtained conclusions. The authors used approaches provided in FISH-quant (Mueller et al, Nat Methods 2013) and big-fish. However, these tools to analyze RNA aggregates were not designed and validated for such massive aggregations as observed with SARS-CoV-2. They were developed for cases such as transcription sites with much smaller aggregations, with a few tens to a hundred molecules. With a regular spot detection approach, usually a few thousand spots can be detected in a cell (e.g. King et al, J Virol 2018), but this also depends on the microscope used and the available cellular volume. Higher RNA concentrations cannot be resolved with a standard approach, because RNA spots start to overlap. Decomposing RNA aggregations can help but will not work reliably for the high RNA densities observed for SARS-CoV-2, especially at later infection time-points. The tools will then no longer provide accurate estimates. To my knowledge, there is currently no accurate quantification method for such massive RNA levels in smFISH. What has been done in the past is to use cellular intensity as an approximation and perform calibrations with cells having lower and thus still resolvable RNA counts (Raj et al., PLoS Biology; https://doi.org/10.1371/journal.pbio.0040309.sg003). The authors proposed three expression regimes (partially resistant, permissive, and super permissive). My concerns here apply mainly to the category super-permissive, where an accurate estimation can't be performed. Here a more cautious quantification should be applied. To a lesser extent, this will also apply to some of the quantifications of gRNAs per factory, with counts exceeding 100s of molecules. As mentioned above, this does not affect any of the conclusions, but would reflect more accurately what kind of reliable information can be drawn from such experiments.

      Minor comments:

      I have a few minor comments/questions.

      1.Page 6; the authors state that "smFISH identifies ... cellular distribution .... within ER-like membranous structures". However, the authors didn't directly show such a localization, could they provide an experiment with an ER stain?

      2.It might be worthwhile pointing out that the probe-sets can be used in different host organisms (Vero - African green monkey; human cell lines).

      3.I really liked the experiment where the authors showed absence of signal when infecting with another virus, and the elegant control with the J2 antibody. Maybe the authors could explain more clearly that they used a different coronavirus and that, based on their sequence alignment, no/little signal would be expected.

      4.I might have missed this, but the authors could also mention the positive control data for Calu-3 cells infected with SARS-CoV-2. One thing I was wondering: why did the authors use two different cell lines for this experiment?

      5.Fig 1E. Would be nice to have the intensity scale for all time-points to permit a comparison of image intensities along the different time-points.

      6.Fig 3B. Would be important to have intensity scale bars to judge the signal intensities across the different time-points.

      7.The experiment with the isolated virions shows nicely that the smFISH approach has single-virus sensitivity. Did the authors compare the intensity of these isolated virions with the signal in Fig 1B? This might be a question of personal taste, but to me, this section might actually fit better in the first paragraph of page 4/5, where the authors describe single virions in cells.

      8.Page 6. The authors state "+ORF-N and +ORF-S single labelled spots, corresponding to sgRNAs, were more uniformly distributed throughout the cytoplasm than dual labelled gRNA". This is difficult to appreciate from the image. Is this something the authors could quantify, e.g. with the metrics proposed by Stueland et al, Scientific Reports 2019?

      9.Page 6. The authors perform a FISH/IF experiment including a co-localization analysis, where a "limited overlap" with sgRNAs was observed. I was wondering if this overlap could actually be simply due to rather high density of the sgRNAs. Maybe a control analysis by slightly changing the RNA positions could provide insight here, and give a threshold for what's to be expected randomly at a given RNA density.

      10.I don't fully follow the argument about stability on page 8. The authors also see an increase in the RNA levels. Couldn't this increase compensate for loss of RNA due to degradation? Would it be possible to perform an experiment at a very high remdesivir concentration, which would block transcription?

      11.Fig 3C. maybe indicate the two groups with dashed lines.

      12.How did the authors define/detect replication factories? I couldn't find information about this in the methods.
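      The randomisation control suggested in point 9 could be sketched roughly as follows. This is only an illustration, not the authors' analysis: the function names, the distance threshold, the shift range and the use of 2D spot coordinates are all assumptions. The idea is to measure the fraction of RNA spots that lie near a marker spot, then repeat the measurement after randomly displacing every RNA spot, which gives an estimate of the overlap expected by chance at that spot density.

```python
import math
import random


def colocalized_fraction(rna_xy, marker_xy, max_dist):
    """Fraction of RNA spots with at least one marker spot within max_dist."""
    if not rna_xy:
        raise ValueError("need at least one RNA spot")
    hits = sum(
        1
        for (x, y) in rna_xy
        if any(math.hypot(x - mx, y - my) <= max_dist for (mx, my) in marker_xy)
    )
    return hits / len(rna_xy)


def random_baseline(rna_xy, marker_xy, max_dist, shift, n_trials=200, seed=0):
    """Mean co-localized fraction after randomly displacing every RNA spot.

    Each spot is moved independently by up to `shift` along x and y
    (hypothetical parameter; same units as the coordinates).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_trials):
        shifted = [
            (x + rng.uniform(-shift, shift), y + rng.uniform(-shift, shift))
            for (x, y) in rna_xy
        ]
        fractions.append(colocalized_fraction(shifted, marker_xy, max_dist))
    return sum(fractions) / n_trials
```

      An observed fraction clearly above the randomised baseline would argue against the overlap being a pure density effect. A real analysis would also need to keep the shifted spots inside the cell mask, which this sketch ignores.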

      Significance

      The authors apply their established smFISH approach to the detection of SARS-CoV-2 RNA. As mentioned above, they provide extensive validations and detailed protocols (including the necessary probe sequences). This should also allow relative newcomers to the field to quickly perform these experiments. While the technical advance might not be major, the convincing presentation will certainly appeal to an audience which has not been using imaging-based approaches to study (early) viral infection events and has relied more on other approaches, such as sequencing or bulk PCR.

      There are a few papers using smFISH to study SARS-CoV-2, but to my knowledge this study provides the most detailed analysis of the early time-points of infection, where smFISH with its sensitivity really shines. This paper not only provides new insights into SARS-CoV-2 biology, but also nicely illustrates what kind of unique information smFISH can provide and how this complements orthogonal approaches such as single-cell RNA-seq. Hence, it will certainly be interesting for virologists/biologists working on this pathogen by providing new insight into the replication kinetics, and can also help them to integrate smFISH into their own research.

      I'm a biophysicist working on transcriptional regulation. I contributed to the development of both experimental methods and analysis tools to study single-molecule FISH data. I have only limited expertise in virology, and thus cannot evaluate in detail the biological findings concerning SARS-CoV-2.

      Referees cross-commenting

      I completely agree with the assessment of reviewer #2 and have nothing to add.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We want to thank all three reviewers for their positive and constructive comments and suggestions for improvement. We have now thoroughly revised the manuscript, including new analysis, extra figures, and new material in the wiki. The manuscript has significantly improved because of the reviewers' input. Detailed responses to questions and comments are given below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Lange et al. have developed an automatic feeding system for zebrafish facilities. The system is open-source and relatively easy to implement. The authors propose two systems, one that delivers the same amount of food to each aquarium (ZAF) and a second (ZAF+) that can adjust the amount of food delivered to each aquarium. The authors show no difference in fish weight, spawning and water quality between fish fed using the automatic system and fish fed manually.

      In my opinion, the ZAF and ZAF+ are an excellent first approach to solve the complex problem of automatizing feeding in fish facilities. So far, only one company offers this option which is extremely expensive and demands a lot of maintenance.

      The manuscript is very well written and easy to follow. The supplementary material is very well detailed. It is clear that the authors intended to facilitate the implementation of the ZAF by potential users.

      We appreciate the supportive comments from Reviewer 1 and address all comments below:

      I just have a few comments regarding the system:

      1) The authors do not indicate how the system is cleaned. The system drains itself, but will any deposits of food remain in the tubes? Why is the system not flushed with clear water after each feeding? Do the tubes get clogged?

      We agree that the cleaning process was not clearly explained in the manuscript. We added clear sentences in ‘Box 1’ to describe the first cleaning step (see text and figure). Indeed, after each feeding we flush water and then air into the tubes. Moreover, we explain in ‘Box 2’ that we have a second level of cleaning in the form of a special cleaning program that is run at least once a day with no food distribution (i.e., the same program as used for feeding but without any food mixed in; we flush lots of clean water and then air through the system). Finally, in the discussion we clarify the different cleaning steps by adding extra explanations in the first paragraph.

      All these procedures and programs are very effective in preventing system clogging and in reducing the accumulation of debris and algae. After more than 19 months of ZAF and ZAF+ feeding in our facility we never experienced any tube clogging.

      2) How long was the system tested for?

      ZAF has run in the facility for 9 months and ZAF+ for 10 months since September. We added a sentence about the testing time in the discussion. We never experienced any major problems, only a few minor malfunctions, which are reported in the new troubleshooting table added to the wiki (as suggested by Reviewer 2).

      3) The ZAFs were used to feed 16 aquariums. For such a small rack, manual feeding takes less than 5 min. The authors should highlight that, at least for such small systems, the ZAFs will be especially useful for feeding during weekends and holidays. Still, adding 16 commercially available small automatic feeders, one to each aquarium, could be simpler to implement.

      As noted by the reviewer, ZAFs are very useful when staff are not present (weekends, vacations, etc.). To emphasize this particular point we added a sentence to the first paragraph of the discussion. The small automatic feeders available commercially are usually very difficult to attach to zebrafish facilities. Indeed, they cannot be adapted to conventional lab aquatic facility racks because they are designed for pet aquariums. They also have fewer features than the ZAFs (difficult to adapt the food quantity, more food waste, cumbersome...). Additionally, by multiplying the number of devices (one small feeder is needed per tank), one increases the risk of malfunction as well as the maintenance time required for food filling, cleaning, etc.

      Thus, small automatic feeders are complex to adapt to laboratory aquatic housing racks, are a source of feeding errors, and are more cumbersome and potentially more time consuming. They are simply not designed for professional aquaculture systems, whereas ZAFs can be easily adapted to all the commercially available aquatic facilities. The fact that ZAFs simply ‘interface’ via tubes with fish facility racks makes them very versatile and unintrusive.

      4) How do authors envisage implementing the ZAFs in much larger facilities (from 100 to 1000 tanks) ? Implementing a specific ZAF for each rack containing ~20 tanks may not be realistic.

      Indeed, building multiple ZAFs would be complex and resource consuming. Thus, we designed ZAFs to be adaptable and modular, so one ZAF (or ZAF+) can easily be scaled to handle bigger facilities. The supplementary information and the wiki describe all the steps required to build a ZAF for 16 tanks and a ZAF+ for 30 tanks, along with many tips to scale up these devices without major modifications (up to 80 tanks for ZAF, no restrictions for ZAF+). Of course, we do think that for truly large facilities there is probably a sweet spot that balances the number of individual devices and the per-device capability. Having a single device feeding 1000 tanks is probably not wise; perhaps 5 devices for 200 tanks each (ZAF+) would be best. Please note that the hardware cost and complexity scale roughly linearly with the number of tanks, no surprises here. Moreover, in the case of ZAF+ it is possible to use splitters to feed even more tanks from the same line.

      We added pages to the ZAF/ZAF+ wiki to help users extend the feeding capacities of their desired ZAFs (see “tips to scale up ZAF” and “tips to scale up ZAF+” in the wiki). We also mentioned in the discussion the possibility of distributing food to more tanks with one device by increasing the number of outputs, and referenced the wiki accordingly.
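      As a rough illustration of the sizing trade-off described above, the sketch below splits a facility into the smallest number of devices that respects a chosen per-device capacity, and balances the tanks evenly between them. The function name and the capacity parameter are hypothetical; the figure of 200 tanks per ZAF+ is only the ballpark mentioned in this response, not a tested limit.

```python
import math


def plan_devices(n_tanks, tanks_per_device):
    """Return a list with the number of tanks assigned to each device.

    Uses the fewest devices that respect `tanks_per_device` (an assumed
    per-device capacity) and spreads the tanks as evenly as possible.
    """
    if n_tanks <= 0 or tanks_per_device <= 0:
        raise ValueError("both arguments must be positive")
    n_devices = math.ceil(n_tanks / tanks_per_device)
    base, extra = divmod(n_tanks, n_devices)
    # `extra` devices take one additional tank so that the totals add up
    return [base + 1] * extra + [base] * (n_devices - extra)
```

      For example, `plan_devices(1000, 200)` gives five devices of 200 tanks each, matching the example above, while `plan_devices(100, 30)` gives four devices of 25 tanks.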

      Having said this, we did not primarily design ZAFs for super large fish facilities; instead we designed the ZAF systems to facilitate adoption of fish models by many small and medium sized labs. We hope that our system will lower the bar for labs with moderate resources to get started with aquatic models, or labs that just want to ‘try’ a new aquatic model organism ‘on-the-side’.

      5) how the length of the tubes influences the efficiency of feeding ? For feeding many tanks with the same ZAF it is necessary that the tubes will be of the same length. In that case, the system will become very cumbersome. Longer tubes will probably need stronger pumps. What's the maximal length of tubes tested ? That will limit the number of aquariums a ZAF can feed.

      how the length of the tubes influences the efficiency of feeding ? For ZAF the size of the tubes is very important because its design assumes homogeneous food distribution. In contrast, ZAF+ distributes the entire amount of water and food mix to each tank sequentially, so the tube length is not an issue. To make sure that tube length or tube layout does not affect feeding efficiency we evaluated the weight of fish coming from tanks housed on two different rows (top and bottom). This was not clearly explained in the methods section -- we changed the text to reflect that. Additionally, at the end of each ZAF+ run, the washing sequence runs a relatively large quantity of water to ensure that all food gets flushed out to the right tanks. We did not evaluate the precise amount of food delivered. However, after each feeding and cleaning all tubes are empty (see the last sentences of Box 2).

      For feeding many tanks with the same ZAF it is necessary that the tubes will be of the same length. In that case, the system will become very cumbersome. This is a fair concern. However, with a good design and with the help of cable ties it is very easy to organise the tubing and avoid ‘tube-hell’. We added a sentence to clarify the organisation in the wiki (see ZAF>Hardware>Tubing in the wiki).

      Longer tubes will probably need stronger pumps. What's the maximal length of tubes tested ? That will limit the number of aquariums a ZAF can feed. We never precisely measured that because the generic pumps we use are very powerful and their running time can be adjusted in the software by changing the constants in the source code (see the new troubleshooting supplementary table). Therefore the length of the tubes should not be a limiting factor. Even stronger pumps (more amps) can be readily sourced on Amazon if really needed -- although we doubt that this is necessary. Regarding the number of tanks that ZAF can feed, we simply recommend adding more pumps to increase its capacity (see previous comments or “tips to scale up ZAF” in the wiki).
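      To give a feeling for how a pump running-time constant might be chosen for a longer tube, here is a minimal back-of-the-envelope sketch. It is not taken from the ZAF code: the function names, the flow rate, the extra flush volume and the safety factor are all hypothetical values a user would have to measure or choose. It only uses the fact that the volume of a tube is its cross-section times its length.

```python
import math


def tube_volume_ml(length_cm, inner_diameter_mm):
    """Internal volume of a cylindrical tube in millilitres (1 cm^3 = 1 mL)."""
    radius_cm = inner_diameter_mm / 10 / 2
    return math.pi * radius_cm ** 2 * length_cm


def pump_run_time_s(length_cm, inner_diameter_mm, flow_ml_per_s,
                    flush_volume_ml=0.0, safety_factor=1.5):
    """Seconds a pump should run to push the liquid through the whole tube.

    `flow_ml_per_s` is an assumed, user-measured flow rate; `safety_factor`
    adds a margin so that the tube is completely emptied.
    """
    if flow_ml_per_s <= 0:
        raise ValueError("flow rate must be positive")
    total_ml = tube_volume_ml(length_cm, inner_diameter_mm) + flush_volume_ml
    return safety_factor * total_ml / flow_ml_per_s
```

      With such an estimate, doubling the tube length roughly doubles the required running time at a given flow rate, which is why adjusting a time constant is usually enough before a stronger pump is considered.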

      Despite these comments, this is an excellent first approach, and the fact that the authors made it open-source and open access makes the ZAFs a very important contribution to the community. I have no doubt that some fish facilities will implement it and the community will help to improve it.

      Thank you. We do think that the main benefit of an open source project is the community around it. We are currently collecting a growing list of interested labs and we are interested in organising an online workshop to discuss ZAF and ZAF+, with some talks, Q&As, and more to help people get started.

      Reviewer #1 (Significance (Required)):

      This is the first open-source open-access automatic feeding system ever published.

      It is the first but very important step to the automation of research fish facilities.

      **Referee Cross-commenting**

      I agree with all the other reviewers.

      We also have to take into account that the system is a first prototype and although not ideal, it is open source. This will allow other labs to develop and improve their own models based on the ZAF.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The manuscript proposes an open source automated feeder for zebrafish facilities, although it would be amenable to other species. Overall, the manuscript is clearly written and easy to understand, the wiki is well sourced and clear. The commitment to open source is commendable.

      I have some questions regarding the long-term sustainability of this setup, as well as some discrepancies in the methods. Finally, as this aims to be useful to people with no engineering/electronics competence, I feel that it is not yet at a level that is accessible enough.

      We are very pleased to see that the Reviewer appreciates our manuscript and our commitment to open access. We thank the Reviewer for his comments, in particular the comments about accessibility, and address them below:

      **Major comments**

      It would be useful to have a centralized list of parts and components, which would make it easier for users to order all that is needed to assemble the ZAF or ZAF+, at the moment the information is distributed through the wiki as hyperlinks.

      Extremely important! This was clearly an oversight on our part. We agree that a table listing all the components would help with constructing ZAF and ZAF+. We have added two tables to the wiki, one for ZAF and another for ZAF+, listing all the parts and components required to build both devices, with article numbers, suppliers and costs in dollars. Thanks to the reviewer for this excellent suggestion.

      A troubleshooting guide for the common problems the team ran into (if any) would be useful for newcomers, even just as issues on the GitHub. The team may also consider some form of chat/forum/google group to allow discussions between users and experts.

      The reviewer raised an important point, so we added a troubleshooting guide to the ZAF wiki to help users by listing the minor malfunctions that we observed. Additionally, users will be able to ask questions or report bugs on the ZAF GitHub using issues. GitHub issues will allow discussion and tracking of ideas and feedback within the ZAF user community. Finally, we just created a Gitter room: https://gitter.im/ZAF-Zebrafish-Automatic-Feeder to enable more interactive discussion.

      Did the authors observe any algal or bacterial growth in the feeding tubes over the 60 days? Do they have an estimate of how long the tubes stay "clean" enough? The authors mention tube changing every 10 weeks; can they explain the rationale, and did they assess the bacterial/algal contamination over that time? Do the splitter panel and food mixing flask also need replacing regularly?

      After several weeks of usage we indeed observed algal and bacterial growth in the tubes. To report this and justify the need to change the tubes, we made a new supplementary figure illustrating tube cleanliness (mainly algal and bacterial growth) over time (see Suppl. Fig 3). We realised that 12 weeks is actually the optimal tubing renewal period in our facility. Algal and bacterial growth depends on facility characteristics such as light intensity, water and air temperature, and feeding frequency, so the renewal period may need to be adapted to the user's facility. The splitter tubing can be changed based on user observations; we now mention this in the ZAF tubing supplementary material and on the wiki.

      The authors mention that the tubing needs to be of similar length to ensure similar resistance and food distribution. Did they compare the body weight of fish in racks at the top or at the bottom of their system? There are no overall differences, but maybe the bottom racks would receive slightly more food? Furthermore, did they quantify the differences in food/water delivery as a function of length differences?

      The requirement for similar length is only necessary for ZAF because its accessible design assumes homogeneous distribution of the water-food mix through a passive splitter system which is susceptible to variable fluid resistance. In contrast, ZAF+ distributes the water-food mix one tank at a time -- ensuring that the correct amount of food is entirely flushed through any required tube length (the pumps are strong enough and we flush enough water). In the eventuality that the tube length is too long the user can adjust the pump running time by changing constants in the code (see troubleshooting table in the wiki and corresponding links).

      We thank the reviewer for suggesting that we evaluate the weight of fish from the two extreme heights. Although we did not explicitly report this in the first version of the manuscript, we had actually anticipated this potential issue and therefore collected data for ZAF and ZAF+ from tanks housed on the top and bottom rows. We added a clear description of the weighing process to the Materials and Methods, highlighting the housing condition of the tanks tested.

      Finally, after each feeding run the tubes have been fully flushed and are empty without food debris or pellets remaining, irrespective of their sizes. So we did not find it relevant to evaluate the precise amount of food effectively delivered as we control that already upstream.

      Methods fish weight: The methods mention different amounts of food than the wiki, the rationale in the wiki is also different from the 5% of body weight outlined in the methods (which then matches the food amount of the methods). Which is the correct amount?

      We thank the reviewer for noticing the inconsistency. The amounts given in the methods are the correct ones, so we changed the wiki; we had made a mistake when editing the figures. Some sections of the wiki were written early during the development of the hardware, and we unfortunately forgot to correct the inconsistencies.
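      For readers who want to check the wiki and the methods against each other, the 5% of body weight rule mentioned by the reviewer can be written out as a short calculation. This is only a sketch: the function name, the example fish number and body weight, and the option to split the ration over several feedings are assumptions, not values from the manuscript.

```python
def food_per_feeding_mg(n_fish, mean_body_weight_mg, fraction=0.05, feedings=1):
    """Food per feeding for one tank, as a fraction of total fish body weight.

    `fraction` defaults to the 5% rule quoted from the methods; `feedings`
    (hypothetical) splits the ration over several runs.
    """
    if n_fish < 0 or mean_body_weight_mg < 0 or feedings < 1:
        raise ValueError("invalid tank description")
    total_mg = fraction * mean_body_weight_mg * n_fish
    return total_mg / feedings
```

      For instance, a hypothetical tank of 10 fish of 400 mg each would receive about 200 mg of food in total, or about 100 mg per run if the ration is split over two feedings.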

      The code is decently commented for scientific software with clear variable names, but I wonder how flexible it is if users cannot get access to the specific hardware (especially the pumps) used in ZAF/ZAF+? Can the authors briefly comment on this point?

      The pumps are simply built from 12V motors, and a large variety of such pumps can be found online (Amazon, etc.). We have tried several ourselves, and there is no need to have the exact same model. We added a note about this to the tubing section of the ZAF and ZAF+ wiki.

      The only components that cannot be easily exchanged are the Arduino and the Raspberry Pi, but that is not an issue as these components are very easily sourced.

      The wiki could use more pictures or, to borrow the Proust Madeleine allusion, schematics akin to LEGO with more intermediary steps clearly outlined. Some pictures are also a bit small/busy (such as 2D and 2E in the frame section, or the magnet pictures), they may benefit from cartoons/schematics to clarify what is done. Alternatively, videos/timelapses may help with better visualising the assembly.

      We appreciate the reviewer's comments and added new pictures, schematics and extra legends to the wiki to help potential ZAF builders. In the wiki for ZAF hardware we increased the size of the pictures for all the different steps and added new legends to clarify the assembly. There are also now more pictures illustrating the construction steps (e.g., in “frame” and “pumps and valve”) and we added a simple schematic for “servo and food container”. Picture sizes have been increased in “ZAF electronics” and pictures were added to the “Raspberry Pi and Servo Hat” section. We increased the picture sizes and added more legends to the ZAF+ Hardware “Pumps & Valve” section. Moreover, we added more photos to the “tubing” section and the “ZAF+ Electronics” section.

      We agree that videos or gifs would have been great to visualize the assembly. Unfortunately, we did not record such videos during the construction. We created ZAF as an open source project and clearly hope to generate a community that will share assembly pro-tips and maybe construction videos on GitHub.

      Our institute is expanding on zebrafish research so we will build additional ZAFs and will use this opportunity to prepare nice videos to add to the wiki. We envision that the wiki will be improved over time with better material, some of it contributed, as well as perhaps newer and better versions of ZAF.

      The main question that would affect if this approach were taken up would be how reliable it is in the long run. Have the authors experienced any issue over the 2 months test? Is this system still being used currently? If so, could the authors update the water quality logs?

      The reviewer suggests that the key question is whether using ZAFs all year long is possible. We can reply yes, it is actually possible! We have used ZAF for 9 months, and now ZAF+ for the past 10 months in our fish facility, with great success. We never experienced major malfunctions, and the minor issues we encountered are reported in the troubleshooting table. Since ZAF and ZAF+ have been used daily for months with logs recorded every day, we have extended the water quality logs to 3 months. We have been using the ZAFs in full autonomy for a total of 19 months, which has been frankly invaluable.

      Getting a sense of how long it can run without problems, how much troubleshooting is involved per month would be very useful in answering those questions.

      Except for manual cleaning and tube replacement, ZAF requires no other major maintenance. Of course, the food reserve needs to be changed at least once per week. We listed the malfunctions in the troubleshooting guide in the wiki. In our facility ZAFs require an average of 1 hour of maintenance per month. If any hardware part fails, it can be replaced immediately because all the parts are cheap and easily replaceable. Actually, we recommend keeping spares of all the key components (pumps, valves, Arduino, Raspberry Pi, tubes, ...).

      **Minor comments**

      • Main text page 3: Fig. Supp. 2 instead of Supp. Fig. 2. Furthermore, would the authors have similar data for the manual feeding? If so, it could be useful to add here for comparison (although that is not necessary if the data is unavailable).

      We changed the text but we don’t have data available for the water logs with manual feeding.

      Main text page 3: it would be useful to add how long it takes to change all the tubing after 10 weeks?

      This really depends on the ZAF tubing and the fish facility; in our hands it takes about one hour. We mentioned it in the results section, ZAF paragraph.

      Methods fish weight: The phrasing as it stands makes it unclear that the same method was used for ZAF and ZAF+. The authors may consider starting with the description of the common weighing method, then the specifics of ZAF+.

      Thank you, we changed the text accordingly.

      Supp.Fig.1a: "Waste water drain pipe"

      Thank you, we changed the text accordingly.

      Acknowledgments: "...for their help..."

      Thank you, we changed the text accordingly.

      ZAF - Servo Hat connection: "to control the pumps"

      Thank you, we changed the text accordingly.

      ZAF - Installation: the dependencies should be listed as they are in ZAF+, or the two sections merged, unless the GUI is not functional (see below).

      Thank you, we now list the dependencies in the wiki.

      ZAF - How to use: there is no mention of the GUI, is it not yet implemented? If not, is the touch screen needed?

      The standard ZAF hardware is controlled by a very simple python-based program that works with a command line interface. Therefore to interact with the Raspberry Pi for installation and configuration we strongly recommend building ZAF with a screen, and the touch screen is an easy way to be able to quickly point and click in the absence of a mouse -- which can be cumbersome when no clean horizontal surfaces are available in a lab environment.

      ZAF+ - soldering: "A 12V power supply (at least 10A best 20A) provides power to the electronics, except the Raspberry Pi and the two Arduino Megas." It seems the sentence is incomplete, or at least I cannot make sense of it.

      Changed to “A 12V power supply (at least 10A, but ideally 20A) provides power to the electronics, except for the Raspberry Pi and the two Arduino Megas that are powered by the Raspberry Pi 5V GPIOs.”

      Reviewer #2 (Significance (Required)):

      This manuscript provides a significant technical advance to the zebrafish field. The proposed automated feeder would be a very useful option for smaller labs, to ensure the consistency of feeding, and to remove one of the routine aspects of fish husbandry.

      As the authors state, there is certainly interest in the zebrafish community [9,10] for automation of feeding. I am not aware of other DIY fully automated feeding system, commercial systems do exist, but are expensive.

      The manuscript, and proposed automated feeder, would certainly be of interest within the zebrafish community, as well as other researchers using aquatic models that can rely on dry food. How many in the community would embrace this method will depend on how confident they are in the long-term stability.

      I am neither an electronics nor a husbandry expert. As such I am not qualified to comment on any long-term impact this may have, if any, on fish health. My expertise lies in image and data analysis, as well as microscopy.

      **Referee Cross-commenting**

      I think the major points are shared by all reviewers, I think the other reviews are fair in their content and I have nothing specific to comment on.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This technical report describes an open-source fully automated feeding system for husbandry of zebrafish (and potentially other aquatic organisms). It provides detailed instructions for assembling individual components into two different feeding systems of varying adaptability, as well as their operation. Links to relevant control software are also provided. The characterization of the systems' performance appears somewhat limited (e.g. only maintenance of adult fish over a period of 8 weeks and use of dry food is documented). These systems could be of use for husbandry in a large number of research labs, and, in addition, for automated reward delivery in large-scale associative conditioning assays.

      We thank the Reviewer for his encouraging comments and appreciate his helpful suggestions. We answer the Reviewer's comments below:

      **Major comments:**

      Providing food to large numbers of tanks in aquatic animal facilities in a regular fashion is a time- and resource-consuming process. Some automated feeding systems for large numbers of tanks are commercially available, but these feeder robots are expensive and are restricted to systems of specific vendors. Therefore, an adaptable automated system that can be assembled from off-the-shelf components is a very attractive option for many research labs to both save resources and standardize the feeding process.

      The instructions for assembly provided by the authors appear quite detailed and sufficient to allow non-experts the assembly and operation of the automated feeder systems. The design of the system appears appropriate for the task.

      While additional experiments are not required to support the claims of the article, I feel that it would be significantly improved by the provision of additional information. My suggestions in that regard include:

      Description of the washing procedure of the system (which solvents, how often, how long?). The authors mention that an exchange of the tubing is required every 10 weeks, but since the tubing transports liquid food mixture, it is easily conceivable that microbial growth will occur rapidly in the system without thorough hygiene / washing procedures. Also could the authors provide some information, which type of tubing material they are using (Silicone, Tygon etc.)?

      Description of the washing procedure of the system (which solvents, how often, how long?).

      We agree that the cleaning procedure must be clarified. We therefore added a clearer description of the process to the first paragraph of the discussion and clarified the explanation of cleaning in Box 1 and Box 2 (as also suggested by Reviewer 1). To summarise, there are two levels of cleaning. The first happens just after each food distribution program, by flushing water and air through the system (Box 1). Additionally, at least once a day, we run an entire program without food to rinse and clean the system (Box 2). This last step is programmable using the ZAF software.
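      The two cleaning levels can be summarised with a small scheduling sketch. It is not the ZAF software: the step names, the function name and the time format are hypothetical and only mirror the description above, in which every feeding run ends with a water flush and an air flush, and at least one daily run repeats the same program without food.

```python
# Hypothetical step names; a feeding run always ends with the two flushes.
FEED_STEPS = (
    "dispense food",
    "add water and mix",
    "distribute to tanks",
    "flush water",
    "flush air",
)
# The cleaning run is the same program without the food.
CLEAN_STEPS = tuple(step for step in FEED_STEPS if step != "dispense food")


def build_daily_schedule(feeding_times, cleaning_times):
    """Return (time, program, steps) tuples sorted by time ("HH:MM" strings)."""
    if not cleaning_times:
        raise ValueError("schedule at least one cleaning run per day")
    runs = [(t, "feeding", FEED_STEPS) for t in feeding_times]
    runs += [(t, "cleaning", CLEAN_STEPS) for t in cleaning_times]
    return sorted(runs)
```

      For example, `build_daily_schedule(["09:00", "16:00"], ["18:00"])` yields two feeding runs followed by one food-free cleaning run.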

      The authors mention that an exchange of the tubing is required every 10 weeks, but since the tubing transports liquid food mixture, it is easily conceivable that microbial growth will occur rapidly in the system without thorough hygiene / washing procedures

      Following all reviewers' comments we added an extra supplementary figure justifying the need of changing the tubes every 12 weeks (updated based on our latest observations). We monitored the cleanliness (algal/microbial growth) of the tubes and realized that it becomes necessary to replace the tubes every 12 weeks (supp figure 3). Interestingly, we remarked that the microbial and algal growth depends on the facility specificities such as light intensity and temperature.

      Also could the authors provide some information, which type of tubing material they are using (Silicone, Tygon etc.)?

      For ZAF we used silicone based tubing then we changed to PVC based tubes for ZAF+ because they are cost effective and have similar specifications for our usage. We added a note about the tubing material in the wiki ZAF tubing and ZAF+ tubing.

      In a related point, I was left wondering how long the food is being mixed in the mixing flask before being applied to the animals? Too long mixing might lead to a loss of nutrients into the solution (through diffusion). Could the authors comment on that, please? Do the food pellets remain more or less integral so that the majority of delivered food is actually ingested by the fish?

      • In a related point, I was left wondering how long the food is being mixed in the mixing flask before being applied to the animals? Too long mixing might lead to a loss of nutrients into the solution (through diffusion). Could the authors comment on that, please? This is a very relevant point: it is indeed very important that the food is not mixed too long in water, to avoid pellet dissolution and loss of nutrients. The food manufacturer's website mentions: “duration of “wet” feeding should be kept short” (https://zebrafish.skrettingusa.com/pages/faq). Therefore we adapted our feeding program to keep the “wet” feeding extremely short. For ZAF and ZAF+, the software is designed to deliver the mix of food and water to the tank(s) within 3 minutes at most. To clarify this, we added a sentence to the Box describing the feeding: “Overall, they share many common features, like the quick distribution of food and water mix, to avoid pellet dissolution in water and loss of nutrients.”

      • Do the food pellets remain more or less integral so that the majority of delivered food is actually ingested by the fish? We manually evaluated the integrity of the food pellets in the early phase of development. Because these parameters are difficult to quantify, we decided to record fish weight as a readout of good food delivery and general effectiveness. However, we clearly understand the reviewer's remarks and therefore added to the manuscript a supplementary video that shows the distribution of the food pellets and their integrity once they reach the tanks.

      In yet another related point, I was left wondering, whether the authors observed any negative impact of feeder usage on water quality (besides pH and conductivity, which they report)? Especially, with regards to ammonia that might arise from the decomposition of uneaten food items?

      Ammonia toxicity is reported to induce clinical and microscopic changes that reduce growth and increase susceptibility to pathogens, according to aquaculture textbooks as summarized here: https://zebrafish.org/wiki/health/disease_manual/water_quality_problems#ammonia_toxicity. However, we have never observed such abnormal phenotypes in our facility, and our regular aquatic PCR health monitoring profiles have always been negative for pathogens. Additionally, high ammonia levels are promoted by husbandry conditions such as high fish density or inadequate water circulation, neither of which is present in our fish facility. Therefore, we did not find it relevant to test for ammonia levels.

      The authors only tested the feeder on adult fish, but discuss that it would easily be transferable to a system that is used for raising fish fry. In that context, could the authors comment, on whether the system of using water as the carrier for the dry food (after mixing) would work as well for the smaller pellets required in feeding fish fry (e.g. 75 or 100 um pellet size as compared to the 500 um pellet size they use)? With smaller pellets, break-down of the dry food during the mixing process seems to be an even larger problem, I could imagine.

      We appreciate the reviewer's comment about using different food pellet sizes, a very important point for the adoption of the ZAFs beyond adult fish. During ZAF testing we tested different pellet sizes (from 100 µm to 500 µm) and did not observe differences in pellet distribution. Most industrial aquatic food pellets are oily and designed for automatic distribution (for large farming environments); they therefore keep their integrity and are not easily broken. Besides, as mentioned previously, the time the food spends wet (mixed with water) during distribution is relatively short, which helps maintain pellet integrity.

      **Minor comments:**

      (1) The average weight of animals is given as lying in the range of 5 to 6g. That seems very high. The "standard" weight range of adult zebrafish is more around 1g [see, for example: Clark, T. S., Pandolfo, L. M., Marshall, C. M., Mitra, A. K. & Schech, J. M. Body Condition Scoring for Adult Zebrafish (Danio rerio). J Am Assoc Lab Anim Sci (2018)]. Could the authors comment on that discrepancy?

      Good observation by the reviewer. We did make a mistake during figure preparation and our legends were actually not reflecting the exact weight of the fish. The scale bars of the figures have been changed to reflect the real weight of the fish (below 1g). We thank the reviewer for noticing the mistakes.

      (2) The authors state that spawning success is not negatively affected by the automated feeding, and they quantify the number of successful crosses. Could the authors briefly confirm or state, that or whether the clutch size was also unaffected?

      We never precisely quantified clutch size or quality, but we have now been using the ZAFs to feed our facility for 19 months and have never observed any problem with our clutches. Our lab works on early development and crucially relies on clutch quality.

      (3) The manual feeding procedure / regime that is used to compare husbandry success against the automated feeding regime is not described in any detail. That seems important given the topic of the article.

      We agree and have added a brief description of the protocol in the Methods section (“Animal and husbandry”).

      (4) The authors cite two recent papers that describe semi-automatic feeding systems for zebrafish in the introduction. The authors might want to consider discussing some key differences between their system and these semi-automatic systems in the discussion.

      The two published semi-automatic feeding systems are completely different from the devices presented in our paper. They are also open access but they are devices that need to be manually operated by facility staff. In contrast, our solutions are fully automatic and do not require the human hand during operation. We mention these two solutions during our brief literature overview in the introduction. However, since these are in a different category, we did not judge it necessary to comment on them in the discussion.

      (5) What do the error bars in Fig. 1c signify (s.d., s.e.m.)? Please state in Figure legend.

      We thank the reviewer for their attention to detail. We now explain in the figure that the error bars show the standard error of the mean (s.e.m.).

      (6) I do think that the system could be of particular interest to researchers that study learning and that use food rewards in automated associative conditioning experiments. While this might be obvious to researchers with such an interest, this aspect is not at all discussed in the paper. Mentioning it might further underscore the versatility of the feeder system.

      We agree with the reviewer that ZAF can be adapted to experimental conditions such as behavioral conditioning, nutrition studies and drug delivery. Any experiment requiring the automatic delivery of solid pellets or liquid can benefit from ZAF. We revised our text and now mention this in the discussion.

      (7) A list of all required equipment with vendors and price estimates (e.g. in the Supplement) would make this paper an even more readily accessible resource.

      This is a very important point already suggested by another reviewer. We added two extra tables in the wiki with the necessary parts and components, listing models, references, and prices.

      Reviewer #3 (Significance (Required)):

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This article signifies a purely technical advance in that it provides a characterization of an open-source, scalable automated feeder for aquatic facilities. As such, it presents a significant advance in the field of aquatic animal husbandry. In addition, this system could also be useful for automated large- or medium-scale associative conditioning paradigms, in which food rewards are given as positive reinforcers.

      Place the work in the context of the existing literature (provide references, where appropriate).

      The authors refer to previously published semi-automatic feeder systems. Regardless of the advantages or disadvantages of all these systems, the field will benefit from a broad(er) choice of automatic feeding systems that are described in sufficient detail to be easily assembled in the laboratory.

      State what audience might be interested in and influenced by the reported findings.

      This study is of interest for any research laboratory working with zebrafish or other aquatic model organisms. Thus, the audience for this article is very broad. Specific interest might also arise in researchers that are performing learning studies in zebrafish (see above).

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Zebrafish, neural circuits, sensory systems.

      **Referee Cross-commenting**

      Many of the major points are shared by all three reviewers. Beyond these shared points, I agree with the other reviews; they raise important questions. All reviews are fair, in my opinion.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This technical report describes an open-source fully automated feeding system for husbandry of zebrafish (and potentially other aquatic organisms). It provides detailed instructions for assembling individual components into two different feeding systems of varying adaptability, as well as their operation. Links to relevant control software are also provided. The characterization of the systems' performance appears somewhat limited (e.g. only maintenance of adult fish over a period of 8 weeks and use of dry food is documented). These systems could be of use for husbandry in a large number of research labs, and, in addition, for automated reward delivery in large-scale associative conditioning assays.

      Major comments:

      Providing food to large numbers of tanks in aquatic animal facilities in a regular fashion is a time- and resource-consuming process. Some automated feeding systems for large numbers of tanks are commercially available, but these feeder robots are expensive and are restricted to systems of specific vendors. Therefore, an adaptable automated system that can be assembled from off-the-shelf components is a very attractive option for many research labs to both save resources and standardize the feeding process.

      The instructions for assembly provided by the authors appear quite detailed and sufficient to allow non-experts the assembly and operation of the automated feeder systems. The design of the system appears appropriate for the task.

      While additional experiments are not required to support the claims of the article, I feel that it would be significantly improved by the provision of additional information. My suggestions in that regard include:

      Description of the washing procedure of the system (which solvents, how often, how long?). The authors mention that an exchange of the tubing is required every 10 weeks, but since the tubing transports liquid food mixture, it is easily conceivable that microbial growth will occur rapidly in the system without thorough hygiene / washing procedures. Also could the authors provide some information, which type of tubing material they are using (Silicone, Tygon etc.)?

      In a related point, I was left wondering how long the food is being mixed in the mixing flask before being applied to the animals? Too long mixing might lead to a loss of nutrients into the solution (through diffusion). Could the authors comment on that, please? Do the food pellets remain more or less integral so that the majority of delivered food is actually ingested by the fish?

      In yet another related point, I was left wondering, whether the authors observed any negative impact of feeder usage on water quality (besides pH and conductivity, which they report)? Especially, with regards to ammonia that might arise from the decomposition of uneaten food items?

      The authors only tested the feeder on adult fish, but discuss that it would easily be transferrable to a system that is used for raising fish fry. In that context, could the authors comment, on whether the system of using water as the carrier for the dry food (after mixing) would work as well for the smaller pellets required in feeding fish fry (e.g. 75 or 100 um pellet size as compared to the 500 um pellet size they use)? With smaller pellets, break-down of the dry food during the mixing process seems to be an even larger problem, I could imagine.

      Minor comments:

      (1) The average weight of animals is given as lying in the range of 5 to 6g. That seems very high. The "standard" weight range of adult zebrafish is more around 1g [see, for example: Clark, T. S., Pandolfo, L. M., Marshall, C. M., Mitra, A. K. & Schech, J. M. Body Condition Scoring for Adult Zebrafish (Danio rerio). J Am Assoc Lab Anim Sci (2018)]. Could the authors comment on that discrepancy?

      (2) The authors state that spawning success is not negatively affected by the automated feeding, and they quantify the number of successful crosses. Could the authors briefly confirm or state, that or whether the clutch size was also unaffected?

      (3) The manual feeding procedure / regime that is used to compare husbandry success against the automated feeding regime is not described in any detail. That seems important given the topic of the article.

      (4) The authors cite two recent papers that describe semi-automatic feeding systems for zebrafish in the introduction. The authors might want to consider discussing some key differences between their system and these semi-automatic systems in the discussion.

      (5) What do the error bars in Fig. 1c signify (s.d., s.e.m.)? Please state in Figure legend.

      (6) I do think that the system could be of particular interest to researchers that study learning and that use food rewards in automated associative conditioning experiments. While this might be obvious to researchers with such an interest, this aspect is not at all discussed in the paper. Mentioning it might further underscore the versatility of the feeder system.

      (7) A list of all required equipment with vendors and price estimates (e.g. in the Supplement) would make this paper an even more readily accessible resource.

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This article signifies a purely technical advance in that it provides a characterization of an open-source, scalable automated feeder for aquatic facilities. As such, it presents a significant advance in the field of aquatic animal husbandry. In addition, this system could also be useful for automated large- or medium-scale associative conditioning paradigms, in which food rewards are given as positive reinforcers.

      Place the work in the context of the existing literature (provide references, where appropriate). The authors refer to previously published semi-automatic feeder systems. Regardless of the advantages or disadvantages of all these systems, the field will benefit from a broad(er) choice of automatic feeding systems that are described in sufficient detail to be easily assembled in the laboratory.

      State what audience might be interested in and influenced by the reported findings. This study is of interest for any research laboratory working with zebrafish or other aquatic model organisms. Thus, the audience for this article is very broad. Specific interest might also arise in researchers that are performing learning studies in zebrafish (see above).

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Zebrafish, neural circuits, sensory systems.

      Referee Cross-commenting

      Many of the major points are shared by all three reviewers. Beyond these shared points, I agree with the other reviews; they raise important questions. All reviews are fair, in my opinion.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The manuscript proposes an open source automated feeder for zebrafish facilities, although it would be amenable to other species. Overall, the manuscript is clearly written and easy to understand, the wiki is well sourced and clear. The commitment to open source is commendable. I have some questions regarding the long-term sustainability of this setup, as well as some discrepancies in the methods. Finally, as this aims to be useful to people with no engineering/electronics competence, I feel that it is not yet at a level that is accessible enough.

      Major comments

      • It would be useful to have a centralized list of parts and components, which would make it easier for users to order all that is needed to assemble the ZAF or ZAF+. At the moment the information is distributed through the wiki as hyperlinks.

      • A troubleshooting guide for the common problems the team ran into (if any) would be useful for newcomers, even just as issues on the GitHub. The team may also consider some form of chat/forum/google group to allow discussions between users and experts.

      • Did the authors observe any algal or bacterial growth in the feeding tubes over the 60 days? Do they have an estimate of how long the tubes stay "clean" enough? The authors mention changing the tubes every 10 weeks; can they explain the rationale, and did they assess the bacterial/algal contamination over that time? Do the splitter panel and food mixing flask also need replacing regularly?

      • The authors mention that the tubing needs to be of similar length to ensure similar resistance and food distribution. Did they compare the body weight of fish in racks at the top and at the bottom of their system? There are no overall differences, but maybe the bottom racks would receive slightly more food? Furthermore, did they quantify the differences in food/water delivery as a function of length differences?

      • Methods fish weight: The methods mention different amounts of food than the wiki; the rationale in the wiki is also different from the 5% of body weight outlined in the methods (which then matches the food amount of the methods). Which is the correct amount?

      • The code is decently commented for scientific software with clear variable names, but I wonder how flexible it is if users cannot get access to the specific hardware (especially the pumps) used in ZAF/ZAF+? Can the authors briefly comment on this point?

      • The wiki could use more pictures or, to borrow the Proust Madeleine allusion, schematics akin to LEGO with more intermediary steps clearly outlined. Some pictures are also a bit small/busy (such as 2D and 2E in the frame section, or the magnet pictures), they may benefit from cartoons/schematics to clarify what is done. Alternatively, videos/timelapses may help with better visualising the assembly.

      • The main question that would affect if this approach were taken up would be how reliable it is in the long run. Have the authors experienced any issue over the 2 months test? Is this system still being used currently? If so, could the authors update the water quality logs? Getting a sense of how long it can run without problems, how much troubleshooting is involved per month would be very useful in answering those questions.

      Minor comments

      • Main text page 3: Fig. Supp. 2 instead of Supp. Fig. 2. Furthermore, would the authors have similar data for the manual feeding? If so, it could be useful to add here for comparison (although that is not necessary if the data is unavailable).

      • Main text page 3: It would be useful to add how long it takes to change all the tubing after 10 weeks.

      • Methods fish weight: The phrasing as it stands makes it unclear whether the same method was used for ZAF and ZAF+. The authors may consider starting with the description of the common weighing method, then the specifics of ZAF+.

      • Supp.Fig.1a: "Waste water drain pipe"

      • Acknowledgments: "...for their help..."

      • ZAF - Servo Hat connection: "to control the pumps"

      • ZAF - Installation: the dependencies should be listed as they are in ZAF+, or the two sections merged, unless the GUI is not functional (see below).

      • ZAF - How to use: there is no mention of the GUI, is it not yet implemented? If not, is the touch screen needed?

      • ZAF+ - soldering: "A 12V power supply (at least 10A best 20A) provides power to the electronics, expect the Raspberry Pi and the two Arduino Megas." It seems the sentence is incomplete, or at least I cannot make sense of it.

      Significance

      This manuscript provides a significant technical advance to the zebrafish field. The proposed automated feeder would be a very useful option for smaller labs, to ensure the consistency of feeding, and to remove one of the routine aspects of fish husbandry.

      As the authors state, there is certainly interest in the zebrafish community [9,10] in the automation of feeding. I am not aware of any other DIY fully automated feeding system; commercial systems do exist, but are expensive.

      The manuscript, and proposed automated feeder, would certainly be of interest within the zebrafish community, as well as other researchers using aquatic models that can rely on dry food. How many in the community would embrace this method will depend on how confident they are in the long-term stability.

      I am neither an electronics nor a husbandry expert. As such, I am not qualified to comment on any long-term problems this approach may pose, if any, for fish health. My expertise lies in image and data analysis, as well as microscopy.

      Referee Cross-commenting

      I think the major points are shared by all reviewers. The other reviews are fair in their content, and I have nothing specific to comment on.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Lange et al. have developed an automatic feeding system for zebrafish facilities. The system is open-source and relatively easy to implement. The authors propose two systems, one that delivers the same amount of food to each aquarium (ZAF) and a second (ZAF+) that can adjust the amount of food delivered to each aquarium. The authors show no difference in fish weight, spawning and water quality between fish fed using the automatic system and fish fed manually.

      In my opinion, the ZAF and ZAF+ are an excellent first approach to solving the complex problem of automating feeding in fish facilities. So far, only one company offers this option, which is extremely expensive and demands a lot of maintenance.

      The manuscript is very well written and easy to follow. The supplementary material is very well detailed. It is clear that the authors intended to facilitate the implementation of the ZAF by potential users.

      I just have a few comments regarding the system:

      1) The authors do not indicate how the system is cleaned. The system drains itself, but will any deposits of food remain in the tubes? Why is the system not flushed with clear water after each feeding? Do the tubes get clogged?

      2) How long was the system tested for?

      3) The ZAFs were used to feed 16 aquariums. For such a small rack, manual feeding takes less than 5 min. The authors should highlight that, at least for such small systems, the ZAFs will be especially useful for feeding during weekends and holidays. Still, adding a commercially available small automatic feeder to each of the 16 aquariums could be simpler to implement.

      4) How do the authors envisage implementing the ZAFs in much larger facilities (from 100 to 1000 tanks)? Implementing a specific ZAF for each rack containing ~20 tanks may not be realistic.

      5) How does the length of the tubes influence the efficiency of feeding? To feed many tanks with the same ZAF, the tubes need to be of the same length. In that case, the system will become very cumbersome. Longer tubes will probably need stronger pumps. What is the maximal length of tubes tested? That will limit the number of aquariums a ZAF can feed.

      Despite these comments, this is an excellent first approach, and the fact that the authors made it open-source and open access makes the ZAFs a very important contribution to the community. I have no doubt that some fish facilities will implement it and the community will help to improve it.

      Significance

      This is the first open-source, open-access automatic feeding system ever published. It is the first but very important step towards the automation of research fish facilities.

      Referee Cross-commenting

      I agree with all the other reviewers.

      We also have to take into account that the system is a first prototype and although not ideal, it is open source. This will allow other labs to develop and improve their own models based on the ZAF.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to reviewer comments on:

      “Recruitment of Scc2/4 to double strand breaks depends on γH2A and DNA end resection”, by Martin Scherzer et al

      We would like to thank the editors and reviewers for their time spent, as well as their appreciated and insightful comments on our manuscript. We have now initiated the revision as outlined point by point below. We provide a description of the plan for how to resolve the points of concern still remaining and also list the modifications and improvements already incorporated in the revised and transferred manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the manuscript entitled "Recruitment of Scc2/4 to double strand breaks depends on γH2A and DNA end resection", Scherzer et al. study the role of Scc2 in DSB repair in yeast. Scc2 is part of the cohesin loader and it is required for cohesin loading in response to DSB. The authors study the chromatin association of Scc2 by ChIP-qPCR and use genetics to identify factors that affect its recruitment. They show that Scc2 is enriched up to 10 kb from the break site, similar to cohesin and identify MRE, TEL1 and γH2A as important factors for Scc2 chromatin binding. Remarkably, MEC1 that has been shown to regulate cohesin under these conditions is dispensable for Scc2 recruitment. While DNA resection is important for Scc2 recruitment, chromatin remodelers don't play a significant role in it despite numerous reports on their effect on cohesin loading during the cell cycle. The manuscript provides new and important information on cohesin regulation in response to DNA damage.

      **Major comments:**

      The experiments are done appropriately and contain the required control. The results are presented clearly and with adequate statistics and support the conclusions. The experiments provide valuable information. However, the low resolution of the experimental setup is limiting, and dynamic information of Scc2 binding is lacking. I would agree with the authors that this kind of information may be beyond their scope. However, the absence of this information reduces the overall impact of the manuscript.

      1. ChIP-seq, of at least some of the key experiments, could provide information on the specific Scc2 binding sites and elucidate whether cohesin is translocated from the loading sites or accumulate in its proximity.

      ChIP-seq would indeed increase the resolution of the Scc2 and Cohesin DSB accumulation, especially beyond 1 kb. However, to gain insight into the dynamics of the binding, numerous timepoints for both strains would have to be analyzed, which we feel would be beyond the possibilities for this study (see also comment under point 4 of this document). For Scc2 we believe that we have shown high enough resolution, determining binding from 0.1 to 30 kb away from the break. We have also provided a time course experiment from 90 minutes up to 6 hours and show that the Scc2 binding is continuously increasing. In the revised version of the manuscript, we have added experiments looking at the Cohesin binding in close vicinity of the break, similar to what we previously did for Scc2. With this we confirm the binding pattern of Cohesin previously reported. We have also compared Cohesin binding at 90 and 180 min after break induction, for increased information on the dynamics of its binding at the DSB, and see no change in Cohesin positioning in relation to the DSB site. Rather, the general level of binding increases equally over the region with time (compare Fig 1B and 4A with Fig 1C and Fig S3). This to us indicates that there is no translocation of Cohesin from one loading site to final binding sites. However, to further clarify this issue we plan to include ChIP qPCR experiments on an ATPase deficient mutant of Cohesin, which has been found to be able to be loaded on DNA but not translocated (Hu et al 2010, “ATP Hydrolysis is required for relocating Cohesin from sites occupied by its Scc2/4 loading complex”). These experiments will potentially allow us to explore the possibility that Cohesin is loaded at one (or several) site(s) in the DSB region and then translocated away to the final binding locations with time. The generation of such a strain is ongoing and the results from these experiments will be included in a fully revised version of the manuscript.

      2. It has been suggested that Scc2 and Pds5 are mutually exclusive in cohesin complexes. It would be interesting to check in the current experimental setup (ChIP-qPCR) whether Pds5 mimics the Scc2 pattern.

      We have generated a strain where Pds5 is FLAG-tagged, and include experiments determining the loading/binding of Pds5 at the break region in the revised version of the manuscript. These show (Fig S1B) that the binding of Pds5 mimics that of Cohesin, indicating that it binds as part of the Cohesin complex. In addition, it is seemingly not affected by the presence of a DSB and therefore most likely not important for the Scc2 or Cohesin loading at the DSB.

      **Minor comments:**

      1. Adding a threshold line to the graphs at fold change= 1 (no enrichment in respect to wild type) will increase their readability.

      We appreciate this suggestion. The threshold line has now been added and is indeed helpful.

      2. Fig. 1A: Add times to the schematic. Modify the text to GAL addition/break induction.

      Thank you for the good suggestion; the figure has now been modified.

      3. Page 9. The authors write: "Cohesin failed to be loaded at the DSB in a mec1Δ background (Fig 3A)". However, the figure shows reduced cohesin binding in mec1Δ with respect to the wild type.

      In this graph, Cohesin binding in response to break induction is shown. The level of binding in the mec1 deletion mutant is comparable to that of Cohesin in the absence of break induction; see Fig S3 for a newly added experiment showing wt binding of Cohesin at the same timepoint. The text describing Fig 3A on page 9 has also been slightly modified.

      4. Page 10. ".......recruitment to the DSB compared to wild type (Fig 3D)." Should be Fig. 4D.

      Thank you for noticing this mistake; it has now been corrected.

      5. Figure legend 3. "........Protein samples were taken after 3 hours arrest (G2/M, lane 1),....." The benomyl arrest is referred to as G2 arrest in the text but G2/M arrest in the legend. Consistency is needed.

      We agree on the need for consistency and have thus changed to G2/M throughout the manuscript.

      I suggest presenting the suggested model in a figure

      We plan to add an illustrative model figure as Fig 6 in a fully revised version of the manuscript.

      Reviewer #1 (Significance (Required)):

      I am an expert in cohesin biology. The Scc2-Scc4 complex has been identified as an essential factor for cohesin loading during the cell cycle (Ciosk et al., 2000). This function has been shown to be essential for cohesin's role in response to DNA DSB (Unal et al., 2004, Strom et al., 2004). The interplay between Scc2 and cohesin has been studied mostly in the context of the cell cycle. It has been shown that Scc2 activates the ATPase activity of cohesin and promotes its translocation from the loading site. Scc2 and Pds5 are mutually exclusive and their switch suppresses cohesin ATPase activity (Hu et al., 2011, Petela et al., 2011). However, the Scc2-cohesin interplay has been poorly studied in the context of DNA repair. The current work adds valuable information on the factors that recruit Scc2 to the break site and identifies end resection as the key event in this process. This information is novel and important and its contribution to the fields of cohesin and DNA repair should not be overlooked. However, ChIP-seq information could increase the overall impact.

      We appreciate the positive verdict. We agree to some extent with the ChIP-seq comment; however, based on the discussion under major point 1, we do not see that adding ChIP-seq experiments to this study will be possible.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Cohesin is a key structural component of chromosomes. Amongst its functions, cohesin plays a critical role in ensuring the accurate repair of double stranded DNA breaks (DSBs). Intuitive as this may seem, a number of fundamental open questions remain. One of these questions is, how does the cohesin loading machinery recognise a DSB? This issue is addressed in the present study. The manuscript begins with a well-written introduction into the fields of DSB repair, as well as cohesin. The research aim is clearly laid out. Experiments follow that sequentially investigate known steps of the DSB repair pathway, asking how these steps intersect with the cohesin loading machinery. On the positive side, this is a technically very well conducted study (investigating the cohesin loader has proven tricky in many contexts). The study is systematic and explores the known steps during DSB repair for their impact on cohesin loader recruitment. The authors find a surprising separation of function. The DSB pathway up until H2AX phosphorylation and DNA end resection is required for both cohesin loader recruitment, as well as consequently for cohesin loading. The Mec1 checkpoint kinase, in contrast, is dispensable for cohesin loader recruitment but is required for cohesin loading. This suggests that Mec1 supports cohesin loading at a step beyond that of attracting the cohesin loader. The manuscript thus contains important information that will be of interest to a wide range of researchers in the DNA repair and cohesin fields. The limitation of the study lies in the fact that the molecular determinant for cohesin loader recruitment to DSBs remains unknown. H2AX phosphorylation and DNA end resection are shown to be prerequisites, but how do these events form a molecular mark that the cohesin loader recognises? And what is this mark? 
      Equally, how does the Mec1 kinase permit cohesin loading in addition to the cohesin loader?

      We appreciate the positive comments as well as the criticism. We are unfortunately fully aware of the lack of precise knowledge regarding the actual mark made by phosphorylation of H2A, and resection, for recruitment of Scc2. The same is true for the limited understanding of what the exact contribution of Mec1 for Cohesin loading is. We would have liked to execute a screening based approach to find the single determinant – however this has to be performed outside the scope of this study.

      **Specific comments:** Figure 1. It would be interesting to overlay the Scc2 profile around the DSB with that of Scc1 (obtained previously under similar conditions?), to contrast the loading site with the final cohesin distribution.

      In the revised version of the manuscript, we have looked at the binding of Cohesin close to the break and outwards in the same way as for Scc2, using the same experimental system. The two binding profiles, shown as Fig 1B and 1C, do not overlap, and their different distributions are very clear. This also confirms what has been reported previously for Cohesin binding, where the region closest to the break is largely devoid of Cohesin (Fig 1C). This binding pattern does not change with longer break induction (Fig S3), indicating that there is likely no major translocation of Cohesin from a loading site to the final binding sites around the DSB, at least not during the time frame analyzed, but rather an overall increase in Cohesin binding in the break region. While we cannot exclude translocation completely, we hope that experiments using a Cohesin transition state mutant, deficient in translocation, will address this better.

      Figure 2. Using the same y-axis scale from 1-4 amongst panels A-D could make evaluation of the data easier.

      We agree the comparison is made easier when the scale is the same - this has now been changed within figures.

      Figure 3. Panels A and B contain data that are important to interpret the DNA end resection results shown in Figure S2. Maybe that latter data, which conveys the main conclusion from the figure, could be incorporated within the main figure?

      This is a good point and we have changed the figure accordingly: the resection experiments in the absence of Scc2 from Fig S2 are now shown as Fig 3C.

      Figure 5. In this figure, the authors begin to investigate possible contributions of candidate cohesin loader receptors, in the form of chromatin remodelling complexes. The Swr1 and INO80 remodellers have an effect on DNA end resection that parallels the effect on Scc2 recruitment, suggesting that their main contribution might be that of facilitating DNA end resection.

      This relationship remains less well documented in the case of Sth1 depletion. Both when using the sth1-3 allele, or degron depletion, the authors observe a relative reduction of cohesin loader recruitment, compared to what they would otherwise expect. However, in both cases a side-by-side analysis of a similarly-treated wild type strain is missing. Whether or not RSC inactivation impacts cohesin loader recruitment therefore remains uncertain.

      In the revised version of the paper we have included experiments where wild-type cells were grown in the same culturing system as the Sth1 degron strain, included as Figure 5A. The best control would be to use the Sth1 degron strain and not degrade Sth1 as the wt control. However the poor growth of these cells in -Met media with raffinose as the sole carbon source is not compatible with the design of this experiment.

      For the experiment with the ts allele of Sth1, it was not possible to keep the wt control arrested in G2 during the course of the experiment. We agree that a comparison with a wt control would be interesting; however, because we lack a proper readout for the impairment of sth1, we decided to omit the data from the ts strain from the manuscript. Based on our results, we would conclude that Sth1 inactivation affects Scc2 recruitment through impaired end resection, but we deem it unlikely that this is mediated by direct interaction, as has been shown in S-phase.

      It is also not documented what the corresponding effect of RSC inactivation on DNA end resection might be. Given that previous results suggested that RSC might contribute to cohesin loading at DSBs, the nature of how RSC does this could maybe be clarified before publication.

      In the revised version of the manuscript we are including RPA ChIP data for the Sth1-degron strain. These show that resection is slightly, albeit significantly, reduced after degradation of Sth1. We believe this to be the explanation for the reduced Scc2 loading in its absence, in line with what is seen in the swr1 and nhp10 deletion mutants.

      Reviewer #2 (Significance (Required)): see above.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): This paper presents data analysing the recruitment of Scc2 to double strand breaks. It makes the interesting observation that its recruitment is Tel1 but not Mec1 dependent, and does not require remodelers (it seems). It does correlate with resection but the mechanism of loading is unclear. I have a few issues on controls and alignment of text with results in this manuscript. Also there is some omission of important recent work and some old studies. But if these points can be resolved it could be published. **Major points:**

      1. The cut efficiency under all conditions tested needs to be presented and the CHIP needs to be normalized in every assay to the cut efficiency. This is particularly relevant in the mutants of remodelers as they definitely influence the efficiency of Gal-HO induction. This must be included for every chip result.

      We agree that the cut efficiency could influence the degree of recruitment, because the strength of the signal from the break affects recruitment of the initial DSB response factors that we show are important for recruitment of Scc2. In the previous version of the manuscript we therefore already showed, in Fig S3C, that the cut efficiency in the chromatin remodeler mutants was comparable to that in WT cells after 3 hours. We have now repeated this type of experiment three times for most strains used in the study and calculated an average cut efficiency for each strain, which is then used for normalization of the ChIP-qPCR results. Alternatively, we have used an RT-PCR based method for quantification of the cut efficiency on the actual ChIP samples when available. The average cut efficiency is indicated for each strain in the figure legends in the new version of the manuscript. Normalization of the ChIP data to the cut efficiency does not, in general, change the results or conclusions presented previously, throughout the manuscript.
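
      As an illustration of the normalization described above, the following minimal sketch divides each ChIP-qPCR fold enrichment by a strain's average cut efficiency. The reply does not state the exact formula, so the division rule, the function names, and all numbers below are assumptions for illustration only, not the procedure used in the manuscript.

```python
# Illustrative sketch only: the division-by-cut-fraction rule and every name
# and number below are assumptions, not the authors' documented procedure.
from statistics import mean


def average_cut_efficiency(replicates):
    """Mean fraction of cut loci (between 0 and 1) across replicate measurements."""
    if not replicates:
        raise ValueError("need at least one replicate")
    for value in replicates:
        if not 0.0 < value <= 1.0:
            raise ValueError("cut efficiency must be a fraction in (0, 1]")
    return mean(replicates)


def normalize_chip(fold_enrichment, cut_efficiency):
    """Scale a ChIP-qPCR fold enrichment by the fraction of cells carrying the break."""
    return fold_enrichment / cut_efficiency


# Hypothetical values for one strain: three cut-efficiency replicates and
# raw fold enrichments at three distances from the break.
cut = average_cut_efficiency([0.90, 0.85, 0.95])
raw = {"+1kb": 2.7, "+10kb": 1.8, "+30kb": 0.9}
normalized = {site: normalize_chip(value, cut) for site, value in raw.items()}
```

      Under this assumed scheme, strains with similar cut efficiencies keep their relative ranking after normalization, which would be consistent with the statement that normalization does not change the conclusions.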

      The arp8 delta mutant is clearly polyploid and probably has some suppressor mutation or another problem. They should discard the arp8 results and get a proper and controlled arp8 delta strain (from another lab in europe - there are several with good W303 strains).

      We have repeated the Arp8 transformation in different W303 strains, which likewise resulted in polyploidy. Loss of INO80 components has been shown to confer polyploidy in an S288C background, with the loss of Arp8 being an exception. Considering the apparent differences regarding INO80 (the INO80 ATPase subunit is essential in W303 but not in S288C), we deemed it plausible that polyploidization could be a resulting phenotype of an Arp8 deletion in W303. Prompted by the comments put forward here, we have now transformed a clean W303 background wild type strain and indeed see no sign of polyploidy. It could be that polyploidization is a consequence of the presence of the GAL:HO in combination with an extra recognition sequence for HO. We are now preparing crosses to answer this question. Depending on the outcome, these experiments might be added to a final revision of the manuscript. In this version of the manuscript the arp8delta experiments have been removed.

      1. The text does not accurately reflect the results in several places. For instance, on page 10, where the result of the sgs1 exo1 mutant strain is described, it is said that "Recruitment of Scc2 to the DSB was drastically reduced...." and "consistent with long range resection the effect was less prominent closer to the break." First, the word "drastic" is not appropriate for a drop of about 50% (on average), and in reality the drop is more significant near the cut (+1kb) than far from the break (+10 or 30 kb). The data are the opposite of what is stated, and the drop is not drastic. I do not contest that it correlates with resection, if the HO-cut efficiency is equal in all strains.

      We are sorry for the discrepancy, in a few cases, between the results shown and their description. We have reworded the results section to reflect the data more accurately. We have also removed the sgs1 exo1 deletion mutant data close to the break, as we have not investigated all mutants in the region closest to the break and thereby lack a comprehensive comparison.

      The results with INO80 and SWR1 are not really compelling - what is the cut efficiency in these strains? Moreover, the "confusion" in the literature is only because people look at different loci and different conditions. INO80 does affect resection (see Van Attikum et al., 2007; and Cheblal A et al., Molecular Cell 2020) for resection assays in wt and mutant strains. And it is very strange that Van Attikum et al., Cell 2004 (the back to back paper with Morrison et al., Cell 2004) is not cited. The data on resection is clear in this early work. But it appears that the arp8 mutant used has other mutations and polyploidization, and should clearly be discarded. Nhp10 impact is a bit controversial but not arp8 with a good strain. The references in general are missing Cheblal A et al., Molecular Cell 2020 for Cohesin recruitment, impact on resection and arp8 impact and ditto. Also missing is Deshpande I et al., Molecular Cell 2017 for RPA-Ddc2-Mec1 interactions. These omissions are strange and in fact create confusion in the ms.

      We would like to thank the reviewer for bringing to our attention some very relevant articles published in the field, which have now been referenced, we hope correctly. In the revised version of the manuscript we have also adjusted the ChIP-qPCR results to the average efficiency of break induction.

      **Minor points:** The English usage needs to be corrected in a few places... and figures are not always cited correctly - see page 10 especially - there is no Figure 3D.

      It is unfortunately not so easy to correct the language without specific examples. We have however gone through the text carefully, and also asked a native English speaker to assess the language, and corrected accordingly. We are sorry for the Figure mistake, this has now been corrected together with a general update of figure numbers based on some modifications of the manuscript structure.

      Reviewer #3 (Significance (Required)): The advance is not groundbreaking but still interesting and worthy of publishing, if proper controls and better referencing can be done.

      We hope that, having related all ChIP-qPCR data to the averaged cut efficiency of each strain and having edited the discussion to relate it more appropriately to both new and older references, we have addressed the issues raised and can motivate publication of the study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This paper presents data analysing the recruitment of Scc2 to double strand breaks. It makes the interesting observation that its recruitment is Tel1 but not Mec1 dependent, and does not require remodelers (it seems). It does correlate with resection but the mechanism of loading is unclear. I have a few issues on controls and alignment of text with results in this manuscript. Also there is some omission of important recent work and some old studies. But if these points can be resolved it could be published.

      Major points:

      1. The cut efficiency under all conditions tested needs to be presented and the CHIP needs to be normalized in every assay to the cut efficiency. This is particularly relevant in the mutants of remodelers as they definitely influence the efficiency of Gal-HO induction. This must be included for every chip result.
      2. The arp8 delta mutant is clearly polyploid and probably has some suppressor mutation or another problem. They should discard the arp8 results and get a proper and controlled arp8 delta strain (from another lab in europe - there are several with good W303 strains).
      3. The text does not accurately reflect the results in several places. For instance, on page 10, where the result of the sgs1 exo1 mutant strain is described, it is said that "Recruitment of Scc2 to the DSB was drastically reduced...." and "consistent with long range resection the effect was less prominent closer to the break." First, the word "drastic" is not appropriate for a drop of about 50% (on average), and in reality the drop is more significant near the cut (+1kb) than far from the break (+10 or 30 kb). The data are the opposite of what is stated, and the drop is not drastic. I do not contest that it correlates with resection, if the HO-cut efficiency is equal in all strains.
      4. The results with INO80 and SWR1 are not really compelling - what is the cut efficiency in these strains? Moreover, the "confusion" in the literature is only because people look at different loci and different conditions. INO80 does affect resection (see Van Attikum et al., 2007; and Cheblal A et al., Molecular Cell 2020) for resection assays in wt and mutant strains. And it is very strange that Van Attikum et al., Cell 2004 (the back to back paper with Morrison et al., Cell 2004) is not cited. The data on resection is clear in this early work. But it appears that the arp8 mutant used has other mutations and polyploidization, and should clearly be discarded. Nhp10 impact is a bit controversial but not arp8 with a good strain. The references in general are missing Cheblal A et al., Molecular Cell 2020 for Cohesin recruitment, impact on resection and arp8 impact and ditto. Also missing is Deshpande I et al., Molecular Cell 2017 for RPA-Ddc2-Mec1 interactions. These omissions are strange and in fact create confusion in the ms.

      Minor points:

      The English usage needs to be corrected in a few places... and figures are not always cited correctly - see page 10 especially - there is no Figure 3D.

      Significance

      The advance is not groundbreaking but still interesting and worthy of publishing, if proper controls and better referencing can be done.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Cohesin is a key structural component of chromosomes. Amongst its functions, cohesin plays a critical role in ensuring the accurate repair of double stranded DNA breaks (DSBs). Intuitive as this may seem, a number of fundamental open questions remain. One of these questions is, how does the cohesin loading machinery recognise a DSB? This issue is addressed in the present study. The manuscript begins with a well-written introduction into the fields of DSB repair, as well as cohesin. The research aim is clearly laid out. Experiments follow that sequentially investigate known steps of the DSB repair pathway, asking how these steps intersect with the cohesin loading machinery.

      On the positive side, this is a technically very well conducted study (investigating the cohesin loader has proven tricky in many contexts). The study is systematic and explores the known steps during DSB repair for their impact on cohesin loader recruitment. The authors find a surprising separation of function. The DSB pathway up until H2AX phosphorylation and DNA end resection is required for both cohesin loader recruitment, as well as consequently for cohesin loading. The Mec1 checkpoint kinase, in contrast, is dispensable for cohesin loader recruitment but is required for cohesin loading. This suggests that Mec1 supports cohesin loading at a step beyond that of attracting the cohesin loader. The manuscript thus contains important information that will be of interest to a wide range of researchers in the DNA repair and cohesin fields.

      The limitation of the study lies in the fact that the molecular determinant for cohesin loader recruitment to DSBs remains unknown. H2AX phosphorylation and DNA end resection are shown to be prerequisites, but how do these events form a molecular mark that the cohesin loader recognises? And what is this mark? Equally, how does the Mec1 kinase permit cohesin loading additionally to the cohesin loader?

      Specific comments:

      Figure 1. It would be interesting to overlay the Scc2 profile around the DSB with that of Scc1 (obtained previously under similar conditions?), to contrast the loading site with the final cohesin distribution.

      Figure 2. Using the same y-axis scale from 1-4 amongst panels A-D could make evaluation of the data easier.

      Figure 3. Panels A and B contain data that are important to interpret the DNA end resection results shown in Figure S2. Maybe that latter data, which conveys the main conclusion from the figure, could be incorporated within the main figure?

      Figure 5. In this figure, the authors begin to investigate possible contributions of candidate cohesin loader receptors, in the form of chromatin remodelling complexes. The Swr1 and INO80 remodellers have an effect on DNA end resection that parallels the effect on Scc2 recruitment, suggesting that their main contribution might be that of facilitating DNA end resection.

      This relationship remains less well documented in the case of Sth1 depletion. Both when using the sth1-3 allele, or degron depletion, the authors observe a relative reduction of cohesin loader recruitment, compared to what they would otherwise expect. However, in both cases a side-by-side analysis of a similarly-treated wild type strain is missing. Whether or not RSC inactivation impacts cohesin loader recruitment therefore remains uncertain. It is also not documented what the corresponding effect of RSC inactivation on DNA end resection might be. Given that previous results suggested that RSC might contribute to cohesin loading at DSBs, the nature of how RSC does this could maybe be clarified before publication.

      Significance

      see above.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In the manuscript entitled "Recruitment of Scc2/4 to double strand breaks depends on yH2A and DNA end resection", Scherzer et al. study the role of Scc2 in DSB repair in yeast. Scc2 is part of the cohesin loader and it is required for cohesin loading in response to DSB. The authors study the chromatin association of Scc2 by ChIP-qPCR and use genetics to identify factors that affect its recruitment. They show that Scc2 is enriched up to 10 kb from the break site, similar to cohesin and identify MRE, TEL1 and yH2A as important factors for Scc2 chromatin binding. Remarkably, MEC1 that has been shown to regulate cohesin under these conditions is dispensable for Scc2 recruitment. While DNA resection is important for Scc2 recruitment, chromatin remodelers don't play a significant role in it despite numerous reports on their effect on cohesin loading during the cell cycle. The manuscript provides new and important information on cohesin regulation in response to DNA damage.

      Major comments:

      The experiments are done appropriately and contain the required control. The results are presented clearly and with adequate statistics and support the conclusions. The experiments provide valuable information. However, the low resolution of the experimental setup is limiting, and dynamic information of Scc2 binding is lacking. I would agree with the authors that this kind of information may be beyond their scope. However, the absence of this information reduces the overall impact of the manuscript.

      1. ChIP-seq, of at least some of the key experiments, could provide information on the specific Scc2 binding sites and elucidate whether cohesin is translocated from the loading sites or accumulate in its proximity.
      2. It has been suggested that Scc2 and Pds5 are mutually exclusive in cohesin complexes. It would be interesting to check in the current experimental setup (ChIP-qPCR) whether Pds5 mimics the Scc2 pattern.

      Minor comments:

      1. Adding a threshold line to the graphs at fold change= 1 (no enrichment in respect to wild type) will increase their readability.
      2. Fig. 1A- Add times to the schematic. Modify the text to GAL addition/break induction.
      3. Page 9. The authors write: "Cohesin failed to be loaded at the DSB in a mec1Δ background (Fig 3A)". However, the figure shows reduced cohesin binding in mec1Δ relative to the wild type.
      4. Page 10. ".......recruitment to the DSB compared to wild type (Fig 3D)." Should be Fig. 4D.
      5. Figure legend 3. "........Protein samples were taken after 3 hours arrest (G2/M, lane 1),....." The benomyl arrest is referred to as G2 arrest in the text but G2/M arrest in the legend. Consistency is needed.
      6. I suggest presenting the suggested model in a figure

      Significance

      I am an expert in cohesin biology.

      The Scc2-Scc4 complex has been identified as an essential factor for cohesin loading during the cell cycle (Ciosk et al., 2000). This function has been shown to be essential for the role of cohesin in the response to DNA DSBs (Unal et al., 2004, Strom et al., 2004). The interplay between Scc2 and cohesin has been studied mostly in the context of the cell cycle. It has been shown that Scc2 activates the ATPase activity of cohesin and promotes its translocation from the loading site. Scc2 and Pds5 are mutually exclusive and their switch suppresses cohesin ATPase activity (Hu et al., 2011, Petela et al., 2011). However, the Scc2-cohesin interplay has been poorly studied in the context of DNA repair. The current work adds valuable information on the factors that recruit Scc2 to the break site and identifies end resection as the key event in this process. This information is novel and important and its contribution to the fields of cohesin and DNA repair should not be overlooked. However, ChIP-seq information could increase the overall impact.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1:

      Summary

      Copy number variations at the 1q21.1 locus, both deletions and duplications, have been associated with neurodevelopmental disease. In particular, deletions of this locus result in a variety of neuronal phenotypes, including microcephaly and schizophrenia, in varying levels of severity. Duplications of the 1q21.1 locus are often associated with autism and/or macrocephaly.

      In this study Nomura et al. generated 1q21.1 deletion and duplication hESC lines to study the impact of these CNVs on neuronal development. They generated brain organoids and observed a bidirectional effect of this CNV on organoid size, with the 1q21.1 deletion showing smaller brain organoids, whereas the 1q21.1 dup lines grew larger than controls. This is in line with the micro- and macrocephaly observed in patients. They further analyzed these organoids at the gene expression level using single cell RNAseq and performed some electrophysiological assessment on neurons from dissociated organoids.

      This study is certainly of interest given the association of this locus with NDDs such as autism, epilepsy and schizophrenia. At this stage, the study is mainly descriptive, showing differences between the 1q21.1 del/dup lines and controls but also between the del and dup lines. There is no mechanistic insight provided. For example, the 1q21.1 CNV encompasses several genes, some of which have already been linked to micro/macrocephaly (e.g. NOTCH2NL). More importantly, most of the conclusions drawn by the authors are based on a limited set of experiments/analyses which are not always carefully performed and/or presented. In general, the data presented are premature and therefore do not support the claims/conclusions made by the authors (e.g. the title). This makes the overall impact of this study limited.

      As the reviewer pointed out, NOTCH2NL (both A and B) have been regarded as micro/macrocephaly-related genes (Fiddes et al., Cell, 2018; Suzuki et al., Cell, 2018). In this study, however, we focused on the distal region of 1q21.1 between BP3 and BP4, which contains neither NOTCH2NLA nor NOTCH2NLB, because the target site is thought to be the core region of clinical 1q21.1 microdeletion/microduplication syndrome (Mefford et al., NEJM., 2008; Brunetti-Pierri et al., Nat. Genet., 2008; Van Dijck et al., EJMG, 2015). Although both NOTCH2NLA and B are located outside of our target, these genes are important for human neocortical development and neurogenesis, so we cite these papers (Fiddes et al. and Suzuki et al.) and discuss them in the discussion of the revised manuscript.

      Main comments

      In general, the interpretation of the data is too premature:

      1. The title is not supported in any means by data

      As requested by the reviewer, we have changed the title to “Modeling reciprocal CNVs of chromosomal 1q21.1 in cortical organoids reveals alterations in neurodevelopment”.

      1. Brain organoid size and development: In figure 2 the authors analyzed the development of the organoids. Based on the human phenotype, the deletion would lead to smaller and the duplication to larger brain organoids. The data presented to support these claims are rather scarce. They do provide data on organoid size; however, there is no information on how this micro/macrocephaly comes about. Only a limited number of cell types are investigated with immunocytochemistry, which gives little insight into the mechanism. Fig 3. The authors performed some very basic immunostaining and concluded that the neuronal maturity of 1q del seemed to be accelerated, whereas that of 1q dup was decelerated from the NPC stage. However, there is no direct evidence provided for this. With simple additional immunostainings the authors could already get a much better idea of what is going on. For example, the authors could measure the amount of differentiating versus proliferating cells, cell cycle exit, etc. (e.g. BrdU, KI67, pHH3 staining,...)

      We thank the reviewer for the suggestion. In response to this, we plan to analyze additional markers such as phospho-histone H3 (pHH3) to evaluate the late-G2/M status by immunostaining. In addition, to explain the smaller organoid size observed in 1q del organoids, we will check apoptosis markers such as cleaved caspase-3 by immunostaining and western blotting.

      Further there are some technical aspect that would need to be resolved:

      There is a general lack of brain organoid characterization of the controls. It is unclear on how many independent clones these experiments were performed.

      We constructed one clone per genotype (1q21.1 deletion (1q del), 1q21.1 duplication (1q dup) and CTRL) from one human ES cell strain (khES-1) by next-generation chromosome engineering using the CRISPR/Cas9 system. According to the reviewer’s comment, we have added the information on each clone, including the actual number of each clone, in the results section. Following the reviewer’s comment, we also recognize the importance of comparing several targeted clones of the same genotype to verify the cellular phenotypes seen in a targeted clone. However, we consider that isogenic ES cell lines are at least less affected by genetic variation in other regions and by epigenetic changes than patient-derived iPS cells.

      • Fig 2C: it is unclear why brain organoid sizes reduce over time. Is this an indication of increased apoptosis? Did the authors measure this?

      In order to respond to the reviewer’s comment, we plan to examine apoptotic markers such as cleaved caspase-3 by immunostaining or western blotting, as mentioned above.

      • The reason for using a t-test with Bonferroni correction, as opposed to a one-way (or even two-way) ANOVA, is unclear in Fig 2C.

      Analysis of variance (ANOVA) has been regarded as optional when multiple comparisons without F-statistics are performed (Jason Hsu. 1996. Multiple Comparisons: Theory and Methods (Guilford School Practitioner)). We selected the Bonferroni test because we thought we could evaluate our data more strictly with the Bonferroni test than with the Tukey-Kramer test. In response to the reviewer’s request, we analyzed our data using one-way ANOVA with the Tukey-Kramer test. We confirmed that the statistical significance was consistent between the two approaches (we can provide both data sets if requested). We have changed the description in the figure legend and methods section of the revised manuscript.
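
      To make the two approaches discussed above concrete, here is a minimal sketch of a Bonferroni adjustment of pairwise p-values and of the F statistic of a one-way ANOVA. It uses only the Python standard library; the function names and the example numbers are hypothetical, and the Tukey-Kramer post hoc test itself is not implemented here because it requires the studentized range distribution from a statistics package.

```python
# Illustrative sketch only: function names and data are hypothetical and do
# not reproduce the analysis in the manuscript.
from statistics import mean


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical organoid diameters (arbitrary units) for three genotypes.
sizes = {
    "1q del": [410.0, 395.0, 420.0, 405.0],
    "CTRL": [500.0, 515.0, 490.0, 505.0],
    "1q dup": [610.0, 590.0, 600.0, 620.0],
}
f_stat, df_between, df_within = one_way_anova_f(list(sizes.values()))

# Hypothetical raw p-values from three pairwise t-tests, then adjusted.
adjusted = bonferroni([0.004, 0.012, 0.030])
```

      The Bonferroni adjustment controls the family-wise error rate without needing the omnibus F test, which is the sense in which the ANOVA step can be treated as optional for planned pairwise comparisons.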

      • Fig 2E: it is unclear how the authors came to the conclusion that the dosage-dependent size difference in NPC organoids was caused by the number of cells within an organoid, not by the size of each cell or by different cell types, since they only measured the amount of Sox2-positive cells and used Sox2 to measure cell diameter, whereas Sox2 is mainly expressed in the nucleus.

      We thank the reviewer for the comment. We used images of SOX2 staining because the contrast of individual cells in bright-field images was too low to be detected using the fluorescence microscope, BZ-X analyzer, and because cell sizes seemed similar between bright-field images and SOX2 staining images. However, this method was not ideal. In response to the reviewer’s comment, we have counted the number of cells in the images of each NPC organoid using the BZ-X analyzer and calculated the cell number per 1000 µm². We found that the cell density was not significantly different among the 3 genotypes. We understand that counting the total cell number of a single organoid would be ideal, but this was impossible because each NPC organoid was too small. We have changed Figure 2E, the descriptions in the methods and results sections, and the corresponding figure legend in the revised manuscript.
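
      The density normalization described above can be sketched as follows. This is a minimal illustration, assuming that a cell count and the imaged area in µm² are available for each image; the function name and the example numbers are hypothetical and are not measurements from our organoids.

```python
def cells_per_1000_um2(cell_count, area_um2):
    """Normalize a cell count to cells per 1000 square micrometres."""
    if area_um2 <= 0:
        raise ValueError("area_um2 must be positive")
    return cell_count * 1000.0 / area_um2


# Hypothetical example: 50 cells counted in an imaged area of 25,000 µm².
density = cells_per_1000_um2(50, 25_000)
print(f"{density:.2f} cells per 1000 µm²")
```

      Expressing counts per unit area allows organoids of different sizes to be compared directly, which is the purpose of the comparison among genotypes.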

      • How do the authors explain that the Dup cells express neither Tubb nor CTIP2? Do they contain only NPCs and no neurons?

      We consider this finding supports the immaturity in the cortical organoids with 1q21 duplication. However, we have checked only a few markers for intermediate progenitors and mature neurons so far. We plan to examine immature neuronal markers such as DCX and other mature neuronal markers such as NeuN by immunocytochemistry (ICC) to confirm this finding. Similarly, we will perform expression analysis by real-time qPCR to check mature and immature neuronal cell markers.

      In short, the characterization of the brain organoids at the level of general development, cell types, proliferation, differentiation is underdeveloped.

      We will examine the characterization of the brain organoids in more detail by different techniques as described above.

      1. Electrophysiological assessment of brain organoids derived neurons:

      In figure 4 the authors claim that both CNVs (Del/Dup) show hyperexcitability and altered expressions of glutamate system as common features between the Del/Dup lines. The data to support this are however scarce and far from being convincing:

      The poor quality of the data is represented by images in 4B-E:

      • First the authors choose to dissociate the organoids prior to measuring the cells on MEAs. This takes away the advantage of 3D brain organoids, will add a lot of non-physiological stress, cause cell death and lead to unequal distribution of cells over the electrodes, see fig 4B.

      We are afraid that the reviewer may have misunderstood our experiment. In this experiment, we used 2-D neurons rather than 3-D brain organoids. Based on established neural differentiation protocols (Fujimori et al., Stem Cell Reports, 2017, Toyoshima et al., Transl. Psychiatry, 2016, Matsumoto et al., Stem Cell Reports, 2016), we seeded single cells dissociated from neurospheres on MEA dishes at the same density (8 × 10⁵ cells per dish) on day 33 and continued culturing for 28 days on the MEA dish before analysis. Thus, we did not dissociate cells just before analysis, and we could avoid adding non-physiological stress because the cells were cultured on the MEA dish for 28 days.

      • MEA recordings are meant to measure network activity and are heavily (read: fully) dependent on the network being formed. Cherry picking electrodes for analysis is not justified; analysis should be performed per MEA chip, not per electrode. Inclusion/exclusion parameters should be defined before analysis.

      We have performed statistical analysis with all chips (electrodes) per genotype in response to the reviewer's request. Even though the distributions of firing rate were not consistent among electrodes, we found significant differences between CTRL and each mutant (Ctrl vs 1q del: p< 0.001, Ctrl vs 1q dup: p< 0.001, 1q del vs 1q dup: p=1.0). We have changed Figure 4E, the descriptions in the methods section, and the corresponding figure legend in the revised manuscript. We also reanalyzed burst rates so that all electrodes were included in the statistical analysis. We have changed supplementary Figure 3 and edited the descriptions in the methods and the corresponding figure legend in this revised manuscript.

      • MEA parameters such as Mean firing rate (spike/min) and burst rate are very sensitive to plating conditions, especially number of cells and clustering of cells around electrodes (see 4B). Given that the organoids already differ in size and according to the authors in cell number, but also in the amount of starting NPCs, one can expect very different cell densities/cell types per experiment/genotype. The authors should therefore show for every genotype the matching cell culture images. Also with regard to the claims made about GABAergic neurons the cell type composition at the time of the MEA recording should be characterized for every genotype.

      As mentioned above, in MEA analysis, we used 2-D neuronal culture and seeded cells on each chip at the same density. The distribution patterns of cells were similar among the 3 genotypes. We will show the images of cultured neurons from the 3 genotypes in the revised figure. As for the cell type composition, we plan to examine the expression of GABAergic markers using RNA extracted from neuronal cells at around 28 days post-dissociation (dpd). As reviewer #2 suggested, we also consider drug treatment with bicuculline in this MEA system to be meaningful. We plan to perform this experiment if the experimental conditions can be optimized.

      • Fig 4B illustrates the points made above. The fact that no activity is observed in the control cells can be due to many different reasons: unequal plating, stress after dissociating cells, poor coverage of the electrodes, poor maturation, too early measuring time point, etc. Because the authors have no control over the amount of cells covering the electrodes the data presented here carry very little information. Fig 4B best illustrates this with large cell clumps and areas without cell bodies. Measurements from these cell cultures are irrelevant and no conclusion can be drawn.

      We suggest that the authors first benchmark this technique with their own differentiation protocol, show robust and reliable recordings on control cells, and only compare to the CRISPR lines at a time point at which the control cells show a decent amount of activity 1Hz. When doing so, also reduced activity can be monitored (For examples see, Trujillo et al, Cell Stem Cell 2019 or Frega et al 2019 Nat comm).

      As mentioned above, we seeded dissociated neurospheres in equal numbers on MEA dishes and kept culturing neurons gently for 28 days before analysis. Cell distribution was similar among the 3 genotypes and we could observe cell bodies in the area outside aggregates (we will provide additional bright-field images in the revised manuscript later). Low activities in CTRL neurons at 28 dpd could be observed even in the electrodes covered with dense cells, which were consistent among 3 independent experiments as described above. Nonetheless, we agreed with the reviewer that cellular conditions which could show stable activities even in CTRL neurons were more desirable. We have already tried longer cultures three times, but we could not perform sufficient analyses because neuronal cells became unhealthier after 35 dpd. We will try to improve the experimental conditions and perform analyses if the experimental conditions could be optimized.

      • MEAs measure the output of the network (action potentials). In a network, this can be influenced by virtually every neuronal property (morphology, synaptic input, types of synapses, intrinsic excitability, etc). Therefore, the authors cannot conclude only based on fig 4E that the Del/Dup cells are intrinsically hyperactive. To make this conclusion they should measure this directly by assessing the passive and active intrinsic properties of individual neurons.

      In control condition many electrodes do not give any signal. From these experiments it is impossible to know whether this is because of lack of cells on the particular electrode or real absence of activity. Certainly one could not conclude that the del and dup cells are intrinsically hyperexcitable.

      As described above, we could observe the similarity of cell distributions among 3 genotypes. However, as the reviewer mentioned, the assessment of the individual neuronal activity would be better. Thus, we will perform patch-clamp recordings in addition to MEA analysis.

      It seems that from the introduction the authors try to link 1q21 CNVs to epilepsy and ASD, thereby justifying the observed phenotypes.

      • How do the authors reconcile the fact that a more mature GABA system is observed in the Del lines with the so-called increased activity compared to controls but not to the Dup lines?

      We assumed that cell type compositions differed between 1q del and 1q dup, although network hyperexcitability was commonly observed in both mutants. We agree that this assumption lacks sufficient evidence even though we have shown the scRNAseq results (Figure 6E). We consider that the cell type compositions need to be checked to confirm this. Although mature GABAergic neurons were increased in 1q del lines as mentioned by the reviewer, we think GABAergic signals and unknown factors such as epilepsy-associated genes (e.g., GRIN2A and SCN1A) may be involved in the abnormal neuronal firing. We will check the expression of these genes and examine the expression of GABAergic markers in neuronal cells.

      Single cell RNAseq

      • I'm not a specialist on single cell RNAseq, however it seems that the analysis is underdeveloped and conclusion drawn for these experiments premature. It would be essential to validate some of the generated hypothesis, eg GABA maturity and not merely state as a conclusion (eg title).

      We thank the reviewer for the suggestion. We have revised the title as we mentioned above, and we will revise the main text based on our results appropriately.

      • How do the authors explain that a majority of the cells are Glial cells at day 27, and no presence of neurons.

      On day 27 in our 3-D organoid protocol, cells were still in the developmental stage. That is why we consistently described them as “NPC organoid” rather than “brain organoid” in this paper. Indeed, our rationale for the scRNA-seq study was to identify gene(s) or gene regulatory network(s) at the time when the difference in circumference was significant among genotypes (Fig. 2C). Although the underlying mechanism could not be fully determined from our results, we interpret this result as follows. Radial glial cells (RGs) have the ability to self-renew through symmetric divisions and play a role in both neurogenesis and gliogenesis (Lui et al. Cell 2011, A Kriegstein et al., Annu Rev Neurosci 2009). A recent study showed that the reduction of NF1, a tumor suppressor protein in the RAS/MAPK pathway, induced excessive production of glial cells, i.e., mainly oligodendrocyte precursor cells (OPCs) accompanied by astrocyte precursor cells, from RGs; furthermore, the reduction of NF1 also enhanced the cell divisions of generated OPCs (Z Shen, BioRxiv 2020). We have checked that the expression of NF1 in the glial cluster was also downregulated in our scRNA-seq data. Thus, we reasoned that the predominance of 1q dup cells in the glial cluster reflected the excessive production of glial cells from RGs, which was related to the alteration of the RAS/MAPK pathway. We will add this interpretation in the revised manuscript next time.

      • How relevant is the changes in the extremely low amounts of GABAergic neurons in the Del cells, no excitatory neurons are present, only NSCs

      In a previous paper, CA Trujillo et al. showed the cell type composition in 3-D human cortical organoids at different time points. GABAergic cells were restricted to later stages and the ratio was still very limited at 6 months (Figure 1J in CA Trujillo et al., Cell Stem Cell 2019). From this fact, we regarded the emergence of GABAergic neurons as meaningful even if the ratio was very low. As for excitatory neurons, we will further check the expressions of excitatory neuronal markers. (According to the screening chart we used, we did not explore excitatory neuronal markers as far as cells did not express SLC17A7 significantly).

      Minor comments

      • It is unclear how many clones were assessed per genotype

      We constructed one clone per genotype. As we mentioned above, we have added the information in the results section of this preliminary revised manuscript.

      • The authors should properly annotate the genotypes 1q21.1 instead of 1q del (line 134)

      We have already annotated the abbreviations of 1q21.1 deletion and duplication in lines 87 and 93.

      • Introduction seems to be somehow off topic since 1q21.1 locus is associated with several neurodevelopmental disorders, including SCZ, but is certainly not specific to ASD and epilepsy. So the premise on line 86: to study 1q21.1 locus to understand ASD/epilepsy is somewhat misleading. I propose that the introduction would be focussed on the 1q21.1 and not generally on ASD/epilepsy.

      As the reviewer pointed out, 1q21.1 CNVs are associated with other neurodevelopmental and neuropsychiatric disorders. Since our research aims to elucidate the underlying mechanism of ASD, we mainly focused on two representative comorbidities (abnormal brain size and epilepsy), which seemed relatively reproducible in vitro. However, we agree with the reviewer that the lack of information about clinical symptoms of 1q21.1 microdeletion and microduplication syndrome besides ASD was not appropriate. Thus, we will revise the introduction to mention the neurodevelopmental phenotypes of 1q21.1 CNVs in the revised manuscript next time.

      • It is unclear whether they generated heterozygous or homozygous deletions.

      We thank the reviewer for pointing it out. We have generated clones with heterozygous deletion and duplication. We have added the information in the results section of this revised manuscript.

      • The authors should cite Fiddes, I. T. et al. Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis. Cell 173, 1356-1369.e22 (2018).

      As the reviewer suggested, we will cite two papers regarding NOTCH2NL (NOTCH2NLA: Fiddes, I. T. et al., Cell 173, 2018; NOTCH2NLB: Ikuo K Suzuki et al., Cell 173, 2018) when we discuss the alteration of neuronal maturity and brain size. We will add the information in the revised manuscript next time.

      • Many unclear statements eg line 138: Next, we analyzed each single-cell in an organoid

      We thank the reviewer for noticing it. We have made an effort to remove inappropriate sentences in this revised manuscript.

      • Discussion on E/I is very speculative, not supported by any evidence

      In response to the reviewer’s suggestion, we will remove the overly speculative descriptions from the discussion section of the revised manuscript later.

      Significance

      The general topic of this study is of high interest given the strong association of the 1q21.1 with disease. The authors developed interesting ESC lines to study in parallel del and duplication. Unfortunately the level of analysis performed on these organoids is not up to the current state of the art, the experiments are of low quality, and the analyses are limited. Therefore no clear conclusion can be drawn except for the size of the organoids, very little mechanism is provided. This therefore remains a purely descriptive study for which the presented data are of rather low quality and limited impact in its current shape.

      We thank the reviewer for the interest and criticism of our paper. As discussed above, we plan to perform additional analyses and experiments to justify our hypothesis more clearly and try to meet the reviewer’s requests.

      Reviewer #2

      This study was initiated to look at specific cellular and molecular mechanism of the duplication and deletion CNV frequently observed at the 1q21.1 gene locus in an isogenic human embryonic stem (hES) cell model. The authors note that these CNVs are associated with higher than normal penetrance of ASD and epilepsy and aim to elucidate gene expression differences with single cell RNAseq and functional changes in this model system. The authors further sought to assess proliferation and differentiation states, in addition to neuronal activity, using both 2D cultures and 3D organoid models. The 1q21.1 gene locus model system made here is unique and the results broadly recapitulate the patient phenotype particularly with observations of macrocephaly in the "1q dup" and microcephaly in the "1q del".

      Reviewers statement:

      We have joint expertise in GABAergic neuronal development, iPSC 2D and 3D culture and ASD human molecular genetics.

      Major comments:

      • Not sure why ASD (if used it should also be spelled out) is mentioned in the title if ASD is only seen in a proportion of human 1q21.1 duplication (~36% will have autism) and 1q21.1 deletion (<10% will have autism) carriers. I would prefer to use 'neurodevelopmental phenotype'. A good update review that is accurate with respect to this CNV role in autism is PMID: 29398931. The authors should also put into the context of their results what is known with other neuropsychiatric phenotypes also seen in these CNV events;

      We thank the reviewer for the suggestion and valuable information. We have corrected the title in the revised manuscript this time. We will also refer to the paper by Fernandez and Scherer (Dialogues Clin. Neurosci., 2017) to discuss the detail of roles and neuropsychiatric phenotypes of targeted CNVs.

      • In Fig 1D the ddPCR validation for the genetic alterations in 1q del shows a normal return to 2 copies of GPR89B. However, in the 1q dup the CNV level is still elevated for GPR89B. Please determine how much further the duplication goes as there are five more potentially affected genes in this region (eg PDZK1P1). Modify the text appropriately to note the potential influence of any of these other genes on the experimental outcomes.

      We thank the reviewer for pointing it out. Figure 1D showed the results of aCGH analysis to confirm the copy number alteration of the targeted region in each clone. In this analysis, we expected that the target region contained GPR89B, as confirmed by PCR shown in Fig. 1B. However, as the reviewer commented, the cleavage sites shown in Figure 1D do not seem consistent with the result of Fig. 1B. We think this reflects a limitation of the microarray-based CGH technique. Since the locus between GPR89B and LOC101927468 contains extensive repeat sequences, aCGH may not be an appropriate method. Thus, we will apply quantitative PCR (or ddPCR) to determine the copy number alteration of each clone in addition to microarray-based CGH.
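
      For reference, copy number is commonly estimated from ddPCR data as the ratio of the target concentration to that of a reference locus, multiplied by the known copy number of the reference. The sketch below illustrates this calculation under the assumption of a two-copy (diploid) reference locus; the function name and the concentrations are hypothetical and do not represent our measurements.

```python
def estimate_copy_number(target_conc, reference_conc, reference_copies=2):
    """Estimate target copy number from ddPCR concentrations (copies/µL).

    Assumes the reference locus is present at `reference_copies` per genome.
    """
    if reference_conc <= 0:
        raise ValueError("reference_conc must be positive")
    return reference_copies * target_conc / reference_conc


# Hypothetical concentrations (copies/µL) for a target such as GPR89B.
print(estimate_copy_number(480.0, 960.0))   # heterozygous deletion expected near 1
print(estimate_copy_number(960.0, 960.0))   # control expected near 2
print(estimate_copy_number(1440.0, 960.0))  # heterozygous duplication expected near 3
```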

      • The authors' claim that dosage dependent size differences in NPC organoids is caused by a change in the number of cells within the organoid rather than size - from Fig. 2D, cells in 1qdel organoid appears more compact; a quantification of cell number should be done to support this claim. IHC of D27/28 organoids with GABAergic markers would support authors' claim of alterations of GABAergic components in 1qdel cells. These suggested experiments would take 2-3 days if the organoids are available.

      In response to the reviewer’s suggestion, we have counted the number of cells in the images of each NPC organoid using the fluorescence microscope, BZ-X analyzer, and calculated the cell number per unit area (1000 µm²). We found that the cell density was not significantly different among the 3 genotypes. We have changed Figure 2E, the descriptions in the methods and results sections, and the corresponding figure legend in the revised manuscript this time. As for exploring GABAergic components in the NPC organoids, we plan to perform immunocytochemistry (ICC) and RT-qPCR analysis.

      • Fig 4 E shows MEA data from "top 10". What is the top ten? Do you mean data points? There are batch differences in 1q dup with one batch having a lower expression than the other. Increasing the n value to accommodate the high variance observed in this group will greatly increase the validity of the data generated. Also, change the figure legend to indicate the age of these cultures. Given that the controls are not spiking, this data should be extended to probe the developmental profile further to week 9 when normal cells should be spiking so that the baseline activity of this isogenic line can be determined.

      Top 10 meant the ten electrodes with the highest spike rates within one MEA dish. To respond to the reviewer’s suggestion, we have performed statistical analysis with all electrodes per genotype. Even though the distributions of firing rate were quite heterogeneous among different electrodes, we found significant differences between CTRL and each mutant per MEA dish. We have changed Figure 4E, descriptions in the methods section, and the corresponding figure legend in the revised manuscript this time.
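
      The difference between the two summaries (the ten most active electrodes versus all electrodes of a dish) can be illustrated with the following minimal sketch. The number of electrodes, the recording duration and the spike counts are hypothetical values chosen for illustration only; the function names are likewise not taken from our analysis pipeline.

```python
def firing_rates(spike_counts, duration_min):
    """Convert per-electrode spike counts to firing rates (spikes/min)."""
    return [count / duration_min for count in spike_counts]


def mean_rate_all(rates):
    """Mean firing rate over every electrode of the dish."""
    return sum(rates) / len(rates)


def mean_rate_top_n(rates, n=10):
    """Mean firing rate over the n most active electrodes only."""
    top = sorted(rates, reverse=True)[:n]
    return sum(top) / len(top)


# Hypothetical dish: 54 silent electrodes and 10 active ones, recorded for 2 min.
counts = [0] * 54 + [120] * 10
rates = firing_rates(counts, duration_min=2.0)

print(mean_rate_top_n(rates))  # 60.0 spikes/min over the top 10 electrodes
print(mean_rate_all(rates))    # 9.375 spikes/min over all 64 electrodes
```

      Because silent electrodes are included in the second summary, it is lower and more heterogeneous, but it does not depend on selecting electrodes after the recording.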

      The reviewer is correct that the spike rates in 1q dup were quite different between different batches. We noticed from our experiments that spike rates were easily affected by the health conditions of cells. Some mutant batches showed mild spike activities like circles in 1q dup, and some had very vigorous activities. We have even checked the reproducibility of significant differences between CTRL and each mutant per MEA dish with 3 independent experiments. As for the extended cultures to detect more frequent signals in CTRL neurons, we have already tried longer cultures three times. However, we could not perform sufficient analyses because neurons became unhealthier after 35 dpd. We will further try to improve the experimental setup and perform analyses if the experimental conditions could be optimized.

      • Single cell RNAseq data suggests a cluster of GABAergic cell types that are appearing in the 1q del condition, but not in the 1q dup or control groups. The authors suggest that these GABAergic cells are excitatory because the chloride gradient has not yet been altered (no change to KCC2 expression). The authors should substantiate this idea in the MEA system with bicuculline treatment to block GABAergic transmission (drug washed in and out) to show that the spike activity observed in the 2D MEA experiments is due to GABAergic excitatory transmission. Ideally, this should be done for both the 1q dup, 1q del as well as controls.

      We thank the reviewer for the suggestion. We agree with the reviewer that drug treatment with bicuculline in this MEA system would be meaningful for identifying cellular properties. We will try to set up the experimental conditions and perform this experiment if the conditions can be optimized.

      • Fig 5A. The clustering method for single cell RNAseq seems to show a large proportion of "other" class cells begging the question as to what they are. Is there another cluster analysis, which might be used eg partially supervised/unsupervised clustering methods from the Allen Institute to help determine what these might be?

      We initially made the screening chart for cell-type specification according to cellular markers from the Allen brain map (http://celltypes.brain-map.org/rnaseq/human_ctx_smart-seq) and a published paper (CA Trujillo et al., Cell Stem Cell 2019). We defined this cluster as “other” because it did not have any significant genes in the 1st screening, although we understand that specifying all clusters would be desirable. To investigate the cellular properties of this cluster, we submitted its significant genes to Metascape to check gene ontology. We found some terms related to immune cells (mainly lymphocytes and macrophages), cancer cells, inflammation, and the apoptotic process, although miscellaneous terms were also included. We have provided the screening chart as supplementary Table 4 in this revised manuscript. Next time, we will add a more detailed description of the ‘other’ cluster in the revised manuscript.

      • Fig 5 B. The manuscript requires additional markers used in the cluster analysis. Particularly, expression of the GABAergic progenitor markers DLX5 and 6 as well as EMX1 for the progenitor cells. Details of all markers and cluster algorithms should be made available in supplementary tables and R scripts, so that others can repeat this analysis.

      In response to the reviewer’s suggestion, we will check these GABAergic progenitor markers and add them to the revised figure and manuscript later. As we mentioned above, we performed the cell type specification of each cluster manually using our screening chart and did not use R scripts. We have provided the information on the screening process in supplementary Table 4 of this revised manuscript.

      • Fig 6. Expanding the heat map of 1q del and 1q dup with CTRL expression would help with context for baseline levels in this isogenic cell line. Please also include additional GABAergic markers GABRA1, GABRB2 and GABRG2, (subunits of the most common GABA-A receptor) SOM, VIP, NPY, (other GABAergic interneurons in addition to PVALB) DLX6, EMX1 and for excitatory markers GRIA2, GRIA3 and GRIA4 (all of which have developmentally regulated expression patterns) that will provide more context with the synaptic receptor literature. GRIN2D is expressed only in GABAergic cell types and so I would suggest including this NMDA receptor subunit as well.

      We thank the reviewer for the valuable suggestions. To further explore the cellular properties in 1q del and 1q dup, we will check these cell markers additionally and show the results in the revised figure and manuscript next time.

      Minor comments:

      1. Additional references (eg. Schafer et al. 2019) should be discussed in relation to the authors' suggestions of altered neuronal maturity.

      As the reviewer suggested, we will include the paper in our references and discuss the associations between neurodevelopmental disorders and altered neuronal maturity.

      1. The authors show no change in PAX6 expression between genotypes, but significant differences in TBR2 expression between genotypes (Fig. 2C) - this alteration in normal cortical development should be included in results and discussed.

      Radial glial cells (RGs) have abilities of both self-renewal and neurogenesis (Lui et al. Cell 2011, Fiddes, I. T. et al., Cell 2018). Fiddes et al. showed that if the balance leans toward neurogenesis, premature differentiation with higher TBR2 expressions was observed in week 4 human cortical organoids (Fiddes, I. T. et al., Cell 2018). However, the predisposition to neurogenesis is thought to cause the earlier shortage of RGs. Finally, these cells remain abundant in week 4 organoids. We considered this was why TBR2 expression was significantly different in 1q del, but PAX6 was not. We will add this interpretation in the revised manuscript next time.

      1. In the introduction (Line 67): The authors state that "alterations in brain size is common in patients with ASD" using one meta-study to support this claim. Further primary studies should be consulted and the authors should give the proportion of the population with ASD and altered brain size to support this statement. In addition, the age range should be supported with primary papers.

      As the reviewer suggested, we have cited some primary studies about the prevalence of altered brain size in ASD patients and its age range in this revised manuscript. Since it seems still controversial whether the enlargement of brain size persists or not until adolescence and adulthood (E H Aylward et al., Neurology 2002; J Piven et al., Am J Psychiatry 1995), we have also modified the description in this manuscript.

      1. Line 73. The authors suggest that the brain growth deviations are "Postnatal stage restrictive". Citations are needed to support this statement.

      As the reviewer suggested, we have cited some primary studies as described above and revised the manuscript.

      1. In the scRNAseq data results please report total cell numbers counted for each cluster and for genotype group.

      We apologize for the lack of information and thank the reviewer for noticing it. We have added the information in the results section of the revised manuscript this time.

      1. In the results section (line 269-270) the authors suggest that 1q del cells are in a more mature state because the GABAergic cells are present and glutamatergic genes are similarly altered in 1q dup and 1q del. However, the results from the gene cluster data suggests that there is a very high proportion of progenitor cells (Progenitor 1 and 2 clusters), which seems to argue against faster maturation. This suggests to me that cell fate is being modified here.

      We thank the reviewer for the valuable suggestion. Schafer et al. (the suggested paper in minor comment 1) reported that altered gene expressions in neuronal modules have already been observed in NSCs derived from ASD patient-derived iPSCs. As the reviewer suggested, we plan to consider our results in terms of the alteration of cell fate and neuronal maturity in the revised manuscript later.

      1. Label figures on each page for ms.

      As the reviewer suggested, we have labeled figures at the bottom right of each page.

      1. Fix typos and heat map legends (currently no colors for log2 fold change in Fig 5 or 6)

      We apologize to the reviewer for typos and grammatical errors. We made an effort to remove them. We also apologize for the lack of color information in the legends of Figure 5 and Figure 6 and thank the reviewer for noticing it. We have added the color information in the figure legends of the revised manuscript this time.

      Significance

      Overall the study is clearly described, and the outcomes have been substantiated to a certain degree, but requires a bit more work. This paper does represent a technical 'tour de force' and the authors should be applauded for sticking it out where other labs have so far failed. It might be useful to mention even in brief, of the number of 'failed' (failed or inaccurate) events. The availability of the lines should also be clearly stated.

      We thank the reviewer for the positive comments. In addition to the plans described above, we have added more detailed information, e.g., how many screenings were carried out to get positive clones, in the revised version of the methods and results section. We have also added the descriptions about the availability of the 1q21.1 CNV cell lines in the data availability section of this revised manuscript.

      Reviewer #3

      In this research study by Nomura et al., the authors develop novel hESC-based models of reciprocal CNVs in distal 1q21.1 using CRISPR/Cas9 genome editing technology. Specifically, the authors genome edit KhES-1 cells to produce two isogenic hESC lines that contain either a deletion or duplication of this chromosomal region. Patients with 1q21.1 deletion and 1q21.1 duplication syndromes show abnormal head size in conjunction with multiple neurodevelopmental co-morbidities such as epilepsy, developmental delay, and neuropsychiatric abnormalities. This is an important study since it provides robust research tools to understand molecular and cellular mechanisms that may underlie these syndromes. Through generation of cortical organoid models, the authors demonstrate 1q21.1 deletion and duplication organoids show deficits in growth and over-growth, respectively. Additionally, the authors provide data that 1q21.1 deletion and duplication organoids show altered signaling cascades which may underlie growth deficits and also abnormal neurodevelopment which may underlie hyperexcitable neurons as demonstrated by multi-electrode array analysis. While my enthusiasm for this study remains high, I do have a significant number of major and minor reservations specific to the experimental design and analysis that if addressed would provide for an excellent contribution to the field.

      Major concerns:

      1. Though the authors provide extensive data in this study, major revisions are necessary to interpret all of their data in the context of the phenotypes they are observing in organoids and MEA analyses. In addition, the current study lacks cohesiveness throughout the various experiments and does not provide text that clearly unifies the results of the study. For example, no interpretation of higher TBR2 levels in 1q21.1 deletion is provided. Does this mean these organoids show accelerated neuronal differentiation? Also please see my comment regarding TBR2 staining in the next section.

      There are other examples throughout the manuscript in which the data are not clearly interpreted or the results of the experiments are not unified.

      We thank the reviewer for pointing out that our manuscript lacked integration and cohesiveness across our data. With the additional data described below, we plan to improve these issues in the revised manuscript later. As for TBR2 expression, we considered that the higher TBR2 expression in week 4 human cortical organoids showed a predisposition to neurogenesis in 1q del, as demonstrated in a previous paper (Fiddes, I. T. et al., Cell 2018). We will add this description in the revised manuscript later.

      • a. Additional interpretation of why 1q21.1 duplication organoids show increased growth is lacking. The single cell RNA sequencing results show there are more glia, but no further interpretation is given as to why these organoids show an overgrowth phenotype. Inversely, the 1q21.1 deletion organoids show more progenitor cells, but it is not apparent why this should result in decreased cell growth.

      As we have mentioned above, we considered that the predominance of 1q dup cells in the glial cluster reflected excessive gliogenesis from radial glial cells and enhanced cell division related to the alteration of the RAS/MAPK pathway (Z Shen, BioRxiv 2020). We plan to analyze additional markers related to cell proliferation and cell division by immunostaining to validate the above hypotheses. To investigate why 1q del organoids were smaller, we plan to culture NPC organoids again and examine apoptotic markers such as cytochrome C and caspase 3.

      • b. The authors suggest that 1q21.1 duplication organoids are resistant to neuronal differentiation. What data support this hypothesis other than the fact that no mature neuronal cells are present in their single cell RNA sequencing data?

      We considered that the results in Figure 3B and Figure 3D, in which 1q dup organoids showed a lower intensity of neuronal markers, also supported this hypothesis. Since we have only checked a few markers by immunocytochemistry (ICC), we plan to examine additional markers, i.e., immature neuronal markers such as DCX and other mature neuronal markers such as NeuN, as well as proliferation markers such as phospho histone H3, to test this hypothesis.

      • c. The MEA analyses show hyperexcitability in both 1q21.1 deletion and duplication cultures. Since the authors suggest 1q21.1 duplication organoids are resistant to neuronal maturation, no interpretation is given why they show hyperexcitable phenotypes.

      In the MEA analyses, we used 2-D neurons rather than 3-D cortical organoids because the culture period required before electrical activity appears was thought to be much shorter in 2-D neurons, according to previous studies with human pluripotent cells (A Taga et al., Stem Cells Transl Med 2019; CA Trujillo et al., Cell Stem Cell 2019). We considered that 2-D neurons at 28 dpd (day 63) were much more mature than NPC organoids and that even 1q dup neurons had already become mature enough to show spike activity. We will also check neuronal marker expression in 2-D neurons around 28 dpd by RT-qPCR to confirm this.

      • d. The current study is lacking extensive immunohistochemical stains of representative markers that validate their findings from their single cell RNA sequencing experiments. For example, glial cell markers such as GFAP should be analyzed in 1q21.1 duplication organoids. Additionally, progenitor cell markers such as PAX6 and neuronal markers such as MAP2 and synaptic markers such as SYNAPSIN and others should be incorporated in the study.

      We thank the reviewer for the suggestions. We plan to perform additional IHC staining for NPC organoids with the suggested markers and OPC markers.

      2. Major details are lacking for the single cell RNA sequencing experiments.
      • a. How many cells were analyzed from each group? How many organoids and what age of organoids were analyzed from each group, and were they pooled together? Why was a log2FC >1.2 used as a threshold? It is unclear how the authors identify Progenitor 1 and 2 cell clusters. Are they distinct clusters or is this a continuum of differentiation? The progenitor 1 and 2 clusters were chosen based on expression of the ID transcription factors, but no text was provided why these genes specify progenitor cells.

      We apologize for the lack of information and thank the reviewer for noticing it. We described the number of analyzed cells (32,171 cells in total: 1q del, 10,682; 1q dup, 11,987; CTRL, 9,502) in the results section (line 186) of the original manuscript. However, we could not count how many organoids were analyzed because they were too small (diameter: 400-700 µm). Many organoids were needed to obtain the prescribed number of cells (25,000 cells per genotype). According to the size measurement data for NPC organoids obtained by fluorescence microscopy, at least 1,500 organoids were collected per genotype. We gathered all organoids cultured in the same batch, dissociated them, and then loaded the prescribed number of cells into the machine. We have added a description of the number of input cells to the methods section of this revised manuscript.

      We used a threshold of |log2FC| > 1.2 so that the total number of DEGs was around 100-1000 in both bulk and the NSC cluster, avoiding a very high or very low number of DEGs. Some previous transcriptome studies used the same or even smaller thresholds (Xiaoming Ma et al., Front in Genet 2020; J Zhong et al., Brain Res 2016; Y Wang et al., BMC genomics 2016). We have added these descriptions to the methods section of this revised manuscript.
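
As a rough illustration of the thresholding described above, the following minimal Python sketch splits genes into up- and downregulated sets at an absolute log2 fold change above 1.2. The function name, gene labels and values are hypothetical placeholders, not the pipeline or data used in the study, and any significance filter (for example an adjusted p-value cutoff) is deliberately left out.

```python
def split_degs(log2fc_by_gene, cutoff=1.2):
    """Return (up, down) gene lists whose |log2FC| strictly exceeds the cutoff.

    log2fc_by_gene: dict mapping a gene label to its log2 fold change
    versus control (hypothetical input format, assumed for illustration).
    """
    up = sorted(g for g, lfc in log2fc_by_gene.items() if lfc > cutoff)
    down = sorted(g for g, lfc in log2fc_by_gene.items() if lfc < -cutoff)
    return up, down


# Made-up values for illustration only; these are not results from the study.
example = {"GENE_A": 2.3, "GENE_B": -1.5, "GENE_C": 0.8, "GENE_D": -1.2}
up, down = split_degs(example)
print(len(up) + len(down), "genes pass the threshold")
```

Because the comparison is strict, a gene sitting exactly at the cutoff (GENE_D here) is not counted; whether the boundary should be inclusive is a choice that would need to be stated alongside the threshold.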

      As for progenitor-1 and 2, we regarded them as a continuum based on the marker expressions. We chose ID transcription factors for progenitor cells, referring to a published paper (CA Trujillo et al., Cell Stem Cell 2019) as we have described in the methods section (line 633). Several articles have reported that ID transcription factors regulate proliferation and differentiation of neural precursor cells (K Yun et al., Development 2004; D Patel et al., Biochim Biophys Acta 2015).

      Minor concerns:

      1. I would suggest rephrasing the title of the study as it does not clearly convey the advancement to the field. I would suggest the following or something similar this is more concise: " Modeling Reciprocal CNVs of Chromosomal 1q21.1 in Cortical Organoids Reveals Alterations in Neurodevelopment."

      We thank the reviewer for the concrete suggestion. We have revised the title as the reviewer suggested in this preliminary revised manuscript.

      2. The length of the discussion is over extended and should be revised to become more concise.

      We thank the reviewer for pointing it out. We will shorten the beginning part and delete unnecessary sentences in the discussion section of the revised manuscript later.

      3. Additional experiments should be performed to characterize pluripotency of hESC clones generated after genome editing other than staining for alkaline phosphatase activity.

      At minimum, karyotyping in addition to measuring pluripotency markers such as NANOG and OCT3/4 should be performed.

      The karyotype of the wild-type ES cells was checked by the Institute for Frontier Medical Sciences, Kyoto University, before the cells were provided. After genome editing, we performed aCGH analysis for all 3 genotypes using the wild-type ES cells as the reference and confirmed that no chromosome aberrations had been generated. We have added the information about karyotyping to the methods section of this preliminary revised manuscript.

      As for pluripotency markers, we performed RT-qPCR analyses with ES cells after genome editing and confirmed that OCT3/4 was more highly expressed than the internal control genes. (We can provide the raw data if requested.)

      4) There are several dozen instances of spelling/grammatical and word choice errors throughout the manuscript. For example, line 24 reads "We generate isogenic..." should read "We generated isogenic... "

      • a. Line 25: "opposite organoid size" as written is confusing to interpret.
      • b. Line 46: "have been considered in the context of ASD" would read more clearly as "have been thought to underlie ASD etiology."
      • c. Line 53: "in the study of neurological development" should read "nervous system development".
      • d. Line 118: ".. to detect the CRISPR target site for deletion" should read "to detect the CRISPR target site. For the deletion, we checked... "
      • e. Line 119: "...flanking the CRISPR target site; for duplication, we amplified.. " should read "flanking the CRISPR target site, and for the duplication, we amplified..... ".
      • f. Line 127: "we prepared control cells (CTRL) that transfected...." should read "we prepared control cells (CTRL) that were transfected. ".
      • g. Line 185: "organoid size and mature level" should read "organoid size and developmental maturity."
      • h. In line 40, "We made cryosections of ...." should read "We performed IHC for the three organoid genotypes on day 27... "
      • i. In Supplementary Figure 8, line 554, "replictes" is misspelled.

      We apologize to the reviewer for many typos and grammatical errors and thank the reviewer for pointing them out in detail. We have corrected these errors as the reviewer suggested.

      5) Line 181: "with a little higher degree of.. " should be re-written more precisely and with more scientific accuracy.

      As the reviewer requested, we have corrected the sentence in this revised manuscript.

      6) Line 216: The use of the colloquial phrase "On the other hand.. " should be replaced with more formal language. For example, "In contrast, the number of downregulated....".

      We thank the reviewer for pointing it out. We have corrected this colloquial phrase at 4 locations.

      7) In line 201, Pprogenitor is misspelled.

      We apologize and thank the reviewer for noticing it. We have corrected it in this preliminary revised manuscript.

      8) In Figure 3, images showing TBR2 staining does not appear correct as this protein should be localized to the nucleus similar to SOX2 staining. I would suggest optimizing conditions such as utilizing antigen retrieval or other methods to reduce non-specific cytoplasmic staining.

      We thank the reviewer for the valuable suggestion. We plan to optimize the conditions and try other neuronal lineage markers such as DCX and NeuN.

      9) I would suggest simplifying the text describing the primers utilized in this study and display them in a table format.

      As the reviewer requested, we will make a supplementary table of primer sequences in the revised manuscript later.

      10) Information regarding the number of technical replicates used in this study is lacking throughout the manuscript. For example, how many hESC clones were analyzed? How many organoids were analyzed for each specific assay such as single cell RNA sequencing and MEA analyses? How many independent experiments were used for these studies?

      We apologize for the lack of information. We constructed one clone per genotype from one human ES cell line (KhES-1) and performed all further analyses with these clones. The precise number of NPC organoids used for scRNA-seq could not be counted, as we mentioned above. As for the MEA analysis, 8 x 10^5 cells were seeded on each dish, as described in the original manuscript. However, it was unclear how many neurons were recorded at each electrode because multiple cells and neurites covered each electrode. Thus, spike activity was detected from a network of many neurons. We have added this information to the methods section of this preliminary revised manuscript.

      11) It is not clear why the authors choose two types of organoid methods in the study. The first protocol referred to as the "NPC organoid method" is synonymous to neurosphere culturing and should be referred to as neurospheres throughout the manuscript.

      One protocol (Fujimori et al., Stem Cell Rep., 2017) was not for 3-D organoids but for 2-D neurons (Figure 4A). Thus, we considered neurospheres and NPC organoids to be different.

      12) In Figure 4, panel C should be referred to as a local field potential trace and not a waveform.

      We thank the reviewer for pointing it out. We have corrected the description as the reviewer suggested.

      Reviewer #3

      This is an important study since it provides robust research tools to understand molecular and cellular mechanisms that may underlie 1q21.1 deletion and duplication syndromes.

      We thank the reviewer for the positive comments. We plan to perform additional analyses and experiments as described above and try to meet the reviewer’s requests.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this research study by Nomura et al., the authors develop novel hESC-based models of reciprocal CNVs in distal 1q21.1 using CRISPR/Cas9 genome editing technology. Specifically, the authors genome edit KhES-1 cells to produce two isogenic hESC lines that contain either a deletion or duplication of this chromosomal region. Patients with 1q21.1 deletion and 1q21.1 duplication syndromes show abnormal head size in conjunction with multiple neurodevelopmental co-morbidities such as epilepsy, developmental delay, and neuropsychiatric abnormalities. This is an important study since it provides robust research tools to understand molecular and cellular mechanisms that may underlie these syndromes. Through generation of cortical organoid models, the authors demonstrate that 1q21.1 deletion and duplication organoids show deficits in growth and over-growth, respectively. Additionally, the authors provide data that 1q21.1 deletion and duplication organoids show altered signaling cascades, which may underlie the growth deficits, and abnormal neurodevelopment, which may underlie the hyperexcitable neurons demonstrated by multi-electrode array analysis. While my enthusiasm for this study remains high, I do have a significant number of major and minor reservations specific to the experimental design and analysis that, if addressed, would provide for an excellent contribution to the field.

      Major concerns:

      1. Though the authors provide extensive data in this study, major revisions are necessary to interpret all of their data in the context of the phenotypes they are observing in organoids and MEA analyses. In addition, the current study lacks cohesiveness throughout the various experiments and does not provide text that clearly unifies the results of the study. For example, no interpretation of higher TBR2 levels in 1q21.1 deletion is provided. Does this mean these organoids show accelerated neuronal differentiation? Also please see my comment regarding TBR2 staining in the next section. There are other examples throughout the manuscript in which the data are not clearly interpreted or the results of the experiments are not unified.
        • a. Additional interpretation of why 1q21.1 duplication organoids show increased growth is lacking. The single cell RNA sequencing results show there are more glia, but no further interpretation is given as to why these organoids show an overgrowth phenotype. Inversely, the 1q21.1 deletion organoids show more progenitor cells, but it is not apparent why this should result in decreased cell growth.
        • b. The authors suggest that 1q21.1 duplication organoids are resistant to neuronal differentiation. What data support this hypothesis other than the fact that no mature neuronal cells are present in their single cell RNA sequencing data?
        • c. The MEA analyses show hyperexcitability in both 1q21.1 deletion and duplication cultures. Since the authors suggest 1q21.1 duplication organoids are resistant to neuronal maturation, no interpretation is given why they show hyperexcitable phenotypes.
        • d. The current study is lacking extensive immunohistochemical stains of representative markers that validate their findings from their single cell RNA sequencing experiments. For example, glial cell markers such as GFAP should be analyzed in 1q21.1 duplication organoids. Additionally, progenitor cell markers such as PAX6 and neuronal markers such as MAP2 and synaptic markers such as SYNAPSIN and others should be incorporated in the study.
      2. Major details are lacking for the single cell RNA sequencing experiments.
        • a. How many cells were analyzed from each group? How many organoids and what age of organoids were analyzed from each group, and were they pooled together? Why was a log2FC >1.2 used as a threshold? It is unclear how the authors identify Progenitor 1 and 2 cell clusters. Are they distinct clusters or is this a continuum of differentiation? The progenitor 1 and 2 clusters were chosen based on expression of the ID transcription factors, but no text was provided why these genes specify progenitor cells.

      Minor concerns:

      1. I would suggest rephrasing the title of the study as it does not clearly convey the advancement to the field. I would suggest the following or something similar this is more concise: " Modeling Reciprocal CNVs of Chromosomal 1q21.1 in Cortical Organoids Reveals Alterations in Neurodevelopment."
      2. The length of the discussion is over extended and should be revised to become more concise.
      3. Additional experiments should be performed to characterize pluripotency of hESC clones generated after genome editing other than staining for alkaline phosphatase activity. At minimum, karyotyping in addition to measuring pluripotency markers such as NANOG and OCT3/4 should be performed.
      4. There are several dozen instances of spelling/grammatical and word choice errors throughout the manuscript. For example, line 24 reads "We generate isogenic...." should read "We generated isogenic...."
        • a. Line 25: "opposite organoid size" as written is confusing to interpret.
        • b. Line 46: "have been considered in the context of ASD" would read more clearly as "have been thought to underlie ASD etiology."
        • c. Line 53: "in the study of neurological development" should read "nervous system development".
        • d. Line 118: "...to detect the CRISPR target site for deletion" should read "to detect the CRISPR target site. For the deletion, we checked....".
        • e. Line 119: "...flanking the CRISPR target site; for duplication, we amplified..." should read "flanking the CRISPR target site, and for the duplication, we amplified......".
        • f. Line 127: "we prepared control cells (CTRL) that transfected...." should read "we prepared control cells (CTRL) that were transfected...."
        • g. Line 185: "organoid size and mature level" should read "organoid size and developmental maturity."
        • h. In line 40, "We made cryosections of ...." should read "We performed IHC for the three organoid genotypes on day 27...."
        • i. In Supplementary Figure 8, line 554, "replictes" is misspelled.
      5. Line 181: "with a little higher degree of..." should be re-written more precisely and with more scientific accuracy.
      6. Line 216: The use of the colloquial phrase "On the other hand..." should be replaced with more formal language. For example, "In contrast, the number of downregulated....".
      7. In line 201, Pprogenitor is misspelled.
      8. In Figure 3, images showing TBR2 staining does not appear correct as this protein should be localized to the nucleus similar to SOX2 staining. I would suggest optimizing conditions such as utilizing antigen retrieval or other methods to reduce non-specific cytoplasmic staining.
      9. I would suggest simplifying the text describing the primers utilized in this study and display them in a table format.
      10. Information regarding the number of technical replicates used in this study is lacking throughout the manuscript. For example, how many hESC clones were analyzed? How many organoids were analyzed for each specific assay such as single cell RNA sequencing and MEA analyses? How many independent experiments were used for these studies?
      11. It is not clear why the authors choose two types of organoid methods in the study. The first protocol referred to as the "NPC organoid method" is synonymous to neurosphere culturing and should be referred to as neurospheres throughout the manuscript.
      12. In Figure 4, panel C should be referred to as a local field potential trace and not a waveform.

      Significance

      This is an important study since it provides robust research tools to understand molecular and cellular mechanisms that may underlie 1q21.1 deletion and duplication syndromes.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This study was initiated to look at specific cellular and molecular mechanisms of the duplication and deletion CNVs frequently observed at the 1q21.1 gene locus in an isogenic human embryonic stem (hES) cell model. The authors note that these CNVs are associated with higher than normal penetrance of ASD and epilepsy and aim to elucidate gene expression differences with single cell RNAseq and functional changes in this model system. The authors further sought to assess proliferation and differentiation states, in addition to neuronal activity, using both 2D cultures and 3D organoid models. The 1q21.1 gene locus model system made here is unique and the results broadly recapitulate the patient phenotype, particularly with observations of macrocephaly in the "1q dup" and microcephaly in the "1q del".

      Reviewers statement: We have joint expertise in GABAergic neuronal development, iPSC 2D and 3D culture and ASD human molecular genetics.

      Major comments:

      • It is not clear why ASD (if used it should also be spelled out) is mentioned in the title if ASD is only seen in a proportion of human 1q21.1 duplication (~36% will have autism) and 1q21.1 deletion (<10% will have autism) carriers. I would prefer to use 'neurodevelopmental phenotype'. A good updated review that is accurate with respect to the role of this CNV in autism is PMID: 29398931. The authors should also put their results into the context of what is known about other neuropsychiatric phenotypes also seen with these CNV events.
      • In Fig 1D the ddPCR validation for the genetic alterations in 1q del shows a normal return to 2 copies of GPR89B. However, in the 1q dup the CNV level is still elevated for GPR89B. Please determine how much further the duplication goes as there are five more potentially affected genes in this region (eg PDZK1P1). Modify the text appropriately to note the potential influence of any of these other genes on the experimental outcomes.
      • The authors claim that dosage dependent size differences in NPC organoids are caused by a change in the number of cells within the organoid rather than cell size. From Fig. 2D, cells in the 1q del organoid appear more compact; a quantification of cell number should be done to support this claim. IHC of D27/28 organoids with GABAergic markers would support the authors' claim of alterations of GABAergic components in 1q del cells. These suggested experiments would take 2-3 days if the organoids are available.
      • Fig 4 E shows MEA data from "top 10". What is the top ten? Do you mean data points? There are batch differences in 1q dup with one batch having a lower expression than the other. Increasing the n value to accommodate the high variance observed in this group will greatly increase the validity of the data generated. Also, change the figure legend to indicate the age of these cultures. Given that the controls are not spiking, this data should be extended to probe the developmental profile further to week 9 when normal cells should be spiking so that the baseline activity of this isogenic line can be determined.
      • Single cell RNAseq data suggests a cluster of GABAergic cell types that are appearing in the 1q del condition, but not in the 1q dup or control groups. The authors suggest that these GABAergic cells are excitatory because the chloride gradient has not yet been altered (no change to KCC2 expression). The authors should substantiate this idea in the MEA system with bicuculline treatment to block GABAergic transmission (drug washed in and out) to show that the spike activity observed in the 2D MEA experiments is due to GABAergic excitatory transmission. Ideally, this should be done for both the 1q dup, 1q del as well as controls.
      • Fig 5A. The clustering method for single cell RNAseq seems to show a large proportion of "other" class cells, raising the question of what they are. Is there another cluster analysis that might be used, eg partially supervised/unsupervised clustering methods from the Allen Institute, to help determine what these might be?
      • Fig 5 B. The manuscript requires additional markers used in the cluster analysis. Particularly, expression of the GABAergic progenitor markers DLX5 and 6 as well as EMX1 for the progenitor cells. Details of all markers and cluster algorithms should be made available in supplementary tables and R scripts, so that others can repeat this analysis.
      • Fig 6. Expanding the heat map of 1q del and 1q dup with CTRL expression would help with context for baseline levels in this isogenic cell line. Please also include additional GABAergic markers GABRA1, GABRB2 and GABRG2 (subunits of the most common GABA-A receptor), SOM, VIP, NPY (other GABAergic interneurons in addition to PVALB), DLX6 and EMX1, and for excitatory markers GRIA2, GRIA3 and GRIA4 (all of which have developmentally regulated expression patterns), which will provide more context with the synaptic receptor literature. GRIN2D is expressed only in GABAergic cell types and so I would suggest including this NMDA receptor subunit as well.

      Minor comments:

      1. Additional references (eg. Schaefer et al. 2019) should be discussed in relation to the authors' suggestions of altered neuronal maturity.
      2. The authors show no change in PAX6 expression between genotypes, but significant differences in TBR2 expression between genotypes (Fig. 2C) - this alteration in normal cortical development should be included in results and discussed.
      3. In the introduction (Line 67): The authors state that "alterations in brain size is common in patients with ASD" using one meta-study to support this claim. Further primary studies should be consulted and the authors should give the proportion of the population with ASD and altered brain size to support this statement. In addition, the age range should be supported with primary papers.
      4. Line 73. The authors suggest that the brain growth deviations are "Postnatal stage restrictive". Citations are needed to support this statement.
      5. In the scRNAseq data results please report total cell numbers counted for each cluster and for genotype group.
      6. In the results section (line 269-270) the authors suggest that 1q del cells are in a more mature state because the GABAergic cells are present and glutamatergic genes are similarly altered in 1q dup and 1q del. However, the results from the gene cluster data suggest that there is a very high proportion of progenitor cells (Progenitor 1 and 2 clusters), which seems to argue against faster maturation. This suggests to me that cell fate is being modified here.
      7. Label figures on each page for ms.
      8. Fix typos and heat map legends (currently no colors for log2 fold change in Fig 5 or 6)

      Significance

      Overall the study is clearly described, and the outcomes have been substantiated to a certain degree, but the study requires a bit more work. This paper does represent a technical 'tour de force' and the authors should be applauded for sticking it out where other labs have so far failed. It might be useful to mention, even in brief, the number of 'failed' (failed or inaccurate) events. The availability of the lines should also be clearly stated.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      Copy number variations in the 1q21.1 locus, deletions and duplications, have been associated with neurodevelopmental disease. In particular, deletions of this locus result in a variety of neuronal phenotypes including microcephaly and schizophrenia in varying levels of severity. Duplications of the 1q21.1 locus are often associated with autism and/or macrocephaly.

      In this study Nomura et al. generated 1q21.1 deletion and duplication hESC lines to study the impact of these CNVs on neuronal development. They generated brain organoids and observed a bidirectional effect of this CNV on organoid size, with the 1q21.1 deletion showing smaller brain organoids, whereas the 1q21.1 dup lines grew larger than controls. This is in line with the micro- and macrocephaly observed in patients. They further analyzed these organoids at the gene expression level using single cell RNAseq and performed some electrophysiological assessment on neurons from dissociated organoids.

      This study is certainly of interest given the association of this locus with NDDs such as autism, epilepsy and schizophrenia. At this stage, the study is mainly a descriptive study, showing differences between the 1q21.1 del/dup versus controls but also between both the del/dup lines. There is no mechanistic insight provided. For example, the 1q21.1 CNV encompasses several genes, of which some have already been linked to micro/macrocephaly (eg. NOTCH2NL). More importantly, most of the conclusions drawn by the authors are based on a limited set of experiments/analyses which are not always carefully performed and/or presented. In general, the data presented are premature and therefore do not support the claims/conclusions made by the authors (eg the title). This makes the overall impact of this study limited.

      Main comments

      In general, the interpretation of the data is too premature:

      1. The title is not supported in any means by data
      2. Brain organoids size and development: In figure 2 the authors analyzed the development of the organoids. Based on the human phenotype the deletion would lead to smaller brain and the duplication to larger brain organoids. The presented data to support these claims are rather scarce. They indeed provide data on organoid size, however there is no information as to regard how this micro/macrocpehaly comes about. Only limited amount of cell types are being investigated with immunocytochemistry, which give little insight into the mechanism. Fig 3. The authors performed some very basic immunostaining and concluded that the neuronal maturity of 1q del seemed to be accelerated, whereas 1q dup decelerated from the NPC stage. However, there is no direct evidence provided for this. With simple additional immunostainings authors could already get a much better idea of what is going on. For example the authors could measure the amount of differentiating versus proliferating cells, cell cycle exit, etc (eg BrDU, KI67, pHH3 staining,...) Further there are some technical aspect that would need to be resolved:
        • There is a general lack of brain organoid characterization of the controls. It is unclear on how many independent clones these experiments were performed.
        • Fig 2C: it is unclear why brain organoid sizes reduce over time. Is this an indication of increased apoptosis? Did the authors measure this?
        • It is unclear why a t-test with Bonferroni correction was used in Fig 2C rather than a one-way (or even two-way) ANOVA.
        • Fig 2E: it is unclear how the authors came to the conclusion that the dosage-dependent size difference in NPC organoids was caused by the number of cells within an organoid, and not by the size of each cell or by different cell types, since they only counted Sox2-positive cells and used Sox2 to measure cell diameter, whereas Sox2 is mainly expressed in the nucleus.
        • How do the authors explain that the Dup cells express neither Tubb nor CTIP2? Do these organoids contain only NPCs and no neurons?

      In short, the characterization of the brain organoids at the level of general development, cell types, proliferation, differentiation is underdeveloped.

      3. Electrophysiological assessment of brain organoid-derived neurons: In figure 4 the authors claim that both CNVs (Del/Dup) show hyperexcitability and altered expression of the glutamate system as common features of the Del/Dup lines. The data to support this are, however, scarce and far from convincing. The poor quality of the data is illustrated by the images in 4B-E:
        • First, the authors chose to dissociate the organoids prior to measuring the cells on MEAs. This takes away the advantage of 3D brain organoids, adds a lot of non-physiological stress, causes cell death and leads to an unequal distribution of cells over the electrodes (see Fig 4B).
        • MEA recordings are meant to measure network activity and are heavily (read: fully) dependent on the network being formed. Cherry-picking electrodes for analysis is not justified; analysis should be performed per MEA chip, not per electrode. Inclusion/exclusion parameters should be defined before analysis.
        • MEA parameters such as mean firing rate (spikes/min) and burst rate are very sensitive to plating conditions, especially the number of cells and the clustering of cells around electrodes (see 4B). Given that the organoids already differ in size and, according to the authors, in cell number, but also in the amount of starting NPCs, one can expect very different cell densities/cell types per experiment/genotype. The authors should therefore show the matching cell culture images for every genotype. Also, with regard to the claims made about GABAergic neurons, the cell type composition at the time of the MEA recording should be characterized for every genotype.
        • Fig 4B illustrates the points made above. The fact that no activity is observed in the control cells can be due to many different reasons: unequal plating, stress after dissociating cells, poor coverage of the electrodes, poor maturation, too early a measuring time point, etc. Because the authors have no control over the number of cells covering the electrodes, the data presented here carry very little information. Fig 4B best illustrates this, with large cell clumps and areas without cell bodies. Measurements from these cell cultures are irrelevant and no conclusion can be drawn. We suggest that the authors first benchmark this technique with their own differentiation protocol, show robust and reliable recordings on control cells, and only compare to the CRISPR lines at a time point at which the control cells show a decent amount of activity (> 1 Hz). When doing so, reduced activity can also be monitored (for examples see Trujillo et al., Cell Stem Cell 2019 or Frega et al., Nat Commun 2019).
        • MEAs measure the output of the network (action potentials). In a network, this can be influenced by virtually every neuronal property (morphology, synaptic input, types of synapses, intrinsic excitability, etc.). Therefore, the authors cannot conclude based only on Fig 4E that the Del/Dup cells are intrinsically hyperactive. To draw this conclusion they should measure it directly by assessing the passive and active intrinsic properties of individual neurons. In the control condition many electrodes do not give any signal. From these experiments it is impossible to know whether this is because of a lack of cells on the particular electrode or a real absence of activity. Certainly one cannot conclude that the Del and Dup cells are intrinsically hyperexcitable.
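
      The per-chip analysis requested in the points above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function names and the activity threshold are hypothetical and are not taken from the manuscript under review.

```python
from collections import defaultdict

# Hypothetical inclusion rule, fixed before looking at the data:
# an electrode counts as "active" at or above this firing rate.
MIN_ACTIVE_RATE_HZ = 0.1  # assumed value, for illustration only


def chip_mean_firing_rates(records, duration_s):
    """Return {chip_id: mean firing rate in Hz across all electrodes of that chip}.

    records: iterable of (chip_id, electrode_id, spike_count) tuples.
    Silent electrodes are kept, so they lower the chip mean instead of
    being dropped after the fact.
    """
    per_chip = defaultdict(list)
    for chip_id, _electrode_id, spike_count in records:
        per_chip[chip_id].append(spike_count / duration_s)
    return {chip: sum(rates) / len(rates) for chip, rates in per_chip.items()}


def active_fraction(records, duration_s, threshold_hz=MIN_ACTIVE_RATE_HZ):
    """Return {chip_id: fraction of electrodes firing at or above threshold_hz}."""
    per_chip = defaultdict(list)
    for chip_id, _electrode_id, spike_count in records:
        per_chip[chip_id].append(spike_count / duration_s >= threshold_hz)
    return {chip: sum(flags) / len(flags) for chip, flags in per_chip.items()}
```

      With the chip as the unit of analysis, each genotype contributes one value per chip to the statistics, and the fraction of active electrodes gives a rough indication of electrode coverage.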

      It seems from the introduction that the authors try to link 1q21 CNVs to epilepsy and ASD, thereby justifying the observed phenotypes.

      • How do the authors reconcile the fact that a more mature GABA system is observed in the Del lines with the so-called increased activity compared to controls but not to the Dup lines?

      Single cell RNAseq

      • I'm not a specialist on single cell RNAseq; however, it seems that the analysis is underdeveloped and the conclusions drawn from these experiments are premature. It would be essential to validate some of the generated hypotheses (e.g. GABA maturity) and not merely state them as conclusions (e.g. in the title).
      • How do the authors explain that the majority of the cells are glial cells at day 27, with no neurons present?
      • How relevant are the changes in the extremely low numbers of GABAergic neurons in the Del cells, given that no excitatory neurons are present, only NSCs?

      Minor comments

      • It is unclear how many clones were assessed per genotype
      • The authors should properly annotate the genotypes 1q21.1 instead of 1q del (line 134)
      • The introduction seems to be somewhat off topic, since the 1q21.1 locus is associated with several neurodevelopmental disorders, including SCZ, and is certainly not specific to ASD and epilepsy. So the premise on line 86, to study the 1q21.1 locus in order to understand ASD/epilepsy, is somewhat misleading. I propose that the introduction focus on 1q21.1 and not on ASD/epilepsy in general.
      • It is unclear whether they generated heterozygous or homozygous deletions.
      • The authors should cite Fiddes, I. T. et al. Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis. Cell 173, 1356-1369.e22 (2018).
      • Many unclear statements eg line 138: Next, we analyzed each single-cell in an organoid
      • Discussion on E/I is very speculative, not supported by any evidence

      Significance

      The general topic of this study is of high interest given the strong association of 1q21.1 with disease. The authors developed interesting ESC lines to study the deletion and duplication in parallel. Unfortunately, the analysis performed on these organoids is not up to the current state of the art, the experiments are of low quality, and the analyses are limited. Therefore no clear conclusion can be drawn except for the size of the organoids, and very little mechanism is provided. This therefore remains a purely descriptive study, for which the presented data are of rather low quality and of limited impact in its current shape.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their careful reading, positive feedback and constructive criticisms of our manuscript. Their primary points of concern were that the discussion was too long and too speculative, and that the title did not sufficiently represent our work. We have now cut the discussion in half, and we have also changed the title to more precisely reflect our paper, and made some other minor changes in the text (all highlighted in blue).

      Below, we provide responses to each of the raised issues.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): The study was very well conducted by the group, selecting appropriate methods for achieving the stated objectives. The samples were abundant and the statistical treatment was suitable for the sample sizes, as well as for comparing the different methods used in this study. The results in general were properly exploited by the authors, clarifying many aspects of the role/function of the trophallaxis fluid. The results of this manuscript apparently suggest that young colonies prioritize the metabolization of carbohydrates, while mature colonies prioritize the accumulation and transmission of stored resources, amongst other processes. This study clarified many aspects of the role/function of the trophallaxis fluid for the colony.

      We are happy the reviewer agrees with our choices of methods, sample sizes, and statistics, and we are pleased that they have come to the same conclusions.

      Even considering the high level of present investigation, still there are some aspects that could be improved by the authors:

      • The text in general is relatively long, with an overuse of literature citations;
      • The discussion is interesting, but sometimes too speculative; if the authors could attenuate their speculative statements, the text would become more objective and fluid;

      Thank you for this feedback. These comments truly helped us strengthen the manuscript. We have now streamlined the text, cutting down the introduction, cutting in half the discussion and we have made more explicit what is statement and what is speculation (more on this in response to reviewer 2).

      • The results shown in figures 6A and 6D relate to the processes of neutrophil degranulation and the complement cascade, respectively. The authors did not discuss these results; is there a meaning at the level of the role of trophallaxis fluid for the colony? This was not discussed in the manuscript.

      We thank reviewer #1 for pointing out these results. We have now addressed these terms in lines 277-284 of the discussion:

      “Our gene-set enrichment analysis showed significant enrichment in immunity-related proteins characteristic of phagocytic hemocytes (58) in trophallactic fluid (‘innate immune system’, ‘complement cascade’, ‘neutrophil degranulation’). These results indicate that hemocytes may themselves be transmitted mouth-to-mouth, and generally shows the involvement of the social circulatory system in colony-level immune responses with implications for social immunity.”

      • Considering the very high scientific quality of the present study, the authors could deposit all the raw proteomic data in a reliable international repository of proteins/DNA DB, since it will be required by top journals.

      We wholeheartedly agree, and all data are now shared online through ProteomeXchange.

      Reviewer #1 (Significance (Required)): Significance: the present investigation represents an important contribution to knowledge of the exchange of signals within the colony, which synchronizes the physiology and development of the hive as a whole (the concept of the superorganism). The existing data about the composition and potential role of the components of trophallaxis fluid are very limited compared to the present results. The present study is a masterpiece of knowledge about the importance of eusociality.

      Thank you for recognizing the importance of this study and affirming our work in such a wonderful way!

      **Audience:** all those scientists involved with social insects; biochemists/proteomists dedicated to insect biology, biochemistry and physiology. **My expertise:** biochemistry of arthropod secretions, especially of honeybees, ants and wasps. **Referee Cross-commenting** I think that both reviews are complementary to each other; both reviews agree on the need to reorganize the text, making it more compact and objective. Essentially, the authors must focus on the concept of trophallaxis. Thus, the biochemical processes outlined by proteomic analysis should be addressed to explain how the shared physiology of the colony works.

      Our discussion now focuses more on trophallaxis as a whole, and the biomarker-like quality of the changing proteome. We agree the biochemical processes and their role in the shared colony physiology are fascinating topics. We have not yet performed follow-up experiments with the many proteins present in this fluid and thus do not want to over-conclude. We have now stated more clearly in the discussion what the current data can reveal about these topics, what is assumed via orthology, and what needs to be addressed in future studies.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): This ms provides a comprehensive proteomic analysis of the trophallactic fluids extracted from carpenter ants. The analytical methods are state-of-the-art, and the results presented should fuel many studies. The vision of the research program, embodied in the title of the paper, is very exciting and is to be encouraged. However, the title of the paper in no way reflects the content of the paper, as none of the functional processes mentioned have been proven. This will require a lot of work and the development of perhaps new bioassays. I truly hope the PI's lab takes this on a deep and substantial way; the notion of trophallaxis and its socially exchanged fluid has long captivated the fancy of social insect biologists, but with a few specific exceptions, the promise has not yet been realized. The technical and descriptive results presented here lay a strong foundation. For purposes of present publication, I strongly recommend a different title and a revised discussion that reflects the disconnect I outline. Cause/consequence issues need to be addressed.

      We thank reviewer #2 for seeing our vision and that this is indeed foundational work that will “fuel many studies.” We also agree that the title and discussion contained too much speculation. The aim of this paper was to prove that there is systematic variation in trophallactic fluid in natural populations that correlates with biologically important social conditions, and further, that some proteins in this fluid can both act as biomarkers and be informative about underlying molecular processes. We have now communicated this more clearly in the introduction. In the revised version of the paper, we have reduced the speculation, and where appropriate, made it clear when there is speculation.

      For example, discussion lines 233-238:

      “Overall, our data reveal a rich network of trophallactic fluid proteins connected to the principal metabolic functions of ant colonies and their life cycle. Pinpointing contexts that induce changes in trophallactic fluid, along with the exact targets and functions of the proteins, are important subjects for future work. Our establishment of biomarkers transmitted over the social circulatory system that correlate with social life will allow researchers to formulate and test hypotheses on these proteins’ functional roles.”

      Three technical points: 1) Sample sizes are low for some analyses (2/group)--though they are cleverly pooled.

      We are not sure what the reviewer is referring to – none of our sample types had this low sample size (see SI Table 1 for sampling scheme). In contrast, for a proteomics study, our sample sizes are quite high. We are aware that for a study focusing on a natural population, the colony-level sample size of 16 (laboratory colonies) can be considered low, but this has been taken into account in our stringent statistical analyses.

      2) How to distinguish between what animals actually transmit and what is found in the gut? There could be differences.

      This has been addressed in our previous work, where it was shown that the crop content is equivalent to what is exchanged among individuals of this same species during the act of adult-adult stomodeal trophallaxis (Figure 1A, LeBoeuf et al. eLife 2016). We have now clarified this in the methods section of the current paper (lines 361-364).

      “Trophallactic fluid was obtained from CO2- or cold-anesthetized workers whose abdomens were gently squeezed to force them to regurgitate the contents of their crops. This method of collection was shown previously to correspond to the fluid shared during the act of adult-adult stomodeal trophallaxis (17).”

      3) Is there evidence that the substances found are not just the product of digestion of ingested food? The differences between lab and field colony samples supports this.

      In the type of proteomic analysis we have performed (the most commonly used proteomics approach when a genome is available), we detect only proteins found in the reference genome of interest (in our case Camponotus floridanus), so excepting cannibalism, we should not see proteins that originate from food. Note that this is why we do not provide lab colonies with the typical lab-reared ant diet that includes honey, as bees are also Hymenoptera, and royal jelly and trophallactic fluid have many proteins in common. Cannibalism could result in trace observation of many proteins, but could not produce the consistent and high-abundance set of proteins that we have observed as they are not produced in those precise ratios in larvae or adults.

      The observed shift in trophallactic fluid from field to lab may reflect a change in diet or microbiome and these are questions that could be further investigated in future work (mentioned in lines 229-232). The clear difference we observe between trophallactic fluid of young and mature colonies, or the difference between the worker castes within a colony, is evidence that the variation observed in trophallactic fluid reflects more than diet.

      “Trophallactic fluid complexity declines over time when colonies are brought from the field to the laboratory. This may reflect dietary, microbiome or environmental complexity – typical of traits that have evolved to deal with environmental cues and stressors (e.g. immunity, (37)).”

      Reviewer #2 (Significance (Required)): The paper addresses a very important topic that should be of widespread interest to social biologists. Journal choice should reflect that this is a technically excellent paper that presents descriptive information but functional significance is highly speculative.

      We appreciate that the reviewer agrees that our results are of widespread interest to social biologists. Indeed, our results must be somewhat descriptive, as we are working on a mostly unexplored socially exchanged fluid in a natural population. However, our study design tests clear hypotheses with preplanned sampling and experimental transfer of ant colonies to a new laboratory environment. We present confirmatory results of the hypothesis that trophallactic fluid is a complex mixture of biomarker-like molecules and that these biomarkers can be used to predict sample origin through machine learning (see random forest predictions, emphasized in lines 151-152). The fact that our evidence for this is correlative does not render it speculative. Indeed, in both ecology and in much of medicine, using correlative evidence is the norm, as it is often impossible to manipulate ecosystems, natural populations and some organisms in a safe and controlled manner. This is what convinced us to invoke the term ‘biomarkers,’ as biomarkers are excellent examples of molecular correlates of larger conditions that have spurred advances in biology and medicine.

      Some of the next steps in our research will be, as reviewer #2 suggested, additional studies on the roles of individual compounds of trophallactic fluid, building on the results of this paper. Additionally, while this study may not have explored the roles of specific molecules, open ended exploration is extremely important and necessary for any scientific advancement in the long run (eLife 2020;9:e52157).

      All in all, we are grateful for this comment, as it showed us that we must communicate the aims of our work more clearly, which we have now done both in the introduction (lines 77-91) and throughout the discussion.

      **Referee Cross-commenting** Yes. Most of the discussion is pure speculation because we don't know what is exchanged and what the modes of action might be. But it's a great start!

      We have reduced the speculation on the roles of single molecules, and we hope our responses to the points above clarify some of the reviewer’s uncertainties about what is exchanged. However, we do still outline hypotheses for potential functions and origins in the discussion section, as this study is intended to be a foundation for new lines of research.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This ms provides a comprehensive proteomic analysis of the trophallactic fluids extracted from carpenter ants. The analytical methods are state-of-the-art, and the results presented should fuel many studies. The vision of the research program, embodied in the title of the paper, is very exciting and is to be encouraged. However, the title of the paper in no way reflects the content of the paper, as none of the functional processes mentioned have been proven. This will require a lot of work and the development of perhaps new bioassays. I truly hope the PI's lab takes this on a deep and substantial way; the notion of trophallaxis and its socially exchanged fluid has long captivated the fancy of social insect biologists, but with a few specific exceptions, the promise has not yet been realized. The technical and descriptive results presented here lay a strong foundation. For purposes of present publication, I strongly recommend a different title and a revised discussion that reflects the disconnect I outline. Cause/consequence issues need to be addressed.

      Three technical points:

      1) Sample sizes are low for some analyses (2/group)--though they are cleverly pooled.

      2) How to distinguish between what animals actually transmit and what is found in the gut? There could be differences.

      3) Is there evidence that the substances found are not just the product of digestion of ingested food? The differences between lab and field colony samples supports this.

      Significance

      The paper addresses a very important topic that should be of widespread interest to social biologists.

      Journal choice should reflect that this is a technically excellent paper that presents descriptive information but functional significance is highly speculative.

      Referee Cross-commenting

      Yes. Most of the discussion is pure speculation because we don't know what is exchanged and what the modes of action might be. But it's a great start!

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The study was very well conducted by the group, selecting appropriate methods for achieving the stated objectives. The samples were abundant and the statistical treatment was suitable for the sample sizes, as well as for comparing the different methods used in this study.

      The results in general were properly exploited by the authors, clarifying many aspects of the role/function of the trophallaxis fluid. The results of this manuscript apparently suggest that young colonies prioritize the metabolization of carbohydrates, while mature colonies prioritize the accumulation and transmission of stored resources, amongst other processes. This study clarified many aspects of the role/function of the trophallaxis fluid for the colony.

      Even considering the high level of present investigation, still there are some aspects that could be improved by the authors:

      • The text in general is relatively long, with an overuse of literature citations;
      • The discussion is interesting, but sometimes too speculative; if the authors could attenuate their speculative statements, the text would become more objective and fluid;
      • The results shown in figures 6A and 6D relate to the processes of neutrophil degranulation and the complement cascade, respectively. The authors did not discuss these results; is there a meaning at the level of the role of trophallaxis fluid for the colony? This was not discussed in the manuscript.
      • Considering the very high scientific quality of the present study, the authors could deposit all the raw proteomic data in a reliable international repository of proteins/DNA DB, since it will be required by top journals.

      Significance

      The present investigation represents an important contribution to knowledge of the exchange of signals within the colony, which synchronizes the physiology and development of the hive as a whole (the concept of the superorganism).

      The existing data about the composition and potential role of the components of trophallaxis fluid are very limited compared to the present results. The present study is a masterpiece of knowledge about the importance of eusociality.

      Audience:

      all those scientists involved with social insects; biochemists/proteomists dedicated to insect biology, biochemistry and physiology.

      My expertise:

      biochemistry of arthropod secretions, especially of honeybees, ants and wasps.

      Referee Cross-commenting

      I think that both reviews are complementary to each other; both reviews agree on the need to reorganize the text, making it more compact and objective. Essentially, the authors must focus on the concept of trophallaxis. Thus, the biochemical processes outlined by proteomic analysis should be addressed to explain how the shared physiology of the colony works.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #2 (Evidence, reproducibility and clarity):

      This paper attempts to address a current, clinically relevant question utilizing novel statistical modeling. The authors comprehensively assessed the presence of criteria and non-criteria aPL in a heterogeneous cohort of 75 COVID patients and 20 non-infected controls. They found 66% of COVID patients had positive aPL and demonstrated a correlation between aPL and anti-SARS-CoV-2. However, I have several major concerns:

      1. The cohort is extremely heterogeneous. COVID-19 samples that were used included hospitalized patients and those who had COVID more than 2 months ago and were convalesced (29% of samples). Severity of disease does influence autoreactivity and the presence of autoantibodies. The prevalence of autoantibodies among patients who are acutely ill will be much different than those who are convalesced. I think it would be prudent to assess the presence and correlation of aPL among those two groups separately.

      We thank you for pointing out the complexity of our study population, consisting of multiple cohorts from different centres. It is precisely this heterogeneity of our cohorts and their variables that led us to employ linear mixed-effect models. Linear mixed-effect models, accounting for both fixed and random effects, are suitable to address potentially confounding factors. Along these lines, disease severity (different in the convalescent and the acutely ill individuals) as well as the relation of the time of sampling to the time of disease occurrence (days post onset of disease manifestation) were included as fixed effects in our mixed model. Thus, our model accounts for potential differences between the acute phase of infection and the convalescent phase and would capture them if relevant.

      In order to increase the rigour, we have performed an additional analysis where we excluded the convalescent individuals from the model (see Fig. 3C). The results obtained are in line with results already shown (Fig. 3B, 3D).

      In general, we have pursued a largely data-driven exploratory, and not a hypothesis-driven, approach. Clearly, we could have decided to set a stringent focus on a cohort without complexity. Yet, our approach encourages heterogeneity, which we address using an adequate model. Since, perhaps, the model choice, the model itself, and the data-driven approach were not explained extensively enough, we have added a more detailed account in the manuscript, lines 317-334 and lines 394-403.

      1. Sampling of the patients is concerning: 35% of samples are plasma and 65% are serum. It is undesirable to pool data from plasma and serum for analysis.

      We thank the reviewer for raising this important concern. We have aimed to be as rigorous and transparent as possible in the description of the cohorts (see Tables 1 and 2 for serum/plasma). While we agree that, in general, it would be best if either only plasma (i.e., only heparin plasma or only EDTA plasma) or only serum was used, we wish to clarify that for both SARS-CoV-2 IgG profiling and LIA, plasma and serum can be used interchangeably. We can formally show this. We have conducted a SARS-CoV-2 IgG profiling experiment on patient-matched samples (plasma and serum). The data show unambiguously that the choice of plasma or serum has no effect on the assay outcome (Fig. S3A and S3B), with a Pearson correlation coefficient of 0.9942 (95% confidence interval: 0.9865-0.9975) and R² of 0.9885. Bland-Altman analysis does not indicate any significant bias (Fig. S3C).

      For the detection of APS antibodies with ELISA, the literature suggests no relevant interference of the use of plasma or serum with the measured value (Pham et al., 2019). To formally reassess this, we measured aPL autoantibodies with LIA in one matched plasma and serum sample of an individual with high-titre aPL antibodies and of one high-titre individual whose plasma was spiked into non-reactive plasma and serum (Fig. S2A and Fig. S2B). We found the same pattern of IgM and IgG aPL-positivity in both matched serum and plasma samples as well as in spiked serum and plasma samples, with a Pearson correlation coefficient of 0.9974 (95% confidence interval: 0.9611-1.034) and R² of 0.9813 (Fig. S2A). Bland-Altman analysis did not indicate a significant bias (Fig. S2B).

      We therefore conclude that in our study, using both plasma as well as serum has no effect on the validity of our results.
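
      For readers unfamiliar with the two agreement measures cited above, the following is a minimal sketch of how a Pearson correlation coefficient and a Bland-Altman bias with 95% limits of agreement can be computed for paired plasma/serum readings. The function names are illustrative, the 1.96 multiplier assumes approximately normally distributed differences, and the code is not the analysis pipeline used for Fig. S2 and Fig. S3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the pairwise differences x - y; the limits of
    agreement are bias -/+ 1.96 sample standard deviations of the differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

      A high correlation alone does not rule out a systematic offset between the two sample types, which is why the bias from the Bland-Altman analysis is reported alongside it.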

      1. LIA based assays were used to assess the presence of aPL and results were reported in OD rather than standardized units. While the same group demonstrated a positive correlation in the past between LIA OD and internationally accepted ELISA-based aPL assays, the validity and clinical utility of these LIA assays still require further evaluation. Furthermore, OD>50 was used as a positive cut-off. How this cut-off was determined and how it relates to internationally accepted positive aPL cut-offs (99th percentile or greater than 40) remains unclear.

      We thank the reviewer for mentioning concerns on LIA. The validity of this technology has been confirmed in multiple peer-reviewed publications (Roggenbuck et al. Arthr Res Ther 2016;18:11, Nalli et al. Autoimmunity Highlights 2018;9,6). In terms of cut-off detection, processed strips were analysed densitometrically employing a scanner with the evaluation software Dr. DotLine Analyzer (GA Generic Assays GmbH). The cut-off of 50 OD units was determined by calculating the 99th percentile of 150 apparently healthy individuals as recommended by the international classification criteria for aPL testing and Clinical and Laboratory Standards Institute (CLSI) guideline C28-A3 (Roggenbuck et al. Arthr Res Ther 2016;18:11, Nalli et al. Autoimmunity Highlights 2018;9,6). A corresponding sentence has been added to the METHODS AND MATERIALS section.
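
      As an illustration of a percentile-based cut-off, the sketch below derives a 99th percentile from a set of reference values. It assumes a rank-based (nonparametric) estimate with linear interpolation at rank p(n + 1); the exact convention used by the evaluation software is not stated here, and the function name is hypothetical.

```python
def percentile_cutoff(reference_values, p=0.99):
    """Nonparametric percentile of reference_values at 1-based rank p * (n + 1).

    Ranks that fall outside the data are clamped to the smallest or
    largest observed value; fractional ranks are linearly interpolated.
    """
    values = sorted(reference_values)
    n = len(values)
    if n == 0:
        raise ValueError("reference_values must not be empty")
    rank = p * (n + 1)
    if rank <= 1:
        return values[0]
    if rank >= n:
        return values[-1]
    lower = int(rank)        # integer part of the fractional rank
    frac = rank - lower      # distance to the next rank
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])
```

      With 150 reference individuals, the 99th percentile falls between the 149th and 150th ordered values, so the estimate depends heavily on the one or two highest readings in the reference group.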

      For our study, we aimed to perform the maximum number of tests possible with limited sample volume and have therefore chosen LIA. We are aware of the discussion on internationally accepted cut-offs for clinical APS diagnostics. However, we would like to point out that our manuscript is not a case report on patients diagnosed with APS, nor do we aim to modify diagnostic standards set in the international consensus statement for the classification criteria for definite APS (established in 2006).

      Moreover, the OD ≥ 50 was used as a cut-off in one analysis (with Fisher’s exact test for statistics) in our manuscript and was re-assessed using Mann-Whitney/Wilcoxon rank sum test on a continuous scale (Fig. 1C and 1D). All subsequent analyses were not contingent on an OD cut-off. We believe that this is clearly stated in the manuscript.
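
The difference between the two analyses mentioned above can be sketched as follows: a dichotomised comparison (Fisher's exact test on a 2x2 table built with a cut-off) versus a comparison on the continuous scale (here only the Mann-Whitney U statistic, without a p-value). This is a plain-Python illustration with hypothetical OD values, not the authors' analysis code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y count 1, ties 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)


# Hypothetical OD values (assumption, for illustration only).
infected = [12.0, 55.0, 61.0, 30.0, 72.0, 48.0]
controls = [8.0, 15.0, 22.0, 51.0, 10.0, 19.0]
cutoff = 50.0

pos_inf = sum(v >= cutoff for v in infected)
pos_ctl = sum(v >= cutoff for v in controls)
p = fisher_exact_two_sided(pos_inf, len(infected) - pos_inf,
                           pos_ctl, len(controls) - pos_ctl)
u = mann_whitney_u(infected, controls)
print(f"Fisher p = {p:.3f}, U = {u}")
```

Only the first function depends on the cut-off; the U statistic uses the ranks of the raw values, which is why an analysis on the continuous scale is not contingent on the choice of threshold.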

      1. While the authors attempted to evaluate the presence of both IgG and IgM aPL in COVID patients, only 65% of samples were tested for both IgG and IgM aPL.

      We agree that testing the entire collective for IgG and IgM isotypes would have been best. In fact, we would have been interested in also including the IgA isotype. Inconveniently, sample volume is sometimes limiting.

      We have been clear about the omission of IgG aPL measurements in the samples from Zurich (see lines 214-215). We consider this a limitation, however, our data indicated that IgM aPLs are more immediately relevant in the context of SARS-CoV-2. While this has been surprising to us, we would like to highlight that this is a manifestation of the quality of a data-driven approach where data, much more than belief, build the foundation for conclusions. Along these lines, we could have easily omitted all data on IgG aPLs without compromising the message contained in our manuscript. However, we stand behind our decision to show all data even if, in the case of IgG aPL, (1) they are mostly negative and (2) they are incomplete.

      1. 26 patients had anti-SARS-CoV-2 data already available. Whether those were tested on the same samples and at the same time points as aPL is not clear.

      We apologise for not having been clear about this in the text. The 26 samples from Zurich had been included in another study where their respective anti-SARS-CoV-2 Spike ECD, RBD, and NC p(EC50) values were used (Emmenegger et al., 2020). Thus, the p(EC50) values have been re-used in the current manuscript. The aPL autoantibodies were measured on exactly the same samples. We have tried to improve the explanation of this in the text, see lines 300-301.

      1. The novel statistical modelling design is interesting. However, as there are concerns about the data put into the modelling, the validity of the conclusions is debatable.

      We thank the reviewer for being interested in the statistical model we used. Linear regression analysis is a standard tool in epidemiological analyses (see e.g., Szklo, Nieto, Epidemiology: Beyond the Basics). Here, we have employed a linear mixed-effects model to infer changes in the predictive power of fixed and random variables (e.g. SARS-CoV-2 IgG levels, disease severity, age, sex, days post onset of disease manifestation), to determine which of these variables reliably predict an outcome (e.g. PT aPL levels), and in what combination.
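
For readers less familiar with this model class, a generic linear mixed-effects model can be written as below. This is the textbook form and not necessarily the exact specification used in the manuscript; the symbols and the choice of grouping factor are assumptions for illustration.

```latex
% Generic linear mixed-effects model (illustrative notation, assumed):
%   y_{ij}   outcome (e.g. an aPL level) for observation i in group j
%   x_{kij}  k-th fixed-effect covariate (e.g. age, sex, antibody level)
%   u_j      random intercept of group j
\[
  y_{ij} = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{kij} + u_j + \varepsilon_{ij},
  \qquad
  u_j \sim \mathcal{N}(0, \sigma_u^2),
  \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Comparing nested versions of such a model, with and without a given covariate, is one common way to judge whether that covariate adds predictive power.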

      We recognised that the manuscript would benefit from a more thorough explanation of the model and how it helps to evaluate the validity of the data. We have therefore added lines 317-334 in the manuscript.

      All authors are appreciative of the reviewer’s critique. In the light of the answers we provided, we remain convinced of our conclusions, which are based on our data. We hope that, with our responses, we have adequately addressed the concerns raised by the reviewer.

      Reviewer #2 (Significance):

      See above.

      Reviewer #3 (Evidence, reproducibility and clarity):

      It is being recognized that SARS-CoV-2 infection leads to acquired thrombophilia with increased arteriovenous thrombosis, endothelial injury and organ damage. This has multiple mechanisms, including the hypercoagulable state with platelet activation, endothelial dysfunction, and increased circulating leukocytes, cytokines and fibrinogen, but the acquired thrombophilia could also be due to acquired APS in these patients. In this study, Emmenegger et al. evaluated aPL antibody responses in SARS-CoV-2-infected individuals in connection with antibodies against SARS-CoV-2 components and found that the strength of the antibody response against SARS-CoV-2 proteins is associated with PT IgM aPL antibodies.

      Reviewer #3 (Significance):

      This is overall an interesting and thought-provoking study, as it may explain the development of thrombophilia after SARS-CoV-2 vaccination. While the study provides a possible association of the development of antibodies against SARS-CoV-2 infection and aPL, it does not go into molecular details about the homology between anti-SARS-CoV-2 antibodies and aPL. Therefore, the study remains an association study.

      First of all, we would like to thank the reviewer for the careful evaluation of our work. We are fully aware of the descriptive nature of our work. Thanks to the suggestion of the reviewer (see below), we have aimed to go one step further towards a more functional/mechanistic description.

      It is not surprising that they found a difference in IgM rather than IgG as IgM development is an early response.

      The overall conclusion is supported by the rigorous statistical analyses, yet the study remains a correlative and association study.

      Significance: Thrombophilia associated with SARS-CoV-2 may be due to immunity against SARS-CoV-2 rather than a pure cytokine response.

      Furthermore, they did not characterize the PT IgM aPL to find which part could be immunogenic or epitope similarity with anti-SARS-CoV-2 antibodies. Identification of these epitopes is crucial for further understanding of the antibody development and further intervention.

      Existing literature does not connect with antibody responses against SARS-CoV-2.

      Could the authors provide some molecular epitope analysis of IgM aPL and anti-SARS-CoV-2 antibodies? Even computational analysis will improve the paper tremendously.

      We thank the reviewer for coming up with this idea. Clearly, the presence of cross-reactive IgM antibodies to human prothrombin, triggered against the SARS-CoV-2 Spike protein, would be a direct and simple explanation for our observation. We have put efforts into analysing epitopes of SARS-CoV-2 Spike protein and prothrombin (see lines 374-390 in the manuscript and Fig. 4). We conclude there is very limited similarity, and that the mechanism is most likely indirect.

      There is no ethical concern.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      It is being recognized that SARS-CoV-2 infection leads to acquired thrombophilia with increased arteriovenous thrombosis, endothelial injury and organ damage. This has multiple mechanisms, including the hypercoagulable state with platelet activation, endothelial dysfunction, and increased circulating leukocytes, cytokines and fibrinogen, but the acquired thrombophilia could also be due to acquired APS in these patients. In this study, Emmenegger et al. evaluated aPL antibody responses in SARS-CoV-2-infected individuals in connection with antibodies against SARS-CoV-2 components and found that the strength of the antibody response against SARS-CoV-2 proteins is associated with PT IgM aPL antibodies.

      Significance

      This is overall an interesting and thought-provoking study, as it may explain the development of thrombophilia after SARS-CoV-2 vaccination. While the study provides a possible association of the development of antibodies against SARS-CoV-2 infection and aPL, it does not go into molecular details about the homology between anti-SARS-CoV-2 antibodies and aPL. Therefore, the study remains an association study.

      It is not surprising that they found a difference in IgM rather than IgG as IgM development is an early response.

      The overall conclusion is supported by the rigorous statistical analyses, yet the study remains a correlative and association study.

      Significance: Thrombophilia associated with SARS-CoV-2 may be due to immunity against SARS-CoV-2 rather than a pure cytokine response.

      Furthermore, they did not characterize the PT IgM aPL to find which part could be immunogenic or epitope similarity with anti-SARS-CoV-2 antibodies. Identification of these epitopes is crucial for further understanding of the antibody development and further intervention.

      Existing literature does not connect with antibody responses against SARS-CoV-2.

      Could the authors provide some molecular epitope analysis of IgM aPL and anti-SARS-CoV-2 antibodies? Even computational analysis will improve the paper tremendously.

      There is no ethical concern.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This paper attempts to address a current, clinically relevant question utilizing novel statistical modeling. The authors comprehensively assessed the presence of criteria and non-criteria aPL in a heterogeneous cohort of 75 COVID patients and 20 non-infected controls. They found 66% of COVID patients had positive aPL and demonstrated a correlation between aPL and anti-SARS-CoV-2. However, I have several major concerns:

      1. The cohort is extremely heterogeneous. COVID-19 samples that were used included hospitalized patients and those who had COVID more than 2 months ago and were convalesced (29% of samples). Severity of disease does influence autoreactivity and the presence of autoantibodies. The prevalence of autoantibodies among patients who are acutely ill will be much different than those who are convalesced. I think it would be prudent to assess the presence and correlation of aPL among those two groups separately.

      2. Sampling of the patients is concerning, 35% are plasma and 65% are serum. It is undesirable to put data from plasma and serum together to perform analysis.

      3. LIA based assays were used to assess the presence of aPL and results were reported in OD rather than standardized units. While the same group demonstrated a positive correlation in the past between LIA OD and internationally accepted ELISA-based aPL assays, the validity and clinical utility of these LIA assays still require further evaluation. Furthermore, OD>50 was used as a positive cut-off. How this cut-off was determined and how it relates to internationally accepted positive aPL cut-offs (99th percentile or greater than 40) remains unclear.

      4. While the authors attempted to evaluate the presence of both IgG and IgM aPL in COVID patients, only 65% of samples were tested for both IgG and IgM aPL.

      5. 26 patients had anti-SARS-CoV-2 data already available. Whether those were tested on the same samples and at the same time points as aPL is not clear.

      6. The novel statistical modelling design is interesting. However, as there are concerns about the data put into the modelling, the validity of the conclusions is debatable.

      Significance

      See above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Full Revision

      Manuscript number: RC-2021-00785

      Corresponding author: Christian G. Specht

      1. General Statements

      Dear Editor,

      We greatly appreciate the reviewers’ constructive comments on our manuscript ‘Identification of a stereotypic molecular arrangement of glycine receptors at native spinal cord synapses’. We were particularly pleased that all four reviewers agreed that our data yield new insights into the structure of inhibitory glycinergic synapses, and represent both a technical and conceptual advance in the field of synaptic neuroscience.

      The reviewers have consistently raised one main criticism, namely the use of endogenously expressed GlyRs tagged with the fluorescent protein mEos4b, which could potentially have an impact on receptor expression, trafficking and function. We have addressed this point by performing whole-cell recordings of GlyR currents in cultured neurons that show that glycinergic transmission and therefore function is preserved. We have also addressed all other comments of the reviewers in the revised manuscript, including a thorough revision of the text and the addition of new data and figures as detailed in the point-by-point response.

      Point-by-point description of the revisions

      Reviewer 1:

      Summary:

      In this manuscript Maynard et al describe a newly generated knockin mouse to study the endogenous distribution of Gly receptors in the spinal cord. Using quantitative confocal imaging and SMLM the distribution and levels of GlyRs at spinal cord synapses is compared between dorsal and ventral horn. They found that levels of synaptic GlyR are higher in dorsal than ventral spinal cord synapses. Nevertheless, the ratio to gephyrin seems constant, except for synapses in superficial layers of the dorsal horn, where gephyrin levels exceeded the levels of GlyRs. There are also fewer, but larger synapses in the ventral horn than in the dorsal horn. These findings are further corroborated by an SR-CLEM approach. Furthermore, it is shown that in a mouse model for hyperekplexia GlyR levels are lower, but still enriched at synapses, and the dorsal-ventral gradient in GlyR expression was maintained. The difference in size of ventral and dorsal synapses observed in WT animals was also lost in the oscillator mouse, suggesting that particularly the ventral synapses are affected. Despite these differences, the density of GlyRs per synapse remained similar.

      Major comments:

      Line 113: "labeling the β-subunit has proven difficult". This statement is unclear and it would be informative for readers to grasp what exactly has been difficult, and why the approach described here overcomes that? Related to that, the authors state "KI animals reach adulthood and display no overt phenotype, suggesting that the presence of the N-terminal fluorophore does not affect receptor expression and function". That is indeed reassuring, but it does not exclude that receptor numbers, function and distribution are altered. As it seems there is no prior literature on tagging the beta subunit, additional evidence that the tag does not interfere with receptor trafficking or functioning would be desirable.

      We have clarified why it has been difficult to label the GlyR beta subunit until now, lines 113-115 “To date, labeling of GlyRβ in situ using immunocytochemistry has proven difficult due to a lack of reliable antibodies that recognize the native β-subunit (only antibodies for Western blotting recognizing the denatured protein are available), which has severely limited the study of the receptor.” Hence it was important to us to generate this knock-in mouse in order to study the endogenous GlyR at synapses, which is the least well studied receptor mediating fast synaptic transmission.

      The reviewer makes an important point regarding the labeling of the GlyRβ-subunit with a fluorescent protein that has also been raised by the other reviewers. We have now verified receptor function by patch clamp recordings of glycine currents in whole-cell configuration in spinal cord neuron cultures from the mEos4b KI mouse (new Supplementary Fig. S2C). At saturating glycine concentrations of 300 μM we found no difference in chloride influx between mEos4b KI and WT mice. Since glycine concentrations in the synaptic cleft are in the millimolar range during synaptic transmission, these data strongly suggest that glycinergic transmission is not affected by the presence of the mEos4b under physiological conditions, despite a minor shift in the EC50.

      There are several other strong arguments that suggest that mEos4b-GlyRβ expression, subcellular localization and function are the same as those of the native subunit. Firstly, the mEos4b sequence was inserted after the signal peptide and before the beginning of the coding sequence of the mature β-subunit (Fig. S1). Since the mEos4b sequence does not interrupt the coding sequence it is less likely to affect the receptor conformation. Secondly, we did not notice any behavioural phenotypes in animals carrying the GlrbEos allele. At the time of weaning, the genotypes of the pups corresponded to the expected Mendelian frequency (new Fig. S2A). Moreover, we did not observe a reduction in life expectancy of GlrbEos/Eos animals (new Fig. S2B), demonstrating that the mEos4b-GlyRβ does not cause pathology in older animals.

      Most importantly, our imaging data (Fig. 1-3) provide exhaustive evidence that mEos4b-GlyRβ assembles with GlyR α subunits as heteropentameric receptor complexes that are trafficked to the plasma membrane and inserted into the synaptic membrane due to their interaction with the gephyrin scaffold at functional synapses. Using quantitative imaging, we have also shown that homozygous GlrbEos/Eos KI mice have exactly twice the number of receptors at synapses as heterozygous animals, strongly suggesting no interference with receptor trafficking to the plasma membrane and gephyrin binding. As the mEos4b mice were also bred with the oscillator mouse model of hyperekplexia, which is lethal when homozygous, we could further test the combined effect of GlrbEos and GlyRa1spt-ot. The presence of both alleles did not lead to any noticeable phenotypes in heterozygous oscillator mice. On the contrary, neither synaptic targeting nor the packing density of the receptors was altered in this model, despite a region-specific reduction in synapse size due to the reduced availability of the intact GlyRα1 subunit.
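
A check of genotype counts against Mendelian expectation is often done with a chi-square goodness-of-fit test. The sketch below assumes a heterozygous x heterozygous cross with an expected 1:2:1 ratio, which is an assumption on our part (the breeding scheme is not stated here), and uses hypothetical pup counts.

```python
from math import exp


def chi_square_mendelian(wt, het, hom):
    """Chi-square goodness-of-fit against a 1:2:1 genotype ratio.

    Returns (statistic, p_value). With three classes there are two
    degrees of freedom, for which the chi-square survival function
    is exactly exp(-statistic / 2).
    """
    n = wt + het + hom
    if n == 0:
        raise ValueError("at least one animal is required")
    observed = (wt, het, hom)
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, exp(-stat / 2)


# Hypothetical genotype counts at weaning (assumption, not study data).
stat, p = chi_square_mendelian(wt=23, het=52, hom=25)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

A large p-value means the counts are compatible with the expected ratio, which is the pattern one would expect if the tagged allele does not reduce viability before weaning.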

      We believe that these data overwhelmingly support our conclusion that the presence of the mEos4b tag does not alter the structure and function of the receptor, making this mouse model uniquely suited to study the dynamics and regulation of glycinergic synapses in a quantitative manner and at the molecular level.

      In the Discussion the authors conclude that "Our quantitative SR-CLEM data lend support to the first model, whereby inhibitory PSDs in the spinal cord are composed of sub-domains that shape the distribution of the GlyRs". This conclusion seems, however, to be based on one example image in Fig 3G that is not very convincing. The EM image seems to show two clearly separated PSDs opposed by two distinct active zones. So, although this conclusion is of high interest, more evidence is needed to substantiate it. More generally, these subsynaptic domains (SSDs) are hardly further explored, but seem relevant for transmission, particularly given that the synaptic pool of GlyRs at these synapses is not saturated by single release events. How general are these SSDs at these synapses?

      The representative image in Fig. 3G shows two SSDs within the same postsynaptic site with a continuous presynaptic active zone. It should be noted that the PALM/SRRF images were taken of the entire 2 µm thick slice, whereas the electron micrograph shows only a single 70 nm section. We verified throughout the full 3D stack of serial sections that the presynaptic site remains continuous. We would also like to point out the scale of the image showing that the two SSDs are only around 170 nm apart, i.e. spatially very close. Our conclusions are however not based on this single image but on the whole dataset. The graph in Fig. 3I shows 3 synapses (out of N = 36), in which the GlyR density at separate SSDs could be quantified, demonstrating that the receptor density is not different between SSDs. The reviewer is correct that we do not further analyse the SSDs beyond their density and the analysis of the segmentation of the postsynaptic sites (Fig. 3E-G). Further work on the functional role of SSDs in synaptic transmission is outside the scope of this manuscript and would indeed merit future study.

      The approach for counting molecules based on the PALM acquisition has been developed in prior publications and seems robust. It would, however, be worth presenting the reader with a bit more background and explaining the assumptions of this approach in more detail. Particularly, since counting of mEos4b can be problematic, as there are multiple dark and fluorescent states of this fluorophore that could be influenced by the illumination scheme, see for instance De Zitter et al., Nat Methods 2019. Since the preceding SRRF acquisition already exposes the fluorophore to high and continuous 561-nm laser power this could skew the counting due to unaccounted conversion and perhaps bleaching of mEos4b. In line with this, although throughout the manuscript the term 'absolute copy numbers' is used the reported numbers are at best an estimate based on a number of assumptions. I think the wording 'absolute numbers' is therefore deceiving and should be nuanced.

      We have clarified how the molecule conversion is calculated (Fig. S7 legend), to provide a more complete description of the way in which the values were obtained. Further, we have explained how we calculated the probability of detection. Since the probability of detection accounts for any unconverted or non-functional mEos4b molecules, our molecule counting approach is relatively resistant to potential pre-bleaching of fluorophores. It should be noted that 561 nm illumination had no obvious effect on the non-converted (green) mEos4b fluorophores, as judged by the fact that the intensity of receptor puncta was unaffected by the SRRF recordings. We appreciate the reviewer's point regarding the term ‘absolute copy number’ and we have adjusted our wording throughout the manuscript accordingly.

      Related, most of the quantifications are in estimating the number of receptors, and not so much the distribution within the PSD. The term "molecular arrangement" - also used in the title - might therefore be misleading, there is in fact little characterization of how GlyRs are placed within the PSD. More focused analysis quantifying the distribution of receptors within the PSD and/or SSDs would strengthen the manuscript.

      By estimating the number of receptors and the exact size of synapses, the main conclusion of our study is that receptor density at dorsal and ventral synapses is identical, independent of synapse size, subdomains, or in fact loss of GlyRs in a mouse model of hyperekplexia. This observation clearly relates to how receptors are packed within synapses, and thus describes their molecular arrangement.

      The reported N is confusing and makes it hard to judge the reproducibility of the data. Sometimes it refers to number of images, sometimes number of synapses, but it is unclear from how many experiments these are drawn. This should be reported more completely (number of animals should be reported at least) and consistently. In figure 1, the N numbers (N=3-5 images) are particularly low and question how consistent these findings are across multiple animals.

      We have clarified the N in the figure legends, to reflect the full size of the datasets that have been analysed.

      The levels of mRFP-Gephyrin seem to differ between the different mouse lines, is this a significant difference?

      No significant differences in mRFP-gephyrin levels were found in animals with different mEos4b-GlyRβ genotype (Fig. 1B). However, expression of mRFP-gephyrin in heterozygous animals is 50% of that in homozygous mRFP-gephyrin KI animals (not shown).

      The ICQ analysis for co-localization is hardly explained. How do we interpret this parameter? What does an average value of ~0.3 mean? A comparison with sets of proteins that do not overlap as a negative control would strengthen the conclusion.

      We have clarified that an ICQ value of 0.3 is indicative of a very high spatial correlation between pixels, and provided a corresponding reference for ICQ analysis (lines 209-210). We would like to point out that the scale of the ICQ is between -0.5 to 0.5, meaning that a value of 0.3 comes close to complete correlation.
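
To show where the -0.5 to 0.5 scale comes from, here is a minimal sketch of an intensity correlation quotient as it is commonly defined: the fraction of pixels in which both channels deviate from their respective means in the same direction, minus 0.5. Pixels with a product of exactly zero are counted as non-positive here, which is an assumption; the pixel intensities are hypothetical.

```python
from statistics import mean


def icq(channel_a, channel_b):
    """Intensity correlation quotient of two flattened image channels."""
    if len(channel_a) != len(channel_b) or not channel_a:
        raise ValueError("channels must be non-empty and of equal length")
    mean_a, mean_b = mean(channel_a), mean(channel_b)
    # Count pixels where both channels lie on the same side of their mean.
    positive = sum(
        1 for a, b in zip(channel_a, channel_b)
        if (a - mean_a) * (b - mean_b) > 0
    )
    return positive / len(channel_a) - 0.5


# Hypothetical pixel intensities (assumption, for illustration only).
receptor = [5, 40, 80, 10, 70, 3, 60, 20]
scaffold = [8, 35, 90, 30, 65, 2, 20, 25]
print(icq(receptor, scaffold))
```

Perfectly co-varying channels give 0.5, perfectly segregated channels give -0.5, and unrelated intensities give values near 0, so a value of about 0.3 corresponds to roughly 80% of pixels varying together.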

      Minor comments:

      "Very little fluorescence was detected in the forebrain, despite the high reported expression of the Glrb transcript". Can the authors expand on this? What would explain this discrepancy?

      We have clarified the text to include “suggesting that protein levels are controlled by post-translational mechanisms in a region-specific manner, as previously proposed (Weltzien et al., 2012)” (Lines 152-153). The reason for this discrepancy is not known. However, the distribution of mEos4b expression throughout the brain is as expected, based on the literature.

      What region is quantified in Fig 1B? Is it the same region in all conditions? This should be specified more clearly as the manuscript presents a clear gradient in expression levels in the spinal cord and thus the location will influence the intensity measurements.

      We have explained in the text that this is the region at the centre of the ventral horn identified by the white square in Fig. 1A, and that the same region was analysed for all images across all animals. Page 5, lines 160-161 “The same region of the ventral horn, indicated by the white square in Fig. 1A was taken for quantification of mEos4b-GlyRβ and mRFP-gephyrin expression in all conditions.”

      The labeling approach does not differentiate between surface and internal receptors, this should be made more explicit in the text.

      Whilst this is correct, we have only analysed mEos4b-positive synapses that had corresponding gephyrin clusters, meaning synapses where receptors are located in the postsynaptic membrane. Indeed we found that all mEos4b clusters imaged colocalised with mRFP-gephyrin clusters. We have adjusted the text accordingly, page 6, line 205-206 “All mEos4b-GlyR clusters closely matched the mRFP-gephyrin clusters, confirming the localization of the receptors in the postsynaptic membrane.”

      Significance:

      The presented data are interesting and the experiments are technically advanced and carefully performed. Particularly the SR-CLEM approach is technically advanced. The datasets present a quantitatively detailed characterization of spinal cord synapses and will be of interest for researchers working in the field of spinal cord circuitry, as well as super-resolution imaging. The conceptual advance for the field is however somewhat limited. It seems that the presented data confirm the general notion that receptor numbers and synapse size are highly correlated. So, although this manuscript describes very interesting observations, in its present form the manuscript does not provide any new mechanistic insight or significant advance in our understanding of how these synapses operate.

      We thank the reviewer for his/her comments relating to the technicality of our manuscript. However, we think that the statement “The conceptual advance for the field is however somewhat limited” is unfair, as this level of organisation of inhibitory synapses at the molecular scale has never been achieved before, as pointed out by the other reviewers, and especially not with regard to different ages of animals and a disease model that directly affects receptor numbers in a region-specific manner. We therefore believe that our study will have a substantial impact within the fields of synaptic neuroscience as well as quantitative neurobiology.

      Referee cross-commenting:

      I agree with the other reviewers that this study is technically advanced, but I remain critical towards the extent of conceptual advancement this study brings and there are some important concerns with the presented data that need to be addressed. Nevertheless, indeed many of these concerns can be addressed without additional experiments. As pointed out also by other reviewers additional validation that the fusion proteins are not disrupting their function or organization would be important.

      Reviewer 2:

      Summary:

      Maynard et al. investigate (inhibitory) glycinergic synapses in mouse spinal cord, which regulate motor and sensory processes. The authors analyse the molecular architecture and ultra-structure of these synapses in native spinal cord tissue using quantitative super-resolution correlative light and electron microscopy. The major finding is that GlyRs exhibit equal receptor-scaffold occupancy and constant absolute packing densities across the spinal cord and throughout adulthood, although ventral and dorsal inhibitory synapses differ in size. Moreover, what the authors call a "stereotypic arrangement" is even maintained in a hypomorphic mutant (oscillator), which is deficient in the adult GlyR α1 subunit.

      Specific comments:

      To reach their conclusions the authors generate two knock-in mouse lines, one with mEOS-labelled GlyR β-subunit and one with mRFP-labelled gephyrin, a subsynaptic scaffolding protein of inhibitory synapses, which are subsequently crossed. Both changes are not unproblematic, as mutations in the N-terminal end of the GlyR β subunit polypeptide chain might interfere with the assembly of functional GlyR (consisting of α and β subunits) and mutations at the N-terminal end of gephyrin interfere with its homo-oligomerization into higher molecular assemblies.

      We have demonstrated that the function of mEos4b-GlyRβ does not differ significantly from WT GlyRs, by carrying out electrophysiological experiments (new Fig. S2C). For a detailed response, please see the response to the first comment of reviewer 1. The mRFP-gephyrin KI strain has been validated and published previously (see Machado et al., 2011, J Neurosci; Specht et al., 2013, Neuron) and was not specifically generated for this study. The experiments with the oscillator mutant did not include the mRFP-gephyrin allele. In these experiments, the wildtype GlrbEos/Eos (Fig. 4, 5) behaves exactly as the GlrbEos/Eos in the double knock-in (Fig. 1, 2), further validating the mouse models used.

      However, in this experimental design both labelled proteins reach postsynaptic membrane specialisations. In case of the β-subunit quantitative evaluation confirms that heterozygous animals contain only half of the labelled protein as homozygous, which is an indication but not a proof that the correct stoichiometry of adult GlyR is maintained. Likewise, mRFP-labelled gephyrin assembles with WT-gephyrin in subsynaptic domains, but it is not clear, if the size and density of the synapses is changed by the knock-in procedure as compared to WT-synapses.

      An effect of the mRFP tag on gephyrin clustering can be ruled out, since we observed no difference in synapse size and receptor density in GlrbEos/Eos animals with (Fig. 1, 2) and without the GphnmRFP allele (Fig. 4, 5, oscillator wild-type controls). Similarly, the synaptic mEos4b-GlyRβ levels in heterozygous animals were precisely half those of the homozygous animals, strongly suggesting that the expression and trafficking of the tagged receptor subunit is unchanged, as the reviewer acknowledges. In the absence of any obvious behavioural and/or functional phenotypes (Fig. S2) this KI model is, in our view, an exceptional tool to study GlyRs expressed at endogenous levels in a cell-type specific manner.

      Accepting these constraints, which to the knowledge of this reviewer have never been addressed to satisfaction, the authors provide a technically excellent, comprehensive analysis of glycinergic synapses in the spinal cord of double knock-in mice. Therefore, it should be stated in the title that the investigations were performed with double knock-in instead of "native" spinal cord. Text and figures are clear and accurate and represent the state of the art.

      We thank the reviewer for the positive comments regarding the techniques used in the study, and the clarity of the text and figures. We have adjusted the title as requested.

      Finally, the reviewer would like to raise a minor point: the term postsynaptic density is derived from electron microscopical studies of synapses, where asymmetrical synapses display a "postsynaptic density" but symmetrical synapses do not. The latter were identified as inhibitory synapses and therefore, by definition, inhibitory synapses do not have a postsynaptic density, but rather a postsynaptic membrane specialisation. The use of the term "postsynaptic density" should, therefore, be restricted to excitatory synapses.

      We are conscious of the importance of correct definitions and have revised the terminology, referring to “postsynaptic sites”, “postsynaptic domains”, and “postsynaptic specializations” as appropriate throughout the manuscript.

      Significance:

      The authors provide a state of the art advanced light and electron microscopical analysis of glycinergic synapses in the mouse spinal cord. They suggest a robust "stereotypical" mechanism in place, which guarantees a fixed stoichiometry of relevant components, which is even maintained in a hypomorphic mutant, which is believed to represent a mouse model of human hyperekplexia (startle disease).

      Referee cross-commenting:

      I would like to corroborate the arguments of the previous reviewer: it is not clear to which extent the fusion proteins influence the measurements, which are technically very advanced and well done, however. The authors do definitely not investigate "native spinal cord" as stated in the title.

      The argument concerning fusion proteins must be taken especially seriously as the fusions were induced in regions known to be responsible for assembly of glycine receptors and oligomerization of gephyrin.

      We have verified the receptor function with electrophysiological recordings and clarified exactly where the fluorescent protein was inserted (see reviewer 1 response). Given the similarity in synapse size, fluorescence intensities and molecule densities observed in neurons expressing different combinations of tagged and native receptors and scaffold proteins, we strongly believe that all animal models used are well suited to the experimental aims of our study.

      Reviewer 3:

      Summary:

      Glycinergic synapses are the least well understood of synapses that mediate fast synaptic transmission. The manuscript by Maynard et al. adds new information about the structural aspects of these synapses, using PALM and EM imaging of spinal cord synapses from mice at 2 and 10 months. The authors created a knock-in mouse that expresses a tagged GlyRbeta subunit, allowing synaptic localization of glycine receptors; all synaptically localized glycine receptors are thought to require the beta subunit to be tethered by gephyrin. The authors compare synaptic profiles from: 2 month old vs. 10 month old mice; dorsal vs. ventral horn; and GlyR1-reduced vs. wild type mice. Strikingly, they find a tight relationship across all of these variables between glycine receptor puncta and gephyrin puncta, as well as an apparently constant "packing density" of glycine receptors. They conclude that synaptic extent is likely to be the most important determinant of synaptic strength, as the density of receptors within the postsynaptic density is constant. These results use cutting-edge imaging and are analyzed with care, and add new information to our understanding of these relatively less well characterized synapses.

      Major comments:

      The key conclusions are convincing and the claims appear solid. Additional experiments are not needed to support these claims. The data and the methods are largely presented in such a way that they can be reproduced, although there are minor suggestions for improvement below.

      We thank the reviewer for his/her positive comments.

      Minor comments:

      Do the authors have any comment on the requirement during, e.g. LTP, for insertion of a gephyrin-GlyR unit? The lead author has speculated that gephyrin creates "slots" for GlyRs; yet apparently each slot is already filled in the snapshots taken here. How might postsynaptic LTP occur (Kandler group, Kauer group papers)?

      Given the reciprocity of GlyR and gephyrin clustering at synapses, the occupancy of binding sites (and in turn the number of available ‘slots’) is dependent on the strength of receptor-scaffold interactions, as discussed previously (Specht 2020, Neuropharmacol). In this study we demonstrate that the density of GlyRs at synapses is constant, which implies that the receptor occupancy is also the same, with the possible exception of mixed inhibitory synapses in the superficial dorsal horn that contain a majority of GABAARs. The PALM/SRRF data are represented as rendered image reconstructions and not as pointillist representations, and the detection of unoccupied binding sites is below the spatial resolution of our approach. However, the high spatial correlation of the signal intensities (ICQ ≈ 0.3) suggests that receptor occupancy is equal between and within synapses. It has previously been established that there are more scaffold proteins than receptors at synapses (Specht et al. 2013, Neuron; Patrizio et al. 2017, Sci Rep). Based on these studies we report that approximately half the gephyrin binding sites are occupied by receptors (lines 262-655). We have also expanded the discussion, describing how shape and size of synapses may affect synaptic transmission, as well as the possible role of receptor-gephyrin interactions in synaptic plasticity at glycinergic synapses.
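
      For readers unfamiliar with the intensity correlation quotient (ICQ) mentioned above, a minimal sketch follows. It assumes the commonly used definition (the fraction of pixels in which both channels deviate from their mean in the same direction, minus 0.5); the function name `icq` is illustrative and this is not the analysis code used in the study.

```python
import numpy as np

def icq(a, b):
    """Intensity correlation quotient of two equally shaped images (assumed definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    # Product of the deviations of each channel from its own mean intensity.
    pdm = (a - a.mean()) * (b - b.mean())
    # Fraction of pixels with a positive product, shifted to the range [-0.5, 0.5].
    return np.count_nonzero(pdm > 0) / pdm.size - 0.5
```

      Under this definition the value ranges from -0.5 (segregated signals) through 0 (random overlap) to 0.5 (signals that vary together in every pixel).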

      It would be very interesting in the discussion to contrast the present observations with what is known about excitatory synapses (NMDA and AMPAR distributions) and GABAergic synapses. Are the authors at all surprised that receptor packing is constant across conditions? Can the authors speculate on how non-gephyrin binding receptors (homomeric alpha receptors, which are found in recordings) may function and be tethered to the membrane.

      We have included additional information about receptor numbers and distributions at excitatory (lines 428-438) and GABAergic (lines 389-393) synapses in the discussion. So far, homomeric GlyRs composed of alpha subunits have been found to be exclusively extrasynaptic. As stated on page 4, lines 111-112 the beta subunit is required for binding of the GlyR to gephyrin and subsequent anchoring at the synapse. Previous studies have shown exocytosis of receptors to occur at extrasynaptic sites followed by lateral diffusion to synapses. Homomeric GlyRs are therefore most likely targeted to the extrasynaptic plasma membrane where they remain due to the lack of the beta subunit.

      Figure S1. It would be most helpful to quantify this; at the least to include an atlas-like drawing to allow identification of the structures illustrated and containing Glrb; better yet would be quantification of staining in regions where this is strongest.

      We have added an atlas indicating the different brain regions expressing mEos4b-GlyRb protein as a new Supplementary Fig. S3. The regional expression pattern agrees with the available literature about protein expression of the GlyRb subunit in different brain regions and hence provides further evidence that mEos4b-GlyRb is expressed like the native receptor. Due to the relatively low resolution of the tiled image no accurate quantification was possible. We have however added higher magnification confocal images of representative brain regions expressing varying amounts of GlyRb.

      The fact that the lower panel in B is labeled as +/+ across all groups is initially confusing; perhaps relabel as mEos4 -/-, +/- and +/+?

      We assume that the reviewer is referring to Fig. 1B. The genotype of both the GlrbEos and the GphnmRFP allele is now indicated on the x-axes, and the legend has been modified to clarify that all these animals were homozygous for GphnmRFP/mRFP. We have strived to remain consistent throughout the manuscript when referring to genotypes and protein levels.

      Do gephyrin levels drop in WT mice as well as in the mEosr-GlyRb mouse between 2 and 10 months? Do the authors have any thoughts on this (Supp figure S2)?

      We found no differences in gephyrin levels between 2 and 10 months. Fig. S2 (now Fig. S4C) shows the number of synaptic gephyrin clusters, which was the same at different ages and genotypes.

      Significance:

      Glycinergic synapses are the least well understood of synapses that mediate fast synaptic transmission. The manuscript by Maynard et al. adds new information about the structural aspects of these synapses, using PALM and EM imaging of spinal cord synapses from mice at 2 and 10 months. The authors created a knock-in mouse that expresses a tagged GlyRbeta subunit, allowing synaptic localization of glycine receptors.

      This will be of interest to those studying inhibitory synapses, and more broadly to synaptic morphologists, physiologists and imagers for comparison with other synapse types.

      My own expertise is NOT in these techniques, but I am a synaptic physiologist with a standing interest in glycinergic synapses; thus I am not providing serious technical critiques.

      Referee cross-commenting:

      Hi all, I agree with the other two reviewers, and do not have anything else to add.

      Reviewer 4:

      Summary:

      The authors used a correlative approach and combined photo-activated localization microscopy with electron microscopy to characterise Glycinergic synapses in spinal cord tissue. Some of the major findings are:

      • The receptor-scaffold occupancy and packing densities of glycinergic synapses in different regions of the spinal cord are the same.
      • Gephyrin clusters in the spinal cord are composed of sub-domains that shape the GlyR clusters.
      • Ventral horn synapses are generally larger, more complex (containing a number of gaps) and contain more GlyRs.
      • In a mouse model of Hyperekplexia, the number of GlyRs is reduced resulting in smaller synapses in the ventral spinal cord.

      Major comments:

      Are the key conclusions convincing? Yes

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. No

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. N/A

      Are the data and the methods presented in such a way that they can be reproduced? Yes

      Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments:

      Specific experimental issues that are easily addressable. Please see below

      Are prior studies referenced appropriately? Yes

      Are the text and figures clear and accurate? Yes

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Please see below.

      As the authors pointed out, fusing mEos to the extrasynaptic terminal of GlyRb has been difficult and therefore this construct would benefit the larger scientific community. Fig 1C is a nice imaging control for expression efficiency, however, it is in stark contrast with the lack of functional control. Do authors have any electrophysiological evidence showing that the insertion of mEos4b doesn't modulate channel function? I would assume that the construct would be tested in cell lines before the KI mouse line was created. Was any functional analysis done? If yes, it would be very useful to show it. I do appreciate that the authors used a standard insertion between the 4th and 5th AA in the extracellular domain, which in most cases does not abolish channel function. Given the lack of an obvious phenotype in the KI mouse model, I believe that this is also the case here. However, I disagree with the statement in lines 120-121: "the presence of the N-terminal fluorophore does not affect receptor expression and function." I believe that if there are no electrophysiological measurements of GlyR function, this statement remains speculative. As the authors pointed out in their previous publication: "receptor function and gephyrin binding are not independent properties. Instead, we think that conformational changes triggered at extracellular or intracellular protein domains have downstream consequences on channel opening as well as receptor clustering." In line with this, my concern is that the modulation of channel function by mEos4b could result in an altered cluster size at synapses. There is a large body of literature showing that just one missense mutation in the extracellular domain of ion channel subunits can lead to synaptopathies because the channel function gets modulated, and there is an abundance of similar examples involving mutations of GlyR and GABAAR subunits. 
      In my view, comparing the function of GlyRs incorporating wt-GlyRb and mEos4b-GlyRb subunits is important for the correct interpretation of the main findings of this work and would strengthen the publication.

      As the reviewer points out, the insertion of the mEos4b sequence was considered carefully in order to have the least impact on receptor function. GlyR channelopathies are often caused by point mutations within the coding sequence, which is not the case in the GlrbEos allele. Instead, the mEos4b sequence was inserted after the signal peptide of GlyRb, duplicating several amino acid residues in order to maintain the correct cleavage site and N-terminus of the mature receptor, and to not interrupt the GlyRb coding sequence (Fig. S1B). In order to verify that the mEos4b-tag does not affect GlyR function, we have now carried out electrophysiological experiments (new Fig. S2C). For a detailed description please see the response to the first comment of reviewer 1.

      Line 189: Are the authors making conclusions based on intensity comparison of red mEos4b and mRFP? The title of this section implies that the red form of mEos was compared to mRFP(?) But mEos converts from green to red only partially. Was the probability for conversion taken into account at this point? Please clarify which version of mEos was compared to mRFP.

      In line 189 (now 218) we compared the intensities of mRFP-gephyrin with those of converted (red) mEos4b in SRRF / PALM super-resolution images of the synapses (Fig. 2D). Since the absolute intensities are altered by the process of image reconstruction, the probability that mEos4b is photoconverted does not have to be taken into account. The constant ratio of the SRRF and PALM image intensities confirms the data in Fig. 1D showing that GlyR and gephyrin amounts are highly correlated throughout the spinal cord (with the exception of the superficial layers of the dorsal horn). We have clarified in the text that this analysis was carried out on reconstructed SRRF images of mRFP-gephyrin and PALM images of mEos4, line 202.

      Line 192: Please clarify how the density threshold was calculated/determined? This is important for the replication of the experiments, and it also has implications for the calculated probability of detection of mEos4b. I am not aware that this probability was calculated before for mEos4b and therefore other researchers may decide to rely on the value calculated here.

      We have now clarified in more detail how the probability of detection was calculated (new Supplementary Fig. S7 legend).

      In Fig. 2 Gephyrin clusters look consistently smaller than GlyR clusters, which is inconsistent with the published work. I assume that the difference in size is a consequence of different image reconstruction methods(?) However, I would assume that SRRF would have lower resolution than your PALM measurements and that would result in wider Gephyrin clusters. Could you please explain this discrepancy? Also, could you provide an estimate for the image resolution in SRRF and PALM techniques? For SMLM, localization precision would suffice.

      We have provided an estimate of the resolution of the two techniques using Fourier ring correlation, which gave 46 nm for SRRF and 21 nm for PALM. Additionally, we have clarified the discrepancy between the reconstruction methods (page 6, lines 194-200): “The spatial resolution was estimated using Fourier ring correlation (FRC), which measures the similarity of two images as a function of spatial frequency by comparing the odd and even frames of the raw image sequence. According to this analysis, the spatial resolution of SRRF was 46 nm and that of PALM 21 nm. It should be noted that the synaptic puncta in the SRRF images appear somewhat smaller and brighter due to differences in the reconstruction methods that result in differences in the dynamic intensity range.”
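
      As an illustration of the FRC principle described above, the following sketch computes an FRC curve for two square images, which in practice would be reconstructed from the odd and the even frames of a recording. It is a generic textbook formulation, not the implementation used for the reported values; the function name and the integer ring binning are assumptions made for the example.

```python
import numpy as np

def fourier_ring_correlation(img1, img2):
    """Return the FRC curve (one value per frequency ring) of two square images."""
    img1 = np.asarray(img1, dtype=float)
    img2 = np.asarray(img2, dtype=float)
    if img1.shape != img2.shape or img1.shape[0] != img1.shape[1]:
        raise ValueError("expected two square images of equal shape")
    n = img1.shape[0]
    # Centred 2D Fourier transforms of both images.
    f1 = np.fft.fftshift(np.fft.fft2(img1))
    f2 = np.fft.fftshift(np.fft.fft2(img2))
    # Integer ring index (radial spatial frequency) of every Fourier pixel.
    y, x = np.indices((n, n)) - n // 2
    ring = np.hypot(x, y).astype(int).ravel()
    n_rings = n // 2

    def ring_sum(values):
        # Sum the values over each ring, keeping rings inside the Nyquist limit.
        return np.bincount(ring, weights=values.ravel(), minlength=n_rings)[:n_rings]

    num = ring_sum((f1 * np.conj(f2)).real)
    den = np.sqrt(ring_sum(np.abs(f1) ** 2) * ring_sum(np.abs(f2) ** 2))
    # Rings without any signal power are reported as zero correlation.
    return num / np.where(den > 0, den, np.inf)
```

      A resolution estimate is typically read off as the inverse of the spatial frequency at which this curve first falls below a fixed threshold; the threshold is a choice of the analysis and is not specified in the text above.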

      Why is the data in Fig. 5D and E represented as Detections/Synapse instead of GlyRs/Synapse? Could you please re-plot this so that a comparison with Fig. 2H and I is straightforward?

      We have converted the detections to receptor copy numbers as requested (Fig. 5D,E).

      Figure S5C: for P=0.5, P²=0.25. Please correct. Also, I assume that the second graph is what would be observed experimentally for dimers and P=0.5. Please clarify in the figure caption.

      This was a mistake and has been corrected. We have also clarified which parts of the calculations are theoretical and which values were derived from our experimental data. We have provided a more detailed description in the figure legend of Supplementary Fig. S7.
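
      To make the arithmetic behind this point explicit, the sketch below assumes a simple binomial model in which each fluorophore of a complex is detected independently with probability P. The function names are illustrative, and the model is a generic one that may differ in detail from the calculation shown in Supplementary Fig. S7.

```python
from math import comb

def detection_distribution(n_fluorophores, p_detect):
    """Probability of detecting k = 0..n fluorophores under a binomial model."""
    n, p = n_fluorophores, p_detect
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def observed_distribution(n_fluorophores, p_detect):
    """Distribution of k = 1..n among complexes with at least one detection."""
    probs = detection_distribution(n_fluorophores, p_detect)
    visible = 1 - probs[0]  # complexes with zero detections are never seen
    return [q / visible for q in probs[1:]]
```

      For dimers with P = 0.5, `detection_distribution(2, 0.5)` gives 0.25, 0.5 and 0.25 for zero, one and two detected fluorophores, so among the visible dimers two thirds appear as single detections and one third as double detections.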

      Line 606: Please provide a complete derivation of this formula.

      We have provided a full derivation of this formula (new Fig. S7C).

      Significance:

      The work described here seems to be a natural progression of a publication by Patrizio et al., 2017 that came out from the same laboratory. This study uses advanced methodologies in the imaging space to visualise and characterise Glycinergic synapses in spinal cord tissue. The experiments described here are technically demanding as evidenced by the relatively small number of publications describing super-resolution measurements in tissue samples. Even more rare are studies that attempt to do single protein counting in neuronal culture and tissue sections. Therefore, I believe that this work brings significant technical advancement in the field of super-resolution and correlative microscopy. The findings are also highly significant for all fields of neuroscience in which the structure of inhibitory Glycinergic synapse is relevant, ranging from the fundamental understanding of inhibitory synapse function to pathologies involving Glycinergic signalling.

      I have substantial experience in different microscopy methods, including quantitative super-resolution microscopy based on single molecule counting. My background also covers the structure and function of GABAA and Glycine receptors using electrophysiology. I am familiar with the methods used in electron microscopy and the process of creating KI mouse lines, however I don't have hands-on experience in these fields.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #4

      Evidence, reproducibility and clarity

      Summary:

      The authors used a correlative approach and combined photo-activated localization microscopy with electron microscopy to characterise Glycinergic synapses in spinal cord tissue. Some of the major findings are:

      • The receptor-scaffold occupancy and packing densities of glycinergic synapses in different regions of the spinal cord are the same.
      • Gephyrin clusters in the spinal cord are composed of sub-domains that shape the GlyR clusters.
      • Ventral horn synapses are generally larger, more complex (containing a number of gaps) and contain more GlyRs.
      • In a mouse model of Hyperekplexia, the number of GlyRs is reduced resulting in smaller synapses in the ventral spinal cord.

      Major comments:

      • Are the key conclusions convincing? Yes
      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No
      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. No
      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. N/A
      • Are the data and the methods presented in such a way that they can be reproduced? Yes
      • Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments:

      • Specific experimental issues that are easily addressable. Please see below
      • Are prior studies referenced appropriately? Yes
      • Are the text and figures clear and accurate? Yes
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Please see below.
      1. As the authors pointed out, fusing mEos to the extrasynaptic terminal of GlyRb has been difficult and therefore this construct would benefit the larger scientific community. Fig 1C is a nice imaging control for expression efficiency, however, it is in stark contrast with the lack of functional control. Do authors have any electrophysiological evidence showing that the insertion of mEos4b doesn't modulate channel function? I would assume that the construct would be tested in cell lines before the KI mouse line was created. Was any functional analysis done? If yes, it would be very useful to show it. I do appreciate that the authors used a standard insertion between the 4th and 5th AA in the extracellular domain, which in most cases does not abolish channel function. Given the lack of an obvious phenotype in the KI mouse model, I believe that this is also the case here. However, I disagree with the statement in lines 120-121: "the presence of the N-terminal fluorophore does not affect receptor expression and function." I believe that if there are no electrophysiological measurements of GlyR function, this statement remains speculative. As the authors pointed out in their previous publication: "receptor function and gephyrin binding are not independent properties. Instead, we think that conformational changes triggered at extracellular or intracellular protein domains have downstream consequences on channel opening as well as receptor clustering." In line with this, my concern is that the modulation of channel function by mEos4b could result in an altered cluster size at synapses. There is a large body of literature showing that just one missense mutation in the extracellular domain of ion channel subunits can lead to synaptopathies because the channel function gets modulated, and there is an abundance of similar examples involving mutations of GlyR and GABAAR subunits. In my view, comparing the function of GlyRs incorporating wt-GlyRb and mEos4b-GlyRb subunits is important for the correct interpretation of the main findings of this work and would strengthen the publication.
      2. Line 189: Are the authors making conclusions based on intensity comparison of red mEos4b and mRFP? The title of this section implies that the red form of mEos was compared to mRFP(?) But mEos converts from green to red only partially. Was the probability for conversion taken into account at this point? Please clarify which version of mEos was compared to mRFP.
      3. Line 192: Please clarify how the density threshold was calculated/determined? This is important for the replication of the experiments, and it also has implications for the calculated probability of detection of mEos4b. I am not aware that this probability was calculated before for mEos4b and therefore other researchers may decide to rely on the value calculated here.
      4. In Fig. 2 Gephyrin clusters look consistently smaller than GlyR clusters, which is inconsistent with the published work. I assume that the difference in size is a consequence of different image reconstruction methods(?) However, I would assume that SRRF would have lower resolution than your PALM measurements and that would result in wider Gephyrin clusters. Could you please explain this discrepancy? Also, could you provide an estimate for the image resolution in SRRF and PALM techniques? For SMLM, localization precision would suffice.
      5. Why is the data in Fig. 5D and E represented as Detections/Synapse instead of GlyRs/Synapse? Could you please re-plot this so that a comparison with Fig. 2H and I is straightforward?
      6. Figure S5C: for P=0.5, P²=0.25. Please correct. Also, I assume that the second graph is what would be observed experimentally for dimers and P=0.5. Please clarify in the figure caption.
      7. Line 606: Please provide a complete derivation of this formula.

      Significance

      The work described here seems to be a natural progression of a publication by Patrizio et al., 2017 that came out from the same laboratory. This study uses advanced methodologies in the imaging space to visualise and characterise Glycinergic synapses in spinal cord tissue. The experiments described here are technically demanding as evidenced by the relatively small number of publications describing super-resolution measurements in tissue samples. Even more rare are studies that attempt to do single protein counting in neuronal culture and tissue sections. Therefore, I believe that this work brings significant technical advancement in the field of super-resolution and correlative microscopy. The findings are also highly significant for all fields of neuroscience in which the structure of inhibitory Glycinergic synapse is relevant, ranging from the fundamental understanding of inhibitory synapse function to pathologies involving Glycinergic signalling.

      I have substantial experience in different microscopy methods, including quantitative super-resolution microscopy based on single molecule counting. My background also covers the structure and function of GABAA and Glycine receptors using electrophysiology. I am familiar with the methods used in electron microscopy and the process of creating KI mouse lines, however I don't have hands-on experience in these fields.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Glycinergic synapses are the least well understood of synapses that mediate fast synaptic transmission. The manuscript by Maynard et al. adds new information about the structural aspects of these synapses, using PALM and EM imaging of spinal cord synapses from mice at 2 and 10 months. The authors created a knock-in mouse that expresses a tagged GlyRbeta subunit, allowing synaptic localization of glycine receptors; all synaptically localized glycine receptors are thought to require the beta subunit to be tethered by gephyrin. The authors compare synaptic profiles from: 2 month old vs. 10 month old mice; dorsal vs. ventral horn; and GlyR1-reduced vs. wild type mice. Strikingly, they find a tight relationship across all of these variables between glycine receptor puncta and gephyrin puncta, as well as an apparently constant "packing density" of glycine receptors. They conclude that synaptic extent is likely to be the most important determinant of synaptic strength, as the density of receptors within the postsynaptic density is constant. These results use cutting-edge imaging and are analyzed with care, and add new information to our understanding of these relatively less well characterized synapses.

      Major comments:

      The key conclusions are convincing and the claims appear solid. Additional experiments are not needed to support these claims. The data and the methods are largely presented in such a way that they can be reproduced, although there are minor suggestions for improvement below.

      Minor comments:

      Do the authors have any comment on the requirement during, e.g. LTP, for insertion of a gephyrin-GlyR unit? The lead author has speculated that gephyrin creates "slots" for GlyRs; yet apparently each slot is already filled in the snapshots taken here. How might postsynaptic LTP occur (Kandler group, Kauer group papers)?

      It would be very interesting in the discussion to contrast the present observations with what is known about excitatory synapses (NMDA and AMPAR distributions) and GABAergic synapses. Are the authors at all surprised that receptor packing is constant across conditions? Can the authors speculate on how non-gephyrin binding receptors (homomeric alpha receptors, which are found in recordings) may function and be tethered to the membrane.

      Figure S1. It would be most helpful to quantify this; at the least to include an atlas-like drawing to allow identification of the structures illustrated and containing Glrb; better yet would be quantification of staining in regions where this is strongest.

      The fact that the lower panel in B is labeled as +/+ across all groups is initially confusing; perhaps relabel as mEos4 -/-, +/- and +/+?

      Do gephyrin levels drop in WT mice as well as in the mEosr-GlyRb mouse between 2 and 10 months? Do the authors have any thoughts on this (Supp figure S2)?

      Significance

      Glycinergic synapses are the least well understood of synapses that mediate fast synaptic transmission. The manuscript by Maynard et al. adds new information about the structural aspects of these synapses, using PALM and EM imaging of spinal cord synapses from mice at 2 and 10 months. The authors created a knock-in mouse that expresses a tagged GlyRbeta subunit, allowing synaptic localization of glycine receptors.

      This will be of interest to those studying inhibitory synapses, and more broadly to synaptic morphologists, physiologists and imagers for comparison with other synapse types.

      My own expertise is NOT in these techniques, but I am a synaptic physiologist with a standing interest in glycinergic synapses; thus I am not providing serious technical critiques.

      Referee Cross-commenting

      Hi all, I agree with the other two reviewers, and do not have anything else to add.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Identification of a stereotypic molecular arrangement of glycine receptors at native spinal cord synapses

      Maynard et al. investigate (inhibitory) glycinergic synapses in mouse spinal cord, which regulate motor and sensory processes. The authors analyse the molecular architecture and ultra-structure of these synapses in native spinal cord tissue using quantitative super-resolution correlative light and electron microscopy. The major finding ist that GlyRs exhibit equal receptor-scaffold occupancy and constant absolute packing densities across the spinal cord and throughout adulthood, although ventral and dorsal inhibitory synapses differ in size. Moreover, what the authors call a „stereotypic arrangement" is even maintained in a hypomorphic mutant (oscillator), which is deficient in the adult GlyR a1 subunit.

      To reach their conclusions the authors generate two knock-in mouse lines, one with mEos-labelled GlyR β-subunit and one with mRFP-labelled gephyrin, a subsynaptic scaffolding protein of inhibitory synapses, which are subsequently crossed. Both changes are not unproblematic, as mutations in the N-terminal end of the GlyR β subunit polypeptide chain might interfere with the assembly of functional GlyR (consisting of α and β subunits) and mutations at the N-terminal end of gephyrin interfere with its homo-oligomerization into higher molecular assemblies.

      However, in this experimental design both labelled proteins reach postsynaptic membrane specialisations. In the case of the β-subunit, quantitative evaluation confirms that heterozygous animals contain only half as much labelled protein as homozygous animals, which is an indication but not a proof that the correct stoichiometry of adult GlyR is maintained. Likewise, mRFP-labelled gephyrin assembles with WT-gephyrin in subsynaptic domains, but it is not clear whether the size and density of the synapses are changed by the knock-in procedure as compared to WT synapses.

      Accepting these constraints, which to the knowledge of this reviewer have never been addressed to satisfaction, the authors provide a technically excellent, comprehensive analysis of glycinergic synapses in the spinal cord of double knock-in mice. Therefore, it should be stated in the title that the investigations were performed with double knock-in instead of "native" spinal cord. Text and figures are clear and accurate and represent the state of the art.

      Finally, the reviewer would like to raise a minor point: the term postsynaptic density is derived from electron microscopical studies of synapses, where asymmetrical synapses display a "postsynaptic density" but symmetrical synapses do not. The latter were identified as inhibitory synapses and therefore, by definition, inhibitory synapses do not have a postsynaptic density, but rather a postsynaptic membrane specialisation. The use of the term "postsynaptic density" should, therefore, be restricted to excitatory synapses.

      Significance

      The authors provide a state of the art advanced light and electron microscopical analysis of glycinergic synapses in the mouse spinal cord. They suggest a robust "stereotypical" mechanism in place, which guarantees a fixed stoichiometry of relevant components, which is even maintained in a hypomorphic mutant, which is believed to represent a mouse model of human hyperekplexia (startle disease).

      Referee Cross-commenting

      I would like to corroborate the arguments of the previous reviewer: it is not clear to which extent the fusion proteins influence the measurements, which are technically very advanced and well done, however. The authors do definitely not investigate "native spinal cord" as stated in the title.

      The argument concerning fusion proteins must be taken especially seriously, as the fusions were introduced in regions known to be responsible for assembly of glycine receptors and oligomerization of gephyrin.

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript Maynard et al. describe a newly generated knock-in mouse to study the endogenous distribution of Gly receptors in the spinal cord. Using quantitative confocal imaging and SMLM, the distribution and levels of GlyRs at spinal cord synapses are compared between dorsal and ventral horn. They found that levels of synaptic GlyR are higher in dorsal than ventral spinal cord synapses. Nevertheless, the ratio to gephyrin seems constant, except for synapses in superficial layers of the dorsal horn, where gephyrin levels exceeded the levels of GlyRs. There are also fewer, but larger synapses in the ventral horn than in the dorsal horn. These findings are further corroborated by an SR-CLEM approach. Furthermore, it is shown that in a mouse model for hyperekplexia GlyR levels are lower, but still enriched at synapses, and the dorsal-ventral gradient in GlyR expression was maintained. The difference in size of ventral and dorsal synapses observed in WT animals was also lost in the oscillator mouse, suggesting that particularly the ventral synapses are affected. Despite these differences, the density of GlyRs per synapse remained similar.

      Major comments:

      • Line 113: "labeling the β-subunit has proven difficult". This statement is unclear; it would be informative for readers to learn what exactly has been difficult and why the approach described here overcomes it. Related to that, the authors state "KI animals reach adulthood and display no overt phenotype, suggesting that the presence of the N-terminal fluorophore does not affect receptor expression and function". That is indeed reassuring, but it does not exclude that receptor numbers, function and distribution are altered. As it seems there is no prior literature on tagging the beta subunit, additional evidence that the tag does not interfere with receptor trafficking or functioning would be desirable.
      • In the Discussion the authors conclude that "Our quantitative SR-CLEM data lend support to the first model, whereby inhibitory PSDs in the spinal cord are composed of sub-domains that shape the distribution of the GlyRs". This conclusion, however, seems to be based on one example image in Fig 3G that is not very convincing. The EM image seems to show two clearly separated PSDs opposed by two distinct active zones. So, although this conclusion is of high interest, more support should be given to substantiate it. More generally, these subsynaptic domains (SSDs) are hardly explored further, but seem relevant for transmission, particularly given that the synaptic pool of GlyRs at these synapses is not saturated by single release events. How general are these SSDs at these synapses?
      • The approach for counting molecules based on the PALM acquisition has been developed in prior publications and seems robust. It would, however, be worth presenting the reader with a bit more background and explaining the assumptions of this approach in more detail, particularly since counting of mEos4b can be problematic: there are multiple dark and fluorescent states of this fluorophore that could be influenced by the illumination scheme, see for instance De Zitter et al., Nat Methods 2019. Since the preceding SRRF acquisition already exposes the fluorophore to high and continuous 561-nm laser power, this could skew the counting due to unaccounted conversion and perhaps bleaching of mEos4b. In line with this, although the term 'absolute copy numbers' is used throughout the manuscript, the reported numbers are at best an estimate based on a number of assumptions. I think the wording 'absolute numbers' is therefore deceiving and should be nuanced.
      • Related, most of the quantifications concern estimating the number of receptors, and not so much their distribution within the PSD. The term "molecular arrangement", also used in the title, might therefore be misleading; there is in fact little characterization of how GlyRs are placed within the PSD. More focused analysis quantifying the distribution of receptors within the PSD and/or SSDs would strengthen the manuscript.
      • The reported N is confusing and makes it hard to judge the reproducibility of the data. Sometimes it refers to number of images, sometimes number of synapses, but it is unclear from how many experiments these are drawn. This should be reported more completely (number of animals should be reported at least) and consistently. In figure 1, the N numbers (N=3-5 images) are particularly low and question how consistent these findings are across multiple animals.
      • The levels of mRFP-Gephyrin seem to differ between the different mouse lines, is this a significant difference?
      • The ICQ analysis for co-localization is hardly explained. How do we interpret this parameter? What does an average value of ~0.3 mean? A comparison with sets of proteins that do not overlap as a negative control would strengthen the conclusion.

      Minor comments:

      • "Very little fluorescence was detected in the forebrain, despite the high reported expression of the Glrb transcript". Can the authors expand on this? What would explain this discrepancy?
      • What region is quantified in Fig 1B? Is it the same region in all conditions? This should be specified more clearly, as the manuscript presents a clear gradient in expression levels in the spinal cord and thus the location will influence the intensity measurements.
      • The labeling approach does not differentiate between surface and internal receptors, this should be made more explicit in the text.

      Significance

      The presented data are interesting and the experiments are technically advanced and carefully performed. Particularly the SR-CLEM approach is technically advanced. The datasets present a quantitatively detailed characterization of spinal cord synapses and will be of interest for researchers working in the field of spinal cord circuitry, as well as super-resolution imaging. The conceptual advance for the field is however somewhat limited. It seems that the presented data confirm the general notion that receptor numbers and synapse size are highly correlated. So, although this manuscript describes very interesting observations, in its present form the manuscript does not provide any new mechanistic insight or significant advance in our understanding of how these synapses operate.

      Referee Cross-commenting

      I agree with the other reviewers that this study is technically advanced, but I remain critical towards the extent of conceptual advancement this study brings and there are some important concerns with the presented data that need to be addressed. Nevertheless, indeed many of these concerns can be addressed without additional experiments. As pointed out also by other reviewers additional validation that the fusion proteins are not disrupting their function or organization would be important.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We thank the reviewers for their critical review of our manuscript. We are excited to see that the reviewers agree that we have presented high-quality data that advances the centrosome field and is worthy of publication following revision. The authors also agree with the reviewers that the data presentation requires improvement, that some experiments require additional replicates with robust statistical analyses and that a model or summary would help clarify the differences between previously published results and ours. We will address all these concerns in the revised version of our manuscript. The reviewer comments in their entirety can be found below in italic followed by our response in bold.

      Considering that the manuscript was very well received we believe it makes a strong candidate for publication in eLife. In terms of editors at eLife, we believe that Anna Akhmanova and Jeremy Reiter would be very well suited to handle this manuscript.

      We hope that you will concur with us that the revision plan detailed below adequately addresses the reviewers’ comments.

      2. Description of the planned revisions

      Reviewer 1, Major points

      • Previous data suggested that an important role of TRIM37 was to limit accumulation of CEP192 levels, yet here CEP192 levels appeared unchanged in TRIM37 knockout cells that stably express wild-type or RING domain mutant TRIM37. However, in agreement with previous work, transient expression of TRIM37 reduced CEP192 levels along with those of other PCM and centriole components in an E3-dependent manner. These data are rather confusing in light of the literature, and the current report does not really deal with these discrepancies but to me they suggest that high levels of TRIM37 can target multiple centrosome components for degradation, but this may be an experimental artefact.

      We agree that acutely overexpressed TRIM37 results in decreased CEP192 levels and that this is consistent with published results. We also provide evidence that CEP192 levels are not correspondingly increased in the absence of TRIM37, nor are they decreased in a cell line that stably overexpresses FLAG-BirA TRIM37. This suggests that the decreased CEP192 (and PCNT and CEP120) after acute overexpression of TRIM37 might be short-lived or a consequence of overexpression. We will discuss this possibility more clearly in the revised manuscript. In addition, we will perform Western blots for TRIM37 in wild type cells, cells stably expressing FLAG-BirA TRIM37 and cells induced to express TRIM37-3xFLAG to more directly compare the amount of TRIM37 present in these cell lines.

      • The choice of cells for particular experiments is not always stated or explained. For instance, in Figure 3A: Trim37 KO pool used while in Figure 3B TRIM37 single KO. These are then combined with both transient and stable expression of TRIM37 mutants.

      We apologize for this and will clarify the choice of cell lines in the results section. Importantly, because some of our results challenge previously published reports, we performed critical experiments using multiple cell lines. For example, we show that centrinone B-induced growth arrest is independent of TRIM37 E3 ligase activity using a single RPE-1 TRIM37-/- clone, an RPE-1 TRIM37-/- pool and an A375 TRIM37-/- pool. We feel this is a highlight of our work and this new data will be included in the revised version of the manuscript and will be emphasized.

      • Two different concentrations (200 nM and 500 nM) of centrinone were used to compare responses of too many or no centrosomes in RPE1 and A375. While these concentrations result in centrosome amplification (200 nM) and loss (500 nM) in RPE1 cells, the phenotypes seem much less clear-cut in A375 cells. At 200 nM, 70% of cells have 0 or 1 centrioles (~35% each category) and only about 15% have centrosome amplification, whereas centrosome amplification occurs in 30% of RPE1 with 0-1 centrioles seen in fewer than 10% (Figure 4 - figure supplement 1H). Hence the different outcomes of centrinone treatment make conclusions about cell-type specific responses difficult. This difference may be due to differences in drug uptake/efflux, PLK4 activity or in expression of other components of these pathways. In fact, 167 nM centrinone B in A375 cells would have been a much closer match to the 200 nM treatment of RPE-1. These points should be discussed as they impact the conclusions.

      The reviewer rightly points out that the response to centrinone appears to differ between cell types, as shown previously (Meitinger et al., 2020 and Yeow et al., 2020), and that this difference may impact our conclusions. Although we don’t think that the major conclusions drawn will change, we will discuss these caveats within the results and discussion of the manuscript.

      • I find the different outcomes of stable versus acute expression of TRIM37 ligase mutant confusing. Here, stable expression of TRIM37 ligase mutant increases mitotic length compared to that of TRIM37 wild-type, which contradicts a recent report by Meitinger et al. (2021). What could be the potential reason for these differences?

      It is unclear why we obtain results that differ from Meitinger et al. We are using similar cell lines (RPE-1 hTert vs. RPE-1 hTert Cas9) with similar TRIM37 constructs (TRIM37-3xFLAG) that are induced in similar ways (both are doxycycline inducible but using different systems). For our experiment, we used a single TRIM37 KO clone. As an independent validation, we will repeat this experiment using our TRIM37 KO pools in both RPE-1 and A375 cells and discuss these results and implications.

      What could be the mechanism for TRIM37 action in regulating spindle assembly/mitotic duration and cell proliferation upon centrosome loss? How do those acentrosomal MTOCs form that decrease mitotic duration and promote proliferation?

      These are insightful questions that we feel lie at the heart of TRIM37 function. Current models posit that in the absence of TRIM37, PLK4 condensates form and are required to nucleate ectopic accumulations of PCM components (ex. CEP192) that facilitate mitosis (Meitinger et al. 2020). A number of our findings are not consistent with this model. First, PLK4 is detected in the Cenpas/condensates only using a single antibody (Wong et al., 2015) (two other antibodies have been reported to be used (Sillibourne et al., 2010, Moyer et al., 2015) and we have used another (Millipore MABC544 clone 6H5) - none of these three detect PLK4 at the condensates). Additionally, the PLK4 signal observed is not sensitive to PLK4 siRNA (Balestra et al. 2021, Figure 4 – figure supplement 1I). In our manuscript we also provide evidence that overexpressed PLK4-3xFLAG cannot be detected (using PLK4 or FLAG antibodies) at these structures. Moreover, our experiments using TRIM37 mutants show that Cenpas formation and ectopic PCM assembly are mechanistically distinct; Cenpas are not resolved after expression of TRIM37 C18R, yet ectopic PCM structures are suppressed (Figure 5E and G). Our data do, however, suggest that the ability to form ectopic PCM structures is inversely correlated to growth arrest activity (i.e. cells that form ectopic PCM fail to arrest). How these structures form and how they affect growth arrest are still critical, open questions. We will discuss these possibilities further in the revised manuscript.

      Do the authors find a difference in the % of cells expressing TRIM37 mutants upon stable or acute expression? This part needs a better summary, and again a table would help. I also wonder about protein expression levels; wild-type FB-TRIM37 seems to be expressed at much lower levels than the mutants in Figure 5B.

      The differences in overall abundance are not due to heterogeneous expression within the population. The TRIM37 mutants are expressed in all cells after stable and acute expression. We will provide quantification of immunofluorescence images and statistics to show this. TRIM37 mediates its own degradation in an E3-dependent manner (Meitinger et al. 2021, Figure 3f). Our results are consistent with this, as the TRIM37 C18R and TRIM37 ΔRING mutants have a higher overall abundance compared to TRIM37 or TRIM37 Δ505-709. These experiments are ongoing and we will discuss this further in the revised manuscript and provide a summary table.

      • Other means of centrosome depletion (Cenpj, SAS6 etc) would have been useful to include in the manuscript in support of E3 ligase dependent and independent roles of TRIM37. It is not essential to perform these experiments but if data are available, including these would improve the paper.

      We will generate new data using a double TRIM37 KO, SASS6 KO line to address TRIM37 ligase-dependent and -independent functions.

      • The authors show that TRIM37 regulates PLK4 phosphorylation and that this modification could only be observed in HEK293T and not in RPE1. Why would there be a difference between HEK293 and RPE1?

      We will address this by surveying a panel of cell lines to determine if there are any cell type dependent differences in TRIM37 modification. Any potential differences will be addressed in the discussion.

      • Statistical analysis for graphs should be included. Figure 5 is ok but graphs in Figures 3, 4, 6, 7 would benefit.

      This point is well taken. In the revised manuscript, we will ensure that all experiments are performed in biological triplicate and that proper statistical analyses are included to support our conclusions.

      • The authors characterise TRIM37 localisation. They detect it at centrosomes (as shown by Yeow et al 2021) and more specifically at the PCM, but apparently the signal is not present in all cells. They should also provide a quantification of the % of cells with centrosomal TRIM37 signal and compare this to cells expressing Flag-tagged Trim37. The specificity of the antibody signal using TRIM37-/- should be confirmed.

      We will perform immunofluorescence experiments using wild type and TRIM37-/- cells to demonstrate the specificity of the antibody signal. We will also provide a more detailed analysis regarding TRIM37 localization noting 1) the number of cells with centrosomal TRIM37 2) cell cycle correlation with centrosomal TRIM37 and 3) a comparison with FLAG-BirA tagged TRIM37.

      Reviewer 1, Minor points

      1.Page 3: "A recent screen for mediators of supernumerary centrosome-induced arrest identified PIDDosome/p53 and placed the distal appendage protein ANKRD26 within this pathway [31]". It appears that the reference for Burigotto et al. is missing.

      This reference will be inserted.

      2.Page 6: The authors state that: TP53BP1, USP28 and CDKN1A are also suppressors in the Nutlin-3a screen and suggest that they act in a general p53 pathway. However Meitinger et al (2016) showed that depletion of TP53BP1 or USP28 did not affect the upregulation of p53 and p21 upon Mdm2 inhibition.

      Our data is consistent with previous reports that TP53BP1 and USP28 are required for cell arrest after Nutlin-3a treatment (Cuella-Martin R et al. 2016). We will discuss possible explanations for the results observed by Meitinger et al.

      3.Page 9: "First, we performed live cell imaging to measure mitotic length in cells grown in centrinone". For consistency the authors should say centrinone B here as well.

      We will change the text to indicate using centrinone B.

      4.Page 9: "Cells lacking TRIM37 suppressed the growth arrest from 150 to 500 nM centrinone B in RPE-1 and 167 to 500 nM in A375 cells". The growth data for the A375 cells seem to be missing from the figures.

      We refer to Figure 4D and Figure 4 – figure supplement 1G that contain the RPE-1 and A375 growth data, respectively. We will modify the text to more clearly refer to the data.

      5.Page 10: "Our results confirmed that PLK4 and TRIM37 form a complex in RPE-1 cells (Figure 3G)" It appears the authors referred to the wrong figure, it should be Figure 4B.

      Our apologies. The correct figure reference will be used.

      6.Figure 1C: The nuclear p53 signal is not apparent with 500 nM centrinone B in the exemplary cells. Did the authors use thresholding to quantify p53/p21 positive cells?

      The p53 staining in centrinone-treated cells is somewhat variable. To quantify the data, we used automated image analysis and set a cut off based on p53 intensity in DMSO-treated cells to indicate p53-positive cells. To improve the figure we will repeat the experiment and use a lower magnification image to show a more representative field of cells stained for p53. The quantification pipeline will be better explained in the methods section.

      7.Figure 4D and Figure 4 - Figure supplement 1G: The graph is misleading and should not be presented as a continuous line.

      We are sorry that the reviewer finds the graph misleading. We will change the way this data is presented to make it easier to understand and to facilitate indicating statistical differences. Instead of a scatter plot of all the data, we will present the data as individual boxplots at each centrinone B concentration with statistical differences indicated. We hope this will address any confusion regarding these data.

      8.Figure 5A and C: A direct and statistical comparison of mitotic timing upon expression of different Trim37 mutants to wildtype and trim37-/- cells is missing

      In Figure 5A we compare RPE-1 WT to TRIM37-/- at each centrinone B concentration and within each line we compare each centrinone B concentration to DMSO. Perhaps we do not understand the reviewer’s concern here, but we do not think any comparisons are missing from this panel. In Figure 5C, we compare the mitotic lengths between cell lines expressing TRIM37 WT or TRIM37 C18R since we focus on the requirement for the E3 ligase activity of TRIM37. For this experiment we did not include a wild-type control, but we will perform statistical analyses between control cells expressing FLAG-BirA and those expressing FB-TRIM37 WT or FB-TRIM37 C18R. We hope this addresses this concern.

      9.Figure 6B: A loading control/Ponceau staining is missing as well as the quantification of protein levels

      This experiment will be repeated for proper quantification and we will include a loading control for our representative results.

      10.Figure 6D: It is unclear if the centrosomal signal intensity was quantified in interphase or mitotic cells

      The centrosomal signal was quantified in mitotic cells only. The results and figure legend will be updated to more clearly indicate this.

      11.Figure 7C: A loading control/Ponceau staining is missing

      The experiment will be repeated and a sample will be taken prior to immunoprecipitation to indicate the input amounts for each sample.

      12.Figure 2 - figure supplement 2F and G: It would help if the authors could highlight the cell line, e.g. RPE-1 (F) or A375 (G) in the venn diagrams.

      In Figure 2 – figure supplement 2G we highlight the genes found in RPE-1 and A375 screens only in the overlap of the Venn diagram using font colour. We will colour code the hits from each cell line in panels (F) and (G). We thank the reviewer for this suggestion.

      13.Figure 4 - figure supplement 1E: it appears that the BirA antibody gives only an unspecific signal. It would be useful to show if the different TRIM37 variants are able to localise to the centrosomes. Furthermore it appears that centrosomes are missing in the C18R and 505-709 variants. It would be useful if the authors quantify centrosome numbers upon expression of different Trim37 variants as shown in Figure 4 - figure supplement 1. To make the identification of the cell easier it would help to include a DNA signal or indicate the outline of the cell.

      The anti-BirA antibody does give a slightly diffuse signal, although we disagree that it is unspecific considering that the BirA signal is only observed in cells expressing FLAG-BirA alone or BirA fusion proteins.

      We agree with this reviewer that we did not make any statements about the centrosomal localization of the TRIM37 mutants. We will re-analyze our images to quantify relative centrosomal localization of these proteins. The images as displayed in this Figure panel appear to be somewhat confusing to the reviewer. In terms of scale, only a small portion of the cell surrounding the centrosome is shown, therefore a nuclear or cell outline cannot be displayed on these images. In each image a centrosome is present, even in the C18R and 505-709 samples. We will show images of entire cells with insets to highlight the region surrounding the centrosome.

      14.The generation of stable and dox-inducible cell lines is missing in the material and methods

      We apologize for this omission. This information will be added.

      Reviewer 2, Major points

      • The centrosomal localization of endogenous TRIM37 should be validated by comparing control and knockout/knockdown cells.

      We will perform these experiments as outlined in response to Reviewer 1, Major point 8.

      • Some of the quantifications are derived from only two experiments and in many cases no statistical testing was done. The authors should test the observed effects and add extra replicates to make the data more robust, where required.

      We will ensure experiments are performed in biological triplicate and that appropriate statistical analyses are performed (see comment to Reviewer 1, Major point 7).

      • Fig. 5 supplements: panels showing effects on marker proteins in cells by IF lack quantification of the claimed effects. Without providing some type of quantifications for key findings, it is unclear how strong or penetrant the effects are.

      Quantification and statistical testing will be performed for these experiments.

      Reviewer 2, Minor points

      I would suggest a final, summarizing schematic that illustrates the main findings in a cartoon/flow chart manner.

      We will improve the discussion of our main findings as well as provide a model/table of comparisons to improve the clarity of our manuscript.

      • Please revise incorrect abstract sentence: "We identify TRIM37 as a key mediator of growth arrest when PLK4 activity is partially or fully inhibited but is not required for growth arrest triggered by supernumerary centrosomes."

      In our screens, we find that TRIM37 is required for growth arrest after treating cells with 200 and 500 nM centrinone B. Treatment of cells with 200 nM centrinone B causes centriole overduplication and our initial hypothesis was that centriole overduplication alone is inducing growth arrest. To test this in a parallel manner, we also overexpressed PLK4 to induce centriole overduplication. Surprisingly, but consistent with recently published results (Evans et al., 2020), TRIM37 was not required for growth arrest after PLK4 overexpression. Thus, TRIM37 is required for growth arrest after 200 nM centrinone treatment, but not PLK4 overexpression, yet both of these conditions induce centriole overduplication. This concept will be highlighted, discussed and clarified in the text. We will change the abstract sentence to ‘We identify TRIM37 as a key mediator of growth arrest when PLK4 activity is partially or fully inhibited, but it is not required for growth arrest after PLK4 overexpression’.

      Please also see similar comment to Reviewer 3, Major point 1.

      • In various figures and supplements showing centrosome and condensates/Cenpas, these are very difficult to distinguish due to their small size. I suggest to magnify regions of interest and/or add arrowheads in different colors marking the specific structures.

      This comment is similar to Reviewer 1, Minor point 13. We will use coloured arrowheads to indicate different structures. Where possible, we will use magnified regions to improve clarity.

      • Fig. 2A: What is the purpose of the schematics on the right of panel A? The labels in the graph are unreadable and the network diagram without any labels is also not very useful. This could be removed.

      The schematics on the right indicate a ‘generic analysis’ using the NGS sequencing data. We agree it is not essential and it will be removed.

      • Fig. 2B: The network presentation is not very easy to read. What are the functional groups/pathways here? The clusters should be labeled accordingly. What is the meaning of the different sizes of the circles? Maybe key interactions (e.g. TRIM37) could be indicated in a different color shade to highlight these?

      In our figure we tried to highlight 1) the connectivity among screening conditions and 2) complexes that were identified by the screens. In our figure, each node (other than the six hub nodes that denote a screen condition) represents a hit from the screens. Thus, the nodes are connected by edges only to the screening conditions, not to each other. In this scenario, highlighting TRIM37 ‘interactions’ would only highlight the screening conditions for which TRIM37 was a hit (200 nM RPE-1, 500 nM RPE-1, 200 nM A375, 500 nM A375). We could try to overlay functional enrichment data on the graph, but this data is presented separately in Figure 2 – figure supplement A-D. The large circles represent hits found in previous screens, as indicated in the legend. Given the challenges of this figure we will modify it to improve its clarity.

      Reviewer 3, Major points

        • The presentation throughout the manuscript sometimes made it difficult to follow exactly what the authors meant when they referred to the various doses of Centrinone used in their experiments-often using the terms "low" or "high" without specifying exactly what they mean. In Figure 1A, for example, they present a growth inhibition curve using a log10 scale of Centrinone concentration, and they conclude that growth was inhibited "at concentrations above 150nM, with full inhibition observed at concentrations greater than 200nM". I presume this is just sloppy language, as it appears that growth is significantly inhibited at 150nM and full growth inhibition is achieved at 200nM. However, in Figure 4D, the authors show another growth inhibition curve (this time presented on a linear scale) where significant growth inhibition is seen well below 100nM and full inhibition appears to be achieved at ~125nM. The discrepancy between these experiments is not noted, nor any reason for it explained.

      We agree with the reviewers and apologize for using ‘low’ and ‘high’, as they are ambiguous. We will ensure that we refer specifically to each concentration of centrinone B used (e.g. 50 nM, 150 nM). The comparison between Figure 1A and Figure 4D is not straightforward. The experiments presented were performed approximately 6 years apart and in slightly different ways. As reviewer 3 indicates, Figure 1A is presented in a log scale; this makes it difficult for the reader to determine the exact concentrations of centrinone B used. For this panel, we used 0 (DMSO), 10, 30, 75, 165, 200 and 500 nM centrinone B. For Figure 4D, we used 0, 50, 125, 150, 167, 200 and 500 nM. The only point that might be anomalous is 75 nM in Figure 1A. We do see approximately 25% inhibition using 50 nM centrinone B in Figure 4D, but no inhibition using 75 nM in Figure 1A. We can offer two explanations for this discrepancy. First, we noticed small deviations in the potency of centrinone B batches. Second, for Figure 1A, cells were assayed using a passaging assay where they are continuously plated, counted and re-seeded. Cells in Figure 4D were assayed using a clonogenic assay where cells are plated at low density and allowed to grow over the course of approximately two weeks. It is possible that a combination of these factors led to the highlighted discrepancy. We feel that the discrepancy is a minor one and we propose the following as a solution. We will present the growth data in Figure 1A as a scatter / box plot using only 200 and 500 nM centrinone B since these are the drug concentrations we use for the screen conditions and the key conclusions are derived only from these concentrations (i.e. both concentrations result in p53-dependent growth arrest where centrioles are overduplicated after 200 nM centrinone B, while centrioles are lost after treatment with 500 nM). We hope that this explanation and changes satisfy the reviewers.

      While discrepancies such as this may seem trivial, they make it hard to interpret some of the authors conclusions. For example, in their initial screen, the "low" dose of Centrinone (200nM) leads to centriole amplification and genes that block centriole duplication or PIDDosome function (which normally signals the presence of extra centrioles) are required for the growth arrest triggered by this concentration of the drug (Figure 1B). To me, this suggests that centriole amplification is required for this growth arrest at 200nM. However, when the authors test a more graded series of concentrations they conclude "excess centrioles might not be the trigger for this arrest at low Centrinone B concentrations". I assume they are using "low" here to indicate concentrations at or below 150nM (even though they use low to mean 200nM in their initial screen)? In the Discussion, they state that TRIM37 is "required for the growth arrest in response to partially or fully inhibited PLK4, but this activity was independent of the presence of excess centrioles". Again, it is not clear to which experiments they are referring when they talk about "partially" or "fully" inhibited PLK4, but, if this is correct, then why are genes required for centriole duplication and PIDDosome function identified in their initial screen as being required for the growth arrest at 200nM but not 500nM? Do they consider 200nM to be fully inhibiting PLK4?

      We observed that cells arrested after treatment with either 200 or 500 nM centrinone B. Additionally, we observed centriole over-duplication after 200 nM but centriole loss at 500 nM. Our initial hypothesis was therefore that either centriole overduplication or loss resulted in growth arrest. Our subsequent results with TRIM37 caused us to question this simple interpretation. To determine if centriole overduplication caused by 200 nM centrinone B triggers growth arrest in this case, we induced centriole overduplication by overexpressing PLK4 and, surprisingly, TRIM37 was not required for growth arrest in these conditions, similar to that observed by Evans et al., 2020. Thus, we have two conditions where centriole overduplication is observed where the growth arrest in only one condition is dependent on TRIM37. This is an important difference that we will better highlight in our revised manuscript. We will also present a better model and/or table outlining our most salient results. Briefly, it is thought that partially inhibited PLK4 blocks its own auto-phosphorylation and therefore blocks its degradation. The overall abundance of PLK4 therefore increases under these conditions and overduplication occurs. In our hands, we consider PLK4 to be partially inhibited in RPE-1 or A375 cells at any concentrations of centrinone B at 200 nM or lower.

      Please also see similar comment to Reviewer 2, Minor point 1.

      Presumably it will only require textual changes to address this point, but it is hard to assess the broader significance of the paper until these points are clarified: is the main point of this paper that the cells response to Centrinone treatment is complicated and the role of TRIM37 equally so; or, is there a narrative that leads to a clear hypothesis that can explain these surprising findings?

      We don’t currently have a model that explains all the results we observe with TRIM37. We have data that is consistent with some previously published results and data that challenges some of these recent reports. The current model suggests that TRIM37 E3-dependent remodeling of CEP192 underlies its growth arrest activity after centriole loss. Importantly, we find that TRIM37 supports growth arrest in an E3-ligase-independent manner. We will discuss this further in our revised manuscript, as well as providing additional hypotheses based on our other observations of TRIM37 function.

      • It seems a striking omission that the authors show that p53 and p21 are induced by 200nM and 500nM Centrinone (Figure 1D), but they don't assay these proteins at any concentration lower than this. Perhaps they are saving this data for a subsequent manuscript, but the authors certainly seem to draw conclusions from several experiments they perform at concentrations below 200nM, so they should at least explain why they don't assay p53 and p21 status in these experiments.

      We apologize for not including this data in the original version of the manuscript. It will be included in the revised version.

      Reviewer 3, Minor points

        • In the abstract the authors claim that the way in which altered centrosome numbers cause a p53-dependent growth arrest is evolutionarily conserved. This is misleading, as it implies that the loss and gain of centrosomes trigger the same arrest (which is probably not correct), and most of the data to date suggests that flies and worms (two popular models for centrosome research) do not have such a growth-arrest pathway.

      This is a good point. We will modify this statement to indicate that p53-dependent arrest is confined to mammalian cells: “Altered centrosome numbers cause a p53-dependent growth arrest in both mouse and human cells through mechanisms that are still poorly defined”.

      Reviewer 3, comment in ‘significance’

      I could not discern, however, whether one could draw any broader conclusions than this, in part due to the presentation problems described above. Moreover, in the abstract the authors propose that altering PLK4 activity alone is sufficient to signal growth arrest. This would be an important conclusion, and I presume this refers to the very low dosage Centrinone experiments that trigger growth arrest without altering centrosome numbers and which does not require TRIM37? If so, this arrest is poorly characterised here and will be the subject of a future investigation, so it seems strange to have this as a major conclusion in the abstract.

      We agree. As reviewer 3 points out, based on our findings we hypothesize that altered PLK4 activity could itself signal growth arrest. As this is not supported experimentally, we will remove it from the abstract and discuss this tantalizing possibility within the discussion.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Most of the experiments are currently ongoing, and the preliminary results we have obtained are discussed in the previous section. The revised manuscript will be modified to address every concern of the three reviewers as detailed above.

      4. Description of analyses that authors prefer not to carry out

      We will carry out all the experiments requested by the reviewers as detailed above.

      References

      Balestra FR et al., TRIM37 prevents formation of centriolar protein assemblies by regulating Centrobin. Elife. 2021 Jan 25

      Cuella-Martin R et al., 53BP1 Integrates DNA Repair and p53-Dependent Cell Fate Decisions via Distinct Mechanisms. Mol Cell. 2016 Oct 6;64(1):51-64

      Evans LT et al., ANKRD26 recruits PIDD1 to centriolar distal appendages to activate the PIDDosome following centrosome amplification. EMBO J. 2021 Feb 15;40(4)

      Meitinger F et al., TRIM37 controls cancer-specific vulnerability to PLK4 inhibition. Nature. 2020 Sep;585(7825):440-446

      Moyer TC et al., Binding of STIL to Plk4 activates kinase activity to promote centriole assembly. J Cell Biol. 2015 Jun 22;209(6):863-78

      Sillibourne JE et al., Autophosphorylation of polo-like kinase 4 and its role in centriole duplication. Mol Biol Cell. 2010 Feb 15;21(4):547-61

      Wong YL et al., Cell biology. Reversible centriole depletion with an inhibitor of Polo-like kinase 4. Science. 2015 Jun 5;348(6239):1155-60

      Yeow ZY et al., Targeting TRIM37-driven centrosome dysfunction in 17q23-amplified breast cancer. Nature. 2020 Sep;585(7825):447-452

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Tkach et al. analyse the molecular pathways that lead to the growth arrest of either RPE-1 or A375 cells in response to varying doses of the PLK4 inhibitor Centrinone B (hereafter Centrinone). They show that both 200nM and 500nM Centrinone cause a strong growth arrest, but the lower concentration actually leads to centrosome amplification, while the higher concentration leads to centrosome loss. They identify the Ubiquitin E3 ligase TRIM37 as a key mediator of the growth arrest at both drug concentrations, although they confirm previous findings that TRIM37 is not required for the growth arrest induced by the supernumerary centrosomes that are formed when PLK4 is overexpressed. Perhaps most importantly, the authors test the ability of various mutated forms of TRIM37 to function in the growth arrest induced by Centrinone treatment, and they conclude that, surprisingly, the E3 ligase activity of TRIM37 is not required for this growth arrest.

      The experiments presented here are generally of a high quality, although I found some aspects of the presentation a little confusing (as detailed below).

      Major Comments:

      1. The presentation throughout the manuscript sometimes made it difficult to follow exactly what the authors meant when they referred to the various doses of Centrinone used in their experiments-often using the terms "low" or "high" without specifying exactly what they mean. In Figure 1A, for example, they present a growth inhibition curve using a log10 scale of Centrinone concentration, and they conclude that growth was inhibited "at concentrations above 150nM, with full inhibition observed at concentrations greater than 200nM". I presume this is just sloppy language, as it appears that growth is significantly inhibited at 150nM and full growth inhibition is achieved at 200nM. However, in Figure 4D, the authors show another growth inhibition curve (this time presented on a linear scale) where significant growth inhibition is seen well below 100nM and full inhibition appears to be achieved at ~125nM. The discrepancy between these experiments is not noted, nor any reason for it explained.

      While discrepancies such as this may seem trivial, they make it hard to interpret some of the authors conclusions. For example, in their initial screen, the "low" dose of Centrinone (200nM) leads to centriole amplification and genes that block centriole duplication or PIDDosome function (which normally signals the presence of extra centrioles) are required for the growth arrest triggered by this concentration of the drug (Figure 1B). To me, this suggests that centriole amplification is required for this growth arrest at 200nM. However, when the authors test a more graded series of concentrations they conclude "excess centrioles might not be the trigger for this arrest at low Centrinone B concentrations". I assume they are using "low" here to indicate concentrations at or below 150nM (even though they use low to mean 200nM in their initial screen)? In the Discussion, they state that TRIM37 is "required for the growth arrest in response to partially or fully inhibited PLK4, but this activity was independent of the presence of excess centrioles". Again, it is not clear to which experiments they are referring when they talk about "partially" or "fully" inhibited PLK4, but, if this is correct, then why are genes required for centriole duplication and PIDDosome function identified in their initial screen as being required for the growth arrest at 200nM but not 500nM? Do they consider 200nM to be fully inhibiting PLK4?

      Presumably it will only require textual changes to address this point, but it is hard to assess the broader significance of the paper until these points are clarified: is the main point of this paper that the cells response to Centrinone treatment is complicated and the role of TRIM37 equally so; or, is there a narrative that leads to a clear hypothesis that can explain these surprising findings?

      2. It seems a striking omission that the authors show that p53 and p21 are induced by 200nM and 500nM Centrinone (Figure 1D), but they don't assay these proteins at any concentration lower than this. Perhaps they are saving this data for a subsequent manuscript, but the authors certainly seem to draw conclusions from several experiments they perform at concentrations below 200nM, so they should at least explain why they don't assay p53 and p21 status in these experiments.

      Minor comments:

      In the abstract the authors claim that the way in which altered centrosome numbers cause a p53-dependent growth arrest is evolutionarily conserved. This is misleading, as it implies that the loss and gain of centrosomes trigger the same arrest (which is probably not correct), and most of the data to date suggests that flies and worms (two popular models for centrosome research) do not have such a growth-arrest pathway.

      Significance

      Significance and comparison to existing literature:

      The question of how centrosome loss or amplification leads to senescence or apoptosis in many cell types is currently a hot topic, and TRIM37 has previously been identified as a potentially important player-most recently in two high-profile papers from the Oegema/Loncarek (Meitinger et al, Nature 2021) and Holland/Chapman (Yeow et al., Nature 2021) labs. In these papers, TRIM37 is shown to be overexpressed in certain cancer cells, where it appears to degrade PCM components (most notably Cep192) to prevent the formation of ectopic spindle poles that help to ensure mitotic fidelity in these abnormal cells. Moreover, mutations in TRIM37 cause Mulibrey nanism, which has recently been shown to be associated with the formation of ectopic Centrobin-dependent PCM condensates (Balestra et al., eLife 2021; Meitinger et al., JCB, 2021).

      This manuscript makes an important contribution to this area, and it will be of considerable interest to researchers in several fields (most obviously the centrosome, but also ubiquitin ligase, cancer and Mulibrey fields). In its current form, this contribution is largely to illustrate that treating cells with Centrinone (which is widely used by many centrosome researchers) triggers a complex cellular response that varies with drug dosage, and that the role of TRIM37 in triggering this response also appears to be surprisingly complicated. These are significant points that are of sufficient importance to warrant publication.

      I could not discern, however, whether one could draw any broader conclusions than this, in part due to the presentation problems described above. Moreover, in the abstract the authors propose that altering PLK4 activity alone is sufficient to signal growth arrest. This would be an important conclusion, and I presume this refers to the very low dosage Centrinone experiments that trigger growth arrest without altering centrosome numbers and which does not require TRIM37? If so, this arrest is poorly characterised here and will be the subject of a future investigation, so it seems strange to have this as a major conclusion in the abstract.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The study by Tkach et al. investigates the molecular basis of the previously described, p53-dependent growth arrest that is triggered by manipulation of PLK4 kinase activity, a master regulator of centrosome biogenesis. To address this, they use CRISPR/Cas9 screening in human cell lines, gene-specific knockout and rescue experiments, and biochemical interaction assays. As in previously conducted similar screens they identify the E3 ligase TRIM37 as a key mediator of growth arrest after PLK4 inhibition, but not growth arrest induced by increased centrosome number. Importantly, contrary to suggestions in previous studies, they find that TRIM37 function in growth arrest is independent of E3 ligase function, but may involve regulation of PLK4.

      Major comments:

      Overall, I found the key conclusions convincing, assuming the claimed effects are significant. In this regard, some data requires quantification and some of the quantifications may require additional replicates.

      1) The centrosomal localization of endogenous TRIM37 should be validated by comparing control and knockout/knockdown cells.

      2) Some of the quantifications are derived from only two experiments and in many cases no statistical testing was done. The authors should test the observed effects and add extra replicates to make the data more robust, where required.

      3) Fig. 5 supplements: panels showing effects on marker proteins in cells by IF lack quantification of the claimed effects. Without providing some type of quantifications for key findings, it is unclear how strong or penetrant the effects are.

      Minor comments:

      Overall, I felt that the presentation of the data can be improved. After reading the abstract, it was not clear at all to me, what message the authors want to convey, also in comparison to previous work. In particular the final part of the abstract should be improved. The results part is well written, but may still be improved, by providing more summarizing statements that extract the key conclusion from particular experiments and by explaining better why particular experiments were done. The specific rationale may be clear to expert readers but less so to non-experts. Only after reading the discussion, the findings and how they relate to previous work became clearer. I would suggest a final, summarizing schematic that illustrates the main findings in a cartoon/flow chart manner.

      1) Please revise incorrect abstract sentence: "We identify TRIM37 as a key mediator of growth arrest when PLK4 activity is partially or fully inhibited but is not required for growth arrest triggered by supernumerary centrosomes."

      2) In various figures and supplements showing centrosome and condensates/Cenpas, these are very difficult to distinguish due to their small size. I suggest to magnify regions of interest and/or add arrowheads in different colors marking the specific structures.

      3) Fig. 2A: What is the purpose of the schematics on the right of panel A? The labels in the graph are unreadable and the network diagram without any labels is also not very useful. This could be removed.

      4) Fig. 2B: The network presentation is not very easy to read. What are the functional groups/pathways here? The clusters should be labeled accordingly. What is the meaning of the different sizes of the circles? Maybe key interactions (e.g. TRIM37) could be indicated in a different color shade to highlight these?

      Significance

      While the authors start out by essentially reproducing results from previously conducted screens, which may seem to be of limited novelty, the current work reaches conclusions that differ in important aspects from those in previous studies. Moreover, the current work nicely compares in different cell backgrounds PLK4 partial inhibition (extra centrosomes), full inhibition (less/no centrosomes), and p53 pathway inhibition, to obtain an integrated view of the mechanisms involved in growth arrest and tease apart molecular requirements. The results challenge some of the conclusions from previous studies, including high-profile papers where this pathway has been identified as a potential target for cancer treatment. For these reasons I consider this very important work.

      My expertise is in centrosome biology and microtubule organization including mitotic spindle assembly.

      Referee Cross-commenting

      Hi everyone,

      Overall it seems that we all agree that this is an important study. However, as noted by several comments, the presentation definitely needs to be improved and the new findings need to be highlighted better and contrasted with previous studies. I had relatively few major concerns, but, after reading the other reviews, I found the additional comments also important and useful.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Centrosome loss and gain both elicit a p53-dependent cell cycle arrest but the molecular pathways involved are still not fully understood. To address this question the Pelletier lab performed several genome wide CRISPR screens using two different concentrations of centrinone that cause centrosome amplification (low) or loss (high) in RPE1 and A375 cells. In order to distinguish between pathways that act by regulating p53 levels in cells vs those that mediate p53 response to abnormal centrosome numbers, they also performed a screen in cells where p53 levels were artificially elevated by Nutlin treatment. The top hits from the low/high centrinone screen confirmed previous results from other groups, highlighting the importance of the 53BP1/USP28/p53 complex and PIDDosome/ANKRD26 complex in the cell cycle response. TRIM37 was shared between both centrinone conditions, while being absent from the Nutlin screen, and thus the authors focused their analysis on the function of TRIM37. Overall the data quality and presentation are both good and the manuscript reads well. The CRISPR-Cas9 screens have been performed to a high standard and it is reassuring that the same candidates emerge as from previous screens focusing on centrosome loss and gain.

      TRIM37 has been the subject of several high profile papers over the past year. This current manuscript has the potential to clarify some of the outstanding questions but in its present form the manuscript brings more confusion than clarity to this area of research. Although the authors conduct a careful analysis of TRIM37 function, unless someone is a die-hard specialist, it is difficult to follow what is already known, what the authors find and how or why their data fits/contradicts previous work. The key observations are that i) TRIM37 may not actually control CEP192 levels (unless overexpressed transiently), ii) its E3 ligase activity and its binding to PLK4 are independent of its ability to promote growth arrest upon centrinone treatment, iii) its influence on mitotic duration is independent of its E3 activity or its role in growth arrest upon centrinone treatment. The result that a TRIM37-dependent growth arrest may also exist without increased mitotic duration is another interesting finding, as is the link between TRIM37 and condensates of centrosomal proteins. Including a table that summarises which roles of TRIM37 require PLK4 binding, E3 ligase activity etc would be useful not only to non-specialists. Some of the data contradicts current models for TRIM37 function in growth suppression, so the authors should consider showing a revised model, too.

      Major Points:

      1. Previous data suggested that an important role of TRIM37 was to limit accumulation of CEP192 levels, yet here CEP192 levels appeared unchanged in TRIM37 knockout cells that stably express wild-type or RING domain mutant TRIM37. However, in agreement with previous work, transient expression of TRIM37 reduced CEP192 levels along with those of other PCM and centriole components in an E3-dependent manner. These data are rather confusing in light of the literature, and the current report does not really deal with these discrepancies but to me they suggest that high levels of TRIM37 can target multiple centrosome components for degradation, but this may be an experimental artefact.
      2. The choice of cells for particular experiments is not always stated or explained. For instance, in Figure 3A: Trim37 KO pool used while in Figure 3B TRIM37 single KO. These are then combined with both transient and stable expression of TRIM37 mutants.
      3. Two different concentrations (200 nM and 500 nM) of centrinone were used to compare responses of too many or no centrosomes in RPE1 and A375. While these concentrations result in centrosome amplification (200 nM) and loss (500 nM) in RPE1 cells, the phenotypes seem much less clear-cut in A375 cells. At 200nM 70% of cells have 0 or 1 centrioles (~35% each category) and only about 15% have centrosome amplification, whereas centrosome amplification occurs in 30% of RPE1 with 0-1 centrioles seen in fewer than 10% (Figure 4 - figure supplement 1H). Hence the different outcomes of centrinone treatment make conclusions about cell-type specific responses difficult. This difference may be due to differences in drug uptake/efflux, PLK4 activity or in expression of other components of these pathways. In fact, 167nM centrinone B in A375 cells would have been a much closer match to the 200nM treatment of RPE-1. These points should be discussed as they impact the conclusions.
      4. I find the different outcomes of stable versus acute expression of TRIM37 ligase mutant confusing. Here, stable expression of TRIM37 ligase mutant increases mitotic length compared to that of TRIM37 wild-type, which contradicts a recent report by (Meitinger et al. 2021). What could be the potential reason for these differences? What could be the mechanism for TRIM37 action in regulating spindle assembly/mitotic duration and cell proliferation upon centrosome loss? How do those acentrosomal MTOCs form that decrease mitotic duration and promote proliferation? Do the authors find a difference in the % of cells expressing TRIM37 mutants upon stable or acute expression? This part needs a better summary, and again a table would help. I also wonder about protein expression levels; wild-type FB-TRIM37 seems to be expressed at much lower levels than the mutants in Figure 5B.
      5. Other means of centrosome depletion (Cenpj, SAS6 etc) would have been useful to include in the manuscript in support of E3 ligase dependent and independent roles of TRIM37. It is not essential to perform these experiments but if data are available, including these would improve the paper.
      6. The authors show that TRIM37 regulates PLK4 phosphorylation and that this modification could only be observed in HEK293T and not in RPE1. Why would there be a difference between HEK293 and RPE1?
      7. Statistical analysis for graphs should be included. Figure 5 is ok but graphs in Figures 3, 4, 6, 7 would benefit.
      8. The authors characterise TRIM37 localisation. They detect it at centrosomes (as shown by Yeow et al 2021) and more specifically at the PCM, but apparently the signal is not present in all cells. They should also provide a quantification of the % of cells with centrosomal TRIM37 signal and compare this to cells expressing Flag-tagged Trim37. The specificity of the antibody signal using TRIM37-/- should be confirmed.

      Minor Points

      • Page 3: "A recent screen for mediators of supernumerary centrosome-induced arrest identified PIDDosome/p53 and placed the distal appendage protein ANKRD26 within this pathway [31]". It appears that the reference for Burigotto et al. is missing.

      • Page 6: The authors state that: TP53BP1, USP28 and CDKN1A are also suppressors in the Nutlin-3a screen and suggest that they act in a general p53 pathway. However Meitinger et al (2016) showed that depletion of TP53BP1 or USP28 did not affect the upregulation of p53 and p21 upon Mdm2 inhibition.

      • Page 9: "First, we performed live cell imaging to measure mitotic length in cells grown in centrinone". For consistency the authors should say centrinone B here as well

      • Page 9: "Cells lacking TRIM37 suppressed the growth arrest from 150 to 500 nM centrinone B in RPE-1 and 167 to 500 nM in A375 cells". The growth data for the A375 cells seem to be missing from the figures.

      • Page 10: "Our results confirmed that PLK4 and TRIM37 form a complex in RPE-1 cells (Figure 3G)" It appears the authors referred to the wrong figure, it should be Figure 4B.

      • Figure 1C: The nuclear p53 signal is not apparent with 500 nM centrinone B in the exemplary cells. Did the authors use thresholding to quantify p53/p21 positive cells?

      • Figure 4D and Figure 4 - Figure supplement 1G: The graph is misleading and should not be presented as a continuous line.

      • Figure 5A and C: A direct statistical comparison of mitotic timing upon expression of the different Trim37 mutants with wildtype and trim37-/- cells is missing

      • Figure 6B: A loading control/Ponceau staining is missing as well as the quantification of protein levels

      • Figure 6D: It is unclear if the centrosomal signal intensity was quantified in interphase or mitotic cells

      • Figure 7C: A loading control/Ponceau staining is missing

      • Figure 2 - figure supplement 2F and G: It would help if the authors could highlight the cell line, e.g. RPE-1 (F) or A375 (G) in the venn diagrams.

      • Figure 4 - figure supplement 1E: it appears that the BirA antibody gives only an unspecific signal. It would be useful to show if the different TRIM37 variants are able to localise to the centrosomes. Furthermore it appears that centrosomes are missing in the C18R and 505-709 variants. It would be useful if the authors quantify centrosome numbers upon expression of different Trim37 variants as shown in Figure 4 - figure supplement 1. To make the identification of the cell easier it would help to include a DNA signal or indicate the outline of the cell.

      • The generation of stable and dox-inducible cell lines is missing in the material and methods

      Significance

      Centrosome loss in mammalian cells triggers a somewhat mysterious p53-dependent irreversible cell cycle arrest that bears similarities with senescence. A key modulator of this arrest is the E3 ubiquitin ligase TRIM37; TRIM37-overexpressing cells show increased sensitivity to centrosome loss whereas TRIM37 deletion restores normal growth to cells lacking centrosomes. The precise function of TRIM37 in this process is still not clear.

      The authors here report a two-pronged approach to improve our understanding; first, they perform several genome-wide CRISPR/Cas9 screens in two cell lines to identify new players that modulate growth arrest following inhibition of centrosome duplication, and second, they analyse the function of TRIM37, their top candidate, in this process. Whereas the screens recapitulate previous reports by identifying a near-identical set of genes, the functional work on TRIM37 provides interesting new data that go beyond (and in places contradict) published work. They describe a complex relationship between TRIM37 function, PLK4 inhibition and growth arrest, and suggest that TRIM37 acts via modulating PLK4 phosphorylation/stability and that perhaps its role in autophagy also contributes to the overall phenotype. These possibilities will need to be tested in the future but the current manuscript contains enough interesting and potentially important data that it is worthy of publication following revision.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this paper, the authors use a previously published method SHAP for interpreting deep learning (DL) models (specifically LSTMs) that are trained for predicting physicochemical attributes of peptides (such as antigenicity and collisional cross section). The paper shows that it's capable of identifying some amino acid residues contributing to the prediction results of the DL models. Reviewer #1 (Significance (Required)):

      1. One of the main ideas of the paper is to use SHAP to determine the significant amino acids at each position (or pairs of amino acids at each position) contributing to the prediction. Some of the interpretation results are consistent with findings reported previously. This is very nice; however, most of these findings are statistical results such as "XX is often present at the second position for the peptides with the positive outcome", which are relatively straightforward and may be derived by using some statistical methods without using DL models. We expect that more complex patterns can be discovered in addition to these statistical observations.

      We thank the reviewers for these comments.

      First, to the point about discovering complex patterns, we note that one use of PoSHAP we discuss later in the paper is that PoSHAP enables interposition dependence analysis, which depends on interactions between residues and would not be reflected by summary statistics.

      Second, we agree it is important to show whether PoSHAP produces different residue importance maps than simple statistical summaries of amino acids in each group. The strongest binding peptides, or the highest mobility for the CCS model, were determined by taking only peptides that fall above a linear regression best fit of the ranked experimental values. Statistical summary heatmaps were created and then compared to those from PoSHAP revealing some similarities but also many differences. We added the following text and new figure to the results section to illustrate these points:

      “We wondered whether the patterns revealed by PoSHAP simply reflect the summary statistics for the high-binding or high-CCS subset of peptides. As expected, due to known differences in amino acid abundance across the proteome, the prevalence of amino acids differed across the training data and was also heterogeneous across positions (Figure 5A). To determine the subset of high CCS peptides, peptides were ordered in the training set by their CCS rank and then linear regression was performed to get the average trend line (Figure 5B). Any peptide above that trendline was defined as “high CCS”, and the frequency of amino acids at each position in this set was summarized using a heatmap (Figure 5C). Compared to the statistical amino acid frequencies, PoSHAP suggests a greater importance to arginine at both termini, the importance of tryptophan to increase CCS becomes apparent, and interior glutamic acid contributes less to high CCS than the frequencies would suggest (Figure 5D). The same analysis was repeated for MHC data (Supplementary Figures 9 and 10). This demonstrates that PoSHAP found non-linear relationships between the inputs and the outputs that are not revealed by simple correlation.”

      Figure 5: Amino acid summary statistics differ from PoSHAP values for the CCS data. (A) Amino acid counts as a function of position for training data. (B) Procedure for picking the ‘top peptides’ with the highest CCS. Linear regression was performed on the peptides ranked by their actual CCS value. Any peptide that fell above the trendline and overall mean were defined as ‘top peptides’. (C) Counts of amino acids for the top peptides were summarized in a heatmap. (D) Mean SHAP values across amino acids and positions from PoSHAP analysis.
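
      The selection procedure described for Figure 5B and 5C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`fit_line`, `top_peptides`, `position_counts`) and the toy peptides and values are hypothetical, and the sketch assumes "above the trendline and overall mean" means a peptide must exceed both.

```python
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def top_peptides(peptides, values):
    """Peptides whose value lies above both the rank trend line and the overall mean."""
    ranked = sorted(zip(values, peptides))  # ascending by measured value
    ranks = list(range(len(ranked)))
    ys = [value for value, _ in ranked]
    slope, intercept = fit_line(ranks, ys)
    overall_mean = sum(ys) / len(ys)
    return [
        pep
        for rank, (value, pep) in zip(ranks, ranked)
        if value > slope * rank + intercept and value > overall_mean
    ]


def position_counts(peptides):
    """Count amino acids at each position: counts[position][amino_acid] -> n."""
    counts = {}
    for pep in peptides:
        for pos, aa in enumerate(pep):
            counts.setdefault(pos, Counter())[aa] += 1
    return counts


# Toy data (made up): values rise sharply at the top of the ranking.
peptides = ["AAK", "ACK", "GAR", "GWR", "RWR", "RWK"]
values = [1.0, 1.1, 1.2, 1.3, 3.0, 6.0]
top = top_peptides(peptides, values)
counts = position_counts(top)
```

      The resulting per-position counts are the kind of summary statistic that the PoSHAP heatmap is compared against; they contain no information about how the model weighs each residue.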

      We also added the corresponding supplemental figures showing the same examples for the MAMU A001 model and human MHC models:

      Supplemental Figure 9: Amino acid summary statistics differ from PoSHAP values for the A001 MAMU MHC I data. (A) Amino acid counts as a function of position for training data. (B) Procedure for picking the ‘top peptides’ with the highest CCS. Linear regression was performed on the peptides ranked by their actual CCS value. Any peptide that fell above the trendline and overall mean were defined as ‘top peptides’. (C) Counts of amino acids for the top peptides were summarized in a heatmap. (D) Mean SHAP values across amino acids and positions from PoSHAP analysis. For the MAMU model, the amino acid frequencies of the input peptides show no obvious preference for amino acid position, but some amino acids are over-represented overall. The presence of the “end” token is more likely to be a high binder statistically (C), but the PoSHAP reveals that this end token is not the main determinant of binding (D).

      Supplemental Figure 10: Amino acid summary statistics differ from PoSHAP values for the human A1101 MHC I data. (A) Amino acid counts as a function of position for training data, showing the distribution of amino acids in this data. (B) Procedure for picking the ‘top peptides’ with the highest CCS. Linear regression was performed on the peptides ranked by their actual CCS value. Any peptide that fell above the trendline and overall mean were defined as ‘top peptides’. (C) Counts of amino acids for the top peptides were summarized in a heatmap. (D) Mean SHAP values across amino acids and positions from PoSHAP analysis. There are clear differences between the summary statistics of top peptides (C) and PoSHAP heatmap (D). For example, the end token is prominent in the summary statistics but absent from the PoSHAP interpretation. Also, the preference for S/T/V at position two is tempered according to PoSHAP, but would be determined to be very important by the summary statistics.

      Although the interpreting results reported in the paper largely agree with previous reports, the paper did not explicitly model the frequency of different amino acids in the training data. For instance, if the amino acid 'A' happens to be over-represented in the positive samples of peptides in the training data, the DL model may consider it to contribute to the positive prediction, which may not be true. This issue might become more serious when pairs of amino acids are considered. The authors may want to analyze this potential issue in their results.

      We agree and understand the concern for the overrepresentation of amino acids that might skew the training of our models. To determine if this is an issue, as part of the response to the previous question, we looked at the amino acid counts for all peptides (Figure 5A, Supplemental Figures 9A and 10A). In general, the PoSHAP heatmaps (panel Ds in the same figures) look very different from the frequencies of amino acids (panel Cs in the figures), suggesting that amino acid frequencies have not caused any problem.

      Even on a balanced training dataset, the LSTM model to be interpreted may still contain arbitrary bias due to invertible overfitting, which the authors did not discuss. It will be more convincing by training multiple models using different hyper-parameters and optimization algorithms, and then see if similar interpretation results can be reached among most or all of these models.

      We assume the reviewer meant “inevitable overfitting” rather than “invertible overfitting”. If so, the original manuscript did assess overfitting in Figure S4 based on the training and validation loss over training epochs.

      We think the reviewer makes a good point that different models might produce different interpretations, so we trained new models without optimization and with different hyperparameters and with a different optimizer (RMS prop). We see essentially the same PoSHAP interpretations. We added the following text to the results section along with these three new supplemental figures:

      “Given the dependence of the model interpretation results on the model used, the same model architecture trained with different parameters might result in different model interpretation. Given this, models for each of the three tasks mentioned here were retrained with different hyperparameters including the “RMS prop” optimizer. Each model produces similar or better prediction performance compared to the earlier version, and the model interpretation by PoSHAP was almost identical to the previous results in all three cases (Supplementary figures 12, 13, 14). This suggests that the model architecture drives the differences in interpretation, not the model training process.”

      Supplemental Figure 12. PoSHAP Analysis of Mamu A001 With Unoptimized Hyperparameters and RMSprop. A new model for the Mamu data was trained using the same architectures but with different hyperparameters and RMSprop as the optimization algorithm. Loss was plotted as mean squared error compared to the validation data. (A) Similar metrics for MSE, r, and p-values were obtained (B). Similar patterns are also observed for the PoSHAP heatmap of A001. (C) A dependence plot for A001 shows similar patterns to the Adam optimized model, including the positional dependence of proline at position two for high SHAP values of serine and threonine.

      Supplemental Figure 13. PoSHAP Analysis of A:11*01 With Unoptimized Hyperparameters and RMSprop. A new model for the A:11*01 data was trained using the same architectures but with different hyperparameters and RMSprop as the optimization algorithm. Loss was plotted as mean squared error compared to the validation data. (A) Similar metrics for MSE, r, and p-values were obtained (B). Similar patterns are also observed for the PoSHAP heatmap of A:11*01. (C) The SHAP ranges by position plot for A:11*01 shows similar patterns to the Adam optimized model, including the largest range of SHAP values at position two, nine, and ten.

      Supplemental Figure 14. PoSHAP Analysis of CCS With Unoptimized Hyperparameters and RMSprop. A new model for the CCS data was trained using the same architectures but with different hyperparameters and RMSprop as the optimization algorithm. Loss was plotted as mean squared error compared to the validation data. (A) Similar metrics for MSE, r, and p-values were obtained (B). Similar patterns are also observed for the PoSHAP heatmap of CCS. (C) Dependence analysis was performed on the dataset and the combined distance-interaction type bar plot shows similar relationships between the groupings, notably charge repulsion’s split.

      For the dependence analysis, it is not completely clear why the distance is used as the variable, while the relative position of the amino acid residue in the peptide is ignored. For example, if there is a strong interaction between the first and the last residues in the peptide, their distance changes depending on the peptide length. In figure 6, the authors showed strong interactions between amino acid that are 8-9 residues apart may suggest the peptide length actually plays a role here.

      We used distance because the dependence analysis is a calculation of the difference in means between two distributions of SHAP values, conditioned on the amino acid at another position. We believe that the distance between these interacting points is a natural choice and among the most informative metrics to explain these interactions. We agree with the reviewer that peptide length is important to the magnitude of the interactions between amino acids. We also recognize that there may be interactions between the peptide termini that could be obscured by the interactions of the longer peptides. To explore this possibility, we performed the dependence analysis on each of the different peptide lengths separately (8, 9, or 10 here). Unfortunately, given the smaller size of these data subsets, we were unable to show significant differences in the interaction groupings. Interestingly, though, the significant interactions for the peptides of length eight only occurred between neighboring amino acids or the termini. This may suggest an interaction between termini that could be explored in the future.
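
      The core quantity in this dependence analysis, a difference in mean SHAP value for one residue depending on which residue occupies another position, can be sketched as below. This is an illustrative sketch only: the function name `dependence`, its arguments, and the toy SHAP values are hypothetical, all peptides are assumed to have the same length, and the significance testing and multiple-testing correction used in the manuscript are omitted.

```python
from statistics import mean


def dependence(peptides, shap_values, target_pos, target_aa, other_pos, other_aa):
    """Difference in mean SHAP of (target_pos, target_aa) when other_pos holds
    other_aa versus when it holds anything else. Returns None if either group is empty."""
    with_partner, without_partner = [], []
    for pep, shap_row in zip(peptides, shap_values):
        if pep[target_pos] != target_aa:
            continue  # only peptides carrying the target residue are compared
        if pep[other_pos] == other_aa:
            with_partner.append(shap_row[target_pos])
        else:
            without_partner.append(shap_row[target_pos])
    if not with_partner or not without_partner:
        return None
    return {
        "distance": abs(target_pos - other_pos),
        "delta_mean": mean(with_partner) - mean(without_partner),
        "n_with": len(with_partner),
        "n_without": len(without_partner),
    }


# Toy data (made up): one SHAP value per position for each peptide.
peptides = ["SPA", "SPG", "SAA", "SGA"]
shap_values = [
    [0.9, 0.1, 0.0],
    [0.7, 0.2, 0.0],
    [0.2, 0.0, 0.1],
    [0.4, 0.0, 0.1],
]
result = dependence(peptides, shap_values, target_pos=0, target_aa="S",
                    other_pos=1, other_aa="P")
```

      Grouping such results by `distance` is what allows neighboring interactions to be compared with distant ones; grouping by absolute position instead would require enough peptides of each length, which is the limitation noted above.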

      We added the following text and supplemental figure 11 to the results:

      “Finally, to ask whether the absolute positions of amino acids in the peptide are relevant for the interaction, the data were split into 8, 9, or 10mers before analysis (Supplemental Figure 11). This revealed that there may be interactions between the termini, but this effect may be difficult to observe because there are significantly fewer 8mers and 9mers in the CCS dataset.”

      Supplemental Figure 11. SHAP Values of Collisional Cross Section by Peptide Length. The impact of peptide length on SHAP values was explored for the CCS data. The dataset was split into peptides of length 8, 9, and 10. All SHAP values were plotted as violin plots. The mean SHAP values were plotted in heatmaps by position and amino acid and standardized. Significant interactions by dependence analysis were plotted in bar charts by distance between interactions.

      To further support our decision to use distance as an interaction metric, we have also now included an additional box plot for Figure 7, demonstrating the interactions between each of the categories combined with distance. We have found that some of the bimodality of the interaction categories is explained by the distance at which they interact. Most striking is charge repulsion, which decreases CCS when the residues are neighboring but increases CCS when they are further apart.

      We added the following text and updated Figure 7 to the results section:

      “Additionally, there are interesting differences in the interactions of the amino acid among the significant set of interactions (Figure 7B). All significant interactions from the CCS data (Supplemental Table 3, adj. p-value Though it is evident that the mean of each interaction type corresponds to the expected impact those interactions would have on CCS, each of the interaction dependence plots are bimodal, with some interactions increasing CCS and some decreasing it. To dissect this observation further, we combined the two methods of splitting the data to see if the bimodality of interaction types would be resolved by distance (Figure 7C). Though definitive conclusions cannot be made for most categories, likely due to the ever decreasing sample size by splitting, of note is the difference between neighboring charge repulsion and non-neighboring charge repulsion. Neighboring charge repulsion seems to decrease CCS while distant charge repulsion increases CCS (see adjusted p-value from Tukey’s posthoc test in Figure 7D). When distant, charge repulsion makes intuitive sense as the amino acids are forced apart, linearizing the peptide and increasing the surface area. When neighboring, it is possible that the repulsion causes a kink in the linear peptide, decreasing the cross section. Overall, these analyses demonstrate that the models were able to learn fundamental chemical properties of the amino acids and through PoSHAP analysis we were able to uncover them.”

      Figure 7. Dependence analysis of CCS model. (A) Significant (Bonferroni corr. P-value = charge repulsion, * = other, and δ = polar. For the distance analysis, interactions were grouped into three categories, neighboring (distance = 1), near (distance = 2, 3, 4, 5,6), and far (distance = 7, 8, 9). * indicates significance (ANOVA with Tukey’s post hoc test p-value

      Also, it would be better to show that how the result looks like when applying this method to peptides in the negative samples (e.g., the peptides that are not bound by MHC in the antigenicity prediction experiment). Will the interpreting results also be negative?

      We agree this is an interesting idea. We updated the supplemental figure showing PoSHAP of top peptide subsets to also show PoSHAP of bottom peptide subsets (supplemental figure 8). The results suggest that certain amino acid positions are detrimental to binding, for example D/E at various positions. We updated this section to add:

      “We also performed the same analysis with the eight peptides with the lowest binding predictions (Supplemental Figure 8). These PoSHAP heatmaps are primarily composed of negative SHAP values, suggesting that using this subset reveals amino acids at certain positions that are detrimental to MHC binding.”

      Supplemental Figure 8. Pooled PoSHAP for bottom and top predicted subsets of the data. The mean SHAP values for each amino acid at each position were calculated for the peptides with the bottom (A) or top (B) 0.013% predicted intensity (top 8 peptides) for the “A” Mamu alleles. Due to the small sample size, most of the amino acid positions have a value of zero. The positions with extreme values, however, illustrate important amino acids for prediction. Notably for A001 and A002, aspartic acid and glutamic acid contribute to low prediction along the peptide, suggesting charge may inhibit binding. For the top predictions, phenylalanine or leucine are important at the first position for both A001 and A008. A serine or threonine at position two is important for A001, A002, and A008. All alleles demonstrate the importance of a proline near the middle of the peptide.

      Finally, it will be interesting to see the interpreting results when the method is applied to the DL models on more challenging tasks such as the prediction of tandem mass spectra of peptides. The authors may want to discuss these applications.

      We agree it would be very interesting to apply this method to interpret predictions of tandem mass spectra. In this paper we already demonstrated PoSHAP on three different datasets with three different models, so we feel that adding a fourth model is out of the scope of this work. We do agree that we would like to explore this option in the future. We added this idea to the discussion section:

      “Altogether the advances described herein are likely to find widespread use for interpreting models trained from biological sequences, including models not covered here such as those to predict tandem mass spectra (reviewed in 33).”

      I am primarily interested in algorithmic and statistical problems in genomics and proteomics. We have developed deep learning models for predicting the full tandem mass spectrum of peptides, and I am interested in model interpretation methods to explain the fragmentation mechanisms resulting in non-conventional fragment ions in tandem mass spectra of peptides. I reviewed the paper in collaboration with my Ph.D. students, who are developing deep learning models for computational mass spectrometry.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): **Comments to the Authors** In this study, the authors developed a framework named PoSHAP for the interpretation of neural networks trained on biological sequences. The current manuscript can be stronger if the following issues can be clearly addressed.

      1. As interpreting model with SHAP is a vital part of this manuscript, it would be better to provide descriptions of the underlying principles of SHAP to enable the readers to understand the paper easily.

      We recognize that understanding the principles of SHAP is vital. To better explain SHAP, we have added the following text to the introduction:

      “SHAP is a perturbation-based explanation method where the contribution of an input is calculated by hiding that input and determining the effect on the output. SHAP expands this using the game theoretic approach of Shapley values, which ensures that the contributions of the inputs plus a calculated baseline sum to the predicted output.”
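
      The additivity property mentioned here (input contributions plus a baseline sum to the prediction) can be checked on a toy example. The sketch below computes exact Shapley values by enumerating all feature subsets; it does not use the SHAP library, whose KernelExplainer approximates these values by sampling. The function `shapley_values`, the toy model, and the choice to "hide" a feature by substituting a baseline value are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x. A feature is 'hidden' by
    replacing it with the corresponding baseline value."""
    n = len(x)

    def value(present):
        masked = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(masked)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        phi.append(total)
    return phi


def toy_model(features):
    a, b, c = features
    return 2.0 * a + b * c  # includes an interaction between b and c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
# Additivity: base_value + sum(phi) equals toy_model(x)
```

      In this toy model the interaction term `b * c` is split equally between `b` and `c`, which illustrates why per-feature attributions alone do not expose interactions and why a separate dependence analysis is useful.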

      It is emphasized in the manuscript that PoSHAP is introduced to interpret neural networks trained on biological sequences. However, it is not clear why the authors chose the model-agnostic Kernel SHAP, which is based on linear LIME. Although it can be used for any model, its performance may not be optimal. In this regard, perhaps Deep SHAP or Gradient SHAP is more appropriate, as both are designed for deep learning networks [1]. It would be better to provide some additional experiments with Deep SHAP, and this work will be more convincing if it yields the same or a similar contribution of each position on each peptide as Kernel SHAP. [1] Lundberg, S., and S. I. Lee. "A Unified Approach to Interpreting Model Predictions." NIPS 2017.

      Our goal in using KernelExplainer was to demonstrate that PoSHAP was not dependent on model-specific interpretation methods. However, we have realized that this intention may not have been clearly stated or demonstrated. To expand on this, we have included a new Figure 8, which shows PoSHAP analysis comparisons to other classes of machine learning models, all using KernelExplainer. This result was interesting because it revealed that even though the XGBoost model technically performed better at prediction (Figure 8A, reduced MSE and higher Spearman rho), and produced a similar PoSHAP motif heatmap, the interpositional dependences from the perspective of distance (Figure 8C) or chemical interactions (Figure 8D) were substantially muted. This is also apparent with the other standard machine learning model, ExtraTrees. This result shows that the choice of model architecture is important, and this direct comparison would not be possible if we used the DeepExplainer.

      We added the following text and figure to the manuscript:

      “PoSHAP uses the SHAP KernelExplainer method, which is based on Local interpretable model-agnostic explanations (LIME). Using the general KernelExplainer method enables direct comparison of interpretations produced by different models trained from the same data. To ask whether PoSHAP interpretation changes based on the model used, the CCS data was used to train XGBoost or ExtraTrees models. Surprisingly, the XGBoost model performed better than the LSTM model with regard to MSE and Spearman rho between true and predicted values in the test set (Figure 8A). ExtraTrees was slightly worse than the other two models. The model interpretation heatmaps from PoSHAP were similar between the LSTM and XGBoost, but the interpretation from the ExtraTrees model was missing the high average SHAP due to N-terminal histidine or arginine (Figure 8B). Even though XGBoost produced a similar PoSHAP heatmap, the interpositional dependence with regard to distance (Figure 8C) and chemical interactions (Figure 8D) was muted. This shows that the choice of model is important for revealing amino acid interactions.”

      Figure 8. CCS PoSHAP of Various Machine Learning Models. PoSHAP analysis was performed on two additional machine learning models, Extra Trees and Extreme Gradient Boosting (XGB). Predictions were plotted against experimental values and the Mean Squared Error and r values are reported for each model (A). PoSHAP heatmaps were created for each model (B), illustrating an increase in model complexity as more sophisticated models are used. Dependence analysis was performed on each model and the significant interactions are plotted by distance (C) and by combined distance and interaction type (D).

      As described in the manuscript, "Correlations between true and predicted values were assessed by MSE, Spearman's rank correlation coefficient, and the correlation p-value." As an important indicator for evaluation, the exact p-values should be provided in the seven subgraphs in Figure 2, not p=0.0.

      We agree with the reviewer that reporting accurate p-values can assist in evaluation. We have updated the figures to reflect the p-values as far as we were able to determine them. Unfortunately, we are limited by the nature of the double data type in Python, and so in six of the seven graphs we report that the p-value is less than the minimum value allowed by a double. Additionally, the scales have been marked symmetrically, as the reviewer requested in comment 4.
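
      The limitation described here comes from floating-point underflow: a value smaller than a double can represent is stored as exactly 0.0. The sketch below shows the effect and one way to report an upper bound instead of "p = 0.0". The helper `format_p_value` is hypothetical, and it deliberately treats subnormal values (below `sys.float_info.min`) as below the reporting threshold.

```python
import sys


def format_p_value(p):
    """Report an upper bound instead of a misleading 'p = 0.0'."""
    if p < sys.float_info.min:
        return f"p < {sys.float_info.min:.1e}"
    return f"p = {p:.2e}"


# A value far below the range of a double underflows to exactly zero.
underflowed = float("1e-400")

label_tiny = format_p_value(underflowed)
label_normal = format_p_value(0.003)
```

      When the exact magnitude matters, a common alternative is to carry the logarithm of the p-value through the calculation, since the logarithm stays representable long after the p-value itself has underflowed.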

      It should be noted that the coordinate scales of Figure 2B and Figure 2C need to be marked symmetrically. And from Figure 2B, we can see that, the IC50 with smaller (0.8) values cannot be well predicted. Can the authors provide a detailed explanation about these results?

      We understand the reviewer’s concern with poor prediction of extreme values. Figure 2B represents the IC50 prediction for the A1101 human allele, which was the smallest of the datasets we used for training. It consists of only 4,522 entries, around 1/10 of the data used for the Mamu alleles and CCS. Because of this, it is likely that there were not enough examples of datapoints at the extremes to reliably train the model to account for them. However, given the limited size of the dataset, we were surprised by how satisfactory the predictions were. More importantly, the purpose of our paper is model interpretation, not model prediction accuracy, and this shows that even when predictions are not perfect, the model interpretation by PoSHAP can still be effective. We thank the reviewer for noticing this and added the following statement to the results:

      “Remarkably, this was achieved for A*11:01 using a total dataset of only 4,522 examples, which shows that PoSHAP can be effective with even less than 10,000 training examples.”

      References are needed in some descriptions in the manuscript. For example, "one might train a network to take an input of peptide sequence and predict chromatographic retention time", "RNNs have found extensive application to natural language processing, and by extension as a similar type of data, predictions from biological sequences such as peptides or nucleic acids".

      We apologize for missing these references. We have now cited these statements and have added many additional references as part of our revision.

      The description of the adopted three models in the section "Model architecture" is a bit confusing. As described in this section, "The LSTM layer outputs a 50x128 dimensional matrix to a dropout layer where a proportion of values are randomly set to 0", "a second LSTM layer outputs a tensor with length 128 and a second dropout layer then randomly sets a proportion of values to 0". But as shown in the Supplemental Figure 3, the output size of the first LSTM was 10x128. Also, as shown in Table 1, the dropout rates were not 0. Therefore, the section should be adjusted for clear clarification.

      We apologize for the confusing wording. We meant that dropout layers randomly set values=0, not that the dropout proportion was 0. We reworded this part to read:

      “The LSTM layer outputs a 10x128 dimensional matrix to a dropout layer where a proportion of values are randomly “dropped”, or set to 0. For the MHC models, a second LSTM layer outputs a tensor with length 128 to a second dropout layer. Then in all models, a dense layer reduces the data dimensionality to 64. For the MHC models, the data is then passed through a leaky rectified linear unit (LeakyReLU) activation before a final dropout layer, present in all models.”
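
      To make the distinction between "values set to 0" and "a dropout rate of 0" concrete, the following is a minimal pure-Python sketch of what a dropout layer does during training. It is not the authors' model code: the function `dropout`, the rate of 0.25, and the input values are hypothetical, and the rescaling of surviving values by 1/(1 - rate) follows the inverted-dropout convention used by common deep learning frameworks.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate` and scale the survivors by
    1 / (1 - rate) so the expected sum is unchanged (inverted dropout)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)          # fixed seed so the example is repeatable
activations = [1.0] * 1000      # stand-in for one layer's outputs
dropped = dropout(activations, rate=0.25, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
```

      Here `zero_fraction` comes out close to the chosen rate, while the rate itself is nonzero, which is the point of the reworded sentence. At inference time dropout is normally disabled and the values pass through unchanged.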

      Reviewer #2 (Significance (Required)): Pls refer to my comments provided as above.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): **Summary:** The main goal of the work is to provide the interpretation of Deep Neural Networks (LSTM in the paper) trained on biological sequences. For this purpose authors used the framework introduced earlier - SHapley Additive exPlanations (SHAP), in particular - the slight adaptation of this method called positional SHAP (PoSHAP), because they are interested in the impact of each position of the input sequence to the model output. They demonstrate this on three regression tasks that predict peptide properties. **Major comments** The main contribution, highlighted in the paper: authors showed how PoSHAP discloses amino acid motifs that influence MHC I binding. Further they described how PoSHAP enables understanding of interpositional dependence of amino acids that result in high affinity predictions. Also they argued that this work also contributes to a method for accurate prediction of peptide-MHC I affinity using peptide array data enabled by novel application of a neural network that combines amino acid embedding and LSTM layers.

      There are some comments about the statements above: 1. Why was the LSTM model chosen? Recent publications have shown the success of the Transformer model for biological sequences; however, this direction was not covered in the related work overview. The architecture choice should therefore be better justified. Also, the choice of LSTM for biological sequences is not new, and the authors should better substantiate their statement about a "novel application of a neural network that combines amino acid embedding and LSTM layers". Where exactly is the novelty? Could the community use the pretrained embeddings for their own purposes?

      The reviewer is correct that transformer models are highly effective for making predictions from biological sequences. In fact, many models do well, and there is no single correct choice of model for this task. Though there are many models to choose from, our models are sufficiently accurate. Importantly, the main contribution of our manuscript is not to train the most accurate models, but rather to demonstrate a strategy for positional model interpretation based on SHAP. Related to that point, please note our response to reviewer #2’s second comment that our approach uses the kernel explainer and can be applied to any model. However, we do agree that we neglected coverage of the transformer model in the introduction and have added a paragraph to the introduction covering some of the recent work in this area:

      “Many effective deep learning model architectures are available for making predictions from inputs of biological sequences, and there is currently no single correct choice. CNN models such as MHCflurry 2.0 (40) and LSTM models are effective at predicting MHC binding of peptides (41). Even simpler models, such as random forests, have been used to predict MHC binding (42,43). Prediction of other peptide properties like tandem mass spectra are often done with CNN or LSTM models (33). More recently, given the extraordinary performance of transformer models like BERT (44) and GPT-3 (45) for NLP, there is interest in transformer models for biological sequences (46).”

      We also want to be sure we do not overstate the novelty of our contributions. We have updated our discussion to better reflect the nature of our contributions. We reworded the statement quoted above to read:

      “Overall, the three modeling examples laid out herein serve as a tutorial for PoSHAP interpretation of almost any model trained from almost any biological sequence.”

      The attention mechanism itself provides a great opportunity to interpret the model predictions. In the introduction section authors made a statement that attention layers may limit the flexibility of model architecture when designing new models. Could they better explain this limit? Because recent state of the art models successfully work with long biological sequences and show better results than any other models (one example could be found here: https://openreview.net/pdf?id=YWtLZvLmud7). Authors should cover these limits more, as they also relate to the motivation of the LSTM choice.

      We added a paragraph to our introduction to expand on attention and its limitations:

      “Attention mechanisms have been successful in recapitulating experimentally defined binding motifs, but require that the model be constructed with attention layers. This may limit the flexibility of model architecture when designing new models. For example, attention mechanisms are specific to neural networks. Simpler models, such as random forests and XGBoost, may also be more suitable for some applications, and these cannot utilize attention. Also, while attention mechanisms are currently very effective, there is always a possibility that new architectures will emerge that make interpretations using attention infeasible. Beyond this, attention is a metric of the model itself, while SHAP values are calculated on a per input basis. By looking at the model through the lens of the inputs, we can understand the model’s “reasoning” behind any peptide. Attention mechanisms also do not enable dissection of interpositional dependencies between amino acids. Thus, new methods for model agnostic interpretation are desirable.”

      Another statement was made about the PoSHAP - adaptation of the SHAP method. It is hard to follow through the explanation of this adaptation - it is not clear what exactly is this adaptation. For example, Kernel SHAP from the original paper computes feature importance, in this paper authors compute the impact of each position, that is basically also the feature importances. Thus authors should better explain the statement about PoSHAP novelty. Will it be possible to use PoSHAP for any other model trained for the same purpose? If yes, for better reproducibility, authors should provide the place where exactly in the repo is the code for this. Also mathematical notations are missing in the Positional SHAP (PoSHAP) section - it is better to explain the adaptation with them to increase the understanding of the section.

      We apologize for the ambiguous wording in the abstract stating that “PoSHAP adapts SHAP”. We have reworded this statement to “PoSHAP utilizes SHAP”. The novelty of this approach is taking the feature importance values calculated by SHAP and structuring them to include each position’s index to allow for the interpretation of biological sequences. As we demonstrate here, this allows for novel interpretations of previously published data and will enable model interpretation in future studies that learn from biological sequences. Although this is practically very simple, we are not yet aware of any examples in the literature that do this.

      The following two SHAP force plots demonstrate the difference between using SHAP as-is versus PoSHAP. There is a demonstrated need for such a framework, considering the dearth of biological sequence model interpretation using SHAP and the ambiguity within biological sequence SHAP interpretation. For example, Meier et al., Nature Communications, 2021 performed an analysis like our Figure S7C, which just shows the range of SHAP values per residue. Although we can learn something about which AAs are important based on the range of their SHAP values, SHAP as-is doesn’t reveal a motif. While our position indexing is a simple change, it enables all the rich, sequence dependent analysis we performed in this paper. We added the following text to the results section with this new supplementary figure:

      “PoSHAP utilizes the standard SHAP package but adapts the analysis by simply appending an index to each input and maintaining positional information after the KernelExplainer interpretation, which enables tracking of each input position’s contribution to an output prediction (supplementary figure 5, showing force plot with and without index).”

      Supplemental Figure 5. SHAP Forceplots Demonstrating PoSHAP Indexing. Two forceplots were created with the SHAP forceplot method of the third peptide in the CCS testing set. (A) shows the plot with encoded inputs mapped to their amino acid. (B) shows the plot with the encoded inputs mapped to their amino acid and position. The addition of positional indexing removes the ambiguity of contributions, for example, glutamine having both a positive and a negative SHAP contribution to the prediction of the third peptide.
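
      The position indexing described above can be illustrated with a minimal sketch. It assumes one SHAP value per residue has already been computed for each peptide; the function names `index_by_position` and `mean_shap_table`, and the numeric values, are hypothetical and are not taken from the PoSHAP repository or the paper's results.

```python
from collections import defaultdict


def index_by_position(peptide, shap_values):
    """Pair each residue with its 1-based position and its SHAP value."""
    if len(peptide) != len(shap_values):
        raise ValueError("one SHAP value is expected per residue")
    return [
        (f"{aa}{pos}", value)
        for pos, (aa, value) in enumerate(zip(peptide, shap_values), start=1)
    ]


def mean_shap_table(peptides, shap_matrix):
    """Average SHAP values per (amino acid, position) label across peptides."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for peptide, row in zip(peptides, shap_matrix):
        for label, value in index_by_position(peptide, row):
            sums[label] += value
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Hypothetical values for illustration only (not results from the paper).
peptides = ["QAQ", "QLK"]
shap_matrix = [[0.5, -0.1, -0.3], [0.1, 0.2, 0.0]]
table = mean_shap_table(peptides, shap_matrix)
```

      In this toy input, glutamine carries a positive value at position 1 and a negative value at position 3. Without the index both would be pooled under "Q"; with it they remain separate entries ("Q1" and "Q3"), which is the ambiguity the figure legend describes.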

      We have updated the repository to include a tutorial that demonstrates PoSHAP on provided data and explains how to use PoSHAP with your own model and data.

      In the experimental section, authors first compare the results with previously known. For example, for the human MHC allele A*11:01 model PoSHAP analysis shows the similar results as was shown with another approach. Based on the provided explanation, it is not clear why PoSHAP is better than the previously published method. The advantage of the PoSHAP should be better explained.

      We agree with the reviewer that the benefits of our approach should be as clear as possible. The referenced section of the paper is to validate our approach compared to another model interpretation technique. We added a new third paragraph to the discussion section to clearly explain the benefits of PoSHAP:

      “There are several benefits of PoSHAP over competing methods. First, PoSHAP determines important residues despite biases in the frequencies of amino acids (Figure 5, Supplementary Figures 9 and 10). PoSHAP is also applicable to any model trained from sequential data (Figure 8), and enables dissection of interpositional dependencies (Figures 6 and 7). Finally, we include a clearly explained jupyter notebook on Github that will take any model and dataset and perform PoSHAP analysis.”

      In the experimental section, after the PoSHAP performance verification, hypothesis generation was introduced. However, it is not clear how many hypotheses were generated; how many of them were known before; what kind of other categories are inside these hypotheses (unknown, possible and potentially interesting, etc).

      We are unsure as to how to quantify the number of hypotheses generated by our approach. In a sense, the SHAP value of each amino acid at each position within a heatmap represents a hypothesis of the contribution of that amino acid to the metric being predicted. Each significant interaction listed in the first three supplemental tables represents a hypothesis of the interactions between two given amino acids at two positions. Turning these into testable hypotheses requires some analysis, as we have discussed, e.g. for the two binding motifs (L-T-P, F-S-P) of A001, or the distance-type interactions within the CCS.

      The README section in the GitHub repo is not easily understandable. An additional explanation for each step is required (e.g., links to the folders where the calculated SHAP values, the trained models, all splits and all-important benchmarks are).

      We have updated the README and repository to explain how to use PoSHAP, and have added explanations of each item in the repository.

      **Minor comments**

      1. The prior studies should be covered better (see Major comments).

      We apologize for not better covering prior studies. We have significantly expanded the introduction by adding two new paragraphs and at least 10 additional citations.

      The work consists of some typos, for example: "However, because many reports forgo model interpretation" - "t" is missed.

      We did intend to use the word “forgo” not “forget” in that sentence. We have checked again thoroughly for spelling and grammar mistakes.

      The hyperparameters table, hyperparameter search section should be moved to the supplemental material, that's technical details.

      We moved this table to the supplementary materials.

      Reviewer #3 (Significance (Required)): Interpretation of the model results is an important topic for biology. New findings here could lead to new interactions opening, new drugs development etc. That is relevant for the applied ML Researches and computational biologists. This paper aims to provide a way to do it. Because my field of interest and expertise lies in Machine Learning for healthcare, language modelling of biological sequences and Natural Language Processing, this work is of great interest to me. So I mostly evaluated ML methodology presented in the paper.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The main goal of the work is to provide the interpretation of Deep Neural Networks (LSTM in the paper) trained on biological sequences. For this purpose authors used the framework introduced earlier - SHapley Additive exPlanations (SHAP), in particular - the slight adaptation of this method called positional SHAP (PoSHAP), because they are interested in the impact of each position of the input sequence to the model output. They demonstrate this on three regression tasks that predict peptide properties.

      Major comments

      The main contribution, highlighted in the paper: authors showed how PoSHAP discloses amino acid motifs that influence MHC I binding. Further they described how PoSHAP enables understanding of interpositional dependence of amino acids that result in high affinity predictions. Also they argued that this work also contributes to a method for accurate prediction of peptide-MHC I affinity using peptide array data enabled by novel application of a neural network that combines amino acid embedding and LSTM layers.

      There are some comments about the statements above:

      1. Why was the LSTM model chosen? Recent publications showed the success of the Transformer model for biological sequences, however this direction was not covered in the related work overview. The architecture choice then should be better justified. Also the choice of LSTM for the biological sequences is not new and authors should better claim their statement about "novel application of a neural network that combines amino acid embedding and LSTM layers". Where exactly is the novelty? Could the community use the pretrained embeddings for their purpose?

      1. The attention mechanism itself provides a great opportunity to interpret the model predictions. In the introduction section authors made a statement that attention layers may limit the flexibility of model architecture when designing new models. Could they better explain this limit? Because recent state of the art models successfully work with long biological sequences and show better results than any other models (one example could be found here: https://openreview.net/pdf?id=YWtLZvLmud7). Authors should cover these limits more, as they also relate to the motivation of the LSTM choice.
      2. Another statement was made about the PoSHAP - adaptation of the SHAP method. It is hard to follow through the explanation of this adaptation - it is not clear what exactly is this adaptation. For example, Kernel SHAP from the original paper computes feature importance, in this paper authors compute the impact of each position, that is basically also the feature importances. Thus authors should better explain the statement about PoSHAP novelty. Will it be possible to use PoSHAP for any other model trained for the same purpose? If yes, for better reproducibility, authors should provide the place where exactly in the repo is the code for this. Also mathematical notations are missing in the Positional SHAP (PoSHAP) section - it is better to explain the adaptation with them to increase the understanding of the section.
      3. In the experimental section, authors first compare the results with previously known. For example, for the human MHC allele A*11:01 model PoSHAP analysis shows the similar results as was shown with another approach. Based on the provided explanation, it is not clear why PoSHAP is better than the previously published method. The advantage of the PoSHAP should be better explained.
      4. In the experimental section, after the PoSHAP performance verification, hypothesis generation was introduced. However, it is not clear how many hypotheses were generated; how many of them were known before; what kind of other categories are inside these hypotheses (unknown, possible and potentially interesting, etc).
      5. The README section in the GitHub repo is not easily understandable. An additional explanation for each step is required (e.g. links to the folders where the calculated SHAP values, the trained models, all splits and all important benchmarks are).

      Minor comments

      1. The prior studies should be covered better (see Major comments).
      2. The work consists of some typos, for example: "However, because many reports forgo model interpretation" - "t" is missed.
      3. The hyperparameters table, hyperparameter search section should be moved to the supplemental material, that's technical details.

      Significance

      Interpretation of the model results is an important topic for biology. New findings here could lead to new interactions opening, new drugs development etc. That is relevant for the applied ML Researches and computational biologists. This paper aims to provide a way to do it. Because my field of interest and expertise lies in Machine Learning for healthcare, language modelling of biological sequences and Natural Language Processing, this work is of great interest to me. So I mostly evaluated ML methodology presented in the paper.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Comments to the Authors

      In this study, the authors developed a framework named PoSHAP for the interpretation of neural networks trained on biological sequences. The current manuscript can be stronger if the following issues can be clearly addressed.

      1. As interpreting model with SHAP is a vital part of this manuscript, it would be better to provide descriptions of the underlying principles of SHAP to enable the readers to understand the paper easily.
      2. It is emphasized in the manuscript that PoSHAP is introduced to interpret neural networks trained on biological sequences. However, it is not clear why the authors choose the Model Agnostic Kernel SHAP, which is based on Linear LIME. Although it can be used for any model, its performance may not be optimal. In this regard, perhaps Deep SHAP or Gradient SHAP is more appropriate, both of which are designed for deep learning networks [1]. It would be better to provide some additional experiments on Deep SHAP, and this work will be more convincing if they show the same or similar contribution of each position on each peptide as Kernel SHAP does. [1] Lundberg, S., and S. I. Lee. "A Unified Approach to Interpreting Model Predictions." NIPS 2017.
      3. As described in the manuscript, "Correlations between true and predicted values were assessed by MSE, Spearman's rank correlation coefficient, and the correlation p-value." As an important indicator for evaluation, the exact p-values should be provided in the seven subgraphs in Figure 2, not p=0.0.
      4. It should be noted that the coordinate scales of Figure 2B and Figure 2C need to be marked symmetrically. And from Figure 2B, we can see that, the IC50 with smaller (<0) and larger (>0.8) values cannot be well predicted. Can the authors provide a detailed explanation about these results?
      5. References are needed in some descriptions in the manuscript. For example, "one might train a network to take an input of peptide sequence and predict chromatographic retention time", "RNNs have found extensive application to natural language processing, and by extension as a similar type of data, predictions from biological sequences such as peptides or nucleic acids".
      6. The description of the adopted three models in the section "Model architecture" is a bit confusing. As described in this section, "The LSTM layer outputs a 50x128 dimensional matrix to a dropout layer where a proportion of values are randomly set to 0", "a second LSTM layer outputs a tensor with length 128 and a second dropout layer then randomly sets a proportion of values to 0". But as shown in the Supplemental Figure 3, the output size of the first LSTM was 10x128. Also, as shown in Table 1, the dropout rates were not 0. Therefore, the section should be adjusted for clear clarification.

      Significance

      Pls refer to my comments provided as above.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this paper, the authors use a previously published method SHAP for interpreting deep learning (DL) models (specifically LSTMs) that are trained for predicting physicochemical attributes of peptides (such as antigenicity and collisional cross section). The paper shows that it's capable of identifying some amino acid residues contributing to the prediction results of the DL models.

      Significance

      1. One main idea of the paper is to use SHAP to determine the significant amino acids at each position (or pairs of AA at each position) contributing to the prediction. Some of the interpretation results are consistent with findings reported previously. This is very nice; however, most of these findings are statistical results such as "XX is often present at the second position for the peptides with the positive outcome", which are relatively straightforward and may be derived by using some statistical methods without using DL models. We expect more complex patterns can be discovered in addition to these statistical observations.
        1. Although the interpreting results reported in the paper largely agree with previous reports, the paper did not explicitly model the frequency of different amino acids in the training data. For instance, if the amino acid 'A' happens to be over-represented in the positive samples of peptides in the training data, the DL model may consider it to contribute to the positive prediction, which may not be true. This issue might become more serious when pairs of amino acids are considered. The authors may want to analyze this potential issue in their results.
        2. Even on a balanced training dataset, the LSTM model to be interpreted may still contain arbitrary bias due to inevitable overfitting, which the authors did not discuss. It will be more convincing to train multiple models using different hyper-parameters and optimization algorithms, and then see if similar interpretation results can be reached among most or all of these models.
        3. For the dependence analysis, it is not completely clear why the distance is used as the variable, while the relative position of the amino acid residue in the peptide is ignored. For example, if there is a strong interaction between the first and the last residues in the peptide, their distance changes depending on the peptide length. In figure 6, the authors showed that strong interactions between amino acids that are 8-9 residues apart may suggest the peptide length actually plays a role here.
        4. Also, it would be better to show how the results look when applying this method to peptides in the negative samples (e.g., the peptides that are not bound by MHC in the antigenicity prediction experiment). Will the interpreting results also be negative?
        5. Finally, it will be interesting to see the interpreting results when the method is applied to the DL models on more challenging tasks such as the prediction of tandem mass spectra of peptides. The authors may want to discuss these applications.

      I am primarily interested in algorithmic and statistical problems in genomics and proteomics. We have developed deep learning models for predicting the full tandem mass spectrum of peptides, and I am interested in model interpretation methods to explain the fragmentation mechanism resulting in non-conventional fragment ions in tandem mass spectra of peptides. I reviewed the paper in collaboration with my Ph.D. students, who are developing deep learning models for computational mass spectrometry.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank reviewers for helping us clarify our manuscript. Some key information was only in the Supporting Information document, and was not obvious to find. We have now introduced some of this information into the main text, and otherwise clarified to which specific sub-paragraph of the Supporting Information document we refer every time we mention it. Another aspect which we have clarified is the relevance of controls previously published in our paper PLOS Comp Biol 16: 1-23. These controls address many of the remarks raised by the reviewers, regarding for instance rhythm detection methods, detection threshold, the effect of normalization of time-series data in rhythm detection, the consideration of biological replicates in time-series data, or the relationship between rhythms and highly expressed genes. We have now introduced some of these results within the main text to clarify these points, or have specified to which specific result of our previous paper we refer.

      REVIEWER #1

      Major comments:

      They assumed the optimal constant level would be the maximum over the rhythm period when rhythmic regulation is absent. They also assumed the trade-off between the benefits of not producing proteins when they are not needed (costs saved) and the costs involved in making it rhythmic (costs of complexity), which they argued lead to the expectation that costlier genes be more frequently rhythmic. However, there was no explicit definition for the trade-off, so it is unclear how it leads to the expectation. [...]

      Second, the "costs of complexity" were not defined

      We have now clarified these points:

      Thus, a first evolutionary advantage given by rhythmic biological processes would be an optimization of the overall cost (over a 24-hour period), compared to a constant expression at a high level of proteins, when this high level is necessary for fitness at least at some point of time.

      • Thus, a first evolutionary advantage given by rhythmic biological processes would be an optimization of the overall cost (over a 24-hour period), compared to the costs generated over the same period by optimizing a constant level of proteins. The reasonable assumption that the optimal constant level would be the maximum over the rhythm period strengthens the case for selection on expression cost.

      • Our results suggest that rhythmicity of protein expression has been favored by selection for cost control of gene expression, while keeping optimal expression levels. In the case of rhythmic genes, what would that optimal constant level be? We can propose two hypotheses. The first is that it would be the mean expression over the period, since this maintains the same overall amount of protein. The second is that it would be the maximum over the rhythm period, since that is the level needed at least at some point. The second hypothesis explains better the existence of this maximum level during the cycle. Of note, it also strengthens the case for selection on expression cost. Thus, for the case of rhythmic genes, the optimal constant level should at least correspond to the mean expression level (Fig 1d). We provide results obtained using both the maximum and the mean of expression in Fig. 2a. We have modified Fig. 1d accordingly, and specified in Supp Fig. S2 that the delta value was calculated from mean expression levels.

      We assume that the maximal expression level gives an estimation of the level that would be constantly maintained in the absence of rhythmic regulation

      • We assume that, in the absence of rhythmic regulation, the constant optimal level is included between the mean and the maximum expression level observed in rhythmic expression. Here, we studied the evolutionary costs and benefits that shape the rhythmic nature of gene expression at the RNA and protein levels. For this, we analysed characteristics we presume to be part of the trade-off.

      • Here, we studied the evolutionary costs and benefits that shape the rhythmic nature of gene expression at the RNA and protein levels. For this, we analysed characteristics we presume to be part of the trade-off determining the rhythmic nature of gene expression between its advantages (cost economy over 24h, non-ribosomal occupancy) and disadvantages (costs of complexity related to precise temporal regulation). The evolutionary origin of maintaining large cyclic biological systems, in terms of adaptability, can be seen as a trade-off between disadvantages such as cost or noise induced by the added complexity, and advantages such as economy over a daily time-scale, temporal organization, or adaptability.

      • Most rhythmic genes are tissue-specific (Zhang et al. 2014, Boyle et al. 2017), which means that their rhythmic regulation is not a general property of the gene and is therefore expected to be advantageous only in those tissues in which they are found rhythmic. This argues that rhythmic regulation has costs, since it is not general. These costs are **probably related to the complexity of regulation** to maintain precise temporal organisation. Thus, cyclic biological systems are expected to have adaptive origins.
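
      The comparison between rhythmic expression and a constant level set at the mean or at the maximum can be made concrete with a small worked example. This is only a sketch under simplifying assumptions: cost is taken to be proportional to expression level summed over equally spaced time points, and the expression values and function names are hypothetical rather than taken from the manuscript.

```python
def total_cost(levels, unit_cost=1.0):
    """Sum expression cost over equally spaced time points of one period."""
    return unit_cost * sum(levels)


def compare_strategies(levels, unit_cost=1.0):
    """Compare rhythmic expression with constant expression at mean or max."""
    n = len(levels)
    mean_level = sum(levels) / n
    max_level = max(levels)
    return {
        "rhythmic": total_cost(levels, unit_cost),
        "constant_mean": total_cost([mean_level] * n, unit_cost),
        "constant_max": total_cost([max_level] * n, unit_cost),
    }


# Hypothetical expression levels sampled every 4 h over a 24 h period.
levels = [2.0, 6.0, 10.0, 6.0, 2.0, 4.0]
costs = compare_strategies(levels)

# Cost saved by rhythmic expression relative to holding the maximum level.
saving_vs_max = costs["constant_max"] - costs["rhythmic"]
```

      Under these assumptions a constant level at the mean costs exactly as much as the rhythmic profile, so any saving appears only when the constant alternative lies above the mean, up to the maximum. This is why the choice between the two hypotheses matters for the size of the estimated cost economy.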

        It would be more convincing to define a fitness function or cost function to demonstrate their argument that costlier genes have fitness advantages if they are rhythmic.

      Considering rhythmicity as an economy strategy is quite intuitive and our results confirm what is currently accepted (Wang et al. 2015). We show and discuss to what extent this is true by comparing expression costs at different expression levels. Defining more precisely a fitness function in our case would require an experiment where we could compare fitness between two populations (e.g. prokaryote growth rates): WT versus a strain in which the promoters of the costliest genes would be controlled by non-cyclic transcription factors. We do not feel that this is a reasonable extension of this work, but a whole new research program.

      First, when proteins are not needed, it can be either the case of not producing extra proteins (cost saved) or the case of degrading excessive proteins (cost incurred). […]

      The cost function presented in this paper may be oversimplified. It only takes into account the costs to produce protein. The authors argued that a more complex cost calculation would not change the observation, but without proving it. However, protein degradation, including ubiquitination and proteolysis, requires energy; for a rhythmic gene, it is also necessary to consider the cost of maintaining the rhythmicity, including the temporally precise regulation of protein expression when the proteins are needed and of protein destruction when they are not.

      We have now clarified this in Section 4.1 of the Supporting Information document:

      Protein decay can be due to spontaneous decay of unstable molecules (no cost), cellular dilution (no cost), or active protein degradation, whose cost has been shown to be negligible. Costs of protein decay are negligible enough to not be opposed by selection. Indeed, Lynch and Marinov (2015) and Wagner (2005) have shown that “degradation in a lysosome may cost essentially nothing, and amino-acid export back to the cytoplasm consumes 1 ATP for every 3 to 4 amino acids”. Compared with the cost of producing a single nucleotide, which consumes 49 ~P, protein decay costs become negligible compared to transcriptional costs, which are themselves negligible compared to translational costs. This is all the more so given that amino acids from degradation are reused and do not need to be produced by the cell, which therefore economizes around 30 ~P per amino acid (~P: high-energy phosphate bonds).

      In Section 3 of the Supporting Information document, we also show why rhythmic and highly expressed proteins are costlier for the cell per time-unit than rhythmic and lower expressed proteins, even considering decay costs or proteins half-lives.

      Thus, the order of costs between genes is not expected to be affected by a more complex calculation accounting for protein decay and protein half-lives.
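
      The orders of magnitude quoted above can be checked with a short calculation. This sketch uses only the figures cited in the text (1 ATP per 3 to 4 amino acids exported, about 30 ~P saved per reused amino acid); it additionally assumes that 1 ATP counts as 1 ~P, and the protein length of 300 residues is a hypothetical value chosen for illustration.

```python
# Figures quoted in the text; treating 1 ATP as 1 ~P is an assumption here.
P_SAVED_PER_REUSED_AA = 30.0
AA_EXPORTED_PER_ATP = (3.0, 4.0)


def export_cost_range(n_aa):
    """Return (low, high) export cost in ~P for a protein of n_aa residues."""
    low = n_aa / max(AA_EXPORTED_PER_ATP)
    high = n_aa / min(AA_EXPORTED_PER_ATP)
    return low, high


def reuse_saving(n_aa):
    """Return the ~P saved by reusing n_aa amino acids instead of making them."""
    return n_aa * P_SAVED_PER_REUSED_AA


n_aa = 300  # hypothetical protein length
low, high = export_cost_range(n_aa)
saving = reuse_saving(n_aa)
ratio_high = high / saving  # upper bound of export cost relative to saving
```

      For this hypothetical protein the export cost is at most about 1% of the amount saved by reusing the amino acids, which is consistent with the statement that decay costs do not change the ordering of genes by expression cost.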

      We think these points should be in Supporting Information document since they are not novel. Lynch and Marinov as well as Wagner have studied and reported these points in detail in their work. We have replicated their results and have used them to understand rhythmicity, which is the focus of our manuscript.

      The authors claimed that cycling genes are enriched in highly expressed genes, by showing rhythmic proteins are costlier than non-rhythmic proteins (based on the expression cost function) in several species. However, only the first 15% of proteins based on p-values ranking from their rhythm detection algorithms were classified rhythmic. One potential artifact of this classification is that the identified rhythmic genes are biasedly highly expressed genes because the lower-amplitude genes are harder to detect and excluded by the algorithm. If changing the threshold for rhythmicity to include more rhythmic genes with intermediate p-values (p-value Since the results of this paper would be sensitive to the accuracy of identifying rhythmicity at both mRNA and protein levels, it is crucial to validate the rhythm detection algorithm by cross-checking algorithm-generated results with those known rhythmic genes. Can the authors estimate the false positive and false negative rates in each group of the rhythmic and non-rhythmic proteins or mRNAs identified by their algorithm?

      Our 2020 paper (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007666) addresses these issues, but we did not make this sufficiently clear here. We have now added some details of our previous results in the main text to clarify, as this is a logical remark about a limitation. We mostly use GeneCycle based on the results of the benchmarking in that paper; it notably produces a uniform distribution under the null hypothesis and a skew towards low p-values for all empirical data.

      Furthermore, cycling genes have been shown to be over-represented among highly expressed genes (Laloum & Robinson-Rechavi 2020, Wang et al. 2015).

      • Furthermore, we have shown in our previous work that rhythmic genes are largely enriched in highly expressed genes, and that the differences in rhythm detection obtained between highly and lowly expressed genes either reflect true biology or a lower signal to noise ratio in lowly expressed genes (Laloum & Robinson-Rechavi 2020).

        Higher gene expression usually leads to lower genetic noise. The authors thus applied a definition of the stochastic gene expression (SGE) that controls the biases associated with the correlation between the expression mean and variance to evaluate expression noise. They found lower noise with rhythmic transcripts. However, they did not explain, mechanistically, why rhythmic RNA has lower noise and what is the biological meaning behind this finding. It is also unclear whether they considered the phase difference between signal and noise that usually exists in an oscillatory system.

      Please see answer to second reviewer.

      Minor comments:

      It would be helpful if the authors could interpret their observations including where the results may not be as significant. A few examples are listed below.

      1) In tissue-specific studies, they used the transcriptomics datasets from 11 mouse tissues to compare the difference in expression levels (based on z-score) of each gene between tissue groups of rhythmic and non-rhythmic expression and found higher gene expression in rhythmic tissues. However, proteins showed a bimodal distribution, and it would be helpful to add interpretation or discussion regarding this bimodal distribution.

      Note that for proteins, the delta was calculated based on only 3 or 4 tissues, which greatly limits our detection power. We now propose the following hypothesis:

      • We also provide results obtained from other datasets in supplementary Table S3, although they must be taken with caution since only 2 to 4 tissues were available, and sometimes data came from different experiments. Of note, for proteomic data, the distributions of the delta are bimodal (Fig. S3), separating rhythmic proteins into two groups, with low or high protein levels in the tissues in which they are rhythmic. A hypothesis is that for some tissue-specific proteins the rhythmic regulation is not tissue-specific, making them rhythmic also in tissues where they are lowly expressed. But the very small sample size does not allow us to test it, and we caution against any over-interpretation of this pattern before it can be confirmed.

        2) They also calculate partial correlation for rhythmicity with expression level over tissues for all tissue-specific genes (tau>0.5) and found Spearman's correlation coefficient is skewed towards negative (suggesting a correlation), but Pearson's correlation showed a positive peak. It indicates that a subset of genes is less rhythmic in the tissues where they are most expressed. Is this positive peak significant or expected? What are these genes? Any evolutionary benefits? Can the authors discuss the functional difference between these genes and other genes that follow the predictions?

      While Spearman’s correlation is clearly skewed towards negative correlations, i.e. lower p-values thus stronger signal of rhythmicity in the tissue where genes are more expressed, Pearson’s correlation also has a smaller peak of positive correlations (Fig. S4), suggesting a subset of genes which are less rhythmic in the tissues where they are most expressed.

      • While Spearman’s correlation is clearly skewed towards negative correlations, i.e. lower p-values thus stronger signal of rhythmicity in the tissue where genes are more expressed, Pearson’s correlation also has a smaller peak of positive correlations (Fig. S4a), suggesting a subset of genes which are less rhythmic in the tissues where they are most expressed. We show that tissue-specific genes which are mostly rhythmic in tissues where they are highly expressed are under stronger selective constraint than those which are rhythmic in tissues where they are lowly expressed (Fig. S4b). Thus, rhythmic expression of this second set of genes might be under weaker constraints.

      We added Fig. S4b in Supplementary figures.

      3) In SGE analysis, the scRNA data of Arabidopsis was from roots, while the data for detecting the rhythmicity was from leaves. Without knowing whether the gene expression patterns in these two different parts are comparable, it is hard to judge the results. The authors may want to provide some discussion.

      Indeed, this limits the interpretation for Arabidopsis, as noted in the results and in the discussion. We still prefer to report this pattern rather than remove it. However, we have now moved the results obtained for Arabidopsis into Supplementary Table S5.

      • In Arabidopsis, the single-cell data used are from the root, while transcriptomic time-series data used to detect rhythmicity are from the leaves, which limits the interpretation. Despite this limitation, we found no evidence of lower noise for genes that are rhythmic at the protein level (Table 1b and 1e, and Supplementary Table S5), and trends towards lower noise in almost all cases for genes with rhythmic mRNAs (Table 1a, 1c, and 1d).
      • Our results in mouse are consistent with all of these considerations (Table 1 and Supplementary Table S5), although it was not fully the case for Arabidopsis (Supplementary Table S5). However, this last point might be explained by the tissue-specificity of rhythmic gene expression. Indeed, for Arabidopsis, the time-series dataset comes from leaves whereas the single-cell RNA data come from roots.

        For Mouse tissues, while most show lower noise for rhythmic genes, they saw the opposite in Muscle. Is this significant? Any discussion?

      For mouse muscle, we had not mentioned it since it was the only tissue showing such a trend. We have now added a comment on this in the main text:

      • In mouse, muscle tissue gave the opposite result, possibly because skeletal muscle is one of the least rhythmic tissues in the body.

        In various places of the text, the authors only pointed the readers to "Supporting information" without explicitly referring to a specific supplemental figure by its number. It would be helpful to cite a table or figure explicitly.

      We agree, and have corrected this. See first General Statements.

      Figure 2 does not have legends in the graphs.

      This is now corrected, thank you for your attention.

      REVIEWER #2

      Major comments:

      • Our major concern regards the identification of rhythmic genes.

      Although we are not experts in the specific method used (details are not provided in the manuscript), a method looking for a statistically significant periodicity in a noisy signal will provide a low p-value for a signal sufficiently above the noise level. Gene expression data are noisy because of stochastic gene expression and technical noise (e.g., the sampling noise due to RNA capture in RNAseq data). This noise scales with the average level of expression. Lowly expressed genes generally display larger relative fluctuations (e.g., sampling noise is essentially Poisson-like). As a result, the method will be more likely to identify highly expressed genes as rhythmic, since the signal-to-noise ratio is generally higher.

      This could significantly bias the subsequent analysis, since most of the claims are related to a link between expression levels and rhythmicity.

      [There is not even an obvious separation of timescales that can be invoked between a possible 24-hour periodic signal and the fluctuations. For example, the timescale of protein fluctuations can be largely set by dilution and thus have a timescale comparable to the cell cycle.]

      The authors should discuss this issue, which is overlooked in the current manuscript.

      How much this potential bias affects the selection of rhythmic genes can probably be assessed using synthetic data.

      • It would be useful to clarify in the main text what are the units of measurement of gene expression at the mRNA and at the protein level. If we understood correctly, the authors used FPKM and protein counts respectively. The dynamics in time could in principle be different if an absolute or a normalized level of expression is considered. For example, the cell cycle can be correlated with the circadian clock (as reported for example in cyanobacteria). Since the absolute amount of total proteins has to approximately double during a cell cycle (for cell size homeostasis), this can create a periodic signal in protein counts with a 24-hour period.

      The same reasoning does not hold true if the measurement is normalized, as in the FPKM case.

      The authors should discuss this issue or simply show that the results for proteins are robust if the protein count is normalized (for example with respect to the total protein amount).

      We haven’t focused the present manuscript on these issues since we recently published another paper which addresses these points: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007666

      We have now added some details of our previous results in the main text to make the work more relevant.

      • The expression cost defined in the manuscript seems dominated by the expression level.

      It would be useful to report the scatter plot and the correlation level of cost versus average expression. A high correlation between these two quantities can largely recapitulate the results in Figure 2 (even though the results presented are still interesting per se). In other words, the relation between cost and rhythmicity sounds like a simple rephrasing of the relation between average expression level and rhythmicity (previously reported as correctly referenced in the manuscript).

      We now provide these results in Fig. S2 (Supplementary figures) and show a negative and significant correlation between the order of the rhythmicity signal and the total expression cost (calculated from the mean expression level). Since our previous benchmark shows that the ordering of genes from most to least rhythmic is not very reliable for known methods, including the one used here, we prefer to present this result in the Supplementary figures document.

      • The empirical observation of a relation between noise and rhythmicity in mRNA expression is interesting, but we cannot fully understand its link with the theoretical arguments proposed.

      The Authors suggest that periodicity in mRNA expression could decrease protein noise at the peak of mRNA expression (Fig.S1). But this is not what they can measure in the single-cell data analyzed, where cell-to-cell variability is reported at a single timepoint for a cell population. If the oscillations are not synchronized in the cell population, an oscillating transcript would simply display a high cell-to-cell variability dominated by the amplitude of oscillations. Even if the oscillations are synchronized, there is no information in the dataset about the mRNA dynamics. Thus, mRNA cell-to-cell variability could have been measured at any point of its (putative) cyclic dynamics.

      Thus, we propose to make the connections between the theoretical arguments and the empirical observation about noise in gene expression clearer.

      Thank you for pointing out this issue. We have clarified the following in the main text:

      These considerations lead to predictions which we test here: i) a decreased stochasticity strategy for genes with rhythmically accumulated mRNAs ....

      • These considerations lead to predictions which we test here: i) a strategy to periodically decrease stochasticity for genes with rhythmically accumulated mRNAs .... Assuming that genes with low noise have noise-sensitive functions (and thus noise is tightly controlled), these results support the hypothesis that noise is globally reduced thanks to rhythmic regulation at the transcriptional level.

      • Our results show that noise is globally reduced for genes with rhythmic regulation at the transcriptional level. Since rhythmic genes are not all in the same phase (Fig. S9a in Supporting information), we expect this result obtained for a given time-point (noise estimation based on a single time-point scRNA dataset) to be general to all time-points (section 6.3 in Supporting information). Assuming that genes with low noise have noise-sensitive functions (and thus noise is tightly controlled), these results suggest that rhythmic genes have their noise periodically and drastically reduced through periodic high accumulation of their mRNAs.

      • Thus, since we find lower noise among rhythmic transcripts, rhythmic expression of RNAs might be a way to periodically reduce expression noise of highly expressed genes (Figure 2 and Fig. S1-S2), which are under stronger selection. Indeed, we found that genes with rhythmic transcripts are under stronger selection, even controlling for expression level effect. As proposed by Horvath et al. (2019) and supported by results in mouse by Barroso et al. (2018) genes under strong selection could also be less tolerant to high noise of expression. Thus, periodic accumulation of mRNAs might be a way to periodically reduce expression noise of noise-sensitive genes (Fig 1c), i.e. genes under stronger selection. However, our results are limited by the fact that noise estimation is based on a single time-point measurement since no scRNA time-series data are currently available for these species. Since the peak time of rhythmic transcripts is distributed across all times (Supporting Information Fig. S9a), the mean noise estimated at a given time-point includes the noise of the genes that are peaking at that time (lowest noise) and all the others that have a higher noise than those at their own peak time-point (Supporting Information Fig. S9b). Our results suggest that rhythmic genes peaking at the time-point of the scRNA measurement have sufficiently low noise for the mean noise of rhythmic genes to be much lower than that of non-rhythmic genes.
        • As a simple additional test of robustness of the rhythmic gene selection, biological replicates can be used, although this would not resolve the possible bias discussed above. As explained by the Authors, some of the datasets analyzed have biological replicates. It would be interesting to know the robustness of the detection method across replicates. How much is the set of genes identified as rhythmic conserved if estimated on different replicates? Spearman correlation or simply the overlap between the sets (maybe assessed with a hypergeometric test) can be used.

      These points have been already addressed in our 2020 paper https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007666 (paragraph “The importance of having an informative dataset”) as well as in recent guidelines (Hughes et al. 2017). We specified in Methods that we considered replicates as new cycles as recommended.

      Minor comments:

      • The claim that "transcriptional noise is known to be the main driver of overall expression noise", which is present in the discussion is questionable.

      For example, the quantitative large-scale dataset referenced by the Authors for E.coli (Taniguchi et al) shows instead that the dominant source of noise is extrinsic for many of the genes tested.

      We have clarified in the main text that by “main driver of the overall noise” we refer to the relative contribution of transcriptional versus translational noise into the overall noise.

      We have also added the section 6.1 into Supporting Information document:

      • Relative to translational noise, transcriptional noise is the main driver of the overall noise (Raj and van Oudenaarden 2008) and should give a good estimation of the output noise. Indeed, based on estimations of coefficients of variation (CV, cell-to-cell variation in protein level) for diverse transcription and translation rates in E. coli and S. cerevisiae, Hausser et al. (2019) have shown that for a fixed transcriptional rate, CV is almost constant for diverse translational rates. Thus, changes in protein level have little to no impact on gene expression noise. The availability of mRNA molecules seems to drive the final noise. That is, compared with the noise caused by translational activity, the availability of low-copy-number molecules such as transcription factors (subject to the stochasticity of diffusion and binding in the cell environment) is the main factor in the output cell-to-cell variation in protein abundances.

      We have also modified the main text:

      Indeed, transcriptional noise, which we measure here, is known to be the main driver of overall expression noise (Raj & van Oudenaarden 2008).

      • Relative to translational noise, transcriptional noise is the main source of the overall noise (Raj & van Oudenaarden 2008) (section 6.1 in Supporting information). In addition, highly expressed proteins are all precisely expressed and they display little variation in noise (also shown by Hausser et al. (2019) who reused Taniguchi et al. (2010) data). The noise of these highly expressed proteins is also just above a limit which is the noise floor. This "noise floor" is dominated by extrinsic noise as suggested by Hausser et al. and Taniguchi et al.: “The extrinsic noise in the last three terms in Eq. 4 (of the noise floor) might originate from fluctuations in cellular components such as metabolites, ribosomes, and polymerases and dominates the noise of high copy proteins” (Taniguchi et al.). Thus, highly expressed proteins are precisely expressed and their residual noise is similar to the noise floor, which is due to the extrinsic noise (imperfect synchrony of cell states inherent or due to the environment).
      • We suggest to avoid explicit statements about a causal link between expression level and rhythmicity, as in the caption title of Figure 2. A detected correlation is not a proof of a causal relation.

      We have corrected the sentence as follows:

      Rhythmic proteins are costly proteins due to their high level of expression.

      • High level of expression is the main factor explaining the higher cost observed in rhythmic proteins.
        • Supplementary Figures attached at the end of the main text and Supplementary Figures in the Supporting Information file have the same numbering...so there are two different versions of Fig.S1 S2 etc.

      This complicates the work of the reader.

      We have modified the numbering of figures to make them easier to follow.

      -The legend of Fig 2 is missing (the legend is instead reported in Fig.S1).

      This is now corrected, thank you for your attention

      Other modifications:

      We also show how cost can explain the tissue-specificity of rhythmic gene expression. Indeed, the nycthemeral transcriptome has long been known to be tissue-specific (Zhang et al. 2014, Boyle et al. 2017, Korenčič et al. 2014), i.e. a given gene can be rhythmic in some tissues, and constantly or not expressed in others.

      • Furthermore, the nycthemeral transcriptome has long been known to be tissue-specific (Zhang et al. 2014, Boyle et al. 2017, Korenčič et al. 2014), i.e. a given gene can be rhythmic in some tissues, and constantly or not expressed in others. Here, we provide a first explanation for the tissue-specificity of rhythms in gene expression by showing that genes are more likely to be rhythmic in tissues where they are specifically highly expressed.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The manuscript proposes an interesting hypothesis to explain the widespread presence of rhythmicity in gene expression. The Authors suggest that rhythmicity can be the combined result of cost optimization and control of gene expression noise. To support this hypothesis, they analyzed several proteomic and RNA sequencing datasets across different species. Specifically, putative rhythmic genes were identified using a published tool from time-series datasets. Their first claim concerns the typical expression cost (Cp) for rhythmic vs. non-rhythmic genes. The evaluated Cp is empirically (slightly but significantly) higher for rhythmic genes, mainly because these genes on average show higher expression levels than non-rhythmic genes. The analysis of tissue-specific expression data further supports this relation between expression levels and rhythmicity. Genes are more likely to be rhythmic in tissues where they are specifically highly expressed. To investigate the additional hypothesis of a relation with noise control, the Authors compared expression fluctuations of rhythmic and non-rhythmic genes, measuring noise only at the mRNA level (and using a specific noise measure). According to this measure, genes displaying rhythmicity, in particular at the transcript level, are indeed in most cases less noisy than non-rhythmic genes. Finally, the analysis of protein evolutionary conservation between rhythmic and non-rhythmic genes suggests that genes with rhythmic transcription are under strong purifying selection.

      The paper is concise and well written. The data used are described in sufficient detail to reproduce the results.

      Major comments:

      • Our major concern regards the identification of rhythmic genes.

      Although we are not experts in the specific method used (details are not provided in the manuscript), a method looking for a statistically significant periodicity in a noisy signal will provide a low p-value for a signal sufficiently above the noise level. Gene expression data are noisy because of stochastic gene expression and technical noise (e.g., the sampling noise due to RNA capture in RNAseq data). This noise scales with the average level of expression. Lowly expressed genes generally display larger relative fluctuations (e.g., sampling noise is essentially Poisson-like). As a result, the method will be more likely to identify highly expressed genes as rhythmic, since the signal-to-noise ratio is generally higher. This could significantly bias the subsequent analysis, since most of the claims are related to a link between expression levels and rhythmicity. [There is not even an obvious separation of timescales that can be invoked between a possible 24-hour periodic signal and the fluctuations. For example, the timescale of protein fluctuations can be largely set by dilution and thus have a timescale comparable to the cell cycle.]

      The authors should discuss this issue, which is overlooked in the current manuscript. How much this potential bias affects the selection of rhythmic genes can probably be assessed using synthetic data.
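      The synthetic-data check suggested here could be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the manuscript: rhythmicity is scored with the least-squares amplitude of a 24-hour harmonic and a permutation p-value (a stand-in for the published detection tool), counts are drawn with Poisson sampling noise, and the sampling design, relative amplitude, expression levels, and function names are all hypothetical. Every simulated gene has the same relative amplitude, so any difference in detection rate between expression levels reflects signal-to-noise ratio only.

```python
import math
import random

PERIOD_H = 24.0
TIMES = [4.0 * i for i in range(12)]  # hypothetical design: every 4 h over 48 h
COS = [math.cos(2 * math.pi * t / PERIOD_H) for t in TIMES]
SIN = [math.sin(2 * math.pi * t / PERIOD_H) for t in TIMES]


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for means up to a few hundred."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def amplitude(values):
    """Amplitude of the 24 h harmonic fitted by least squares (evenly spaced design)."""
    n = len(values)
    a = 2.0 / n * sum(v * c for v, c in zip(values, COS))
    b = 2.0 / n * sum(v * s for v, s in zip(values, SIN))
    return math.hypot(a, b)


def permutation_pvalue(rng, values, n_perm=200):
    """Share of time-shuffled series whose amplitude reaches the observed one."""
    observed = amplitude(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if amplitude(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def detection_rate(mean_expr, rel_amp=0.3, n_genes=40, alpha=0.05, seed=0):
    """Fraction of truly rhythmic synthetic genes called rhythmic at level alpha."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_genes):
        phase = rng.uniform(0, 2 * math.pi)
        counts = [
            poisson(rng, mean_expr * (1 + rel_amp * math.cos(2 * math.pi * t / PERIOD_H - phase)))
            for t in TIMES
        ]
        if permutation_pvalue(rng, counts) < alpha:
            detected += 1
    return detected / n_genes


if __name__ == "__main__":
    for mean_expr in (5, 200):
        print(mean_expr, detection_rate(mean_expr))
```

      Comparing the detection rate at a low and a high mean expression level gives a direct estimate of how strongly the rhythmic set is skewed towards highly expressed genes by detection power alone.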

      • It would be useful to clarify in the main text what are the units of measurement of gene expression at the mRNA and at the protein level. If we understood correctly, the authors used FPKM and protein counts respectively. The dynamics in time could in principle be different if an absolute or a normalized level of expression is considered. For example, the cell cycle can be correlated with the circadian clock (as reported for example in cyanobacteria). Since the absolute amount of total proteins has to approximately double during a cell cycle (for cell size homeostasis), this can create a periodic signal in protein counts with a 24-hour period.

      The same reasoning does not hold true if the measurement is normalized, as in the FPKM case. The authors should discuss this issue or simply show that the results for proteins are robust if the protein count is normalized (for example with respect to the total protein amount).
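      The normalization suggested here can be illustrated with a toy example. The sketch below is hypothetical throughout (protein names, counts, and the helper `normalize_by_total` are ours): two proteins keep a constant share of the proteome while the total protein amount doubles over each cycle, so absolute counts oscillate but normalized levels do not.

```python
def normalize_by_total(counts_by_protein):
    """Divide each protein count by the total protein count at the same time point.

    counts_by_protein maps a protein name to a list of counts, one per time point.
    """
    n_times = len(next(iter(counts_by_protein.values())))
    totals = [
        sum(series[t] for series in counts_by_protein.values())
        for t in range(n_times)
    ]
    return {
        name: [series[t] / totals[t] for t in range(n_times)]
        for name, series in counts_by_protein.items()
    }


# Hypothetical growth factors: total protein doubles, then the cell divides.
growth = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
counts = {
    "protein_a": [10 * g for g in growth],
    "protein_b": [30 * g for g in growth],
}
normalized = normalize_by_total(counts)

if __name__ == "__main__":
    print(counts["protein_a"])      # absolute counts follow the growth cycle
    print(normalized["protein_a"])  # the normalized level stays flat
```

      If a rhythm detected on absolute protein counts disappears after this kind of normalization, it is more likely to reflect growth and division than gene-specific regulation.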

      • The expression cost defined in the manuscript seems dominated by the expression level. It would be useful to report the scatter plot and the correlation level of cost versus average expression. A high correlation between these two quantities can largely recapitulate the results in Figure 2 (even though the results presented are still interesting per se). In other words, the relation between cost and rhythmicity sounds like a simple rephrasing of the relation between average expression level and rhythmicity (previously reported as correctly referenced in the manuscript).
      • The empirical observation of a relation between noise and rhythmicity in mRNA expression is interesting, but we cannot fully understand its link with the theoretical arguments proposed. The Authors suggest that periodicity in mRNA expression could decrease protein noise at the peak of mRNA expression (Fig.S1). But this is not what they can measure in the single-cell data analyzed, where cell-to-cell variability is reported at a single timepoint for a cell population. If the oscillations are not synchronized in the cell population, an oscillating transcript would simply display a high cell-to-cell variability dominated by the amplitude of oscillations. Even if the oscillations are synchronized, there is no information in the dataset about the mRNA dynamics. Thus, mRNA cell-to-cell variability could have been measured at any point of its (putative) cyclic dynamics. Thus, we propose to make the connections between the theoretical arguments and the empirical observation about noise in gene expression clearer.
      • As a simple additional test of robustness of the rhythmic gene selection, biological replicates can be used, although this would not resolve the possible bias discussed above. As explained by the Authors, some of the datasets analyzed have biological replicates. It would be interesting to know the robustness of the detection method across replicates. How much is the set of genes identified as rhythmic conserved if estimated on different replicates? Spearman correlation or simply the overlap between the sets (maybe assessed with a hypergeometric test) can be used.
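      The overlap check suggested in the last point can be sketched as follows. This is a minimal illustration, not an analysis from the manuscript: the gene counts in the usage example are hypothetical, and the function name `overlap_pvalue` is ours. It computes the upper tail of the hypergeometric distribution, i.e. the probability of observing at least the given number of shared rhythmic genes if the two replicate sets were unrelated.

```python
from math import comb


def overlap_pvalue(n_genes, n_rhythmic_a, n_rhythmic_b, n_shared):
    """P(overlap >= n_shared) when set B is a random draw from n_genes genes.

    n_rhythmic_a and n_rhythmic_b are the sizes of the rhythmic sets called in
    replicates A and B; n_shared is the size of their intersection.
    """
    upper = min(n_rhythmic_a, n_rhythmic_b)
    total = comb(n_genes, n_rhythmic_b)
    tail = sum(
        comb(n_rhythmic_a, k) * comb(n_genes - n_rhythmic_a, n_rhythmic_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical example: 1000 genes, 150 called rhythmic in each replicate,
    # 60 of them shared between the two replicates.
    print(overlap_pvalue(1000, 150, 150, 60))
```

      A small p-value indicates that the two replicates agree more than expected by chance, but it does not address the expression-level bias discussed above, since both replicates would share that bias.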

      Minor comments:

      • The claim that "transcriptional noise is known to be the main driver of overall expression noise", which is present in the discussion is questionable. For example, the quantitative large-scale dataset referenced by the Authors for E.coli (Taniguchi et al) shows instead that the dominant source of noise is extrinsic for many of the genes tested.
      • We suggest to avoid explicit statements about a causal link between expression level and rhythmicity, as in the caption title of Figure 2. A detected correlation is not a proof of a causal relation.
      • Supplementary Figures attached at the end of the main text and Supplementary Figures in the Supporting Information file have the same numbering...so there are two different versions of Fig.S1 S2 etc. This complicates the work of the reader. -The legend of Fig 2 is missing (the legend is instead reported in Fig.S1).

      Significance

      The hypothesis of a link between rhythmic expression, expression cost and noise control is intriguing and can be of interest for a large audience of scientists from computational and evolutionary biologists to interdisciplinary researchers interested in models of gene expression.

      Our combined expertise (keywords):

      Physical biology, mathematical modelling, stochastic gene expression, transcriptomic data, quantitative cell physiology, genomics.

      Referee Cross-commenting

      The other report looks fair to me too. We seem to agree on the relevance of the questions asked, but also on some major concerns about the methods used to support the conclusions. Thanks!

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This paper explored the evolutionary advantages of having nycthemeral rhythmicity in many genes, using genome-wide transcriptomics and proteomics datasets from bacteria, plants, animals, and specific mouse tissues. As the main findings of this paper, the authors first applied a cost function with the proteomics data in four species and showed that rhythmic proteins are costlier. They also evaluated the stochastic gene expression (SGE) using single-cell RNA (scRNA) data from several plant and animal species and found that genes with rhythmic mRNAs had lower noise than non-rhythmic mRNAs. They argued that rhythmic genes are evolutionarily selected because of the cost-saving advantage at the protein level and the noise control strategy at the mRNA level.

      In addition to their main findings, the authors also compared the protein evolutionary conservation between rhythmic and non-rhythmic genes using dN/dS data (the ratio of non-synonymous to synonymous substitutions). They found that genes with rhythmic transcripts were more conserved even after controlling for the effect of gene expression and suggested that rhythmic transcripts are important for genes under strong purifying selection.

      Major comments:

      The finding that rhythmic genes are costlier does not convincingly lead to the conclusion that protein rhythmicity has a cost-saving advantage. To make sense of this conclusion, the authors made several assumptions that lack convincing support. They assumed the optimal constant level would be the maximum over the rhythm period when rhythmic regulation is absent. They also assumed the trade-off between the benefits of not producing proteins when they are not needed (costs saved) and the costs involved in making it rhythmic (costs of complexity), which they argued lead to the expectation that costlier genes be more frequently rhythmic. However, there was no explicit definition for the trade-off, so it is unclear how it leads to the expectation. First, when proteins are not needed, it can be either the case of not producing extra proteins (cost saved) or the case of degrading excessive proteins (cost incurred). Second, the "costs of complexity" were not defined. It would be more convincing to define a fitness function or cost function to demonstrate their argument that costlier genes have fitness advantages if they are rhythmic. The cost function presented in this paper may be oversimplified. It only takes into account the costs to produce protein. The authors argued that a more complex cost calculation would not change the observation, but without proving it. However, protein degradation, including ubiquitination and proteolysis, requires energy; for a rhythmic gene, it is also necessary to consider the cost of maintaining the rhythmicity, including the temporally precise regulation of protein expression when the proteins are needed and of protein destruction when they are not.

      The authors claimed that cycling genes are enriched in highly expressed genes, by showing rhythmic proteins are costlier than non-rhythmic proteins (based on the expression cost function) in several species. However, only the first 15% of proteins based on p-values ranking from their rhythm detection algorithms were classified rhythmic. One potential artifact of this classification is that the identified rhythmic genes are biasedly highly expressed genes because the lower-amplitude genes are harder to detect and excluded by the algorithm. If changing the threshold for rhythmicity to include more rhythmic genes with intermediate p-values (p-value<=0.05), will this change the results? Since the results of this paper would be sensitive to the accuracy of identifying rhythmicity at both mRNA and protein levels, it is crucial to validate the rhythm detection algorithm by cross-checking algorithm-generated results with those known rhythmic genes. Can the authors estimate the false positive and false negative rates in each group of the rhythmic and non-rhythmic proteins or mRNAs identified by their algorithm?

      Higher gene expression usually leads to lower genetic noise. The authors thus applied a definition of the stochastic gene expression (SGE) that controls the biases associated with the correlation between the expression mean and variance to evaluate expression noise. They found lower noise with rhythmic transcripts. However, they did not explain, mechanistically, why rhythmic RNA has lower noise and what is the biological meaning behind this finding. It is also unclear whether they considered the phase difference between signal and noise that usually exists in an oscillatory system.

      Minor comments:

      It would be helpful if the authors could interpret their observations including where the results may not be as significant. A few examples are listed below.

      1) In tissue-specific studies, they used the transcriptomics datasets from 11 mouse tissues to compare the difference in expression levels (based on z-score) of each gene between tissue groups of rhythmic and non-rhythmic expression and found higher gene expression in rhythmic tissues. However, proteins showed a bimodal distribution, and it would be helpful to add interpretation or discussion regarding this bimodal distribution.

      2) They also calculated the partial correlation of rhythmicity with expression level over tissues for all tissue-specific genes (tau>0.5) and found that Spearman's correlation coefficient is skewed towards negative (suggesting a correlation), but Pearson's correlation showed a positive peak. It indicates that a subset of genes is less rhythmic in the tissues where they are most expressed. Is this positive peak significant or expected? What are these genes? Any evolutionary benefits? Can the authors discuss the functional difference between these genes and other genes that follow the predictions?

      3) In SGE analysis, the scRNA data of Arabidopsis was from roots, while the data for detecting the rhythmicity was from leaves. Without knowing whether the gene expression patterns in these two different parts are comparable, it is hard to judge the results. The authors may want to provide some discussion. For Mouse tissues, while most show lower noise for rhythmic genes, they saw the opposite in Muscle. Is this significant? Any discussion?

      In various places of the text, the authors only pointed the readers to "Supporting information" without explicitly referring to a specific supplemental figure by its number. It would be helpful to cite a table or figure explicitly. Figure 2 does not have legends in the graphs.

      Significance

      The paper attempts to understand the origins of why many genes display nycthemeral rhythmicities. The question, if addressed, would have a significant impact in the fields of computational systems biology and evolutionary biology. But the findings of this study do not provide a satisfying answer to the question, thus reducing the significance. The conclusions are too overarching without providing significant biological insights and interpretation. Our field of expertise is in systems biology, but we do not have sufficient expertise to evaluate computational tools used to classify genome-wide gene expression data.

      Referee Cross-commenting

      I have reviewed other reports, which look fair to me. I have no comments. Thanks!

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      First of all, I sincerely appreciate the critical reading of our manuscript by the reviewers.

      Point-by-point responses to Reviewer #1’s comments

      Most of the key conclusions are valid but the main one should be either reinforced or tuned down.

      Through our study, we want to indicate that MTCL2 preferentially associates with perinuclear MTs accumulated around the Golgi complex, and its target is not necessarily restricted to “Golgi-associated (nucleated) MTs.” In this sense, the sentences in the previous manuscript, such as “MTCL2 preferentially associates with Golgi-associated MTs” and “MTCL1 and 2 …. are specifically condensed on Golgi-associated MTs,” were overstatements and completely misleading.

      In accordance with reviewer #1’s comment, we carefully revised these sentences throughout the manuscript to eliminate ambiguity on this point as far as possible.

      The corresponding revisions are as follows.

      In particular, the authors tend to give a central role to MTCL2 in regulating the formation and organization of the Golgi-associated MT network, and conversely in organizing Golgi elements, without considering the other factors identified (the authors cite relevant papers but do not discuss this). They should analyze the function of MTCL2 in relation to the role of CLASP2, AKAP450, Golgi-γ-tubulin, or even EB proteins (like EB3).

      I agree with the above comment since it is important to analyze how MTCL2 preferentially associates with the perinuclear MTs accumulated around the Golgi complex.

      In the revised manuscript, we included new data analyzing the effects of CLASP1/2 and AKAP450 knockdown on the subcellular localization of MTCL2 (Fig. 7A). These data indicate that CLASPs, but not AKAP450, are required for the preferential localization of MTCL2 to the perinuclear MTs around the Golgi. We also demonstrate that the minimum Golgi-localizing region of MTCL2 (the N-terminal coiled-coil region) physically associates with CLASP2 (Fig. 7B), further supporting the idea that CLASPs mediate the Golgi association of MTCL2. Additional involvement of another Golgi element, giantin, is also suggested by Fig. 7C and Appendix Fig. S6. We believe that these revisions substantially address the weakness previously pointed out by the reviewer.

      I also do not think that carrying out super resolution microscopy is enough to "reveal the possibility that MTCL2 mediates the association of the Golgi membrane with stabilized MTs". More generally, the authors cannot conclude that MTCL2 preferentially associates with Golgi-MTs only from their immunofluorescence and KD experiments. The centrosome (the main MTOC) is indeed also localized in the perinuclear area. Additional experiments that are easy to perform may help to confirm these conclusions (see below). Also, the authors could strengthen the way they study how MTCL1 and MTCL2 bind to microtubules and the Golgi (see below). The localization or interaction of MTCL2 with Golgi-associated MTs is not directly shown.

      Previously, we demonstrated that the N-terminal region of MTCL2 shows clear Golgi-localization activity, whereas the C-terminal region directly binds to MTs. These data support our conclusion that MTCL2 mediates the association of the Golgi membrane with general MTs (although not with Golgi-associated or stabilized MTs).

      In the revised manuscript, we reinforced these data by showing that four point mutations (4LA) in the first coiled-coil motif disrupt the Golgi localization of the N-terminal region of MTCL2 (Fig. 4D). We then found that introducing the same mutations into full-length MTCL2 abolished its preferential association with the perinuclear MTs accumulating around the Golgi without affecting its localization to MTs (Fig. 4E and F). In addition, we provide data on candidate molecules mediating the Golgi association of MTCL2, as stated above (Fig. 7). These results reinforce our immunofluorescence analysis results (Fig. 2) and indicate that the preferential association of MTCL2 with perinuclear MTs accumulating around the Golgi is facilitated by physical interactions between the N-terminal region of MTCL2 and Golgi-resident proteins, such as CLASPs and giantin.

      The title should be changed also. I am not sure I understand what an asymmetric microtubule network means in this context. I guess that the authors mean non-centrosomal microtubule network.

      We acknowledge the confusion caused by our previous manuscript. By “an asymmetric MT network,” we did not mean “a non-centrosomal MT network.”

      In many cases, microtubules do not elongate radially (symmetrically) from the centrosome but intensely accumulate around the Golgi area and show asymmetric organization (see Meiring et al. Curr. Opin. Cell Biol. 62: 86-95, 2020). “An asymmetric MT network” in the title corresponds to this asymmetric array of general MTs accumulating around the GA.

      The present findings that MTCL2 depletion severely disrupted MT accumulation around the Golgi and induced random and rather symmetric arrays of MTs (Fig. 5A) are very impressive. We believe that the knockdown/rescue experiments in this study strongly support the title by demonstrating that MTCL2 facilitates MT accumulation around the Golgi through its dual binding activity to MTs and the Golgi membrane.

      We changed the title in the revised manuscript but, for these reasons, still used the term “asymmetric microtubule organization.”

      The authors also state that tubulin acetylation is induced by MTCL1 C-MTBD but it may simply be stabilized. They should also clarify whether MTCL2 regulates Golgi-dependent microtubule nucleation.

      Yes, we think that MTCL1 C-MTBD enhances tubulin acetylation simply by stabilizing the polymerized state of MTs (see Kader et al. PLoS One 12: e0182641, 2017). As for the second point, please see our response below to comment (7).

      I was not convinced by the use of the quantification of "skewness", in particular in figure 5B. Whether a Wilcoxon test is adequate is unclear to me.

      I understand that skewness, a measure of the asymmetry of a distribution, has not been widely used for this purpose in previous studies. Indeed, the skewness of the pixel-intensity distribution of the tubulin signal does not by itself indicate in which way MTs are asymmetrically distributed. However, quantifying this statistical parameter requires no arbitrary choices and thus minimizes the room for subjective judgment. We are therefore confident that it is the best way to estimate, without preconception, the asymmetric organization of microtubules, which is severely affected by various conditions.

      The two biological phenomena we attempted to elucidate here (microtubule arrays and Golgi ribbon expansion) are thought to be context-dependent in each cell (for example, on cell cycle stage and cell density). We therefore have no substantial reason to assume a normal distribution for the variation of the two values (skewness of tubulin signal distribution and Golgi ribbon expansion angle) in our cell population. For this reason, we considered the Wilcoxon test, a non-parametric rank test, to be the most appropriate and safest test to use.
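
      The quantification described above can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes that skewness means the standard moment-based coefficient computed over per-cell pixel intensities, and that "Wilcoxon test" refers to the two-sample Wilcoxon rank-sum (Mann-Whitney U) test for independent groups. The function names, the simulated intensities, and the group sizes are all hypothetical.

```python
import math
import random


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no continuity
    correction. Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation tied: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Simulated data (hypothetical): 20 cells per group, 2000 pixels per cell.
rng = random.Random(0)
# "Control" cells: right-skewed intensities, as when signal is concentrated
# in a small perinuclear area.
control = [skewness([rng.lognormvariate(0.0, 0.8) for _ in range(2000)])
           for _ in range(20)]
# "Knockdown" cells: roughly symmetric intensities.
knockdown = [skewness([rng.gauss(5.0, 1.0) for _ in range(2000)])
             for _ in range(20)]

u_stat, p_value = rank_sum_test(control, knockdown)
print(f"U = {u_stat:.0f}, two-sided p = {p_value:.2e}")
```

      Because the test uses only the ranks of the per-cell skewness values, it makes no assumption about their distribution, which is the property the response above relies on. For small samples an exact p-value would be preferable to the normal approximation used in this sketch.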

      To demonstrate that MTCL2 associates with Golgi-MTs, microtubule regrowth experiments following nocodazole treatment have to be conducted (time course). Another efficient way to analyze such events, as shown by the Kaverina and the Akhmanova labs for example, is to use fluorescent EB proteins (e.g. EB3) to image microtubule plus ends and back-track them to identify nucleation points. Carrying out such experiments (nocodazole washout and EB tracking) in the presence or absence of MTCL2 would make it possible to confirm, or refute, the functional hypothesis of the authors.

      We did not want to demonstrate that MTCL2 preferentially associates with “Golgi-MTs.” From this point of view, we do not think the experiments suggested by reviewer#1 were necessarily required for our study.

      However, there is no doubt that one of the main components of the “perinuclear MTs accumulating around the Golgi” is “Golgi-associated (nucleated) MTs.” In this sense, we still agree with reviewer#1’s comment that it is better to examine whether MTCL2 is involved in MT nucleation from the Golgi membrane. The results of these experiments will be informative for readers particularly because we previously reported that MTCL1 stabilizes Golgi-associated (nucleated) MTs.

      In keeping with the above consideration, we performed both experiments (nocodazole washout and EB tracking) following previous studies (for example, Sanders et al. Mol. Biol. Cell 28: 3181-3192, 2017). However, we ultimately decided against including the data because we could not overcome the large cell-to-cell variation.

      Nevertheless, we believe that our current dataset adequately addresses the specific questions we explored. Briefly, even if these experiments had succeeded in demonstrating the functional importance of MTCL2 for the development of Golgi-nucleated microtubules, they would not necessarily indicate a physical interaction of MTCL2 with Golgi-associated microtubules. In this respect, as described above, we have significantly supplemented the data on the molecular mechanisms by which MTCL2 mediates MT–Golgi interactions. We believe this addition sufficiently compensates for the lack of data from the experiments suggested by reviewer #1.

      Several lines of circumstantial evidence suggest that MTCL2, in contrast to MTCL1, is not involved in the development of Golgi-associated (nucleated) MTs. We discussed this issue in the “Discussion” of the revised manuscript.

      Additionally, carrying out electron microscopy analysis would be important to better characterize the effects observed on Golgi complexes upon depletion. The authors mention the effects on the "morphology of Golgi ribbon" but it is rather unclear.

      We did not perform electron microscopy analysis, because we are not implicating a change in the ultrastructure of the Golgi apparatus in MTCL2-knockdown cells. We specifically want to demonstrate that MTCL2 knockdown changes the assembly structures of the Golgi ribbons, and we believe that it is feasible to do so by light microscopy. We realize that using the term “Golgi morphology” may be misleading in this context. In the revised manuscript, we replaced this term with appropriate ones, such as “assembly structures of the Golgi stacks” or “compactness of the Golgi ribbon.”

      Last, because the authors compare the way MTCL1 and MTCL2 bind microtubules, and suggest intriguing differences, domain swapping experiments between these two isoforms would be important to carry out.

      We conducted the suggested experiments and obtained interesting results. However, we ultimately decided against their inclusion given that the functional difference between MTCL1 and 2 is not the main point of discussion in our study.

      Some studies are referred to, but the published data are not actually used (with the exception of the final scheme). The authors should comment on the fact that other Golgi-associated MT binding proteins have been shown to be involved in the mechanisms highlighted here. Why they would not take over in the absence of MTCL2 should be properly discussed.

      In the revised manuscript, we included data regarding the involvement of CLASPs and AKAP450 in the Golgi association of MTCL2. Accordingly, we introduced their roles in the development of Golgi-associated MTs as far as possible in the “Introduction” (see lines 29-36 and 38-42), “Results” (see lines 306-309 and 344-347), and “Discussion” (see lines 398-402 and 442-444).

      Similarly, in the discussion, the authors indicate that SOGA has been found as an interacting partner of CLASP2. As CLASP2 is a microtubule binding protein also localized at the Golgi complex and binding to acetylated microtubules, the authors should at least comment on the putative role of the interaction between MTCL2 and CLASP2 in the phenotypes they described. The role of the interaction between CLASP2 and MTCL2 should be discussed and ideally tested.

      As described above, we provided the data indicating the role of the interaction between MTCL2 and CLASP2 in the revised manuscript.

      In the introduction, page 3 line 74-77, the authors wrote « The resultant N-terminal fragment is released into the cytoplasm to suppress autophagy by interacting with the Atg12/Atg5 complex, whereas the C-terminal fragment is secreted after further cleavage (see Fig. 1A, boxed illustration). » while on the Fig1 the boxed area indicates that SOGA bears Atg16 and Rab5 binding domains. Please double check the interacting partners of SOGA1.

      Thank you for pointing this out. The sentence in the “Introduction” was revised to “… interacting with the Atg12/Atg5/Atg16 complex” (Rev. Endocr. Metab. Disord. 15, 137–147, 2014).

      Figure 1 B and C are not cited in the main text.

      These figures were cited in the “Introduction” section (line 65 in the previous manuscript). In the revised manuscript, these figures were replaced with Fig. EV1 A and C and cited in the “Introduction” section (line 59) as well as in the legend to Fig. 1 (line 757).

      Figure 1E: a loading control is needed to evaluate the expression level of SOGA/MTCL2 in the mouse tissues.

      Sample loading in each lane shown in previous Fig. 1E (Fig. 1D in the revised manuscript) was normalized by total protein amount (25 μg), as indicated in the figure legends. However, we have decided to add the data for α-tubulin expression in each lane as a reference, although its levels are not equal across lanes.

      In the liver, the size of the bands is different than in other tissues (smaller size). The authors might comment on whether these smaller bands correspond to the cleaved version of SOGA that was previously described in mouse hepatocytes.

      In Fig. 1D of the revised manuscript, we added arrowheads indicating the bands of smaller sizes observed in some tissues such as the liver. In addition, we commented on them in the corresponding part of the “Results” section by describing that “we cannot exclude a possibility that MTCL2 is subjected to the reported cleavage and works as SOGA in these tissues.”

      Figure 2A: single color picture for the anti-tubulin immunolabeling would help to see the distribution of microtubules in the perinuclear area. The perinuclear region is a crowded area with many intracellular compartments accumulating there as well as cytoskeleton elements.

      We completely revised Fig. 2 following the reviewers’ suggestion. To provide single-color pictures for the anti-MTCL2 and anti-tubulin immunolabeling, we added new pictures examining colocalization of MTCL2 with MTs at the peripheral regions where densities of both signals are rather low. In Fig. 2B, the colocalization was further examined via a line scan analysis across MTs. Finally, we have included new data demonstrating that exogenously expressed MTCL2 similarly colocalized with MTs even at the peripheral regions when its expression was suppressed to the endogenous level (Fig. 2C).

      Figure 2C: same comment as above, a single-color picture for the anti-MTCL2 and anti-GM130 immunolabeling are required.

      Owing to space limitations, we could not include a single-color picture for the anti-GM130 immunolabeling in Fig. 2, although we enlarged the merged image so that readers can readily verify our statement: “some overlapped with the Golgi marker signals” (lines 146-147).

      Alternatively, we included a new Appendix Fig. S8, in which immunofluorescence signals of MTCL2 and CLASP1/2 (A) or giantin (B) are compared at a super-resolution microscopic level. In these figures, we included single-color pictures together with merged data.

      page 7, line 132-134: the authors state: « Close inspection using super-resolution microscopy further revealed the possibility that MTCL2 mediates the association of the Golgi membrane with stabilized MTs (Fig. 2D, arrows). » In my opinion, the data are over-interpreted. The signals partially co-localize but this does not indicate a function of MTCL2 in mediating the interaction.

      We deleted the previous Fig. 2D and the corresponding sentence. By doing so, we ceased to suggest that MTCL2 functions to mediate MT–Golgi interactions only based on immunofluorescence data.

      Figure 3: Another way of merging the anti MTCL2 and GS28 pictures have to be provided. The pictures are difficult to interpret with the current display.

      We deleted the previous Fig. 4 and ceased to discuss colocalization of MTCL2 with Golgi proteins only based on immunolabeling data as mentioned above.

      Figure 4C: please indicate the meaning of « ppt »

      We included the explanation of “ppt” in the legends to the corresponding figure (Fig. 3C in the revised manuscript) as follows (lines 801-802):

      “ppt represents the MT precipitate obtained after centrifugation (200,000 × g) for 20 min at 25°C.”

      Figure 5B and C: for easier reading of the figure, it would be useful to annotate which MTCL2 construct is overexpressed following doxycycline treatment (MTCL2 WT (A) and MTCL2 delta C-MTBD (C)).

      We followed the suggestion. Please see new Fig. 5 and Fig. EV4 and 5.

      Figure 6 A and C: the labels are wrong. Bottom pictures correspond to anti-GM130 immunostaining not anti-tubulin. If I am not mistaken, it is MTCL2 delta C which is studied in panel C.

      Thank you for pointing this out. We corrected this error in Fig. EV5 (previous Fig. 6) in the revised manuscript.

      Page 11, line 212: Supplementary Figure 2 (knockdown in RPE1 cells) is intended to be cited not Supplementary Figure 3.

      Thank you for pointing this out. We corrected the error in the revised manuscript appropriately.

      Figure 8A: single color pictures are needed to appreciate the distribution of the signals

      One of the major comments from the three reviewers concerned Fig. 8, which reported that MTCL1 and 2 differentially regulate microtubules. We agree that the previous data in Fig. 8 A–C are rather preliminary. Although we could improve these figures according to the reviewers’ comments, we decided to omit these data and no longer discuss the idea that MTCL1 and 2 localize to microtubules in a mutually exclusive manner, as this was not the main focus of the study.

      Point-by-point responses to Reviewer #2’s comments

      In figure 1D, a loading control should be included for the Western Blot probing for V5-mMTCL2 in HEK293T cells.

      We did include loading controls for the indicated lanes. However, because the HEK293T cell extract in lanes 1–3 was diluted, the signals were too weak to be visualized in this figure (Fig. 2C in the revised manuscript).

      The authors use the anti-SOGA antibody to detect MTCL2. However, in Figure 1A they do not show the sequence similarity between this region in MTCL1 and MTCL2. The authors should include this, as well as show that the anti-SOGA antibody is specific for MTCL2 and does not detect MTCL1.

      In new Fig. EV1, we included amino acid sequence alignment data for the region corresponding to the used anti-SOGA1 antibody epitope. The data indicate significant divergence of the sequence from MTCL1 (6% homology, 23% similarity).

      We also included new western blot data (Fig. 1B in the revised manuscript) demonstrating that anti-SOGA1 antibody does not react with MTCL1 exogenously expressed in HEK293T cells.

      Line 132-134. The authors conclude that MTCL2 possibly mediates association between Golgi membrane and stabilized MTs based on localization microscopy only. This is an overstatement and should be corrected. Not only does the microscopy technique used provide a resolution of 140 nm, which is not enough to show direct association, but the staining technique used (double antibody staining) also places the fluorophores approximately 20-30 nm away from the intended target (MTs, MTCL2, or Golgi). Thus, the conclusion drawn is overstated and should be refined at this point in the manuscript.

      I agree with reviewer#2’s comment that the previous data in Fig. 2D are insufficient to draw the conclusion that MTCL2 mediates the association between the Golgi membrane and stabilized MTs. We deleted the figure and the corresponding sentence reviewer #2 indicated.

      We want to demonstrate that “MTCL2 mediates the association between the Golgi membrane and MTs (not restricted to the stabilized MTs).” In this sense, we have already obtained supportive data in the previous manuscript that the N-terminal region of MTCL2 has clear Golgi-localization activity, whereas the C-terminal region directly binds to MTs.

      In the revised manuscript, we reinforced these data by showing that four point mutations (4LA) in the first coiled-coil motif disrupt the Golgi localization of the N-terminal region of MTCL2 (Fig. 4D). We then found that introducing the same mutations into full-length MTCL2 abolished its preferential association with the perinuclear MTs accumulating around the GA without affecting its colocalization with MTs (Fig. 4E and F). We also provide data on candidate molecules mediating the Golgi association of MTCL2 (Fig. 7). These results reinforce our immunofluorescence analysis results (Fig. 2) and indicate that the preferential association of MTCL2 with perinuclear MTs accumulating around the Golgi is facilitated by physical interactions between the N-terminal region of MTCL2 and Golgi-resident proteins, such as CLASPs and giantin.

      The authors should include some quantification of MTCL2 signals along stabilized microtubules near the Golgi and in peripheral regions of the cell in Figure 2. This will show that MTCL2 preferentially localizes to MTs in the Golgi region but not the periphery, as the authors claim (lines 124-130). This quantification could be in the form of linescans along or across MT signals.

      We included line scan data across peripheral MTs to confirm MTCL2 colocalization with MTs (Fig. 2C). However, it is difficult to perform a line scan in the perinuclear regions, where the signals of both MTCL2 and MTs are too dense. Therefore, we demonstrate the preferential colocalization of MTCL2 with the perinuclear MTs by comparing the peripheral signals of MTCL2 with those of MAP4 (Fig. 2D).

      The authors show that ectopic expression of the C-terminus of MTCL2 can rescue MTCL2 siRNA phenotypes. Since the N-terminus localizes strongly to the Golgi membrane, the authors should do corresponding experiments with this fragment, to determine if membrane binding of MTCL2 can have a similar rescue effect or if MT binding is essential. This is especially important for the Golgi-ribbon organization (Figure 6).

      We did not include data indicating rescue activity of the C-terminal fragment of MTCL2. In the previous Fig. 5 and 6, we demonstrated that MTCL2 lacking the C-terminal microtubule-binding region does not show rescue activities. Therefore, we did not follow reviewer#2’s suggestion directly.

      However, we included new data indicating that an MTCL2 mutant (4LA) that associates with MTs but not with the Golgi membrane also lacks rescue activities for asymmetric MT organization and Golgi ribbon compactness (new Fig. 5 and Fig. EV4). I hope these revisions are satisfactory.

      Line 261-2. The authors claim that MTCL1 and MTCL2 function in a mutually exclusive manner. As with point 3, this is an overstatement based solely on localization microscopy. The authors cannot draw this conclusion from the data associated with this statement (Figure 8A) and it should be refined to reflect that they only comment on the respective localization patterns of MTCL1 and MTCL2. Additionally, to show that MTCL1 and MTCL2 do not overlap on MTs, the authors should include linescans along MTs showing the anti-V5 and anti-MTCL1 intensities.

      One of the major comments from the three reviewers concerned Fig. 8, which reported that MTCL1 and 2 differentially regulate microtubules. We agree that the previous data in Fig. 8 A–C are rather preliminary. Although we could improve these figures according to the reviewers’ comments, we decided to omit these data and no longer discuss the idea that MTCL1 and 2 localize to microtubules in a mutually exclusive manner, as this was not the main focus of the study.

      In Figure 8C the authors show acetylated tubulin staining in cells depleted of MTCL2. Based on this localization pattern, it seems the MT network is not grossly altered, as was shown in Figure 5 where perinuclear accumulation of MTs was lost. The authors should comment on whether acetylated tubulin presence and localization is altered in MTCL2-depleted cells. This is also mentioned in the discussion where the authors conclude that the major function of MTCL2 is to crosslink and accumulate MTs in the Golgi region. However, based on acetylated tubulin staining patterns, stable MTs seem to still accumulate in the Golgi region. The authors need to show this accumulated population of stable MTs is no longer crosslinked in the absence of MTCL2 to support their claim.

      Acetylated microtubules represent a minor fraction of the perinuclearly accumulated microtubules. From this point of view, it is possible that the accumulation of perinuclear microtubules is severely affected, whereas that of acetylated microtubules is not. MTCL1 might crosslink these acetylated microtubules.

      In any case, we have decided to delete the previous Fig. 8 A–C, as stated above.

      To investigate potential functional overlap between MTCL1 and MTCL2, the authors should include a double depletion experiment where MT organization and Golgi organization are investigated. The currently shown experiments do not test a functional relationship between the two paralogs. Additionally, the authors should show Western Blot analysis of MTCL1 levels in MTCL2-depleted cells, and vice versa. While there does not seem to be an overlap in localization patterns of the two proteins, that does not mean there is no functional relationship.

      We did not follow reviewer#2’s comment because of the reason stated above.

      Lines 120-30 and 297-9. The authors state that, based on its localization pattern, MTCL2 mostly localizes along MTs in the perinuclear region (shown in Figure 2). Then, in the discussion they state MTCL2 preferentially localizes to Golgi membranes. Please clarify which of the two sites MTCL2 localizes to preferentially.

      We agree that we should be more careful while describing the subcellular localization of MTCL2. We revised the information in the manuscript to indicate that MTCL2 preferentially localizes to perinuclearly accumulated microtubules showing partial colocalization to the Golgi membrane.

      Loss of Golgi organization as described in Figure 6 does not appear in polarized cells in Figure 7. The authors should comment on the loss of the phenotype in polarized cells.

      Since RPE1 cells cultured at high density show abnormally elongated shapes, as described in the original text (line 238; in the revised text, line 326), Golgi ribbons in these cells do not appear to be as expanded. However, their loss of compactness in MTCL2-knockdown cells can be easily recognized in the previous Fig. 7C (corresponding to Fig. 6C in the revised manuscript).

      The authors should consider using colorblind-friendly palettes in figures. For example, magenta/green instead of red/green and magenta/cyan/yellow instead of red/blue/green. Additionally, for tri-color images the combination red/green/white (Figure 4B, 7C) should be avoided, as overlapping red/green signals will show up as yellow, which is difficult to distinguish from the white signals. Finally, human eyes detect shades of red much more poorly than, for example, green. Therefore, the main point of a figure should not be in red. For example, MTCL2 is frequently shown as a red signal in merged images and should be shown in a different color.

      We incorporated the reviewer’s suggestion.

      The authors claim the mouse MTCL2 protein lacks 203 N-terminal amino acids. Authors should clarify in the text that this is relative to mouse MTCL1. The authors should also include the human comparisons, as they work on human cell lines in the majority of the manuscript.

      I am afraid that this comment is based on a misunderstanding by reviewer #2, because we did not claim that mouse MTCL2 lacks 203 N-terminal amino acids. Instead, we described that SOGA, a mouse MTCL2 isoform, lacks 203 N-terminal amino acids compared to the full-length mouse MTCL2, the cDNA of which was used in this work.

      In Figure 1D the authors show Western Blots where various amounts of HEK293T extracts were probed for exogenously expressed MTCL2. As a control, authors should include a non-transfected control. From Figure 1E, it would be expected that HEK293 (kidney cells) would not express endogenous MTCL2, but the control should be included anyway.

      In the revised Fig. 2B, we included a lane in which a non-transfected HEK293T cell extract was loaded, according to reviewer #2’s comment (see lanes indicated as mock).

      In Figure 3, the color scheme in the final column of images should be changed. Red/white contrast is very poor and no conclusions can be drawn from these images. Additionally, the authors should include a box to show where the inset is located in the overview images.

      In the revised manuscript, we deleted the “final column of images using red/white contrast” from Fig. 2D (previous Fig. 3), to avoid drawing a conclusion on the interaction between MTCL2 and the Golgi membrane only from immunofluorescence data.

      In addition, we included boxes in the overview images to show where the inset is located, wherever it is required in the revised manuscript.

      The authors claim that MTCL2 is not detected near more dynamic MTs in the periphery of the cell and reference Figures 2A and 3. They should include annotation in the figures to highlight this. This can be done with arrowheads or other markings, or with additional insets enlarging a peripheral region of the cell.

      To respond to the comment, we separately provided enlarged views of perinuclear and peripheral regions in the revised Fig. 2.

      The authors should clarify in the main text and figure legend which superresolution microscopy technique was used in Figure 2D.

      As mentioned above, we deleted the previous Fig. 2D.

      The authors use methanol fixation to examine localization of MTCL2, MTs, and Golgi. Methanol extracts lipids and thus affects intracellular membrane compartments, and can affect the localization pattern of GM130, a Golgi matrix protein. The authors should include samples fixed with a crosslinking fixative to ensure their conclusions drawn from methanol-fixed samples are not affected by the choice of fixative.

      According to the reviewer’s suggestion, we included additional data obtained using PFA fixations (Fig. EV2). PFA fixation also revealed a similar localization pattern of MTCL2 to that obtained by methanol fixation.

      In Supplementary Figure 1B a third, relatively high expressing cell can be seen in the top panel. The GM130 signal for this cell seems to be comparable to non-transfected cells in the same image. Can the authors address this? Alternatively, to show differences in expression levels between these three cells in that panel and others, authors could use a heatmap LUT of the V5 signal to differentiate expression levels more clearly in different cells.

      I am unsure whether the reviewer is referring to the cell located at the bottom-left corner of the panel in the previous Supplementary Fig. 1B (Appendix Fig. S1B in the revised manuscript). The cell shows a rather normal distribution pattern of exogenous MTCL2 similar to the endogenous one. We think this is the reason why it maintains a rather normal assembly structure of the Golgi ribbon. We included the word “frequently” in the sentence (line 153 in the revised text) to indicate that high levels of exogenous MTCL2 do not always disrupt the normal Golgi ribbon structure. We do not think it is necessary to differentiate the expression levels of exogenous MTCL2 more clearly by using a heatmap, since this issue is not critical for the conclusions of this paper.

      Line 139. How was the ectopic expression 'suppressed to endogenous levels'? The panels in Suppl Fig. 1 of 'low expression' clearly show increased MTCL2 signal when compared to non-transfected cells in the same panel still. This would suggest ectopic expression is still above endogenous levels.

      We did not suppress the expression actively. We identified the cells expressing exogenous MTCL2 at low levels comparable to those of endogenous MTCL2. The information provided in line 139 of the previous text is not accurate. Thank you for pointing out this issue; we revised the sentence as follows: “However, when the expression levels were similar to the endogenous levels, … (lines 154-155 in the revised text)”

      Figure 5C. The label for MTCL2 construct should read mMTCL2 ΔC-MTBD to clarify the expression construct used.

      Since the labeling in previous Fig. 5 and 6 was confusing, we revised them all by adding the name of the expressed MTCL2 mutant under the label “+dox” (see Fig. 5, Fig. EV4, and Fig. EV5 in the revised manuscript).

      In Figures 6A and 6C the label shows a-tubulin, but the staining is of a Golgi marker.

      Thank you for pointing this out. We corrected this error in the corresponding figure (Fig. EV5) in the revised manuscript.

      In Figures 6B and 6D the different conditions should be separated more in the graph, the datapoints overlap.

      In the revised manuscript, we significantly improved the presentation of the statistical data shown in the previous Figs. 5 and 6 (Fig. 5 and Figs. EV4 and 5 in the revised manuscript). As part of these improvements, we decided to include only data of biological replicates from a single typical experiment in the main figures. As a result, the data points in the previous Fig. 6B and D are fewer in number and no longer overlap (see Figs. EV4 and EV5D). In addition, we have included new figures (Appendix Fig. S4) in which the results of technical replicates (three independent experiments) are presented.

      Lines 246-7. The authors claim the Golgi-associated and centrosomal MTs can be easily distinguished in MTCL2 knockdown cells. They should include annotation in the corresponding figures to highlight these different populations.

      We followed the reviewer’s suggestion by adding arrows in Fig. 6C of the revised manuscript.

      Figure 8A. A horizontal line is missing in the panel showing MTCL/a-tub merge.

      Thank you for pointing this out. As mentioned above, we deleted the previous Fig. 8A from the manuscript.

      Figures 8C and 8D. The acetylated tubulin staining in control cells (control RNAi and GFP) in these panels vary greatly. Can the authors comment on this?

      Expression of MTCL1 C-MTBD strongly induces tubulin acetylation. Therefore, to obtain appropriate pictures under non-saturated conditions, we had to decrease the photomultiplier gain of the confocal microscopy system for the previous Fig. 8D. This is why acetylated tubulin signals in control cells appear much weaker in the previous Fig. 8D than those in Fig. 8C.

      In any case, we deleted the previous Fig. 8C in the revised manuscript as stated above. The previous Fig. 8D is solely included in Fig. EV3.

      Additionally, there appears to be an increase in acetylated tubulin on the Western Blot (8E) shown in cells expressing GFP-MTCL2 C-MTBD that is not reflected in the image in Figure 8D. Since a significant population of GFP-MTCL2 C-MTBD localizes to the nucleus, it is possible that the functional population of GFP-MTCL2 C-MTBD that can stabilize MTs is much lower than GFP-MTCL1 C-MTBD despite showing equal levels in the Western Blot. The authors should compare signal intensity in the cytosol of GFP-expressing cells and base their analysis of acetylated tubulin levels on cells where cytosolic levels are comparable.

      We agree with this reviewer’s comment and did not include WB data in Fig. EV3B corresponding to the previous Fig. 8D.

      As for quantification of the fluorescence data in Fig. 8D, we provided a typical result for the acetylated tubulin signals normalized to the GFP and α-tubulin signals in the boxed regions where cytosolic GFP signals are comparable.

      Point-by-point responses to reviewer #3’s comments

      While the standard fluorescence images are of good quality, the quality of the super-resolution microscopic images is quite low and insufficient. Fig. 8A looks like an enlarged standard laser scanning microscope image, but does not achieve the resolution of a super-resolution image by far, which should be well below the µm range. However, such a resolution would be required to support the claim that MTCL1 and 2 locate on MTs in a mutually exclusive manner. (Negative) data from immunoprecipitation experiments also provide only weak evidence for the absence of a heterocomplex. I also fear that the fixation process creates artifacts. Experiments to image living cells would definitely bolster the data and also provide information about the dynamics of the interactions.

      One of the major comments raised by the three reviewers concerned Fig. 8, which reported that MTCL1 and 2 differentially regulate microtubules. We agree that the previous data in Fig. 8 A–C are rather preliminary. In the revised manuscript, we deleted these data and no longer discuss whether MTCL1 and 2 localize with microtubules in a mutually exclusive manner, as this was not the main focus of the study.

      We also deleted the previous Fig. 2D (showing another super-resolution image) and the corresponding sentence. By doing so, we ceased to suggest that MTCL2 functions in mediating MT–Golgi interactions only based on immunofluorescence data.

      It would also be relevant to confirm that the results are not a cell line artifact in HeLa cells.

      In the previous manuscript, we included data indicating that the knockdown effects observed in HeLa-K cells (reduced accumulation of MTs around the Golgi as well as lateral expansion of the Golgi ribbon) are also induced in RPE1 cells by MTCL2 knockdown (Supplementary Fig. 2 in the previous manuscript). We included the same figure in the revised manuscript as Appendix Fig. S4.

      A standard method for detecting microtubule association in cultured cells would be to use an extraction protocol. This has to be done to show that MTCL2 actually behaves like a microtubule-associated protein (MAP).

      In the revised manuscript, we included new immunofluorescence data obtained using PFA fixation with or without pre-extraction, which revealed a similar localization pattern of MTCL2 to that obtained by methanol fixation (Fig. EV2). Pre-extraction was performed using BRB80 buffer supplemented with 0.5% TX-100 and 4 mM EGTA for 30 s, according to a protocol provided by Dr. Mitchison's laboratory.

      I don't see that the study proves that MTCL2 is essential for the organization of an asymmetric microtubule network as the title claims. The experiments shown in Fig. 5 demonstrate a change in the skewness of the pixel intensity distribution dependent on the presence of MTCL2, which may indicate a contribution of MTCL2 (provided that the fixation and staining do not produce an artifact). However, they do not prove that MTCL2 is essential.

      We cannot understand how an artifact due to the fixation and staining may be responsible for the results shown in the previous Fig. 5 (Fig. 5 and Figs. EV4 and 5 in the revised manuscript).

      In many cases, microtubules do not elongate radially (symmetrically) from the centrosome but intensely accumulate around the Golgi area and show asymmetric organization (see Meiring et al. Curr. Opin. Cell Biol. 62: 86-95, 2020). “An asymmetric MT network” in the title corresponds to this asymmetric array of general MTs accumulating around the Golgi complex.

      In this respect, our findings that MTCL2 depletion severely disrupted MT accumulation around the Golgi and induced random and rather symmetric arrays of MTs (Fig. 5A) are very impressive. We believe that the knockdown/rescue experiments in this study strongly support the title by demonstrating that MTCL2 facilitates MT accumulation around the Golgi through its dual binding activity to MTs and the Golgi membrane.

      We are unable to comprehend the reviewer’s standpoint in not allowing us to conclude that MTCL2 plays an essential role in the organization of an asymmetric microtubule network. However, the title in the revised manuscript was changed as follows.

      “MTCL2 promotes asymmetric microtubule organization by crosslinking microtubules on the Golgi membrane”

      There is also a large oversampling of the data by plotting each individual cell from only two separate experiments. It would be better and more reliable to present the data as the mean of the experiments (then of course more than 2 would be required). The same applies to the experiments in which the "Golgi ribbon expanding angle" was determined (Fig. 6).

      In my opinion, statistical theories based on an ideal assumption cannot simply be applied to the quantitative analysis of biological phenomena. In our case, the MT distributions, as well as the Golgi ribbon expansion angles, deviate significantly in a context-dependent manner in each cell (for example, depending on the cell cycle, cell density, etc.). The deviation of these values between cells (in biological replicates) is much larger than the experimental deviation, which mainly depends on stochastic elements (in technical replicates). I understand that this is the reason why many journals in cell biology do not necessarily require “three” independent experiments for statistical analysis.

      In the revised manuscript, however, we included data from three independent experiments for all rescue experiments (Fig. 5, Figs. EV4 and 5, and Appendix Fig. S4) to further demonstrate the reliability of our data.

      In the main figures (Fig. 5, Figs. EV4 and 5), we included statistical data of a single typical experiment to demonstrate reproducibility in biological replicates in each condition. To compensate for these figures, we listed statistical data for each biological replicate of all experiments in Appendix Fig. S4 A. In Appendix Fig. S4 B and C, we further provided statistical data of technical replicates (three independent experiments) by comparing the average of each biological replicate. We concluded that this is the best way to statistically demonstrate the reliability of the biological analysis.
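
      The difference between pooling every cell and comparing the averages of independent experiments can be illustrated with a minimal Python sketch. The numbers, the experiment labels, and the `summarize` helper are hypothetical and chosen only for illustration; this is not the analysis pipeline used in the manuscript.

```python
from statistics import mean, stdev

# Hypothetical per-cell measurements (e.g. an angle in degrees) from
# three independent experiments. These values are invented for illustration.
experiments = {
    "exp1": [60.0, 80.0, 100.0],
    "exp2": [70.0, 90.0, 110.0],
    "exp3": [50.0, 70.0, 90.0],
}

def summarize(experiments):
    """Contrast cell-level spread with experiment-level spread."""
    per_experiment_means = [mean(cells) for cells in experiments.values()]
    pooled_cells = [c for cells in experiments.values() for c in cells]
    return {
        "n_experiments": len(per_experiment_means),
        "n_cells": len(pooled_cells),
        # One value per experiment: the unit a reviewer would treat as n.
        "mean_of_means": mean(per_experiment_means),
        "sd_between_experiments": stdev(per_experiment_means),
        # Pooling all cells treats every cell as an independent sample.
        "sd_between_cells": stdev(pooled_cells),
    }

summary = summarize(experiments)
for key, value in summary.items():
    print(key, value)
```

      In this toy data set the cell-to-cell spread is larger than the spread between experiment averages, which is the situation described in the response above; reporting both makes the sample size behind each statistic explicit.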

      We believe that the data collectively presented by these figures strongly support the reliability of our conclusions.

      It would be good to support the claim that MTCL2 affects the Golgi ribbon structure through ultrastructural analysis (EM).

      We did not perform electron microscopy analysis, because we are not implicating a change in the ultrastructure of the Golgi apparatus in MTCL2-knockdown cells. We specifically want to demonstrate that MTCL2 knockdown changes the assembly structures of the Golgi ribbons, and we believe that it is feasible to do so by light microscopy. We realize that using the term “Golgi morphology” may be misleading in this context. In the revised manuscript, we replaced this term with appropriate ones, such as “assembly structures of the Golgi stacks” or “compactness of the Golgi ribbon.”

      The critical mechanistic question is which molecule on the Golgi side interacts with MTCL2, since the experiments with the deletion constructs would suggest that it is not the microstructure of the microtubules. As shown, the study is mainly descriptive in relation to this aspect.

      We significantly improved this weakness by including new data indicating the possible involvement of CLASPs and giantin in mediating the Golgi association of MTCL2 (see Fig. 7 and Appendix Figs. S5–7).

      We also revealed that four-point mutations (4LA) in the first coiled-coil motif disrupt the Golgi localization of the N-terminal region of MTCL2 (Fig. 4D). Thereafter, we found that introduction of the same mutations in full-length MTCL2 abolished its preferential association to the perinuclear MTs accumulating around the GA without affecting its colocalization to MTs (Fig. 4E and F).

      These results reinforce our immunofluorescence results (Fig. 2) and indicate that the preferential association of MTCL2 to perinuclear MTs accumulating around the Golgi is facilitated by physical interactions between the N-terminal region of MTCL2 and the Golgi-resident proteins, such as CLASPs and giantin.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Matsuoka et al. describe an MTCL1 paralogue (MTCL2) that is present in vertebrates and binds to the Golgi membrane and interacts with microtubules. In contrast to MTCL1, MTCL2 contains only one microtubule binding region and does not stabilize any microtubules. The authors provide evidence that MTCL2 may be involved in accumulating microtubules on the Golgi and promote directed migration. The study is based on experiments with cell lines, predominantly HeLa cells, and relies heavily on the immunofluorescence staining of methanol-fixed cells. While the concept of a functional Golgi-microtubule interaction is interesting and may be relevant for directed migration, I am not convinced of the experimental support and interpretation provided by the authors.

      1. The study relies entirely on the examination of cell lines, mainly HeLa cells, and the immunofluorescence of fixed cells. While the standard fluorescence images are of good quality, the quality of the super-resolution microscopic images is quite low and insufficient. Fig. 8A looks like an enlarged standard laser scanning microscope image, but does not achieve the resolution of a super-resolution image by far, which should be well below the µm range. However, such a resolution would be required to support the claim that MTCL1 and 2 locate on MTs in a mutually exclusive manner. (Negative) data from immunoprecipitation experiments also provide only weak evidence for the absence of a heterocomplex. I also fear that the fixation process creates artifacts. Experiments to image living cells would definitely bolster the data and also provide information about the dynamics of the interactions.
      2. It would also be relevant to confirm that the results are not a cell line artifact in HeLa cells.
      3. A standard method for detecting microtubule association in cultured cells would be to use an extraction protocol. This has to be done to show that MTCL2 actually behaves like a microtubule-associated protein (MAP).
      4. I don't see that the study proves that MTCL2 is essential for the organization of an asymmetric microtubule network as the title claims. The experiments shown in Fig. 5 demonstrate a change in the skewness of the pixel intensity distribution dependent on the presence of MTCL2, which may indicate a contribution of MTCL2 (provided that the fixation and staining do not produce an artifact). However, they do not prove that MTCL2 is essential. There is also a large oversampling of the data by plotting each individual cell from only two separate experiments. It would be better and more reliable to present the data as the mean of the experiments (then of course more than 2 would be required). The same applies to the experiments in which the "Golgi ribbon expanding angle" was determined (Fig. 6).
      5. It would be good to support the claim that MTCL2 affects the Golgi ribbon structure through ultrastructural analysis (EM).
      6. The critical mechanistic question is which molecule on the Golgi side interacts with MTCL2, since the experiments with the deletion constructs would suggest that it is not the microstructure of the microtubules. As shown, the study is mainly descriptive in relation to this aspect.
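
      The skewness measure mentioned in point 4 can be illustrated with a minimal Python sketch. It assumes the standard moment-based (Fisher-Pearson) definition of skewness; the manuscript's exact definition and image-processing steps are not given here, and the pixel values below are invented.

```python
def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant image: no asymmetry to report
    return m3 / m2 ** 1.5

# Hypothetical pixel intensities (arbitrary units), not data from the study.
# A few very bright pixels on a dim background give a long right tail.
concentrated = [10] * 90 + [200] * 10
# Intensities spread symmetrically around one value give zero skewness.
dispersed = [40, 50, 60] * 30 + [50] * 10

print(skewness(concentrated))
print(skewness(dispersed))
```

      Under this assumed definition, a signal concentrated in a small bright region yields a strongly positive skewness, while an evenly dispersed signal yields a value near zero.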

      Significance

      The study is based on experiments with cell lines, predominantly HeLa cells, and relies heavily on the immunofluorescence staining of methanol-fixed cells. While the concept of a functional Golgi-microtubule interaction is interesting and may be relevant for directed migration, I am not convinced of the experimental support and interpretation provided by the authors.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      In this work, Matsuoka et al. describe a novel microtubule (MT) crosslinking factor, MTCL2. They use Western Blot analysis to show the presence of MTCL2 in various tissues and use a previously developed antibody to show its localization in cultured cells. The authors show that MTCL2 localizes along MTs in the Golgi region and that upon depletion of MTCL2, these MTs do not accumulate in the Golgi and Golgi organization is affected, leading to defects in migration. Through deletion mutant analysis, they show that MTCL2 C-terminus binds to MTs and that the N-terminus binds to Golgi membranes, though this may be lost or reduced in the full length protein. Expression of the C-terminal fragment rescues the phenotypes observed in MTCL2-depleted cells. Finally, the authors show that MTCL1 and MTCL2 show non-overlapping localization patterns and conclude they may have different functions in crosslinking and stabilizing MTs and Golgi organization.

      Major comments:

      1. In figure 1D, a loading control should be included for the Western Blot probing for V5-mMTCL2 in HEK293T cells.
      2. The authors use the anti-SOGA antibody to detect MTCL2. However, in Figure 1A they do not show the sequence similarity between this region in MTCL1 and MTCL2. The authors should include this, as well as show that the anti-SOGA antibody is specific for MTCL2 and does not detect MTCL1.
      3. Lines 132-134. The authors conclude that MTCL2 possibly mediates association between Golgi membrane and stabilized MTs based on localization microscopy only. This is an overstatement and should be corrected. Not only is the microscopy technique used able to produce resolution of 140nm, which is not enough to show direct association; the staining techniques used (double antibody staining) ensures the fluorophores are approximately 20-30nm away from the intended target (MTs, MTCL2, or Golgi). Thus, the conclusion drawn is overstated and should be refined at this point in the manuscript.
      4. The authors should include some quantification of MTCL2 signals along stabilized microtubules near the Golgi and in peripheral regions of the cell in Figure 2. This will show that MTCL2 preferentially localizes to MTs in the Golgi region but not the periphery, as the authors claim (lines 124-130). This quantification could be in the form of linescans along or across MT signals.
      5. The authors show that ectopic expression of the C-terminus of MTCL2 can rescue MTCL2 siRNA phenotypes. Since the N-terminus localizes strongly to the Golgi membrane, the authors should do corresponding experiments with this fragment, to determine if membrane binding of MTCL2 can have a similar rescue effect or if MT binding is essential. This is especially important for the Golgi-ribbon organization (Figure 6).
      6. Line 261-2. The authors claim that MTCL1 and MTCL2 function in a mutually exclusive manner. As with point 3, this is an overstatement based solely on localization microscopy. The authors cannot draw this conclusion from the data associated with this statement (Figure 8A) and it should be refined to reflect that they only comment on the respective localization patterns of MTCL1 and MTCL2. Additionally, to show that MTCL1 and MTCL2 do not overlap on MTs, the authors should include linescans along MTs showing the anti-V5 and anti-MTCL1 intensities.
      7. In Figure 8C the authors show acetylated tubulin staining in cells depleted of MTCL2. Based on this localization pattern, it seems the MT network is not grossly altered, as was shown in Figure 5 where perinuclear accumulation of MTs was lost. The authors should comment on whether acetylated tubulin presence and localization is altered in MTCL2-depleted cells. This is also mentioned in the discussion where the authors conclude that the major function of MTCL2 is to crosslink and accumulate MTs in the Golgi region. However, based on acetylated tubulin staining patterns, stable MTs seem to still accumulate in the Golgi region. The authors need to show this accumulated population of stable MTs is no longer crosslinked in the absence of MTCL2 to support their claim.
      8. To investigate potential functional overlap between MTCL1 and MTCL2, the authors should include a double depletion experiment where MT organization and Golgi organization are investigated. The currently shown experiments do not test a functional relationship between the two paralogs. Additionally, the authors should show Western Blot analysis of MTCL1 levels in MTCL2-depleted cells, and vice versa. While there does not seem to be an overlap in localization patterns of the two proteins, that does not mean there is no functional relationship.
      9. Lines 120-30 and 297-9. The authors state that based on the localization pattern of MTCL2 it mostly localizes along MTs in the perinuclear region (shown in Figure 2). Then, in the discussion they state MTCL2 preferentially localizes to Golgi membranes. Please clarify which of the two sites MTCL2 localizes to preferentially.
      10. Loss of Golgi organization as described in Figure 6 does not appear in polarized cells in Figure 7. The authors should comment on the loss of the phenotype in polarized cells.
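
      The linescan quantification suggested in points 4 and 6 can be sketched in Python as follows. This is a minimal illustration on a toy image; the `linescan` helper, the nearest-pixel sampling, and the pixel values are assumptions made for the example, not the procedure of any particular imaging package.

```python
def linescan(image, start, end, n_samples):
    """Sample intensities along a straight line between two (row, col) points.

    Uses nearest-pixel sampling; `image` is a list of rows of intensities.
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    (r0, c0), (r1, c1) = start, end
    profile = []
    for i in range(n_samples):
        t = i / (n_samples - 1)           # fractional position along the line
        r = round(r0 + t * (r1 - r0))     # nearest row
        c = round(c0 + t * (c1 - c0))     # nearest column
        profile.append(image[r][c])
    return profile

# Toy single-channel image: one horizontal filament in the middle row.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 5, 9, 5, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

along = linescan(image, (2, 0), (2, 4), 5)   # scan along the filament
across = linescan(image, (0, 2), (4, 2), 5)  # scan across the filament
print(along)
print(across)
```

      Running the same scan on two channels (for example, an MT channel and an MTCL2 channel) gives paired profiles that can be plotted together to show whether the signals coincide along or across a filament.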

      Minor comments:

      1. The authors should consider using colorblind-friendly palettes in figures. For example, magenta/green instead of red/green and magenta/cyan/yellow instead of red/blue/green. Additionally, for tri-color images the combination red/green/white (Figure 4B, 7C) should be avoided, as overlapping red/green signals will show up as yellow, which is difficult to distinguish from the white signals. Finally, human eyes detect shades of red much more poorly than, for example, green. Therefore, the main point of a figure should not be in red. For example, MTCL2 is frequently shown as a red signal in a merged image and should be replaced with a different color.
      2. The authors claim the mouse MTCL2 protein lacks 203 N-terminal amino acids. Authors should clarify in the text that this is relative to mouse MTCL1. The authors should also include the human comparisons, as they work on human cell lines in the majority of the manuscript.
      3. In Figure 1D the authors show Western Blots where various amounts of HEK293T extracts were probed for exogenously expressed MTCL2. As a control, authors should include a non-transfected control. From Figure 1E, it would be expected that HEK293 (kidney cells) would not express endogenous MTCL2, but the control should be included anyway.
      4. In Figure 3, the color scheme in the final column of images should be changed. Red/white contrast is very poor and no conclusions can be drawn from these images. Additionally, the authors should include a box to show where the inset is located in the overview images.
      5. The authors claim that MTCL2 is not detected near more dynamic MTs in the periphery of the cell and reference Figures 2A and 3. They should include annotation in the figures to highlight this. This can be done with arrowheads or other markings, or with additional insets enlarging a peripheral region of the cell.
      6. The authors should clarify in the main text and figure legend which superresolution microscopy technique was used in Figure 2D.
      7. The authors use methanol fixation to examine localization of MTCL2, MTs, and Golgi. Methanol extracts lipids and thus affects intracellular membrane compartments, and can affect the localization pattern of GM130, a Golgi matrix protein. The authors should include samples fixed with a crosslinking fixative to ensure their conclusions drawn from methanol-fixed samples are not affected by the choice of fixative.
      8. In Supplementary Figure 1B a third, relatively high expressing cell can be seen in the top panel. The GM130 signal for this cell seems to be comparable to non-transfected cells in the same image. Can the authors address this? Alternatively, to show differences in expression levels between these three cells in that panel and others, authors could use a heatmap LUT of the V5 signal to differentiate expression levels more clearly in different cells.
      9. Line 139. How was the ectopic expression 'suppressed to endogenous levels'? The panels in Suppl Fig 1 of 'low expression' clearly show increased MTCL2 signal when compared to non-transfected cells in the same panel still. This would suggest ectopic expression is still above endogenous levels.
      10. Figure 5C. The label for MTCL2 construct should read mMTCL2 ΔC-MTBD to clarify the expression construct used.
      11. In Figures 6A and 6C the label shows a-tubulin, but the staining is of a Golgi marker.
      12. In Figures 6B and 6D the different conditions should be separated more in the graph, the datapoints overlap.
      13. Lines 246-7. The authors claim the Golgi-associated and centrosomal MTs can be easily distinguished in MTCL2 knockdown cells. They should include annotation in the corresponding figures to highlight these different populations.
      14. Figure 8A. A horizontal line is missing in the panel showing MTCL/a-tub merge.
      15. Figures 8C and 8D. The acetylated tubulin staining in control cells (control RNAi and GFP) in these panels varies greatly. Can the authors comment on this? Additionally, there appears to be an increase in acetylated tubulin on the Western Blot (8E) shown in cells expressing GFP-MTCL2 C-MTBD that is not reflected in the image in Figure 8D. Since a significant population of GFP-MTCL2 C-MTBD localizes to the nucleus, it is possible that the functional population of GFP-MTCL2 C-MTBD that can stabilize MTs is much lower than GFP-MTCL1 C-MTBD despite showing equal levels in the Western Blot. The authors should compare signal intensity in the cytosol of GFP-expressing cells and base their analysis of acetylated tubulin levels on cells where cytosolic levels are comparable.

      Significance

      This work describes a novel MT crosslinking protein, MTCL2. The authors show that MTCL2 may function predominantly on non-centrosomal MTs associated with the Golgi and suggest a function in linking the centrosome and Golgi in polarized, migrating cells. However, the manuscript is highly descriptive as the authors do not uncover a mechanism for how MTCL2 stabilizes and crosslinks MTs and do not address potential functional interactions between MTCL1 and MTCL2. Additionally, there are some contradictory findings that are not addressed in the current manuscript.

      This work adds a new factor to an expanding list of proteins that regulate non-centrosomal MTs (reviewed in Meiring et al., 2019, Current Opinion in Cell Biology, and Sanders and Kaverina, 2015, Frontiers in Neuroscience), and would be of interest to those interested in cell biology of MT organization and function.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, the authors identify MTCL2, a paralog of the MTCL1 protein and study its interaction with the Golgi complex and with microtubules. A shorter version of this protein was identified before and named SOGA (suppressor of glucose from autophagy). A role of MTCL2 in regulating the polymerization of Golgi associated microtubules is reported as well as an implication in cell polarity and migration.

      Major comments:

      - Are the key conclusions convincing?

      Most of the key conclusions are valid but the main one should be either reinforced or toned down. In particular, the authors tend to give a central role to MTCL2 in regulating the formation and organization of the Golgi-associated MT network, and conversely in organizing Golgi elements, without considering the other factors identified (the authors cite relevant papers but do not discuss this). They should analyze the function of MTCL2 in relation to the role of CLASP2, AKAP450, Golgi-g-Tubulin, or even EB proteins (like EB3). I also do not think that carrying out super resolution microscopy is enough to "reveal the possibility that MTCL2 mediates the association of the Golgi membrane with stabilized MTs". More generally, the authors cannot conclude that MTCL2 preferentially associates with Golgi-MTs only from their immunofluorescence and KD experiments. The centrosome (the main MTOC) is indeed also localized in the perinuclear area. Easy-to-do additional experiments may help to confirm these conclusions (see below). Also, the authors could strengthen the way they study how MTCL1 and MTCL2 bind to microtubules and the Golgi (see below). The localization or interaction of MTCL2 with Golgi-associated MTs is not directly shown. The title should also be changed. I am not sure I understand what an asymmetric microtubule network means in this context. I guess that the authors mean a non-centrosomal microtubule network.

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The authors also state that tubulin acetylation is induced by MTCL1 C-MTBD but it may simply be stabilized. They should also clarify whether MTCL2 regulates Golgi-dependent microtubule nucleation.

      - Would additional experiments be essential to support the claims of the paper?

      Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. To demonstrate that MTCL2 associates with Golgi-MT, microtubule regrowth experiments following nocodazole treatment have to be conducted (time course). Another efficient way to analyze such events, as shown by the Kaverina and the Akhmanova labs for example, is to use fluorescent EB proteins (e.g. EB3) to image microtubule plus ends and back-track them to identify nucleation points. Carrying out such an experiment (nocodazole wash-out and EB tracking) in the presence or absence of MTCL2 would make it possible to confirm, or not, the functional hypothesis of the authors. Additionally, carrying out electron microscopy analysis would be important to better qualify the effects observed on Golgi complexes upon depletion. The authors mention the effects on the "morphology of Golgi ribbon" but it is rather unclear. Last, because the authors compare the way MTCL1 and MTCL2 bind microtubules, and suggest intriguing differences, domain swapping experiments between these two isoforms would be important to carry out.

      - Are the suggested experiments realistic in terms of time and resources?

      It would help if you could add an estimated cost and time investment for substantial experiments. The proposed experiments to study Golgi-based nucleation are easy and inexpensive, as are domain swapping experiments. Electron microscopy on the other hand is quite expert and requires either internal knowledge, access to a facility or setting-up a collaboration. A few months, 3-4, would be needed.

      - Are the data and the methods presented in such a way that they can be reproduced?

      yes

      - Are the experiments adequately replicated and statistical analysis adequate?

      I am not a statistician but I was not convinced by the use of the quantification of "skewness", in particular in figure 5B. Whether a Wilcoxon test is adequate is unclear to me.

      Minor comments:

      -Specific experimental issues that are easily addressable.

      Yes

      -Are prior studies referenced appropriately?

      Some studies are referred to but the published data are not actually used (with the exception of the final scheme). The authors should comment on the fact that other Golgi-associated MT binding proteins have been shown to be involved in the mechanisms highlighted here. Why they would not take over in the absence of MTCL2 should be properly discussed. Similarly, in the discussion, the authors indicate that SOGA has been found as an interacting partner of CLASP2. As CLASP2 is a microtubule binding protein also localized at the Golgi complex and binding to acetylated microtubules, the authors should at least comment on the putative role of the interaction between MTCL2 and CLASP2 in the phenotypes they described. The role of the interaction between CLASP2 and MTCL2 should be discussed and ideally tested.

      -Are the text and figures clear and accurate? In general, yes. There are however quite a few problems:

      • In the introduction, page 3 line 74-77, the authors wrote « The resultant N-terminal fragment is released into the cytoplasm to suppress autophagy by interacting with the Atg12/Atg5 complex, whereas the C-terminal fragment is secreted after further cleavage (see Fig. 1A, boxed illustration). » while in Fig. 1 the boxed area indicates that SOGA bears Atg16 and Rab5 binding domains. Please double check the interacting partners of SOGA1.

      • Figure 1 B and C are not cited in the main text.

      • Figure 1E: a loading control is needed to evaluate the expression level of SOGA/MTCL2 in the mouse tissues. In the liver, the size of the bands is different than in other tissues (smaller size). The authors might comment on whether these smaller bands correspond to the cleaved version of SOGA that was previously described in mouse hepatocytes.

      • Figure 2A: a single-color picture for the anti-tubulin immunolabeling would help to see the distribution of microtubules in the perinuclear area. The perinuclear region is a crowded area with many intracellular compartments accumulating there as well as cytoskeleton elements.

      • Figure 2C: same comment as above, single-color pictures for the anti-MTCL2 and anti-GM130 immunolabeling are required.

      • page 7, line 132-134: the authors state: « Close inspection using super-resolution microscopy further revealed the possibility that MTCL2 mediates the association of the Golgi membrane with stabilized MTs (Fig. 2D, arrows). » In my opinion, the data are over-interpreted. The signals partially co-localize but this does not indicate a function of MTCL2 in mediating the interaction.

      • Figure 3: Another way of merging the anti-MTCL2 and GS28 pictures has to be provided. The pictures are difficult to interpret with the current display.

      • Figure 4C: please indicate the meaning of « ppt »

      • Figure 5B and C: for easier reading of the figure, it would be useful to annotate which MTCL2 construct is overexpressed following doxycycline treatment (MTCL2 WT (A) and MTCL2 delta C-MTBD (C)).

      • Figure 6 A and C: the labels are wrong. Bottom pictures correspond to anti-GM130 immunostaining not anti-tubulin. If I am not mistaken, it is MTCL2 delta C which is studied in panel C.

      • Page 11, line 212: Supplementary Figure 2 (knockdown in RPE1 cells) is intended to be cited not Supplementary Figure 3.

      • Figure 8A: single color pictures are needed to appreciate the distribution of the signals

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Yes, see above

      Significance

      - Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      It adds a new player in the machinery involved in the interplay between the Golgi complex and microtubules for their mutual organization. To me a key observation is the unlinking between the Golgi complexes and the centrosome but this observation is not really used and studied (here again, maybe a nocodazole wash-out experiment and real-time analysis may help).

      - Place the work in the context of the existing literature (provide references, where appropriate).

      A large number of studies, cited by the authors, have identified proteins involved in mutual organization of Golgi membranes and microtubules. Identification and study of MTCL1 and 2 are important in this context. It also questions the role and function of the initially identified SOGA.

      - State what audience might be interested in and influenced by the reported findings.

      This is a pure cell biology study that will primarily interest people studying the Golgi complex and microtubules. People interested in the internal organization of the cell, the interaction between the centrosome and the Golgi and intracellular polarity would also be interested, as well as people studying migration.

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I studied Golgi dynamics and function as well as microtubule dynamics. I have no expertise in statistical analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1 (Evidence, reproducibility and clarity):

      The main message of this paper, as far as I understood since I am not a molecular bioinformatician but I am certainly interested in mtDNA variations especially related to disease, is that there is a very obvious bias among synonymous changes in the ORF of human mtDNA, more frequent for amino acids with 4 variants, more frequent in P position, and much more frequently characterized by transversion rather than transition substitutions. This survey is well written and, although written in rather technical language, the message is reachable and interesting. I also agree on the conclusions of the Author concerning the considerations that this set of new data should prompt one to draw also considering non-synonymous, potentially pathogenic mutations. The only contribution I feel I can provide to this manuscript is to invite the Authors to consider the possibility that the selection may be due to a preferred codon bias, linked to the higher or lower compliance of different codons to be translated by the translational in situ machinery of mitochondria. I am not sure that this applies also for mitochondrial mitochondria and related factors (you may want to ask Aleksey Amunts in Stockholm or Bob Lightowlers or Zoscha Lightowlers in Newcastle on this matter). I do know that this is certainly a problem for recombinant proteins containing, for instance, mammalian MTS fused with a bacterial restriction enzyme; in most of the cases the bacterial sequence has to be recoded using the preferred codons for mammalian systems in order to increase translation by a eukaryotic (mammalian) translation machinery. I wonder whether you could discuss this possibility in your paper and maybe perform some further comparative measurement to test it.

      I appreciate the supportive comments of Reviewer 1 regarding the accessibility of our manuscript, and I address comments related to codon bias below.

      Reviewer 1 (Significance):

      The paper provides novel information on the structure and constraints of mtDNA variants in humans, opens an area of investigation which is new and potentially relevant, with some possible implications also on pathogenic mtDNA mutations in humans.

      I thank Reviewer 1 for their positive comments about the novelty of this work and the important implications of our study.

      Reviewer 1 (Referee Cross-commenting):

      I said in my first comment that I am not a bioinformatician, but Referee 2 did a great job in identifying some critical points and suggesting to the Authors how to cope with them. I maintain my opinion, which I think is shared by Referee 2, that the paper conveys an interesting and rather unexpected message, and that if the Authors are able to answer properly the points raised by Referee 2 the paper should be published.

      We are quite glad to hear that Reviewer 1 would like to see this manuscript published, provided that the items noted by the reviewers are properly addressed.

      Response to Reviewer 1:

      R1Q1 (Continuation from Referee Cross-commenting): I confirm that the only contribution I feel I can provide to this manuscript is to invite the Authors to consider the possibility that the selection may be due to a preferred codon bias, linked to the higher or lower compliance of different codons to be translated by the translational in situ machinery of mitochondria. I wonder whether the Authors could consider this possibility in the Discussion and possibly perform some further comparative measurement to test it.

      R1A1: My manuscript takes into consideration the possibility that codon-specific preferences would determine the frequency of mtDNA variants. Findings that argue against codon bias as a strong source of selection include:

      1) At two-fold degenerate P3s, nearly every site (> 97%) harbored at least one HelixMTdb sample associated with a non-reference base. It is worth noting that HelixMTdb is not enriched for known mitochondrial disease variants.

      2) SSNEs are very tightly associated with transversions from the human reference sequence, implicating mutational biases as a cause of any limited diversity in the HelixMTdb.

      3) Every possible base can be found at 99% of >500 analyzed I-P3 positions (those P3s at which the base at codon positions one and two is identical throughout the alignment), arguing against the idea that codon bias plays a significant role in controlling variant frequency across mammals. The only exception that I identified in my extensive analysis is the P3 found within the first methionine codon of COX3.

      4) Earlier, more limited studies of mitochondrial codon choice (citations of these earlier studies can be found in the manuscript) also argue against substantial selection based upon codon choice.

      5) Finally, I would note that the set of tRNAs encoded by vertebrate mtDNAs is quite limited, with only one tRNA linked to each codon family defined by codon positions P1 and P2. There is no evidence, to my knowledge, that nucleus-encoded tRNAs enter human mitochondria. Therefore, the scope of potential selection linked to, for example, translation speed and protein folding seems particularly limited at vertebrate mitochondria.

      While most evidence does not support strong selection on mtDNA codon choice in vertebrates, I do report divergence in TSS distributions obtained from the I-P3s of different amino acids within the same degeneracy class (eg. two-fold purine, two-fold pyrimidine, four-fold), hinting at some minimal role for codon preferences at P3. However, on the whole, mutational propensities are likely to be the predominant factor controlling synonymous variation.

      Reviewer 2 (Evidence, reproducibility and clarity):

      The manuscript explores a large database of human mtDNA sequences and performs some comparative analysis across mammals to characterise the profile of mtDNA mutations. It finds that some variants are surprisingly poorly represented in human mtDNA and suggests that mutational bias rather than selection is the dominant driver of this heterogeneity.

      This is an interesting message and an efficient and interpretable use of a large-scale dataset to shed light on biological mechanisms, which is a highly desirable philosophy. The factors shaping human mtDNA heterogeneity are of immense interest for several fields from population genetics to medicine, making this a valuable perspective. My comments are mainly quite fine-grained and reflect instances where I think the argument could be tighter, rather than fundamental flaws in the approach. In the cases where these points are due to my own naivety, I apologise and suggest that more explanation of these points could help other readers like me!

      I am happy to read that Reviewer 2 (Dr. Iain Johnston) finds my approach to be fundamentally sound, and I certainly appreciate the insightful comments and suggestions that he has provided.

      Reviewer 2 (Significance):

      I wrote the above review without realising the reviewer interface would be categorised in this way. Here's a repeat of my "significance" comments

      The manuscript explores a large database of human mtDNA sequences and performs some comparative analysis across mammals to characterise the profile of mtDNA mutations. It finds that some variants are surprisingly poorly represented in human mtDNA and suggests that mutational bias rather than selection is the dominant driver of this heterogeneity.

      This is an interesting message and an efficient and interpretable use of a large-scale dataset to shed light on biological mechanisms, which is a highly desirable philosophy. The factors shaping human mtDNA heterogeneity are of immense interest for several fields from population genetics to medicine, making this a valuable perspective.

      I am very pleased that the reviewer appreciates the importance and potential impact of my analysis. We agree that mtDNA heterogeneity is likely to be of high medical relevance.

      Response to Reviewer 2:

      R2Q1: The first paragraph is focused on humans without explicitly saying so; missing heritability is less of an issue in, for example, plants [Brachi et al., 2011. Genome biology, 12(10), pp.1-8]. This focus should be clearer (or the differences across kingdoms mentioned!). It's also worth noting that the argument about pathogenic variants being infrequent because of selection can only address missing heritability in pathogenic variants, and cannot (directly) inform the missing heritability in traits like height etc. Also, the whole motivation with respect to missing heritability currently comes across as a bit of a non sequitur. An introduction section could be used to help describe how the analysis of the provenance of mtDNA mutations contributes to the missing heritability question.

      R2A1: I agree that beginning the manuscript with a discussion of genome-wide association studies may distract the reader from the main topic at hand: the utility of variant frequency when predicting pathogenicity in humans. I have changed the Introduction accordingly.

      R2Q2: I also suggest that such an introduction section introduces the (later cited) previous work from Reyes and others on mutational profiles in mtDNA to set the scene.

      R2A2: I now provide these citations in the second paragraph of the Introduction. However, I do not expand further upon mutational propensities in that section, with an eye toward minimizing manuscript length toward publication as a short report.

      R2Q3: An early result, that 35% of possible synonymous mutations do not appear in a dataset, lacks a null hypothesis. Depending on the size of the dataset this may be very surprising or very unsurprising: an order of magnitude estimate of what proportion would be expected under uniform mutation and zero selection would help comparison here. I guess this can be as simple as 16k/3*4 << 200k. Also the ancestry of the dataset is important here: if all samples are highly related then a more homogenous mutational profile is unsurprising. Perhaps one could assign a quantity like an effective population size to the database and compare this to 16k/3*4?

      R2A3: The reviewer raises an excellent point regarding how 'surprising' it should be to the reader, previous to downstream analyses revealing transition/transversion biases, that so many synonymous substitutions are lacking within this dataset. While the authors of the HelixMT study removed mtDNA from highly related individuals from the analysis, the vast majority of the mtDNAs analyzed (91.2%) were from haplogroup N and of inferred European ancestry (doi.org/10.1101/798264). The authors of the HelixMTdb study do note that nearly all mtDNA lineages were present in the study, presumably encompassing roughly 100,000 years of human mtDNA evolution. That said, how this information alone may be used to quantitatively model expectations under zero selection is unclear.

      To address this question of whether sample diversity might be very limited in the HelixMTdb study, I have carried out additional analyses on this dataset. I now assess, for third codon positions allowing two-fold synonymous change (serine and leucine not included, due to their decoding by two different tRNAs), how often only one nucleotide was found at that position. For two-fold degenerate P3s, > 97% (n=1604) harbored both nucleotide possibilities within the database. This result strongly suggests that mtDNA diversity was well sampled in the HelixMTdb study, since a database consisting of highly related samples would presumably be characterized by a greater number of sites showing total identity. Moreover, when considering analyzed four-fold degenerate P3s (again, leucine and serine codons were omitted), only a very small number of sites showed no diversity (1%), with more than half of sites harboring at least three different bases. My interpretation is that the HelixMTdb authors have successfully sampled a very diverse set of human mitochondrial genomes. I have added these new analyses to the manuscript as Fig. 2a and 2b.
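      The tally described above (how many distinct bases are seen at each degenerate P3 across the database) can be sketched as follows. This is a minimal illustration and not the manuscript's own script: the function name `diversity_summary`, the input layout, and the toy counts are all hypothetical, and no particular HelixMTdb file format is assumed.

```python
from collections import Counter

def diversity_summary(site_base_counts):
    """Return the fraction of sites showing 1, 2, 3 or 4 distinct bases.

    site_base_counts: list of dicts, one per P3 site, mapping a base to the
    number of samples carrying that base (hypothetical input layout).
    """
    # Number of bases observed at least once at each site.
    n_bases = [sum(1 for c in counts.values() if c > 0)
               for counts in site_base_counts]
    hist = Counter(n_bases)
    n_sites = len(n_bases)
    return {k: hist[k] / n_sites for k in sorted(hist)}

# Toy two-fold degenerate sites (made-up counts, for illustration only).
toy_sites = [
    {"A": 190000, "G": 12},
    {"A": 5, "G": 195000},
    {"A": 196000, "G": 0},   # only one base observed: no diversity
    {"C": 100, "T": 195900},
]
summary = diversity_summary(toy_sites)
print(summary)
```

      In this toy input three of four sites carry both possible bases; the response reports the corresponding figure for the real data as > 97% of two-fold degenerate P3s.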

      I have also changed the word 'surprising' to 'noteworthy' within the relevant portion of my manuscript text.

      R2Q4: I think some comments and additional framing of the diversity in the central database would be valuable and important for interpretation. I believe it has, for example, rather more European rows than African ones, thus (to take a very basic view) sampling a less diverse population more than a more diverse one.

      R2A4: I now state explicitly that the vast majority of the mtDNAs analyzed (91.2%) were from haplogroup N and of inferred European ancestry. Also, please see point R2A3 for further discussion of the human mtDNA diversity reflected within HelixMTdb.

      R2Q5: Another rhetorically important number lacking a comparison with a null is that guanine was detected at >3000 P3 positions accepting synonymous purine substitutions. This is cited as evidence that nucleotide frequencies at P3s don't reflect selection inherent to translation. But this link isn't clear -- if such selection was present, how different from 3000 would I expect this number to be? Isn't there a continuum of possibilities? Is the key idea that 3000 is greater than some other number, and if so, what is that?

      R2A5: The purpose of this figure is simply to demonstrate that no nucleotide is ruled out when considering silent substitutions at the P3 of any amino acid. This is consistent with (although does not prove, and I believe that the I-P3 analysis provides stronger evidence on this point) a minimal role for mitochondrial codon preference in mtDNA evolution. To reflect that my point is more general, and not to be taken as a quantitative comparison, I changed my text to: 'However, even considering the relative depletion of guanine from all four-fold degenerate P3s and two-fold degenerate purine P3s, guanine was nonetheless detected at thousands of P3 positions (Fig. 3b)'.

      R2Q6: I also wasn't clear whether/how the finding that little selection inherent to translation was implicitly extended to suggest little general selection overall. The following section only considers selection acting at specific P3 sites, thus implicitly discarding other hypotheses about general selection based on nucleotide content but not inherent to translation. Perhaps I am misunderstanding this translation link, but selection based on general nucleotide profiles (for example, due to thermodynamic stability [Samuels, Mech. Ageing Dev. 2005; 126: 1123-1129] or availability of nucleotides [Aalto & Raivio, Mech. Ageing Dev. 2005; 126: 1123-1129; Ott et al., Apoptosis. 2007; 12: 913-922]) would seem to still be on the table?

      R2A6: I would argue against selection upon nucleotide choice linked to local changes to mtDNA thermodynamic stability. Most prominently, when considering two-fold degenerate sites, nucleotide differences from the reference sequence were identified within the HelixMTdb at almost every analyzed position (Fig. 2a), even though hydrogen bond strength between opposing bases would be affected in every case (AT>GC or vice versa). Of course, my argument here applies generally, and there may be a small subset of sites for which nucleotide substitutions can cause a pronounced functional defect because of a change to local mtDNA structure.

      I would also argue against mitochondrial nucleotide availability as a source of selective pressure within the human population. When considering the entire L-strand sequence (NC_012920.1), nucleotide counts are as follows:

      A 5124

      C 5181

      G 2169

      T 4094

      And when considering both strands, nucleotide counts and frequencies are as follows:

      A 9218 (27.8%)

      C 7350 (22.2%)

      G 7350 (22.2%)

      T 9218 (27.8%)

      One nucleotide substitution would lead to a change in nucleotide frequencies by less than 0.02%. While the formal possibility exists that mitochondrial nucleotide availability lies exquisitely close to an important threshold, there is no current evidence to support this proposition. And here again, the diversity of P3 nucleotide choice found among the HelixMTdb samples would argue against this possibility.
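      The arithmetic behind the two-strand counts and the "less than 0.02%" bound can be checked with a short sketch. The L-strand counts are those quoted above; the variable names are illustrative, and the only assumption is standard Watson-Crick pairing (each base on one strand implies its complement on the other).

```python
# L-strand nucleotide counts for NC_012920.1, as quoted in the response.
l_strand = {"A": 5124, "C": 5181, "G": 2169, "T": 4094}
complement = {"A": "T", "C": "G", "G": "C", "T": "A"}

# A base b occurs on the opposite strand wherever its complement
# occurs on the L-strand, so the two-strand count is the sum of both.
both = {b: l_strand[b] + l_strand[complement[b]] for b in l_strand}
total = sum(both.values())
freq_pct = {b: 100 * n / total for b, n in both.items()}

# One substitution moves a single base's count by one per strand, so the
# largest possible frequency shift (single-strand view) is 1 / strand length.
max_shift_pct = 100 / sum(l_strand.values())

for b in "ACGT":
    print(b, both[b], f"{freq_pct[b]:.1f}%")
print(f"max shift from one substitution: {max_shift_pct:.4f}%")
```

      The shift from a single substitution comes out at roughly 0.006% on one strand (and half that across both strands), comfortably below the 0.02% bound stated above.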

      That said, it is worth noting that nucleotide frequencies, and mtDNA mutation rates relative to nuclear mutation rates do appear to differ among clades (PMID: 8524045 and 28981721). Therefore, while selection related to nucleotide availability seems an unlikely explanation for the variant frequencies that I have recovered at degenerate sites among human samples, I certainly would not rule out taxon-specific dietary, environmental, or physiological factors that, over longer evolutionary timescales, might shape mtDNA nucleotide frequencies.

      I would like to raise the possibility of another source of selection upon nucleotide choice. Specifically, one might propose that synonymous mtDNA substitutions could affect the binding of proteins controlling the replication, compaction, or expression of mtDNA. Indeed, an intriguing study has reported that human cells manifest a mtDNA footprinting pattern (PMID: 30002158), suggestive of regulatory sites bound to protein or sites of transcriptional pausing. However, Blumberg et al. found no statistically significant difference in human synonymous change at footprinted sites, arguing against a strong selective pressure on nucleotide choice at footprinted P3s. Moreover, footprinting sites identified in the above-mentioned study are conserved in mouse and human, but I have shown that all four nucleotides are acceptable at all four-fold degenerate sites (n=252), all two-fold degenerate pyrimidine sites (n=157), and 99% of two-fold degenerate purine sites (n=152) within the mammalian I-P3 set, again arguing against general limitations on nucleotide choice caused by protein association. These analyses cannot, however, totally rule out the possibility that a subset of individual P3s are under some selection due to their role in binding or traversal of proteins.

      R2Q7: A reptile is chosen as an outgroup for a comparative analysis of mammals. As always when a choice is made, the question arises: what if that choice was different? Perhaps the corresponding figures can be presented for two other choices of outgroup to demonstrate that there's nothing particularly unrepresentative about this reptile?

      R2A7: While preparing this revised manuscript, I have performed an updated analysis using the most current mammalian mtDNA dataset available on RefSeq. For these new tests, I used Iguana iguana, rather than Anolis punctatus, as an outgroup. The new results are essentially indistinguishable from my previous findings. Importantly, when old TSS values and new TSS values for I-P3 sites were compared by linear regression, the R-squared value is 0.9955, with a p-value of

      R2Q8: Another analysis involves classifying variant frequency into discrete groups based on percentage appearance, then seeking links with the TSS statistic. First, it is not clear why discretisation is needed here. A statistical model embracing the continuous nature of variant frequency requires fewer arbitrary choices (e.g. of numbers and boundaries of classes).

      R2A8: A primary audience of this manuscript will certainly be the human genetics community, which commonly speaks in terms of variant classes (eg. 'common', 'rare', 'ultra-rare'). Therefore, I prefer to also use such classifications when analyzing the relationship between TSS and mtDNA variant frequency. I took advantage of the following references when generating frequency classifications:

      Bomba L, Walter K, Soranzo N. 2017. The impact of rare and low-frequency genetic variants in common disease. Genome Biol 18:77.

      McInnes G, Sharo AG, Koleske ML, Brown JEH, Norstad M, Adhikari AN, Wang S, Brenner SE, Halpern J, Koenig BA, Magnus DC, Gallagher RC, Giacomini KM, Altman RB. 2021. Opportunities and challenges for the computational interpretation of rare variation in clinically important genes. Am J Hum Genet 108:535–548.

      R2Q9: Second, an interpretation point here is in danger of equating absence of evidence with evidence of absence. Without an estimate of statistical power, an absence of a significant relationship cannot suggest that anything is likely or unlikely, only that there may not be sufficient power to detect an effect.

      R2A9: To address this point, I have changed my text as follows:

      Old: 'However, I detected no significant relationship between TSS and variant frequency for four-fold degenerate I-P3s (Fig. 2d), indicating that the highly elevated SSNE abundance at four-fold degenerate P3s is unlikely to be due to selection.'

      New: 'However, I detected no significant relationship between TSS and variant frequency for four-fold degenerate I-P3s (Fig. 2d), consistent with the idea that the highly elevated SSNE abundance at four-fold degenerate P3s is unlikely to be due to selection.'

      R2Q10: Figs 1a and 1e have a log vertical axis but I think the lowest points actually corresponds to zero? This is not compatible with a log axis and the zero position should be explicitly labelled with its own tick (perhaps in parentheses to highlight the discontinuity).

      R2A10: Quite correct, and I had neglected to clarify those details in the previous version of the manuscript. I now designate the samples with zero counts in the population using a smaller dot size, and I describe this approach in the figure legend.

      R2Q11: The methods are presented in an interesting way, with specific filenames for the code associated with each part of the pipeline explicitly provided. This is (very!) nice but it would also be good to describe in words what each piece of code does (e.g. "this was used as input for x.py, which counts the mutations and outputs a profile" or some such). This is indeed sometimes written but some parts lack an explanation.

      R2A11: I have now expanded my description of several scripts within the Methodology section.

      R2Q12: I could do with an additional sentence or two on the statistical analysis. As Kolmogorov-Smirnov tests examine differences between distributions, it's not immediately unambiguous how they are applied to total count statistics. Are count distributions with respect to variant frequency analysed for each amino acid separately? Or are the amino acids somehow ordered and the distributions across them compared? Or something else?

      R2A12: TSS distributions are held for each individual amino acid, which are then compared by Kolmogorov-Smirnov testing only within a given degeneracy category (four-fold degenerate, two-fold degenerate purine, two-fold degenerate pyrimidine). I have now elaborated upon this statistical test selection, and other details of the analysis, in the Methodology section.
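      As a rough sketch of the comparison scheme described in R2A12, the code below computes the two-sample Kolmogorov-Smirnov statistic for every pair of amino acids, but only within the same degeneracy category. It is not the manuscript's code: the helper names and the TSS values are hypothetical placeholders, and only the D statistic is computed (a p-value would normally come from a statistics library).

```python
from bisect import bisect_right
from itertools import combinations

def ks_statistic(x, y):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # ECDF of x evaluated at v
        fy = bisect_right(ys, v) / len(ys)  # ECDF of y evaluated at v
        d = max(d, abs(fx - fy))
    return d

# Hypothetical TSS values per amino acid, grouped by degeneracy category.
tss_by_category = {
    "four-fold": {"Ala": [1, 2, 3, 4], "Gly": [3, 4, 5, 6]},
    "two-fold purine": {"Glu": [2, 2, 3], "Lys": [2, 2, 3]},
}

# Compare amino acids pairwise, never across categories.
results = {
    (category, a, b): ks_statistic(tss[a], tss[b])
    for category, tss in tss_by_category.items()
    for a, b in combinations(sorted(tss), 2)
}
print(results)
```

      Restricting the pairs to a single category keeps the comparison from being dominated by the expected differences between four-fold and two-fold degenerate sites.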

      Reviewer 2 (Referee Cross-commenting):

      I agree that codon bias is an interesting potential axis of selection. Even if the analysis rejects the hypothesis of selective effects inherent to translation, it is conceivable that codon bias could be shaped by selection in other indirect ways (depending on how "inherent" is defined, these could include tRNA/nucleotide availability, GC content and thermodynamic stability, etc). I think this aligns with my suggestion that modes of selection that are not directly linked to translation could be explored in more depth before discounting selective effects overall. IJ

      I hope that I have now successfully addressed points related to codon bias, GC content, and thermodynamic stability in the manuscript, as well as here in this response to the reviewers.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript explores a large database of human mtDNA sequences and performs some comparative analysis across mammals to characterise the profile of mtDNA mutations. It finds that some variants are surprisingly poorly represented in human mtDNA and suggests that mutational bias rather than selection is the dominant driver of this heterogeneity.

      This is an interesting message and an efficient and interpretable use of a large-scale dataset to shed light on biological mechanisms, which is a highly desirable philosophy. The factors shaping human mtDNA heterogeneity are of immense interest for several fields from population genetics to medicine, making this a valuable perspective. My comments are mainly quite fine-grained and reflect instances where I think the argument could be tighter, rather than fundamental flaws in the approach. In the cases where these points are due to my own naivety, I apologise and suggest that more explanation of these points could help other readers like me!

      The first paragraph is focused on humans without explicitly saying so; missing heritability is less of an issue in, for example, plants [Brachi et al., 2011. Genome biology, 12(10), pp.1-8]. This focus should be clearer (or the differences across kingdoms mentioned!). It's also worth noting that the argument about pathogenic variants being infrequent because of selection can only address missing heritability in pathogenic variants, and cannot (directly) inform the missing heritability in traits like height etc. Also, the whole motivation with respect to missing heritability currently comes across as a bit of a non sequitur. An introduction section could be used to help describe how the analysis of the provenance of mtDNA mutations contributes to the missing heritability question. I also suggest that such an introduction section introduces the (later cited) previous work from Reyes and others on mutational profiles in mtDNA to set the scene.

      An early result, that 35% of possible synonymous mutations do not appear in a dataset, lacks a null hypothesis. Depending on the size of the dataset this may be very surprising or very unsurprising: an order of magnitude estimate of what proportion would be expected under uniform mutation and zero selection would help comparison here. I guess this can be as simple as 16k/34 << 200k. Also the ancestry of the dataset is important here: if all samples are highly related then a more homogeneous mutational profile is unsurprising. Perhaps one could assign a quantity like an effective population size to the database and compare this to 16k/34? I think some comments and additional framing of the diversity in the central database would be valuable and important for interpretation. I believe it has, for example, rather more European rows than African ones, thus (to take a very basic view) sampling a less diverse population more than a more diverse one.
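      The kind of order-of-magnitude null estimate requested here can be sketched as follows. This is a minimal illustration under strong assumptions that are not taken from the manuscript: mutation events are independent, fall uniformly on all possible variants, and are not subject to selection. The function names and the numbers in the example are hypothetical placeholders, not values from the database.

```python
import math


def expected_unseen_fraction(n_possible, n_events):
    """Expected fraction of possible variants never observed when n_events
    independent mutations fall uniformly on n_possible variants."""
    return (1.0 - 1.0 / n_possible) ** n_events


def events_needed(n_possible, unseen_fraction):
    """Number of independent uniform events at which the expected unseen
    fraction equals unseen_fraction."""
    return math.log(unseen_fraction) / math.log(1.0 - 1.0 / n_possible)


# Placeholder numbers, chosen only to show how the quantities relate.
n_possible = 1000
print(expected_unseen_fraction(n_possible, 1000))  # close to exp(-1)
print(events_needed(n_possible, 0.35))             # events giving 35% unseen
```

      Since sequences in a database share ancestry, the number of independent mutation events is smaller than the number of sequences, so the second function is perhaps best read as a rough "effective" number of events to compare against the database size.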

      Another rhetorically important number lacking a comparison with a null is that guanine was detected at >3000 P3 positions accepting synonymous purine substitutions. This is cited as evidence that nucleotide frequencies at P3s don't reflect selection inherent to translation. But this link isn't clear -- if such selection was present, how different from 3000 would we expect this number to be? Isn't there a continuum of possibilities? Is the key idea that 3000 is greater than some other number, and if so, what is that?

      I also wasn't clear whether/how the finding that little selection inherent to translation was implicitly extended to suggest little general selection overall. The following section only considers selection acting at specific P3 sites, thus implicitly discarding other hypotheses about general selection based on nucleotide content but not inherent to translation. Perhaps I am misunderstanding this translation link, but selection based on general nucleotide profiles (for example, due to thermodynamic stability [Samuels, Mech. Ageing Dev. 2005; 126: 1123-1129] or availability of nucleotides [Aalto & Raivio, Mech. Ageing Dev. 2005; 126: 1123-1129; Ott et al., Apoptosis. 2007; 12: 913-922]) would seem to still be on the table?

      A reptile is chosen as an outgroup for a comparative analysis of mammals. As always when a choice is made, the question arises: what if that choice was different? Perhaps the corresponding figures can be presented for two other choices of outgroup to demonstrate that there's nothing particularly unrepresentative about this reptile?

      Another analysis involves classifying variant frequency into discrete groups based on percentage appearance, then seeking links with the TSS statistic. First, it is not clear why discretisation is needed here. A statistical model embracing the continuous nature of variant frequency requires fewer arbitrary choices (e.g. of numbers and boundaries of classes). Second, an interpretation point here is in danger of equating absence of evidence with evidence of absence. Without an estimate of statistical power, an absence of a significant relationship cannot suggest that anything is likely or unlikely, only that there may not be sufficient power to detect an effect.
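      To make the point about statistical power concrete, here is a minimal simulation sketch. It is not part of the manuscript's analysis: it assumes a simple linear relationship between a continuous predictor (standing in for, e.g., log variant frequency) and a response (standing in for TSS) with Gaussian noise, and it tests the correlation with the Fisher z-transform. All names and parameter values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Two-sided p-value for H0: no correlation, via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulated_power(slope, n_sites, noise_sd=1.0, alpha=0.05,
                    n_sims=500, seed=0):
    """Fraction of simulated datasets in which the effect is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        y = [slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
        if fisher_p_value(pearson_r(x, y), n_sites) < alpha:
            hits += 1
    return hits / n_sims


for slope in (0.0, 0.1, 0.3, 1.0):
    print(slope, simulated_power(slope, n_sites=50))
```

      A non-significant result is only informative about effect sizes for which the simulated power is high; for weak effects the same test would usually fail to reject even when the relationship is real.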

      Figs 1a and 1e have a log vertical axis but I think the lowest points actually correspond to zero? This is not compatible with a log axis and the zero position should be explicitly labelled with its own tick (perhaps in parentheses to highlight the discontinuity).

      The methods are presented in an interesting way, with specific filenames for the code associated with each part of the pipeline explicitly provided. This is (very!) nice but it would also be good to describe in words what each piece of code does (e.g. "this was used as input for x.py, which counts the mutations and outputs a profile" or some such). This is indeed sometimes written but some parts lack an explanation.

      I could do with an additional sentence or two on the statistical analysis. As Kolmogorov-Smirnov tests examine differences between distributions, it's not immediately unambiguous how they are applied to total count statistics. Are count distributions with respect to variant frequency analysed for each amino acid separately? Or are the amino acids somehow ordered and the distributions across them compared? Or something else?

      Iain Johnston

      Significance

      I wrote the above review without realising the reviewer interface would be categorised in this way. Here's a repeat of my "significance" comments

      The manuscript explores a large database of human mtDNA sequences and performs some comparative analysis across mammals to characterise the profile of mtDNA mutations. It finds that some variants are surprisingly poorly represented in human mtDNA and suggests that mutational bias rather than selection is the dominant driver of this heterogeneity.

      This is an interesting message and an efficient and interpretable use of a large-scale dataset to shed light on biological mechanisms, which is a highly desirable philosophy. The factors shaping human mtDNA heterogeneity are of immense interest for several fields from population genetics to medicine, making this a valuable perspective.

      Referee Cross-commenting

      I agree that codon bias is an interesting potential axis of selection. Even if the analysis rejects the hypothesis of selective effects inherent to translation, it is conceivable that codon bias could be shaped by selection in other indirect ways (depending on how "inherent" is defined, these could include tRNA/nucleotide availability, GC content and thermodynamic stability, etc). I think this aligns with my suggestion that modes of selection that are not directly linked to translation could be explored in more depth before discounting selective effects overall. IJ

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The main message of this paper, as far as I understood since I am not a molecular bioinformatician but I am certainly interested in mtDNA variations especially related to disease, is that there is a very obvious bias among synonymous changes in the ORF of human mtDNA, more frequent for amino acids with 4 variants, more frequent in P position, and much more frequently characterized by transversion rather than transition substitutions. This survey is well written and, although edited in a rather technical language, the message is reachable and interesting. I also agree with the conclusions of the Author concerning the considerations that this set of new data should prompt one to draw, also considering non-synonymous, potentially pathogenic mutations. The only contribution I feel I can provide to this manuscript is to invite the Authors to consider the possibility that the selection may be due to a preferred codon bias, linked to the higher or lower compliance of different codons to be translated by the translational in situ machinery of mitochondria. I am not sure that this applies also for mitochondrial mitochondria and related factors (you may want to ask Aleksey Amunts in Stockholm or Bob Lightowlers or Zoscha Lightowlers in Newcastle on this matter). I do know that this is certainly a problem for recombinant proteins containing, for instance, mammalian MTS fused with a bacterial restriction enzyme; in most of the cases the bacterial sequence has to be recoded using the preferred codons for mammalian systems in order to increase translation by a eukaryotic (mammalian) translation machinery. I wonder whether you could discuss this possibility in your paper and maybe perform some further comparative measurement to test it.

      Significance

      The paper provides novel information on the structure and constraints of mtDNA variants in humans, and opens an area of investigation which is new and potentially relevant, with some possible implications also for pathogenic mtDNA mutations in humans.

      Referee Cross-commenting

      I said in my first comment that I am not a bioinformatician, but Referee 2 did a great job in identifying some critical points and suggesting to the Authors how to address them. I maintain my opinion, which I think is shared by Referee 2, that the paper conveys an interesting and rather unexpected message, and that if the Authors are able to answer properly the points raised by Referee 2 the paper should be published. I confirm that the only contribution I feel I can provide to this manuscript is to invite the Authors to consider the possibility that the selection may be due to a preferred codon bias, linked to the higher or lower compliance of different codons to be translated by the translational in situ machinery of mitochondria. I wonder whether the Authors could consider this possibility in the Discussion and possibly perform some further comparative measurement to test it.

  5. Aug 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reply to Reviewers:

      ## Comments by Reviewer 1

      For the sake of clarity, it may help to provide some table illustrating the proportion of gastruloids behaving precisely as the best example shown

      We thank the reviewer for raising this point. For the expression patterns of each of the main endodermal markers analyzed, at the timepoints considered, we will provide additional context on the variability among Gastruloids. We envisage providing such results in a format similar to that used e.g. in Supplementary Figure 4 of https://doi.org/10.1038/s41467-021-23653-4 [Xu, Peng-Fei, et al. "Construction of a mammalian embryo model from stem cells organized by a morphogen signalling centre." (2021)]; i.e. categorization of the phenotypes observed among gastruloids, and quantification of their proportions.

      It would also strengthen the message even more to provide some quantitation of co-expression for the main markers. As the behaviour seems very consistent, it is likely that such quantification would not be very arduous, and it would show the strength of the model.

      We again thank the reviewer for highlighting the value of co-localisation data; this suggestion was also put forward by our other reviewer. We are currently developing a pipeline to segment the nuclei on the DAPI channel of each immunostained Gastruloid, and to extract marker intensities within each cell. We envisage presenting the results in a format similar to that shown e.g. in Fig. 2 of https://doi.org/10.1242/dev.159103 [Mulas, Carla, et al. "Oct4 regulates the embryonic axis and coordinates exit from pluripotency and germ layer specification in the mouse embryo." (2018)]. Such data would undoubtedly strengthen the reliability of the claims we here draw mostly from a qualitative inference of colocalisation.
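      The two steps named in this reply, segmenting nuclei on the DAPI channel and extracting marker intensities per cell, can be illustrated with a toy sketch. This is not the pipeline under development: it assumes a single 2D image stored as nested lists, a fixed intensity threshold, and 4-connected labelling, whereas a real pipeline would more likely work on 3D stacks with a dedicated image-analysis library. All names and pixel values are hypothetical.

```python
from collections import deque


def label_nuclei(dapi, threshold):
    """Label 4-connected regions of pixels whose DAPI intensity exceeds
    the threshold. Returns the label image and the number of regions."""
    rows, cols = len(dapi), len(dapi[0])
    labels = [[0] * cols for _ in range(rows)]
    n_labels = 0
    for r in range(rows):
        for c in range(cols):
            if dapi[r][c] > threshold and labels[r][c] == 0:
                n_labels += 1
                labels[r][c] = n_labels
                queue = deque([(r, c)])
                # Breadth-first flood fill of the current nucleus.
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and dapi[ny][nx] > threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = n_labels
                            queue.append((ny, nx))
    return labels, n_labels


def mean_intensity_per_nucleus(labels, marker):
    """Average a marker channel over the pixels of each labelled nucleus."""
    sums, counts = {}, {}
    for label_row, marker_row in zip(labels, marker):
        for label, value in zip(label_row, marker_row):
            if label:
                sums[label] = sums.get(label, 0) + value
                counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Tiny made-up image: two nuclei on the DAPI channel, one marker channel.
dapi = [[9, 9, 0, 0, 0],
        [9, 9, 0, 0, 8],
        [0, 0, 0, 8, 8]]
marker = [[4, 6, 0, 0, 0],
          [6, 4, 0, 0, 1],
          [0, 0, 0, 1, 1]]

labels, n_nuclei = label_nuclei(dapi, threshold=5)
print(n_nuclei, mean_intensity_per_nucleus(labels, marker))
```

      The per-nucleus table produced by the second function is the natural input for any downstream co-expression quantification.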

      The introduction could be shortened, and the results a bit more to the point.

      As this point was raised by both reviewers, we will go back over the entire manuscript and make the necessary adjustments, especially in the Results section. We would however like to point out that the unusual length of our Introduction section is to be understood as a deliberate departure from the limits and conventions prescribed by traditional publishing formats. In line with our intentional choice of a journal-independent publication platform (i.e. a preprint server), and of a journal-independent review process, we also presented and contextualized our research in a journal-independent format. We see this as an opportunity to present research in a voice and in a form truer to that of the researchers that carried it out. We thank both reviewers for their feedback on the format and will certainly still make sure to minimize redundancy of information throughout our manuscript.

      ## Comments by Reviewer 2

      The following conclusions were not warranted by the findings:

      Endoderm emergence can occur in the absence of extraembryonic tissues and embryonic architecture: It is unclear if extraembryonic tissues (e.g., primitive endoderm and visceral endoderm-like cells) are absent in the early phase of gastruloid development.

      The reviewer raises an important point regarding the presence/absence of cells with extraembryonic endoderm identity within the early developing gastruloid. Our claim of absence of these cell types is grounded on previous gastruloid research (Turner, David A., et al. "Anteroposterior polarity and elongation in the absence of extra-embryonic tissues and of spatially localised signalling in gastruloids: mammalian embryonic organoids." (2017); van den Brink, Susanne C., et al. "Single-cell and spatial transcriptomics reveal somitogenesis in gastruloids." Nature 582.7812 (2020): 405-409.), which has indeed found no evidence of extraembryonic endoderm in Gastruloids. While other datasets do not include early Gastruloid stages where such cells may instead be detected, transcriptional analyses of later timepoint Gastruloids [Rossi, Giuliana, et al. "Capturing cardiogenesis in gastruloids." (2021)] also do not seem able to uniquely define extraembryonic endoderm signatures. Accordingly, we did not see this claim as having to be further substantiated.

      In light of the most recent reports (see https://doi.org/10.1038/s41467-021-23653-4 [Xu, Peng-Fei, et al. "Construction of a mammalian embryo model from stem cells organized by a morphogen signalling centre." (2021)]), the necessary absence of extraembryonic endoderm types within self-organising embryonic models may need to be at least revisited. Xu et al indeed report extraembryonic endoderm cells in an embryoid model in many ways analogous to the Gastruloid system. These claims are mainly based on the recovery of DBA (Dolichos Biflorus Agglutinin) signal, a marker traditionally associated with extraembryonic endoderm in vivo. Yet, based on our own unpublished exploration of the topic, we presently cannot confirm DBA lectins to be an exclusive marker of extraembryonic endoderm in an in vitro setting (i.e. in the cell types obtained from and amongst differentiating stem cells): not only do we detect DBA positivity in wide cellular domains that are incompatible with any realistic estimate of the extraembryonic makeup of Gastruloids, but we also see this marker decorating the membrane of 2D colonies of pluripotent mouse embryonic stem cells maintained under 2iLIF conditions. When counterstaining for Ttr (Transthyretin; aka prealbumin) (a major discriminant of extraembryonic vs embryonic endoderm, as recovered from single-cell datasets; https://endoderm-explorer.com/), we further cannot detect positivity in either DBA+ or DBA- cells.

      To the best of the knowledge and evidence available to us, we thus still consider the absence of extraembryonic endoderm in Gastruloids to be a substantiated claim. We nevertheless thank the reviewer for raising this important point and agree that dedicated characterization of extraembryonic endoderm signatures/markers in the developing Gastruloid could certainly help to support the validity of the claims made in previous Gastruloid literature, from which we draw the premise that no extraembryonic endoderm is present. To facilitate this process, and to better contextualize our results in the light of what is now published in Xu et al, we will include the detected DBA and Ttr (absent) patterns in early Gastruloids, as well as the expression patterns of extraembryonic endoderm markers for the timepoints for which single cell transcriptomics data is available (i.e. 96h onwards).

      While the gastruloid does not replicate the morphological feature of the post-gastrula embryo, it nevertheless has a certain degree of tissue organization. Perhaps the emergence of DE-like cells in 2-D culture would be a more convincing model for "the absence of extraembryonic tissues and embryonic architecture".

      The observation is correct: gastruloids do not replicate the architecture of the peri-gastrulation mouse embryo. At the same time, they display a striking degree of tissue (re-/)organization as they proceed through differentiation. We were deliberate in the presentation of our data, and as such we always refer to “absence of embryonic architecture” rather than “absence of architecture” (i.e. of any architecture), as the latter assertion would in fact be contrary to a main finding of our investigation.

      Indeed, the observation that Gastruloids, and specifically endodermal cells within them, can give rise to such developed tissue architectures, by self-organisation and without the need of externally-supplied matrices, is a major focus of our manuscript. Since Gastruloids start as an unstructured, epithelioid cluster of stem cells, the architectural rearrangements observed in the mature Gastruloid highlight an intrinsic propensity of endodermal cells to form epithelia. A key aspect of this point is that these architectural rearrangements are carried out by cells in a landscape that is not architecturally similar to that of the embryo (specifically, where endoderm progenitors do not and cannot arise from a columnar epithelium and do not and cannot have a visceral-endoderm-like destination in which to intercalate).

      Considering what was discussed in this and in the previous point, we struggle to frame the Gastruloid as not a “convincing model for the absence of extraembryonic tissues and embryonic architecture", given that it satisfies both criteria. More complete and articulated discussions and appraisals of the value of Gastruloids and other 3D in vitro models towards uncovering fundamental features of embryonic development are available elsewhere, including in their comparison to 2D differentiation assays (van den Brink, Susanne C., and Alexander van Oudenaarden. "3D gastruloids: a novel frontier in stem cell-based in vitro modeling of mammalian gastrulation." (2021); Simunovic, Mijo, and Ali H. Brivanlou. "Embryoids, organoids and gastruloids: new approaches to understanding embryogenesis." (2017); Turner, David A., Peter Baillie‐Johnson, and Alfonso Martinez Arias. "Organoids and the genetically encoded self‐assembly of embryonic stem cells." (2016)). We refer readers to these reviews to help inform their own assessment on the matter.

      Of course, this is not to say that 2D culture models are not an equally valuable system to study peri-gastrulation development (if only as exemplified by https://doi.org/10.1186/s12915-014-0063-7 [Turner, David A., et al. "Brachyury cooperates with Wnt/β-catenin signalling to elicit primitive-streak-like behaviour in differentiating mouse embryonic stem cells." (2014)] and by the numerous studies on micropatterned stem cells). To the best of our knowledge, however, no 2D model of spontaneous, undirected endodermal differentiation has been investigated in detail and from a developmental perspective. We share the reviewer’s interest in the insight that this kind of approach could provide. Still, claims of absence of some degrees of architectural organisation or of extraembryonic tissues are not straightforward in self-organising 2D systems either. On the other hand, 2D approaches that consist of directed differentiation of stem cells to specific endodermal fates are clearly not the type of investigation we were interested in within the scope of our experimental questions. We also believe that e.g. differentiation of 2D epithelia that then bud to form 3D spheres are generally contexts that are too decoupled from embryonic modes of development to provide the same degree of developmental insight as e.g. Gastruloids. Having both 2D and 3D platforms available to us, we opted for the latter.

      The following conclusions were not warranted by the findings: […]

      The FoxA2+/Sox17+ endoderm progenitors never transitioning through the mesenchymal intermediates and never leaving the epithelial compartment that they arise: In view of that the stereotypic morphogenetic activity was not documented during the development of the gastruloid, it is not possible to exclude the possibility that the progenitors undergo a partial EMT (loss of epithelial feature and cellular polarity and display of morphogenetic movement, as in vivo) in the transition from progenitor to the epithelial endoderm cells. The DE-like cells when first discerned in the gastruloid are apparently epithelialized. In the absence of lineage tracing results, it is not clear whether they are still residing in the "epithelial compartment that they arise".

      We agree with the reviewer’s comment: the ability to trace endodermal cells throughout their journey in the Gastruloid and throughout differentiation, specifically in conjunction with a live monitor of their epithelial status (e.g. overlaid with a Cdh1 reporter) would provide clear and definitive insight on the endodermal and epithelial transitions taking place in this system. Our conclusions are based on timed immunostaining showing that in early Gastruloids all cells are epithelioid. Finding FoxA2+ (and Sox17+) cells consistently within a Cdh1+ context, while also having necessarily emerged within an epithelioid context, suggests that these cells never leave an epithelial compartment. Within the text, we also put forward alternative hypotheses equally consistent with our observations: namely that these cells would indeed leave the epithelial compartment (still, through incomplete EMT processes not relying on Snai1 programmes), but reintegrate it at short timescales. We do not find FoxA2+ cells within the mesodermal compartment, as one would expect from comparison with the embryo, and we do not see Snai1 expression within Cdh1+ cells. Ideally, we would use a live tracing system whereby we could track endodermal identity (e.g. a FoxA2 reporter) while also being able to track epithelial (E-cadherin, Cdh1) status. We do not foresee being able to perform such an experiment in the near future. Live tracking of Cdh1+ cells in Gastruloids has been described in [Hashmi, Ali, et al. "Cell-state transitions and collective cell movement generate an endoderm-like region in gastruloids." BioRxiv (2020).].

      We use the term “mesenchymal” to signify “not-epithelial”, and as such “Cdh1-negative”. When we hypothesise endoderm cells not to go through EMT, we imply a classical, complete, Snai1-mediated EMT. By “leaving the epithelial compartment” we mean “losing Cdh1 expression” (and as such not being associated anymore with the epithelial/epithelioid compartment). As such, we are not excluding the possibility put forward by the reviewer: i.e. endodermal progenitors going through a partial EMT with loss of epithelial architecture, but not epithelial markers, and movement in this epithelioid state. In fact, this is the interpretation we are favoring in our report. We will clarify our use of each term within the text of the preprint and provide more clarity to these points. Accordingly, we will rephrase each of the terms above (“mesenchymal”, “absence of EMT”, and “epithelial compartment”) in terms of the Cdh1 and Snai1 status of the cells.

      The mature endoderm cells are patterned segmentally in the gastruloid. The findings that the molecular phenotype (marker expression) of the mature endoderm cells "aligns with (cellular) identities along the entire length of the embryonic gut tube" are not sufficient evidence of spatial A-P patterning of endoderm cells. The expression pattern of Foxa2/Cdh1 (Fig 5d) was not informative of tissue patterning.

      We share the reviewer’s point that alignment of the molecular phenotype (transcript expression) of Gastruloid cells with that of cells at varying positions along the gut tube of the embryo does not in fact necessarily imply that these cells are spatially patterned within the Gastruloid. We were deliberate in our presentation of the data on this point, and in fact explicitly presented to readers the equally probable possibility that “the variety of cell identities uncovered in the single cell dataset are intermingled throughout the core of the Gastruloid” (rather than being spatially patterned; lines 787-789). This possibility is in fact provided as the rationale to highlight the need for alternative investigations able to provide spatial information (provided in the following section).

      Promisingly, the markers we did investigate (Pax9 for anterior endoderm; Cdx2 and TBra for posterior identities) were found to not be intermingled throughout the primordium but correctly expressed at anterior and posterior domains (as already known for TBra and Cdx2 from previous Gastruloid literature). We do agree that showing the distribution of AP markers within the same sample would provide a more immediate and compelling visual of the AP patterning of anterior and posterior markers. We also agree that the number of markers we investigated spatially is still restricted, and further characterisation, specifically of middle markers, would provide a more complete picture of the extent to which the Gastruloid primordium is effectively patterned. We would however like to point out that we do not make claims of spatial patterning beyond those supported by the markers we did confirm by immunostaining/HCR; i.e. we only claim spatial patterning of anterior and posterior domains (“Gastruloid endoderm contains patterned anterior and posterior endodermal types”; line 663).

      We agree with the reviewer that the expression pattern of Foxa2/Cdh1 is not informative of tissue patterning. When using these markers on Gastruloids, we indeed use them as “pan-endodermal” markers to identify the general cellular domain at the core of the Gastruloid.

      […] Whether endoderm cells are patterned or not is, however, irrelevant for the understanding of the mode of endoderm formation, unless the timing and the mechanism of allocation of endoderm cells of specific segmental property has been studied in the gastruloid.

      The relevance of the specific set of results indicated by the reviewer (i.e. the topic of patterning of endodermal cells in the Gastruloid) is perhaps better appreciated in the context of understanding the modes of later development and maturation of endoderm in vitro, and its self-organising and self-patterning abilities after it has formed, rather than as providing direct insights into the formation of the germ layer itself. In the preprint, we indeed present this investigation as a segue to the first set of experiments that instead focus on the formation of the endoderm itself. As the reviewer points out, we have not investigated the specific aspect of timing and mechanisms of allocation of endoderm cells to specific segmental identities as these cells are emerging within the Gastruloid. Our reporting that Gastruloids contain endodermal identities that do end up specified to different segmental identities in the first place sets the basis for the kind of investigation the reviewer suggests.

      it is also unclear if a structure reminiscent of the embryonic gut (closed or partly open) was formed (or self-organised) in the gastruloid.

      We thank the reviewer for raising this point. The structures we describe are initially multi-branched and whisk-shaped (120h), and in turn resolve into a single rod-like tissue that follows the outer geometry of the Gastruloid (144h), interfacing with an outer envelope of mesenchymal (non-endodermal) cells. We provide depth-coded maximal intensity projections of these structures (under Cdh1 immunostaining) to try to best convey images of their three-dimensional shape (Figure 4A, Figure 5C). While we believe this epithelial primordium to be fully closed and non-hollow, in fact quite different from the folding/folded epithelium of the post-gastrulating mouse gut endoderm, a better idea of the 3D organisation of this primordium could certainly be provided by outputs from light sheet imaging and series of optical sections along the z-axis of our immunostained samples. We plan to include these alternative visualisations and in the meantime refer to already published light-sheet data as found in [Rossi, Giuliana, et al. "Capturing cardiogenesis in gastruloids." (2021)]. Here too, the endodermal primordium appears as a dense, closed mass of cells. To contextualise these structures with respect to the most recent literature, we do not see the kind of vacuolated and segmented structures described instead in https://doi.org/10.1038/s41467-021-23653-4 [Xu, Peng-Fei, et al. "Construction of a mammalian embryo model from stem cells organized by a morphogen signalling centre." (2021)]

      The information regarding the spatial localization of specific germ layer markers in the gastruloids at different timepoints would be important to understand how the morphology progresses and how it is comparable to the developing embryo itself. How is the organisation of the mesoderm and endoderm layers in comparison to embryo in the early timepoints and later timepoints of gastruloids?

      We agree with the reviewer about the helpfulness of such characterisations. While a temporal characterisation of the evolution of the endoderm compartment in relationship to the other cell types is provided in Figure 4A, cells of the other two germ layers are not explicitly labelled. We will provide analogous immunostaining series across Gastruloid development (early to late timepoints), choosing markers able to highlight cells of each of the three germ layers. Given the difficulty of finding specific germ layer markers, and the often-unforgiving limitations on the markers that can be chosen for simultaneous staining due to host-species antibody cross-reactivity, we also find it useful to piece together relative germ layer localisation from the much wider imaging data now widely available in previous Gastruloid literature. We thus point interested readers to complement the endoderm descriptions from our preprint with dedicated characterisations of the axial organisation and distribution of the other two germ layers in Gastruloids published in https://doi.org/10.1038/s41586-018-0578-0 [Beccari, Leonardo, et al. "Multi-axial self-organization properties of mouse embryonic stem cells into gastruloids." Nature 562.7726 (2018): 272-276.], which also specifically discusses these aspects in relation to the germ layer organisation of the embryo proper.

      Clarify if Foxa2 and Sox17 double positive cells exist in the Cdh1 patches (Fig 3a). In Fig 4, authors have demonstrated the development of epithelial primordium with overlaying mesodermal wings, however it is important to show if Foxa2, Sox17, or other definitive endoderm markers co-express in these cells.

      We thank the reviewer for highlighting the value of co-localisation data; this suggestion was also put forward by our other reviewer. We are currently developing a pipeline to segment the nuclei on the DAPI channel of each immunostained Gastruloid and to extract marker intensities within each cell. We envisage presenting the results in a format similar to what is shown e.g. in Fig. 2 of https://doi.org/10.1242/dev.159103 [Mulas, Carla, et al. "Oct4 regulates the embryonic axis and coordinates exit from pluripotency and germ layer specification in the mouse embryo." (2018)]. Such data would undoubtedly strengthen the reliability of the claims we draw here mostly from a qualitative inference of colocalisation. When showing the distribution of markers within the 120h epithelial primordium, our assumption was that, since the entire primordium is FoxA2+, any other marker expressed within the primordium would colocalise with a subset of the FoxA2+ cells. We do note the importance of colocalisation data for a more exact description of the cell type identities contained within the primordium.
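
      As an illustration only, and not the pipeline under development, the kind of per-nucleus quantification described above could be sketched as follows. The sketch assumes that a nuclear segmentation has already produced a label mask (0 for background, one integer per nucleus) and that each marker channel is a 2D grid of intensities; the function names, thresholds and toy pixel values are all hypothetical.

```python
from collections import defaultdict


def per_nucleus_means(labels, channel):
    """Mean intensity of `channel` inside each labelled nucleus (label 0 = background)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for label_row, value_row in zip(labels, channel):
        for label, value in zip(label_row, value_row):
            if label != 0:
                sums[label] += value
                counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def copositive_fraction(means_a, means_b, thr_a, thr_b):
    """Among nuclei positive for marker A, the fraction also positive for marker B."""
    positive_a = [n for n, m in means_a.items() if m >= thr_a]
    if not positive_a:
        return 0.0
    both = [n for n in positive_a if means_b.get(n, 0.0) >= thr_b]
    return len(both) / len(positive_a)


# Toy 3x4 image with two nuclei (labels 1 and 2); every value is made up.
labels = [[1, 1, 0, 2],
          [1, 1, 0, 2],
          [0, 0, 0, 2]]
foxa2 = [[90, 110, 5, 80],
         [100, 100, 4, 100],
         [3, 2, 1, 90]]
sox17 = [[70, 90, 2, 10],
         [80, 80, 3, 20],
         [1, 1, 1, 30]]

foxa2_means = per_nucleus_means(labels, foxa2)
sox17_means = per_nucleus_means(labels, sox17)
# Hypothetical positivity thresholds; real ones would need calibration per channel.
fraction = copositive_fraction(foxa2_means, sox17_means, thr_a=50, thr_b=50)
print(foxa2_means, sox17_means, fraction)
```

      In a real analysis the label mask would come from a dedicated segmentation tool and the thresholds from negative controls; the point here is only that colocalisation becomes a per-cell fraction rather than a visual impression.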

      It was suggested that E-Cadherin is maintained during endoderm differentiation. N-cadherin expression may be examined to determine if N-cad is expressed in the other region of gastruloids.

      We share the interest in describing N-cadherin expression patterns, especially in the context of EMT development and endoderm epitheliality, and thank the reviewer for highlighting the value of this marker. At the time of publication, we had unfortunately been unable to source an N-cadherin antibody giving good signal quality in our hands. We are planning further staining of early and late Gastruloids for this marker. Given what was recently reported in https://doi.org/10.1038/s41556-021-00694-x [Scheibner, Katharina, et al. "Epithelial cell plasticity drives endoderm formation during gastrulation." (2021)], we expect endodermal cells to display double cadherin expression (Cdh1+/Cdh2+), and the mesodermal compartment to display N-cadherin only.

      In Fig 6, FACS quantification is not proportional to the expression of the TBra:GFP as shown in the microscopic images at 96 hr, 120 hr. Fig 6D does not show the TBra:GFP positive cells on the y-axis in the top-left quadrant, even though it is quite visible in microscopy at 96, 120 hr. Microscopic images suggest TBra signal is almost completely lost at 128 hr whereas FACS does not represent that. In fact, at 120 hours, the plot shows the opposite of what microscopy shows.

      The reviewer is correct in pointing out these discrepancies and we will make sure to flag them explicitly in the main body of our report. It appears that the trend in reporter expression highlighted by FACS data is delayed with respect to what is visible by live imaging (where loss of TBra expression can already be seen starting at t=120h). The decrease in TBra expression, however, does not seem to be recovered even by the last FACS timepoint. Concomitantly, the TBra signal detected during time-lapse seems to decrease to an abnormally low level (indeed, to be lost), especially if compared to equivalent timepoints processed for FACS. We will investigate whether the TBra reporter signal is particularly vulnerable to sustained illumination (i.e. timelapse conditions), as it still appears to be present at later timepoints in both FACS and end-imaged Gastruloids (see clear posterior expression in Fig 6B). Maintained TBra expression is also what is expected and detected by immunostaining, both in our data and in the general Gastruloid literature. In light of the possible effects of sustained imaging on our reporters, we only use timelapse data to describe global cell movement (and that of the FoxA2+ cells, rather than changes in reporter expression), and instead rely on the FACS data for claims based on intensity values. While we resolve the effect of timelapse imaging on TBra reporter detection, we will explicitly highlight the discrepancy between the two investigative approaches within the Results section, and thank the reviewer for bringing this point to our attention.

      Gastruloids were sampled at 96-168 hours for single cell transcriptome analysis. However, the specimens documented in this study were those only up to 144 hours. How does the gastruloid morphology look at 168 hours? It is essential to show the morphology and characterise the further development from 144 to 168 hours, to compare the single cell RNA seq data with the morphology of the gastruloid.

      The reviewer is correct in pointing out the absence of imaging data showing the internal endodermal primordium at t=168h. Examples of 168h Gastruloids (but not of the internal primordium) are shown in Figure 8D, where we compare transcriptomics data with spatial patterning and verify the spatial distribution of late gut markers. At this stage, the internal endodermal primordium is mostly similar to its configuration at t=144h. We will incorporate additional morphological data for Gastruloids at 168h to show the organisation of the endodermal primordium at this later stage and to facilitate morphology-transcriptome comparisons.

      In Fig 7, it is surprising to see that the proportion of cells in the two clusters 13, 4 that mark endoderm are a minor portion of the whole dataset collected, whereas the microscopic images suggest that the majority of the gastruloid structure from 120hr onwards is marked by Foxa2 and shows the epithelial primordium morphology as claimed.

      Microscopic images are optical sections through the midplane of each Gastruloid, so as to capture the full extent of the internal Gastruloid epithelial primordium. We believe that these pictures fail to accurately convey the degree to which this primordium is in fact completely surrounded by a thick and dense layer of mesenchymal cells, not only in the lateral dimension, as can be appreciated e.g. in Figure 4B, but also above and below the plane of the microphotographs. A better idea of the volume occupied by such cells, and of the proportion of endodermal vs non-endodermal cells, could be provided by light-sheet imaging. At later timepoints, and as hinted by e.g. panel 4B [144h], the volume occupied by non-endodermal cells can be considerable. Linking back to a previous suggestion from the reviewer, we believe documentation of Gastruloid morphology at 168h could help to further clarify the relationship between the single-cell dataset and the morphological data from the Gastruloids themselves. This might also be a further opportunity to underscore that the single-cell dataset accessed for our endoderm-targeted analysis was produced in the context of a different study, https://doi.org/10.1016/j.stem.2020.10.013 [Rossi, Giuliana, et al. "Capturing cardiogenesis in gastruloids." (2021)], in which Gastruloids made with this same cell line were additionally treated with cardiogenesis-inducing factors. The proportion of cells classified as cardiac mesoderm and associated mesodermal types is thus likely much over-represented compared to what is present in non-induced Gastruloids (i.e. those considered in this report), as is in fact illustrated by the imaging data presented in the paper.

      The single-cell RNA-seq data should be analysed for the co-expression of multiple segment-specific cell markers to ensure that the mature endoderm cells align with high-confidence with the known cell types in different segments of the embryonic gut, and that the localization of representative cell types can be validated spatially along an endoderm structure in single gastruloids.

      We will analyze the single-cell RNA-seq dataset to show the co-expression of segment-specific markers. As the reviewer points out, we have only shown the patterns of expression of single markers at a time. As mentioned in a previous point, we also agree that the number of markers we verified to be spatially patterned is still restricted, and further characterisation, specifically of middle markers, would provide a more complete picture of the extent to which the Gastruloid primordium is effectively patterned. We also agree that showing the distribution of anterior and posterior markers within the same sample would provide a more immediate and compelling visual of AP patterning.
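
      A minimal sketch of what such a co-expression tally might look like is given below. It assumes each cell is represented as a mapping from gene name to read count; the detection threshold, the function name and the toy counts are hypothetical, and the three genes are used only because they are markers discussed in this exchange, not because they form a validated segment-specific panel.

```python
from itertools import combinations


def coexpression_table(cells, markers, min_count=1):
    """Count cells in which both genes of each marker pair are detected."""
    table = {}
    for gene_a, gene_b in combinations(markers, 2):
        table[(gene_a, gene_b)] = sum(
            1
            for counts in cells
            if counts.get(gene_a, 0) >= min_count
            and counts.get(gene_b, 0) >= min_count
        )
    return table


# Four made-up cells; a missing gene means zero reads for that gene.
cells = [
    {"Foxa2": 5, "Sox17": 3},
    {"Foxa2": 2, "Cdx2": 4},
    {"Foxa2": 1, "Sox17": 1, "Cdx2": 2},
    {"Cdx2": 6},
]
table = coexpression_table(cells, ["Foxa2", "Sox17", "Cdx2"])
for pair, n_cells in table.items():
    print(pair, n_cells)
```

      On a real dataset the same tally would normally be run on normalised counts within the endoderm clusters, and pairs of markers from distinct gut segments would be expected to show little or no overlap.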

      It is not clearly indicated how many replicates were performed to assure consistency/reproducibility of the gastruloid results. Statistical results were not provided for most of the immunostaining experiments, either in the main text or in the figure legends.

      Both reviewers highlight the qualitative nature of much of the data that is presented. Accordingly, we will more clearly and more consistently indicate the number of samples analysed and the number of replicates considered. As for the “statistical results” of immunostaining experiments, and as indicated in response to a previous comment, we envisage providing such results in a format similar to what is used e.g. in Supplementary Figure 4 of https://doi.org/10.1038/s41467-021-23653-4 [Xu, Peng-Fei, et al. "Construction of a mammalian embryo model from stem cells organized by a morphogen signalling centre." (2021)]; i.e. categorisation of the phenotypes observed among Gastruloids, and quantification of their proportions. To convey statistics on the variability among Gastruloids, our current pipeline can output scatterplots like the one presented in Figure 4C, where the spread of the datapoints informs about the variability of the data. We will provide statistics on these plots in a numerical format.
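
      For illustration, phenotype proportions of the kind mentioned above could be reported together with a confidence interval, for instance a Wilson score interval, which behaves reasonably for small sample sizes. The sketch below is not taken from the manuscript: the phenotype labels and the scores are invented, and the choice of interval is an assumption.

```python
import math
from collections import Counter


def phenotype_proportions(observations):
    """Map each phenotype to (count, proportion of all scored samples)."""
    counts = Counter(observations)
    total = len(observations)
    return {phenotype: (n, n / total) for phenotype, n in counts.items()}


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Ten made-up Gastruloids scored by eye into hypothetical categories.
scores = ["single rod"] * 8 + ["branched"] * 1 + ["no primordium"] * 1
proportions = phenotype_proportions(scores)
low, high = wilson_interval(proportions["single rod"][0], len(scores))
print(proportions)
print(f"single rod: 95% CI {low:.2f} to {high:.2f}")
```

      Reporting the count alongside the proportion and interval lets readers judge both the consistency of the phenotype and how many samples the claim rests on.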

      Majority of the images presented in the manuscript are shown as Maximum Intensity Projections, and it is not clearly stated if the localisation of the cells expressing specific protein markers are present on the surface or in the internal layers of the gastruloids. Optical slices of the gastruloid images may be presented as supplementary information.

      Most of the images presented in the manuscript are optical cross-sections through the midplane of the Gastruloid. We can make this clearer in the text. Indeed, since this report focuses on the spatial distribution of markers along the AP axis of the Gastruloids (and not the DV or LR axes), we found midplane optical sections to best capture the entirety of the axis and thus the full extent of marker patterns (where those patterns are not asymmetric along the DV axis). All cells positive for immunostained markers are thus to be interpreted as lying within the midplane of the Gastruloid, and as such internal to the Gastruloid if falling internally to the midplane, and external to the Gastruloid when falling towards the edges. As suggested by the reviewer, we will acquire optical sections along the z-stack of our immunostained samples and provide them as supplementary information. Maximum intensity projections were only shown when wanting to better show the 3D structure of the internal epithelial primordium, and were always depth-coded to aid visualisation (Figure 4A, rightmost panel, Figure 5C).

      - Are prior studies referenced appropriately?

      Yes, except for the study on endoderm formation by lineage tracing in vivo, high-resolution single-cell analytics and functional analysis of genetic mutant embryos.

      We thank the reviewer for pointing out inappropriate referencing of studies on these three topics. Yet, given the breadth of each of these topics and the absence of any more specific information, we remain unsure how to address this comment appropriately. We would thus like to warn readers that this comment might have remained unaddressed.

      Results may be presented with reference to the data figures in the appropriate sequential order, or the figures may be re-organised to match the presentation in the Results.

      We thank the reviewer for bringing this to our attention. The figures are automatically organised in the order they are presented in the Results, yet since most of the figures are page-long their placement may appear odd. We will look into the matter and readjust figure positioning where possible, and/or reduce mentions of data shown in previous figures.

      Reduce the verbosity throughout the manuscript, especially the Results and Discussion.

      As this point is raised by both reviewers, we will make sure to go back over the entirety of the manuscript and make the necessary adjustments, especially in the Results section. We would however like to point out that the unusual length of our Introduction section is to be understood as a deliberate departure from the limits and conventions prescribed by traditional publishing formats. In line with our intentional choice of a journal-independent publication platform (i.e. a preprint server), and of a journal-independent review process, we also presented and contextualized our research in a journal-independent format. We see this as an opportunity to present research in a voice truer to that of the researchers who carried it out. We thank both reviewers for their feedback on the format and will certainly still make sure to minimize redundancy of information throughout our manuscript.

      This study on endoderm development, however, is confounded by the inherent limitation of the experimental model: lack of extraembryonic tissue components, the atypical morphological structure and the deviation from the in vivo schedule of development and morphogenesis. This may raise doubt of the relevance of the findings to the

      We find it difficult to share the view expressed by the reviewer here. The “lack of extraembryonic tissue components, the atypical morphological structure, and the deviation from the in vivo schedule of development and morphogenesis”, which they identify here as the inherent limitations of the model, in fact represent its value for us and for most in the Gastruloid field. For a more complete and articulated discussion of how it is precisely the differences with the embryo proper that provide insights when studying in vitro models of embryonic development, we refer interested readers to dedicated discussions on the topic [van den Brink, Susanne C., and Alexander van Oudenaarden. "3D gastruloids: a novel frontier in stem cell-based in vitro modeling of mammalian gastrulation." (2021); Simunovic, Mijo, and Ali H. Brivanlou. "Embryoids, organoids and gastruloids: new approaches to understanding embryogenesis." (2017); Turner, David A., Peter Baillie‐Johnson, and Alfonso Martinez Arias. "Organoids and the genetically encoded self‐assembly of embryonic stem cells." (2016)]. In fact, we share the reviewer’s concern about the relevance of these findings to “understanding of the morphogenetic activity and molecular control of endoderm formation during gastrulation in the embryo”. The constant questioning of such relevance is a central point in Gastruloid research, where insight comes from a dialectic comparison between what is observed in vitro and what is known to happen in vivo. The relevance of our findings might be better framed in that they provide a better “understanding of the morphogenetic activity and molecular control of endoderm formation” tout court: in this case, outside of the embryo (and inside a self-organising developmental system). Reflections on this in turn inform a better understanding of how endoderm might be developing in vivo.

      Our findings in the Gastruloid open lines of inquiry to be verified and tested in the embryo. Both similarities and differences are equally informative on the intrinsic and extrinsic elements of endoderm behaviour. In fact, where some of the aspects we describe have been investigated in the embryo proper (see [Scheibner, Katharina, et al. "Epithelial cell plasticity drives endoderm formation during gastrulation." (2021)]), many of the same themes have emerged.

      Knowledge gleaned from the present study on the gastruloid study added little to that of a recent study of the morphogenetic program of endoderm formation in the mouse embryo and the ESC differentiation model (https://doi.org/10.1038/s41556-021-00694-x ) .

      The reviewer is absolutely correct in mentioning [Scheibner, Katharina, et al. "Epithelial cell plasticity drives endoderm formation during gastrulation." (2021)] to the readers of this public review. Given that these results were published after our preprint, we could only reference them in our later versions, and we did this extensively throughout the text. We consider the paper mentioned by the reviewer to represent a major contribution to the topic of (mouse) endoderm development, specifically in its investigation of endoderm EMT mechanisms as they take place within the mouse embryo itself. In relationship to what we describe here in Gastruloids, we see what is reported by Scheibner et al. as extremely validating and as a very strong example of the investigative validity of in vitro models of development. Here is an in vivo exploration of the same topic highlighting many of the endoderm features that we had inferred from in vitro observations, or that our observations further supported (specifically, incomplete EMT, tight association with epithelioid character, low evidence for mesendodermal intermediates, etc.).

      To say that our study added very little to what is now available in [Scheibner, Katharina, et al. "Epithelial cell plasticity drives endoderm formation during gastrulation." (2021)], however, rests on an inaccurate view of our study as being a study of endoderm development in vivo (see also the answer to the previous point). We would also like to point out that a major portion of this preprint describes the self-organisation of endoderm in vitro, the emergence and development of almost all AP-endoderm identities by self-organisation, the effective spatial patterning of at least some of these (awaiting further characterisation), and an accessible, tractable, reproducible in vitro model system to study endoderm development and provide populations of interest for culture. Our study also provides further insight into the necessary inputs to endoderm development and patterning, and into whether extracellular matrices and extraembryonic tissues are among such necessary inputs. Sharing the view expressed by the other reviewer, we see great insight in all of these aspects, and these are certainly not the topic or focus of the study referenced by this reviewer.

      As we do throughout the preprint, we strongly encourage readers interested in the topic to refer to [Scheibner, Katharina, et al. "Epithelial cell plasticity drives endoderm formation during gastrulation." (2021)] for insights coming from the embryo proper. We also take the opportunity to stress the value of all types of science, be it incremental, consolidating, or complementary.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      • Are the key conclusions convincing?

      The following conclusions were not warranted by the findings: Endoderm emergence can occur in the absence of extraembryonic tissues and embryonic architecture: It is unclear if extraembryonic tissues (e.g., primitive endoderm and visceral endoderm-like cells) are absent in the early phase of gastruloid development. While the gastruloid does not replicate the morphological feature of the post-gastrula embryo, it nevertheless has a certain degree of tissue organization. Perhaps the emergence of DE-like cells in 2-D culture would be a more convincing model for "the absence of extraembryonic tissues and embryonic architecture".

      The FoxA2+/Sox17+ endoderm progenitors never transitioning through the mesenchymal intermediates and never leaving the epithelial compartment that they arise: Given that the stereotypic morphogenetic activity was not documented during the development of the gastruloid, it is not possible to exclude the possibility that the progenitors undergo a partial EMT (loss of epithelial feature and cellular polarity and display of morphogenetic movement, as in vivo) in the transition from progenitor to the epithelial endoderm cells. The DE-like cells when first discerned in the gastruloid are apparently epithelialized. In the absence of lineage tracing results, it is not clear whether they are still residing in the "epithelial compartment that they arise".

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The mature endoderm cells are patterned segmentally in the gastruloid. The findings that the molecular phenotype (marker expression) of the mature endoderm cells "aligns with (cellular) identities along the entire length of the embryonic gut tube" are not sufficient evidence of spatial A-P patterning of endoderm cells. Only the spatial regionalization of Pax6-expressing cells (Fig. 8) and Cdx2-expressing cells (Fig 4C) were shown on different gastruloid specimens. The expression pattern of Foxa2/Cdh1 (Fig 5d) was not informative of tissue patterning. It is also unclear if a structure reminiscent of the embryonic gut (closed or partly open) was formed (or self-organised) in the gastruloid. Whether endoderm cells are patterned or not is, however, irrelevant for the understanding of the mode of endoderm formation, unless the timing and the mechanism of allocation of endoderm cells of specific segmental property has been studied in the gastruloid.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Specific points:

      1. The information regarding the spatial localization of specific germ layer markers in the gastruloids at different timepoints would be important to understand how the morphology progresses and how it is comparable to the developing embryo itself. How is the organisation of the mesoderm and endoderm layers in comparison to embryo in the early timepoints and later timepoints of gastruloids?
      2. Clarify if Foxa2 and Sox17 double positive cells exist in the Cdh1 patches (Fig 3a). In Fig 4, authors have demonstrated the development of epithelial primordium with overlaying mesodermal wings, however it is important to show if Foxa2, Sox17, or other definitive endoderm markers co-express in these cells.
      3. It was suggested that E-Cadherin is maintained during endoderm differentiation. N-cadherin expression may be examined to determine if N-cad is expressed in the other region of gastruloids.
      4. In Fig 6, FACS quantification is not proportional to the expression of the TBra:GFP as shown in the microscopic images at 96 hr, 120 hr. Fig 6D does not show the TBra:GFP positive cells on the y-axis in the top-left quadrant, even though it is quite visible in microscopy at 96, 120 hr. Microscopic images suggest TBra signal is almost completely lost at 128 hr whereas FACS does not represent that. In fact, at 120 hours, the plot shows the opposite of what microscopy shows.
      5. Gastruloids were sampled at 96-168 hours for single cell transcriptome analysis. However, the specimens documented in this study were those only up to 144 hours. How does the gastruloid morphology look at 168 hours? It is essential to show the morphology and characterise the further development from 144 to 168 hours, to compare the single cell RNA seq data with the morphology of the gastruloid.
      6. In Fig 7, it is surprising to see that the proportion of cells in the two clusters 13, 4 that mark endoderm are a minor portion of the whole dataset collected, whereas the microscopic images suggest that the majority of the gastruloid structure from 120hr onwards is marked by Foxa2 and shows the epithelial primordium morphology as claimed.
      7. The single-cell RNA-seq data should be analysed for the co-expression of multiple segment-specific cell markers to ensure that the mature endoderm cells align with high-confidence with the known cell types in different segments of the embryonic gut, and that the localization of representative cell types can be validated spatially along an endoderm structure in single gastruloids.
      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The suggested experiments may be accomplished in a few months.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes for the methods. It is not clearly indicated how many replicates were performed to assure consistency/reproducibility of the gastruloid results.

      • Are the experiments adequately replicated and statistical analysis adequate?

      Statistical results were not provided for most of the immunostaining experiments, either in the main text or in the figure legends.

      Minor comments:

      • Specific experimental issues that are easily addressable.

      Majority of the images presented in the manuscript are shown as Maximum Intensity Projections, and it is not clearly stated if the localisation of the cells expressing specific protein markers are present on the surface or in the internal layers of the gastruloids. Optical slices of the gastruloid images may be presented as supplementary information.

      • Are prior studies referenced appropriately?

      Yes, except for the study on endoderm formation by lineage tracing in vivo, high-resolution single-cell analytics and functional analysis of genetic mutant embryos.

      • Are the text and figures clear and accurate?

      Results may be presented with reference to the data figures in the appropriate sequential order, or the figures may be re-organised to match the presentation in the Results.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      While taking into consideration the limitation of this embryo model for studying morphogenesis, highlight the interesting/unique findings of the gastruloid study, albeit they may have been discovered in the study of embryos in vivo.

      Reduce the verbosity throughout the manuscript, especially the Results and Discussion.

      Significance

      • Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field.

      This study demonstrated that definitive endoderm (DE)-like cells can be generated in the stem cell-derived embryo-like structure, the gastruloid. This observation is significant in that gastruloids can serve as an amenable experimental model, in comparison to other 2D in vitro differentiation models, for elucidating the requisite cellular process in endoderm differentiation and the acquisition of cell identity. It was inferred that, in the gastruloid, epithelial-mesenchyme transition may not be a requisite cellular process for the formation of the DE-like cells and that endoderm formation may not involve progression through an intermediate mesenchymal state. The progenitor cells may have acquired and retained the attribute of epithelization during differentiation and organization of the DE-like cells into an endoderm layer.

      This study on endoderm development, however, is confounded by the inherent limitation of the experimental model: lack of extraembryonic tissue components, the atypical morphological structure and the deviation from the in vivo schedule of development and morphogenesis. This may raise doubt of the relevance of the findings to the understanding of the morphogenetic activity and molecular control of endoderm formation during gastrulation in the embryo.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Knowledge gleaned from the present study on the gastruloid study added little to that of a recent study of the morphogenetic program of endoderm formation in the mouse embryo and the ESC differentiation model (https://doi.org/10.1038/s41556-021-00694-x ) . This recent study has advanced the understanding of the functional attributes that segregate the endoderm progenitors and mesoderm progenitors in the primitive streak and the posterior epiblast, and has characterised the role of epithelial plasticity and the modulation of EMT activity under the control of Forkhead box transcription factor A2 and modulation of WNT signalling in the formation of the definitive endoderm.

      • State what audience might be interested in and influenced by the reported findings.

      Developmental biologists, stem cell scientists and embryo modellers

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Mouse embryogenesis, gastrulation, in vitro stem cell differentiation, advanced microscopy, single cell transcriptomics.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The study addresses a long-standing question in mouse gastrulation: the existence of a mesendodermal progenitor, similar to other species. Recent data in the mouse point towards a direct transition from epiblast to endoderm without going through a bipotent progenitor, and without loss of epithelial characteristics (notably Probst 2021). The authors take advantage of the gastruloid system to follow the emergence of endoderm cells. Through co-staining with markers of various germ layers at different timepoints, live imaging, and reanalysis of single cell RNASeq data, they propose a model in which endoderm cells differentiate from E-cadherin positive cells without going through a bipotent stage. Interestingly, they then organise a rod-like structure along the anterior-posterior axis of the gastruloid that displays some polarity, illustrated by a marker of anterior gut.

      The data presented show very high reproducibility, and the authors fully exploit the organoid system by analysing a large number of samples and showing very similar results. The results are qualitatively very convincing. For the sake of clarity, it may help to provide a table illustrating the proportion of gastruloids behaving precisely as the best example shown. It would also strengthen the message even more to provide some quantitation of co-expression for the main markers. As the behaviour seems very consistent, it is likely that such quantification would not be very arduous, and it would show the strength of the model.

      The manuscript is pleasantly written and all statements are clearly explained. It is a bit long though, in particular the introduction, some of which might read more like a review of the field. It is less striking in the Results part, but there is some level of repetition that is a bit distracting.

      Significance

      The question is important and timely, and the data are clear and convincing. There have been a number of publications addressing it in the last years, either in the embryo or in gastruloids (notably Hashmi 2021). All data appear to point in the same direction, and this study is certainly an important contribution. An original aspect to my knowledge is the self organisation of the central rod. The system is simple, reproducible, and opens novel possibilities to dispose of a large number of cells to explore the emergence of endoderm subpopulations.

      The fact that several studies converge is rather an advantage as they all have specificities, and I believe they are adequately cited here.

      In summary this is an important and well conducted study that may just benefit from some additional quantification to prove robustness. In terms of writing, it is pleasant and quite literary, but perhaps a little bit too much so. The introduction could be shortened, and the results a bit more to the point.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their thoughtful comments and efforts towards improving our manuscript. Based on the reports, we have revised the manuscript entitled “Bi-phasic effect of gelatin in myogenesis and skeletal muscle regeneration” (RC-2021-00854). We have addressed all the concerns and revised the text and figures. Modified parts are in red, and line numbers are tracked in this response letter. Our detailed point-by-point responses to the reviewers’ comments are listed below. We believe that these changes strengthen the manuscript and are grateful to the reviewers for all of their suggestions.

      Point-by-point description of the revisions

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The manuscript by Xiao Ling Liu and colleagues titled "Bi-phasic effect of gelatin in myogenesis and skeletal muscle regeneration" deals with the effect of gelatin on differentiation of myoblast cell line, in vitro, and on skeletal muscle regeneration upon muscle injury, in vivo. In vivo, the gelatin is a product of collagen breakdown associated with skeletal muscle regeneration upon acute or chronic muscle damage.

      Specifically, the authors define a dose-dependent effect of gelatin, beneficial at low dose and detrimental at high dose. This effect is mediated by the level of ROS accumulation leading to the induction of different cytokines with opposite effects on skeletal muscle regeneration.

      **Major comments:**

      The experimental purpose is well tackled from both biochemical and functional point of view, and the proposed experiments are quite exhaustive.

      Response: We are grateful for the encouraging comments from this reviewer.

      However, I would suggest some additional experimental analyses to improve the robustness and quality of the study, as well as text and figure editing, as reported below.

      Regarding the additional experiments/analyses/images:

      • Figure 5: I would suggest to add an image of C2C12 cells in GM (growth medium), as representative images of proliferation analysis upon LCG/HCG/NAC treatment.

      Response: We appreciate this suggestion by the reviewer. New images of C2C12 cells upon LCG/HCG/NAC treatment have now been added as Supplementary Figure 4B. The new results are described in main text Lines 237-239.

      • Figure 5: I would suggest to repeat the main si-NOX2 experiments with an alternative siRNA to rule out off target effects.

      Response: We thank the reviewer for this suggestion. We have now added a new siRNA targeting NOX2 with an independent sequence and shown the results in Figure 5F-J and Supplementary Figure 4H-J. The sequences of si-NOX2 and si-NC are shown in Materials and Methods Lines 681-687. The newly added siRNA confirmed the results obtained with the previous siRNA, suggesting that off-target effects are unlikely. This result is described in the main text Lines 246-256.

      • In vivo experiments could be improved by adding DHE or DCFH staining on muscle TA cryosections to quantify the level of oxidative stress.

      Response: We appreciate this suggestion by the reviewer. We have now stained DCFH-DA in TA cryosections to quantify oxidative status in situ and show the results in Supplementary Figure 8A-B. Indeed, low- and high-dose gelatin injections both triggered ROS production, and high-dose injection resulted in a high accumulation of ROS. The new result is described in the main text Lines 318-321.

      • The proposed model could be better tackled by additional in vivo treatment with Ab anti IL-6 or anti TNFalpha in combination with CTX and LCG or HCG, followed by H/E staining at 14 dpi.

      Response: We appreciate this suggestion by the reviewer. We have now added in vivo treatment with IL-6 or TNFα neutralizing antibody (Ab) in combination with CTX and LCG or HCG. The procedure is illustrated in Figure 8J and described in main text Lines 522-525. H&E staining showed that anti-IL-6 Ab injection significantly reduced the beneficial effect of LCG, but had no effect on HCG-treated mice. By contrast, anti-TNFα Ab injection significantly suppressed infiltration of macrophages into the injury site upon HCG and reversed the deleterious effect of HCG on muscle repair, resulting in myofibers with higher CSA and more myofibers with central nuclei. The new results are shown in Figure 8J-K and described in main text Lines 331-346.

      **Minor comments:**

      Each acronym should be indicated in full in the main text at the first mention (for instance BHP, NAC and others). Moreover, I would suggest to add an acronym list for reagents and factors.

      Response: We thank the reviewer for this suggestion and have now added an acronym list in Material and Methods, Lines 466-504.

      Experimental methods should be better detailed; for instance I would suggest:

      • Add a detailed description of the quantification of differentiation indexes

      Response: We thank the reviewer for this suggestion. We have now added a detailed description of the quantification method of myogenesis and differentiation to Material and Methods, Lines 577-582.

      • Explain how cell growth (OD 450nm) and optical density (570nm) assays have been performed.

      Response: We thank the reviewer for pointing this out. The cell growth was examined by measuring dehydrogenase activities that generate a soluble formazan dye, whose OD 450nm value was measured as a proportional value to the number of viable cells in the sample (CCK-8 kits according to manufacturer’s instructions).

      The transwell cell migration ratio was determined by measuring the optical density of crystal violet staining (570 nm) of migrated cells on the bottom of the transwell filters (transwell filters have 8 μm pores to allow cells to pass through). The non-migrated cells on the top of the transwell filter were scraped away.

      Detailed descriptions have now been added to Material and Methods, Lines 593-609.

      • Explain how ROS species and antioxidant enzymes have been measured (Fig 4C and 4D)

      Response: We thank the reviewer for pointing this out. The levels of ROS species (O2·−, OH· and H2O2) and antioxidant enzymes in Figure 4C and 4D were examined using commercial kits according to the manufacturer’s instructions (Nanjing Jiancheng Bioengineering Institute, Nanjing, China). Briefly, cells were lysed in RIPA lysis buffer and protein concentrations were determined using the BCA method. The O2·− level and SOD activity were measured by adding electron transfer substances that reduce azo blue tetrazole to blue methionine. The activity of SOD was evaluated by the absorption of methionine. GSH-PX facilitates the reaction between H2O2 and GSH to produce H2O and oxidized glutathione (GSSG). The activity of GSH-PX can thus be obtained by measuring the consumption of GSH in this enzymatic reaction. OH· was measured using the Fenton reaction. The level of H2O2 was determined according to the reaction with molybdic acid. We have now added the corresponding information to the Figure 4 legend and main text Lines 1066-1067. Detailed protocols can be found in Material and Methods, Lines 704-714.

      Figures and figure legends:

      • Please add in the figures the figure number in order to facilitate the reading of the pdf file

      Response: We thank the reviewer for this suggestion and have added figure numbers to the PDF file.

      • The sequence of the panels should be coherent with the alphabet and reading left to right and up to down

      Response: Yes, we have rearranged the figure layouts to be coherent.

      • In the Figure 1, I would suggest to add the whole TA sections for H/E staining in order to appreciate the overall beneficial or detrimental effect of LCG and HCG, respectively.

      Response: We thank the reviewer for this suggestion and have added the whole-section view of TA with H&E staining as in Supplementary Figure 1C. The results are described in the main text Lines 132-136.

      • I would suggest to show Supplementary figures 1C and 1D in the Figure 1.

      Response: We thank the reviewer for this suggestion and have now moved Supplementary Figures 1C and 1D to Figure 1F and 1G.

      • In the Supplementary Fig 2A, I would suggest to the authors to show and comment only the data about proliferation: the earlier orientation, fusion and differentiation of C2C12 exposed to LCG are a consequence of the positive effect of LCG on proliferation

      Response: We thank the reviewer for this suggestion and have removed data and comments except for proliferation of C2C12 in Supplementary Figure 2A.

      • The quality of representative images of western blot is not always high: the bands are fuse and, consequently, the quantification is not reliable. For instance Fig S4 B, S4 G; in Fig 2 the representative image does not really represent the reported quantification.

      Response: We thank the reviewer for this suggestion and have replaced western blot images in Supplementary Figure 4B, 4G and Figure 2.

      • Please specify in the figure legends the meaning of the acronym in the axis title (for instance RFU, MFI or DCF) and in the axis title the unit of measure (for instance Count (?)).

      Response: We thank the reviewer for this suggestion and have spelt out “RFU” as “Relative fluorescence intensity”, changed “MFI” to “Relative fluorescence intensity of NOX2” in Figure 4H and Figure 8H. The unit of measurement is the fluorescence intensity. The related modifications have been described in the legend of Figure 4H and Figure 8H, Lines 1071, 1125-1126. We have now included “DCF: 2',7'-dichlorofluorescein” in the acronym list in Material and Methods, Line 473. DCF positive ratio is now used to reflect ROS level in the population. We have explained “Count” in the legend as “Cell counts”.

      • The authors always wrote "filed" in place of "field".

      Response: We apologize for this typo and have corrected it with others throughout the text.

      • In the figures 7 and 8 the letters for densitometry panel are missed.

      Response: We could not identify any missing labels or letters in the densitometry panels. We suspect this issue could be caused by the software used to open the documents.

      • In the figure legend 8 the panel letters do not match the panels in the figure.

      Response: We thank the reviewer for pointing this mistake out and have corrected Figure legend 8.

      • Figure 3I: replace "nucleis" with "nuclei".

      Response: We thank the reviewer for pointing out this mistake and have modified Figure 3I.

      Text editing:

      • Line 64: Satellite cells (SCs) would be more appropriate than myoblasts

      Response: We thank the reviewer for this suggestion and have replaced myoblasts with satellite cells in the main text Line 60.

      • Line 64: Please define more carefully the location of SCs

      Response: We thank the reviewer for this suggestion. SCs reside between the basal lamina and the myofiber plasma membrane in resting skeletal muscle. This is now described in the main text Lines 60-65.

      • Line 67: MyoD+/Myog+ would be more appropriate than Pax7+/Myog+

      Response: We thank the reviewer for this suggestion and have changed Pax7+/MyoG+ into MyoD+/MyoG+ in the main text Line 64.

      • Line 67: (Pax7+)/Myog+ "myocytes" in place of myotubes ... and fuse with each other to generate myotubes

      Response: We thank the reviewer for pointing this out and have modified the sentence in Lines 64-65.

      • Line 69: Please add Myf6/MRF4 to MRFs list

      Response: We thank the reviewer for pointing this out and have added Myf6/MRF4 to the MRFs list in the main text Line 66.

      • Line 112: replace "its" with "their"

      Response: We thank the reviewer for pointing out this mistake and have replaced “its” with “their”.

      • Lines 330-331: "promoted IL-6" "enhanced TNF", please insert secretion/production

      Response: We thank the reviewer for this suggestion and have inserted “production and secretion” after IL-6 and TNFα in the main text Lines 310-312.

      • Line 352-353: this sentence is not necessary

      Response: We have deleted this sentence.

      Reviewer #2 (Significance (Required)):

      The study reports robust and interesting data applicable to both basic research and translational research, such as tissue engineering applications.

      Response: We thank the reviewer for sharing this positive and important opinion on our work. We are delineating the biological pathways of gelatin treatments, motivated by the application of this biocompatible and industrial material for treating disease and aging-related skeletal muscular dystrophies.

      Keywords for field of expertise of this reviewer:

      Skeletal muscle regeneration

      Duchenne Muscular Dystrophy

      Inflammation

      Macrophages

      Oxidative stress

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The Review Commons submission by Liu and colleagues entitled "Biphasic effect of gelatin in myogenesis and skeletal muscle regeneration" provides a systematic in vitro and in vivo evaluation of the effects of myogenic cell exposure to low or high dose gelatin. Through these analyses they uncover pro- and anti-regenerative effects of gelatin that are dose dependent and through a series of cell and molecular studies, they attribute these dual effects to a ROS-IL6/TNFa signaling axis. The study is well designed and executed. For the most part the figures and experimental details are clear and transparent, and most of the conclusions are supported by the data. Specific comments follow.

      Response: We are grateful for the reviewer’s encouraging comments.

      **Major Comments**

      • Study framing in the Introduction is mis-aligned with the research conducted. Currently the Introduction sets the stage for an in vivo exploration of the effects of endogenous gelatin produced in the course of muscle regeneration. However, there are no experiments in this paper investigating the presence or effects of endogenous gelatin.

      Response: We thank the reviewer for pointing this out and apologize for the misleading parts in our abstract and introduction. The reported phenomenon of a temporary breakdown of collagen after skeletal muscle injury has inspired us to apply gelatin for achieving a pro-regenerative effect. Although it will be a very interesting biological pathway to study, no adequate research tools are available for measuring endogenous gelatin in vivo. We have re-written parts of the abstract and introduction to avoid confusion, see Lines 34-35, 78-93.

      1.a. The impact of the study would indeed be increased by including a systematic characterization of endogenous gelatin levels during muscle regeneration in healthy mice as compared to in those where fibrosis is prevalent. A demonstration that ROS-IL6/TNFa levels align with patterns seen in the in vitro studies, and pharmacological manipulations to 'rescue' would all provide a demonstration of a hormesis gelatin response in vivo. Meaning, is this process something that naturally occurs in the physiological context, or is it one that is possible, but only by supraphysiological gelatin injections?

      Response: We agree with the reviewer and would have loved to investigate whether what we have observed is a fundamental mechanism of the natural healing process. However, we are currently limited by the lack of adequate tools for endogenous gelatin quantification. We appreciate the reviewer’s suggestion to compare normal and aberrant repair processes such as fibrosis and would like to explore this possibility in a separate future study.

      1.b. Alternatively to 1.a., the authors should reframe the Introduction to focus on understanding the effects of gelatin as a biomaterial that is being used in regenerative medicine applications. In this case, the authors should delete/edit/reframe lines 74-102 and instead use lines 103 on to motivate the study so as to be consistent.

      Response: We have rephrased sections of Abstract and Introduction to focus on gelatin as a biomaterial. Please find the changes in the main text Lines 34-35, 78-93.

      • Satellite cell conclusions in Figures 2C-D that are based upon representative images provided in 2A, are questionable. Pax7 staining in mouse tissue sections is notoriously difficult and the antibodies can have dramatic lot to lot variability. The immunostaining provided in the representative images is not convincing, and hence, draws into question the conclusions based upon them.

      Response: We thank the reviewer for the suggestion and have optimized Pax7 staining based on the published protocol by Feng et al., 2018 in JoVE. The new images can be found in Figure 2A.

      2.a. If the authors wish to leave the satellite cell conclusions in their study, they will need to optimize their Pax7 staining and repeat this study. They should focus on Pax7+ objects that contain a nucleus and are located below the basal lamina. Also, the word 'activation' in line 180 should be edited to 'expansion' as the histological analysis and study design preclude an evaluation of satellite cell activation.

      Response: After optimizing the staining protocol, our new results show that low-dose gelatin injection increased the number of Pax7+ cells (green) underneath the basal lamina (red) in injured TA muscle, whereas high-dose gelatin injection reduced the number of Pax7+ SCs at 7 D.P.I. The new results are shown in Figure 2A and described in the main text Lines 152-155.

      We have changed the word “activation” into “expansion” according to reviewer’s suggestion in the main text Line 161.

      2.b. Alternatively to 2.a., it would not diminish the impact of this study to remove the 'satellite cell' findings in their entirety from the manuscript.

      Response: Please see above. We hope the new images are convincing to this reviewer.

      • It is surprising that the molecular hallmarks of low vs high gelatin injection shown in Fig. 8 would still be present at a time point 2-weeks after the initial injection.

      Response: Please see below 3.b.

      3.a. It would increase the impact of the study to better understand the basis of this surprising observation. This point links to point 1.a. as one would ideally need to quantify baseline gelatin levels pre-injury and post-injury. For example, is the injected gelatin still present 14-days after injection? Or is the MMP profile altered in a way that sustains these levels one direction or the other? Etc.

      Response: Please see below 3.b.

      3.b. Alternatively to 3.a., the authors should use the Discussion to note this point and speculate on the significance.

      Response: We thank the reviewer for making this important point. The sustained effect of gelatin materials has been reported in several previous studies, and we have now discussed possible mechanisms such as MMP expression profiles and lasting interplay between SCs, macrophages, ECM, and myoblasts. See the main text Lines 437-459.

      **Minor Comments**

      • The authors should conduct a careful review of the manuscript to address minor typos and grammatical errors.

      Response: We thank the reviewer for pointing this out and have carefully reviewed the manuscript and corrected minor typos and grammatical errors.

      • It is unclear from the Figure Legends, Results, or Methods what the 'PBS' condition in Figure 1 refers to. Is this the uninjured control? If so, consider using 'Cntrl' as the label and then defining it in the figure legend for clarity.

      Response: We thank the reviewer for pointing this out. Yes, PBS/phosphate-buffered saline is the vehicle for delivering CTX thus representing the uninjured model. In the revised manuscript, we have replaced “PBS: phosphate-buffered saline” with “Ctrl” in Figures, Figure Legends, Results and Methods.

      • It is unclear from the Figure Legends, Results, or Methods what is quantified to read-out the 'myogenesis index' and 'fusion index' that is reported in Fig. 5D & E. Please reconcile.

      Response: This point has been addressed in the previous section. Myotube formation was quantified using myogenesis and fusion index. Myogenesis index (% nuclei within MyHC-stained myocytes/total nuclei) and fusion index (% nuclei in myotubes with >5 nuclei/total nuclei in MyHC-stained cells) are now explained both in legends and in Materials and Methods. Please see the main text Lines 577-582.
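
      As an illustration only, the two definitions quoted above can be written out as a short calculation. This is a minimal sketch and not the authors' analysis pipeline: the `FieldCounts` container, the function names, and the example counts are hypothetical, and it assumes that nuclei have already been counted per MyHC-stained cell in each imaged field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldCounts:
    """Hypothetical per-field counts from one imaged field (not from the manuscript)."""
    total_nuclei: int                # all nuclei in the field
    nuclei_per_myhc_cell: List[int]  # nuclei counted in each MyHC-stained cell


def myogenesis_index(field: FieldCounts) -> float:
    """% of nuclei within MyHC-stained cells relative to total nuclei."""
    if field.total_nuclei == 0:
        return 0.0
    return 100.0 * sum(field.nuclei_per_myhc_cell) / field.total_nuclei


def fusion_index(field: FieldCounts, min_nuclei: int = 5) -> float:
    """% of nuclei in myotubes with more than `min_nuclei` nuclei,
    relative to all nuclei in MyHC-stained cells."""
    myhc_nuclei = sum(field.nuclei_per_myhc_cell)
    if myhc_nuclei == 0:
        return 0.0
    fused = sum(n for n in field.nuclei_per_myhc_cell if n > min_nuclei)
    return 100.0 * fused / myhc_nuclei


if __name__ == "__main__":
    # Made-up counts, purely to show the arithmetic.
    example = FieldCounts(total_nuclei=200, nuclei_per_myhc_cell=[1, 2, 6, 8, 3])
    print(myogenesis_index(example))  # 20 of 200 nuclei are in MyHC-stained cells
    print(fusion_index(example))      # 14 of those 20 nuclei are in myotubes with >5 nuclei
```

      The threshold is kept as a parameter because the ">5 nuclei" cut-off is the only value stated in the response; any other choice would be an assumption.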

      Reviewer #3 (Significance (Required)):

      The manuscript constitutes a technical advance, and offers a molecular mechanism, in support of the notion that intramuscular injection of low dose gelatin serves to expedite the process of skeletal muscle regeneration. This study has translational implications for a regenerative medicine application of this knowledge. It is my opinion that this aspect of the study is well supported by the results and requires <1 month of additional edits to finalize the manuscript.

      Response: We thank the reviewer for sharing this positive and important comment. Indeed, the motivation of this work was to delineate biological pathways of gelatin treatments in myogenesis and muscle regeneration for potential therapeutic applications.

      The manuscript would constitute both a technical and a conceptual advance by addressing Major Point 1a as the authors would show, for the first time, that low and high dose gelatin levels naturally exist in vivo to mediate the process of muscle endogenous repair. This would be highly significant, because as the authors rightly point out in Lines 74-76 of their manuscript "...by what mechanisms ECM regulates the functional, morphological, and molecular events of skeletal muscle regeneration remain poorly understood". It is my opinion that making this latter point would require 1-3 months of additional studies.

      Response: We thank the reviewer for pointing to this important scientific direction. Regrettably, the lack of adequate tools for quantifying endogenous gelatin in vivo is currently prohibitive.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The Review Commons submission by Liu and colleagues entitled "Biphasic effect of gelatin in myogenesis and skeletal muscle regeneration" provides a systematic in vitro and in vivo evaluation of the effects of myogenic cell exposure to low or high dose gelatin. Through these analyses they uncover pro- and anti-regenerative effects of gelatin that are dose dependent and through a series of cell and molecular studies, they attribute these dual effects to a ROS-IL6/TNFa signaling axis. The study is well designed and executed. For the most part the figures and experimental details are clear and transparent, and most of the conclusions are supported by the data. Specific comments follow.

      Major Comments

      1. Study framing in the Introduction is mis-aligned with the research conducted. Currently the Introduction sets the stage for an in vivo exploration of the effects of endogenous gelatin produced in the course of muscle regeneration. However, there are no experiments in this paper investigating the presence or effects of endogenous gelatin.

      1.a. The impact of the study would indeed be increased by including a systematic characterization of endogenous gelatin levels during muscle regeneration in healthy mice as compared to in those where fibrosis is prevalent. A demonstration that ROS-IL6/TNFa levels align with patterns seen in the in vitro studies, and pharmacological manipulations to 'rescue' would all provide a demonstration of a hormesis gelatin response in vivo. Meaning, is this process something that naturally occurs in the physiological context, or is it one that is possible, but only by supraphysiological gelatin injections?

      1.b. Alternatively to 1.a., the authors should reframe the Introduction to focus on understanding the effects of gelatin as a biomaterial that is being used in regenerative medicine applications. In this case, the authors should delete/edit/reframe lines 74-102 and instead use lines 103 on to motivate the study so as to be consistent.

      2. Satellite cell conclusions in Figures 2C-D that are based upon representative images provided in 2A, are questionable. Pax7 staining in mouse tissue sections is notoriously difficult and the antibodies can have dramatic lot to lot variability. The immunostaining provided in the representative images is not convincing, and hence, draws into question the conclusions based upon them.

      2.a. If the authors wish to leave the satellite cell conclusions in their study, they will need to optimize their Pax7 staining and repeat this study. They should focus on Pax7+ objects that contain a nucleus and are located below the basal lamina. Also, the word 'activation' in line 180 should be edited to 'expansion' as the histological analysis and study design preclude an evaluation of satellite cell activation.

      2.b. Alternatively to 2.a., it would not diminish the impact of this study to remove the 'satellite cell' findings in their entirety from the manuscript.

      3. It is surprising that the molecular hallmarks of low vs high gelatin injection shown in Fig. 8 would still be present at a time point 2-weeks after the initial injection.

      3.a. It would increase the impact of the study to better understand the basis of this surprising observation. This point links to point 1.a. as one would ideally need to quantify baseline gelatin levels pre-injury and post-injury. For example, is the injected gelatin still present 14-days after injection? Or is the MMP profile altered in a way that sustains these levels one direction or the other? Etc.

      3.b. Alternatively to 3.a., the authors should use the Discussion to note this point and speculate on the significance.

      Minor Comments

      1. The authors should conduct a careful review of the manuscript to address minor typos and grammatical errors.
      2. It is unclear from the Figure Legends, Results, or Methods what the 'PBS' condition in Figure 1 refers to. Is this the uninjured control? If so, consider using 'Cntrl' as the label and then defining it in the figure legend for clarity.
      3. It is unclear from the Figure Legends, Results, or Methods what is quantified to read-out the 'myogenesis index' and 'fusion index' that is reported in Fig. 5D & E. Please reconcile.

      Significance

      The manuscript constitutes a technical advance, and offers a molecular mechanism, in support of the notion that intramuscular injection of low dose gelatin serves to expedite the process of skeletal muscle regeneration. This study has translational implications for a regenerative medicine application of this knowledge. It is my opinion that this aspect of the study is well supported by the results and requires <1 month of additional edits to finalize the manuscript.

      The manuscript would constitute both a technical and a conceptual advance by addressing Major Point 1a as the authors would show, for the first time, that low and high dose gelatin levels naturally exist in vivo to mediate the process of muscle endogenous repair. This would be highly significant, because as the authors rightly point out in Lines 74-76 of their manuscript "...by what mechanisms ECM regulates the functional, morphological, and molecular events of skeletal muscle regeneration remain poorly understood". It is my opinion that making this latter point would require 1-3 months of additional studies.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Xiao Ling Liu and colleagues titled "Bi-phasic effect of gelatin in myogenesis and skeletal muscle regeneration" deals with the effect of gelatin on differentiation of myoblast cell line, in vitro, and on skeletal muscle regeneration upon muscle injury, in vivo. In vivo, the gelatin is a product of collagen breakdown associated with skeletal muscle regeneration upon acute or chronic muscle damage.

      Specifically, the authors define a dose-dependent effect of gelatin, beneficial at low dose and detrimental at high dose. This effect is mediated by the level of ROS accumulation leading to the induction of different cytokines with opposite effects on skeletal muscle regeneration.

      Major comments:

      The experimental purpose is well tackled from both biochemical and functional point of view, and the proposed experiments are quite exhaustive. However, I would suggest some additional experimental analyses to improve the robustness and quality of the study, as well as text and figure editing, as reported below.

      Regarding the additional experiments/analyses/images:

      • Figure 5: I would suggest to add an image of C2C12 cells in GM (growth medium), as representative images of proliferation analysis upon LCG/HCG/NAC treatment.
      • Figure 5: I would suggest to repeat the main si-NOX2 experiments with an alternative siRNA to rule out off target effects.
      • In vivo experiments could be improved by adding DHE or DCFH staining on muscle TA cryosections to quantify the level of oxidative stress.
      • The proposed model could be better tackled by additional in vivo treatment with Ab anti IL-6 or anti TNFalpha in combination with CTX and LCG or HCG, followed by H/E staining at 14 dpi.

      Minor comments:

      Each acronym should be indicated in full in the main text at the first mention (for instance BHP, NAC and others). Moreover, I would suggest to add an acronym list for reagents and factors

      Experimental methods should be better detailed; for instance I would suggest:

      • Add a detailed description of the quantification of differentiation indexes
      • Explain how cell growth (OD 450nm) and optical density (570nm) assays have been performed
      • Explain how ROS species and antioxidant enzymes have been measured (Fig 4C and 4D)

      Figures and figure legends:

      • Please add in the figures the figure number in order to facilitate the reading of the pdf file
      • The sequence of the panels should be coherent with the alphabet and reading left to right and up to down
      • In the Figure 1, I would suggest to add the whole TA sections for H/E staining in order to appreciate the overall beneficial or detrimental effect of LCG and HCG, respectively.
      • I would suggest to show Supplementary figures 1C and 1D in the Figure 1
      • In the Supplementary Fig 2A, I would suggest to the authors to show and comment only the data about proliferation: the earlier orientation, fusion and differentiation of C2C12 exposed to LCG are a consequence of the positive effect of LCG on proliferation
      • The quality of representative images of western blot is not always high: the bands are fuse and, consequently, the quantification is not reliable. For instance Fig S4 B, S4 G; in Fig 2 the representative image does not really represent the reported quantification.
      • Please specify in the figure legends the meaning of the acronym in the axis title (for instance RFU, MFI or DCF) and in the axis title the unit of measure (for instance Count (?)).
      • The authors always wrote "filed" in place of "field"
      • In Figures 7 and 8 the letters for the densitometry panels are missing.
      • In the figure legend 8 the panel letters do not match the panels in the figure
      • Figure 3I: replace "nucleis" with "nuclei"

      Text editing:

      • Line 64: Satellite cells (SCs) would be more appropriate than myoblasts
      • Line 64: Please define more carefully the location of SCs
      • Line 67: MyoD+/Myog+ would be more appropriate than Pax7+/Myog+
      • Line 67: (Pax7+)/Myog+ "myocytes" in place of myotubes ... and fuse with each other to generate myotubes
      • Line 69: Please add Myf6/MRF4 to MRFs list
      • Line 112: replace "its" with "their"
      • Lines 330-331: "promoted IL-6" "enhanced TNF", please insert secretion/ production
      • Line 352-353: this sentence is not necessary

      Significance

      The study reports robust and interesting data applicable to both basic research and translational research, such as tissue engineering applications.

      Keywords for field of expertise of this reviewer:

      Skeletal muscle regeneration

      Duchenne Muscular Dystrophy

      Inflammation

      Macrophages

      Oxidative stress

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Firstly, we would like to thank the reviewers for their helpful and insightful comments.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the manuscript of Ramadan et al., the authors use the ex vivo organoid approach to compare gene expression in organoids derived from adult type stem cells when these organoids are grown using different matrices. The presence of Collagen type I induces the emergence of cells with a transcriptome similar to fetal progenitors. In contrast, laminin, the main component of Matrigel, induces an organoid-protruded phenotype with a stem cell-type transcriptome. Then, they correlate these data with expression of collagens and laminins from publicly available data. They show by qRT-PCR that laminins are more highly expressed in mesenchymal versus epithelial fractions postnatally. They hypothesize on this basis that the remodeling at the postnatal stage is likely only dependent on the mesenchymal compartment and that it involves interaction of laminins with integrin a6.

      It seems that some of the presented data have already been described and could not be considered as « novel ».

      For some of the statements, like this one « the basement-membrane produced by the epithelium is not sufficient to increase stem cell numbers and induce a morphological crypt formation », the conclusion is not supported by the experiments provided. To draw a definitive conclusion on this particular point, the authors could reproduce the experiment presented in Fig. 4d but using Cre recombinases specific for mesenchymal and epithelial compartments rather than the ubiquitous Cre line. It would be interesting to investigate whether organoids grown from lamc1-/- mice can generate protruded organoids or not.

      In addition, how should one interpret the fact that « fetal organoids up » is associated with « laminin interactions » in fig. 1c?

      The statement that the epithelium-produced basement membrane is not sufficient to increase stem cell numbers is based on our in vitro observations. Analysis of the RNAseq data shows that the expression of several laminins is increased on collagen (see heatmap of laminin interactions below, which will be added to the manuscript). This is also the reason why ‘laminin interactions’ is highly significant in the gene set enrichment analysis (Fig. 1C). Despite this upregulation, we never observed morphological changes (or expression changes) as when laminin is added to the collagen-hydrogel. In addition, we showed that the vast majority of ECM components is produced by the mesenchyme in vivo, in line with previous literature as cited in the manuscript. The mentioned Cre lines to address the question in vivo are unfortunately not available to our collaborators with the Lamc1 k.o. mice and it would therefore take too long to perform these experiments.

      However, to address this point in vitro we will grow organoids from Lamc1 fl/fl mice and induce loss of laminin in the pure epithelial cell culture. Organoids will then be analysed for morphological changes, as well as proliferation and gene expression changes.

      One major point to address regards statistics. In material and methods, the paragraph describing statistical analyses is missing. Moreover, in the figures presenting qPCR data (figs 1g, 3b, 3D, 3g, 4c and f), no statistical analysis is provided, and the number of samples for some conditions is extremely limited (n=2). In general, the term « independent experiment » should be clarified: does it correspond to one organoid line for which the experiment was repeated or one single experiment using different organoid lines?

      In fig 4c , all collagen conditions are set to 1.

      The avoidance of statistical inference for most of the experiments was a deliberate choice. In line with several commentaries (e.g. Vaux, D. L. (2012) Know when your numbers are significant. Nature. 492, 180–181), we chose to show all individual data points (with the exception of Fig. 3D, n=5, to ease interpretation) without statistics. In addition, for most expression data, we have data from RNAseq, single-cell RNAseq and qPCRs repeated at different hydrogel concentrations to obtain reliable results. Further, the in vivo mesenchymal qPCR expression data were validated with RNA in situ hybridization showing the mainly mesenchymal expression.

      The term independent experiment was used mainly for repeated experiments with the same organoid lines (the exception being the RNAseq data, which come from different organoids derived from individual mice). While conducting these experiments, we realised that the variability of these experiments comes from time in culture, density of cells and even Matrigel variation. The experiment in Fig. 4c (n=4, each time with all the controls) was performed with longer intervals in between, and showed variation in the absolute levels of expression. However, relative to each control we believe the effect is clear.

      As we will perform additional experiments for the revision of this paper, we will then perform statistical tests in the key experiments (e.g. Itga6 experiment) to alleviate any concerns regarding significance.

      Regarding the experiment presented in fig 4c, the authors should include additional control conditions: anti-a6 integrin antibody in Matrigel and use of an isotype antibody.

      We will conduct additional experiments regarding Itga6. In addition to including the mentioned controls for the neutralizing antibody, we will genetically inactivate Itga6 via an inducible CRISPR/Cas9. This should enable us to delete Itga6 when the cells are grown on collagen, and hence reduce the possibility of compensation in Matrigel-derived organoids.

      Another point regards the RNAscope data presented in Fig 4b: it is surprising to observe such a difference in expression between E19 and P0. Does this mean that birth dramatically upregulates Itga6 expression within a few hours? The authors should comment on this point if verified.

      We do believe that birth is a timepoint where a dramatic change in the ECM and its receptors can be observed. The epithelial RNAseq data would already indicate that at E18.5 there is an increase in expression compared to E16. This upregulation of the receptor is in line with the dramatic remodelling of the ECM at birth, as is shown by the expression of the basement membrane components in Fig. 3d.

      Authors should avoid the word « signaling » for laminin-integrin interactions as they do not study this aspect at all in their experiments.

      The word signaling was used for the protein:receptor interaction and to distinguish it from changes to the physical characteristics of the hydrogel. But we agree with the reviewer, that we did not study laminin signaling per se and therefore will change the wording accordingly.

      Regarding Col1a1, the authors cannot claim that its expression only slightly changed (fig 3d) as it is clearly upregulated between E17 and P0.

      The reviewer is right, and we apologise for the misleading sentence. The contrast was meant to the basement membrane components that are very lowly expressed at E17 and then suddenly show the burst of expression at birth, whereas collagen seems to be continuously expressed with a peak at P7. We will rephrase the sentence.

      Reviewer #1 (Significance (Required)):

      Overall, the methodology used for the asked questions is accurate.

      One potential problem for publication comes from the fact that some of the findings are already reported and that the present data do not provide further advances.

      For example, collagen and fetal-like expression profile, Ly6a sorting and replating in culture (Yui et al, 2018; Jabaji et al, 2013).

      We obviously do not agree with the reviewer on this point. We build upon the work of Jabaji et al. 2013, and Wang 2017 to characterise the specific effect of collagen on the intestinal epithelium compared to a pure Matrigel culture. The emergence of Ly6a cells was nicely shown by Yui et al., however it was unclear if collagen changes the fate of all intestinal cells or only a few. We strongly feel that our data extends these findings as it associates the changes we observe in vitro to the development of the crypt morphology and intestinal stem cells.

      The phenotype of Lamc1-/- mice and the observed reduced stem cell marker expression are also reported by Fields et al, 2019.

      Indeed, and we cited this paper. However, we predicted, based on our in vitro model, that deletion of laminin would result in this specific fetal-like gene expression and hence were happy to include these findings in our manuscript.

      Finally, the authors do not interpret their ex vivo data in the context of fetal progenitors, which grow as spheres in Matrigel (containing laminin).

      Our ex-vivo (in vitro) data would suggest that adult epithelial cells express some genes that are characteristic for fetal organoids, however we do not think these cells completely revert back to a fetal stage. Regarding the comment of spheres, it is noteworthy that fetal cells from E14-16 stay as spheres in Matrigel, whereas fetal cultures from E19 initially grow as spheres and then develop into organoids within 30 days in vitro (M. Navis et al., “Mouse fetal intestinal organoids: new model to study epithelial maturation from suckling to weaning,” EMBO Rep., vol. 20, no. 2, pp. 1–12, 2019.).

      In figure 5, should we interpret that there is no laminin at all in the fetal mesenchyme?

      We now see how the image is a bit misleading for that stage. The levels of laminin are lower in the fetal stage as can be seen by the IF image in Fig.3f and the image will be updated. We also apologise for the lack of labeling in Figure 3f, which should be E19, P7 and adult.

      Also, the authors do not cite a paper reporting on the role of the epithelium (stem cells) in regulating its own extracellular matrix composition, a process that modulates stem cell number and fate (Fernandez-Vallone et al. 2020). As this contradicts the claim that only the mesenchyme impacts crypt morphogenesis, the authors could discuss this point.

      In the paper by Fernandez-Vallone, deletion of Lgr5 in E16.5 embryos resulted in decreased expression of several ECM genes. Further, the authors could show that the fetal epithelium does express, for example, Col1a1 at this point, which decreases with maturation. However, even for the example of Col1a1 it is evident in their paper that the mesenchyme expresses Col1a1 at much higher levels. Our proposed experiments with Lamc1 k.o. in organoids will show whether the laminins produced by the epithelium are essential.

      This manuscript could be interesting for an audience in the stem cell and developmental fields (my field of expertise).

      **Referee Cross-commenting**

      Considering the pertinent and sometimes overlapping comments of the two other reviewers, the estimated time is revised to 3-6 months.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      In this manuscript, Ramadan and colleagues demonstrate that the cellular composition of the epithelium changes depending on the extracellular matrix (ECM) composition in which mouse intestinal organoids and/or 2D intestinal epithelial cells are grown. Organoids plated on 2D collagen layers show a unique cell cluster characteristic of fetal-like genes, while organoids plated with increased amounts of Matrigel in 2D or in 3D exhibit a shift towards higher stem cell abundance and the absence of the fetal-like gene cluster. Specifically, the ECM component Laminin supports acquisition of stem cell identities in small intestinal epithelial cells, correlating with a transient increase in expression levels of collagen and laminin genes in vivo spanning time points of crypt formation. The authors reported the functional contribution of laminin signaling (Lamc1 KO) via integrin alpha 6 (antibody-blocking) to intestinal stem cell acquisition in vitro and in vivo.

      There are a handful of comments/concerns that would need to be addressed before publication.

      Major points:

      1. The authors claimed: "the effect of ECM components on gene expression is not due to difference in morphology (2D collagen vs. 3D Matrigel)". The conclusion of the 2D vs. 3D experiment should be toned down to state that organoid morphology (2D vs. 3D) does not directly impact the expression of fetal-like genes. Otherwise, more analysis of RNAseq data with different groups of genes (e.g., in different mechanosensing pathways) should be provided with Fig. S1D. Also, would it be technically feasible to perform experiments of SI in collagen (3D) in all groups of experiments? Directly comparing 3D Matrigel with 3D collagen avoids the concern of the 2D vs. 3D effect.

      We apologise for the too strong claim of structure of growth vs. signalling of the ECM and its effect on the transcriptome. Indeed, the main message is that a changed morphology from 3D to 2D is not responsible for the expression of fetal-like genes. The paragraph will be rephrased.

      Also, would it be technically feasible to perform experiments of SI in collagen (3D) in all group of experiments? Directly comparing 3D Matrigel with 3D collagen avoids the concern of the 2D vs. 3D effect.

      To address this point, we want to refer to the excellent idea of growing established organoids in collagen (3D) vs. Matrigel (3D) as suggested by this reviewer (and reviewer #3) in Minor points #5. This circumvents the need for Wnt3a addition, which affects stem cell and Paneth cell gene expression.

      1. For Fig. 1f, the authors should include overlapping stainings of Lyz (or Olfm4, CD44 etc.) and Aldolase B signal, or they could perform Aldolase B staining in the Lgr-5-DTR-GFP and/or Lyz-RFP organoid line. From the current data provided one cannot draw clear conclusions on the crypt morphology as claimed by the authors. Additionally, when talking about crypt morphology and apical accumulation of Actin specifically in the Lyz+ cells, the authors should show a higher zoom-in of the picture and either add an orthogonal slice to show the apical and basal sides and the specific accumulation in one of the cell types, or also co-label with apical polarity markers.

      We will perform additional co-stainings to further highlight the differences in the spatial distribution of differentiated cells and undifferentiated-crypt-like cells. Further we will provide higher magnification images highlighting the apical accumulation of Actin in the crypt-like structures, which can also be seen in mature organoids.

      1. The authors referred to the transient organoid change as a fetal-like state. To examine the similarity of ECM-induced reprogramming with the regenerative type of reprogramming, it would be essential to compare the expression of the selected fetal-like genes (Anxa3, Ly6a/Sca1, Msln, Col4a2 etc.), as well as bulk and single-cell (if applicable) RNA-seq data.

      Here, we would like to refer to the excellent study by Yui et al. (S. Yui et al., “YAP/TAZ-Dependent Reprogramming of Colonic Epithelium Links ECM Remodeling to Tissue Regeneration,” Cell Stem Cell, vol. 22, no. 1, pp. 35-49.e7, 2018.). In this study the authors detected the same gene signature in the repairing epithelium. We can provide a GSEA for the Ly6a+ signature that was derived from this paper, if necessary.

      1. For the in vivo data, the authors were looking at the normal development of the intestine. Following the point that organoid culture recapitulates regeneration, it would be relevant to check the in vivo ECM change by staining during intestinal regeneration, or to discuss whether the fetal-like genes would be involved in regeneration.

      We will address this point in the discussion as it also involves the study by Yui et al.

      1. For Fig2.d and e, it would be important to measure compactness vs. the emergence/probability of Ly6+ cells to see if there is correlation.

      If we understand the reviewer correctly, this would address the important relationship between cell shape and cell fate/type. However, this is a topic that needs more attention than a simple correlation and would exceed the scope of this manuscript as we are not able to modulate cell shape to make any further points about its effect on the fetal gene expression program.

      1. In Fig.2d, Ly6a expression is very obscure, and it would be important to show control staining for cell boundaries (eg. Phalloidin, PM) to visualize which nuclei show Ki67 staining and are high or low in Ly6a (plus quantification).

      We will improve the image in Fig.2d and include the mentioned Actin staining. In addition we will perform an analysis via Flow Cytometry to quantify the level of Ly6a staining and EdU positivity.

      1. In Fig. 2f-g, FACS-ed Ly6a+ and Ly6- cells embedded in Matrigel can grow into organoids with crypts. Here the imaging of Paneth cell staining is not clear, and a quantification on number of Paneth cells per crypt would be very helpful to confirm the phenotype. Also, authors should either provide data on the initial size of seeded cell clusters and report organoid growth and cell type composition in more detail when plating from Ly6a+ and Ly6- cells or report the variation in the respective populations.

      This comment suggests that we may not have described the experimental settings properly. The sorted cells were embedded as single cells, not as clusters, in drops of Matrigel (10k cells/25 µl Matrigel). The emergence of Paneth cells together with a normal organoid architecture grown from Ly6a+ cells shows their stem cell capacity, as has been shown before by Yui et al. for the regenerating colon. In addition, organoids from both cell populations (Ly6a+ and Ly6a-) could be passaged, indicating the presence of intestinal stem cells.

      1. The authors could also test whether Ly6+ cells have any advantages over Ly6- cells when grown on collagen I instead of Matrigel.

      We will sort Ly6a+ and Ly6a- cells and plate them on collagen I. It will be interesting to see if the Ly6a+ cells can give rise to the other cell types when plated on collagen or if they stay Ly6a+ cells. This will also answer whether Ly6a+ cells need the presence of Ly6a- cells in the cultures. In addition, the experiment proposed in #6 will also highlight any proliferative advantage of Ly6a+ cells compared to Ly6a- cells on collagen.

      1. In Fig.3f, a control of membrane protein staining should be added for the experiment. The increased Laminin signal can be caused by the global increase of protein when there are more cells, or tissues are more compact. When authors make conclusion of "Dramatic remodelling of ECM during crypt formation ", the experiment should also count cell numbers vs. Laminin (intensity). The phenotype can come from increased area of interface between epithelium and mesenchyme instead of active remodelling.

      We agree with the reviewer that by itself the IF images are not enough to make such a claim. However, we would point to the qPCR data and RNA in situ, that can be more easily normalised and shows the dramatic increase in expression of all laminins at birth. To show that laminin protein is increasing is more difficult than we initially anticipated. However, in the study by De Arcangelis (A. De Arcangelis et al., “Hemidesmosome integrity protects the colon against colitis and colorectal cancer,” Gut, vol. 66, no. 10, pp. 1748–1760, 2017.) the authors use an EDTA assay to show that the epithelium detaches easily when Itga6 is deleted. Within the figure, it seems also that the epithelium detaches easily at P2, compared to P14. As EDTA is disrupting laminin polymerisation, this would further indicate increased laminin protein deposition after birth.

      1. The authors claim that intestinal stem cells in vivo are controlled by Laminin signalling that goes via Integrin alpha 6. However, there is no evidence provided that supports the contribution of ITGA6 in the in vivo setting. So, the authors should either tone down that point or show a convincing in vivo experiment (e.g., inhibit ITGA6 in vivo by inhibitor injections, or extract the ECM of a wild-type mouse and seed intestinal epithelial cells without vs. with ITGA6 blocking antibody, which should recapitulate the phenotype in Fig. 4c).

      We apologise for this confusion. We are well aware of the limitations of our Itga6 blocking experiment in vitro and its relevance in vivo. We tried to get material from the inducible VilCreER Itga6 mouse as referenced in the discussion of the manuscript, without any luck so far. Therefore we will highlight further that any claims about the laminin:Itga6 interaction can only be made in vitro.

      1. Fig. 4: For the data of ITGA6 expression and all sorts of analysis on protein expression with staining, normalization with cell numbers should be performed.

      The RNAseq data that shows the upregulation of Itga6 in the epithelium at E18 is normalized within. Our RNAscope only further validates these expression changes and highlights the specific enriched expression at the bottom of the nascent crypts. We can add quantification of the RNAscope if required.

      1. Two questions on mechanisms:

      2. What is the mechanism from ITGA signaling to Ly6a+ cell fate?

      3. And would Laminin induce ITGA expression, and if so, how? Depending on how deep the authors would like to go with the project, these questions could be addressed further with functional studies or touched on in the discussion.

      These are important questions; however, we agree that this would go too deep for the scope of this manuscript. We will address these open questions in the discussion and leave the experimental part for a follow-up study.

      **Minor points:**

      1. Text in Fig.S1d regarding 'in' or 'on' collagen, could be clearer by changing the terms to 2D and 3D correspondingly.

      We agree and the text will be changed accordingly.

      1. Fig. S1a: it is great that the authors showed similar stiffness in Matrigel and collagen I. It would be important to check the concentration of collagen I vs. stiffness (also for increasing concentrations of Laminin in Fig. 3b), since this is also the type of ECM change that might lead to a change of cell status in cancer progression or collective cell migration.

      We will perform further stiffness measurements of the hydrogels and update the Fig. S1a.

      1. When plating intestinal epithelial cells on collagen I, is the Ly6+ phenotype altered upon Wnt addition? This is not so clear from the RNAseq data Fig S1d., so authors should provide antibody stainings (stem cells/Paneth cells). This could give insight whether Ly6+ cells are still able to convert into stem cells/ Paneth cells by changing morphogen concentration vs. ECM composition.

      We will reanalyse the RNAseq dataset further, specifically analysing the ratio of stem cell and Paneth cell gene expression. However, as mentioned before, Wnt3a specifically does reduce the expression of Paneth cell markers.

      Similar to this point, also enteroendocrine cell fate is absent in collagen I condition (Fig2.ab), the authors could address this point by medium induced EE cell fate.

      Due to the reduced number of secretory cells, the clustering in Fig. 2a/b does not separate all the different cell lineages. However, EE cells are present in the collagen cultures, as characterised by expression of Chga, just in reduced numbers (see Suppl. Fig. 2B).

      1. It would be more informative to indicate the thickness of the ECM layer in the 2D collagen I culture, as well as an image of the whole well, demonstrating the morphological variation between the middle and the periphery of the ECM layer.

      The thickness of the collagen layer is about 1 mm in a 6-well plate and we do not observe any morphological differences in the cells between the periphery and center of the well.

      1. After the formation of PC/SC clusters, would ECM contribute to maintenance? Putting mature organoids from Matrigel to Collagen I 3D would help to clarify.

      This is an interesting experiment that we will conduct, and we thank the reviewer for this suggestion. Indeed, established organoids should be able to grow in collagen I without Wnt3a addition. The paper by Sachs et al. (N. Sachs, Y. Tsukamoto, P. Kujala, P. J. Peters, and H. Clevers, “Intestinal epithelial organoids fuse to form self-organizing tubes in floating collagen gels,” Development, vol. 144, no. 6, pp. 1107–1112, 2017.) used extensive washing with PBS to remove Matrigel from the organoids. We will go one step further and try to completely remove laminin specifically by EDTA incubation, as has been shown recently (J. Y. Co et al., “Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions,” Cell Rep., vol. 26, no. 9, pp. 2509-2520.e4, 2019.). This should then also answer whether disruption of laminin signalling is sufficient to induce fetal-gene expression without the addition of collagen I in a 3D setting.

      1. Check secretome and individual culture of Mesenchyme, see if the increase of Laminin is epithelium independent.

      We agree that the mesenchyme is key for laminin production, therefore these are important questions. Our prediction would be that epithelium from birth (P0) versus adult might result in different responses on the mesenchyme. However, we feel these experiments are better suited for a follow-up study.

      1. In general, the authors should look at cell polarity markers to check the ECM contribution to cell polarity in different cell types.

      We thank the reviewer for the suggestions and as mentioned above, we will perform additional stainings.

      Reviewer #2 (Significance (Required)):

      **Significance:**

      The work highlights the role of ECM on stem cell niche and is of great interest to the organoid and stem cell community.

      Our field of expertise is image- and seq-technology-based quantitative biology, regeneration and mechanics in organoids.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Ramadan et al present a highly informative paper detailing how the extracellular matrix influences the development of the intestine. Specifically, the authors provide a thorough analysis of how manipulating the components of the ECM can affect organoid growth, morphology, and gene expression. Most importantly, the authors isolate laminin as a critical component of the ECM which impacts the development of fetal-like epithelium. While the in vitro work is generally compelling and of interest to the field, the in vivo data is somewhat lacking in depth and novelty. In particular, these conclusions from the abstract could be much better supported: "This laminin:ITGA6 signalling is essential for the stem cell induction and crypt formation in vitro. Importantly, deletion of laminin in the adult mouse results in a fetal-like epithelium with a marked reduction of adult intestinal stem cells." The in vivo work was largely published previously and has caveats noted below, while the in vitro association of ITGA6 signaling with crypt formation is over-interpreted based upon an antibody blocking experiment and a lack of statistical rigor. Despite these concerns, this reviewer finds the work of considerable interest in an important area of the field (epithelial/stromal interactions of the intestine).

      **Major Concerns:**

      The use of the Ubc-Cre Lamc1-flox mouse model is an interesting way to test the impact of loss of Lamc1 on intestinal development. However, with the Ubc-Cre, does the mouse model have other deleterious effects on the mouse beyond the intestine?

      Can the authors use a more localized Cre to observe specifically the impacts of Lamc1 loss in the intestine? What is the fate of these mice? Can the authors show swiss roll, low mag sections to let the reader know the extent of this phenotype? OLFM4 and Ki67 IHC should be conducted over a timecourse to show how the changes occur over time after loss of Lamc1. How long does Lamc1's protein product perdure after tamoxifen treatment? More details of this exciting, in vivo validation of the authors' in vitro studies are key to elevating the impact of this work. However, it appears that much of this mouse model was previously published, but the previous findings are not well summarized in the current manuscript.

      We will describe the model in more detail and refer readers to the excellent study of our collaborators which answers most of the raised questions. It is interesting to note that although a ubiquitous Cre was used to delete Lamc1 in adult mice, a phenotype was only observed in the intestine indicating a specific role for continuous laminin production here.

      Can the authors show that ITGA6 loss has functional consequences in vivo with an epithelial knockout or via an organoid knockdown? A more rigorous genetic test of this proposed function would be important for substantiating the claims made in the abstract.

      As referenced in the discussion of the manuscript, there is a VilCreER Itga6 mouse described in the literature (A. De Arcangelis et al., “Hemidesmosome integrity protects the colon against colitis and colorectal cancer,” Gut, vol. 66, no. 10, pp. 1748–1760, 2017.), which mainly focus on the colon. However, the authors use an EDTA assay in the small intestine to show that the epithelium detaches easily when Itga6 is deleted (Fig. 1J). Within the figure, it seems also that the epithelium detaches easily at P2, compared to P14. As EDTA is disrupting laminin polymerisation, this would further indicate increased laminin protein deposition after birth which is dependent on Itga6.

      We tried to obtain material from the inducible VilCreER Itga6 mouse, but without any luck so far.

      We will conduct additional in vitro experiments regarding Itga6. In addition to including additional controls for the neutralizing antibody, we will genetically inactivate Itga6 via an inducible CRISPR/Cas9 system. This should enable us to delete Itga6 when the cells are grown on collagen, and hence reduce the possibility of compensation in Matrigel-derived organoids.

      The authors state that the changes in gene expression are not due to differences in morphology, but rather are specific to the components of the environment. While the authors show that organoids treated with Wnt3a "in Matrigel" and "CollagenI" appear to have similar morphologies and yet still result in different gene expression profiles, it would be of great interest to see whether that difference persists without Wnt3a when organoids are "in Matrigel" and "in CollagenI". While the reviewer understands the difficulties of culturing organoids in 3D Collagen without Wnt3a, organoids can indeed be cultured in 3D using "floating collagenI rings" (Sachs et al 2017).

      This is an interesting experiment that we will conduct; we thank the reviewer for this suggestion. Indeed, established organoids should be able to grow in collagen I without Wnt3a addition. The paper by Sachs et al. (N. Sachs, Y. Tsukamoto, P. Kujala, P. J. Peters, and H. Clevers, “Intestinal epithelial organoids fuse to form self-organizing tubes in floating collagen gels,” Development, vol. 144, no. 6, pp. 1107–1112, 2017) used extensive washing with PBS to remove Matrigel from the organoids. We will go one step further and try to completely remove laminin specifically by EDTA incubation, as has been shown recently (J. Y. Co et al., “Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions,” Cell Rep., vol. 26, no. 9, pp. 2509-2520.e4, 2019). This should then also answer whether disruption of laminin signalling is sufficient to induce fetal-gene expression without the addition of collagen I in a 3D setting.

      Similarly, while the authors indicate that increasing Matrigel concentrations altered the gene expression patterns in a dose-dependent manner, it is unknown whether this can fully be attributed to the Matrigel composition, or whether the layer of Matrigel is providing the capability to transition from 2D to 3D culture.

      We are not entirely sure that we understood this point. The Matrigel and the collagen I are mixed before they solidify, which yields a homogeneous hydrogel. The different hydrogels are not layered (if that is what the reviewer is referring to).

      The authors cultured organoids in different concentrations of Laminin/CollagenIV when mixed with CollagenI. Can organoids be sustained only on a matrix of CollagenIV and/or Laminin? Would this show more direct differences between CollagenI vs. Laminin cultured organoids?

      Organoids cannot be grown in pure Collagen IV, but pure laminin should be feasible. We did initial experiments with 3-5 mg/ml laminin in PBS, and that was sufficient to allow organoid growth. We will perform additional experiments with pure laminin and show the impact on organoid growth.

      With the reduction in stem-cell and Paneth cells in the Lamc1-KO mice, it would also be of interest to determine what cell types are now prominent within the heavily elongated intestinal "crypt" structures seen in the Lamc1-KO mice and whether populations are more TA-cells or enterocytes to consider differentiation status of the cells. Additionally, it would also be of interest to see if the Itga6 expression is significantly altered in the absence of Lamc1.

      We will test expression changes for Itga6 in the epithelium of the Lamc1-KO mice via qPCR. Additionally, we can stain tissue from these mice for Sox9, Ki67 and differentiation markers (e.g. CD44, AldolaseB, Villin etc.) to determine whether the elongated, hyperproliferative crypts contain progenitor cells or secretory enterocytes.

      **Minor concerns:**

      Matrigel is a complex matrix derived from mouse tumors. In many instances in the manuscript, the authors portray it as a simpler mix of laminin/Collagen4 (fig 3a). It should be made clearer to the reader that Matrigel is not a mix of recombinant proteins; a clearer depiction of how Matrigel is derived will be critical for this study, given the focus on specific ECM components and how they affect intestinal epithelial growth.

      We agree and will change the oversimplified view of Matrigel.

      Some labels of specific conditions would be appreciated in the figures as opposed to only the figure legends (ie. Fig. 1b and 1d should be labeled with comparisons; Fig. 3f labels of fluorescence, Fig. 4b label of itga6 staining).

      We apologise for this and the Figure labels will be updated.

      With the light staining of Lamc1 in-situ, it is hard to appreciate the expression of laminin within the stroma of the intestine when compared to Col4. This reviewer is also curious of the biological relevance of the concentrations of Laminin/CollagenIV when culturing organoids in Fig. 4a.

      Indeed, Lamc1 is more difficult to see because its expression is lower than that of Col4a1. Maybe the reviewer overlooked Suppl. Fig. 4A, where the blue channel of these in situ images shows more contrast. If required, we can try to optimise the hybridisation times to increase the signal a bit further.

      When culturing the organoids with the mixture of collagen and laminin or collagen IV, the concentrations of the two single components were selected to be similar to their concentrations in Matrigel. The question of the absolute concentrations of laminin/collagen IV in vitro versus their “concentration” in vivo is much harder to answer. In addition to the unknown concentrations in vivo, there are many more Laminin types present with specific localisation and even specific receptor interactions. For our in vitro studies we relied on the Laminin present in EHS tumours, which is Laminin alpha 1 beta 1 gamma 1. We are currently investigating decellularization protocols to purify the ECM from mouse intestinal tissue, but again this would be more suited for a follow-up study.

      It would be appreciated if gene expression analyses presented in figures would include p-values to provide context for differences in gene expression.

      The avoidance of statistical inference for most of the experiments was a deliberate choice. In line with several comments (e.g. Vaux, D. L. (2012) Know when your numbers are significant. Nature 492, 180–181), we chose to show all individual data points (with the exception of Fig. 3D, n=5, to ease interpretation) without statistical testing. For most expression data, we have data from RNAseq, single-cell RNAseq and qPCRs repeated at different hydrogel concentrations to obtain reliable results. Further, the in vivo mesenchymal qPCR expression data was validated with RNA in situ hybridization showing the mainly mesenchymal expression.

      As we will perform additional experiments for the revision of this paper, we can perform statistical tests in the key experiments (e.g. Itga6 experiment) to alleviate any concerns regarding significance.

      In figure 3f, the authors report "immunofluorescence of laminin". How is this measured? Can more details be given about the antibody in the text and figure legend? Laminins are a family of genes, and it's not clear what's being demonstrated in this figure panel. Developmental stages of the samples are also not clear.

      We apologise for the lack of labeling in Fig. 3f. The details of the antibody were hidden in the Materials and Methods of the manuscript (“Slides were incubated with Laminin Polyclonal Antibody (1/200, Thermo Fisher #PA5-22901) overnight at 4 °C”). This pan-laminin antibody reacts with most Laminin isoforms (alpha1, alpha2, beta1, gamma1). We will declare it as a pan-laminin antibody in the Figure legend to help future readers.

      Reviewer #3 (Significance (Required)):

      This work is in an exciting "hot" area of research to understand the role of non-epithelial cells in intestinal epithelial development and function. The audience would be those in the GI field and those studying tissue-tissue interactions.

      There's some concern that the in vivo portion of the manuscript (4th figure) uses a model that was previously characterized and published by this group, and that isn't clearly disclosed. The manuscript would benefit from more disclosure and detail about the in vivo phenotype. Such changes would substantially increase the impact and novelty of the study.

      We would like to point out that we cited the paper of the original study that uses the model throughout the manuscript. We will disclose in more detail that this group did the study and that the reduction in stem cell genes was already mentioned in the original publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Ramadan et al present a highly informative paper detailing how the extracellular matrix influences the development of the intestine. Specifically, the authors provide a thorough analysis of how manipulating the components of the ECM can affect organoid growth, morphology, and gene expression of the organoids. Most importantly, the authors isolate laminin as a critical component of the ECM which impacts the development of fetal-like epithelium. While the in vitro work is generally compelling and of interest to the field, the in vivo data is somewhat lacking in depth and novelty. Particularly, these conclusions from the abstract could be much better supported: "This laminin:ITGA6 signalling is essential for the stem cell induction and crypt formation in vitro. Importantly, deletion of laminin in the adult mouse results in a fetal-like epithelium with a marked reduction of adult intestinal stem cells." The in vivo work was largely published previously and has caveats noted below, while the in vitro association of ITGA6 signaling with crypt formation is over-interpreted based upon an antibody blocking experiment and a lack of statistical rigor. Despite these concerns, this reviewer finds the work of considerable interest in an important area of the field (epithelial/stromal interactions of the intestine).

      Major Concerns:

      The use of the Ubc-Cre Lamc1-flox mouse model is an interesting way to test the impact of loss of Lamc1 on intestinal development. However, with the Ubc-Cre, does the mouse model have other deleterious effects on the mouse beyond the intestine? Can the authors use a more localized Cre to observe specifically the impacts of Lamc1 loss in the intestine? What is the fate of these mice? Can the authors show swiss roll, low mag sections to let the reader know the extent of this phenotype? OLFM4 and Ki67 IHC should be conducted over a timecourse to show how the changes occur over time after loss of Lamc1. How long does Lamc1's protein product perdure after tamoxifen treatment? More details of this exciting, in vivo validation of the authors' in vitro studies are key to elevating the impact of this work. However, it appears that much of this mouse model was previously published, but the previous findings are not well summarized in the current manuscript.

      Can the authors show that ITGA6 loss has functional consequences in vivo with an epithelial knockout or via an organoid knockdown? A more rigorous genetic test of this proposed function would be important for substantiating the claims made in the abstract. The authors state that the changes in gene expression are not due to differences in morphology, but rather are specific to the components of the environment. While the authors show that organoids treated with Wnt3a "in Matrigel" and "CollagenI" appear to have similar morphologies and yet still result in different gene expression profiles, it would be of great interest to see whether that difference persists without Wnt3a when organoids are "in Matrigel" and "in CollagenI". While the reviewer understands the difficulties of culturing organoids in 3D Collagen without Wnt3a, organoids can indeed be cultured in 3D using "floating collagenI rings" (Sachs et al 2017). Similarly, while the authors indicate that increasing Matrigel concentrations altered the gene expression patterns in a dose-dependent manner, it is unknown whether this can fully be attributed to the Matrigel composition, or whether the layer of Matrigel is providing the capability to transition from 2D to 3D culture.

      The authors cultured organoids in different concentrations of Laminin/CollagenIV when mixed with CollagenI. Can organoids be sustained only on a matrix of CollagenIV and/or Laminin? Would this show more direct differences between CollagenI vs. Laminin cultured organoids? With the reduction in stem-cell and Paneth cells in the Lamc1-KO mice, it would also be of interest to determine what cell types are now prominent within the heavily elongated intestinal "crypt" structures seen in the Lamc1-KO mice and whether populations are more TA-cells or enterocytes to consider differentiation status of the cells. Additionally, it would also be of interest to see if Itga6 expression is significantly altered in the absence of Lamc1.

      Minor concerns:

      Matrigel is a complex matrix derived from mouse tumors. In many instances in the manuscript, the authors portray it as a simpler mix of laminin/Collagen4 (fig 3a). It should be made clearer to the reader that Matrigel is not a mix of recombinant proteins; a clearer depiction of how Matrigel is derived will be critical for this study, given the focus on specific ECM components and how they affect intestinal epithelial growth.

      Some labels of specific conditions would be appreciated in the figures as opposed to only the figure legends (ie. Fig. 1b and 1d should be labeled with comparisons; Fig. 3f labels of fluorescence, Fig. 4b label of itga6 staining).

      With the light staining of Lamc1 in-situ, it is hard to appreciate the expression of laminin within the stroma of the intestine when compared to Col4. This reviewer is also curious of the biological relevance of the concentrations of Laminin/CollagenIV when culturing organoids in Fig. 4a. It would be appreciated if gene expression analyses presented in figures would include p-values to provide context for differences in gene expression.

      In figure 3f, the authors report "immunofluorescence of laminin". How is this measured? Can more details be given about the antibody in the text and figure legend? Laminins are a family of genes, and it's not clear what's being demonstrated in this figure panel. Developmental stages of the samples are also not clear.

      Significance

      This work is in an exciting "hot" area of research to understand the role of non-epithelial cells in intestinal epithelial development and function. The audience would be those in the GI field and those studying tissue-tissue interactions.

      There's some concern that the in vivo portion of the manuscript (4th figure) uses a model that was previously characterized and published by this group, and that isn't clearly disclosed. The manuscript would benefit from more disclosure and detail about the in vivo phenotype. Such changes would substantially increase the impact and novelty of the study.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Ramadan and colleagues demonstrate that the cellular composition of the epithelium changes depending on the extracellular matrix (ECM) composition in which mouse intestinal organoids and/or 2D intestinal epithelial cells are grown. Organoids plated on 2D collagen layers show a unique cell cluster characterised by fetal-like genes, while organoids plated with increased amounts of Matrigel in 2D or in 3D exhibit a shift towards higher stem cell abundance and the absence of the fetal-like gene cluster. Specifically, the ECM component Laminin supports acquisition of stem cell identities in small intestinal epithelial cells, correlating with a transient increase in expression levels of collagen and laminin genes in vivo spanning time points of crypt formation. The authors reported the functional contribution of laminin signaling (Lamc1 KO) via integrin alpha 6 (antibody-blocking) to intestinal stem cell acquisition in vitro and in vivo.

      There are a handful of comments/concerns that would need to be addressed before publication.

      Major points:

      1. The authors claimed: "the effect of ECM components on gene expression is not due to difference in morphology (2D collagen vs. 3D Matrigel)". The conclusion of the 2D vs. 3D experiment should be toned down to state that organoid morphology (2D vs. 3D) does not directly impact the expression of fetal-like genes. Otherwise, more analysis of RNAseq data with different groups of genes (e.g., in different mechanosensing pathways) should be provided with Fig. S1D. Also, would it be technically feasible to perform experiments of SI in collagen (3D) in all groups of experiments? Directly comparing 3D Matrigel with 3D collagen avoids the concern of the 2D vs. 3D effect.
      2. For Fig. 1f, the authors should include overlapping stainings of Lyz (or Olfm4, CD44 etc.) and Aldolase B signal, or they could perform Aldolase B staining in Lgr-5-DTR-GFP and/or Lyz-RFP organoid line. From the current data provided one cannot draw clear conclusions on the crypt morphology as claimed by the authors. Additionally, when talking about crypt morphology and apical accumulation of Actin specifically in the Lyz+ cells, the authors should show a higher zoom in of the picture and either add an orthogonal slice to see apical and basal side and the specific accumulation in one of the cell types, or also co-label with apical polarity markers.
      3. The authors referred to the transient change in organoids as a fetal-like state. To examine the similarity of ECM-induced reprogramming with the regenerative type of reprogramming, it would be essential to compare the expression of the selected fetal-like genes (Anxa3, Ly6a/Sca1, Msln, Col4a2 et al.), as well as bulk and single-cell (if applicable) RNA-seq data.
      4. For the in vivo data, the authors were looking at the normal development of the intestine. Following the point that organoid culture recapitulates regeneration, it would be relevant to check the in vivo ECM change by staining during the process of intestinal regeneration, or to discuss whether the fetal-like genes would be involved in regeneration.
      5. For Fig2.d and e, it would be important to measure compactness vs. the emergence/probability of Ly6+ cells to see if there is correlation.
      6. In Fig.2d, Ly6a expression is very obscure, and it would be important to show control staining for cell boundaries (eg. Phalloidin, PM) to visualize which nuclei show Ki67 staining and are high or low in Ly6a (plus quantification).
      7. In Fig. 2f-g, FACS-ed Ly6a+ and Ly6- cells embedded in Matrigel can grow into organoids with crypts. Here the imaging of Paneth cell staining is not clear, and a quantification on number of Paneth cells per crypt would be very helpful to confirm the phenotype. Also, authors should either provide data on the initial size of seeded cell clusters and report organoid growth and cell type composition in more detail when plating from Ly6a+ and Ly6- cells or report the variation in the respective populations.
      8. The authors could also test whether Ly6+ cells have any advantages over Ly6- cells when grown on collagen I instead of Matrigel.
      9. In Fig.3f, a control of membrane protein staining should be added for the experiment. The increased Laminin signal can be caused by the global increase of protein when there are more cells, or tissues are more compact. When authors make conclusion of "Dramatic remodelling of ECM during crypt formation ", the experiment should also count cell numbers vs. Laminin (intensity). The phenotype can come from increased area of interface between epithelium and mesenchyme instead of active remodelling.
      10. The authors claim that intestinal stem cells in vivo are controlled by Laminin signalling that goes via Integrin alpha 6. However, there is no evidence provided that supports the contribution of ITGA6 in the in vivo setting. So, the authors should either tone down on that point or show a convincing in vivo experiment (e.g., inhibit ITGA6 in vivo by inhibitor injections or by extracting the ECM of a wild-type mouse and seeding intestinal epithelial cells without vs. with ITGA6 blocking antibody, which should recapitulate the phenotype in Fig. 4c).
      11. Fig. 4: For the data of ITGA6 expression and all sorts of analysis on protein expression with staining, normalization with cell numbers should be performed.
      12. Two questions on mechanisms: a) What is the mechanism from ITGA signaling to Ly6a+ cell fate? b) And would/how Laminin induce ITGA expression? Depending on how deep the authors would like to go with the project, these could be addressed further with functional studies, or touched on in the discussion.

      Minor points:

      1. Text in Fig.S1d regarding 'in' or 'on' collagen, could be clearer by changing the terms to 2D and 3D correspondingly.
      2. Fig. S1a, it is great the authors showed that similar stiffness in Matrigel and collagen I. It would be important to check the concentration of collagen I vs. stiffness (also for increasing concentrations of Laminin in Fig. 3b), since this is also the type of ECM change that might lead to the change of cell status in cancer progression or collective cell migration.
      3. When plating intestinal epithelial cells on collagen I, is the Ly6+ phenotype altered upon Wnt addition? This is not so clear from the RNAseq data Fig S1d., so authors should provide antibody stainings (stem cells/Paneth cells). This could give insight whether Ly6+ cells are still able to convert into stem cells/ Paneth cells by changing morphogen concentration vs. ECM composition. Similar to this point, also enteroendocrine cell fate is absent in collagen I condition (Fig2.ab), the authors could address this point by medium induced EE cell fate.
      4. It would be more informative to indicate the thickness of the ECM layer in the 2D collagen I culture, as well as an image of the whole well, demonstrating the morphological variation between the middle and the periphery of the ECM layer.
      5. After the formation of PC/SC clusters, would ECM contribute to maintenance? Putting mature organoids from Matrigel to Collagen I 3D would help to clarify.
      6. Check secretome and individual culture of Mesenchyme, see if the increase of Laminin is epithelium independent.
      7. In general, the authors should look at cell polarity markers to check the ECM contribution to cell polarity in different cell types.

      Significance

      Significance:

      The work highlights the role of ECM on stem cell niche and is of great interest to the organoid and stem cell community.

      Our field of expertise is image- and seq-technology-based quantitative biology, regeneration and mechanics in organoids.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In the manuscript of Ramadan et al., the authors use the ex vivo organoid approach to compare gene expression in organoids derived from adult type stem cells when these organoids are grown using different matrices. The presence of Collagen type I induces the emergence of cells with a transcriptome similar to fetal progenitors. In contrast, laminin, the main component of Matrigel, induces an organoid-protruded phenotype with a transcriptome of stem cell type. Then, they correlate these data with expression of collagens and laminins from publicly available data. They show by qRT-PCR that laminins are more expressed in mesenchymal versus epithelial fractions postnatally. They hypothesize on this basis that the remodeling at the postnatal stage is likely only dependent on the mesenchymal compartment and that it involves interaction of laminins with integrin a6. It seems that some of the presented data have already been described and could not be considered as « novel ». For some of the statements, like this one « the basement-membrane produced by the epithelium is not sufficient to increase stem cell numbers and induce a morphological crypt formation », the conclusion is not sustained by the provided experiments. To draw a definitive conclusion on this particular point, the authors could reproduce the experiment presented in Fig. 4d but using Cre recombinases specific for mesenchymal and epithelial compartments rather than the ubiquitous Cre line. It would be interesting to investigate whether organoids grown from lamc1-/- mice can generate protruded organoids or not. In addition, how should one interpret the fact that fetal organoids up is associated with « laminin interactions » in fig. 1c?

      One major point to address regards statistics. In material and methods, the paragraph describing statistical analyses is missing. Moreover, in the figures presenting qPCR data (figs 1g, 3b, 3D, 3g, 4c and f), no statistical analysis is provided, and the number of samples for some conditions is extremely limited (n=2). In general, the term « independent experiment » should be clarified: does it correspond to one organoid line for which the experiment was repeated or one single experiment using different organoid lines? In fig 4c, all collagen conditions are set to 1.

      Regarding the experiment presented in fig 4c, the authors should include additional control conditions: anti-a6 integrin antibody in Matrigel and use of an isotype antibody.

      Another point regards the RNAscope data presented in Fig 4b: it is surprising to observe such a difference in expression between E19 and P0. Does this mean that birth dramatically upregulates Itga6 expression within a few hours? The authors should comment on this point if verified. The authors should avoid the word « signaling » for laminin-integrin interactions as they do not study this aspect at all in their experiments.

      Regarding Col1a1, the authors cannot claim that its expression only slightly changed (fig 3d), as it is clearly upregulated between E17 and P0.

      Significance

      Overall, the methodology used for the asked questions is accurate.

      One potential problem for publication comes from the fact that some of the findings are already reported and that the present data do not provide further advances.

      For example, collagen and fetal-like expression profile, Ly6a sorting and replating in culture (Yui et al, 2018; Jabaji et al, 2013).

      The phenotype of Lamc1-/- mice and the observed reduced stem cell marker expression are also reported by Fields et al, 2019.

      Finally, the authors do not interpret their ex vivo data in the context of fetal progenitors, which grow as spheres in Matrigel (containing laminin). In figure 5, should we interpret that there is no laminin at all in the fetal mesenchyme?

      Also, the authors do not cite a paper reporting on the role of the epithelium (stem cells) in regulating its own extracellular matrix composition, a process that modulates stem cell number and fate (Fernandez-Vallone et al. 2020). As this is contradictory with the claim that only mesenchyme impacts on crypt morphogenesis, the authors could discuss this point.

      This manuscript could be interesting for an audience in the stem cell and developmental fields (my field of expertise).

      Referee Cross-commenting

      Considering the pertinent and sometimes overlapping comments of the two other reviewers, the estimated time is revised to 3-6 months.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      This manuscript by Gulyrutlu and co-workers addresses the role of CUG expanded repeat RNA associated with DM1 in regulating the formation of higher order RNP assemblies such as stress granules and P-bodies in the cell. The authors used lens epithelial cells (hLECs) derived from a DM1 patient

      We used cell lines from several patients and age-matched controls to avoid effects of individual cell-line variation. We will make sure that this is clear in the text.

      or a HeLa cell inducible model of DM1 to investigate whether expression of the CUG repeat-associated protein MBNL1 and CUGBP1 affected the formation and dispersal of stress granules and P-bodies. The authors show that MBNL1 and CUGBP1 are components of SGs and PBs in hLECs and HeLa cells. In cells expressing the CUG repeat, there are minor alterations in the dispersal of stress granules as well as in the formation of P-bodies.

      The alterations in the formation and dispersal of stress granules are not minor. For example, in the HeLa cell model, stress granules take more than twice as long to form in cells expressing the CUGexp repeats associated with DM1 and disperse in half the time. These data are already in the results section, but we will highlight them in a revision and have added an additional representation of the data to the figure, using graphs of ‘proportion of cells with stress granules’ against time. The changes we see are as large as, or larger than, results published elsewhere (see appendix below).

      MBNL1 could affect the formation and dispersal of SGs independent of the CUG repeat.

      In fact, we present data in HeLa cells with MBNL1 almost completely removed by shRNA revealing that this has a much smaller effect on stress granules than does the expression of CUGexp RNA. This is an important point, as it is widely assumed that most of the cellular defects in DM1 are caused by the ‘sequestration’ of MBNL1 in the CUGexp foci.

      Finally, in HeLa cells, overexpression of MBNL1 can reduce the dispersal of P-bodies upon 1,6-hexanediol treatment.

      This is not what our data show. In the hexanediol experiments, both cell lines over-express MBNL1 in similar amounts. The difference between them is that one cell line expresses a DMPK1 mini-gene with a CUG expansion and the other expresses a mini-gene without the expansion. Again, our results show that the alteration to P-body responses to 1,6-hexanediol can be attributed to the presence of the CUGexp RNA, rather than altered levels of MBNL1. We will revise the results and discussion to further emphasise this point.

      Major comments:

      One limitation of the work is that the perturbations seen with stress granules or P-bodies are all relatively small, and no evidence for a functional consequence on gene expression is demonstrated. Specifically, the authors observe only minor alterations in the formation or disaggregation of PBs and SGs in these DM1 models. Further, some of the effects observed are independent of the CUG repeat expression, suggesting that MBNL1 and CUGBP1 might have independent roles in modulating some properties of SG and PB formation or dispersal.

      As above, the changes we see in SG formation and dispersal are not small. There are already numerous studies of the effects of DM1 on gene expression and mRNA splicing. This is not what we set out to study: we are interested in perturbations to the organisation of cellular structures associated with the expression of the CUGexp repeat RNA characteristic of DM1. We do show some data relating specifically to the proteins MBNL1 and CUGBP1 in the paper. shRNA resulting in almost complete loss of these proteins has much smaller effects than the expression of CUGexp RNA, suggesting that the major part of the effects caused by expression of the CUGexp RNA is not mediated through changes in MBNL1 or CUGBP1 levels. MBNL1 and CUGBP1 levels may well contribute to alterations in SG dynamics, but our data suggest that they are minor contributors. This is an advance in our current knowledge.

      1. The authors could investigate whether the CUG repeat RNA itself is localized to SGs or PBs in their models, and whether the presence of the repeat RNA is absolutely necessary for regulating the dynamics of SG or PB formation.

      We have now done this. The CUG repeat RNA is not localised in stress granules or PBs to a detectable extent. This suggests that the effect we see on these structures by expression of the expanded RNA occurs despite the absence of the RNA from the structures. This is similar to the effects of the ALS-associated paraspeckle protein FUS, which can affect the integrity of nuclear LLPS structures (gems) despite not co-localising with them (https://doi.org/10.1016/j.celrep.2012.08.025). We have added these data to the manuscript, as part of draft figure 8, and will add text emphasising this as an additional example of disease-causing macromolecules affecting the structure of LLPS domains in which they are not found.

      1. The authors use 1,6-hexanediol to suggest that PBs and SGs in HeLa cells show behavior analogous to LLPS. However, the use of 1,6-hexanediol to establish an assembly as a LLPS is a relatively limited analysis (despite its widespread use in the field), since this compound can affect the formation of multiple cellular substructures that are not always LLPS (for example, see Wheeler et al, 2016, eLife).

      We are aware of and have cited this publication. Our comments about LLPS structures are measured, as there is still controversy about how to definitively identify them in cells. SGs and PBs have, however, previously been widely published to be formed by LLPS. The rapid exchange of SG and PB components during FRAP and the ability of SGs to both fuse and bud (seen in our supplementary movies) are also supportive of these structures behaving as LLPS structures in our models. Wheeler et al. showed that, in yeast, the nuclear pore complex and some cytoskeletal structures were affected by 1,6-hexanediol but membrane-bound structures such as the ER and mitochondria were not. The disruption of the nuclear pore complex is not unexpected, since phase separation is involved in cargo shuttling through the NPC (reviewed in https://doi.org/10.1016/j.devcel.2020.06.033). We will revise our discussion to make it clearer that we are not relying only on the use of 1,6-hexanediol to define SGs and PBs as LLPS structures but also on other aspects of their dynamic behaviour and on extensive prior literature.

      Significance

      This study would be of interest to the field if the impact of the DM1 repeat RNAs on PB and SG were more substantial...

      As above, the effects we see on SG formation and loss are substantial. Tissue types affected in DM1 are prone to stress, particularly the lens of the eye, so alterations to cellular response to stress associated with the presence of CUG repeats are of key importance to understanding the cellular pathology of DM1.

      ...and if some functional consequences were demonstrated.

      In terms of function, we show altered responses to stress caused by the expression of CUGexp RNA and probably mediated through alterations in the propensity of LLPS cytoplasmic structures (SGs and PBs) to form and be resolved. Additionally, we can now show that SGs in HeLa cells expressing CUGexp RNA contain less total polyA RNA than is seen in controls, and that ‘docking’ events between SGs and PBs are compromised in cells with CUGexp RNA. These docking events are proposed to mediate transfer of RNA from SGs to PBs (reviewed in https://doi.org/10.1007/978-1-4614-5107-5_12). These new data demonstrate functional impairment of SGs and PBs associated with DM1. We have included this as an additional draft figure 8.

      The lack of a strong effect on SG or PB formation in the DM1 models, along with the CUG repeat-independent effect of MBNL1 on the formation and dispersal of these complexes, argues that MBNL1/CUGBP1 may not significantly affect the formation or dispersal of SGs and PBs.

      We are actually not arguing that MBNL1 and CUGBP1 are the main effectors in the changes we see to SGs and PBs, but that the CUGexp RNA is the key player, so we are a little confused by this comment.


      Reviewer #2:

      In the current study, the authors compared the dynamics of P-bodies (PBs) and stress granules (SGs) between control and several DM1 cell lines. They found that MBNL1 and CUGBP1, two CUG repeat RNA-binding proteins that are primarily nuclear, could also co-localize with PBs in the cytoplasm and re-localize to SGs under stress. Small differences were observed in SG assembly and disassembly dynamics between control and DM1 HLECs, between HeLa cells expressing either CTG12 or CTG960, and between HeLa cells with and without shRNAs targeting CUGBP1 or MBNL1.

      As detailed above, the alterations in SG assembly and disassembly in cells expressing CUGexp RNA are not small, in contrast to those in cells with lowered expression of MBNL1 and CUGBP1, which are much smaller, suggesting that the changes caused by CUGexp RNA largely do not result from loss of MBNL1 (or CUGBP1). We have inserted additional graphs of ‘proportion of cells with stress granules against time’ and will modify the text to emphasise both of these points.

      Overall, the experiments were clearly described and the results properly presented. However, critical controls, as detailed below, are missing in multiple analyses. The mechanisms underlying these apparent differences are also unknown.

      We do not consider that any ‘critical controls’ are missing, but can supply all of the additional analysis of our data that the reviewer requests below. We can also now provide additional mechanistic insight and will add an additional figure showing lowered amount of polyA RNA in stress granules in cells expressing CUGexp RNA and compromised docking events between stress granules and P-bodies, suggesting impaired communication between them.

      Major concerns:

      1. Throughout the study, the authors compared MBNL1 and CUGBP1 association with PBs and SGs without considering the potential differences in their cytoplasmic abundance between control and DM1 cell lines, which seems to be the case for MBNL1 abundance in CTG960-expressing HeLa cells (Fig. 3). Provided that PBs and SGs exchange components with the cytosol at an equilibrium, if the cytoplasmic abundance of, for example, MBNL1 is decreased in DM1, one would expect the equilibrium to be shifted, resulting in less MBNL1 associated with PB/SG. Therefore, before measuring the association or the assembly/disassembly kinetics of PB and SG, the authors should first test whether MBNL1 and CUGBP1 abundance may be different between control and DM cell lines.

      There is, in fact, no difference in the relative cytoplasmic abundance of GFP-MBNL1 between CTG12- and CTG960-expressing HeLa cells. Each has approximately a 50/50 split between nucleus and cytoplasm, with <3% of nuclear GFP-MBNL1 found in nuclear CUGexp foci when they are present. We have added a graph demonstrating this to the supplementary data. The abundance of total endogenous MBNL1 is also not altered in DM1 patient-derived lens cell lines compared to controls, as shown by semi-quantitative western blot analysis, which we have also added to the supplementary data. However, if the expression of CUGexp RNA did cause a major loss of cytoplasmic MBNL1, this change would be reflective of the situation seen in DM1 and would not invalidate our results or conclusions.

      The same caveat applies to MBNL1/CUGBP1 knockdown experiments, where knocking down one may change the abundance of the other.

      To carry out FRAP experiments or live cell analysis of SG formation and loss, it is necessary to over-express a tagged version of the protein being studied. For the knockdown experiments shown in figure 6, therefore, when we knocked down MBNL1, CUGBP1 was present in excess as a GFP-tagged protein and when we knocked down CUGBP1, MBNL1 was present in excess as a GFP-tagged protein. Thus, any effects of the knockdowns on expression of the endogenous proteins being analysed would be highly unlikely to influence the results.

      1. Similarly, the authors did not consider the possibility that changes in SG/PB dynamics may be due to changes in the abundance/availability of essential SG/PB components such as GE1 and G3BP1.

      From our immunofluorescence experiments, there was certainly no obvious reduction in GE1 or TIA1 abundance (we did not assess G3BP1). We have quantitative proteomic analysis (unpublished) from a similar pair of cell lines, expressing CUGexp RNA alongside GFP rather than GFP-MBNL1. This shows no change in GE1 or G3BP1, so we would not expect to see any here either. We can easily carry out a quantitative western blot analysis to confirm and will add this to the supplementary data.

      1. Most of the observed differences between control and DM cell lines were modest, leaving one to wonder whether it could be simply due to cell line-to-cell line variability. Whenever possible, the authors should present results for each individual line. For example, in Fig.2, 3 DM1 lines and 2 control lines were used. Was the difference in SG disassembly (Fig. 2B) observed in each of the 3 lines?

      Some of the alterations were modest and there is cell line-to-cell line variability in the lens cell lines. This is why we pooled the data: on average, DM1 cells disperse their SGs more quickly than control cell lines do. This is not an unusual way to present data from patient cell lines of diverse genetic background. We have added data for stress granule loss in the individual cell lines to the supplementary data. These data show a consistent trend towards quicker dispersal of stress granules in patient cell lines. The variability between the patient lens cell lines was also the primary reason for us to develop the inducible system in HeLa cells, on a fixed genetic background, as explained in the manuscript.

      Minor points:

      1. Western blot in Fig. 3 shows two protein products from both endogenous and overexpressed MBNL1. Please explain.

      Many of the commercially available anti-MBNL1 antibodies show this double-band in some cell lines as evidenced in numerous publications and on manufacturers’ websites (for example https://abclonal.com/catalog-antibodies/MBNL1RabbitmAb/A5149, https://www.ptglab.com/products/MBNL1-Antibody-66837-1-Ig.htm). We haven’t analysed the two bands in detail, but assume this to be a result of a post translational modification of some sort. Since GFP-MBNL1 and endogenous MBNL1 show the same thing, we do not consider it to be a major concern. We do mention the double-band as ‘characteristic’ in the figure legend for figure 3 so are not seeking to conceal anything here.

      1. No data were shown to substantiate the statement that "MBNL1 localises to CUGexp foci and CUGBP1 does not" (page 6).

      This has been published many times and is shown in figure 1A. However, we will add in a citation for this and have added an additional supplementary figure showing the lack of co-localisation in the foci from figure 1A more clearly together with separate data confirming that MBNL1 and CUGexp RNA do not co-localise with CUGBP1 in the nuclei of line HeLa_CTG960_GFPMBNL1.

      1. The y-axis of Fig. 4D should not go beyond 1.

      We will trim the axis. There are no data points above 1.0, just the indicator of statistical significance.

      Significance:

      The nature of the current study is highly descriptive with little mechanistic insights.

      Our work is not descriptive, as we observed a change in stress granules in patient cells, which we could then replicate (and enhance) in a novel inducible model of DM1 designed to abrogate the unavoidable variation in patient-derived cell lines. We also now have additional mechanistic insights (see above) and have added an additional figure (draft figure 8) detailing these.

      For the subtle differences observed between control and DM1 cell lines, it remains unclear whether it may be due to cell line-to-cell line variation (see above).

      We cannot completely rule out an influence of cell line-to-cell line variation in the patient-derived lens cell lines (see above), though we think this unlikely as we saw the same effect repeated and amplified in the inducible HeLa-derived cell model, which was designed to minimise this concern. Furthermore, for stress granule loss, we see a larger effect in the HeLa cell model after 72hrs of induction than after 24hrs (figure 5C). This argues strongly that the effects seen are due to the expression of CUGexp RNA and we will emphasise this point more strongly.

      Some differences appear to be specific to one model but not the others (e.g., SG formation is slower in HeLa-CTG960 cells but not in DM1 HLECs).

      Even for the observations that seem consistent between models, the current results yielded little novel biological insights into whether and how these subtle differences in PB/SG dynamics may relate to DM1 pathogenesis. Collectively, these weaknesses render the current study incremental at best.

      The key biological insight the results provide is that the presence of the CUGexp repeat RNA results in defects in LLPS structures that are largely separable from any sequestration of MBNL1 in nuclear foci. With many researchers attributing the cellular defects in DM1 simply to the loss of MBNL1 by sequestration into nuclear foci, both this separation of altered stress response from MBNL1 levels and the involvement of altered LLPS formation (evidenced by the changes in PB behaviour on 1,6-hexanediol treatment) are novel biological insights into the cellular pathology of DM1. Additionally, our results shift the emphasis from nuclear effects to those seen in the cytoplasm.

      In terms of specific DM1 pathogenesis, the eye lens is subject to constant repeated stress and to continued growth throughout the life span, relying on lens epithelial cells as a stem cell pool. Epithelial cells are also vital to the homeostatic regulation of ions, growth factor and nutrient flow from the aqueous humor to the underlying fibre cells. Any alteration in the response of lens epithelial cells, in particular, to stress is highly relevant to the pathology of cataract seen in DM1. We will revise our discussion to emphasise these key points more strongly.


      Reviewer #3

      The manuscript entitled "Phase-separated stress granules and processing bodies are compromised in Myotonic Dystrophy Type 1" by Gulyurtlu et al., characterizes the composition and dynamics of stress granules and P-bodies in two Myotonic Dystrophy Type 1 (DM1) cell models, human lens epithelial cells from DM1 patients and age-matched controls and HeLa_CTG12_GFPMBNL1 and HeLa_CTG960_GFPMBNL1 cell lines. The manuscript is somewhat descriptive with lack of functional data and some discrepancies. For example, in the discussion section, the authors conclude that "MBNL1 appears to be absent from P-bodies in cells with CUGexp foci in their nuclei. This observation suggests that the role of MBNL1 in P-bodies may be disrupted by the presence of CUGexp RNA." Figure 4A shows that "P-bodies in the DM1 model line, HeLa_CTG960_GFPMBNL1 do not contain detectable amounts of GFPMBNL1". However, Figure 4E shows similar levels of total cellular MBNL1 per PB between the control CTG12 and mutated CTG960 lines.

      There is no discrepancy here. The reviewer has misinterpreted our data. PBs in the HeLa CTG960 cell line do not contain detectable amounts of GFP-MBNL1 under normal growth conditions, as shown in figure 4A. The data shown in figure 4E concern arsenite-treated cells, where some PBs in the CTG960 line do contain detectable levels of GFP-MBNL1, but significantly less than in the control CTG12 cells. We will reword these sections to make sure this is clear.

      Most importantly, in Figure S3 the authors show that CUGexp foci are present in 1-2 % of the cells. The claim appears to be too strong for the data presented in the manuscript.

      This is not what is shown in figure S3. The reviewer has misinterpreted our data. Figure S3 shows that in cells from line CTG960, only 1-2% of the total nuclear GFP-MBNL1 signal is found in the CUGexp foci, despite the intensity of the signal within them. Virtually all of the cells from the CTG960 cell line contain CUGexp foci (>95%). We will add a statement to this effect into the results section. We would not have continued working with a cell line in which only 1-2% of cells showed the DM1 phenotype of nuclear CUGexp RNA foci.

      Although the findings are interesting and of potential impact for a better understanding of the implications of RNA-protein condensate dynamics in the pathogenesis of DM1, the work presented here is still descriptive and preliminary in my opinion. In summary, the conclusions are not so convincing and additional experiments are essential to support the authors' claims.

      This reviewer seems to have misinterpreted several of our data sets, including the specific points above, leading to the assertion that our conclusions are not convincing.

      Several months of work will be required to consolidate data and reorganize and ameliorate the manuscript, including the way data are presented and quantified.

      We already have data with which to address the majority of the queries posed, so should be able to make the adjustments relatively quickly.

      Specific comments:

      "On removal of stress, clearance of stress granules is mediated largely by a form of autophagy." This statement is not correct since the majority of stress granules disassemble and are not targeted to autophagy; in healthy cells only 5 % (or less) of the total SGs tend to persist in presence of autophagy or lysosome inhibitors, while the vast majority disassembles. Please rephrase carefully.

      The degree and manner of dispersal of stress granules in healthy cells on removal of stress are not well understood, but are known to differ depending on the type and duration of the stress (DOI: 10.1126/science.abj2400). We do not yet know how this may be altered in DM1; however, compromised autophagy is implicated in cataract formation, which is of relevance to our study. We will re-phrase this section of the discussion carefully to reflect the complex situation.

      Figure 1: RNA-protein complexes have heterogeneous composition. In HLECs, do all PBs colocalize with MBNL1 and CUGBP1 or only a fraction of them?

      We do not routinely see PBs without MBNL1 or CUGBP1 in the HLECs, in contrast to the situation in the HeLa CTG960 line. We have data available in order to quantify this and will add the numbers to the text of the results.

      Figure 2: Stress granules and P-Bodies are known to touch each-other, a process referred to as a "kissing event". The authors have studied the mobility of GFP-MBNL1 inside these two types of assemblies. It would be important also to quantify the "kissing" events. Is this altered in DM1 cells?

      We couldn’t find reference in the literature to ‘kissing events’ between SGs and PBs, but found several references to ‘docking’ events. We have noticed such interactions between PBs and SGs in our models. We are currently quantifying this and our first experiments (one in the HeLa cell model and one comparing one of the patient-derived lens cell lines to a control) suggest that there is a change in the frequency and/or size of such interactions in the HeLa CTG960 cell line compared to the CTG12 control and in the DM1 lens cell line compared to the control. If this holds true in our repeat experiments (currently in progress), this would also provide the mechanistic insight requested by reviewer 2. We have included this, together with our data showing a decrease in total polyA RNA in stress granules in HeLa cells expressing CUGexp RNA, as an additional draft figure 8.

      Figure 3: In HeLa cells overexpressing CTG960_GFPMBNL1, beside the accumulation of one bright CUGexp puncta, several intranuclear GFPMBNL1 protein foci are visible. This subcellular distribution is different from the one observed in the control HeLaCTG12 GFPMBNL1. Can the author describe what these intranuclear GFPMBNL1 protein foci are?

      In most cells expressing CUGexp RNA, several nuclear foci form (usually one large and several smaller) and all of them contain MBNL1 (or GFP-MBNL1 in the HeLa_CTG960_GFPMBNL1 cell line). Figure S3 shows object identification using MBNL1 in this cell line, with two clear foci detected as the reviewer points out. We have added an additional panel to supplementary figure 1 to confirm that the additional foci are also CUGexp RNA foci and will clarify in the text of the results that there is not a single focus of CUGexp RNA in each nucleus.

      Is GFPMBNL1 accumulating at the level of splicing speckles? Or paraspeckles? Or other types of intranuclear condensates such as e.g. PML nuclear bodies? The different intranuclear distribution of GFPMBNL1 should be better characterized.

      The sub-nuclear distribution of MBNL1 is, indeed, very complex. MBNL1 also sometimes co-localises to splicing speckles/interchromatin granule clusters, as we have previously reported in lens epithelial cell lines (DOI: 10.1042/BJ20130870). The details of differences in the nuclear distribution of MBNL1, beyond its accumulation in CUGexp RNA foci, in DM1 cells compared to controls are the subject of another manuscript we have in preparation, but are beyond the scope of the current study.

      Moreover, the % of cells expressing CTG960_GFPMBNL1 and forming intranuclear CUGexp foci is only mentioned in the discussion (Figure S3); for clarity it should be reported in the main text when describing Figure 3.

      The number of cells forming nuclear CUGexp foci on expression of CTG960_GFP-MBNL1 is >95% and we will add this to the text of the results section.

      "Figure S2: Quantitation of GFPMBNL1 in P-bodies in HeLa cell model of DM1." The authors report in the legend "Some, but not all, of these P-bodies contain detectable amounts of GFPMBNL1". However, the figure only shows a representative image of cells without quantification. Quantitation should be provided.

      We have data available to provide this simple quantitation. Approximately 38% of PBs in arsenite-treated cells from line HeLa_CTG960_GFPMBNL1 contain detectable levels of GFPMBNL1 using a manually-assigned cut-off intensity. We will add this to the relevant figure legend (now figure S5). However, this method of analysis requires an intensity to be manually set above which GFP-MBNL1 signal is considered ‘detectable’. This is hugely subjective, and in our opinion, the automatically generated quantitative comparison of “% total cellular MBNL1 per P-body” as shown in figure 4E is a more experimentally robust way to demonstrate a small loss of MBNL1 from P-bodies in cells from line HeLa_CTG960_GFPMBNL1 treated with arsenite compared to the relevant control.

      The authors report "a subtle change in stress granule architecture associated with the presence of CUGexp RNA". This statement is not supported by experimental data and should be omitted.

      We will qualify this statement to make it clear that we are referring to a subtle alteration in the co-localisation between CUGBP1 and MBNL1 specifically in the SGs, as our experimental data shown in figure 4D clearly support that, showing a statistically significant increase in the Pearson’s co-efficient of colocalisation between MBNL1 and CUGBP1 in cells containing CUGexp RNA compared to the relevant control (0.90+/-0.05 for CTG960; 0.87+/-0.07 for CTG12).
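
      For readers less familiar with the measure, the Pearson’s co-efficient quoted above is the ordinary correlation between paired pixel intensities in two channels, restricted to the region of interest. The sketch below is illustrative only and is not the analysis pipeline used for figure 4D; the function name and the example intensities are hypothetical.

```python
import math


def pearson_colocalisation(channel_a, channel_b):
    """Pearson's correlation between two equal-length lists of pixel intensities.

    Hypothetical helper for illustration; each list holds the intensities of
    one channel for the same set of pixels (e.g. those inside a granule mask).
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Made-up intensities: the two channels rise and fall together.
example_r = pearson_colocalisation([10, 20, 30, 40], [12, 25, 33, 47])
```

      A value close to 1 indicates that the two signals vary together across the pixels analysed; the measure says nothing about the absolute amount of either protein.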

      Figure 4. MBNL1 and CUGBP1 co-localise in P-bodies. What is the % of colocalization?

      We’re not sure exactly what is being requested here or what biological question the reviewer is asking us to address. MBNL1 and CUGBP1 co-localise in virtually all PBs (except in the HeLa CTG960 line where MBNL1 is undetectable in PBs under normal growth conditions). Figure 4E shows that, in cells with PBs upregulated by sodium arsenite, the mean amount of total cellular MBNL1 per PB is 0.1%, so it will be similar in cells grown under normal conditions as the PB sizes are similar and they appear to be of similar brightness by immunofluorescence. Again, this would be straightforward to quantify with our existing data if this is, indeed, what the reviewer is requesting, but we question the biological significance. We would be reluctant to derive a Pearson’s co-efficient for the degree of co-localisation between CUGBP1 and MBNL1 in P-bodies as the structures are too small in size for this to be meaningful within the limits of imaging capabilities. We could, however, provide this if this is a specific request.

      Figure 5: "Treatment with sodium arsenite was then carried out under time-lapse microscopy, with Z-stacks of images taken every 4 minutes until stress granule formation was clearly seen (Fig.5A). This revealed a pronounced delay in formation of stress granules in cells containing CUGexp foci (HeLa CTG960 GFPMBNL1, 36 min +/- 12) compared to those without (HeLa CTG12 GFPMBNL1, 15 min +/- 2) (Fig.5B)." Data representation in Figure 5 is unclear and the pronounced delay in stress granule formation is not appreciated. Since the authors performed a live imaging taking pictures every 4 minutes, it would be more informative to plot the data and show the assembly and disassembly kinetics over time for both control and CTG960_GFPMBNL1 cell lines (similar to what is shown in e.g. Gwon et al., Science 2021, Ubiquitination of G3BP1 mediates stress granule disassembly in a context-specific manner, Figure 2G).

      The bar graph in figure 5B shows that cells from the CTG960 line take more than twice as long as controls to form SGs and lose them in half the time, with the precise numbers given in the text. A simple bar graph seemed the clearest way to present this. However, we have plotted our existing data in a manner similar to that in the cited reference and added this to figure 5. These graphs clearly show that the differences we see are at least as great as in other published literature, including the reference given by the reviewer (see below).

      Concerning Figure 1, the authors report no difference in the kinetics of stress granule formation in HLECs. However, they only report data after 45 and 60 min of arsenite treatment; at these time-points the assembly step is maximal. Thus, for consistency, the authors should include earlier time-points to the analysis of stress granule assembly also in HLECs, similar to what was done in HeLa cells in Figure 5.

      The assembly step is not ‘maximal’ in these cell lines after 45 minutes. Figure 2A clearly shows that only ~30% of cells have SGs after 45 minutes of treatment, compared with 100% of cells after 90 minutes shown in figure 2B. We have additional data at 10, 20 and 30 minutes all showing no significant differences. We had omitted them to keep the graph simple, but have now included them as a graph of ‘% of cells with stress granules against time’ in figure 2.

      "Having established that MBNL1 and CUGBP1 co-localise closely in stress granules": the authors investigated the colocalization of each of these two proteins with stress granule markers but they did not verify whether MBNL1 and CUGBP1 co-localise.

      In figure 1B we show that endogenous CUGBP1 and endogenous MBNL1 both co-localise with the stress granule marker TIA1 in stress granules in lens epithelial cells. It would, therefore, be highly unlikely that CUGBP1 and MBNL1 would not co-localise with each other in stress granules. We have also previously verified that GFPMBNL1 behaves identically to its endogenous counterpart (Coleman et al, 2014). Furthermore, in figure 4C and 4D, we show close co-localisation between endogenous CUGBP1 and GFPMBNL1 in stress granules in our HeLa cell model, using high-resolution AiryScan microscopy for which we provide detailed quantitation.

      This aspect should be addressed experimentally since the authors also conclude that "a complex relationship between MBNL1 and CUGBP1 in stress granules" exists. Thus, the authors need to assess the colocalization of GFPMBNL1 with endogenous CUGBP1 in stress granules and the one of GFPCUGBP1 with endogenous MBNL1.

      The complex relationship we propose is based on the effects of CUGBP1 or MBNL1 knockdown on the dynamic behaviours of each other by FRAP assay and not solely on their co-localisation, although we have already analysed their co-localisation in detail as above.

      Figure 6: Please add antibody labeling to microscopy panels A and B.

      Certainly. This was an accidental omission and has been added.

      Moreover, specify if the numbers refer to minutes in panel F. The data representation is also unclear - see comment above, Figure 5.

      As stated in the figure legend and on the graph axes, these numbers have been normalised to the mean time taken for SG formation/loss in the control CTG12 cell line (set at 100%). The precise numbers in minutes for mean and SD are given in the text. We have added additional graphs of ‘% of cells with stress granules against time’ to this figure, with the values in minutes given to clarify the exact time-scale.
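
      To make the normalisation explicit: each time is expressed as a percentage of the mean time measured in the control line, so the control mean sits at 100% by construction. The following is a minimal sketch of that arithmetic only, not the code used to generate the figure; the function name and the example times are hypothetical.

```python
from statistics import mean


def normalise_to_control(times_min, control_times_min):
    """Express times (in minutes) as a percentage of the control mean.

    Hypothetical helper for illustration; the control mean maps to 100%.
    """
    control_mean = mean(control_times_min)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return [100.0 * t / control_mean for t in times_min]


# Made-up values: the control mean is 15 min, so 30 min maps to 200%.
example_percentages = normalise_to_control([30, 45], [10, 20])
```

      Reporting both the normalised values and the underlying times in minutes, as in the added graphs, lets readers see the relative effect and the absolute time-scale together.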

      Figure 7: was 1,6-hexanediol added in presence of arsenite? Or was arsenite removed?

      Arsenite was not removed (neither was Doxycycline) as we wanted to examine the effect of 1,6-hexanediol on SGs and PBs without the added complication of the effects of stress removal. We will clarify this point in the methods/results.

      "Aberrant persistent stress granules have been implicated in age-related (Mateju et al., 2017) and neurodegenerative diseases (Protter and Parker, 2016), such as ALS and FTD (Jain et al., 2016; Markmiller et al., 2018; Zhang et al., 2018). These are proposed to result from increased liquid-to-solid phase transitions within the stress granules (Mateju et al., 2017)." The authors should better define what aberrant stress granules are (e.g. see Ganassi et al., 2016; Turakhiya et al., 2018, PMID: 29804830).

      We will expand on this subject in the discussion.

      "Persistent stress granules have long been associated with degenerative conditions, notably ALS (Li et al., 2013)". I suggest updating the reference adding a more recent one.

      We selected this 2013 review to emphasise that there is a long history of association of persistent stress granules with degenerative conditions. We will add in an additional, more recent review.

      Significance

      The work is descriptive; thus, in this form I do not consider that it is strongly advancing the field.

      Having noted alterations to stress granule disassembly in lens epithelial cells from DM1 patients, we went on to develop a novel inducible model in which we replicated and enhanced these effects by expressing the large CUGexp RNA that causes DM1 as part of a DMPK mini-gene mimicking the genetic mutation seen in DM1 patients. This is not purely descriptive. Furthermore, we are now in a position to add an additional figure showing two pieces of evidence for functional defects in stress granules associated with CUGexp RNA expression: 1) reduced accumulation of total polyA RNA in stress granules, indicating compromised function; and 2) compromised ‘docking’ events between stress granules and P-bodies, a process proposed to be integral to the function of both structures.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript entitled "Phase-separated stress granules and processing bodies are compromised in Myotonic Dystrophy Type 1" by Gulyurtlu et al. characterizes the composition and dynamics of stress granules and P-bodies in two Myotonic Dystrophy Type 1 (DM1) cell models: human lens epithelial cells from DM1 patients and age-matched controls, and the HeLa_CTG12_GFPMBNL1 and HeLa_CTG960_GFPMBNL1 cell lines. The manuscript is somewhat descriptive, with a lack of functional data and some discrepancies. For example, in the discussion section, the authors conclude that "MBNL1 appears to be absent from P-bodies in cells with CUGexp foci in their nuclei. This observation suggests that the role of MBNL1 in P-bodies may be disrupted by the presence of CUGexp RNA." Figure 4A shows that "P-bodies in the DM1 model line, HeLa_CTG960_GFPMBNL1 do not contain detectable amounts of GFPMBNL1". However, Figure 4E shows similar levels of total cellular MBNL1 per PB between the control CTG12 and mutated CTG960 lines. Most importantly, in Figure S3 the authors show that CUGexp foci are present in 1-2 % of the cells. The claim appears to be too strong for the data presented in the manuscript.

      Although the findings are interesting and of potential impact for a better understanding of the implications of RNA-protein condensate dynamics in the pathogenesis of DM1, the work presented here is still descriptive and preliminary in my opinion. In summary, the conclusions are not fully convincing and additional experiments are essential to support the authors' claims. Several months of work will be required to consolidate the data and to reorganize and improve the manuscript, including the way data are presented and quantified.

      Specific comments:

      "On removal of stress, clearance of stress granules is mediated largely by a form of autophagy." This statement is not correct since the majority of stress granules disassemble and are not targeted to autophagy; in healthy cells only 5 % (or less) of the total SGs tend to persist in presence of autophagy or lysosome inhibitors, while the vast majority disassembles. Please rephrase carefully.

      Figure 1: RNA-protein complexes have heterogeneous composition. In HLECs, do all PBs colocalize with MBNL1 and CUGBP1 or only a fraction of them?

      Figure 2: Stress granules and P-bodies are known to touch each other, a process referred to as a "kissing event". The authors have studied the mobility of GFP-MBNL1 inside these two types of assemblies. It would also be important to quantify the "kissing" events. Are these altered in DM1 cells?

      Figure 3: In HeLa cells overexpressing CTG960_GFPMBNL1, besides the accumulation of one bright CUGexp punctum, several intranuclear GFPMBNL1 protein foci are visible. This subcellular distribution is different from the one observed in the control HeLaCTG12 GFPMBNL1. Can the authors describe what these intranuclear GFPMBNL1 protein foci are? Is GFPMBNL1 accumulating at the level of splicing speckles? Or paraspeckles? Or other types of intranuclear condensates such as PML nuclear bodies? The different intranuclear distribution of GFPMBNL1 should be better characterized. Moreover, the % of cells expressing CTG960_GFPMBNL1 and forming intranuclear CUGexp foci is only mentioned in the discussion (Figure S3); for clarity it should be reported in the main text when describing Figure 3.

      "Figure S2: Quantitation of GFPMBNL1 in P-bodies in HeLa cell model of DM1." The authors report in the legend "Some, but not all, of these P-bodies contain detectable amounts of GFPMBNL1". However, the figure only shows a representative image of cells without quantification. Quantitation should be provided.

      The authors report "a subtle change in stress granule architecture associated with the presence of CUGexp RNA". This statement is not supported by experimental data and should be omitted.

      Figure 4. MBNL1 and CUGBP1 co-localise in P-bodies. What is the % of colocalization?

      Figure 5: "Treatment with sodium arsenite was then carried out under time-lapse microscopy, with Z-stacks of images taken every 4 minutes until stress granule formation was clearly seen (Fig.5A). This revealed a pronounced delay in formation of stress granules in cells containing CUGexp foci (HeLaCTG960 GFPMBNL1, 36 min +/- 12) compared to those without (HeLaCTG12 GFPMBNL1, 15 min +/- 2) (Fig.5B)." Data representation in Figure 5 is unclear and the pronounced delay in stress granule formation cannot be appreciated. Since the authors performed live imaging, taking pictures every 4 minutes, it would be more informative to plot the data and show the assembly and disassembly kinetics over time for both control and CTG960_GFPMBNL1 cell lines (similar to what is shown in e.g. Gwon et al., Science 2021, Ubiquitination of G3BP1 mediates stress granule disassembly in a context-specific manner, Figure 2G). Concerning Figure 1, the authors report no difference in the kinetics of stress granule formation in HLECs. However, they only report data after 45 and 60 min of arsenite treatment; at these time-points the assembly step is maximal. Thus, for consistency, the authors should include earlier time-points in the analysis of stress granule assembly in HLECs as well, similar to what was done in HeLa cells in Figure 5.

      "Having established that MBNL1 and CUGBP1 co-localise closely in stress granules": the authors investigated the colocalization of each of these two proteins with stress granule markers but they did not verify whether MBNL1 and CUGBP1 co-localise. This aspect should be addressed experimentally since the authors also conclude that "a complex relationship between MBNL1 and CUGBP1 in stress granules" exists. Thus, the authors need to assess the colocalization of GFPMBNL1 with endogenous CUGBP1 in stress granules and the one of GFPCUGBP1 with endogenous MBNL1.

      Figure 6: Please add antibody labeling to microscopy panels A and B. Moreover, specify if the numbers refer to minutes in panel F. The data representation is also unclear - see comment above, Figure 5.

      Figure 7: was 1,6-hexanediol added in presence of arsenite? Or was arsenite removed?

      Minor comments:

      "Aberrant persistent stress granules have been implicated in age-related (Mateju et al., 2017) and neurodegenerative diseases (Protter and Parker, 2016), such as ALS and FTD (Jain et al., 2016; Markmiller et al., 2018; Zhang et al., 2018). These are proposed to result from increased liquid-to-solid phase transitions within the stress granules (Mateju et al., 2017)." The authors should better define what aberrant stress granules are (e.g. see Ganassi et al., 2016; Turakhiya et al., 2018, PMID: 29804830).

      "Persistent stress granules have long been associated with degenerative conditions, notably ALS (Li et al., 2013)". I suggest updating the reference adding a more recent one.

      Significance

      The work is descriptive; thus, in this form I do not consider that it is strongly advancing the field.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In the current study, the authors compared the dynamics of P-bodies (PBs) and stress granules (SGs) between control and several DM1 cell lines. They found that MBNL1 and CUGBP1, two CUG repeat RNA-binding proteins that are primarily nuclear, could also co-localize with PBs in the cytoplasm and re-localize to SGs under stress. Small differences were observed in SG assembly and disassembly dynamics between control and DM1 HLECs, between HeLa cells expressing either CTG12 or CTG960, and between HeLa cells with and without shRNAs targeting CUGBP1 or MBNL1. Overall, the experiments were clearly described and the results properly presented. However, critical controls, as detailed below, are missing in multiple analyses. The mechanisms underlying these apparent differences are also unknown.

      Major concerns:

      1. Throughout the study, the authors compared MBNL1 and CUGBP1 association with PBs and SGs without considering the potential differences in their cytoplasmic abundance between control and DM1 cell lines, which seems to be the case for MBNL1 abundance in CTG960-expressing HeLa cells (Fig. 3). Provided that PBs and SGs exchange components with the cytosol at equilibrium, if the cytoplasmic abundance of, for example, MBNL1 is decreased in DM1, one would expect the equilibrium to shift, resulting in less MBNL1 associated with PB/SG. Therefore, before measuring the association or the assembly/disassembly kinetics of PB and SG, the authors should first test whether MBNL1 and CUGBP1 abundance may be different between control and DM cell lines. The same caveat applies to MBNL1/CUGBP1 knockdown experiments, where knocking down one may change the abundance of the other.
      2. Similarly, the authors did not consider the possibility that changes in SG/PB dynamics may be due to changes in the abundance/availability of essential SG/PB components such as GE1 and G3BP1.
      3. Most of the observed differences between control and DM cell lines were modest, leaving one to wonder whether they could be simply due to cell line-to-cell line variability. Whenever possible, the authors should present results for each individual line. For example, in Fig.2, 3 DM1 lines and 2 control lines were used. Was the difference in SG disassembly (Fig. 2B) observed in each of the 3 lines?

      Minor points:

      1. Western blot in Fig. 3 shows two protein products from both endogenous and overexpressed MBNL1. Please explain.
      2. No data were shown to substantiate the statement that "MBNL1 localises to CUGexp foci and CUGBP1 does not" (page 6).
      3. The y-axis of Fig. 4D should not go beyond 1.

      Significance

      The nature of the current study is highly descriptive with little mechanistic insight. For the subtle differences observed between control and DM1 cell lines, it remains unclear whether they may be due to cell line-to-cell line variation (see above). Some differences appear to be specific to one model but not the others (e.g., SG formation is slower in HeLa-CTG960 cells but not in DM1 HLECs). Even for the observations that seem consistent between models, the current results yielded little novel biological insight into whether and how these subtle differences in PB/SG dynamics may relate to DM1 pathogenesis. Collectively, these weaknesses render the current study incremental at best.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript by Gulyurtlu and co-workers addresses the role of CUG expanded repeat RNA associated with DM1 in regulating the formation of higher order RNP assemblies such as stress granules and P-bodies in the cell. The authors used lens epithelial cells (hLECs) derived from a DM1 patient or a HeLa cell inducible model of DM1 to investigate whether expression of the CUG repeat-associated proteins MBNL1 and CUGBP1 affected the formation and dispersal of stress granules and P-bodies. The authors show that MBNL1 and CUGBP1 are components of SGs and PBs in hLECs and HeLa cells. In cells expressing the CUG repeat, there are minor alterations in the dispersal of stress granules as well as in the formation of P-bodies. MBNL1 could affect the formation and dispersal of SGs independent of the CUG repeat. Finally, in HeLa cells, overexpression of MBNL1 can reduce the dispersal of P-bodies upon 1,6-hexanediol treatment.

      Major comments:

      One limitation of the work is that the perturbations seen with stress granules or P-bodies are all relatively small, and no evidence for a functional consequence on gene expression is demonstrated. Specifically, the authors observe only minor alterations in the formation or disaggregation of PBs and SGs in these DM1 models. Further, some of the effects observed are independent of the CUG repeat expression, suggesting that MBNL1 and CUGBP1 might have independent roles in modulating some properties of SG and PB formation or dispersal.

      1. The authors could investigate whether the CUG repeat RNA itself is localized to SGs or PBs in their models, and whether the presence of the repeat RNA is absolutely necessary for regulating the dynamics of SG or PB formation.
      2. The authors use 1,6-hexanediol to suggest that PBs and SGs in HeLa cells show behavior analogous to LLPS. However, the use of 1,6-hexanediol to establish an assembly as a LLPS is a relatively limited analysis (despite its widespread use in the field), since this compound can affect the formation of multiple cellular substructures that are not always LLPS (for example, see Wheeler et al, 2016, eLife).

      Significance

      This study would be of interest to the field if the impact of the DM1 repeat RNAs on PB and SG were more substantial, and if some functional consequences were demonstrated. The lack of a strong effect on SG or PB formation in the DM1 models, along with the CUG repeat-independent effect of MBNL1 on the formation and dispersal of these complexes, argues that MBNL1/CUGBP1 may not significantly affect the formation or dispersal of SGs and PBs.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1:

      This paper puts together a nice set of data showing that a specific gene called Resf1, when deleted, affects the ability of ESCs to self-renew and proceed to germline fates. I believe the data are sound and that they provide the evidence needed for the authors to make their conclusions. While I think the data are presented well and the manuscript is well-written, the "modest" functional results suggest this work would be more suited for a specialized journal.

      We thank Reviewer 1 for their supportive comments.

      Reviewer #2:

      1. In the presence of LIF, there is no difference between Resf1 knockout mESCs and WT mESCs except the expression of Esrrb, Nanog and Pou5f1. What about other genes? RNA-seq is needed to distinguish the two cell lines.

      Fukuda et al. have shown that deletion of Resf1 leads to misregulation of ~1000 genes (adj. p-value 2) in presence of LIF. This highlights large differences between transcriptomes of Resf1 KO and WT cells that occur despite only a marginal difference in self-renewal efficiency between Resf1 KO cells and WT in the presence of LIF. It is therefore questionable whether the time and resources required to perform the requested RNA-seq would produce data that could unambiguously identify the potential causative effector difference downstream of Resf1.

      As an alternative approach, we have reanalysed the Fukuda et al RNA-seq data. We find that Esrrb is significantly downregulated (in agreement with our Q-RT-PCRs), as are Klf4 and LifR (FDR 1.5). However, our meta-analysis of the Fukuda et al data did not show Pou5f1 and Nanog to be differentially expressed (FDR 1.5). This is in line with the lower level of downregulation of Pou5f1 and Nanog, compared to Esrrb in our Q-RT-PCR data. Notably, our gene expression analyses were performed in 5 biological replicates, whereas Fukuda et al. performed RNA-seq in two biological replicates. We can include the meta-analysis of the Fukuda et al data in our submission. As the change in ESC self-renewal that we see at low LIF concentrations could result from a decrease in Lifr expression, we will verify the change in expression of Lifr by Q-RT-PCR. Importantly, we will do this in a way that discriminates between expression of the transmembrane Lifr and soluble LifR, since the latter acts antagonistically (PMID: 9396734, Chambers, BJ, 1997).


      2. The authors showed Resf1 is not required for Nanog function, so how does Resf1 regulate the expression of pluripotency genes? Through epigenetic modifications or signaling pathways? The authors should design experiments to explain the detailed mechanisms.

      The strength of the immunoblot signal for RESF1 is low, even when Resf1 is expressed episomally. Therefore, although we could try to co-immunoprecipitate with the Resf1-v5 cell line and endogenous Nanog, the expression level of RESF1 may mean this effort is unsuccessful. Given the fact that the result will not affect the conclusions of our study, we do not think this effort is justifiable.

      3. The authors showed that Resf1 interacts with Nanog, but they used forced expressed proteins. Does endogenous Resf1 interact with endogenous Nanog? Do they bind to the same DNA sequences?

      These are important questions to answer. However, many more experiments would be required to reach firm conclusions. The reviewer is right to say that the mechanisms by which Resf1 affects pluripotency are unknown and remain to be answered in future. We therefore propose to improve the text discussing similarities in pluripotency phenotype between deletions of Trim28, SETDB1, YTHDC1 and RESF1. As deletion of the RESF1 partner SETDB1 or of other proteins involved in repression of retrotransposons leads to downregulation of pluripotency genes and in some cases collapse of ESCs (e.g. PMID: 19884255, Bilodeau et al. 2009; PMID: 19884257, Yuan et al. 2009), we hypothesise that the RESF1 phenotype may be explained by an effect on SETDB1 chromatin binding and therefore on repression of SETDB1 targets. The mild phenotype of RESF1 KO indicates that RESF1 would not be an essential component of this repressor complex but rather “a modulatory protein”.

      It is also worth noting that the meta-analysis of the RNA-seq data from Fukuda et al. suggests that Resf1-null ESCs may express reduced levels of LifR mRNA, and this is something we plan to investigate.


      4. In Figure 5C, some Resf1-positive cells were Nanog-negative. Are these Nanog-negative cells pluripotent?

      Nanog-null ESCs are pluripotent (PMID: 18097409, Chambers et al., 2007). In addition, NANOG-negative cells in FCS/LIF cultures can retain pluripotency. Our purpose in this figure was therefore not to say whether NANOG-negative:RESF1-positive cells are pluripotent but to draw attention to the broader expression of RESF1 in FCS/LIF compared to NANOG. Such broader expression has also been noted for other heterogeneously expressed factors (PMID: 31582397, Pantier et al. 2019).

      5. In Figure 6A, the naïve mESCs are induced to EpiLCs. Is the transition efficiency of Resf1 knockout cells the same as that of WT mESCs? The finally obtained PGCLCs should be identified.

      We show that the key TFs of the EpiLC state are expressed similarly in WT and Resf1 KO cells (Supplementary Figure 4) and we have data showing that WT and Resf1 KO EpiLCs have a similar morphology. Together this suggests an efficient transition to an EpiLC state. Our analysis has identified expression of Blimp1/Ap2g/Prdm14 in Resf1-null cultures. Compared to wild-type cells these levels are reduced up to 3-fold. As this is from an unsorted population and the number of SSEA1/CD61-positive cells is decreased approximately 2-fold, this suggests that the PGCLC population formed by Resf1-null cells is reduced in proportion but is otherwise normal.

      We will add photographs of EpiLC colonies formed by Resf1 KO and WT cells.


      1. In Figure 5C, the scale bar is missing.

      We will add the missing scale bars in Figure 5C.

      Reviewer #3:

      1. What was less clear was an explanation of why colonies 4 and 24 were chosen. Were there other colonies with the desired expression? Was this amount of expression repeated in replicative experiments with approximately 2 colonies only available to be selected?

      Approximately 30 colonies were selected for analysis. Of these, only 2 had deletion of both Resf1 alleles. We will make this point clearer in the text.


      2. Figure 1C, 5C and S2B with microscopic images should include a scale bar.

      The missing scale bars in Figure 1C will be added. Unfortunately, the microscopy setup used to collect the images in Figures 5C and S2B did not allow scale bars to be added at the time of imaging, and these cannot be added retrospectively. However, we do not think that inclusion of scale bars, even were it possible, would affect the conclusions of our manuscript.

      3. Figure 1E needs a better explanation of the significance, "less clear cut" is not adequate. Reporting statistics, or lack of significance, on the graph would help.

      We will update the manuscript and Figure 1E to include the results of a statistical analysis (Wilcoxon rank-sum test) comparing formation of AP+ colonies between Resf1 KO and WT cells at different LIF concentrations. These results show that both Resf1 KO cell lines have a lower median number of AP+ colonies than WT cells at LIF concentrations 0 and 1 (p.adj. *

      4. Its translatability to medicine, although perhaps that is not the intention, is somewhat lacking. Is there a naturally occurring situation where LIF is absent that would require this pathway to be used? These were mouse ESCs; perhaps this study could incorporate information about relevant translation to a human condition to aid in the significance. This manuscript suggests a mechanistic evaluation by which self-renewal can occur other than the canonical pathway, which is interesting and can inform the field.

      Our results suggest that RESF1 directly or indirectly supports self-renewal of ESCs. Interestingly, the Human cell atlas identified RESF1 expression as a negative predictor of survival in renal cancer, and RESF1 was found to be expressed in testis cancer cells and other cancer tissues. Therefore, RESF1 could promote self-renewal of cancer cells similarly to ESCs. However, this is speculative and needs further study. As this is outside both the scope of this manuscript and our expertise, we do not think it prudent for us to pursue this line of inquiry. However, we agree that further studies could evaluate RESF1 function in human tissues, especially pluripotent cells and germ cells. As we show that RESF1 deletion leads to reduced induction of PGCLCs and previous studies showed infertility of Resf1 KO mice, investigating the link between human fertility and RESF1 could have implications in reproductive medicine.

      We will improve our discussion to highlight the possible significance of RESF1 function in human fertility.





    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The authors aimed to study RESF1 in ESCs to understand its role in germ cell specification and PGCLC differentiation under in vitro experimental challenges.

      The experiments performed and reported were thorough and convincing. Data and methods were clearly explained.

      What was less clear was an explanation of why colonies 4 and 24 were chosen. Were there other colonies with the desired expression? Was this amount of expression repeated in replicative experiments with approximately 2 colonies only available to be selected?

      Figure 1C, 5C and S2B with microscopic images should include a scale bar.

      Figure 1E needs a better explanation of the significance, "less clear cut" is not adequate. Reporting statistics, or lack of significance, on the graph would help.

      Significance

      Understanding the specific interactions and suggested role of RESF1 in self-renewal is informative on a molecular biology and developmental biology level.

      Its translatability to medicine, although perhaps that is not the intention, is somewhat lacking. Is there a naturally occurring situation where LIF is absent that would require this pathway to be used? These were mouse ESCs; perhaps this study could incorporate information about relevant translation to a human condition to aid in the significance. This manuscript suggests a mechanistic evaluation by which self-renewal can occur other than the canonical pathway, which is interesting and can inform the field.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The authors uncovered new roles of Resf1 in mESC self-renewal and germline entry. They showed that Resf1 deletion reduced mESC self-renewal, and that Resf1 is not required for Nanog function. In addition, the efficiency of PGCLC specification of Resf1 knockout mESCs is lower than that of WT mESCs. However, these conclusions are too preliminary and the underlying mechanism is missing.

      Major comments:

      1. In the presence of LIF, there is no difference between Resf1 knockout mESCs and WT mESCs except the expression of Esrrb, Nanog and Pou5f1. What about other genes? RNA-seq is needed to distinguish the two cell lines.
      2. The authors showed Resf1 is not required for Nanog function, so how does Resf1 regulate the expression of pluripotency genes? Through epigenetic modifications or signaling pathways? The authors should design experiments to explain the detailed mechanisms.
      3. The authors showed that Resf1 interacts with Nanog, but they used forced expressed proteins. Does endogenous Resf1 interact with endogenous Nanog? Do they bind to the same DNA sequences?
      4. In Figure 5C, some Resf1-positive cells were Nanog-negative. Are these Nanog-negative cells pluripotent?
      5. In Figure 6A, the naïve mESCs are induced to EpiLCs. Is the transition efficiency of Resf1 knockout cells the same as that of WT mESCs? The finally obtained PGCLCs should be identified.

      Minor comments:

      1. In Figure 5C, the scale bar is missing.

      Significance

      The authors uncovered the new roles of Resf1 in mESC self-renewal and germline entry.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This paper puts together a nice set of data showing that a specific gene called Resf1, when deleted, affects the ability of ESCs to self-renew and proceed to germline fates. I believe the data are sound and that they provide the evidence needed for the authors to make their conclusions.

      Significance

      While I think the data are presented well and the manuscript is well-written, the "modest" functional results suggest this work would be more suited for a specialized journal.