Reviewer #1 (Public Review):
Summary:
Moody et al. presented a comprehensive investigation into the choice of marker genes and its impact on the reconstruction of the early evolution of life, especially regarding the length of the branch that separates domains Bacteria and Archaea in the phylogenetic tree. Specifically, this work attempts to resolve a debate raised by a previous work: Zhu et al. Nat Commun. 2019, that the evolutionary distance between the two domains is short as estimated using an expanded set of marker genes, in contrast to conventional strategies which involve a small number of "core" genes and indicate a long branch.
Through a series of analyses on 1000 genomes, Moody et al. defended the use of core genes, and reinforced the conventional notion that the inter-domain branch (the AB branch) is long, as inferred by the core gene set. They proposed that with the 381 marker genes (the "expanded" set) used by Zhu et al., the observed short branch length is an artifact due to inter-domain gene transfer and hidden paralogy. Through topology tests, they ranked the markers by "verticality", and showed that it is positively correlated with the AB branch length. They also conducted divergence time estimation and showed that even the most vertical genes led to an implausible estimate of the origin of life.
In parallel, Moody et al. surveyed the best marker genes using a set of 700 genomes. They recovered 54 markers, and demonstrated that ribosomal markers do not indicate a longer AB branch than non-ribosomal markers do. With the better half (27) of these marker genes, they conducted further phylogenetic analyses, which shows that potential substitutional saturation and the use of site-homogeneous models could contribute to the underestimation of the AB branch. Using this taxon set and marker set, they reconstructed the prokaryotic tree of life, which revealed a long AB branch, a basal placement of DPANN in Archaea, and a derived placement of CPR in Bacteria.
Prokaryotic tree of life:
The scope(s) of the manuscript is somehow split. First, it is posed as a point-to-point rebuttal to the Zhu et al. paper, on the long vs. short AB branch question. Second, it introduces a new phylogeny of prokaryotes using 27 "good" marker genes, and demonstrates that DPANN is basal to Archaea, and CRP is derived within Bacteria.
The second scope has inadequate novelty. A recent paper (Coleman et al. Science. 2021), which was from a partially overlapping group of authors, was dedicated to the topic of CPR placement, and indicated the same conclusion (CPR being derived and sister to Chloroflexi) as the current work does, albeit using more sophisticated approaches. The paper also addressed the debate of CPR placement (including citing the Zhu et al. paper). Additionally, the basal placement of DPANN has also been suggested by previous works (such as Castelle and Banfield. Cell. 2018). Therefore, re-addressing these two topics using a largely well-established and repeatedly adopted method on a relatively small taxon set does not constitute a significant extension of current knowledge.
The debate:
The first scope appears to be the more important goal of this manuscript, as it extensively discusses the claims made by Zhu et al. and presents a point-to-point rebuttal, including counter evidence. This may narrow the interest of this work to a small audience of specialists. Nevertheless, to best evaluate the current work, it is necessary to review the Zhu et al. paper and compare individual analyses and conclusions of the two studies.
In doing so, I found that the two articles have distinct scopes that appear similar but not actually inline. To a large extent, the current work does not constitute actual rebuttal to the points made by Zhu et al. In contrast, some of the analyses presented in the current work support those by Zhu et al., despite being interpreted in a different way. For the claims that directly contest Zhu et al., I do not see sufficient evidence that they are supported by the analyses.
Below is a summary of the comparison, which I will explain point-by-point in later paragraphs.
- Moody et al. assessed AB branch length, while Zhu et al. assessed AB evolutionary distance (which is different). <br>
- Moody et al. evaluated the phylogeny indicated by a small number of core markers, while Zhu et al. evaluated the genome average using hundreds of global markers. <br>
- Zhu et al.'s results also showed that gene non-verticality, substitutional saturation, and site-homogeneous models shorten the AB distance, which is consistent with Moody et al.'s. <br>
- However, Zhu et al. found that some core markers are outliers in the genome-wide context, and the long AB distance indicated by them cannot be compensated for by the aforementioned effects. Moody et al. hasn't addressed this. <br>
Therefore, the novelty and potential impact of the current work is less compelling: It used a classical method (a few dozen core genes) and found a pattern that has been found many times by some of the same authors and others (including Zhu et al., who also analyzed core genes).
AB distance metric:
There is a subtle but critical difference between the scopes of the two papers: The Zhu et al. paper "reveals evolutionary proximity between domains Bacteria and Archaea". By stating "evolutionary proximity", it investigated two metrics: <br>
The length of the branch separating Archaea from Bacteria in the phylogenetic tree, i.e., the "AB branch". This was the main focus of the current work.
The average tip-to-tip distance (sum of branch lengths) between pairs of Archaea and Bacteria taxa in the tree. A significant proportion of the Zhu et al. work was discussing this metric, and it led to several important conclusions (e.g., Figs. 4F, 5). The current work has not explored this metric.
These two metrics implicate distinct research strategies: For 1), HGTs and paralogy are usually considered problematic (as the current and many previous works argued). However, 2) is naturally compatible with the presence (and prevalence) of HGTs and paralogy.
Authors of the current work equate "genetic distance" to "branch length" (line 70), and only investigated the latter. This equation is misleading. If organism groups A and B diverged early, but then exchanged many genes post-divergence, then this is indisputable evidence that their "genetic distance" is close. This point needs to be clearly explained in the manuscript.
Core vs genome:
This difference between "AB distance" and "AB branch length" is relevant to a more fundamental question: What defines the "evolutionary distance" between two groups of organisms? Both papers did not explicitly discuss this topic. It likely cannot be resolved in one article (as many scholars have continuously attempted on related topics in the past decades). But the discordance in understanding led to very different research strategies in the two papers, and rendering them incongruent in methodology.
Specifically, the current work (and multiple previous works) based phylogenetic inference on only genes that demonstrate a strong pattern of vertical evolution. HGTs were considered deleterious, and needed to be excluded from the analysis. This left a few dozen genes at most, and many are spatially syntenic and functionally related (e.g. ribosomal proteins). In this work, the final number is 27. Previous critiques of this methodology have suggested that this is not a tree of life, but a "tree of one percent" (Dagan and Martin, Genome Biol. 2006).
In contrast, Zhu et al. (and related previous works) attempted to evaluate the evolution of whole genomes by "maximizing the included number of loci.". They used a "global" set of 381 genes. They faced the challenge of "reconciling discordant evolutionary histories among different parts of the genome", because "HGT is widespread across the domains". To resolve this, they adopted the gene tree summary method ASTRAL.
Therefore, the "AB distance" estimated by Zhu et al. is a genome-level distance, calculated by merging conflicting gene evolutions (which itself can be disputed, see below). Whereas the "AB branch" evaluated in this work is strictly the branch length in the core gene evolution. Therefore, the results presented in the two papers do not necessarily conflict, because of the different scopes.
The expanded marker set:
The authors made a valid critique (line 121-135) that many of the 381 genes in the "expanded marker set" adopted by Zhu et al., are under-represented in Archaea. According to the PhyloPhlAn paper (Segata et al. Nat Commun. 2013) which originally developed the 400 markers (a superset of the 381 markers), these genes were selected from ~3,000 bacterial and archaeal genomes available in IMG at that time time (note that it was 2013). Zhu et al. also admitted, in the discussion section, that this marker set falls short in addressing some questions (such as the placement of DPANN). What is important in the current context, is that they were not specifically selected to address the AB distance question.
However, note that Zhu et al.'s Fig. 5A, B presented the AB distance informed by 161 out of the 381 genes. These genes have at least 50% taxa represented in both domains - the same threshold discussed in the current work (line 132). Even with those sufficiently represented genes, they still found that ribosomal proteins and a few other core genes are "outliers" in the far end of the AB distance spectrum.
Domain monophyly in gene trees:
The authors' efforts in manually checking the gene trees are appreciable (Table S1), considering the number and size of those trees. They found (line 147) "Archaea and Bacteria are recovered as reciprocally monophyletic groups in only 24 of the 381 published (Zhu et al., 2019) maximum likelihood (ML) gene trees of the expanded marker set."
The domain monophyly check was valid, however the result could be misleading because any sporadical A/B mixture was considered evidence of non-monophyly for the entire gene tree. As the taxon sampling grows, the opportunity of observing any A/B mixture also increases. For example, in Puigbò et al. J. Biology. 2009, 56% (a much higher ratio) of nearly universal genes trees had perfect domain monophyly based on merely 100 taxa. This is because even the "perfect" marker genes (such as ribosomal proteins) are not completely free from HGTs (e.g., Creevey et al. Plos One. 2011), let alone the fact that there are many artifacts in the published reference genomes (Orakov et al. Genome Biol. 2021).
Therefore, to have an objective assessment of this topic, it would be better to have a metric that allows some imperfection and reports an overall "degree" of separation (also see below).
AB branch by gene: correlation and outliers
Figure 1 is the single most important result in this work, because it argues that the short AB branch observed in Zhu et al. is an artifact due to "inter-domain gene transfer and hidden paralogy" (line 202). This argument is based on the observation that the indicated AB branch length is negatively correlated with "verticality" (measured by ΔLL and split score) of the gene.
However, Zhu et al. also investigated the impact of verticality on AB distance, and they also found that they are negatively correlated (Fig. 5E). Therefore, the current result does not appear to deliver new information (as do multiple other analyses, see below).
An important finding in Zhu et al., which is largely not discussed in the current work, is that a handful of "core" genes are outliers in the spectrum of AB distance, as compared to the majority of the genome (Fig. 5A). The AB distance indicated by these core genes is so long compared with the genome average that it cannot be compensated for by the impact of non-verticality, substitutional saturation, site-homogeneous model, etc (see below).
Fig. 1A of the current work also clearly shows that many long-AB branch genes are outliers compared with the majority of the genome (the bottom of the blue bar).
Figs. 3 and 4 attempted to show that ribosomal proteins are not outliers, but that analysis was based on a very small set of core genes, and the figures clearly show that there are outliers even in this small set (to be further discussed below).
Verticality is not causative of short AB branch:
In spite of the outlier question, there is an important logic problem in these analyses: The authors observed that gene verticality (measured by negative ΔLL) is correlated with AB branch length (Fig. 1), and concluded that HGTs and paralogy shortened the AB branch (line 202). However, they did not directly assess the rate of evolution in this model. It is totally possible that the most vertical genes happen to be those that evolved faster at the AB split. In order to support the claim made in this work, it is important to separate the effect of the rate of evolution from the effect of HGT / paralogy.
The ideal solution would be to include ALL genes (not just "good" ones), build gene trees, identify parts of the gene trees that once experienced HGT or paralogy, and prune off these PARTS, instead of excluding the entire gene tree. The remaining data are thus free of HGT / paralogy, based on which one can quantify the "true" AB branch length, and further assess how much it is correlated with "verticality", and whether there are still "outliers". This solution is likely not trivial in implementation, though. However, without such assessment, the observed short AB branch still only applies to the "tree of one percent", not the "tree of life".
Differential metric for verticality:
In spite of the similarity between the current result and Zhu et al.'s (see above), the two works approached this goal using different metrics.
First, the authors attempted to quantify the AB branch length in individual gene trees, including those that do not have Archaea and Bacteria perfectly separated. To do so they performed a constrained ML search (line 210). I am wary of this treatment because it could force distinct sequences (due to HGT or paralogy) to be grouped together, and the resulting branch length estimates could be highly inaccurate.
In contrast, Zhu et al. quantifies the average taxon-to-taxon phylogenetic distance between the two domains, regardless of the overall domain monophyly. That method was free of this concern, although it computed a different metric.
Second, the authors assessed "marker gene verticality" using two metrics: a) AU test result (rejected or not) (Fig. 1A), c) ΔLL, the difference in log likelihood between the constrained ML tree and ML gene tree (line 222, Fig. 1B, C). I am concerned that they are sensitive to taxon sampling and stochastic events, as I explained above regarding domain monophyly. It is possible that a single mislabeling event would cause the topology test to report a significant result. In addition, they evaluate how severely domain monophyly is violated, but they do not account for intra-domain HGTs and other artifacts, which are also part of "verticality", and they can potentially distort the AB branch as well.
I did not find the ΔLL values of individual markers in any supplementary table. I also did not find any correlation statistics associated with Fig. 1B.
Statistical test:
Line 157: "For the remaining 302 genes, domain monophyly was rejected (p < 0.05) for 232 out of 302 (76.8%) genes." Did the authors perform multiple hypothesis correction? If not, they probably should.
Line 217: "This result suggests that inter-domain gene transfers reduce the AB branch length when included in a concatenation." and Fig. 1A. If I understand correctly, this analysis was performed on individual gene trees, rather than in a concatenated setting. Therefore, the result does not directly support this conclusion.
Line 224: "Furthermore, AB branch length decreased as increasing numbers of low-verticality markers were added to the concatenate (Figure 1(c))". While this statement is likely true, Zhu et al. also presented similar results (Fig. 5) despite using a different metric, and they concluded that the impact is moderate and cannot explain the status of some core genes as outliers.
Concatenation and branch length:
The authors pointed out that "Concatenation is based on the assumption that all of the genes in the supermatrix evolve on the same underlying tree; genes with different gene tree topologies violate this assumption and should not be concatenated because the topological differences among sites are not modelled, and so the impact on inferred branch lengths is difficult to predict." (line 187).
This argument is valid. In my opinion, this is the one most important potential issue of Zhu et al.'s analysis. In that work, they inferred genome tree topology through ASTRAL, which resolves conflicting gene evolutions. However ASTRAL does not report branch lengths in the unit of number of mutations. Therefore, they plugged the concatenated alignment into this topology for branch length estimation, hoping that it will "average out" the result. That workaround was apparently not ideal.
However, the practice of molecular phylogenetics is complicated. Theoretically, every gene, domain, codon position and site may have its unique evolutionary process, and there have been efforts to develop better partition and mixture models to address these possibilities. But there is a trade off; these technologies are computationally demanding and have the risk of overfitting. It is plausible that in some scenarios, the gain of concatenating many loci (despite conflicting phylogeny) may outweigh the cost of having unpredictable effects.
But this dilemma needs to be analyzed rather than just being discussed. The Zhu et al. paper did not assess the impact of such concatenation on branch length estimation. The best answer is to conduct an analysis to show that concatenating genes with conflicting phylogeny would result in an AB branch that is shorter than the mean of those genes, and the reduction of AB branch length is correlated with the amount of conflict involved. The current work has not done this.
Divergence time estimation:
The manuscript dedicates one section (line 230-266) to argue that the divergence time estimation analysis performed by Zhu et al. was not good evidence for marker gene suitability. Zhu et al. showed congruence of the expanded marker set with geological records whereas ribosomal proteins were conflicting with the geologic record.To support their argument, the authors estimated divergence times using the top 20 most "vertical" genes measured by ΔLL.
It would be good to clarify which genes they are, and it would be important to check whether they include some of the most "AB-distant" ones found by Zhu et al. Their Fig. 5A shows that there are genes that divide the two domains several folds further than the ribosomal proteins (such as rpoC). If they are among the 20 genes, it will not be surprising that the estimated AB split is older than it should be.
Overall, I think this section is logically questionable. Zhu et al. suggested that "They show the limitation of using core genes alone to model the evolution of the entire genome, and highlight the value in using a more diverse marker gene set.". The current work showed that using another set of a few genes (I do not know if they include multiple "core" genes, as discussed above, but it is plausible) also did not work well. This does not refute Zhu et al.'s claim.
What's important in Zhu et al.'s analysis is this: they demonstrated that using a small set of genes in DTE may cause artifacts due to them significantly violating the molecular clock at certain stages of evolution. Instead, using a larger set of markers that represent a portion of the entire genome would help to "smooth out" these artifacts. This of course is not the ideal solution, likely because concatenating conflicting genes and modelling them uniformly is not the best idea (see above). But as an operational workaround, it was not challenged by the analysis in the current work.
Finally, I agree with the authors' statement that more and reliable calibrations are the best way to improve divergence time estimation.
AB branch by ribosomal and non-ribosomal genes:
Two figures (Figs. 3 and 4) are two sections (line 270-303) dedicated to the argument that ribosomal markers do not indicate a longer AB branch than a non-ribosomal one. However, this is a small scale test (38 ribosomal markers vs. 16 non-ribosomal markers) compared with the similar analysis in Zhu et al. (30 ribosomal markers vs. 381 global markers). A closer look at Figs. 3 and 4 suggests that while the AB lengths indicated by the ribosomal markers are within a relatively narrow range, those by the non-ribosomal ones are very diverse, including ones that are several folds longer than the ribosomal average. This result is in accordance with that of Zhu et al.'s Fig. 5A, although the latter was describing a different metric. Do these genes also overlap the ones found by Zhu et al.?
Nevertheless, this analysis does not falsify Zhu et al.'s, because it compared a different, much smaller, and deliberately chosen group of genes.
Substitutional saturation:
The comparative analysis of slow- and fast-evolving sites is interesting. The result (Fig. 5) is visually impactful. In my view, this analysis is valid, and the conclusion is supported. It would be better to explain the rationale with more detail to facilitate understanding by a general audience.
Zhu et al. also tested the impact of substitution saturation on the AB branch, using a more traditional approach (Fig. S19). They also found that the inter-domain distance is more influenced by potential substitution saturation, but the difference is minor. They concluded that (AB distance) "is not substantially impacted by saturation."
Like other analyses, these two analyses involved very different locus sampling (27 most "vertical" genes vs. 381 expanded genes). They also differ by the metric being measured (AB branch length vs. average distance between AB taxa). Therefore, the analysis in the current work does not falsify the analysis by Zhu et al. In contrast, it is inline with (though not in direct support of) Zhu et al. and others' suggestion that there was "accelerated evolution of ribosomal proteins along the inter-domain branch" (line 25) in the 27 core genes (of which 15 are ribosomal proteins).
Evolutionary model fit:
The authors compared the AB branch length indicated by the standard, site-homogeneous model LG+G4+F vs. the site-heterogeneous model LG+C60+G4+F, and found that the latter recovered a longer AB branch (2.52 vs. 1.45). The author's reasoning for using a site-heterogeneous model is valid, and this analysis is sound.
However, Zhu et al. also analyzed their data using the site-heterogeneous model C60 -- the same as in this work, but through the PMSF (posterior mean site frequency) method. Zhu et al. also compared it with two site-homogeneous models (Gamma and FreeRate). The results were extensively presented and discussed (Figs. 3, 4E, F, S23, S24, Note S2). They also found that C60+PMSF elongated the AB branch compared with the site-homogeneous models (Fig. S24A). As for the average AB distance (another metric evaluated by Zhu et al., as discussed above), C60+PMSF increased this metric when using ribosomal proteins, but not much when using the expanded marker set (Fig. S25A). And overall, the elongation by C60+PMSF with the expanded markers cannot compensate for the long branch indicated by the ribosomal proteins.
Therefore, similar to the point I made above, this analysis is sound but it does not logically falsify the conclusion made by Zhu et al., as it only concerns a small set of markers, and it recovered a previously described pattern.
The manuscript also did not clarify what the phrase "poor model fit" refers to (line 34 and line 304). If this is addressing the Gamma model evaluated by the authors, then this claim is valid though not novel (but see my previous comment on the trade-off). If that is a general reference to Zhu et al.'s methodology, then the authors should at least include the C60+PMSF model in the analysis, and show that C60 indicates a significantly longer AB branch than C60+PMSF does (if that's the case, which is doubtful). Admittedly, C60+PMSF is cheaper than the native C60 in computation, but "In some empirical and simulation settings PMSF provided more accurate estimates of phylogenies than the mixture models from which they derive." (Wang et al. Syst Biol. 2018).
Finally, Zhu et al. also performed an analysis using the native C60 model on a further reduced taxon set. That result was not presented in the published paper, but it can be found in the "Peer Review File" posted on the Nature Communications website. That tree also recovered a short AB distance, and placed CPR at the base of Bacteria, and showed that this placement was not impacted by the removal of Archaea.
Taxon sampling:
My final comment is about taxon sampling. Zhu et al. developed an algorithm for less biased taxon sampling, and they argued that extensive taxon sampling is important in resolving the early evolution of life. They presented evidence showing that reduced taxon sampling changed overall topology and basal relationships (Figs. S13, S14, S23, Note S2). The analyses were performed in combination with the assessment of site sampling, locus sampling, substitution model and other factors. The importance of less biased and/or extensive taxon sampling was also noted by previous works, especially in a phylogenomic framework (e.g., Hedtke et al. Syst Biol. 2006; Wu and Eisen. Genome Biol. 2008; Beiko. Biol Direct. 2011). The current work is based on a smaller set of taxa, and it has not addressed the impact of taxon sampling. As I suggested above, some results may be sensitive to taxon sampling.