On 2017 Jun 10, Shawn McGlynn commented:
With the trees from this phylogeny paper now available, we can resolve the discussion between the authors and me (below) and conclude that there is no evidence that nitrogenase was present in the LUCA, as the authors claimed in their publication.
In their data set, the authors identified two clusters of proteins which they refer to as NifD: clusters 3058 and 3899. NifD binds the metal cluster of nitrogenase and is required for catalysis. In the authors' protein groups, cluster 3058 comprises 30 sequences and cluster 3899 comprises 10 sequences. Inspection of these sequences reveals that neither cluster contains any actual NifD sequences. This can be said with certainty because biochemistry has demonstrated that the metal cofactor-coordinating residues Cys<sup>275</sup> and His<sup>442</sup> (using the numbering scheme from the Azotobacter vinelandii NifD sequence) are absolutely required for activity. NONE of the 40 sequences analyzed by the authors contains these residues. Therefore, NONE of these sequences can bind the nitrogenase metal cluster, and it follows that none of them could reduce dinitrogen. The authors have not included a single nitrogenase sequence in their analysis and are therefore disqualified from making claims about the evolution of the protein; the claims made in this paper about nitrogenase cannot be substantiated with the data that were analyzed. The sequences in the authors' "NifD" protein clusters are close homologs involved in nitrogenase cofactor biosynthesis and belong to a large family of related proteins (which includes real NifD proteins, but also proteins involved in bacteriochlorophyll and Ni porphyrin F430 biosynthesis). While the proteins the authors analyzed are more closely related to nitrogen metabolism than to F430 or bacteriochlorophyll biosynthesis, they are not nitrogenase; they are nitrogenase homologs that carry out assembly reactions.
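For readers who want to repeat the residue check, here is a minimal sketch of how it could be done. It assumes each candidate sequence has already been pairwise-aligned to the Azotobacter vinelandii NifD reference, with `-` marking gaps. The function names and the toy sequences are hypothetical illustrations, not the alignment or tooling used for the inspection described above.

```python
# Sketch: test whether an aligned sequence carries the required ligand
# residues at given reference positions (1-based, counted on the
# ungapped reference). All names and toy data here are hypothetical.

# Residues stated in the comment, in A. vinelandii NifD numbering.
NIFD_LIGANDS = {275: "C", 442: "H"}


def residues_at_reference_positions(aligned_ref, aligned_query, positions):
    """Return {reference position: query residue} for the requested positions."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    found = {}
    ref_pos = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":  # gap columns in the reference do not advance numbering
            ref_pos += 1
            if ref_pos in wanted:
                found[ref_pos] = query_char
    return found


def has_required_ligands(aligned_ref, aligned_query, required):
    """True only if every required position holds the required residue."""
    observed = residues_at_reference_positions(aligned_ref, aligned_query, required)
    return all(observed.get(pos) == aa for pos, aa in required.items())


# Toy demonstration with a 6-residue "reference" (not real NifD data):
toy_ref = "MKC-AHG"            # ungapped: M1 K2 C3 A4 H5 G6
toy_required = {3: "C", 5: "H"}
toy_hit = "MKCTAHG"            # keeps C at 3 and H at 5
toy_miss = "MKS-A-G"           # S at 3, gap at 5

print(has_required_ligands(toy_ref, toy_hit, toy_required))
print(has_required_ligands(toy_ref, toy_miss, toy_required))
```

With a real alignment, `NIFD_LIGANDS` would be passed as `required`; a sequence lacking either residue fails the check.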
Beyond the failure to examine any sequences capable of catalyzing nitrogen reduction, the presentation of two "NifD" clusters highlights important problems with the methods used in this paper, problems which affect the entire analysis and its conclusions. First, two clusters were formed for one homologous group, which should not have occurred if the goal was to investigate ancestry. Second, by selecting small clusters from whole trees, the authors were able to prune the full tree until they recovered small subtrees which show monophyly of archaea and bacteria. However, it was incorrect to ignore the entire tree of homologs and present only two small clusters from a large family. This is "cherry" picking taken to the extreme (in this case, "nitrogenase" picking), and it is very likely that this problem of pruning until the desired result is obtained sullies many if not all of the protein families and conclusions in the paper. For example, the radical SAM tree was likely pruned in the same way, with an incorrect conclusion being reached (as with nitrogenase, a full tree of radical SAM proteins does not recover the archaea-bacteria split either). Until someone does a complete analysis with full trees, the claims of this paper will remain unproven and misleading, since they are based on selective sampling of information. It would seem that the authors have missed the full trees whilst being lost in mere branches of their phylogenetic forest of 286,514 protein clusters.
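The pruning problem can be illustrated with a toy example. The sketch below uses a made-up rooted tree written as nested tuples, with hypothetical leaf names where `A` marks an archaeal and `B` a bacterial sequence. It is not the authors' data or method. It only shows that a subset of leaves can appear to separate the two domains cleanly even when the full tree does not.

```python
# Sketch: monophyly in a full tree versus a pruned subtree.
# Trees are nested tuples; leaves are strings. All data are invented.

def leaves(tree):
    """Return the set of leaf names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every node, including leaves and the root."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some node's leaf set is exactly the given group."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))


def prune(tree, keep):
    """Restrict the tree to the leaves in `keep`, collapsing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


# Full toy tree: archaeal and bacterial sequences are interleaved.
full_tree = (("A1", "B1"), (("A2", "B2"), "B3"))
full_split = (is_monophyletic(full_tree, {"A1", "A2"}),
              is_monophyletic(full_tree, {"B1", "B2", "B3"}))

# Keeping only three leaves yields a subtree with a clean split.
subtree = prune(full_tree, {"A1", "B2", "B3"})
sub_split = (is_monophyletic(subtree, {"A1"}),
             is_monophyletic(subtree, {"B2", "B3"}))

print(full_split, subtree, sub_split)
```

In this toy case neither domain is monophyletic in the full tree, yet both are in the pruned subtree, which is why conclusions about ancestry need to be drawn from the full tree of homologs.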
In a forthcoming publication, I will discuss in detail the branching position of the NifD homologs identified by the authors, as well as the possible evolutionary trajectory of the whole protein family with respect to the evolution of life and the nitrogen cycle on this planet, including the bona fide NifD proteins on which I have already commented below in this PubMed Commons thread.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.