On 2024-07-23 16:03:09, user ryhisner wrote:
This preprint claims that the BA.2.86 lineage, which first appeared in the sequencing record in late July 2023, evolved gradually, while cryptically circulating over the course of about 17 months, beginning in early 2022. I do not find any of the evidence presented here convincing.
Three clusters of sequences, named C1, C2, and C3, are cited as evidence of cryptic BA.2.86 circulation prior to its emergence on the world stage. Each cluster has a far simpler, more parsimonious explanation than the hypothesis presented by the authors.
I have documented and analyzed the C1 sequences as they have been uploaded, beginning in October 2023, and the cluster appears to be a classic case of a chronic infection in an immunocompromised individual who may have transmitted to one other person. Three of the four C1 sequences have matching metadata (same location, age, and sex), while the fourth has the same location and age but a different sex. Furthermore, there are two additional, closely related sequences collected in 2024 (EPI_ISL_18969735 and EPI_ISL_19259365) whose metadata also match the other three.
It seems very unlikely the C1 sequences have any relation to BA.2.86. I count approximately 17 spike substitutions and one large deletion (∆138-144) that are in two or more of these four sequences but not in BA.2.86, including P9L, H69D, K77E, T95I, ∆136-144, R158K, Q183E, G213E, D215G, R346T, L452R, F486L, V615A, V642G, H681Y, L841R, D936Y, and D1146N. None of the four C1 sequences have the most distinctive spike mutations of BA.2.86, such as ins16MPLF, the triple-nucleotide F157S-R158G, A264D, I332V, K356T, L452W, or ∆V483. All C1 sequences have ORF1a:L3201F, which is found in BA.2 but not BA.2.86, and none of the C1 sequences possess any of the seven synonymous mutations found in the BA.2.86 branch (C8293T, T13339C, T15756A, A18492G, C21622T, C25207T, C26681T).
Of the more than 30 spike mutations (relative to baseline BA.2) in BA.2.86, I only see four that are shared between BA.2.86 and the C1 sequences: R403K, A484K, R493Q (reversion), and P621S. All of these are extremely convergent in highly mutated, chronic-infection sequences. I maintain a list of such sequences, and the R493Q reversion is the single most common private mutation, occurring 304 times independently, while R403K appears 103 times, and P621S 60 times. A484K, despite being a two-nucleotide mutation, has independently evolved at least 15 times among the sequences I've recorded. (G446S—very common both in chronic-infection sequences and circulating lineages—is in 2/6 sequences from this cluster).
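The comparison in the two paragraphs above can be sketched as a simple set operation. This is only an illustration: the sets below contain just the mutations named in this comment, not the full mutation profiles of the C1 sequences or of BA.2.86, and the variable and function names are hypothetical.

```python
# Partial mutation lists taken from the text of this comment only.
# "del" stands in for the deletion symbol used in the text.
C1_NOT_IN_BA286 = {
    "P9L", "H69D", "K77E", "T95I", "del136-144", "R158K", "Q183E", "G213E",
    "D215G", "R346T", "L452R", "F486L", "V615A", "V642G", "H681Y", "L841R",
    "D936Y", "D1146N",
}
BA286_ABSENT_FROM_C1 = {
    "ins16MPLF", "F157S", "R158G", "A264D", "I332V", "K356T", "L452W", "delV483",
}
SHARED = {"R403K", "A484K", "R493Q", "P621S"}

# Assemble the (partial) spike profiles of each group.
c1_spike = C1_NOT_IN_BA286 | SHARED
ba286_spike = BA286_ABSENT_FROM_C1 | SHARED


def compare(a, b):
    """Return mutations shared by both sets and those private to each."""
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


result = compare(c1_spike, ba286_spike)
print(len(result["shared"]), "shared;", len(result["only_a"]), "C1-only;",
      len(result["only_b"]), "BA.2.86-only (among mutations named here)")
```

With full mutation lists exported from a tool such as Nextclade, the same function would give the complete shared and private sets.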
Furthermore, the most recently uploaded sequence from this cluster—collected on May 22, 2024, but not listed in the C1 cluster in this paper—contains 16 new spike mutations (at least two of which I suspect can be attributed to sotrovimab treatment). Four of the 16 new spike mutations are also in BA.2.86 (I332V, K356T, L486P, and S939F), a textbook example of how, through convergent intrahost evolution, chronic-infection sequences can come to acquire mutations found in other chronic infections and in unrelated circulating lineages.
I do not see any resemblance between BA.2.86 and the C1 cluster in the non-spike part of the genome, apart from M:A104V, which is commonly found in chronic-infection sequences (32 independent acquisitions by my count), and is also found in the Pango-designated GS.4/5 lineage (XBB.2.3.11.4/5). It seems to me that this is a case of a chronically infected, immunocompromised individual who developed a few mutations also found in BA.2.86—mutations which are convergent in such chronic infections—and who may have transmitted to one other person (assuming there was not a mistake in the sex assignment of EPI_ISL_18415854, in which case these sequences almost certainly all came from the same patient).
The phylogenetic relationship between these sequences, as determined by UShER (Ultrafast Sample placement on Existing tRee, created at University of California Santa Cruz and maintained by Angie Hinrichs), can be seen at the following link:
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/posited_BA.2.86_intermediates/main/BA.2.86_C1_posited_intermediates_Mexico_6_seq.json?c=gt-S_841&gmax=25384&gmin=21563&label=id:node_3061533
The C2 cluster (12 sequences of XBB.1.5.90, 11 from Japan, one from Finland) does not seem to resemble BA.2.86 at all. It is part of a large branch of XBB.1.5.90 (>400 sequences) with S:P621S, also found in BA.2.86, and the only other private mutation I see that it shares with BA.2.86 (but not with the hundreds of other XBB.1.5.90 sequences) is C26681T, which is a highly homoplasic synonymous mutation in the coding region of M. Perhaps I am overlooking something, but the C2 cluster looks like a relatively humdrum branch of XBB.1.5.90 to me.
The UShER tree for these 12 sequences from C2 can be viewed here: https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/posited_BA.2.86_intermediates/main/BA.2.86_C2_posited_intermediates_JPN_FIN_12_seq.json?c=userOrOld&label=id:node_4041697
The C3 cluster consists of 10 sequences from Sarawak, Malaysia, which were all uploaded on the same day (2024-1-25), bear the same collection date (2022-3-11), and have spikes identical to JN.1, including S:L455S, from S:356 to S:681, while the rest of spike is identical to baseline BA.2. The remainder of the genome in these sequences is extremely odd. Two sequences contain the XBB mutation ORF1b:S959P. Seven have the universal BA.2.86 mutation ORF1a:N2526S, while three lack it. One has the BA.2.86.1 mutation ORF1a:K1973R. Seven of the ten have ORF1a:L3201F, which is absent from all BA.2.86. No dropout is indicated in any of the sequences.
The same Malaysian lab uploaded 321 other sequences (EPI_ISL_18821317-18821647), all from Sarawak, Malaysia, on the same day they uploaded the 10 C3 sequences. The collection dates of these sequences range from 2022-2-27 to 2024-1-9 and include 153 JN.1* sequences and 12 XBB* sequences. As Zach Hensel has noted, six of the ten C3 sequences have G19677T (ORF1b:2070H), which is the defining mutation of BA.2.40, a variant that made up about 60% of all sequences in Sarawak, Malaysia, in mid-March 2022. (Source: https://cov-spectrum.org/explore/Malaysia/AllSamples/from%3D2022-01-15%26to%3D2022-04-28/variants?nextcladeQcSnpClustersScoreTo=55&variantQuery=Nextcladepangolineage%3ABA.2.40*&)
Sixty-three sequences in this upload are categorized by Nextclade as being BA.2.40 and have 0-3 mutations relative to baseline BA.2.40. Most suspicious of all, 29 of the 153 JN.1* sequences in this same upload also have G19677T. From July 1, 2023 to the present, just 77 sequences categorized by CovSpectrum as BA.2.86* (from 13 different Pango-designated lineages) have had G19677T, with 30 of those coming from Malaysia. (Source: https://cov-spectrum.org/explore/World/AllSamples/from%3D2023-07-01%26to%3D2024-07-14/variants?variantQuery=Nextcladepangolineage%3ABA.2.86*+%26+G19677T&)
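The counts in the two paragraphs above can be checked with a few lines of arithmetic. This sketch assumes that the accession range is contiguous and that the 10 C3 sequences fall inside it (the C3 accessions cited elsewhere in this comment do). The variable names are hypothetical.

```python
# Accession range of the upload, as given in the text.
first_acc, last_acc = 18821317, 18821647
range_size = last_acc - first_acc + 1   # number of accessions in the range
c3_count = 10
other_count = range_size - c3_count     # the text reports 321 other sequences

# G19677T among JN.1* sequences in this upload.
jn1_in_upload = 153
jn1_with_g19677t = 29
share_upload = jn1_with_g19677t / jn1_in_upload

# G19677T among BA.2.86* sequences worldwide since 2023-07-01.
global_with_g19677t = 77
from_malaysia = 30
share_malaysia = from_malaysia / global_with_g19677t

print(f"accessions in range: {range_size} ({other_count} besides C3)")
print(f"JN.1* in upload with G19677T: {share_upload:.1%}")
print(f"share of global BA.2.86*+G19677T from Malaysia: {share_malaysia:.1%}")
```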
It seems clear that the 10 C3 sequences were BA.2 sequences contaminated by JN.1 sequences from the same upload.
The authors list 129 sequences they claim shorten the branch leading to BA.2.86, of which I was able to find 128 on GISAID. Ten of these sequences are from the C3 Malaysian cluster described above, along with one additional sequence from the same upload. Apart from these C3 sequences, there are only six sequences with collection dates preceding the first BA.2.86 sequences. All others were collected more than seven weeks after the first BA.2.86 sequences. Six sequences were collected between 7-14 weeks after the first BA.2.86, while the remaining 105 sequences were collected more than 15 weeks afterward.
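As a quick consistency check, the breakdown above can be tallied as follows. The category labels are my own shorthand, and the counts are the ones stated in the paragraph.

```python
# Breakdown of the 128 sequences found on GISAID, by collection timing.
total_found = 128
categories = {
    "C3 cluster": 10,
    "additional sequence from the C3 upload": 1,
    "other, collected before first BA.2.86": 6,
    "collected 7-14 weeks after first BA.2.86": 6,
    "collected more than 15 weeks after first BA.2.86": 105,
}

# The categories should account for every sequence found.
accounted_for = sum(categories.values())

# Sequences collected well after BA.2.86 was already circulating.
collected_after = (
    categories["collected 7-14 weeks after first BA.2.86"]
    + categories["collected more than 15 weeks after first BA.2.86"]
)
share_after = collected_after / total_found

print(f"{accounted_for} of {total_found} accounted for")
print(f"{collected_after} collected after emergence ({share_after:.1%})")
```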
It would be surprising if one could not find hundreds of such "hybrid" sequences due purely to contamination. Such sequences have frequently appeared in the sequencing record throughout the pandemic. A few sequences may result from coinfection, but the quality of these sequences, described below, along with the fact that a large proportion of them come from a small number of labs with records of quality-control issues, supports the hypothesis that these sequences result from contamination or other lab errors.
All of the sequences in this list are low quality. They feature a mixture of extensive dropout (particularly in spike), frameshifts, large numbers of mixed nucleotides, clearly artifactual reversions, and mutations from multiple lineages (primarily BA.2.86 and XBB) with no distinct breakpoints. Many of these sequences come from labs known to have frequent quality-control issues. For example, there are 52 sequences from the United States, but none are from the CDC, whose sequences are virtually always first-rate. Instead, they come from smaller local and state labs, whose sequencing quality is often inconsistent.
The 28 sequences from Texas, for example, come from city hospitals. The average Nextclade QC score of these sequences is 2901 (median 2882). Anything over 100 is designated "bad" by Nextclade. The average number of ambiguous nucleotides per sequence is 17 (median 17), and they average 690 nucleotides of dropout. (EPI_ISL_16599325, EPI_ISL_16599747, EPI_ISL_18546432, EPI_ISL_18690036, EPI_ISL_18690080, EPI_ISL_18690421, EPI_ISL_18690466, EPI_ISL_18690496, EPI_ISL_18743044, EPI_ISL_18743073, EPI_ISL_18743094, EPI_ISL_18743097, EPI_ISL_18743159, EPI_ISL_18743350, EPI_ISL_18743431, EPI_ISL_18743464, EPI_ISL_18743470, EPI_ISL_18743477, EPI_ISL_18743592, EPI_ISL_18816401, EPI_ISL_18816517, EPI_ISL_18816528, EPI_ISL_18816612, EPI_ISL_18816709, EPI_ISL_18816890, EPI_ISL_18816980, EPI_ISL_18874714, EPI_ISL_18908998)
Similarly, the 17 sequences on this list from Italy are all from the same lab, have an average Nextclade QC score of 2356 (median 2253) and average 1349 nucleotides of dropout. Some sequences on the list are somewhat less bad than these, but none are high-quality.
(EPI_ISL_18496352, EPI_ISL_18674020, EPI_ISL_18677248, EPI_ISL_18721993, EPI_ISL_18722001, EPI_ISL_18722007, EPI_ISL_18722009, EPI_ISL_18755145, EPI_ISL_18792827, EPI_ISL_18792828, EPI_ISL_18792829, EPI_ISL_18792831, EPI_ISL_18820147, EPI_ISL_18820149, EPI_ISL_18820150, EPI_ISL_18820154, EPI_ISL_18820157)
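For anyone wishing to reproduce summaries like the two above, here is a minimal sketch of the calculation. It assumes the rows come from a Nextclade TSV export and that the relevant columns are named `qc.overallScore`, `totalNonACGTNs`, and `totalMissing`; check these against your own Nextclade output. It also assumes that missing bases are a fair proxy for dropout. The example rows are made-up placeholder values, not real data from the sequences listed here.

```python
from statistics import mean, median

BAD_THRESHOLD = 100  # scores at or above roughly this level are labeled "bad"


def summarize_qc(rows, score_col="qc.overallScore",
                 ambiguous_col="totalNonACGTNs", missing_col="totalMissing",
                 bad_threshold=BAD_THRESHOLD):
    """Summarize QC score, ambiguous bases, and dropout for a set of rows.

    Each row is a dict of strings, as csv.DictReader would yield from a TSV.
    """
    scores = [float(r[score_col]) for r in rows]
    ambiguous = [int(r[ambiguous_col]) for r in rows]
    missing = [int(r[missing_col]) for r in rows]
    return {
        "n": len(rows),
        "mean_score": mean(scores),
        "median_score": median(scores),
        "n_bad": sum(s >= bad_threshold for s in scores),
        "mean_ambiguous": mean(ambiguous),
        "mean_missing": mean(missing),
    }


# Placeholder rows for illustration only (hypothetical values).
example_rows = [
    {"qc.overallScore": "2000", "totalNonACGTNs": "10", "totalMissing": "600"},
    {"qc.overallScore": "3000", "totalNonACGTNs": "20", "totalMissing": "900"},
    {"qc.overallScore": "40", "totalNonACGTNs": "0", "totalMissing": "0"},
]
summary = summarize_qc(example_rows)
print(summary)
```

To run this on a real export, the rows could be read with `csv.DictReader(handle, delimiter="\t")` and filtered to the accessions of interest before calling `summarize_qc`.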
Finally, I also examined the list of 100 sequences from Supplementary Data 1, Table 5, containing genomes posited by the authors to be recombinants related to ancestors of BA.2.86. These sequences seem to me to fall into five different categories.
First, there are numerous sequences here that are also listed in one of the C1-C3 clusters: three sequences from C1 (EPI_ISL_18415832, EPI_ISL_18415854, EPI_ISL_18798234), four from C2 (EPI_ISL_18040349, EPI_ISL_18060516, EPI_ISL_18106303, EPI_ISL_18116248), and three from C3 (EPI_ISL_18821484, EPI_ISL_18821485, EPI_ISL_18821487).
Second, nine of the sequences appear to be fairly unremarkable sequences of XDD, a designated JN.1/EG.5.1.1 recombinant (EPI_ISL_18617332, EPI_ISL_18706019, EPI_ISL_18706171, EPI_ISL_18553650, EPI_ISL_18653986, EPI_ISL_18531477, EPI_ISL_18569411, EPI_ISL_18695627).
The third category consists of what seem to me to be relatively normal sequences from a variety of Omicron lineages but with little resemblance to BA.2.86. Some of them have extensive dropout and come from labs known for high rates of artifacts and contamination. (EPI_ISL_18076898, EPI_ISL_17990180, EPI_ISL_18062641, EPI_ISL_18000549, EPI_ISL_18042058, EPI_ISL_18104305, EPI_ISL_18070023, EPI_ISL_18044667, EPI_ISL_18111437, EPI_ISL_15153261, EPI_ISL_17255807, EPI_ISL_15153261, EPI_ISL_16282414, EPI_ISL_16457740)
The fourth category is BA.2.86 or JN.1 sequences that either don't strike me as very unusual or else have extensive dropout and artifactual reversions. A few of these are from unreliable labs. (EPI_ISL_18097345, EPI_ISL_18556860, EPI_ISL_18567791, EPI_ISL_18682823, EPI_ISL_18705393, EPI_ISL_18635682, EPI_ISL_18503709, EPI_ISL_18400531, EPI_ISL_18717823, EPI_ISL_18446586, EPI_ISL_18584588, EPI_ISL_18631046, EPI_ISL_18700743, EPI_ISL_18675075, EPI_ISL_18659819, EPI_ISL_18713456, EPI_ISL_18636806, EPI_ISL_18704459, EPI_ISL_18686183)
The fifth and largest category is highly divergent sequences almost certainly deriving from chronic infections, but which appear to me to bear almost no resemblance to BA.2.86 apart from the possession of a few mutations that are widely convergent in such sequences. I've documented most of these, and almost all contain a large number of mutations and deletions not found in BA.2.86 and lack the vast majority of BA.2.86 mutations.
I hope I haven't misinterpreted any of the authors' hypotheses or data.
-Ryan Hisner