On 2014 Oct 06, Paul Vaucher commented:
False conclusions drawn about the level of evidence of factual statements drawn from Wikipedia
Paul Vaucher, PhD, DiO<sup>1</sup>, Jean-Gabriel Jeannot, MD<sup>2</sup>, Reto Auer, MD, MAS<sup>2</sup>
1 University Center of Legal Medicine Lausanne-Geneva, University Hospital of Lausanne (CHUV), Lausanne, Switzerland
2 Department of Ambulatory Care and Community Medicine, University of Lausanne, Lausanne, Switzerland
We believe that Hasty et al.’s study (Hasty RT, 2014) is misleading and that their paper should not be made available to the scientific community without serious revision. It has apparently passed unnoticed that the methodology and statistical analysis they used had little to do with their stated hypothesis and that their interpretation of the statistics was erroneous. The authors concluded that Wikipedia is an unreliable source of health information, whereas an alternative interpretation of the presented results would, on the contrary, point towards considering Wikipedia a trusted source of information, given rigorous reanalysis and reinterpretation. Their conclusions were therefore in contradiction with other studies on the subject (Archambault PM, 2013; Kräenbring J, 2014), which tend to show that for topics to which health workers contribute, such as drugs, Wikipedia’s information is as trustworthy as that from textbooks. These discrepancies can be explained by major statistical and methodological errors in Hasty et al.’s publication.
First, the correct interpretation of the McNemar statistic suggests that the concordance for diabetes and back pain is significantly better than for concussion, and not the reverse, as stated by the authors. Using data provided in Table 3, for factual statements identified by both reviewers (first two columns of Table 3), for diabetes mellitus and back pain, the authors found that up to 94% of assertions on Wikipedia were verified (respectively 72 out of 75 statements and 63 out of 67, p<0.001 for the McNemar statistic). In contrast, for concussion, only 65% were verified (66 out of 98, p=0.888 for the McNemar statistic). The McNemar statistic tests whether the proportion of factual statements from both reviewers, classified as concordant or discordant by the authors, is above what would be expected by chance alone (i.e. 50%). The interpretation for diabetes mellitus, if McNemar’s test should be used at all, is that under the null hypothesis of a 50% proportion of concordance, a result this extreme would be expected in < 1% of cases. The McNemar statistic thus suggests that the concordance for diabetes and back pain is significantly better than for concussion, and not the reverse, as stated by the authors. The calculation is based on the number of discordant results on the diagonals (i.e. 34 vs 1 for diabetes mellitus). It by no means tests the discordance between Wikipedia statements and existing guidelines. To test their hypothesis, it would appear more relevant to simply report the pooled average proportion of correct statements with its confidence interval. On a technical note, the McNemar statistic should not be used when the reviewers assessed different numbers of assertions. McNemar’s test is also known to be highly dependent on the number of factual statements within each article. Articles with a higher number of statements would reach the level of significance with a higher proportion of ungrounded factual statements.
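To make the arithmetic concrete, the following is a minimal Python sketch, not the original authors' procedure. It assumes that the "34 vs 1" figures quoted above are the two discordant cell counts entering McNemar's test, and it uses the verified-statement counts quoted above (72/75, 63/67, 66/98). The function names `mcnemar_exact_p` and `wilson_interval` are our own illustrative choices, and the Wilson score interval is only one of several reasonable ways to attach a confidence interval to a proportion. It is shown per topic here rather than pooled.

```python
from math import comb, sqrt


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, each discordant pair lands in either
    # off-diagonal cell with probability 0.5 (binomial with p = 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Discordant counts quoted in the comment for diabetes mellitus (34 vs 1).
print("McNemar exact p:", mcnemar_exact_p(34, 1))

# Verified statements out of those identified by both reviewers,
# as quoted in the comment.
counts = [("diabetes mellitus", 72, 75), ("back pain", 63, 67), ("concussion", 66, 98)]
for topic, verified, total in counts:
    low, high = wilson_interval(verified, total)
    print(f"{topic}: {verified / total:.2f} ({low:.2f} to {high:.2f})")
```

The p-value depends only on the two discordant counts, which is why it says nothing directly about how many Wikipedia statements agree with the guidelines. The proportion and its interval address that question more directly.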
Second, the article falsely leads readers to believe that all factual statements (assertions) from 10 Wikipedia articles were identified and independently assessed by two internists to see whether they were concordant or not. Using two reviewers is a recognized method for increasing the precision of a measure. However, the authors did not provide statistics allowing readers to assess the between-reviewer variability in the identification of assertions. Table 3 suggests that over a third of assertions were reported by only one of the two reviewers. The authors did not provide a method to resolve these dissimilarities (to use their definition in Table 2) and clearly define which assertions were factual statements and which were not. Nor did they define a method of agreement to determine which assertions were supported by evidence and which were not. The analysed results therefore only tend to show that, for certain topics, internists have difficulty detecting factual statements from Wikipedia and knowing whether they are grounded or not.
Hasty et al.’s findings suggest that while there might be some discrepancies in the quality of articles between topics, some appear to be of very high quality, such as diabetes mellitus and back pain. Given the unnoticed errors in their article and their importance for the interpretation of the results, Hasty et al.’s published article reveals that misleading peer-reviewed information can also be made available to the public.
Conflicts of interest:
Reto Auer and Jean-Gabriel Jeannot are advocates of the use of Wikipedia as a
means of communication to inform the population on health issues. Paul Vaucher is an
important contributor to the French Wikipedia page dedicated to osteopathic medicine.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.