On 2014 Jun 02, Melanie Courtot commented:
We are grateful to Drs. Botsis, Woo and Ball for their comment on our article, and for the opportunity to address their questions. We entirely agree on the importance of improving automated analysis of safety data, and we would have welcomed the opportunity to reuse their data. However, our multiple FOIA requests, beginning in May 2012, to obtain the complete datasets from the published works, which would have allowed us to compare our approach directly with theirs, were unsuccessful. Our correspondence eventually culminated in May 2013 with a response that “After a thorough and diligent investigation, CBER's search did not locate a record that contains the MedDRA terms for the VAERS identification numbers that were the subject of this paper or any document responsive to your request for the list of those records that are the 100 confirmed anaphylaxis cases”. This may cause slight differences in the exact numerical values being compared, and where applicable we attempted to infer possible causes, as stated in [Courtot M, 2014]: “However details of the original analysis approach necessary for reproducing the original results were not made available and we could only hypothesize the cause of results we obtained that were not in concordance with the original publication.” We also added specific mentions of differences (for example, in the legend of Table 2) when we were able to form such a hypothesis. Due to the size restriction on this response, we address the points below in the order raised. A version including text from the original comment is available online.
We hope this clarifies the issues raised. We encourage Botsis et al., as well as other interested parties, to release their code and data upon publication, as we did and in accordance with PLoS’s Data Policy. The availability of the datasets supporting published results will increase the reproducibility of research and foster scientific advances. It will also remove the need to form hypotheses when interpreting existing work, a need which can be detrimental to the interpretation of the scientific content.
Detailed comments:
(1) The total number of potentially positive cases used in [Courtot M, 2014] is 237; this can be verified in the data we published alongside the paper. Table 2 contains information from 2 sources, as mentioned in its legend: (1) values taken from the existing published work by Botsis et al. [Botsis T, 2013], identified by an asterisk, and (2) values obtained through our own analysis. The former, from [Botsis T, 2013], are the values from the testing set (as mentioned in the legend); it would be of little interest to compare results obtained from the training set. The latter, from our own analysis [Courtot M, 2014], encompass results from the Ontology Classification as well as from the Expanded SMQ. Regarding the Ontology Classification, the method does not use a training phase to compute the classifier results, so there is no basis on which to make the split. Similarly, there is no training done with the ABC tool. We hypothesized that there was a desire in [Botsis T, 2013] to keep the approach consistent across the methods used, and we consequently acknowledge in the legend of the table that this may result in a small difference for the ABC classification row. With respect to the Expanded SMQ results, the table includes the results on the whole dataset. However the text does provide additional information: “Similar results were obtained using a 50/50 training/testing data split: 92% sensitivity (86-96% at 95% CI) and 81% specificity (80-82% at 95% CI) in the testing set, AUC 0.93 (0.9-0.95 at 95% CI).”
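For readers less familiar with these metrics, the sensitivity, specificity and confidence intervals quoted above follow from standard confusion-matrix counts. The sketch below is a minimal illustration only; the counts and the use of a Wilson score interval are our assumptions for the example, not values or methods taken from either paper.

```python
import math

def sensitivity(tp, fn):
    """True positive rate: the fraction of confirmed cases the classifier flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: the fraction of non-cases the classifier rejects."""
    return tn / (tn + fp)

def wilson_ci(p, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion p observed on n cases."""
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Hypothetical counts chosen only to illustrate the arithmetic.
tp, fn, tn, fp = 218, 19, 1500, 350
print(round(sensitivity(tp, fn), 2))  # 0.92
print(round(specificity(tn, fp), 2))  # 0.81
```

With these illustrative counts the point estimates match the order of magnitude reported in the text, and the interval width shrinks as the testing set grows.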
(2) The original paper [Botsis T, 2013] did not document any deviation of the MedDRA terms used from the standard. The dataset used in our analysis (including the MedDRA terms) is published online. While it would certainly have been of interest to compare our results with the set of MedDRA terms used by Botsis et al., answers to our FOIA requests stated that “no such record were located”.
(3) We provided details about our cosine similarity method and cited the original paper in which the method is described. We indicated that we obtained similar results on the whole dataset as with the training/testing split, as mentioned in point (1) above. In both cases, a cosine similarity score was computed for each pair of vectors (the query and the vector of PTs from the specific report considered) and the best cut-off point was determined. Regarding the use of additional PTs, we do not agree with the commentators’ assessment that the additional PTs were commonly reported or unrelated to anaphylaxis. Indeed, the very first term in the list of additional PTs we suggest and provide as supplementary material for review is Hypersensitivity, which we do consider to be related to anaphylaxis. Specifically, the SMQ for anaphylaxis contains the term “type 1 hypersensitivity”. The next 4 PTs in Table S1 are already included in the SMQ; the following one is “pharyngeal oedema”. The SMQ already lists “oedema mouth”, “oropharyngeal swelling”, “laryngotracheal oedema” etc., and it seems appropriate to assume that “pharyngeal oedema” is related.
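The scoring step described above, a cosine similarity between a query of SMQ terms and the PTs of each report, followed by a cut-off, can be sketched as follows. This is a minimal illustration under our own assumptions (binary term vectors represented as sets, hypothetical PT lists and cut-off), not the code used in either paper.

```python
import math

def cosine_similarity(query, report):
    """Cosine similarity between two binary term vectors, represented as sets of PTs."""
    if not query or not report:
        return 0.0
    shared = len(query & report)
    return shared / math.sqrt(len(query) * len(report))

# Hypothetical query: a handful of PTs from an (expanded) anaphylaxis SMQ.
query = {"anaphylactic reaction", "hypersensitivity", "pharyngeal oedema", "urticaria"}

# Hypothetical VAERS reports, each reduced to its set of reported MedDRA PTs.
reports = {
    "r1": {"urticaria", "hypersensitivity", "pyrexia"},
    "r2": {"injection site pain", "pyrexia"},
}

# The cut-off would in practice be chosen to balance sensitivity and specificity.
cutoff = 0.3
scores = {rid: cosine_similarity(query, pts) for rid, pts in reports.items()}
flagged = [rid for rid, score in scores.items() if score >= cutoff]
```

Here `r1` shares two PTs with the query and is flagged, while `r2` shares none and is not; with graded rather than binary term weights the same formula applies to weighted vectors.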
(4)(a) Using a statistical correlation between MedDRA PTs and outcome is an original contribution made by Courtot et al. in [Courtot M, 2014], and is clearly described as such when constructing the Expanded SMQ (and supported by Table S1 in the appendix). It was never implied that this had been done in [Botsis T, 2013]. We believe the referred-to excerpt is “Rather than creating a bag of words de novo based on keyword extraction from a training set of reports, we rather chose to expand on a known, already widely implemented, screening method, i.e., the SMQs.” We did not intend to refer to Botsis et al.’s work: we use the word “we”, and expected readers to understand that this referred to the authors of the manuscript [Courtot M, 2014]. We apologize for any confusion.
(4)(b) We are aware that Botsis et al. used the online version of the tool, and that the tool does not require the existence of a tentative diagnosis when data is submitted. However, the ABC tool is a diagnosis confirmation tool based on the Brighton guidelines, and the authors of that tool assume that the instructions given with the guideline are being followed. Note that the tool itself is labeled “ANALYZE DATABASE Confirm a diagnosis for all cases in your database (Excel spreadsheet)” on the Brighton Collaboration website. We hypothesized that the diagnosis was automatically pre-selected for the batch entry done by Botsis et al. As a result, we emphasize a discrepancy between the information reported as ‘Insufficient evidence’ by Botsis et al., and what is expected based on the Brighton case definition: ‘Reported anaphylaxis with insufficient evidence’ [Rüggeberg JU, 2007]. In [Botsis T, 2013], 488 cases are classified as ‘Insufficient evidence’. With the guideline in mind, we reviewed the records and showed that only 3 reports should have been classified as “Reported anaphylaxis with insufficient evidence”. Indeed, only 12 reports in the VAERS dataset were reported as anaphylaxis (and corresponding synonyms) in the VAERS report itself, of which 3 do not meet the Brighton case definition. Our results have been communicated to the Brighton Collaboration, with the aim of emphasizing the process for those users of the ABC tool who may not have a full understanding of the logic of the Brighton guideline.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.