2 Matching Annotations
  1. Jul 2018
    1. On 2014 Jun 02, Melanie Courtot commented:

      We are grateful to Drs. Botsis, Woo and Ball for their comment on our article, and for the opportunity to address their questions. We entirely agree on the importance of improving automated analysis of safety data, and we would have welcomed the opportunity to reuse their data. However, our multiple FOIA requests, beginning in May 2012, to obtain the complete datasets from the published works so that we could directly compare our approach with theirs were unsuccessful. Our correspondence eventually culminated in May 2013 with a response that “After a thorough and diligent investigation, CBERs search did not locate a record that contains the MedDRA terms for the VAERS identification numbers that were the subject of this paper or any document responsive to your request for the list of those records that are the 100 confirmed anaphylaxis cases”. This may cause slight differences in the exact numerical values being compared, and where applicable we attempted to infer possible causes, as stated in [Courtot M, 2014]: “However details of the original analysis approach necessary for reproducing the original results were not made available and we could only hypothesize the cause of results we obtained that were not in concordance with the original publication.” We also added specific mentions of differences (for example, in the legend of Table 2) when we were able to form such a hypothesis. Due to the size restriction on this response, we address below the points in the order raised. A version including text from the original comment is available online.

      We hope this clarifies the issues raised. We encourage Botsis et al., as well as other interested parties, to release their code and data upon publication, as we did and in accordance with PLoS’s Data Policy. The availability of the datasets supporting published results will increase the reproducibility of research and foster scientific advances. It will also remove the need to form hypotheses when interpreting existing work, hypotheses which may be detrimental to the interpretation of the scientific content.

      Detailed comments:

      (1) The total number of potentially positive cases used in [Courtot M, 2014] is 237; this can be verified in the data we published alongside the paper. Table 2 contains information from two sources, as mentioned in its legend: (1) values taken from the existing published work of Botsis et al. [Botsis T, 2013], identified by an asterisk, and (2) values obtained through our own analysis. The former, from [Botsis T, 2013], are the values from the testing set (as mentioned in the legend); it would be of little interest to compare results obtained from the training set. The latter, from our own analysis [Courtot M, 2014], encompass results from the Ontology Classification as well as from the Expanded SMQ. Regarding the Ontology Classification, the method does not use a training phase to compute the classifier results, so there is no basis on which to make the split. Similarly, there is no training done with the ABC tool. We hypothesized that there was a desire in [Botsis T, 2013] to keep the approach consistent across the methods used, and we consequently acknowledge in the table legend that this may result in a small difference for the ABC classification row. With respect to the Expanded SMQ results, the table includes the results on the whole dataset. However, the text does provide additional information: “Similar results were obtained using a 50/50 training/testing data split: 92% sensitivity (86-96% at 95% CI) and 81% specificity (80-82% at 95% CI) in the testing set, AUC 0.93 (0.9-0.95 at 95% CI).”
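
      For readers who want to trace how figures like those in Table 2 are obtained, the sketch below shows the standard definitions of sensitivity, specificity, and a rank-based AUC. It is a minimal illustration only: the labels, scores, and the 0.5 threshold are invented placeholders, not the VAERS data or either paper's actual pipeline.

      ```python
      from typing import Sequence, Tuple

      def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
          """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
          tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
          fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
          tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
          fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
          return tp / (tp + fn), tn / (tn + fp)

      def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
          """AUC as the probability that a positive report outscores a negative one
          (Mann-Whitney formulation), counting ties as 0.5."""
          pos = [s for t, s in zip(y_true, scores) if t == 1]
          neg = [s for t, s in zip(y_true, scores) if t == 0]
          wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
          return wins / (len(pos) * len(neg))

      # Toy data, not the VAERS reports: 1 = potentially positive ("possible anaphylaxis").
      y_true = [1, 1, 1, 0, 0, 0, 0, 0]
      scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
      y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold only
      sens, spec = sensitivity_specificity(y_true, y_pred)
      print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc_rank(y_true, scores):.2f}")
      ```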

      (2) The original paper [Botsis T, 2013] did not document any deviation of the MedDRA terms used from the standard. The dataset used in our analysis (including the MedDRA terms) is published online. While it would certainly have been of interest to compare our results with the set of MedDRA terms used by Botsis et al., the answers to our FOIA requests stated that “no such record were located”.

      (3) We provided details about our cosine similarity method and cited the original paper in which the method is described. We indicated that we obtained similar results on the whole dataset as with a training/testing split, as mentioned in point (1) above. In both cases, a cosine similarity score was computed for each pair of vectors (the query and the vector of PTs from the specific report considered) and the best cut-off point was determined. Regarding the use of additional PTs, we do not agree with the commentators’ assessment that the additional PTs were commonly reported or unrelated to anaphylaxis. Indeed, the very first term in the list of additional PTs we suggest and provide as supplementary material for review is Hypersensitivity, which we do consider to be related to anaphylaxis. Specifically, the SMQ for anaphylaxis contains the term “type 1 hypersensitivity”. The next 4 PTs in Table S1 are already included in the SMQ; the following one is “pharyngeal oedema”. The SMQ already lists “oedema mouth”, “oropharyngeal swelling”, “laryngotracheal oedema”, etc., and it seems appropriate to assume that “pharyngeal oedema” is related.
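
      To make the scoring procedure described above concrete, here is a minimal sketch under the assumption of binary PT vectors: each report is scored against a query of SMQ terms, and a cut-off is chosen on the scored set. The PT lists are invented placeholders, and Youden's J is only one possible cut-off criterion, not necessarily the one used in [Courtot M, 2014].

      ```python
      import math

      # Placeholder query; not the actual anaphylaxis SMQ or its expansion.
      EXPANDED_SMQ = {"Anaphylactic reaction", "Hypersensitivity", "Urticaria",
                      "Pharyngeal oedema", "Dyspnoea"}

      def cosine_similarity(query_pts, report_pts):
          """For binary PT vectors, cosine similarity reduces to
          |query & report| / sqrt(|query| * |report|)."""
          if not query_pts or not report_pts:
              return 0.0
          return len(query_pts & report_pts) / math.sqrt(len(query_pts) * len(report_pts))

      def best_cutoff(scored_reports):
          """Choose the score threshold maximising Youden's J = sensitivity + specificity - 1."""
          positives = sum(label for _, label in scored_reports)
          negatives = len(scored_reports) - positives
          best_t, best_j = 0.0, -1.0
          for t, _ in scored_reports:
              tp = sum(1 for s, y in scored_reports if s >= t and y == 1)
              tn = sum(1 for s, y in scored_reports if s < t and y == 0)
              j = tp / positives + tn / negatives - 1
              if j > best_j:
                  best_t, best_j = t, j
          return best_t

      # Toy reports: (PTs coded for the report, gold label; 1 = possible anaphylaxis).
      reports = [({"Hypersensitivity", "Urticaria", "Pyrexia"}, 1),
                 ({"Injection site pain", "Pyrexia"}, 0),
                 ({"Dyspnoea", "Pharyngeal oedema"}, 1),
                 ({"Headache"}, 0)]
      scored = [(cosine_similarity(EXPANDED_SMQ, pts), y) for pts, y in reports]
      print("selected cut-off:", round(best_cutoff(scored), 3))
      ```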

      (4)(a) Using a statistical correlation between MedDRA PTs and the outcome is an original contribution made by Courtot et al. in [Courtot M, 2014] and is clearly described as such when constructing the Expanded SMQ (and supported by Table S1 in the appendix). It was never implied that this had been done in [Botsis T, 2013]. We believe the referred-to excerpt is “Rather than creating a bag of words de novo based on keyword extraction from a training set of reports, we rather chose to expand on a known, already widely implemented, screening method, i.e., the SMQs.” We did not intend to refer to Botsis et al.’s work; we use the word “we”, and expected readers to understand that this referred to the authors of the manuscript [Courtot M, 2014]. We apologize for any confusion.
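
      As one concrete illustration of what a statistical correlation between MedDRA PTs and the outcome can look like in practice, the sketch below builds a 2x2 contingency table per PT and applies Fisher's exact test. The reports, the 0.05 threshold, and the absence of a multiple-testing correction are assumptions made for illustration, not details taken from [Courtot M, 2014].

      ```python
      from scipy.stats import fisher_exact

      # Invented placeholder reports: (set of PTs, label; 1 = potentially positive).
      reports = [
          ({"Hypersensitivity", "Urticaria"}, 1),
          ({"Hypersensitivity", "Dyspnoea"}, 1),
          ({"Urticaria", "Pyrexia"}, 1),
          ({"Injection site pain"}, 0),
          ({"Pyrexia"}, 0),
          ({"Headache"}, 0),
      ]

      all_pts = set().union(*(pts for pts, _ in reports))
      associations = []
      for pt in sorted(all_pts):
          a = sum(1 for pts, y in reports if pt in pts and y == 1)      # PT present, positive report
          b = sum(1 for pts, y in reports if pt in pts and y == 0)      # PT present, negative report
          c = sum(1 for pts, y in reports if pt not in pts and y == 1)  # PT absent, positive report
          d = sum(1 for pts, y in reports if pt not in pts and y == 0)  # PT absent, negative report
          _, p_value = fisher_exact([[a, b], [c, d]])
          associations.append((p_value, pt))

      for p_value, pt in sorted(associations):
          flag = "candidate for expanded SMQ" if p_value < 0.05 else ""
          print(f"{pt:<20s} p={p_value:.3f} {flag}")
      ```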

      (4)(b) We are aware that Botsis et al. used the online version of the tool, and that the tool does not require the existence of a tentative diagnosis when data is submitted. However, the ABC tool is a diagnosis confirmation tool based on the Brighton guidelines, and the authors of that tool assume that the instructions given with the guideline are being followed. Note that the tool itself is labeled “ANALYZE DATABASE Confirm a diagnosis for all cases in your database (Excel spreadsheet)” on the Brighton Collaboration website. We hypothesized that the diagnosis was automatically pre-selected for the batch entry done by Botsis et al. As a result, we emphasize a discrepancy between the information reported as ‘Insufficient evidence’ by Botsis et al., and what is expected based on the Brighton case definition: ‘Reported anaphylaxis with insufficient evidence’ [Rüggeberg JU, 2007]. In [Botsis T, 2013], 488 cases are classified as ‘Insufficient evidence’. With the guideline in mind, we reviewed the records and showed that only 3 reports should have been classified as “Reported anaphylaxis with insufficient evidence”. Indeed, only 12 reports in the VAERS dataset were reported as anaphylaxis (or corresponding synonyms) in the VAERS report itself, of which 3 do not meet the Brighton case definition. Our results have been communicated to the Brighton Collaboration, with the aim of clarifying the process for those users of the ABC tool who may not have a full understanding of the logic of the Brighton guideline.
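
      The distinction being drawn here can be sketched in a few lines of hypothetical logic: the 'Reported anaphylaxis with insufficient evidence' category is reserved for reports that themselves claim anaphylaxis but do not meet the case definition. The field names, the synonym list, and the meets_case_definition flag below are invented for illustration; this is not the ABC tool's actual logic, only one reading of the case-definition category in [Rüggeberg JU, 2007].

      ```python
      ANAPHYLAXIS_SYNONYMS = {"anaphylaxis", "anaphylactic reaction", "anaphylactic shock"}

      def classify_unconfirmed(narrative: str, meets_case_definition: bool) -> str:
          """Assign a label to a report; hypothetical sketch, not the ABC tool."""
          if meets_case_definition:
              return "Meets a Brighton level (handled elsewhere)"
          reported_as_anaphylaxis = any(term in narrative.lower() for term in ANAPHYLAXIS_SYNONYMS)
          if reported_as_anaphylaxis:
              # Only reports that themselves claim anaphylaxis fall into this category.
              return "Reported anaphylaxis with insufficient evidence"
          return "Does not qualify for the 'reported anaphylaxis' category"

      print(classify_unconfirmed("Patient developed hives and fever after vaccination", False))
      print(classify_unconfirmed("Physician reported anaphylactic reaction; no vital signs documented", False))
      ```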


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2014 May 08, Taxiarchis Botsis commented:

      Commentary on article “The Logic of Surveillance Guidelines: An Analysis of Vaccine Adverse Event Reports from an Ontological Perspective” by Mélanie Courtot, Ryan R. Brinkman and Alan Ruttenberg.

      Taxiarchis Botsis¹, Emily Jane Woo¹, Robert Ball¹

      ¹ Office of Biostatistics and Epidemiology, Center for Biologics Evaluation and Research (CBER), Food and Drug Administration (FDA), Rockville, MD

      In their recent work, we were pleased to see Courtot et al. [Courtot M, 2014] pursue a line of research we initiated [Botsis T, 2011, Botsis T, 2012, Botsis T, 2013, Botsis T, 2013] to improve the efficiency of the application of case definitions to spontaneous reports of adverse events following immunization reported to the US Vaccine Adverse Event Reporting System (VAERS). VAERS is a spontaneous reporting system co-managed by the Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC). Courtot et al. described the development and evaluation of an adverse event ontology and proposed using it to represent the Brighton Collaboration (BC) case definitions. Their work is based on data sets consisting of “possible anaphylaxis” reports identified by FDA VAERS review staff and used in our research [Botsis T, 2011, Botsis T, 2013]. Using data obtained from the FDA through a series of Freedom of Information Act (FOIA) requests, the authors evaluated their system using the BC case definition for anaphylaxis. However, we have some concerns regarding the methods that Courtot et al. used to compare their results with ours [Botsis T, 2011, Botsis T, 2013], and we believe some clarification on the part of the authors would be helpful.

      First, in Table 2 of their results, Courtot et al. present a comparison of the sensitivity, specificity and AUC of their methods with those previously published by us [Botsis T, 2013]. The values taken for comparison from our report [Botsis T, 2013] were based on a randomly selected 25% testing subset. The authors’ results appear to be based on the entire set of 6034 reports for H1N1 vaccine previously classified by the FDA VAERS review staff as either “possible anaphylaxis” (N=237, not 236 as reported by Courtot et al.) or not “possible anaphylaxis”, since they do not state otherwise. This issue is important to understand because the comparison presented in Table 2 of Courtot et al. might not be based on the same data.

      Second, we are concerned about the source of the Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs) used in all the analyses because these were not provided in the FOIA responses. While it is possible to obtain MedDRA PTs from public data, these PTs would differ from those used in our analysis because of constraints on the number of PTs available in the public files. It might also be possible to convert the narrative text to PTs, but doing so would certainly produce a different set from ours [Botsis T, 2013]. Both of these differences could explain and contribute to variability in the comparative results.

      Third, Courtot et al. neither explained whether they used a training/testing split when developing and then testing the expanded Standardized MedDRA Query (SMQ), nor exactly how cosine similarity was calculated for this purpose. They suggest the use of an expanded SMQ by selecting the PTs that are significantly correlated with the outcome and adding 120 new PTs to the original anaphylaxis SMQ. Details about the cosine similarity calculations are not provided, but it appears that Courtot et al. are using the expanded SMQ to perform a cosine similarity-based classification on the same set used for estimating the correlations; such an approach could explain the high sensitivity and might not be reproducible if applied prospectively to another data set. Additionally, regarding the established and validated SMQ for anaphylaxis, it seems paradoxical that the addition of PTs which are commonly reported and unrelated to anaphylaxis would increase its performance. Clarification would be helpful.

      Fourth, the subsection “Automated Case Screening” and the discussion of the appropriate use of the Automatic Brighton Classification (ABC) tool contain a number of misinterpretations of the work performed in [Botsis T, 2013]. With regard to the “Automated Case Screening” subsection, we would like to clarify two main points. First, we did not use the statistical correlation between the MedDRA PTs and the outcome (i.e., the “potentially positive” label) as implied by Courtot et al. We used the cosine similarity scores of the report vectors in the training subset to define the best cutoff point for the classification of reports and further evaluated it with the scores in the testing subset. Second, we used only MedDRA PTs in that analysis; we did not follow a “bag of words” approach and the terms were not obtained “based on keyword extraction” as stated by Courtot et al. With regard to the ABC tool, we used the online version of the tool that allows the processing of an appropriately formatted database of reports. While certain fields of the database must be marked as present (‘yes’), absent (‘no’), or even remain blank, the existence of a tentative diagnosis is not necessary to perform the classification. The online ABC tool processes the database and assigns one of the following labels to a report: “Level 1”, “Level 2”, “Level 3”, “Not a case”, “Insufficient evidence”, and “No evidence”. As previously stated [Botsis T, 2013], we prepared the database of 6034 reports to allow their automated classification by the ABC tool without human intervention or further interpretation of either the classification process or the labels produced by the ABC tool; this analysis was performed in collaboration with the ABC tool developer at the Brighton Collaboration.

      The development of advanced methodologies for the automated processing of safety surveillance data is important considering the large number of reports submitted to the FDA and other public health authorities. We welcome the reuse of our data for serious and constructive research, as well as the exploration of other methodologies that may offer efficient, effective, and rigorous processing of our data. We encourage the authors of this study and any other researchers to follow up with queries about our work, and we welcome them to maintain ongoing communication to ensure the most appropriate use of the data and avoid misinterpretations that might reduce the usefulness of the work and its potential application.
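
      The training/testing protocol described above (choose the cut-off on a training subset, then evaluate it on a held-out testing subset) can be sketched as follows. The scores and labels are random placeholders standing in for cosine similarity scores of report vectors, the 75/25 split mirrors the 25% testing subset mentioned in the comment, and maximising training accuracy is an assumed cut-off criterion, not necessarily the one used in [Botsis T, 2013].

      ```python
      import random

      # Placeholder data: (score, label) pairs; 1 = "possible anaphylaxis".
      random.seed(0)
      scored = [(random.random(), random.randint(0, 1)) for _ in range(200)]
      split = int(0.75 * len(scored))
      train, test = scored[:split], scored[split:]

      # Choose the cut-off on the training subset only (here: maximise training accuracy).
      cutoff = max((s for s, _ in train),
                   key=lambda t: sum((s >= t) == bool(y) for s, y in train))

      # Evaluate the frozen cut-off on the held-out testing subset.
      tp = sum(1 for s, y in test if s >= cutoff and y == 1)
      fn = sum(1 for s, y in test if s < cutoff and y == 1)
      tn = sum(1 for s, y in test if s < cutoff and y == 0)
      fp = sum(1 for s, y in test if s >= cutoff and y == 0)
      print(f"cut-off={cutoff:.2f}  test sensitivity={tp/(tp+fn):.2f}  test specificity={tn/(tn+fp):.2f}")
      ```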


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
