On 2016 Nov 10, Heather Stuart commented:
We would like to thank Professor Jorm for his careful consideration of our results and his comment. As requested, we have provided the following additional data analysis.
1. Report means, standard deviations and Cohen’s d with 95% CI for the primary outcome. This will allow comparison with the results of the meta-analyses by Corrigan et al. (Corrigan PW, 2012) and Griffiths et al. (Griffiths KM, 2014).
Professor Jorm’s questions raise two important issues: what constitutes a meaningful outcome in anti-stigma research, and how large an effect must be to be noteworthy (statistical significance aside). We discussed these issues at length when designing the evaluation protocol. Based on the book Analysis of Pretest-Posttest Designs (Bonate, 2000), we took the approach that scale scores are not helpful for guiding program improvements: aggregated scale scores do not identify which specific areas require improvement, whereas individual survey items do. We also considered what would be a meaningful difference to program partners (who participated actively in this discussion) and settled on the 80% (A grade) threshold as a meaningful heuristic for describing the outcome of an educational intervention. Thus, we deliberately did not use the entire scale score to calculate a difference of means.
Our primary outcome was the adjusted odds ratio. When we convert the odds ratio to an effect size (Chinn, 2000), we get an effect size of 0.52, reflecting a moderate effect. The mean pretest Social Acceptance score was 24.56 (SD 6.71, CI 24.34-24.75) and the mean post-test score was 23.62 (SD 6.93, CI 23.40-23.83). Using these values and the correlation between the two scores (0.73), the resulting Cohen’s d is 0.186, reflecting a small and statistically significant effect. It is important to point out that the mean differences reported here do not take into account the heterogeneity across programs, and so most likely underestimate the effect. This might explain why the effect size based on the OR (which was corrected for heterogeneity) was higher than the unadjusted mean standardized effect. Whether we use a mean standardized effect size or the adjusted odds ratio, the results suggest that contact-based education is a promising practice for reducing stigma in high school students.
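For readers who want to retrace the two effect sizes above, the following is a minimal sketch rather than the analysis code used in the study. The odds-ratio conversion follows Chinn (2000), d = ln(OR) × √3 / π. The paired formula (mean difference divided by the standard deviation of the difference scores) is an assumption about how the correlation of 0.73 entered the calculation; the function names are hypothetical.

```python
import math


def odds_ratio_to_d(odds_ratio):
    """Chinn (2000): convert an odds ratio to a standardized effect size."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def d_to_odds_ratio(d):
    """Inverse of the Chinn conversion."""
    return math.exp(d * math.pi / math.sqrt(3))


def paired_cohens_d(mean_pre, sd_pre, mean_post, sd_post, r):
    """Mean change divided by the SD of the difference scores.

    Assumption: this is one of several conventions for a pre/post design;
    the comment does not state which one was used.
    """
    sd_diff = math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)
    return (mean_pre - mean_post) / sd_diff


# Back-calculated for illustration only; the odds ratio itself is not
# restated in this comment.
implied_or = d_to_odds_ratio(0.52)

# Social Acceptance values quoted in the reply above.
d_paired = paired_cohens_d(24.56, 6.71, 23.62, 6.93, 0.73)

print(f"odds ratio implied by d = 0.52: {implied_or:.2f}")
print(f"paired Cohen's d: {d_paired:.3f}")
```

Under these assumptions the paired d comes out at roughly 0.19, close to the 0.186 reported; the small gap would be consistent with rounding of the published means, SDs and correlation.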
2. Data on the percentage of ‘positive outliers’ to compare with the ‘negative outliers’.
Because there was some regression to the mean in our data, we used the negative outliers to rule out the hypothesis that the negative changes we observed could be explained entirely by this statistical artefact. We defined negative outliers as difference scores below the 25th percentile minus 1.5 times the interquartile range. Negative outliers made up 3.8% of the Stereotype Scale difference scores and 2.8% of the Social Acceptance difference scores, suggesting that some students actually got worse. We noted that males were more likely to be among the outliers.
Our subsequent analysis of student characteristics showed that males who did not self-disclose a mental illness were less likely to achieve a passing score. This supported the idea that a small group of students may be reacting negatively to the intervention and becoming more stigmatizing. While the OR alone (or the mean standardized difference) could, as Professor Jorm indicates, mask some deterioration in a subset of students, our full analysis was designed to uncover this exact phenomenon.
Professor Jorm has asked that we show the positive outliers. If we define positive outliers as difference scores above the 75th percentile plus 1.5 times the interquartile range, then 1.9% were outliers on the Stereotype Scale difference score and 2.3% were outliers on the Social Acceptance difference score, suggesting that the intervention also resonated particularly well with a small group of students.
Thus, while contact-based interventions appear to be generally effective (i.e. when using omnibus measures such as a standardized effect size or the adjusted odds ratio), our findings support the idea that effects are not uniform across all subsets of students (or, indeed, programs). Consequently, more nuanced approaches to anti-stigma interventions are needed, such as those that are sensitive to gender and personal disclosure, along with fidelity criteria to maximize program effects.
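The outlier rule described in point 2 (quartiles plus or minus 1.5 times the interquartile range) can be sketched as follows. This is an illustration only: the function name, the example scores and the percentile interpolation method are assumptions, and the comment does not say whether difference scores were computed as post minus pre or the reverse, so the sketch simply reports the share below the lower fence and above the upper fence.

```python
import statistics


def tukey_outlier_shares(diff_scores):
    """Share of difference scores beyond the 1.5 * IQR fences."""
    # Assumption: "inclusive" interpolation; other percentile methods
    # give slightly different quartiles.
    q1, _, q3 = statistics.quantiles(diff_scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    n = len(diff_scores)
    return {
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "share_below": sum(1 for x in diff_scores if x < lower_fence) / n,
        "share_above": sum(1 for x in diff_scores if x > upper_fence) / n,
    }


# Made-up difference scores, not study data.
example_diffs = [-20, -3, -2, -1, 0, 0, 1, 1, 2, 3, 15]
shares = tukey_outlier_shares(example_diffs)
print(shares)
```

Which tail corresponds to deterioration depends on the scoring direction of the scale and on how the difference was taken.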
3. Data on changes in ‘fail grades’, i.e. whether there was any increase in those with less than 50% non-stigmatizing responses.
In response to Professor Jorm’s request for a reanalysis of students who failed, we defined a fail grade as giving a stigmatizing response to at least 6 of the 11 statements (54.5% of the statements). At pretest, 32.8% of students ‘failed’ on the Stereotype scale, dropping to 23.7% at post-test (a decrease of 9.1 percentage points). For the Social Acceptance scale, 28.5% ‘failed’ at pretest, dropping to 24.8% at post-test (a decrease of 3.7 percentage points). Using McNemar’s test, the change was statistically significant for both the Stereotype scale (χ²(1) = 148.7, p < .001) and the Social Acceptance scale (χ²(1) = 28.4, p < .001), lending further support to our conclusion that the interventions were generally effective.
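As a rough illustration of the McNemar comparison, the sketch below computes the test statistic from the two discordant cells of a paired pass/fail table. The discordant counts are not given in this comment, so the numbers used here are invented, and the function name is hypothetical. Whether a continuity correction was applied in the reported analysis is also not stated; the sketch leaves it off by default.

```python
import math


def mcnemar_test(fail_to_pass, pass_to_fail, continuity=False):
    """McNemar chi-square (1 df) from the two discordant cell counts."""
    b, c = fail_to_pass, pass_to_fail
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        # Optional continuity correction: shrink |b - c| by 1.
        diff = max(diff - 1, 0)
    chi2 = diff**2 / (b + c)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 30 students moved from fail to pass, 10 from pass to fail.
chi2, p_value = mcnemar_test(30, 10)
print(f"chi2(1) = {chi2:.1f}, p = {p_value:.4f}")
```

Only students whose grade changed between pretest and post-test contribute to the statistic, which is why the marginal percentages quoted above are not enough on their own to reproduce the reported values.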
Bonate, P. L. (2000). Analysis of Pretest-Posttest Designs. CRC Press.
Chinn, S. (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine, 19(22), 3127-3131.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.