On 2016 Mar 09, Daniël Lakens commented:
Invalid statistical conclusions in Gilbert, King, Pettigrew, and Wilson (2016)
Gilbert, King, Pettigrew, and Wilson (GKPW; 2016) argue that the Reproducibility Project (Open Science Collaboration, 2015) provides no evidence for a ‘replication crisis’ in psychology. Their statistical conclusions are meaningless due to a crucial flaw in their understanding of confidence intervals. The authors assume that ‘based on statistical theory we know that 95% of replication estimates should fall within the 95% CI of the original results’. This is incorrect. In the long run, 95% of 95% confidence intervals capture the true population parameter, not the estimate from a replication study, which has sampling error of its own. When original and replication studies have identical sample sizes, 83.4% of 95% confidence intervals from a single study will, in the long run, capture the sample statistic of a replication study. This is known as the capture percentage (Cumming & Maillardet, 2006).
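The 83.4% figure can be checked with a short calculation. The following is a minimal sketch, not code from Cumming and Maillardet; it assumes normally distributed sample means, a known and equal standard deviation, the same true effect in both studies, and equal sample sizes. The function name capture_probability_equal_n is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def capture_probability_equal_n(confidence=0.95):
    """Probability that the original CI contains the replication mean.

    Assumptions (illustrative only): both means are normal around the same
    true value, with the same known SD and the same sample size.
    """
    std_normal = NormalDist()
    # Critical value for the two-sided interval, e.g. about 1.96 for 95%.
    z = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    # The difference between two independent means has variance 2 * SE^2,
    # so the CI half-width z * SE spans z / sqrt(2) standard deviations of
    # that difference.
    return 2 * std_normal.cdf(z / sqrt(2)) - 1

print(round(capture_probability_equal_n(), 3))
```

Under these assumptions the result is about 0.834, well below the 0.95 that GKPW take for granted.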
GKPW use data from Many Labs (another large-scale replication project, Klein et al., 2014) to estimate the expected capture percentage in the Reproducibility Project when allowing for random error due to infidelities in the replication study, and arrive at an estimate of 65.5%. They fail to realize that the capture percentage for studies with different sample sizes (in the Many Labs project ranging from 79 to 1329) can be any number between 0 and 1, and can’t be used to estimate ‘infidelities’ in replications in general. Most importantly, the capture percentage observed for replications in the Many Labs dataset does not generalize in any way to the expected capture percentages between original and replication studies in the Reproducibility Project.
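To see how strongly the capture percentage depends on the two sample sizes, the same reasoning can be extended to unequal samples. This is a hedged sketch under simplifying assumptions (normal sample means, a known common standard deviation, the same true effect in both studies, no other sources of variation); the function name capture_probability is hypothetical. Pairing the smallest and largest sample sizes mentioned above (79 and 1329) is purely illustrative and is not an analysis of the Many Labs data.

```python
from math import sqrt
from statistics import NormalDist

def capture_probability(n_original, n_replication, confidence=0.95):
    """Probability that the original CI contains the replication mean.

    Assumptions (illustrative only): normal sample means, a known common
    SD, and the same true effect in both studies.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    # Var(replication mean - original mean) = sd^2 * (1/n_orig + 1/n_rep);
    # the original CI half-width is z * sd / sqrt(n_orig).
    ratio = z / sqrt(1 + n_original / n_replication)
    return 2 * std_normal.cdf(ratio) - 1

# Illustrative pairings of the sample sizes mentioned in the text.
for n_orig, n_rep in [(79, 79), (79, 1329), (1329, 79)]:
    print(n_orig, n_rep, round(capture_probability(n_orig, n_rep), 3))
```

In this sketch the value depends only on the ratio of the two sample sizes: a small original study followed by a large replication yields a high capture probability, while a large original study followed by a small replication yields a low one. A single observed capture rate therefore says little without knowing the sample sizes involved.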
Nevertheless, GKPW conclude that the capture percentage in a subset of Reproducibility Project studies overlaps with “the 65.5% replication rate that one would expect if every one of the original studies had reported a true effect.” Due to this basic statistical misunderstanding, the main claim by GKPW that ‘the reproducibility of psychological science is quite high’, based on the 65.5% estimate, lacks a statistical foundation, and is not valid.
References
Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217–227. http://doi.org/10.1037/1082-989X.11.3.217
Gilbert, D., King, G., Pettigrew, S., & Wilson, T. (2016). Comment on “Estimating the reproducibility of psychological science.” Science, 351(6277), 1037a–1037b.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., … Nosek, B. A. (2014). Investigating Variation in Replicability: A “Many Labs” Replication Project. Social Psychology, 45(3), 142–152. http://doi.org/10.1027/1864-9335/a000178
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716–aac4716. http://doi.org/10.1126/science.aac4716
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.