In stage 1, we find a significant effect (P < 0.05) in the same direction as the original study for 12 of the 21 replications16,17,18,19,22,23,24,25,27,29,30,36 (57.1%) (Fig. 1a and Supplementary Table 3). When we increase the statistical power further in stage 2 (Fig. 1b and Supplementary Table 4), two additional studies20,31 replicate according to this criterion. By mistake, a second data collection was carried out for one study16 that had already replicated in stage 1. To base our results on all the data collected, we therefore also include this study in the stage 2 results. In stage 2, this study16 does not replicate. This suggests that a significant stage 1 result can be fragile: replication studies may need to be routinely powered to detect at least 50% of the original effect size, or a P value threshold stricter than 0.05 may be needed before stopping after stage 1 of our two-stage testing procedure. Based on all of the data collected, 13 of the 21 studies (61.9%) replicated after stage 2 using the statistical significance criterion: the 12 stage 1 replications, plus the 2 additional stage 2 replications, minus the 1 study that no longer replicated in stage 2.
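The bookkeeping behind the reported replication rates can be checked with a few lines of code. The sketch below is a hypothetical illustration, not the authors' analysis code. It uses the study reference numbers cited in the text as identifiers and assumes a total of 21 studies, which follows from the reported percentages (12/21 = 57.1% and 13/21 = 61.9%).

```python
# Hypothetical tally of replication outcomes across the two stages.
# Assumption: 21 studies in total, inferred from 12/21 = 57.1% and 13/21 = 61.9%.
N_STUDIES = 21

# Reference numbers of studies with a significant same-direction effect in stage 1.
stage1_replicated = {16, 17, 18, 19, 22, 23, 24, 25, 27, 29, 30, 36}
# Studies that replicated only after the stage 2 data collection.
stage2_new = {20, 31}
# Study that replicated in stage 1 but not in stage 2 (extra data collected by mistake).
stage2_failed = {16}

# Final set: stage 1 successes plus new stage 2 successes, minus the reversal.
final_replicated = (stage1_replicated | stage2_new) - stage2_failed


def rate(k, n=N_STUDIES):
    """Replication rate in percent, rounded to one decimal place."""
    return round(100 * k / n, 1)


print(len(stage1_replicated), rate(len(stage1_replicated)))  # 12 57.1
print(len(final_replicated), rate(len(final_replicated)))    # 13 61.9
```

Using set operations makes the single reversal explicit: study 16 is counted once in stage 1 and then removed, rather than silently netted out of a running count.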
Main result of the study