Braver, Thoemmes, and Rosenthal (28) argue that judging the success of a replication only by whether it shows a significant effect (in the current study, at the 0.05 threshold) would be inappropriate.
They argue that replication success depends strongly on the statistical power, and therefore on the sample size, of the replication study. The replication study must include enough subjects to make it sufficiently probable that the effect in question, should it really exist in the population, can be detected in the sample. If a replication study has low power, for example because the size of the original effect was overestimated and the replication sample size was consequently too small, a successful replication, i.e. a result that is statistically significant at the 0.05 threshold, becomes less likely.
The replication success of each individual study therefore depends on its sample size. If several replication attempts are assessed individually, the replication success rate can thus be distorted, underestimating how reproducible an effect really is.
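This dependence of power on sample size can be illustrated with a small calculation. The sketch below uses a standard normal approximation for the power of a two-sided two-sample test; the specific effect sizes (an overestimated original d = 0.5 versus a true d = 0.3) and the sample size of 63 per group are hypothetical numbers chosen for illustration, not values from the study under discussion.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_two_sample(d, n_per_group, alpha_z=1.959964):
    """Approximate power of a two-sided two-sample test for a true
    standardized effect size d, via the normal approximation.
    alpha_z is the critical z-value (default: alpha = 0.05, two-sided)."""
    return normal_cdf(d * math.sqrt(n_per_group / 2) - alpha_z)

# Suppose the replication sample size was planned from an overestimated
# original effect of d = 0.5: about 63 subjects per group give ~80% power.
print(power_two_sample(0.5, 63))

# If the true effect is only d = 0.3, the same sample size yields
# far lower power, making a "failed" replication quite likely.
print(power_two_sample(0.3, 63))
```

Under these assumed numbers, the replication would have well under 50% power for the true effect, so a nonsignificant result would say little about whether the effect exists.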
To circumvent this problem, the authors suggest a different technique than counting whether individual replications were significant at the 0.05 threshold. Their analysis is called “continuously cumulating meta-analysis”: the data of several replication attempts are combined, so that conclusions can be drawn about whether the data of all the replication attempts, taken together, support the effect of interest.
After a new replication attempt is conducted, its data are added to the pool of data from previous replication attempts, and a test is then run on the combined data to estimate the effect of interest.
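The updating step described above can be sketched as a fixed-effect, inverse-variance meta-analysis that is re-run each time a new replication arrives. This is only one common way to pool effect estimates, not necessarily the exact procedure the authors use; the effect sizes and variances below are invented for illustration.

```python
import math

def cumulative_meta_analysis(effects, variances):
    """Inverse-variance fixed-effect pooling, updated after each
    replication. Returns (pooled effect, standard error, two-sided p)
    after each study is added to the pool."""
    results = []
    weight_sum = 0.0
    weighted_effect_sum = 0.0
    for d, v in zip(effects, variances):
        w = 1.0 / v                      # inverse-variance weight
        weight_sum += w
        weighted_effect_sum += w * d
        pooled = weighted_effect_sum / weight_sum
        se = math.sqrt(1.0 / weight_sum)
        z = pooled / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        results.append((pooled, se, p))
    return results

# Hypothetical effect estimates (Cohen's d) and variances from three
# replication attempts, each nonsignificant on its own.
effects = [0.30, 0.25, 0.28]
variances = [0.04, 0.05, 0.045]
for i, (pooled, se, p) in enumerate(
        cumulative_meta_analysis(effects, variances), start=1):
    print(f"after study {i}: pooled d = {pooled:.3f}, "
          f"SE = {se:.3f}, p = {p:.4f}")
```

With these invented numbers, no single replication reaches the 0.05 threshold, but the cumulative estimate does once all three are pooled, which is exactly the kind of case the individual-significance criterion would misclassify as a string of failures.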