Tuesday, February 18, 2014

Replication In Psychological Research

Eminent psychologist Daniel Kahneman warned of a “looming train wreck” while others have declared that psychology is in the throes of a replicability crisis (Pashler & Harris, 2012). The questions surrounding replication in psychology are both philosophical and statistical in nature, and they are prompting concerted efforts to change the incentives in academia and the standards of publication. Indeed, the Society for Personality and Social Psychology recently released new standards for research and publication, and journals like Perspectives on Psychological Science have published articles about the role of replications. Given the increased interest in replication, we provide a review of some of the key points of discussion below.

What’s all the fuss about?
While replication has always been an important aspect of psychological research (as with any science), the recent fervor was energized by the uncovering of several major cases of fraud in social psychology. In particular, the case of Diederik Stapel, who committed years of scientific fraud, raised questions about our tendency to dismiss the importance of direct replications and to discard failures to replicate as uninformative.

These cases of fraud compounded other attempts to call attention to the irreproducibility of research, including Ioannidis’ (2005) exposition on biomedical research. While psychology may not be any worse off than other “harder” sciences, problems of reproducibility have implications for attitudes about the utility and credibility of scientific research more generally.

Why is it difficult to replicate research?

Data “cleaning” and Unethical Practices
In some cases, effects may be difficult to replicate because researchers manipulated data in ways that were not fully described in the literature. For example, researchers may find an effect when outliers are deleted and no effect when everyone is included in the analyses. If the treatment of outliers is not mentioned in the original research article, there is no way for other researchers to identify this as an important factor in finding the effect, even if the choice to delete outliers is valid. In the worst cases, the effects might not replicate because of fraudulent practices on the part of the researcher, as in the Stapel case.
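To make this concrete, here is a small hypothetical sketch in Python (not from the original article; the data are made up purely for illustration) in which the same dataset yields a “significant” effect only after two extreme scores are quietly dropped:

```python
# Hypothetical illustration: the same data produce opposite conclusions
# depending on an undisclosed decision to drop two extreme scores.
from scipy import stats

control = [0.1, 0.2, -0.1, 0.0, 0.3, -0.2, 0.1, 0.0, 0.2, -0.1]
treatment = [0.6, 0.7, 0.5, 0.8, 0.6, 0.9, 0.7, 0.5, -2.5, -2.8]

# Analysis 1: everyone included
t_all, p_all = stats.ttest_ind(treatment, control)

# Analysis 2: the two most extreme treatment scores treated as "outliers" and removed
trimmed = sorted(treatment, key=abs)[:-2]
t_trim, p_trim = stats.ttest_ind(trimmed, control)

print(f"all participants:  p = {p_all:.3f}")   # not significant
print(f"outliers deleted:  p = {p_trim:.3f}")  # clearly "significant"
```

If only the second analysis is reported, and the outlier decision goes unmentioned, a faithful replication that keeps all participants will look like a failure.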

The Nature of our Statistics
Most research in psychology uses the Null Hypothesis Significance Testing (NHST) method of inferential statistics. One serious downside of NHST is the tendency to think in terms of a dichotomy in which effects exist (p < .05) or effects don’t exist (p > .05), ignoring the inherent uncertainty in psychological effects. It is tempting to use p-values as indicators of the reliability of an effect, but the problems with this kind of thinking are nicely illustrated in the Dance of the p-values video from psychologist Geoff Cumming. Understanding the variability in p-values suggests that unreliability in “significant” statistical tests is not surprising, and psychologists benefit from thinking in terms of estimation (e.g., confidence intervals) when trying to evaluate replication attempts.
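A rough way to see this variability is to simulate many exact replications of the same study. The sketch below (ours, not Cumming’s, with an assumed true effect of d = 0.5 and n = 30 per group) shows p-values swinging from clearly “significant” to clearly not, even though every run samples from the same population; the confidence intervals make the underlying uncertainty visible.

```python
# Minimal sketch: simulate many exact replications of the same true effect
# and watch how much the p-values bounce around ("dance of the p-values").
# The true effect (d = 0.5) and group size (n = 30) are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n, reps = 0.5, 30, 20

for i in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treatment, control)
    # Approximate 95% CI for the mean difference
    diff = treatment.mean() - control.mean()
    se = np.sqrt(treatment.var(ddof=1) / n + control.var(ddof=1) / n)
    lo, hi = stats.t.interval(0.95, 2 * n - 2, loc=diff, scale=se)
    print(f"rep {i + 1:2d}: p = {p:.3f}   diff = {diff:+.2f}   95% CI [{lo:+.2f}, {hi:+.2f}]")
```

Only sampling error changes from run to run, yet the p-values vary dramatically; the intervals, by contrast, show how imprecise each individual study really is.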

Sample Size
Another pervasive problem in psychological research is our generally small sample sizes. Many of our studies continue to be badly underpowered given the typically small-to-moderate effect sizes we find in our research (Nelson et al., 2013). As such, we are less likely to replicate findings in the literature. Another problem with small sample sizes is the implication for our estimates of effect sizes. Effect size estimates are noisier in small samples, and because only the estimates that happen to come out large enough reach statistical significance, the published effects from small studies tend to overestimate the true size of the effect in the population. As such, it might be much more difficult to replicate a finding than the reported effect size (if there is one) would suggest.
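This inflation is easy to demonstrate with a small simulation (again a hypothetical sketch, assuming a true effect of d = 0.3 and n = 20 per group): among underpowered studies, only the runs that drew unusually large sample effects clear p < .05, so the “significant” studies overestimate the population effect on average.

```python
# Minimal sketch: with a small sample and a modest true effect, the studies
# that happen to reach p < .05 report inflated effect sizes on average.
# The true effect (d = 0.3) and group size (n = 20) are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_d, n, sims = 0.3, 20, 5000
significant_d = []

for _ in range(sims):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treatment, control)
    if p < .05:
        pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
        significant_d.append((treatment.mean() - control.mean()) / pooled_sd)

print(f"power ~ {len(significant_d) / sims:.2f}")
print(f"true d = {true_d}; mean d among significant studies = {np.mean(significant_d):.2f}")
```

A replication planned around the inflated published estimate will itself tend to be underpowered, compounding the problem.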

What is a good replication?
The best replications begin with transparency and collaboration between the replicating researchers and the original authors. With this in mind, Brandt et al. (2014) outlined important considerations that make for good replications. These authors argue that a good replication has five main ingredients:

1) Carefully defining the effects and methods that the researcher intends to replicate
2) Following exactly the methods of the original study 
3) Having high statistical power 
4) Making complete details about the replication available  
5) Evaluating replication results and comparing them critically


The article goes into greater detail about each of these “ingredients,” but the question of how to evaluate replications deserves special attention. Brandt et al. (2014) recommend evaluating the replication of effects in two ways: 1) reporting the size, direction, and confidence interval of the target effect (which tells us whether the effect is different from the null) and 2) testing whether the replication effect is different from the original effect. Another approach to evaluating the success of replications is to apply meta-analytic aggregation to the replication and original study effects. There are many other approaches to evaluating replications (Simonsohn, 2013), but it is clear that evaluating only the statistical significance of the results is insufficient.
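As an illustration of these comparisons (a hypothetical sketch with made-up effect estimates, not an analysis from Brandt et al.), here is how the comparison to the original effect and a simple fixed-effect meta-analytic combination might look when effects are expressed as standardized mean differences with standard errors:

```python
# Hypothetical sketch of two ways to evaluate a replication:
# (1) is the replication effect different from the original effect?
# (2) what does a simple fixed-effect (inverse-variance) combination look like?
import numpy as np
from scipy import stats

d_orig, se_orig = 0.48, 0.15   # made-up original study
d_rep,  se_rep  = 0.12, 0.10   # made-up replication

# (1) z test for the difference between two independent effects
z_diff = (d_orig - d_rep) / np.sqrt(se_orig**2 + se_rep**2)
p_diff = 2 * stats.norm.sf(abs(z_diff))
print(f"difference from original: z = {z_diff:.2f}, p = {p_diff:.3f}")

# (2) fixed-effect (inverse-variance weighted) combination of the two studies
w = np.array([1 / se_orig**2, 1 / se_rep**2])
d = np.array([d_orig, d_rep])
d_combined = np.sum(w * d) / np.sum(w)
se_combined = np.sqrt(1 / np.sum(w))
ci = (d_combined - 1.96 * se_combined, d_combined + 1.96 * se_combined)
print(f"combined d = {d_combined:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The inverse-variance weights simply give more precise studies more influence; if the original and replication studies are expected to differ in important ways, a random-effects model would be the more cautious choice.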

What can I do if I am interested in conducting replications?
If you are interested in conducting replications, the Open Science Framework’s Reproducibility Project is seeking partners to conduct replications of studies from the 2008 issues of the Journal of Personality and Social Psychology, Psychological Science, and the Journal of Experimental Psychology: Learning, Memory, and Cognition. In doing so, the OSF aims to learn more about the overall reproducibility of the psychology literature. The OSF also provides assistance with carrying out the replications, as well as workflow resources that can make it easier for others to replicate your own research.

References
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., ... & Van't Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217-224.

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7(6), 531-536.

Simonsohn, U. (2013). Evaluating replication results. Available at SSRN: http://ssrn.com/abstract=2259879