Monday, October 7, 2013

How Can Researchers Claim Support for the Null Hypothesis?

Traditional null hypothesis significance testing (NHST) does not allow researchers to claim support for the null hypothesis given a non-significant finding (i.e., p > .05). This is because NHST methods only indicate the probability of a set of data given the null model, p(D | H0), and it does not necessarily follow that the null model is probable given the data, p(H0 | D). Rather, researchers using NHST methods can only state that the null “could not be rejected.” This, of course, runs counter to the spirit of scientific inquiry and to the desires of most researchers, who wish to make substantive claims based on their research findings. If this is the case, how might researchers proceed?
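
Bayes' theorem shows why the two conditional probabilities differ. Getting from p(D | H0) to p(H0 | D) also requires the prior probability of the null, p(H0), and the overall probability of the data, p(D), neither of which a p-value supplies. (Strictly, a p-value is the probability of data at least as extreme as those observed, assuming H0 is true.)

```latex
p(H_0 \mid D) = \frac{p(D \mid H_0)\, p(H_0)}{p(D)}
```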

Unlike NHST methods, Bayesian methods allow researchers to approximate the probability of a hypothesis given the data, p(H0 | D), using a model comparison approach. Specifically, a Bayes factor is the ratio of the probability of the data under each of two models (e.g., the null and the alternative hypothesis), indicating which model the evidence supports more strongly. This allows researchers to make claims in support of either the null or the alternative model. Consequently, Bayesian methods offer a useful alternative to NHST that is more in line with the goals of the scientific enterprise, which seeks to evaluate the relative validity of competing hypotheses. However, Bayesian methods are often computationally complex, and many researchers in psychology may not have the skills necessary to employ them appropriately. Fortunately, easily computable alternatives exist.
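
In symbols, the Bayes factor for the null over the alternative, BF01, compares how well each model predicts the observed data, and it converts prior odds into posterior odds. If both hypotheses are assumed equally probable beforehand (an assumption, though a common default), the posterior probability of the null follows directly from the Bayes factor:

```latex
\mathrm{BF}_{01} = \frac{p(D \mid H_0)}{p(D \mid H_1)}, \qquad
\frac{p(H_0 \mid D)}{p(H_1 \mid D)} = \mathrm{BF}_{01} \times \frac{p(H_0)}{p(H_1)}, \qquad
p(H_0 \mid D) = \frac{\mathrm{BF}_{01}}{1 + \mathrm{BF}_{01}} \quad \text{when } p(H_0) = p(H_1)
```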

In addition to reviewing the drawbacks of NHST in greater detail, Masson (2011) offers a reasonable method whereby researchers can obtain the benefits of Bayesian methods with little computational complexity. Instead of computing Bayes factors in the traditional manner, he builds on a method specified by Wagenmakers (2007) that uses the Bayesian Information Criterion (BIC). The BIC uses results computed from traditional NHST tests (such as the sums-of-squares values generated by ANOVAs) to easily produce a model comparison ratio that approximates the Bayes factor. Researchers are urged to read Masson (2011) for the full procedure. Additionally, those wishing for a useful scheme to categorize the magnitude of the resulting ratio should refer to Raftery (1995). Finally, researchers interested in reading more about the drawbacks of NHST and about Bayesian methods in general will find articles by Wagenmakers (2007), Gallistel (2009), and Kruschke (2010, 2013) helpful.
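
A minimal Python sketch of the BIC approximation follows, assuming a one-way, between-subjects design. It uses the form given by Wagenmakers (2007), ΔBIC = n ln(SSE1/SSE0) + (k1 − k0) ln n, where SSE0 and SSE1 are the error sums of squares under the null and alternative models and k1 − k0 is the number of extra free parameters in the alternative; the Bayes factor is then BF01 ≈ exp(ΔBIC/2). The function names (bic_bayes_factor, raftery_label) and the ANOVA numbers are hypothetical, equal prior odds are assumed, and the evidence labels follow Raftery's (1995) posterior-probability categories (.50 to .75 weak, .75 to .95 positive, .95 to .99 strong, above .99 very strong). Within-subjects designs need the adjustments described in Masson (2011).

```python
import math


def bic_bayes_factor(sse_null, sse_alt, n, k_diff):
    """BIC approximation to the Bayes factor (after Wagenmakers, 2007).

    sse_null: error sum of squares under H0 (here SS_effect + SS_error)
    sse_alt:  error sum of squares under H1 (here SS_error)
    n:        number of independent observations (total N, between-subjects)
    k_diff:   extra free parameters in H1 (here the effect's df)
    """
    # Delta BIC = BIC(H1) - BIC(H0); positive values favor the null.
    delta_bic = n * math.log(sse_alt / sse_null) + k_diff * math.log(n)
    bf01 = math.exp(delta_bic / 2)  # approximate Bayes factor in favor of H0
    p_h0 = bf01 / (1 + bf01)        # posterior p(H0 | D), assuming equal priors
    return delta_bic, bf01, p_h0


def raftery_label(p):
    """Raftery (1995) label for the posterior probability of the favored model."""
    if p > 0.99:
        return "very strong"
    if p > 0.95:
        return "strong"
    if p > 0.75:
        return "positive"
    return "weak"


# Hypothetical one-way ANOVA: two groups, N = 40, SS_effect = 2, SS_error = 98.
delta_bic, bf01, p_h0 = bic_bayes_factor(sse_null=2 + 98, sse_alt=98, n=40, k_diff=1)
favored = max(p_h0, 1 - p_h0)  # probability of whichever model is favored
print(f"dBIC = {delta_bic:.2f}, BF01 = {bf01:.2f}, "
      f"p(H0|D) = {p_h0:.3f} ({raftery_label(favored)} evidence)")
```

With these made-up numbers, which correspond to a non-significant F(1, 38) ≈ 0.78, the approximation gives BF01 ≈ 4.2 and p(H0 | D) ≈ .81, which falls in Raftery's "positive" range in favor of the null.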

In conclusion, researchers often seek to demonstrate that some experimental manipulations have no demonstrable effect on an outcome variable. Moreover, even when a researcher has no a priori motivation to demonstrate a null effect, it is scientifically responsible to test the extent to which a null model is more or less supported relative to a specified alternative model, rather than simply making assumptions about its validity. This improves the quality of research coming out of the psychological research community and provides important information for other researchers, which may preserve valuable resources, including time and money. In addition, although most psychological journals still advocate the reporting of NHST results, it is easy enough for researchers to compute BICs and report them alongside more traditional tests.
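
When only the F test is reported, the same quantity can still be recovered in the simple between-subjects case. There SSE(H1) = SS_error and SSE(H0) = SS_effect + SS_error, so SSE1/SSE0 = 1/(1 + F × df_effect/df_error), and ΔBIC needs only F, its degrees of freedom, and the total N. A minimal sketch under that assumption; the function name and the reported values are hypothetical:

```python
import math


def delta_bic_from_f(f, df_effect, df_error, n):
    """Delta BIC (H1 minus H0) recovered from a reported F statistic.

    Assumes a between-subjects design where
    SSE(H1) / SSE(H0) = 1 / (1 + F * df_effect / df_error).
    """
    return -n * math.log(1 + f * df_effect / df_error) + df_effect * math.log(n)


# Hypothetical report: F(1, 38) = 0.78 with N = 40.
d = delta_bic_from_f(0.78, 1, 38, 40)
bf01 = math.exp(d / 2)  # approximate Bayes factor in favor of H0
print(f"BF01 = {bf01:.2f}, p(H0|D) = {bf01 / (1 + bf01):.3f}")
```

Reporting BF01 or p(H0 | D) next to the F test in this way adds one line to a results section without changing the underlying analysis.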


Gallistel, C. R. (2009). The importance of proving the null. Psychological Review, 116,

Kruschke, J. K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews:
Cognitive Science, 1, 658-676.

Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of
Experimental Psychology: General, 142, 573-603.

Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis
significance testing. Behavior Research Methods, 43, 679-690.

Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden
(Ed.), Sociological methodology 1995 (pp. 111-196). Cambridge: Blackwell.

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p
values. Psychonomic Bulletin & Review, 14, 779-804.


  1. What is the most common cause of non-significance? Insufficient data?

  2. Non-significance will arise from one of two major issues: 1) a problem with the study (e.g., poor measurement, low sample size), or 2) a problem with the theory (i.e., the effect of interest does not exist). However, in practice, "insufficient" data may be the most common cause of non-significance . . . but only because any effect, no matter how measly, will appear significant with a large enough sample size! In other words, with a large enough sample, almost any null hypothesis can be rejected.

    This highlights an interesting tension in research. Certainly, researchers want samples large enough to detect an effect of interest. But results from a very large sample should be interpreted with caution, as a significant finding may say more about the sample size than about the size of the effect of interest.
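
    A quick sketch of the large-sample point: hold a trivially small standardized difference fixed (d = 0.05, a hypothetical value) and let the sample grow. For simplicity it assumes a two-group z test with known unit variance and an observed difference exactly equal to d, rather than a t test; the function name is made up for illustration.

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p for a two-group z test on a standardized mean difference d.

    Simplifying assumptions: unit variance known in both groups, and the
    observed difference equals d exactly, so z = d * sqrt(n / 2).
    """
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under H0


for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group = {n:>7}: p = {two_sided_p(0.05, n):.4g}")
```

    The same tiny difference is clearly non-significant with 100 participants per group (p ≈ .72) but highly significant with 10,000 per group (p < .001).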
