How Can Researchers Claim Support for the Null Hypothesis?
Traditional null hypothesis significance testing (NHST) does
not allow researchers to claim support for the null hypothesis given a finding
of non-significance (i.e., p > .05). This is because NHST
quantifies only the probability of obtaining data at least as extreme as those observed if the null model is true, P(D | H0),
and it does not necessarily follow that the null model is probable given the data, P(H0 | D). Rather, researchers using NHST methods are only able to state
that the null “was not able to be rejected.” This, of course, runs against the spirit
of scientific pursuit and against the desires of most researchers, who
wish to make substantive claims based on their research findings. If this is
the case, how might researchers proceed?
Unlike NHST methods, Bayesian methods allow researchers to estimate
the probability of a hypothesis given the data, P(H0 | D), using
a model comparison approach. Specifically, a Bayes factor is the ratio of the
probability of the data under one model to the probability of the data under
another (e.g., the null and the alternative hypothesis), and it indicates
which model is better supported by the statistical evidence. This allows
researchers to make claims in support of either the null or the alternative model.
Consequently, Bayesian methods offer a useful alternative to NHST that is more
in line with the goals of the scientific enterprise, which seeks to weigh
competing hypotheses against one another. However, Bayesian methods are often
computationally complex, and many researchers in the field of psychology may not
have the skills necessary to employ them appropriately. Yet, easily computable
alternatives exist.
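For reference, the relationship between the Bayes factor and the posterior probability of each model can be written out using Bayes' rule. The notation below (BF01 for the Bayes factor favoring the null, H1 for the alternative) is assumed here for illustration and does not appear in the text above.

```latex
\[
\underbrace{\frac{P(H_0 \mid D)}{P(H_1 \mid D)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(D \mid H_0)}{P(D \mid H_1)}}_{\text{Bayes factor } BF_{01}}
\times
\underbrace{\frac{P(H_0)}{P(H_1)}}_{\text{prior odds}}
\]
```

If the two models are treated as equally likely before the data are seen, and as the only candidates under consideration, the posterior probability of the null reduces to P(H0 | D) = BF01 / (1 + BF01).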
In addition to reviewing the drawbacks of NHST in greater
detail, Masson (2011) offers a reasonable method whereby researchers can
obtain the benefits of Bayesian methods with little
computational effort. Instead of computing Bayes factors in the traditional
manner, he builds on a method specified by Wagenmakers (2007) that relies on the
“Bayesian Information Criterion,” or BIC. The method uses results from
traditional NHST tests (such as the sums-of-squares values generated by
ANOVAs) to compute the difference in BIC between two models, which in turn yields
a model comparison ratio that approximates the Bayes
factor. Researchers are urged to read Masson (2011) in order to understand the
full methodological process. Additionally, those wishing for a useful
categorization scheme to describe the magnitude of the resulting ratio should
refer to Raftery (1995). Finally, researchers interested in reading more about
the drawbacks of NHST and about Bayesian methods in general will find articles by Wagenmakers
(2007), Gallistel (2009), and Kruschke (2010, 2013) helpful.
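As a rough illustration of the BIC approach, the sketch below computes the BIC difference, the approximate Bayes factor, and the implied posterior probability of the null from ANOVA sums of squares. It follows the formulas as presented in Masson (2011) to the best of my reading; the function name, argument names, and example numbers are hypothetical, and the posterior probability assumes equal prior odds for the two models. What counts as n depends on the design, so consult Masson (2011) before applying this to repeated-measures data.

```python
import math


def bic_bayes_factor(sse_null, sse_alt, n, k_diff=1):
    """Approximate the Bayes factor favoring the null from sums of squares.

    sse_null: unexplained variation under the null model
              (in a one-way ANOVA, SS_effect + SS_error).
    sse_alt:  unexplained variation under the alternative model (SS_error).
    n:        number of independent observations (design dependent).
    k_diff:   extra free parameters in the alternative model (assumed 1 here).
    """
    if sse_null <= 0 or sse_alt <= 0 or n <= 0:
        raise ValueError("sums of squares and n must be positive")

    # BIC of the alternative minus BIC of the null.
    delta_bic = n * math.log(sse_alt / sse_null) + k_diff * math.log(n)

    # Values above 1 favor the null; values below 1 favor the alternative.
    bf01 = math.exp(delta_bic / 2)

    # Posterior probability of the null, assuming equal prior odds.
    p_h0 = bf01 / (1 + bf01)
    return delta_bic, bf01, p_h0


# Hypothetical ANOVA output: SS_effect = 20, SS_error = 380, n = 40.
delta_bic, bf01, p_h0 = bic_bayes_factor(sse_null=400.0, sse_alt=380.0, n=40)
print(f"delta BIC = {delta_bic:.2f}, BF01 = {bf01:.2f}, P(H0 | D) = {p_h0:.2f}")
```

A Bayes factor above 1 indicates that the data favor the null model, and the descriptive categories in Raftery (1995) can then be used to label the strength of that evidence.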
In conclusion, researchers often seek to demonstrate that
some experimental manipulations have no demonstrable effect on an outcome
variable. Moreover, even when a researcher has no a priori motivation to
demonstrate a null effect, it is scientifically responsible to test the extent
to which a null model is more or less supported relative to a specified
alternative model, rather than simply making assumptions about its validity. Doing so
improves the quality of research produced by the psychological research community
and gives other researchers information that may save
valuable resources, including time and money. In addition, although most
psychological journals still expect NHST results to be reported, it is easy
enough for researchers to compute BICs and report them alongside more
traditional tests.
References
Gallistel, C. R. (2009). The importance of proving the null. Psychological Review, 116, 439-453.
Kruschke, J. K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 658-676.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142, 573-603.
Masson, M. E. J. (2011). A tutorial on a practical Bayesian alternative to null-hypothesis significance testing. Behavior Research Methods, 43, 679-690.
Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111-196). Cambridge: Blackwell.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779-804.
What is the most common cause of non-significance? Insufficient data?
Non-significance will arise from one of two major issues: (1) a problem with the study (e.g., poor measurement, low sample size), or (2) a problem with the theory (i.e., the effect of interest does not exist). In practice, however, "insufficient" data may be the most common cause of non-significance, but only because any effect, no matter how measly, will appear significant with a large enough sample size! In other words, with a large enough sample size, almost any null hypothesis will be rejected, so a significant result by itself says little about whether the effect is meaningful.
This highlights an interesting tension in research. Certainly, researchers want sample sizes large enough to detect an effect of interest. But results from a very large sample should be interpreted with caution, as significant findings might be more indicative of the sample size than of the effect of interest.