Monday, January 27, 2014
Free Online Statistics and Methods Courses
Monday, January 20, 2014
Open-Ended Questions in Survey Research
- Target open-ended question prompts toward specific topics. Responses to general prompts (e.g., “if you have any additional comments, please provide them below”) are more likely to vary in relevance and scope (Evans et al., 2005; Garcia et al., 2004) and may not provide the level of detail or kind of information desired.
- To increase response rate on open-ended questions, include targeted questions throughout the survey rather than only using a general open-ended question at the end of the survey. Open-ended questions asked at the end of a survey may elicit shorter answers than open-ended questions asked earlier in a survey (Galesic & Bosnjak, 2009).
- Assess response bias in open-ended comments. Certain people, including those with more education (Garcia et al., 2004), more interest in the survey topic (Geer, 1988; 1991), higher perceptions of survey value (Rogelberg, Fisher, Maynard, Hakel, & Horvath, 2001), and more negative experiences (Evans et al., 2005; Garcia et al., 2004; Poncheri, Lindberg, Thompson, & Surface, 2008), may be more likely to respond to open-ended questions. Consequently, responses to open-ended comments may not be representative of all respondents’ opinions. A minimal sketch of this representativeness check appears after this list.
- To reduce negativity bias in responses to open-ended questions (and potentially boost their response rates), provide more detailed, motivating instructions in open-ended item stems (Smyth, Dillman, Christian, & McBride, 2009). Including explanations or instructions in the stem (e.g., emphasizing the importance of open-ended responses to the project) may improve open-ended item response length, elaboration on themes, and item response rate (Smyth et al., 2009). However, it is unclear whether these instructions substantively affect the negativity of responses in addition to the response rate.
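As a rough sketch of the bias check mentioned above, the R snippet below compares respondents who left an open-ended comment to those who did not. The variable names and data are hypothetical; it assumes a data frame `svy` holding a closed-ended satisfaction rating and a flag for whether each respondent answered the open-ended item.

# Hypothetical survey data: a 1-5 satisfaction rating and a flag
# for whether the respondent answered the open-ended item
set.seed(1)
svy <- data.frame(satisfaction = sample(1:5, 200, replace = TRUE),
                  left_comment = runif(200) < 0.3)

# Do commenters differ systematically from non-commenters?
t.test(satisfaction ~ left_comment, data = svy)

# Comment rate at each satisfaction level
aggregate(left_comment ~ satisfaction, data = svy, FUN = mean)

If commenters score systematically lower (or higher) on the closed-ended items, treat themes in the comments as coming from a biased subsample rather than from respondents as a whole.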
Monday, January 13, 2014
Aligning Your Research
Two common issues arise when researchers determine theory, methods, and analysis separately. First, researchers sometimes collect data that don't actually test their hypotheses of interest. Second, and more commonly, the data collected are not adequate for a desired analysis. For example, a researcher might collect data from individuals in 5 locations for a multilevel model, then learn afterwards that she doesn't have enough locations to estimate the effect of location characteristics on individual-level outcomes. Fortunately, these issues are fairly easy to avoid if researchers abide by the following (a simulation sketch of the multilevel pitfall appears after the list):
1. When developing any part of your project, keep the others in mind. While developing theory, think about how you might collect data and test it. While developing your method, think about your theory - to ensure you are collecting data that allows you to test your theory - and your analysis technique. Considering your analysis technique while developing your method will help prevent you from collecting insufficient or inadequate data for rigorous testing. Finally, when determining your analysis technique, make sure it will actually let you test your theory and that it fits the method you are developing. The most important thing to keep in mind throughout all these stages is that methods and analysis are not separate from theory; they are part of your theory.
2. Have a plan - or a co-author! At times, you will want to test theory that might require you to conduct an analysis you've never used before, or implement a method you are unfamiliar with. When this happens, you will want to either learn the analysis/method in advance, or line up a co-author who knows the analysis/method. Ignorance, in this case, will lead to flawed data collection - a waste of your time and the participants'. As long as you or a co-author knows the method/analysis in question, you can design your study more optimally.
3. Keep it simple. It's hard to resist sexy techniques. Many researchers hear of a new technique - usually one they haven't used before (see 2 above!) - and decide they need to implement it in their research. Sometimes, these 'hot' techniques are exactly what you need to test your theory. Often, however, a simple t-test, ANOVA, or regression will suffice. Ultimately, the best technique is the simplest possible technique that lets you test your theory rigorously.
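To make the multilevel example above concrete, here is a minimal simulation sketch using the lme4 package. The numbers and variable names are illustrative, not from any real study; the point is that a location-level effect is estimated from the number of locations, not the number of individuals.

require(lme4)
set.seed(42)

n.loc <- 5                                 # only 5 locations, as in the example
loc.id <- rep(1:n.loc, each = 40)          # 40 respondents per location
loc.trait <- rnorm(n.loc)[loc.id]          # a location-level predictor
y <- 0.3 * loc.trait +                     # true location-level effect
     rnorm(n.loc)[loc.id] +                # random location intercepts
     rnorm(length(loc.id))                 # individual-level noise

m <- lmer(y ~ loc.trait + (1 | factor(loc.id)))
summary(m)  # the loc.trait coefficient rests on just 5 level-2 units,
            # so its standard error is driven by n = 5, not n = 200 rows

Running this a few times with different seeds shows how wildly the location-level estimate swings with so few clusters - exactly the problem the researcher in the example discovered too late.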
Are there other techniques you recommend to ensure that theory, methods, and analysis are aligned?
Monday, January 6, 2014
Why Plot Your Data as Part of Your Analysis?
Suppose you wanted to compare the following datasets from four independent experiments, where 'X' was the continuous independent variable and 'y' was the continuous dependent variable:
[Tables: the four raw datasets, 11 (X, y) pairs each; the R code at the end of this post reproduces them.]
After taking a quick look at the numbers, you decide the datasets seem relatively similar, so you compare their means and variances and obtain the following:
| Dataset | Mean X | Variance X | Mean y | Variance y |
|---|---|---|---|---|
| 1 | 9.00 | 11.00 | 7.50 | 4.13 |
| 2 | 9.00 | 11.00 | 7.50 | 4.13 |
| 3 | 9.00 | 11.00 | 7.50 | 4.12 |
| 4 | 9.00 | 11.00 | 7.50 | 4.12 |
Now the data seem very similar. But what about the effect of the independent variable? To find out, you compare simple regressions of y on X for each of the datasets:
| | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 |
|---|---|---|---|---|
| (Intercept) | 3.00* | 3.00* | 3.00* | 3.00* |
| | (1.12) | (1.13) | (1.12) | (1.12) |
| X | 0.50** | 0.50** | 0.50** | 0.50** |
| | (0.12) | (0.12) | (0.12) | (0.12) |
| R2 | 0.67 | 0.67 | 0.67 | 0.67 |
| Adj. R2 | 0.63 | 0.63 | 0.63 | 0.63 |
| Num. obs. | 11 | 11 | 11 | 11 |

Standard errors in parentheses. ***p < 0.001, **p < 0.01, *p < 0.05
The means, variances, regression coefficients, and effect sizes are nearly identical! These data must be nearly identical... except they're not: the relationship between X and y is very different in each dataset. Dataset 1 shows a noisy but roughly linear relationship; Dataset 2 is clearly curvilinear (a quadratic would fit it almost perfectly); Dataset 3 is tightly linear except for a single outlier that pulls the slope; and in Dataset 4, X takes a single value except for one extreme point that produces the entire regression line.
Francis Anscombe created these data in a 1973 paper to demonstrate the importance of data visualization. Though the datasets have nearly identical statistical properties, they in fact represent four very different relationships between X and y. Even with only 11 observations in each dataset, it is relatively difficult to see this just by looking at the numbers, and it would be almost impossible in an SPSS or Excel spreadsheet with hundreds of observations. Furthermore, depending on the experimental hypothesis, these different relationships would each require a different statistical technique to properly analyze the data.
For these reasons, it's a good idea to always visualize your data as part of your analysis, not just as a last step in preparation for a poster or publication. It can help you to select the right analysis, and to avoid making poor statistical inferences!
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17-21.
The Anscombe Quartet in R
require(ggplot2)
require(data.table)

# Get the Anscombe data (built into base R) into long format
dt.anscombe <- data.table(Dataset = rep(c("Dataset 1", "Dataset 2",
                                          "Dataset 3", "Dataset 4"),
                                        each = 11),
                          X = unlist(anscombe[, 1:4]),
                          y = unlist(anscombe[, 5:8]),
                          key = "Dataset")

# Summary stats
## Means
dt.anscombe[, lapply(.SD, mean), by = Dataset, .SDcols = c("X", "y")]
## Variances
dt.anscombe[, lapply(.SD, var), by = Dataset, .SDcols = c("X", "y")]
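
## Regressions: one lm per dataset, confirming the nearly
## identical coefficients shown in the table above
dt.anscombe[, as.list(coef(lm(y ~ X, data = .SD))), by = Dataset]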

# Plot it
ggplot(data = dt.anscombe, aes(X, y)) +
  geom_smooth(method = "lm", fullrange = TRUE,
              se = FALSE, color = "steelblue") +
  geom_point(color = "firebrick") +
  facet_wrap(~ Dataset) +
  theme_bw() +
  labs(title = "Anscombe's Quartet")
Wednesday, January 1, 2014
New statistical analysis software announcement - "xxM"
xxM is an R package, so it runs within your existing R or RStudio installation.
See more information here: http://xxm.times.uh.edu/ and get started here: http://xxm.times.uh.edu/get-started/
Happy new year everyone!