For these examples, we are going to work with a fictional data that that has two fully nested (i.e., between-subjects) and two fully crossed (i.e., within-subject) factors. We have two hypothetical groups of older adults (OA, >65 y) and younger adults (YA, <35 y). Within each of those groups, half of the participants were in the control group (C) and half were in the treatment group (T). Each participant was also measured on four different trials (1 to 4) in three different conditions (A, B, and C).
First, we will open the relevant R packages that we will use and then we will download the data from the web. (If you don’t have these packages installed, you will need install the packages before running the library() function.) Once the data are downloaded we can use the head() function to view the first ten rows of the data.
library(tidyverse); library(RCurl); library(ez); library(lme4); library(car); library(lmerTest)
DATA <- read.csv("https://raw.githubusercontent.com/keithlohse/mixed_effects_models/master/data_AGING_example.csv",
stringsAsFactors = TRUE)
head(DATA, 10)
For the examples we will walk through, we will ignore different variables at different times. For instance, we will want to average across trials to create a data set with one observation per condition (which we will call data_COND), and we will want to average across conditions to create a data set with one observation for each trial (which we will call data_TIME).
Averaging across trials, here are the first ten rows of data_COND.
data_COND <- DATA %>% group_by(subID, condition, age_group, group) %>% # Note we ignore trial to average across it
summarize(speed = mean(speed, na.rm=TRUE)) %>% # I have included na.rm=TRUE even though there are no missing data
arrange(age_group, subID, condition) # finally, we can resort our data by subIDs within groups
## `summarise()` has grouped output by 'subID', 'condition', 'age_group'. You can
## override using the `.groups` argument.
head(data_COND, 10)
Averaging across conditions, here are the first ten rows of data_TIME.
data_TIME <- DATA %>% group_by(subID, time, age_group, group) %>% # Note we ignore condition to average across it
summarize(speed = mean(speed, na.rm=TRUE)) %>% # I have included na.rm=TRUE even though there are no missing data
arrange(age_group, subID, time) # finally, we can resort our data by subIDs within groups
## `summarise()` has grouped output by 'subID', 'time', 'age_group'. You can override
## using the `.groups` argument.
head(data_TIME, 10)
With these data in place, we can now test all of our different models. We are going to explore the appropriate mixed-effects regression (MER) models for these different situations:
For this example, we will focus on only the effect of condition, so we will use the data_COND dataset to average across different trials. First, let’s plot the data to get a better sense of what the data look like.
ggplot(data_COND, aes(x = condition, y = speed)) +
geom_point(aes(fill=condition), pch=21, size=2,
position=position_jitter(w=0.2, h=0))+
geom_boxplot(aes(fill=condition), col="black",
alpha=0.4, width=0.5, outlier.shape = NA)+
scale_x_discrete(name = "Condition") +
scale_y_continuous(name = "Speed (m/s)") +
theme(axis.text=element_text(size=16, color="black"),
axis.title=element_text(size=16, face="bold"),
plot.title=element_text(size=16, face="bold", hjust=0.5),
panel.grid.minor = element_blank(),
strip.text = element_text(size=16, face="bold"),
legend.position = "none")
To implement a simple one-way repeated measures ANOVA, we have a few options. We could directly code our ANOVA using the aov() function in R:
summary(aov(speed ~ condition + Error(subID/condition), data=data_COND))
##
## Error: subID
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 39 2.751 0.07054
##
## Error: subID:condition
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 2 0.2820 0.14098 24.46 5.67e-09 ***
## Residuals 78 0.4496 0.00576
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or we can use the ezANOVA() function from the “ez” package.
ezANOVA(data = data_COND,
dv = .(speed),
wid = .(subID),
within = .(condition)
)
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 condition 2 78 24.46039 5.674295e-09 * 0.08095761
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 condition 0.8955986 0.1230706
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
## 2 condition 0.9054678 2.494269e-08 * 0.947018 1.300658e-08 *
Although these two different functions present the results in slightly different ways, note that the f-values for the two different omnibus tests match. By either method, the f-observed for the main-effect of condition is F(2,78)=24.46, p<0.001.
If you are familiar with issues of contrast versus treatment coding, Type I versus Type III Sums of Squared Errors, and collinearity, then directly controlling your own ANOVA using base R is probably a safe bet.
If you are not familiar with those terms/issues, then the ezANOVA code is probably the best option for you. However, I would strongly encourage you to develop a more detailed understanding of general-linear models before jumping into mixed-effect models.
By default the aov() function provides a test of your statistical model using Type I Sums of Squared Errors, whereas the ezANOVA() function provides a test of your statistical model using Type III Sums of Squared Errors. In a completely orthogonal design where all factors are statistically independent of each other (i.e., there is no collinearity), the Type I and Type III Sums of Squared Errors will agree. By default, R also uses treatment coding (or “dummy” coding) for categorical factors rather than orthogonal contrast codes. The omnibus F-tests that we see in the ANOVA output will be the same regardless of the types of codes used, but if we dig into individual regression coefficients, it is important to remember how these variables were coded so that we can interpret them correctly.
Because we have a single within-subject factor, we will need to add a random-effect of subject to account for individual differences between subjects. By partitioning the between-subjects variance out of our model, we can fairly test the effect of condition, because our residuals will now be independent of each other.
# First we will define our model
mod1 <- lmer(speed ~
# Fixed Effects:
condition +
# Random Effects:
(1|subID),
# Define the data:
data=data_COND, REML = TRUE)
# We can then get the ANOVA results for our model:
anova(mod1)
Most critically, note that the F-value, F(2,78)=24.46, p<0.001 is identical in both the RM ANOVA and in the mixed-effects regression model.
If we want to delve deeper into our model, we can also use the summary() function to get more information about model fit statistics, parameter estimates, random-effects and residuals.
summary(mod1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: speed ~ condition + (1 | subID)
## Data: data_COND
##
## REML criterion at convergence: -162.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5766 -0.3736 -0.0165 0.3682 3.8369
##
## Random effects:
## Groups Name Variance Std.Dev.
## subID (Intercept) 0.021594 0.14695
## Residual 0.005764 0.07592
## Number of obs: 120, groups: subID, 40
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.84250 0.02615 52.09109 32.215 <2e-16 ***
## conditionB 0.11831 0.01698 78.00000 6.970 9e-10 ***
## conditionC 0.05050 0.01698 78.00000 2.975 0.0039 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtnB
## conditionB -0.325
## conditionC -0.325 0.500
For this example, we will be analyzing both Time and Condition, so we will use our full data frame, DATA. First, let’s plot the data to get a better sense of what our data look like.
DATA$time <- factor(DATA$time)
ggplot(DATA, aes(x = time, y = speed)) +
geom_point(aes(fill=condition), pch=21, size=2,
position=position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1))+
geom_boxplot(aes(fill=condition), col="black",
alpha=0.4, width=0.5, outlier.shape = NA)+
labs(fill="Condition")+
scale_x_discrete(name = "Trial") +
scale_y_continuous(name = "Speed (m/s)", limits = c(0,3)) +
theme(axis.text=element_text(size=16, color="black"),
axis.title=element_text(size=16, face="bold"),
legend.text=element_text(size=16, color="black"),
legend.title=element_text(size=16, face="bold"),
plot.title=element_text(size=16, face="bold", hjust=0.5),
panel.grid.minor = element_blank(),
strip.text = element_text(size=16, face="bold"),
legend.position = "bottom")
Before we run our models, we want to convert trial to a factor so that our model is treating time categorically rather than continuously. (In later modules, we will discuss how to mix continuous and categorical factors.) Additionally, remember that we are now using our larger data set DATA rather than our aggregated data set data_COND. To implement a two-way repeated measures ANOVA, we have the same options as before. We can directly code our ANOVA using the aov() function in R:
DATA$time <- factor(DATA$time)
summary(aov(speed ~ condition*time + Error(subID/(condition*time)), data=DATA))
##
## Error: subID
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 39 11.01 0.2822
##
## Error: subID:condition
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 2 1.128 0.5639 24.46 5.67e-09 ***
## Residuals 78 1.798 0.0231
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subID:time
## Df Sum Sq Mean Sq F value Pr(>F)
## time 3 10.618 3.539 106 <2e-16 ***
## Residuals 117 3.909 0.033
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subID:condition:time
## Df Sum Sq Mean Sq F value Pr(>F)
## condition:time 6 0.7851 0.13084 13.53 3.59e-13 ***
## Residuals 234 2.2626 0.00967
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or we can use the ezANOVA() function from the “ez” package.
ezANOVA(data = DATA,
dv = .(speed),
wid = .(subID),
within = .(time, condition)
)
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 time 3 117 105.95095 3.295544e-33 * 0.35881815
## 3 condition 2 78 24.46039 5.674295e-09 * 0.05610418
## 4 time:condition 6 234 13.53149 3.588475e-13 * 0.03973035
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 time 0.3262678 5.356910e-08 *
## 3 condition 0.8955986 1.230706e-01
## 4 time:condition 0.2380896 9.170350e-05 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 2 time 0.6917461 9.817867e-24 * 0.7313027 5.956088e-25
## 3 condition 0.9054678 2.494269e-08 * 0.9470180 1.300658e-08
## 4 time:condition 0.6852099 1.068940e-09 * 0.7760402 1.056951e-10
## p[HF]<.05
## 2 *
## 3 *
## 4 *
Again, regardless of the coding approach that you use, these two different functions produce the same f-observed for the main-effects of time, F(3,117)=105.95, p<0.001, and Condition, F(2,78)=24.46, p<0.001, and the Time x Condition interaction, F(6,234)=13.53, p<0.001.
Because we now have two crossed factors, we need to account not only for the fact that we have multiple observations for each participant, but we have multiple observations for each person at each level of each factor.
For instance, for the effect of Condition, we actually have Condition effects at Trial 1, Trial 2, Trial 3, and Trial 4. The converse is also true for the effect of Time, we have four different trials in Condition A, Condition B, and Condition C. Thus, in order to appropriately account for the statistical dependencies in our data, we need to add random-effects of “time:subID” and “condition:subID” to the model.
The colon operator (“:”) means that we are crossing or multiplying these factors. That is, if we have subject ID’s A, B, and C and Trials 1, 2, 3, and 4, then we end up with A1, A2, A3, A4, B1, B2, B3, B4, etc.
For more information on how random-effects are specified and what they mean in R, I recommend looking at this discussion on Stack Exchange: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified
# First we will define our model
mod2 <- lmer(speed ~
# Fixed Effects:
time*condition +
# Random Effects
(1|subID)+ (1|time:subID) + (1|condition:subID),
# Define your data,
data=DATA, REML=TRUE)
# We can then get the ANOVA results for our model:
anova(mod2)
Again, for our purposes, the critical thing to note is that the F-values for the main-effects and interactions are the same between our RM ANOVAs and the mixed-effects model. Without delving into the mathematical details, this is a good demonstration that the appropriate random-effects our regression model make it analogous to the factorial ANOVA. This allows us to capitalize on the benefits of mixed-effects regression for designs that we would normally analyze using factorial ANOVA.
If we want to delve deeper into our model, we can also use the summary() function to get more information about model fit statistics, parameter estimates, random-effects and residuals.
summary(mod2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: speed ~ time * condition + (1 | subID) + (1 | time:subID) + (1 |
## condition:subID)
## Data: DATA
##
## REML criterion at convergence: -454.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9509 -0.4451 -0.0673 0.3844 3.2827
##
## Random effects:
## Groups Name Variance Std.Dev.
## time:subID (Intercept) 0.007912 0.08895
## condition:subID (Intercept) 0.003346 0.05785
## subID (Intercept) 0.019616 0.14006
## Residual 0.009669 0.09833
## Number of obs: 480, groups: time:subID, 160; condition:subID, 120; subID, 40
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.00975 0.03184 109.12304 31.716 < 2e-16 ***
## time2 -0.15300 0.02965 249.81033 -5.160 5.03e-07 ***
## time3 -0.25350 0.02965 249.81033 -8.550 1.25e-15 ***
## time4 -0.26250 0.02965 249.81033 -8.853 < 2e-16 ***
## conditionB 0.24625 0.02551 260.37317 9.653 < 2e-16 ***
## conditionC 0.10125 0.02551 260.37317 3.969 9.34e-05 ***
## time2:conditionB -0.07025 0.03110 234.00008 -2.259 0.02479 *
## time3:conditionB -0.19800 0.03110 234.00008 -6.367 1.01e-09 ***
## time4:conditionB -0.24350 0.03110 234.00008 -7.831 1.68e-13 ***
## time2:conditionC -0.03825 0.03110 234.00008 -1.230 0.21990
## time3:conditionC -0.06425 0.03110 234.00008 -2.066 0.03991 *
## time4:conditionC -0.10050 0.03110 234.00008 -3.232 0.00141 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) time2 time3 time4 cndtnB cndtnC tm2:cB tm3:cB tm4:cB
## time2 -0.466
## time3 -0.466 0.500
## time4 -0.466 0.500 0.500
## conditionB -0.401 0.320 0.320 0.320
## conditionC -0.401 0.320 0.320 0.320 0.500
## tim2:cndtnB 0.244 -0.524 -0.262 -0.262 -0.609 -0.305
## tim3:cndtnB 0.244 -0.262 -0.524 -0.262 -0.609 -0.305 0.500
## tim4:cndtnB 0.244 -0.262 -0.262 -0.524 -0.609 -0.305 0.500 0.500
## tim2:cndtnC 0.244 -0.524 -0.262 -0.262 -0.305 -0.609 0.500 0.250 0.250
## tim3:cndtnC 0.244 -0.262 -0.524 -0.262 -0.305 -0.609 0.250 0.500 0.250
## tim4:cndtnC 0.244 -0.262 -0.262 -0.524 -0.305 -0.609 0.250 0.250 0.500
## tm2:cC tm3:cC
## time2
## time3
## time4
## conditionB
## conditionC
## tim2:cndtnB
## tim3:cndtnB
## tim4:cndtnB
## tim2:cndtnC
## tim3:cndtnC 0.500
## tim4:cndtnC 0.500 0.500
For this example, we will now consider both the between-subject factor of age, as well as the within-subject factor of condition. Do to this, we will average over the different trials (1, 2, 3, and 4) to get one observation in each condition. This design is sometimes referred as a “mixed-factorial” design, because we have a mix of between-subjects and within-subject factors. Depending on your background, you might be more familiar with this as a “split plot” design. Split plot comes from agronomy (where a lot of statistics where developed!) and refers to the fact that some plots are assigned to different levels of Factor A (e.g., Treatment versus Control), but within each plot there is a second level of randomization to different levels of Factor B (e.g., Conditions A, B, or C). Both terms describe the same thing but differ in their unit of analysis. In psychology, a single person is usually the experimental unit (hence “within-subject” variables) whereas in agronomy or biology a physical area might be the unit of analysis (hence “split-plot” variables). The more general way to talk about these terms is as nested- versus crossed-factors.
In our example, participants are nested within age-group, because each person is represented at only one level of that factor (i.e., you are only a younger adult or older adult). However, Condition is a crossed factor, because each person is represented at all levels of Condition (i.e., each person was measured in all three conditions). We can see this more clearly if we plot all of our data.
ggplot(data_COND, aes(x = condition, y = speed)) +
geom_point(aes(fill=age_group), pch=21, size=2,
position=position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1))+
geom_boxplot(aes(fill=age_group), col="black",
alpha=0.4, width=0.5, outlier.shape=NA)+
labs(fill="Age Group")+
scale_x_discrete(name = "Condition") +
scale_y_continuous(name = "Speed (m/s)") +
theme(axis.text=element_text(size=16, color="black"),
axis.title=element_text(size=16, face="bold"),
legend.text=element_text(size=16, color="black"),
legend.title=element_text(size=16, face="bold"),
plot.title=element_text(size=16, face="bold", hjust=0.5),
panel.grid.minor = element_blank(),
strip.text = element_text(size=16, face="bold"),
legend.position = "bottom")
For this mixed factorial ANOVA, we have one factor of Condition that varies within-subjects, but we also have a factor of Age Group that varies between subjects. As before, we can directly code this into our analysis of variance using the aov() function or using the ezANOVA() function from the “ez” package.
summary(aov(speed ~ age_group*condition + Error(subID/condition), data=data_COND))
##
## Error: subID
## Df Sum Sq Mean Sq F value Pr(>F)
## age_group 1 1.524 1.5238 47.18 3.71e-08 ***
## Residuals 38 1.227 0.0323
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subID:condition
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 2 0.2820 0.14098 27.271 1.18e-09 ***
## age_group:condition 2 0.0567 0.02834 5.482 0.00597 **
## Residuals 76 0.3929 0.00517
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or we can use the ezANOVA() function from the “ez” package.
ezANOVA(data = data_COND,
dv = .(speed),
wid = .(subID),
within = .(condition),
between = .(age_group)
)
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 age_group 1 38 47.176031 3.706667e-08 * 0.48465601
## 3 condition 2 76 27.271263 1.181245e-09 * 0.14822123
## 4 age_group:condition 2 76 5.481689 5.972183e-03 * 0.03379572
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 condition 0.9008222 0.1448182
## 4 age_group:condition 0.9008222 0.1448182
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 3 condition 0.9097709 5.566488e-09 * 0.9530301 2.646471e-09
## 4 age_group:condition 0.9097709 7.698688e-03 * 0.9530301 6.815470e-03
## p[HF]<.05
## 3 *
## 4 *
The ezANOVA() function provides us with more detail, for instance automatically providing Mauchly’s Test for Sphericity and both the Greenhouse-Geisser and Hyunh-Feldt corrections in the event that the sphericity assumption is violated. Most importantly, however, the outputs of these two functions agree when we look at the f-statistics for all main-effects and interactions when sphericity is assumed.
For this mixed-factorial design, we need to account for the fact that we have multiple observations coming from each person, so we will add a random-effect of “subID”. After accounting for this statistical dependence in our data, we can now fairly test the effects of Age Group, and Condition with residuals that are independent of each other.
# First we will define our model
mod3 <- lmer(speed ~
# Fixed Effects:
age_group*condition +
# Random Effects
(1|subID),
# Define your data,
data=data_COND, REML=TRUE)
# We can then get the ANOVA results for our model:
anova(mod3)
Critically, note that the f-statistics and degrees of freedom all match our ANOVA outputs (when sphericity is assumed), with a main-effect of Age Group, F(1,38)=47.18, Condition, F(2,76)=27.27, and the Age Group x Condition interaction, F(2,76)=5.48.
If we want to delve deeper into our model, we can also use the summary() function to get more information about model fit statistics, parameter estimates, random-effects, and residuals.
summary(mod3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: speed ~ age_group * condition + (1 | subID)
## Data: data_COND
##
## REML criterion at convergence: -189.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6329 -0.4286 -0.0582 0.3525 3.7641
##
## Random effects:
## Groups Name Variance Std.Dev.
## subID (Intercept) 0.009044 0.0951
## Residual 0.005169 0.0719
## Number of obs: 120, groups: subID, 40
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.70000 0.02666 62.99267 26.258 < 2e-16 ***
## age_groupYA 0.28500 0.03770 62.99267 7.560 2.11e-10 ***
## conditionB 0.16950 0.02274 76.00000 7.455 1.21e-10 ***
## conditionC 0.08875 0.02274 76.00000 3.903 0.000204 ***
## age_groupYA:conditionB -0.10238 0.03215 76.00000 -3.184 0.002107 **
## age_groupYA:conditionC -0.07650 0.03215 76.00000 -2.379 0.019862 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) ag_gYA cndtnB cndtnC a_YA:B
## age_groupYA -0.707
## conditionB -0.426 0.302
## conditionC -0.426 0.302 0.500
## ag_grpYA:cB 0.302 -0.426 -0.707 -0.354
## ag_grpYA:cC 0.302 -0.426 -0.354 -0.707 0.500
Next, we will consider the more complicated example of our fully factorial design with a nested factor of Age-Group, and crossed factors of Condition and Time.
DATA$time <- factor(DATA$time)
ggplot(DATA, aes(x = time, y = speed)) +
geom_point(aes(fill=age_group), size=2, shape=21,
position=position_jitterdodge(dodge.width = 0.5, jitter.width = 0.1))+
geom_boxplot(aes(fill=age_group), col="black",
alpha=0.4, width=0.5, outlier.shape = NA)+
labs(fill="Age Group")+
facet_wrap(~condition)+
scale_x_discrete(name = "Trial") +
scale_y_continuous(name = "Speed (m/s)", limits = c(0,3)) +
theme(axis.text=element_text(size=16, color="black"),
axis.title=element_text(size=16, face="bold"),
legend.text=element_text(size=16, color="black"),
legend.title=element_text(size=16, face="bold"),
plot.title=element_text(size=16, face="bold", hjust=0.5),
panel.grid.minor = element_blank(),
strip.text = element_text(size=16, face="bold"),
legend.position = "bottom")
For this mixed factorial ANOVA, we have two factors that vary within subjects, Condition and Time, and we have one factor that varies between subjects, Age Group. As before, we can directly code this into our analysis of variance using the aov() function or using the ezANOVA() function from the “ez” package.
summary(aov(speed ~ age_group*condition*time + Error(subID/(condition*time)), data=DATA))
##
## Error: subID
## Df Sum Sq Mean Sq F value Pr(>F)
## age_group 1 6.095 6.095 47.18 3.71e-08 ***
## Residuals 38 4.910 0.129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subID:condition
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 2 1.1278 0.5639 27.271 1.18e-09 ***
## age_group:condition 2 0.2267 0.1133 5.482 0.00597 **
## Residuals 76 1.5715 0.0207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subID:time
## Df Sum Sq Mean Sq F value Pr(>F)
## time 3 10.618 3.539 113.88 <2e-16 ***
## age_group:time 3 0.365 0.122 3.92 0.0105 *
## Residuals 114 3.543 0.031
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subID:condition:time
## Df Sum Sq Mean Sq F value Pr(>F)
## condition:time 6 0.7851 0.13084 13.718 2.71e-13 ***
## age_group:condition:time 6 0.0880 0.01467 1.539 0.166
## Residuals 228 2.1746 0.00954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ezANOVA(data = DATA,
dv = .(speed),
wid = .(subID),
within = .(condition, time),
between = .(age_group)
)
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 age_group 1 38 47.176031 3.706667e-08 * 0.33318007
## 3 condition 2 76 27.271263 1.181245e-09 * 0.08462822
## 5 time 3 114 113.882930 3.746328e-34 * 0.46536693
## 4 age_group:condition 2 76 5.481689 5.972183e-03 * 0.01824443
## 6 age_group:time 3 114 3.919720 1.049688e-02 * 0.02908814
## 7 condition:time 6 228 13.718345 2.705162e-13 * 0.06046299
## 8 age_group:condition:time 6 228 1.538560 1.664920e-01 0.00716581
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 3 condition 0.9008222 1.448182e-01
## 4 age_group:condition 0.9008222 1.448182e-01
## 5 time 0.3554491 3.872111e-07 *
## 6 age_group:time 0.3554491 3.872111e-07 *
## 7 condition:time 0.2268549 8.327332e-05 *
## 8 age_group:condition:time 0.2268549 8.327332e-05 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe
## 3 condition 0.9097709 5.566488e-09 * 0.9530301
## 4 age_group:condition 0.9097709 7.698688e-03 * 0.9530301
## 5 time 0.7021534 1.020427e-24 * 0.7443488
## 6 age_group:time 0.7021534 2.194748e-02 * 0.7443488
## 7 condition:time 0.6718683 1.241027e-09 * 0.7615676
## 8 age_group:condition:time 0.6718683 1.933581e-01 0.7615676
## p[HF] p[HF]<.05
## 3 2.646471e-09 *
## 4 6.815470e-03 *
## 5 4.686178e-26 *
## 6 1.975578e-02 *
## 7 1.230714e-10 *
## 8 1.856659e-01
For this mixed-factorial design, we need to account for the fact that we have multiple observations coming from each person, but we also need to acocunt for the fact that we multiple observations for each Condition and on each Trial. In order to account for this dependence in our data, we need to include random-effects of subject, subject:condition, and subject:time. Adding these random-effects to our model will make our mixed-effects model statistically equivalent to the mixed-factorial ANOVAs that we ran above.
# First we will define our model
mod4 <- lmer(speed ~
# Fixed Effects:
age_group*condition*time +
# Random Effects
(1|subID)+(1|condition:subID)+(1|time:subID),
# Define your data,
data=DATA, REML=TRUE)
# We can then get the ANOVA results for our model:
anova(mod4)
Note that the f-statistics and the degrees of freedom match what we got from the ANOVA tables above (when sphericity is assumed). There are a lot of effects so I wont list them all, but note the main-effects of Age Group, F(1,38)=47.18, Condition, F(2,76)=27.27, and Time, F(3,114)=113.88.
Readers familiar with repeated measures ANOVA wont be surprised to see that that the output of the aov() function partitions the effects into different sections. The between-subject effect of Age Group is evaluated using residual error at the level of “Error:subID”. The effects of Condition and Age Group x Condition are evaluated using residuals of “Error: subID:condition”. This is precisely what we are doing when we include the three different random-intercepts in the mixed-effects regression. The \((1|subID)\) partitions out the average individual differences between subjects, \((1|condition:subID)\) and \((1|time:subID)\) partitions out the variance at the different levels of our repeated measures.
If we want to delve deeper into our model, we can also use the summary() function to get more information about model fit statistics, parameter estimates, random-effects and residuals.
summary(mod4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## speed ~ age_group * condition * time + (1 | subID) + (1 | condition:subID) +
## (1 | time:subID)
## Data: DATA
##
## REML criterion at convergence: -463.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8077 -0.4203 -0.0405 0.3229 3.2902
##
## Random effects:
## Groups Name Variance Std.Dev.
## time:subID (Intercept) 0.007181 0.08474
## condition:subID (Intercept) 0.002785 0.05277
## subID (Intercept) 0.007249 0.08514
## Residual 0.009538 0.09766
## Number of obs: 480, groups: time:subID, 160; condition:subID, 120; subID, 40
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.79400 0.03657 186.55257 21.710 < 2e-16
## age_groupYA 0.43150 0.05172 186.55257 8.343 1.58e-14
## conditionB 0.34850 0.03510 263.60569 9.928 < 2e-16
## conditionC 0.17750 0.03510 263.60569 5.056 8.01e-07
## time2 -0.06700 0.04089 249.82467 -1.639 0.102553
## time3 -0.13950 0.04089 249.82467 -3.412 0.000753
## time4 -0.16950 0.04089 249.82467 -4.145 4.65e-05
## age_groupYA:conditionB -0.20450 0.04964 263.60569 -4.119 5.09e-05
## age_groupYA:conditionC -0.15250 0.04964 263.60569 -3.072 0.002350
## age_groupYA:time2 -0.17200 0.05782 249.82467 -2.975 0.003222
## age_groupYA:time3 -0.22800 0.05782 249.82467 -3.943 0.000105
## age_groupYA:time4 -0.18600 0.05782 249.82467 -3.217 0.001468
## conditionB:time2 -0.15450 0.04368 228.00014 -3.537 0.000489
## conditionC:time2 -0.08950 0.04368 228.00014 -2.049 0.041587
## conditionB:time3 -0.26350 0.04368 228.00014 -6.033 6.44e-09
## conditionC:time3 -0.12300 0.04368 228.00014 -2.816 0.005285
## conditionB:time4 -0.29800 0.04368 228.00014 -6.823 7.94e-11
## conditionC:time4 -0.14250 0.04368 228.00014 -3.263 0.001273
## age_groupYA:conditionB:time2 0.16850 0.06177 228.00014 2.728 0.006868
## age_groupYA:conditionC:time2 0.10250 0.06177 228.00014 1.659 0.098394
## age_groupYA:conditionB:time3 0.13100 0.06177 228.00014 2.121 0.035011
## age_groupYA:conditionC:time3 0.11750 0.06177 228.00014 1.902 0.058388
## age_groupYA:conditionB:time4 0.10900 0.06177 228.00014 1.765 0.078951
## age_groupYA:conditionC:time4 0.08400 0.06177 228.00014 1.360 0.175185
##
## (Intercept) ***
## age_groupYA ***
## conditionB ***
## conditionC ***
## time2
## time3 ***
## time4 ***
## age_groupYA:conditionB ***
## age_groupYA:conditionC **
## age_groupYA:time2 **
## age_groupYA:time3 ***
## age_groupYA:time4 **
## conditionB:time2 ***
## conditionC:time2 *
## conditionB:time3 ***
## conditionC:time3 **
## conditionB:time4 ***
## conditionC:time4 **
## age_groupYA:conditionB:time2 **
## age_groupYA:conditionC:time2 .
## age_groupYA:conditionB:time3 *
## age_groupYA:conditionC:time3 .
## age_groupYA:conditionB:time4 .
## age_groupYA:conditionC:time4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 24 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it