12 Mixed Effects
Learning Goals
- Explain and illustrate a random intercept model and connect to compound symmetry model
- Explain how random intercepts and slopes model correlation
- Fit mixed effects models to real data and interpret the output
Notation
Mixed Effects Models: We can write a linear mixed effects model in terms of our (quantitative) outcome vector,
\[\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i\]
\[ = \text{Fixed effects} + \text{Random effects} + \text{Error}\]
where \(\mathbf{X}_i\) is the model matrix for the fixed effects, \(\mathbf{Z}_i\) is the model matrix for the random effects (a subset of columns of \(\mathbf{X}_i\)), \(\boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma)\), \(\mathbf{b}_i\sim N(0,\mathbf{G})\), and \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are independent of each other.
Thus,
\[\mathbf{Y}_{i}\sim \mathcal{N}(\mathbf{X}_{i}\boldsymbol\beta,\mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma)\]
Typically, we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\).
Warm Up
Random Intercept Model: Using the notation above, define \(\mathbf{Z}_i\), \(\mathbf{b}_i\), and \(\mathbf{G}\) for a random intercept model.
If we use a random intercept model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).
Group Activity
Download a template RMarkdown file to start from here and add it to your checkpoint-4 folder Github repository.
- We are going to consider modeling
Memory
as a function ofYears
and treatment groupINTGRP
with a mixed effects model. Let’s start by using a random intercept model. Describe the assumptions made by the code below. Consider the assumptions about the model for the mean (fixed effects) and the assumptions about the random effects.
ANSWER:
activeLong <- activeLong %>% mutate(INTGRP = relevel(INTGRP, ref='Control'))
library(lme4)
library(tidyr)
mod1 <- activeLong %>% drop_na(Memory, Years, INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*INTGRP + (1|AID), data = ., REML = FALSE)
mod1 %>% summary()
- Compare the estimated coefficients between the random intercept model and the GEE with an exchangeable working correlation structure. Describe what you observe and explain why it is so.
ANSWER:
mod2 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
geeM::geem(Memory ~ Years*I(Years > 3)*INTGRP, data = ., id = AID, corstr = 'exchangeable')
tibble(MixedBeta = fixef(mod1) %>% round(2), GEEBeta = mod2$beta %>% round(2)) %>%
mutate(Diff = MixedBeta - GEEBeta)
- Compare the estimated standard errors between the random intercept model and the GEE with an exchangeable working correlation structure (GEE have naive/model-based and robust SE’s). Describe what you observe and explain why it might be so.
ANSWER:
tibble(MixedSE = vcov(mod1) %>% diag() %>% sqrt() %>% round(2),
GEENaivSE = mod2$naiv.var %>% diag() %>% sqrt() %>% round(2),
GEERobustSE = mod2$var %>% diag() %>% sqrt() %>% round(2)) %>%
mutate(Diff = MixedSE- GEENaivSE, Diff2 = MixedSE- GEERobustSE)
- Now, to get a sense of the random intercept model, let’s look at a visualizations of the predictions. First, the predictions are based on the fixed effects and the random effects. The second includes the predictions from the fixed effects. Comment on what you observe and how it relates to the assumptions you made within the model.
ANSWER:
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod1)) %>%
ggplot(aes(x = Years, y = Pred, color = INTGRP, group = AID)) +
geom_line(alpha = .2) +
labs(color = 'Treatment Group', y = 'Predicted Outcomes (including random effects)') + theme_classic()
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod1, re.form = NA)) %>%
distinct(Years,INTGRP,Pred) %>%
ggplot(aes(x = Years, y = Pred, color = INTGRP)) +
geom_line() +
labs(color = 'Treatment Group', y = 'Predicted Outcomes (only fixed effects)') + theme_classic()
Model Selection
Hypothesis Testing for Coefficients (one at a time)
In order to decide which fixed effects should be in the model (using ML, not REML), we can test whether \(\beta_k=0\) for some \(k\).
In mixed effects models, the estimated covariance matrix for \(\hat{\boldsymbol\beta}\) is \[\widehat{Cov}(\hat{\boldsymbol\beta}) = \left\{ \sum^n_{i=1} \mathbf{X}_i^T\hat{\mathbf{V}}_i\mathbf{X}_i \right\}^{-1}\]
The standard error for \(\hat{\beta}_k\) is the square rooted values of the diagonal of this matrix corresponding to \(\hat{\beta}_k\). This is what is reported in the summary output.
To test the hypothesis \(H_0: \beta_k=0\) vs. \(H_A: \beta_k\not=0\), we can calculate a t-statistic,
\[t = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \]
With mixed effects models, there is debate about what the appropriate distribution of this statistic is, which is why p-values are often not reported in the output. However, if you have an estimate that is large relative to the standard error, that is a good indication that it is significantly different from zero.
- Looking back on the mixed effects model you fit, what do you conclude about the influence of Memory Training (as compared to Control Group) on Memory Score over time.
ANSWER:
Hypothesis Testing Comparing Nested Models
To compare two nested models (same random effects and same covariance models, but set some beta’s to 0), we could use a likelihood ratio test. Suppose we have two models with the same random effects and covariance models,
- Full model: \(M_f\) has \(p+1\) columns in \(\mathbf{X}_i\) and thus \(p+1\) fixed effects \(\beta_0,...,\beta_{p}\).
- Nested model: \(M_n\) has \(k+1\) columns such that \(k < p\) and \(p-k\) fixed effects \(\beta_l = 0\).
If we are using maximum likelihood estimation, it makes sense to compare the likelihood of the two models.
\(H_0:\) nested model is true, \(H_A:\) full model is true
If we take the ratio of the likelihoods from the nested model and full model and plug in the maximum likelihood estimators, then we have another statistic.
\[D = -2\log\left(\frac{L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}{L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}\right) = -2 \log(L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})) + 2\log(L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}}))\]
The sampling distribution of this statistic is approximately chi-squared with degrees of freedom equal to the difference in the number of parameters between the two models.
For mixed effects models, here is some example code that allows us to test whether the interactions between INTGRP
and Years
are needed (remove them in mod4
, which is equivalent to setting them all to 0; remove the interactions for Speed and Reasoning training in mod4
but keep the interaction with Memory).
mod3 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (1|AID), data = ., REML = FALSE)
mod4 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3) + INTGRP + (1|AID), data = ., REML = FALSE)
anova(mod1, mod4)
anova(mod1, mod3)
- Compare these three models. Discuss whether your should favor the larger model or the nested models. In other words, do we have evidence that the treatment group is related to how Memory changes over time on average?
ANSWER:
Information Criteria for Choosing Fixed Effects
In order to use BIC to choose between mixed effects models, you must use maximum likelihood estimation instead of REML. To choose between two models with different fixed effects, only compare models with the same random effects.
- Discuss which model seems the best using the Bayesian Information Criteria.
ANSWER:
Information Criteria for Choosing Random Effects
In order to use BIC to choose between mixed effects models, you must use maximum likelihood estimation instead of REML. To choose between two models with different random effects, only compare models with the same fixed effects.
- Discuss which model seems the best using the Bayesian Information Criteria.
ANSWER:
Diagnostics
For any linear model, we should check to see if the residuals have no pattern.
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod3)) %>%
mutate(Resid = resid(mod3)) %>%
ggplot(aes(x = Pred, y = Resid, color = INTGRP)) +
geom_point(alpha = .2) +
geom_hline(yintercept = 0) +
geom_smooth(se = FALSE) +
theme_classic()
For a mixed effects model, you are assuming the random effects distribution is Normal. You should check on the distribution of the predicted random effects.
- Lastly, all of these model selection tools are only valid if the entire model is correct (right mean model, right random effects, distributional assumptions about random effects and errors). Discuss the limitations of your model with this context in mind.
ANSWER: