12 Mixed Effects

Learning Goals

Explain and illustrate a random intercept model and connect to compound symmetry model
Explain how random intercepts and slopes model correlation
Fit mixed effects models to real data and interpret the output

Notation

Mixed Effects Models: We can write a linear mixed effects model in terms of our (quantitative) outcome vector,

\[\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i\]

\[ = \text{Fixed effects} + \text{Random effects} + \text{Error}\]

where \(\mathbf{X}_i\) is the model matrix for the fixed effects, \(\mathbf{Z}_i\) is the model matrix for the random effects (a subset of columns of \(\mathbf{X}_i\)), \(\boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma)\), \(\mathbf{b}_i\sim N(0,\mathbf{G})\), and \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are independent of each other.

Thus,

\[\mathbf{Y}_{i}\sim \mathcal{N}(\mathbf{X}_{i}\boldsymbol\beta,\mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma)\]

Typically, we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\).

Warm Up

Random Intercept Model: Using the notation above, define \(\mathbf{Z}_i\), \(\mathbf{b}_i\), and \(\mathbf{G}\) for a random intercept model.

If we use a random intercept model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).

Group Activity

Download a template RMarkdown file to start from here and add it to your checkpoint-4 folder Github repository.

source('Cleaning.R') #Open Cleaning.R for information about variables.
head(activeLong)

We are going to consider modeling Memory as a function of Years and treatment group INTGRP with a mixed effects model. Let’s start by using a random intercept model. Describe the assumptions made by the code below. Consider the assumptions about the model for the mean (fixed effects) and the assumptions about the random effects.

ANSWER:

activeLong <- activeLong %>% mutate(INTGRP = relevel(INTGRP, ref='Control'))

library(lme4)
library(tidyr)
mod1 <- activeLong %>% drop_na(Memory, Years, INTGRP) %>% 
  lmer(Memory ~ Years*I(Years > 3)*INTGRP + (1|AID), data = ., REML = FALSE)

mod1 %>% summary()

Compare the estimated coefficients between the random intercept model and the GEE with an exchangeable working correlation structure. Describe what you observe and explain why it is so.

ANSWER:

mod2 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  geeM::geem(Memory ~ Years*I(Years > 3)*INTGRP, data = ., id = AID, corstr = 'exchangeable')

tibble(MixedBeta = fixef(mod1) %>% round(2), GEEBeta =  mod2$beta %>% round(2)) %>%
  mutate(Diff = MixedBeta - GEEBeta)

Compare the estimated standard errors between the random intercept model and the GEE with an exchangeable working correlation structure (GEE have naive/model-based and robust SE’s). Describe what you observe and explain why it might be so.

ANSWER:

tibble(MixedSE = vcov(mod1) %>% diag() %>% sqrt() %>% round(2), 
       GEENaivSE =  mod2$naiv.var %>% diag() %>% sqrt() %>% round(2),
       GEERobustSE =  mod2$var %>% diag() %>% sqrt() %>% round(2)) %>%
  mutate(Diff = MixedSE- GEENaivSE, Diff2 = MixedSE- GEERobustSE)

Now, to get a sense of the random intercept model, let’s look at a visualizations of the predictions. First, the predictions are based on the fixed effects and the random effects. The second includes the predictions from the fixed effects. Comment on what you observe and how it relates to the assumptions you made within the model.

ANSWER:

activeLong %>% drop_na(Memory, Years, INTGRP) %>%
  mutate(Pred = predict(mod1)) %>%
  ggplot(aes(x = Years, y = Pred, color = INTGRP, group = AID)) + 
  geom_line(alpha = .2) + 
  labs(color = 'Treatment Group', y = 'Predicted Outcomes (including random effects)') + theme_classic()

activeLong %>% drop_na(Memory, Years, INTGRP) %>%
  mutate(Pred = predict(mod1, re.form = NA)) %>%
  distinct(Years,INTGRP,Pred) %>%
  ggplot(aes(x = Years, y = Pred, color = INTGRP)) + 
  geom_line() + 
  labs(color = 'Treatment Group', y = 'Predicted Outcomes (only fixed effects)') + theme_classic()

Model Selection

Hypothesis Testing for Coefficients (one at a time)

In order to decide which fixed effects should be in the model (using ML, not REML), we can test whether \(\beta_k=0\) for some \(k\).

In mixed effects models, the estimated covariance matrix for \(\hat{\boldsymbol\beta}\) is \[\widehat{Cov}(\hat{\boldsymbol\beta}) = \left\{ \sum^n_{i=1} \mathbf{X}_i^T\hat{\mathbf{V}}_i\mathbf{X}_i \right\}^{-1}\]

The standard error for \(\hat{\beta}_k\) is the square rooted values of the diagonal of this matrix corresponding to \(\hat{\beta}_k\). This is what is reported in the summary output.

To test the hypothesis \(H_0: \beta_k=0\) vs. \(H_A: \beta_k\not=0\), we can calculate a t-statistic,

\[t = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \]

With mixed effects models, there is debate about what the appropriate distribution of this statistic is, which is why p-values are often not reported in the output. However, if you have an estimate that is large relative to the standard error, that is a good indication that it is significantly different from zero.

# Manual Code to get t-values: fixef(mod1)/sqrt(diag(vcov(mod1)))
summary(mod1)

Looking back on the mixed effects model you fit, what do you conclude about the influence of Memory Training (as compared to Control Group) on Memory Score over time.

ANSWER:

Hypothesis Testing Comparing Nested Models

To compare two nested models (same random effects and same covariance models, but set some beta’s to 0), we could use a likelihood ratio test. Suppose we have two models with the same random effects and covariance models,

Full model: \(M_f\) has \(p+1\) columns in \(\mathbf{X}_i\) and thus \(p+1\) fixed effects \(\beta_0,...,\beta_{p}\).
Nested model: \(M_n\) has \(k+1\) columns such that \(k < p\) and \(p-k\) fixed effects \(\beta_l = 0\).

If we are using maximum likelihood estimation, it makes sense to compare the likelihood of the two models.

\(H_0:\) nested model is true, \(H_A:\) full model is true

If we take the ratio of the likelihoods from the nested model and full model and plug in the maximum likelihood estimators, then we have another statistic.

\[D = -2\log\left(\frac{L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}{L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}\right) = -2 \log(L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})) + 2\log(L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}}))\]

The sampling distribution of this statistic is approximately chi-squared with degrees of freedom equal to the difference in the number of parameters between the two models.

For mixed effects models, here is some example code that allows us to test whether the interactions between INTGRP and Years are needed (remove them in mod4, which is equivalent to setting them all to 0; remove the interactions for Speed and Reasoning training in mod4 but keep the interaction with Memory).

mod3 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (1|AID), data = ., REML = FALSE)

mod4 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  lmer(Memory ~ Years*I(Years > 3) + INTGRP + (1|AID), data = ., REML = FALSE)

anova(mod1, mod4)
anova(mod1, mod3)

Compare these three models. Discuss whether your should favor the larger model or the nested models. In other words, do we have evidence that the treatment group is related to how Memory changes over time on average?

ANSWER:

Information Criteria for Choosing Fixed Effects

In order to use BIC to choose between mixed effects models, you must use maximum likelihood estimation instead of REML. To choose between two models with different fixed effects, only compare models with the same random effects.

Discuss which model seems the best using the Bayesian Information Criteria.

ANSWER:

BIC(mod1)
BIC(mod3)
BIC(mod4)

Information Criteria for Choosing Random Effects

In order to use BIC to choose between mixed effects models, you must use maximum likelihood estimation instead of REML. To choose between two models with different random effects, only compare models with the same fixed effects.

Discuss which model seems the best using the Bayesian Information Criteria.

ANSWER:

mod5 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (Years|AID), data = ., REML = FALSE)

BIC(mod3)
BIC(mod5)

Diagnostics

For any linear model, we should check to see if the residuals have no pattern.

activeLong %>% drop_na(Memory, Years, INTGRP) %>%
  mutate(Pred = predict(mod3)) %>%
  mutate(Resid = resid(mod3)) %>%
  ggplot(aes(x = Pred, y = Resid, color = INTGRP)) + 
  geom_point(alpha = .2) + 
  geom_hline(yintercept = 0) + 
  geom_smooth(se = FALSE) + 
  theme_classic()

For a mixed effects model, you are assuming the random effects distribution is Normal. You should check on the distribution of the predicted random effects.

A = ranef(mod3)$AID$`(Intercept)`
qqnorm(A)
qqline(A)

Lastly, all of these model selection tools are only valid if the entire model is correct (right mean model, right random effects, distributional assumptions about random effects and errors). Discuss the limitations of your model with this context in mind.

ANSWER: