12 Mixed Effects

Learning Goals

  • Explain and illustrate a random intercept model and connect to compound symmetry model
  • Explain how random intercepts and slopes model correlation
  • Fit mixed effects models to real data and interpret the output

Slides from today are available here.

Notation

Mixed Effects Models: We can write a linear mixed effects model in terms of our (quantitative) outcome vector,

\[\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i\]

\[ = \text{Fixed effects} + \text{Random effects} + \text{Error}\]

where \(\mathbf{X}_i\) is the model matrix for the fixed effects, \(\mathbf{Z}_i\) is the model matrix for the random effects (a subset of columns of \(\mathbf{X}_i\)), \(\boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma)\), \(\mathbf{b}_i\sim N(0,\mathbf{G})\), and \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are independent.

Thus,

\[\mathbf{Y}_{i}\sim \mathcal{N}(\mathbf{X}_{i}\boldsymbol\beta,\mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma)\]

Typically, we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\).

Warm Up

Random Intercept Model: Using the notation above, define \(\mathbf{Z}_i\), \(\mathbf{b}_i\), and \(\mathbf{G}\) for a random intercept model.

If we use a random intercept model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).

Group Activity

Download a template RMarkdown file to start from here and add it to your checkpoint-4 folder Github repository.

source('Cleaning.R') #Open Cleaning.R for information about variables.
head(activeLong)
  1. We are going to consider modeling Memory as a function of Years and treatment group INTGRP again but with a mixed effects model. Let’s start by using a random intercept model. Describe the assumptions made by the code below. Consider the assumptions about the model for the mean (fixed effects) and the assumptions about the random effects.

ANSWER:

activeLong <- activeLong %>% mutate(INTGRP = relevel(INTGRP, ref='Control'))

library(lme4)
library(tidyr)
mod1 <- activeLong %>% drop_na(Memory, Years, INTGRP) %>% 
  lmer(Memory ~ Years*I(Years > 3)*INTGRP + (1|AID), data = ., REML = FALSE)

mod1 %>% summary()
  1. Compare the estimated coefficients between the random intercept model and the GEE with an exchangeable working correlation structure. Describe what you observe and explain why it is so.

ANSWER:

mod2 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  geeM::geem(Memory ~ Years*I(Years > 3)*INTGRP, data = ., id = AID, corstr = 'exchangeable')

fixef(mod1)
mod2$beta
  1. Compare the estimated standard errors between the random intercept model and the GEE with an exchangeable working correlation structure (GEE SE’s are Robust). Describe what you observe and explain why it might be so.

ANSWER:

vcov(mod1) %>% diag() %>% sqrt()
mod2$var %>% diag() %>% sqrt()
  1. Now, to get a sense of the random intercept model, look at visualizations of the predictions. First, the predictions are based on the fixed effects and the random effects. The second includes the predictions from the fixed effects. Comment on what you observe and how it relates to the assumptions you made with the model.

ANSWER:

activeLong %>% drop_na(Memory, Years, INTGRP) %>%
  mutate(Pred = predict(mod1)) %>%
  ggplot(aes(x = Years, y = Pred, color = INTGRP, group = AID)) + 
  geom_line(alpha = .2) + 
  labs(color = 'Treatment Group', y = 'Predicted Outcomes (including random effects)') + theme_classic()

activeLong %>% drop_na(Memory, Years, INTGRP) %>%
  mutate(Pred = predict(mod1, re.form = NA)) %>%
  distinct(Years,INTGRP,Pred) %>%
  ggplot(aes(x = Years, y = Pred, color = INTGRP)) + 
  geom_line() + 
  labs(color = 'Treatment Group', y = 'Predicted Outcomes (only fixed effects)') + theme_classic()

Model Selection

Hypothesis Testing for Coefficients (one at a time)

In order to decide which fixed effects should be in the model (using ML, not REML), we can test whether \(\beta_k=0\) for some \(k\).

In mixed effects models, the estimated covariance matrix for \(\hat{\boldsymbol\beta}\) is \[\widehat{Cov}(\hat{\boldsymbol\beta}) = \left\{ \sum^n_{i=1} \mathbf{X}_i^T\hat{\mathbf{V}}_i\mathbf{X}_i \right\}^{-1}\]

The standard error for \(\hat{\beta}_k\) is the square rooted values of the diagonal of this matrix corresponding to \(\hat{\beta}_k\). This is what is reported in the summary output.

To test the hypothesis \(H_0: \beta_k=0\) vs. \(H_A: \beta_k\not=0\), we can calculate a z-statistic,

\[z = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \]

With mixed effects models, there is debate about what the appropriate distribution of this statistic is, which is why p-values are often not reported in the output. However, if you have an estimate that is large relative to the standard error, that is a good indication that it is significantly different from zero.

# Manual Code to get t-values: fixef(mod1)/sqrt(diag(vcov(mod1)))
summary(mod1)
  1. Looking back on the mixed effects model you fit, what do you conclude about the influence of Memory Training (as compared to Control Group) on Memory Score over time.

ANSWER:

Hypothesis Testing for Coefficients (many at one time)

To test a more involved hypothesis \(H_0: \mathbf{L}\boldsymbol\beta=0\) (to test whether multiple slopes are zero or a linear combination of slopes is zero), we can calculate a Wald statistic,

\[W^2 = (\mathbf{L}\hat{\boldsymbol\beta})^T(\mathbf{L}\widehat{Cov}(\hat{\boldsymbol\beta})\mathbf{L}^T)^{-1}(\mathbf{L}\hat{\boldsymbol\beta}) \]

Then we assume the sampling distribution is approximately \(\chi^2\) with df = # of rows of \(\mathbf{L}\) to calculate p-values (as long as \(n\) is large).

For mixed effects models, here is some example code that allows us to test whether the interactions between INTGRP and Years are needed (set all of the coefficients to 0).

b = fixef(mod1)
W = vcov(mod1)

L = matrix(0,nrow=9, ncol=16)
for(i in 1:9){
  L[i,7+i] <- 1
}
L

se = sqrt(diag(L%*%W%*%t(L))) ## SE for Lb

##Hypothesis Testing
w2 <- as.numeric( t(L%*%b) %*% solve(L %*% W %*% t(L))%*% (L%*%b)) ## should be approximately chi squared
1 - pchisq(w2, df = nrow(L)) #p-value

Lastly, another option with mixed effects models is a likelihood ratio test. Suppose we have two models with the same random effects and covariance models.

  • Full model: \(M_f\) has \(p+1\) columns in \(\mathbf{X}_i\) and thus \(p+1\) fixed effects \(\beta_0,...,\beta_{p}\).
  • Nested model: \(M_n\) has \(k+1\) columns such that \(k < p\) and \(p-k\) fixed effects \(\beta_l = 0\).

If we are using maximum likelihood estimation, it makes sense to compare the likelihood of the two models.

\(H_0:\) nested model is true, \(H_A:\) full model is true

If we take the ratio of the likelihoods from the nested model and full model and plug in the maximum likelihood estimators, then we have another statistic.

\[D = -2\log\left(\frac{L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}{L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}\right) = -2 \log(L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})) + 2\log(L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}}))\]

The sampling distribution of this statistic is approximately chi-squared with degrees of freedom equal to the difference in the number of parameters between the two models.

For mixed effects models, here is some example code that allows us to test whether the interactions between INTGRP and Years are needed (remove them in mod3, which is equivalent to setting them all to 0).

mod3 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  lmer(Memory ~ Years*I(Years > 3) + INTGRP + (1|AID), data = ., REML = FALSE)

anova(mod1, mod3)
  1. Compare these two models. Discuss whether your should favor the larger model or the nested model. In other words, do we have evidence that the treatment group is related to how Memory changes over time on average?

ANSWER:

Information Criteria for Choosing Fixed Effects

In order to use BIC to choose between mixed effects models, you must use maximum likelihood estimation instead of REML. Only compare models with the same random effects.

  1. Discuss which model seems the best using the Bayesian Information Criteria.

ANSWER:

BIC(mod1)
BIC(mod3)

Diagnostics

For any linear model, we should check to see if the residuals have no pattern.

activeLong %>% drop_na(Memory, Years, INTGRP) %>%
  mutate(Pred = predict(mod3)) %>%
  mutate(Resid = resid(mod3)) %>%
  ggplot(aes(x = Pred, y = Resid, color = INTGRP)) + 
  geom_point(alpha = .2) + geom_hline(yintercept = 0) + geom_smooth(se = FALSE) + theme_classic()

For a mixed effects model, you are assuming the random effects distribution is Normal. You should check on the distribution of the predicted random effects.

A = ranef(mod3)$AID$`(Intercept)`
qqnorm(A)
qqline(A)
  1. Lastly, all of these model selection tools are only valid if the entire model is correct (right mean model, right random effects, distributional assumptions about random effects and errors). Discuss the limitations of your model with this context in mind.

ANSWER: