# Linear Mixed Effects Model
# Subject-specific parameter specified in ( ___ | id)
lme4::lmer(Y ~ X1 + X2 + (1 | id), data = data, REML = TRUE) %>%
summary() # prints out estimates, model SE, t-values
lme4::lmer(Y ~ X1 + X2 + (X1 | id), data = data, REML = TRUE) %>%
summary() # prints out estimates, model SE, t-values
12 Mixed Effects
Settling In
Highlights from Day 11
Generalized Estimating Equations (GEE)
Definition: Generalized Estimating Equations (GEE) is an approach to estimating the parameters of a marginal model version of a GLM while accounting for correlation among repeated measures in longitudinal data.
. . .
GEE is a semi-parametric approach that estimates coefficients by solving the following estimating equations:
\[0 = \sum_{i=1}^n \left(\frac{\partial \boldsymbol\mu_i}{\partial \boldsymbol\beta }\right)^T\mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol\mu_i(\boldsymbol \beta)) \]
where \(\mathbf{V}_i\) is a covariance matrix based on a working correlation structure and \(\boldsymbol\mu_i(\boldsymbol\beta)\) is based on a link function and a linear combination of X variables, \(\mu_{ij}(\boldsymbol\beta) = g^{-1}(\mathbf{x}_{ij}^T\boldsymbol\beta)\), and
\[\mathbf{V}_i = \mathbf{A}_i^{1/2} Cor(\mathbf{Y}_i) \mathbf{A}_i^{1/2}\]
where \(\mathbf{A}_i\) is a diagonal matrix with \(Var(Y_{ij} | X_{ij}) = \phi v(\mu_{ij})\) along the diagonal.
. . .
If an observation is missing completely at random (MCAR), the estimates are still valid.
If an observation is not missing completely at random, one could incorporate inverse-probability weights to account for dropout/missing values, provided the missingness is associated with other variables that are observed (missing at random, MAR).
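As a refresher, a GEE fit in R might look like the following sketch (using the geeM package as in previous classes; Y, X1, X2, and id are placeholder names):
library(geeM)
# GEE with an exchangeable working correlation structure;
# robust (sandwich) SEs remain valid even if this working structure is wrong
gee_mod <- geem(Y ~ X1 + X2, id = id, data = data, corstr = 'exchangeable')
summary(gee_mod) # prints model-based (naive) and robust SEs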
Learning Goals
- Explain and illustrate a random intercept model and connect to compound symmetry model
- Explain how random intercepts and slopes model correlation
- Fit mixed effects models to real data and interpret the output
Mixed Effects Models
Definitions
A Generalized Linear Mixed Effects Model is a longitudinal model that incorporates correlation within subjects.
. . .
Mixed Effects Model is similar to a GLM (Distribution, Link function, Systematic component), except we incorporate subject-specific parameters in the Systematic Component of the mean model,
\[g(E[\mathbf{Y}_{i} | \mathbf{X}_{i}, \mathbf{Z}_{i},\mathbf{b}_i ]) = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i \]
. . .
The Mixed Effects Model involves fixed effects and random effects.
- It models the mean with population-level parameters \(\boldsymbol\beta\), similar to GLM (fixed effects).
- These are the parameters you are primarily interested in, that are common within the population (e.g., the association with a new drug, the association with age).
- It models “within-subject” correlation with subject-specific parameters, \(\mathbf{b}_i\), that are treated as random variables (random effects).
- Random effects represent the variability attributable to the groups or individuals themselves (e.g., differences in average response between individuals, or differences in slopes of change over time between individuals).
- You’re typically not interested in the specific effect of each individual group, but rather in the overall variance attributable to that grouping factor.
- Note: If you have multiple levels of correlation (time and space), you can model these different sources of (co)-variation.
Model Specification
For now, we’ll focus on quantitative, continuous outcomes.
We can write a linear mixed effects model in terms of our (quantitative) outcome vector,
\[\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i,\quad\quad \boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma), \mathbf{b}_i\sim N(0,\mathbf{G}), \boldsymbol\epsilon_i \perp \mathbf{b}_i\]
\[= \text{Fixed effects} + \text{Random effects} + \text{Error}\]
where
- \(\mathbf{X}_i\) is the model matrix for the fixed effects,
- \(\mathbf{Z}_i\) is the model matrix for the random effects (a subset of columns of \(\mathbf{X}_i\)),
- \(\boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma)\),
- \(\mathbf{b}_i\sim N(0,\mathbf{G})\), and
- \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are independent of each other.
. . .
Thus,
\[\mathbf{Y}_{i}\sim N(\mathbf{X}_{i}\boldsymbol\beta,\mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma)\quad\quad \boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma), \mathbf{b}_i\sim N(0,\mathbf{G}), \boldsymbol\epsilon_i \perp \mathbf{b}_i \]
Typically, we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\).
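To make the notation concrete, here is a minimal simulation sketch (with assumed toy values) that generates one subject's outcome vector from a random intercept model:
set.seed(42)
m <- 4                          # number of repeated measures on subject i
Xi <- cbind(1, 0:(m - 1))       # fixed effects design: intercept + time
Zi <- Xi[, 1, drop = FALSE]     # random effects design: intercept only
beta <- c(10, 2)                # fixed effects
sigma_0 <- 2                    # SD of random intercepts, so G = sigma_0^2
sigma_e <- 1                    # error SD, so Sigma = sigma_e^2 * I
b_i <- rnorm(1, mean = 0, sd = sigma_0)  # b_i ~ N(0, G)
eps <- rnorm(m, mean = 0, sd = sigma_e)  # eps_i ~ N(0, Sigma)
Y_i <- Xi %*% beta + Zi %*% b_i + eps    # Y_i = X_i beta + Z_i b_i + eps_i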
Properties
If the full model is correct \[\mathbf{Y}_i = \mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i\quad\quad \boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma), \mathbf{b}_i\sim N(0,\mathbf{G}), \boldsymbol\epsilon_i \perp \mathbf{b}_i,\]
. . .
then the marginal expected value of the outcome is
\[ E(\mathbf{Y}_i| \mathbf{X}_{i}, \mathbf{Z}_{i}) = E(\mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i | \mathbf{X}_{i}, \mathbf{Z}_{i})\] \[ = E(\mathbf{X}_i\boldsymbol\beta | \mathbf{X}_{i}, \mathbf{Z}_{i}) +E(\mathbf{Z}_i\mathbf{b}_i | \mathbf{X}_{i}, \mathbf{Z}_{i}) + E(\boldsymbol\epsilon_i| \mathbf{X}_{i}, \mathbf{Z}_{i})\] \[ = E(\mathbf{X}_i\boldsymbol\beta | \mathbf{X}_{i}, \mathbf{Z}_{i}) = \mathbf{X}_i\boldsymbol\beta \] since \(E(\mathbf{b}_i) = 0\) and \(E(\boldsymbol\epsilon_i) = 0\) (and both are assumed independent of the covariates),
. . .
the conditional expected value of the outcome is
\[ E(\mathbf{Y}_i| \mathbf{X}_{i}, \mathbf{Z}_{i}, \mathbf{b}_{i}) = E(\mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_ib_i + \boldsymbol\epsilon_i | \mathbf{X}_{i}, \mathbf{Z}_{i}, \mathbf{b}_{i})\] \[ = E(\mathbf{X}_i\boldsymbol\beta | \mathbf{X}_{i}, \mathbf{Z}_{i}, \mathbf{b}_{i}) +E(\mathbf{Z}_i\mathbf{b}_i | \mathbf{X}_{i}, \mathbf{Z}_{i}, \mathbf{b}_{i}) + E(\boldsymbol\epsilon_i| \mathbf{X}_{i}, \mathbf{Z}_{i}, \mathbf{b}_{i})\] \[ = \mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i\]
. . .
The covariance of the outcome is
\[ Cov(\mathbf{Y}_i) = Cov(\mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_ib_i + \boldsymbol\epsilon_i)\] \[ = Cov(\mathbf{Z}_i\mathbf{b}_i) + Cov(\boldsymbol\epsilon_i)\] \[ = \mathbf{Z}_iCov(\mathbf{b}_i)\mathbf{Z}_i^T + Cov(\boldsymbol\epsilon_i)\] \[ = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\]
Implementation
To estimate a mixed effects model, we can use the lme4 package in R.
We can use the lmer function for linear mixed effects models and the glmer function for generalized linear mixed effects models (e.g., logistic regression).
- Estimation can be done with Restricted Maximum Likelihood (REML) or Maximum Likelihood (ML). REML is often preferred for mixed effects models because it adjusts for the estimation of the fixed effects when estimating the variance components, which reduces bias in the variance estimates.
. . .
The basic syntax is:
# Logistic Mixed Effects Model (fit by ML; glmer has no REML argument)
lme4::glmer(Y ~ X1 + X2 + (1 | id), family = binomial, data = data) %>%
summary() # prints out estimates, model SE, z-values
Warm Up
Random Intercept Model
Using the notation, define \(\mathbf{Z}_i\), \(\mathbf{b}_i\), and \(\mathbf{G}\) for a random intercept model.
If we use a random intercept model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).
Random Slope Model
Using the notation, define \(\mathbf{Z}_i\), \(\mathbf{b}_i\), and \(\mathbf{G}\) for a random slope model.
If we use a random slope model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).
Model Selection
When selecting models, we must consider:
- Model Type
- Mean Model
- Covariance Model
Choices
- Model Type: Pros and Cons (Mixed Effects v. GEE)
- Desired Interpretation
- Mean Model: Which X’s
- Research Question
- Research Goals (Causal Inference v. Approximate a relationship)
- Covariance Model: Structure or Random Effects
- Data Context
- Resulting Covariance
- Goodness of Fit to Data
Hypothesis Testing Coefficients
In order to test fixed effects, use ML rather than REML; then we can test whether \(\beta_k=0\) for some \(k\).
. . .
In mixed effects models, the estimated (model-based) covariance matrix for \(\hat{\boldsymbol\beta}\) is \[\widehat{Cov}(\hat{\boldsymbol\beta}) = \left\{ \sum^n_{i=1} \mathbf{X}_i^T\hat{\mathbf{V}}_i^{-1}\mathbf{X}_i \right\}^{-1}\]
The standard error for \(\hat{\beta}_k\) is the square root of the diagonal entry of this matrix corresponding to \(\hat{\beta}_k\). This is what is reported in the summary output.
. . .
To test the hypothesis \(H_0: \beta_k=0\) vs. \(H_A: \beta_k\not=0\), we can calculate a t-statistic,
\[t = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \]
With mixed effects models, there is debate about what the appropriate distribution of this statistic is, which is why p-values are often not reported in the output. However, if you have an estimate that is large relative to the standard error (greater than 2 SE’s away), that is a good indication that it is significantly different from zero.
summary(mod1)
# Manual Code to get t-values
fixef(mod1)/sqrt(diag(vcov(mod1)))
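If you want rough p-values anyway, one common workaround (an approximation, not something lme4 itself reports) is to treat the t-statistic as approximately standard Normal; packages such as lmerTest instead provide degrees-of-freedom approximations.
tval <- fixef(mod1) / sqrt(diag(vcov(mod1)))
2 * pnorm(abs(tval), lower.tail = FALSE) # approximate two-sided p-values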
Hypothesis Testing Groups of Coefficients
To compare two models (same random effects and same covariance models, but with some \(\beta\)'s set to 0), we can use a likelihood ratio test. Suppose we have two models with the same random effects and covariance models:
- Full model: \(M_f\) has \(p+1\) columns in \(\mathbf{X}_i\) and thus \(p+1\) fixed effects \(\beta_0,...,\beta_{p}\).
- Nested model: \(M_n\) has \(k+1\) columns of \(\mathbf{X}_i\), with \(k < p\), so that \(p-k\) of the fixed effects are set to 0.
. . .
If we are using maximum likelihood estimation (not REML), it makes sense to compare the likelihood of the two models.
\(H_0:\) nested model is true, \(H_A:\) full model is true
. . .
If we take the ratio of the likelihoods of the nested and full models and plug in the maximum likelihood estimates, we obtain the likelihood ratio test statistic,
\[D = -2\log\left(\frac{L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}{L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}\right) = -2 \log(L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})) + 2\log(L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}}))\]
The sampling distribution of this statistic is approximately chi-squared with degrees of freedom equal to the difference in the number of parameters between the two models.
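As a sketch, the statistic and p-value can be computed by hand (assuming mod_full and mod_nested are placeholder models, both fit with REML = FALSE):
D <- as.numeric(-2 * (logLik(mod_nested) - logLik(mod_full)))
df <- attr(logLik(mod_full), 'df') - attr(logLik(mod_nested), 'df')
pchisq(D, df = df, lower.tail = FALSE) # approximate p-value
In practice, anova() carries out this comparison: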
anova(mod1, mod2) # mod2 is a nested model of mod1
Goodness of Fit: Information Criteria
To choose between two models, we can use the Bayesian Information Criterion (BIC). BIC is a model selection criterion that penalizes the likelihood of the model by the number of parameters and the sample size. Choose a model with a lower BIC (higher likelihood, fewer parameters).
\[BIC= -2\log(\hat{L}(\hat{\boldsymbol\beta},\hat{\mathbf{V}})) + k\log(n)\]
where \(\hat{L}(\hat{\boldsymbol\beta},\hat{\mathbf{V}})\) is the likelihood of the model with estimated parameters \(\hat{\boldsymbol\beta}\) and covariance matrix \(\hat{\mathbf{V}}\), \(k\) is the number of parameters in the model, and \(n\) is the number of observations.
- To choose between two models with different random effects, only compare models with the same fixed effects.
Note: In order to use BIC to choose between mixed effects models, you must use maximum likelihood (ML) estimation instead of REML.
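For example, one way to refit an REML fit by maximum likelihood before computing the BIC (a sketch; refitML() is lme4's helper for exactly this):
mod_reml <- lme4::lmer(Y ~ X1 + X2 + (1 | id), data = data, REML = TRUE)
mod_ml <- lme4::refitML(mod_reml) # same model, refit by ML
BIC(mod_ml)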
Model Diagnostics
For any linear model, we should check to see if the residuals have no pattern.
data %>%
mutate(Pred = predict(mod)) %>%
mutate(Resid = resid(mod)) %>%
ggplot(aes(x = Pred, y = Resid)) +
geom_point(alpha = .2) +
geom_hline(yintercept = 0) +
geom_smooth(se = FALSE) +
theme_classic()
For a mixed effects model, you are assuming the random effects distribution is Normal.
- You should check on the distribution of the predicted random effects to see if they are approximately Normal.
A <- ranef(mod)$id$`(Intercept)` # extract predicted random intercepts (grouping factor: id)
qqnorm(A)
qqline(A)
Small Group Activity
Download a template Qmd file to start from here.
Exercises
source('Cleaning.R') #Open Cleaning.R for information about variables.
head(activeLong)
- We are going to consider modeling Memory as a function of Years and treatment group INTGRP with a mixed effects model. Let's start by using a random intercept model. Describe the assumptions made by the code below. Consider the assumptions about the model for the mean (fixed effects) and the assumptions about the random effects.
activeLong <- activeLong %>% mutate(INTGRP = relevel(INTGRP, ref='Control'))
library(lme4)
library(tidyr)
mod1 <- activeLong %>% drop_na(Memory, Years, INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*INTGRP + (1|AID), data = ., REML = FALSE)
mod1 %>% summary()
- Compare the estimated coefficients between the random intercept model and the GEE with an exchangeable working correlation structure. Describe what you observe and explain why it is so.
mod2 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
geeM::geem(Memory ~ Years*I(Years > 3)*INTGRP, data = ., id = AID, corstr = 'exchangeable')
tibble(MixedBeta = fixef(mod1) %>% round(2), GEEBeta = mod2$beta %>% round(2)) %>%
mutate(Diff = MixedBeta - GEEBeta)
- Compare the estimated standard errors between the random intercept model and the GEE with an exchangeable working correlation structure (GEEs have naive/model-based and robust SEs). Describe what you observe and explain why it might be so.
tibble(MixedSE = vcov(mod1) %>% diag() %>% sqrt() %>% round(2),
GEENaivSE = mod2$naiv.var %>% diag() %>% sqrt() %>% round(2),
GEERobustSE = mod2$var %>% diag() %>% sqrt() %>% round(2)) %>%
mutate(Diff = MixedSE - GEENaivSE, Diff2 = MixedSE - GEERobustSE)
- Now, to get a sense of the random intercept model, let's look at visualizations of the predictions. The first plot shows predictions based on both the fixed effects and the random effects; the second shows predictions based on the fixed effects only. Comment on what you observe and how it relates to the assumptions you made within the model.
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod1)) %>%
ggplot(aes(x = Years, y = Pred, color = INTGRP, group = AID)) +
geom_line(alpha = .2) +
labs(color = 'Treatment Group', y = 'Predicted Outcomes (including random effects)') + theme_classic()
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod1, re.form = NA)) %>%
distinct(Years,INTGRP,Pred) %>%
ggplot(aes(x = Years, y = Pred, color = INTGRP)) +
geom_line() +
labs(color = 'Treatment Group', y = 'Predicted Outcomes (only fixed effects)') + theme_classic()
- Looking back on the mixed effects model you fit, what do you conclude about the influence of Memory Training (as compared to the Control group) on Memory score over time?
- Compare these three models. Discuss whether you should favor the larger model or the nested models. In other words, do we have evidence that the treatment group is related to how Memory changes over time on average?
For mixed effects models, here is some example code that allows us to test whether the interactions between INTGRP and Years are needed (mod4 removes all of them, which is equivalent to setting them all to 0; mod3 removes the interactions for the Speed and Reasoning training groups but keeps the interaction with Memory Training).
mod3 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (1|AID), data = ., REML = FALSE)
mod4 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3) + INTGRP + (1|AID), data = ., REML = FALSE)
anova(mod1, mod4)
anova(mod1, mod3)
- Discuss which set of fixed effects seems the best using the Bayesian Information Criterion (BIC).
BIC(mod1)
BIC(mod3)
BIC(mod4)
- Discuss which set of random effects seems the best using the Bayesian Information Criterion (BIC).
mod5 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (Years|AID), data = ., REML = FALSE)
BIC(mod3)
BIC(mod5)
- Lastly, all of these model selection tools are only valid if the entire model is correct (right mean model, right random effects, distributional assumptions about random effects and errors). Check the diagnostics and discuss the limitations of your model with this context in mind.
Note about Fixed Effects v. Random Effects
Economics Prefers Fixed Effects
In Econometrics, faculty will suggest including subject-specific parameters as fixed effects rather than random effects.
- Estimating the subject-specific intercepts as parameters (instead of random variables)
For example, if you have longitudinal data for each country over time, you might include country id or name as a fixed effect.
lm(GDP ~ Population + Inflation + Country, data = data)
. . .
Economics Main Goals
- Omitted Variable Bias and Endogeneity (unobserved variables that are correlated to observed variables)
- Causal Inference
Why Economists Use Fixed Effects Models
- \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) might not be independent (more generally, the subject-specific effects might be correlated with the observed covariates)
- Treating subject-specific intercepts as fixed effects yields consistent estimates of the coefficients of interest even if there is omitted variable bias from unobserved, time-invariant variables (see the sketch below)
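For contrast, the two specifications side by side (a sketch reusing the placeholder variables above):
# Fixed effects: one intercept parameter estimated per country
lm(GDP ~ Population + Inflation + Country, data = data)
# Random effects: country intercepts treated as draws from N(0, sigma_0^2)
lme4::lmer(GDP ~ Population + Inflation + (1 | Country), data = data)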
Solutions
Mixed Effects
Random Intercept Model
Solution
\[\mathbf{Z}_i = \left[\begin{array}{c}1\\1\\ \vdots\\ 1\\1 \end{array}\right]\]
\[\mathbf{b}_i = \left[\begin{array}{c}b_{i0} \end{array}\right]\]
\[\mathbf{G} = \left[\begin{array}{c}\sigma_{0}^2 \end{array}\right]\]
If we use a random intercept model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).
\[Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\] \[= \left[\begin{array}{c}1\\1\\ \vdots\\ 1\\1 \end{array}\right]\left[\begin{array}{c}\sigma_{0}^2 \end{array}\right]\left[\begin{array}{ccccc}1&1& \cdots& 1&1 \end{array}\right] + \sigma^2_{\epsilon} I\]
\[= \sigma_{0}^2\left[\begin{array}{ccccc}1&1& \cdots& 1&1 \\1&1& \cdots& 1&1 \\ 1&1&\ddots&1&1\\ 1&1& \cdots& 1&1 \\1&1& \cdots& 1&1 \end{array}\right] + \sigma^2_{\epsilon} I\] Compound Symmetric Structure!
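You can verify this structure numerically (a sketch with assumed toy variance values):
m <- 4
Zi <- matrix(1, nrow = m, ncol = 1) # random intercept design
G <- matrix(4)                      # sigma_0^2 = 4
sigma_e <- 1
Zi %*% G %*% t(Zi) + sigma_e^2 * diag(m)
# constant 5 on the diagonal, constant 4 off the diagonal: compound symmetry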
Random Slope Model
Solution
\[\mathbf{Z}_i = \left[\begin{array}{cc}1&x_{i1}\\1&x_{i2}\\ \vdots&\vdots\\ 1&x_{im_i} \end{array}\right]\]
\[\mathbf{b}_i = \left[\begin{array}{c}b_{i0}\\b_{i1} \end{array}\right]\]
\[\mathbf{G} = \left[\begin{array}{cc}\sigma_{0}^2& \sigma_{01}\\ \sigma_{01} & \sigma_{1}^2 \end{array}\right]\]
If we use a random slope model and we assume \(\boldsymbol\Sigma = \sigma^2_{\epsilon} I\), describe the structure of \(Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\).
\[Cov(Y_i) = \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\] \[=\left[\begin{array}{cc}1&x_{i1}\\1&x_{i2}\\ \vdots&\vdots\\ 1&x_{im_i} \end{array}\right]\left[\begin{array}{cc}\sigma_{0}^2& \sigma_{01}\\ \sigma_{01} & \sigma_{1}^2 \end{array}\right]\left[\begin{array}{cc}1&1&\cdots&1\\x_{i1}&x_{i2}& \cdots&x_{im_i} \end{array}\right] + \sigma^2_{\epsilon} I\]
A more complex structure that depends on the observed \(x\) values and all of the variance components.
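Numerically, with assumed toy values, you can see that the variance now changes with \(x\) (a sketch):
x <- c(0, 1, 2, 3)                             # observation times
Zi <- cbind(1, x)                              # random intercept + slope design
G <- matrix(c(4, 0.5, 0.5, 1), nrow = 2)       # Var(b_i0) = 4, Var(b_i1) = 1, Cov = 0.5
sigma_e <- 1
round(Zi %*% G %*% t(Zi) + sigma_e^2 * diag(length(x)), 2)
# the diagonal (variances) grows with x, and correlations are no longer constant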
Small Group Activity
Solution
- We are modeling the mean Memory score as a function of Years and Treatment group.
- We assume the mean Memory score changes linearly from baseline (Years = 0) to the 3rd year of follow-up and then changes linearly (but perhaps with a different slope) from the 3rd year of follow-up until the 10th year of follow-up. We specify this assumption with the interaction between Years and an indicator variable for Years > 3, Years*I(Years > 3).
- We assume that each Treatment group has a different relationship with time by including an interaction term in the model, Years*I(Years > 3)*INTGRP.
- We assume that each individual has their own random intercept and that the intercept deviations from the overall average intercept follow a Normal distribution with mean 0 and variance \(\sigma^2_b\).
- We assume the errors are independent with constant variance, \(\sigma^2_e\).
activeLong <- activeLong %>% mutate(INTGRP = relevel(INTGRP, ref='Control'))
library(lme4)
library(tidyr)
mod1 <- activeLong %>% drop_na(Memory, Years, INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*INTGRP + (1|AID), data = ., REML = FALSE)
mod1 %>% summary()
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Memory ~ Years * I(Years > 3) * INTGRP + (1 | AID)
Data: .
AIC BIC logLik -2*log(L) df.resid
99505.8 99640.6 -49734.9 99469.8 13236
Scaled residuals:
Min 1Q Median 3Q Max
-5.6494 -0.5424 0.0318 0.5753 6.6810
Random effects:
Groups Name Variance Std.Dev.
AID (Intercept) 255.9 15.997
Residual 56.6 7.524
Number of obs: 13254, groups: AID, 2785
Fixed effects:
Estimate Std. Error
(Intercept) 77.42116 0.64396
Years 0.88896 0.14181
I(Years > 3)TRUE 0.92667 0.99204
INTGRPMemory Training 2.89782 0.90857
INTGRPReasoning Training 1.21432 0.90917
INTGRPSpeed of Processing Training 0.98397 0.90865
Years:I(Years > 3)TRUE -2.37598 0.19568
Years:INTGRPMemory Training 0.52983 0.19807
Years:INTGRPReasoning Training -0.05223 0.19823
Years:INTGRPSpeed of Processing Training 0.17503 0.19789
I(Years > 3)TRUE:INTGRPMemory Training 6.12709 1.38875
I(Years > 3)TRUE:INTGRPReasoning Training 3.52266 1.36602
I(Years > 3)TRUE:INTGRPSpeed of Processing Training 3.61827 1.37079
Years:I(Years > 3)TRUE:INTGRPMemory Training -1.28948 0.27340
Years:I(Years > 3)TRUE:INTGRPReasoning Training -0.48388 0.27080
Years:I(Years > 3)TRUE:INTGRPSpeed of Processing Training -0.61670 0.27146
t value
(Intercept) 120.227
Years 6.269
I(Years > 3)TRUE 0.934
INTGRPMemory Training 3.189
INTGRPReasoning Training 1.336
INTGRPSpeed of Processing Training 1.083
Years:I(Years > 3)TRUE -12.142
Years:INTGRPMemory Training 2.675
Years:INTGRPReasoning Training -0.263
Years:INTGRPSpeed of Processing Training 0.884
I(Years > 3)TRUE:INTGRPMemory Training 4.412
I(Years > 3)TRUE:INTGRPReasoning Training 2.579
I(Years > 3)TRUE:INTGRPSpeed of Processing Training 2.640
Years:I(Years > 3)TRUE:INTGRPMemory Training -4.716
Years:I(Years > 3)TRUE:INTGRPReasoning Training -1.787
Years:I(Years > 3)TRUE:INTGRPSpeed of Processing Training -2.272
Solution
- The coefficients from the GEE with exchangeable correlation and the random intercept model are essentially identical (any differences are in the last displayed digit and come from how the correlation/variance parameters are estimated).
- This is because the random intercept model implies an exchangeable (compound symmetric) correlation structure, and the GEE estimating equations coincide with the maximum likelihood score equations when the outcome is quantitative and the assumed distribution is Gaussian with the identity link function.
mod2 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
geeM::geem(Memory ~ Years*I(Years > 3)*INTGRP, data = ., id = AID, corstr = 'exchangeable')
bind_cols(ME = round(fixef(mod1),2), GEE = round(mod2$beta,2))
# A tibble: 16 × 2
ME GEE
<dbl> <dbl>
1 77.4 77.4
2 0.89 0.9
3 0.93 0.94
4 2.9 2.9
5 1.21 1.21
6 0.98 0.98
7 -2.38 -2.38
8 0.53 0.53
9 -0.05 -0.05
10 0.18 0.17
11 6.13 6.12
12 3.52 3.52
13 3.62 3.61
14 -1.29 -1.29
15 -0.48 -0.49
16 -0.62 -0.62
Solution
- The standard errors are of similar magnitude. For some coefficients the random intercept model has larger SEs, and for others the GEE has larger robust SEs.
- The random intercept SEs assume the full model is correct; the robust SEs from GEE do not require the correlation model to be correct. Since they are similar in magnitude, the exchangeable correlation structure (random intercept model) is perhaps close to modeling the correlation correctly.
vcov(mod1) %>% diag() %>% sqrt()
(Intercept)
0.6439604
Years
0.1418052
I(Years > 3)TRUE
0.9920359
INTGRPMemory Training
0.9085660
INTGRPReasoning Training
0.9091733
INTGRPSpeed of Processing Training
0.9086495
Years:I(Years > 3)TRUE
0.1956765
Years:INTGRPMemory Training
0.1980667
Years:INTGRPReasoning Training
0.1982257
Years:INTGRPSpeed of Processing Training
0.1978924
I(Years > 3)TRUE:INTGRPMemory Training
1.3887542
I(Years > 3)TRUE:INTGRPReasoning Training
1.3660172
I(Years > 3)TRUE:INTGRPSpeed of Processing Training
1.3707891
Years:I(Years > 3)TRUE:INTGRPMemory Training
0.2734031
Years:I(Years > 3)TRUE:INTGRPReasoning Training
0.2707967
Years:I(Years > 3)TRUE:INTGRPSpeed of Processing Training
0.2714597
mod2$var %>% diag() %>% sqrt()
(Intercept)
0.6237813
Years
0.1404518
I(Years > 3)TRUE
1.1943610
INTGRPMemory Training
0.8787372
INTGRPReasoning Training
0.8618661
INTGRPSpeed of Processing Training
0.8631033
Years:I(Years > 3)TRUE
0.2217443
Years:INTGRPMemory Training
0.1933720
Years:INTGRPReasoning Training
0.1973003
Years:INTGRPSpeed of Processing Training
0.1917151
I(Years > 3)TRUE:INTGRPMemory Training
1.6838686
I(Years > 3)TRUE:INTGRPReasoning Training
1.5959530
I(Years > 3)TRUE:INTGRPSpeed of Processing Training
1.5659314
Years:I(Years > 3)TRUE:INTGRPMemory Training
0.3123003
Years:I(Years > 3)TRUE:INTGRPReasoning Training
0.3047666
Years:I(Years > 3)TRUE:INTGRPSpeed of Processing Training
0.2994713
bind_cols(ME = round(vcov(mod1) %>% diag() %>% sqrt(),2), GEE = round(mod2$var %>% diag() %>% sqrt(),2))
# A tibble: 16 × 2
ME GEE
<dbl> <dbl>
1 0.64 0.62
2 0.14 0.14
3 0.99 1.19
4 0.91 0.88
5 0.91 0.86
6 0.91 0.86
7 0.2 0.22
8 0.2 0.19
9 0.2 0.2
10 0.2 0.19
11 1.39 1.68
12 1.37 1.6
13 1.37 1.57
14 0.27 0.31
15 0.27 0.3
16 0.27 0.3
Solution
- We see the variation in predicted Memory scores from the random effects model. We also note the differing lengths of follow-up per person: some individuals have only two predicted outcomes, some three, others four, and a smaller number have a fifth at 10 years of follow-up.
- From the predictions based on fixed effects only, we note that, given our assumptions about how the group means change over time, there is one common estimated slope from 0 to 3 years of follow-up for all groups besides the Memory Training group (which has a slightly higher rate of increase). We estimate a drop in mean Memory between 3 and 5 years, modeled by the change in intercept (the coefficient for I(Years > 3)). For years 5 and 10 of follow-up, there is a different rate of change than earlier in follow-up; in fact, the slope appears to be similar for Control and Memory Training, with a faster decline for the other training groups.
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod1)) %>%
ggplot(aes(x = Years, y = Pred, color = INTGRP, group = AID)) +
geom_line(alpha = .2) +
labs(color = 'Treatment Group') + theme_classic()
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod1, re.form = NA)) %>%
distinct(Years,INTGRP,Pred) %>%
ggplot(aes(x = Years, y = Pred, color = INTGRP)) +
geom_line() +
labs(color = 'Treatment Group') + theme_classic()
Solution
# Manual Code to get t-values: fixef(mod1)/sqrt(diag(vcov(mod1)))
summary(mod1)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Memory ~ Years * I(Years > 3) * INTGRP + (1 | AID)
Data: .
AIC BIC logLik -2*log(L) df.resid
99505.8 99640.6 -49734.9 99469.8 13236
Scaled residuals:
Min 1Q Median 3Q Max
-5.6494 -0.5424 0.0318 0.5753 6.6810
Random effects:
Groups Name Variance Std.Dev.
AID (Intercept) 255.9 15.997
Residual 56.6 7.524
Number of obs: 13254, groups: AID, 2785
Fixed effects:
Estimate Std. Error
(Intercept) 77.42116 0.64396
Years 0.88896 0.14181
I(Years > 3)TRUE 0.92667 0.99204
INTGRPMemory Training 2.89782 0.90857
INTGRPReasoning Training 1.21432 0.90917
INTGRPSpeed of Processing Training 0.98397 0.90865
Years:I(Years > 3)TRUE -2.37598 0.19568
Years:INTGRPMemory Training 0.52983 0.19807
Years:INTGRPReasoning Training -0.05223 0.19823
Years:INTGRPSpeed of Processing Training 0.17503 0.19789
I(Years > 3)TRUE:INTGRPMemory Training 6.12709 1.38875
I(Years > 3)TRUE:INTGRPReasoning Training 3.52266 1.36602
I(Years > 3)TRUE:INTGRPSpeed of Processing Training 3.61827 1.37079
Years:I(Years > 3)TRUE:INTGRPMemory Training -1.28948 0.27340
Years:I(Years > 3)TRUE:INTGRPReasoning Training -0.48388 0.27080
Years:I(Years > 3)TRUE:INTGRPSpeed of Processing Training -0.61670 0.27146
t value
(Intercept) 120.227
Years 6.269
I(Years > 3)TRUE 0.934
INTGRPMemory Training 3.189
INTGRPReasoning Training 1.336
INTGRPSpeed of Processing Training 1.083
Years:I(Years > 3)TRUE -12.142
Years:INTGRPMemory Training 2.675
Years:INTGRPReasoning Training -0.263
Years:INTGRPSpeed of Processing Training 0.884
I(Years > 3)TRUE:INTGRPMemory Training 4.412
I(Years > 3)TRUE:INTGRPReasoning Training 2.579
I(Years > 3)TRUE:INTGRPSpeed of Processing Training 2.640
Years:I(Years > 3)TRUE:INTGRPMemory Training -4.716
Years:I(Years > 3)TRUE:INTGRPReasoning Training -1.787
Years:I(Years > 3)TRUE:INTGRPSpeed of Processing Training -2.272
- INTGRPMemory Training (estimate = 2.89782; t = 3.189): At baseline (Years = 0), we estimate that there is a discernible difference between the mean Memory score of those in Memory Training and those in the Control group.
- Years:INTGRPMemory Training (estimate = 0.52983; t = 2.675): In the first years after training (Years < 3), we estimate that there is a discernible difference in the rate of change of the mean Memory score between those in Memory Training and those in the Control group. We note a more positive increase in mean Memory for those in Memory Training as compared to the Control group.
- I(Years > 3)TRUE:INTGRPMemory Training (estimate = 6.12709 ; t = 4.412): We estimate that there is a discernible difference in the “change in intercept” between the Memory training group and the control group, comparing before and after 3 years of follow up.
- Years:I(Years > 3)TRUE:INTGRPMemory Training (estimate = -1.28948; t = -4.716): We estimate that there is a discernible difference in the change in slopes (the decline in mean Memory score) between the Memory Training group and the Control group in the years after the 3rd year of follow-up.
In fact, we might be interested in the decline after 3 years.
Among the Control Group, we estimate:
- Mean Memory = 77.42 + 0.88896 Years [for Years < 3]
- Mean Memory = (77.42 + 0.92667 ) + (0.88896 - 2.37598) Years = 78.34667 - 1.4870 Years [for Years > 3]
Among the Memory Training Group, we estimate:
- Mean Memory = (77.42 + 2.89782 ) + (0.88896 + 0.52983) Years = 80.31782 + 1.41879 Years [for Years < 3]
- Mean Memory = (77.42 + 0.92667 + 2.89782 + 6.12709) + (0.88896 + 0.52983 - 2.37598 - 1.28948 ) Years = 87.37158 - 2.24667 Years [for Years > 3]
# Calculating Slopes for Four Equations Above
b = fixef(mod1)
L = matrix(c(0,1,rep(0,14),
0,1,rep(0,4),1,rep(0,9),
0,1,rep(0,5),1,rep(0,8),
0,1,rep(0,4),1,1,rep(0,5),1,rep(0,2)), ncol=16, byrow=TRUE)
L %*% b
[,1]
[1,] 0.8889644
[2,] -1.4870120
[3,] 1.4187960
[4,] -2.2466610
Solution
mod3 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (1|AID), data = ., REML = FALSE)
mod4 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3) + INTGRP + (1|AID), data = ., REML = FALSE)
anova(mod1, mod4)
Data: .
Models:
mod4: Memory ~ Years * I(Years > 3) + INTGRP + (1 | AID)
mod1: Memory ~ Years * I(Years > 3) * INTGRP + (1 | AID)
npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
mod4 9 99517 99584 -49750 99499
mod1 18 99506 99641 -49735 99470 29.227 9 0.0005933 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mod1, mod3)
Data: .
Models:
mod3: Memory ~ Years * I(Years > 3) * I(INTGRP == "Memory Training") + (1 | AID)
mod1: Memory ~ Years * I(Years > 3) * INTGRP + (1 | AID)
npar AIC BIC logLik -2*log(L) Chisq Df Pr(>Chisq)
mod3 10 99504 99579 -49742 99484
mod1 18 99506 99641 -49735 99470 14.384 8 0.07229 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If there were no difference in the relationship with time across treatment groups, there would be less than a 0.001 chance of seeing a difference between these models as large as or larger than the one we observed. We have strong evidence in favor of needing an interaction term for treatment group; thus, we have strong evidence that those in memory training (or any training) have a different average Memory score pattern over time.
In comparing the model with the simpler treatment group variable (Memory Training vs. everything else) to the model with the full group variable, there is not strong evidence to suggest we need the full group variable.
Solution
The BIC also points to mod3, which keeps the interaction terms only for the Memory Training group: the added complexity of the full set of treatment group interactions is not balanced by an increased goodness of fit to the data.
BIC(mod1)
[1] 99640.65
BIC(mod3)
[1] 99579.09
BIC(mod4)
[1] 99584.44
Solution
mod5 <- activeLong %>% drop_na(Memory,Years,INTGRP) %>%
lmer(Memory ~ Years*I(Years > 3)*I(INTGRP == 'Memory Training') + (Years|AID), data = ., REML = FALSE)
BIC(mod3)
[1] 99579.09
BIC(mod5)
[1] 99062.38
This suggests that a model with a subject-specific slope for Years models the correlation better than a random intercept only model.
Solution
activeLong %>% drop_na(Memory, Years, INTGRP) %>%
mutate(Pred = predict(mod3)) %>%
mutate(Resid = resid(mod3)) %>%
ggplot(aes(x = Pred, y = Resid, color = INTGRP)) +
geom_point(alpha = .2) +
geom_hline(yintercept = 0) +
geom_smooth(se = FALSE) +
theme_classic()
A = ranef(mod3)$AID$`(Intercept)`
qqnorm(A)
qqline(A)
- We constrained the mean model to be piecewise linear (a change in the relationship at year 3 of follow-up). This is a strong simplifying assumption that we made for interpretation purposes. We could instead include Years as factor(Years) to avoid constraining the relationship over time.
- We ended up not including the interaction terms with INTGRP, as they did not improve the model fit that much. With that structure, we assume that the treatment group has the same impact on mean Memory at every time during follow-up.
- Lastly, we are assuming that the correlation in the data can be modeled with a random intercept model (which is equivalent to an exchangeable correlation structure and constant variance).
- The estimated random intercepts look approximately Normal, so that is good.
Wrap-Up
Finishing the Activity
- If you didn’t finish the activity, no problem! Be sure to complete the activity outside of class, review the solutions in the online manual, and ask any questions on Slack or in office hours.
- Re-organize and review your notes to help deepen your understanding, solidify your learning, and make homework go more smoothly!
After Class
Before the next class, please do the following:
- Take a look at the Schedule page to see how to prepare for the next class.