11 GLM + GEE
Learning Goals
- Explain the common model components of a generalized linear model (GLM)
- Explain the ideas of working correlation models and robust standard error
- Fit GEE models to real data and interpret the output
Group Activity
Download a template RMarkdown file to start from here.
Introduction to ACTIVE study Data
For this section of class, you’ll analyze longitudinal data from a clinical trial, the Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) trial.
To get access to the data, go to https://bcheggeseth.github.io/452_fall_2023/checkpoint-4.html and click on the Github Classroom link. For this checkpoint, you will create an individual repository rather than a team repository. Place the template file 11-glm-gee.Rmd in that repository's folder and open it.
Explore the Data
- The wide format data is named active and the long format data is named activeLong. Look at these two data sets and describe the difference between them.
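As a minimal sketch of how the two formats relate, here is a made-up three-subject data set reshaped with tidyr::pivot_longer(). The column names Memory_1 and Memory_2 are hypothetical and are not necessarily the ones used in active; wide format has one row per subject, while long format has one row per subject and visit.

```r
library(tidyr)

# Hypothetical toy data in WIDE format: one row per subject,
# one column per measurement occasion (column names are illustrative only)
toy_wide <- data.frame(
  AID      = c(1, 2, 3),
  Memory_1 = c(50, 62, 47),
  Memory_2 = c(52, 60, 49)
)

# Reshape to LONG format: one row per subject-visit pair
toy_long <- pivot_longer(
  toy_wide,
  cols         = starts_with("Memory_"),
  names_to     = "Visit",      # which occasion the value came from
  names_prefix = "Memory_",    # strip the prefix so Visit is "1" or "2"
  values_to    = "Memory"
)
toy_long
```

The long version repeats AID on several rows, which is exactly what lets a model (or geem()) know which rows belong to the same person.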
ANSWER:
- The wide format data is useful for comparing the treatment groups, INTGRP, at baseline. Create a plot to compare the baseline cognitive function, MMSE_1, between the randomized treatment groups, INTGRP. Describe what you observe and whether it matches what you’d expect.
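One possible way to make this comparison is side-by-side boxplots. The sketch below uses simulated wide-format data (not the ACTIVE data) with hypothetical group labels; with real data you would pass active instead of toy_active. Because treatment was randomized, one would generally expect the baseline distributions to look similar across groups.

```r
library(ggplot2)

# Hypothetical simulated wide-format data (NOT the ACTIVE data):
# one row per subject with a baseline score MMSE_1 and a treatment group INTGRP
set.seed(1)
toy_active <- data.frame(
  AID    = 1:200,
  INTGRP = factor(sample(c("Control", "Memory", "Reasoning", "Speed"),  # illustrative labels
                         200, replace = TRUE)),
  MMSE_1 = pmin(30, round(rnorm(200, mean = 27, sd = 2)))  # MMSE is capped at 30
)

# Side-by-side boxplots of baseline MMSE by randomized group
p_baseline <- ggplot(toy_active, aes(x = INTGRP, y = MMSE_1)) +
  geom_boxplot() +
  labs(x = "Treatment group (INTGRP)", y = "Baseline MMSE (MMSE_1)")
p_baseline
```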
ANSWER:
- The long format data is useful for fitting models and looking at how variables change over time. Create a plot of the overall Memory score against Years since the study baseline, grouping lines by the subject identifier, AID, and coloring them by treatment group, INTGRP.
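A "spaghetti plot" draws one line per subject. The key detail is mapping group = AID so that ggplot2 connects only the points belonging to the same person. This sketch uses simulated long-format data with hypothetical group labels and visit times; with real data you would pass activeLong instead.

```r
library(ggplot2)

# Hypothetical simulated long-format data (NOT the ACTIVE data):
# one row per subject per visit
set.seed(2)
n_subj <- 40
groups <- c("Control", "Memory", "Reasoning", "Speed")   # illustrative level names
toy_long <- data.frame(
  AID    = rep(1:n_subj, each = 4),
  INTGRP = rep(sample(groups, n_subj, replace = TRUE), each = 4),
  Years  = rep(c(0, 1, 2, 3), times = n_subj)
)
# Subject-specific baseline level, a small time trend, and visit-level noise
toy_long$Memory <- rep(rnorm(n_subj, 50, 8), each = 4) +
  0.5 * toy_long$Years + rnorm(nrow(toy_long), 0, 3)

# One line per subject (group = AID), colored by treatment group
p_spaghetti <- ggplot(toy_long, aes(x = Years, y = Memory, group = AID, color = INTGRP)) +
  geom_line(alpha = 0.5) +
  labs(x = "Years since baseline", y = "Memory score", color = "INTGRP")
p_spaghetti
```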
ANSWER:
- Now, create a plot to compare the mean Memory score across Years since the study baseline by treatment group, INTGRP. You’ll first need to summarize the data within groups prior to plotting.
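The summarize-then-plot step can be sketched with dplyr: compute one mean per INTGRP and Years combination, then draw one line per group. The data below are simulated (not the ACTIVE data) and the group labels are hypothetical; na.rm = TRUE is included because real longitudinal data often have missing visits.

```r
library(dplyr)
library(ggplot2)

# Hypothetical simulated long-format data (NOT the ACTIVE data):
# 10 subjects per group, 4 visits each
set.seed(3)
groups <- c("Control", "Memory", "Reasoning", "Speed")   # illustrative level names
toy_long <- data.frame(
  AID    = rep(1:40, each = 4),
  INTGRP = rep(groups, each = 10 * 4),
  Years  = rep(c(0, 1, 2, 3), times = 40)
)
toy_long$Memory <- rep(rnorm(40, 50, 8), each = 4) +
  0.5 * toy_long$Years + rnorm(nrow(toy_long), 0, 3)

# Step 1: one mean per group and time point
toy_means <- toy_long %>%
  group_by(INTGRP, Years) %>%
  summarize(mean_Memory = mean(Memory, na.rm = TRUE), .groups = "drop")

# Step 2: plot the group means over time
p_means <- ggplot(toy_means, aes(x = Years, y = mean_Memory, color = INTGRP)) +
  geom_line() +
  geom_point() +
  labs(x = "Years since baseline", y = "Mean Memory score", color = "INTGRP")
p_means
```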
ANSWER:
Discuss Models
- Consider modeling Memory as a function of Years and treatment group INTGRP. Discuss with people around you how you’d model that relationship using lm(). Make sure you think about the assumptions you are making about the relationship when you write a formula, Y ~ X, for lm().
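One way to see what a formula assumes is to look at the design matrix it builds. This sketch uses a tiny hypothetical data set with only two groups. An additive formula gives every group the same Years slope (parallel lines); adding an interaction gives each group its own slope.

```r
# Tiny hypothetical data set: two groups, two visits per row pair (illustrative only)
toy <- data.frame(
  Years  = c(0, 1, 0, 1, 0, 1, 0, 1),
  INTGRP = factor(c("Control", "Control", "Control", "Control",
                    "Memory", "Memory", "Memory", "Memory"),
                  levels = c("Control", "Memory"))   # Control is the reference level
)

# Additive model: groups differ only in intercept (parallel lines)
add_cols <- colnames(model.matrix(~ Years + INTGRP, data = toy))
add_cols
# "(Intercept)" "Years" "INTGRPMemory"

# Interaction model: groups differ in intercept AND in Years slope
int_cols <- colnames(model.matrix(~ Years * INTGRP, data = toy))
int_cols
# "(Intercept)" "Years" "INTGRPMemory" "Years:INTGRPMemory"
```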
ANSWER:
- Discuss with people around you the potential issues of using lm() for this data. What part of the output will be valid to interpret and which part of the output will not be valid to interpret?
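A small Monte Carlo sketch can make the issue concrete. It simulates data under an assumed model with subject-level random intercepts (not the ACTIVE data), so repeated measures on the same person are correlated, and treatment is assigned per subject. It then compares the spread of lm()'s treatment estimates across many simulated data sets with the average standard error lm() reports. Under this assumed model, the estimates center near the true value of 2, but lm()'s reported SE is noticeably too small because it treats every row as independent.

```r
set.seed(452)

n_subj <- 100   # hypothetical number of subjects
n_time <- 5     # hypothetical visits per subject

simulate_once <- function() {
  Years <- rep(0:(n_time - 1), times = n_subj)
  # Treatment is assigned per subject (a between-subject covariate)
  trt <- rep(rbinom(n_subj, 1, 0.5), each = n_time)
  # Subject-specific random intercepts induce within-subject correlation
  b <- rep(rnorm(n_subj, sd = 5), each = n_time)
  Memory <- 50 + 0.5 * Years + 2 * trt + b + rnorm(n_subj * n_time, sd = 1)
  fit <- lm(Memory ~ Years + trt)
  c(est = unname(coef(fit)["trt"]),
    se  = summary(fit)$coefficients["trt", "Std. Error"])
}

sims <- replicate(200, simulate_once())   # 2 x 200 matrix with rows "est" and "se"
mean_estimate <- mean(sims["est", ])      # close to the true value, 2
empirical_sd  <- sd(sims["est", ])        # actual sampling variability of the estimate
mean_lm_se    <- mean(sims["se", ])       # what lm() claims the variability is
c(mean_estimate = mean_estimate, empirical_sd = empirical_sd, mean_lm_se = mean_lm_se)
```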
ANSWER:
Fit Models
- Fit the model you proposed in the Discuss Models section using lm(). Comment on the output of interest, based on your discussion of which parts of the output are valid to interpret.
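If the model you chose includes a Years by INTGRP interaction (one reasonable choice, not the only one), each group's rate of change is a sum of coefficients. The sketch below shows this on simulated data with hypothetical group labels and assumed true slopes. With Control as the reference level, the Control slope is the Years coefficient and every other group's slope adds its Years:INTGRP interaction term. As a check, each group slope matches the slope from fitting that group alone.

```r
set.seed(7)
groups <- c("Control", "Memory", "Reasoning", "Speed")   # illustrative level names
n_per_group <- 25
n_time <- 4
n_subj <- n_per_group * length(groups)

# Hypothetical simulated long-format data (NOT the ACTIVE data)
toy_long <- data.frame(
  AID    = rep(seq_len(n_subj), each = n_time),
  INTGRP = factor(rep(groups, each = n_per_group * n_time), levels = groups),
  Years  = rep(0:(n_time - 1), times = n_subj)
)
true_slope <- c(Control = 0, Memory = 1, Reasoning = 0.5, Speed = 0.5)  # assumed values
toy_long$Memory <- 50 +
  unname(true_slope[as.character(toy_long$INTGRP)]) * toy_long$Years +
  rep(rnorm(n_subj, 0, 5), each = n_time) +   # subject-level random intercepts
  rnorm(nrow(toy_long), 0, 2)                 # visit-level noise

fit <- lm(Memory ~ Years * INTGRP, data = toy_long)
b <- coef(fit)

# Group-specific Years slopes: reference slope plus the group's interaction term
slopes <- c(Control = unname(b["Years"]),
            sapply(groups[-1], function(g) unname(b["Years"] + b[paste0("Years:INTGRP", g)])))
slopes
```

Only the point estimates here are trustworthy; the standard errors in summary(fit) still ignore the within-subject correlation.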
ANSWER:
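library(dplyr)
# Make Control the reference level so each INTGRP coefficient compares a training group to Control
activeLong <- activeLong %>% mutate(INTGRP = relevel(factor(INTGRP), ref = 'Control'))
lm(??, data = activeLong) %>% summary()  # replace ?? with the formula from your discussion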
- Fit the model you proposed in the Discuss Models section using geeM::geem(). You’ll need to run install.packages('geeM') first. Compare the Std. Error from lm() with the Model SE and Robust SE when using an independence working correlation structure.
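GEE with an independence working correlation gives the same coefficient estimates as lm(); what differs is the Robust SE, which comes from a "sandwich" estimator that allows residuals from the same subject to be correlated. The sketch below computes the basic, uncorrected cluster sandwich by hand on simulated data (an assumed random-intercept model, not the ACTIVE data) so you can see where it comes from. geem() may apply its own scaling or corrections, so its numbers need not match this sketch exactly.

```r
set.seed(8)
n_subj <- 100; n_time <- 4   # hypothetical design
AID    <- rep(seq_len(n_subj), each = n_time)
Years  <- rep(0:(n_time - 1), times = n_subj)
trt    <- rep(rbinom(n_subj, 1, 0.5), each = n_time)   # between-subject treatment
Memory <- 50 + 0.5 * Years + 2 * trt +
  rep(rnorm(n_subj, 0, 5), each = n_time) +            # random intercepts
  rnorm(n_subj * n_time, 0, 2)

X   <- cbind(Intercept = 1, Years = Years, trt = trt)
fit <- lm(Memory ~ Years + trt)   # same estimates as GEE with independence
e   <- residuals(fit)

bread <- solve(crossprod(X))      # (X'X)^{-1}
# "Meat": sum over subjects of (X_i' e_i)(X_i' e_i)', keeping within-subject products
meat <- Reduce(`+`, lapply(split(seq_along(AID), AID), function(rows) {
  s <- crossprod(X[rows, , drop = FALSE], e[rows])   # X_i' e_i (p x 1)
  s %*% t(s)
}))
robust_se <- sqrt(diag(bread %*% meat %*% bread))
model_se  <- summary(fit)$coefficients[, "Std. Error"]
cbind(estimate = coef(fit), model_se = model_se, robust_se = robust_se)
```

In this simulation the robust SE for the between-subject treatment effect comes out larger than lm()'s SE, which is the pattern to look for when comparing Model SE and Robust SE under an independence working correlation.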
ANSWER:
library(geeM)
library(tidyr)
activeLong %>% drop_na(Memory,Years,INTGRP) %>%
geem(??, data = ., id = AID, corstr = 'independence') %>% #independent working correlation
summary()
- Fit the model you fit above, but now with an exponential decay (ar1) working correlation structure. Compare the Model SE and Robust SE to each other. Compare the difference you notice with the difference between Model SE and Robust SE when assuming independence with GEE (previous question). When the Model SE is close to the Robust SE, this suggests that the working correlation structure is a reasonable approximation of the true correlation structure.
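To see what each working correlation structure assumes, it can help to write out the correlation matrix for one subject's repeated measures. In this sketch the helper functions are hypothetical (not part of geeM), visits are assumed to be indexed 1, 2, ..., and alpha = 0.6 is an arbitrary illustrative value. Independence assumes zero correlation between visits; the exponential decay (AR(1)) structure assumes correlation alpha^|j - k| between visits j and k, so it shrinks as visits get farther apart.

```r
# Hypothetical helpers: working correlation matrices for n_time visits
independence_cor <- function(n_time) diag(n_time)
ar1_cor <- function(n_time, alpha) {
  lag <- abs(outer(seq_len(n_time), seq_len(n_time), "-"))  # |j - k| between visits
  alpha^lag
}

independence_cor(4)
R <- ar1_cor(4, alpha = 0.6)
R
# Row 1 is 1, 0.6, 0.36, 0.216: correlation decays with the number of visits apart
```

Note that in this sketch the decay is indexed by visit number, not by elapsed time, so unequally spaced visits are treated as if they were equally spaced.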
ANSWER:
activeLong %>% drop_na(Memory,Years,INTGRP) %>%
geem(??, data = ., id = AID, corstr = 'ar1') %>% #ar1 working correlation
summary()
- Using this GEE model with an exponential decay working correlation structure, give some general conclusions about the treatment and its impact on the Memory score over time by interpreting the coefficients and using Robust SEs to provide uncertainty estimates.
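A Robust SE is typically turned into a Wald 95% confidence interval as estimate ± 1.96 × SE. For a group's rate of change (for example, a Years coefficient plus an interaction coefficient) you also need the robust covariance between the two coefficients. The helper below is a hypothetical sketch: beta and V would come from your fitted model's coefficients and robust covariance matrix, and the numbers in the usage example are made up purely to show the arithmetic.

```r
# Hypothetical helper: Wald CI for a linear combination L' beta,
# given a (robust) covariance matrix V for the coefficients
wald_ci <- function(beta, V, L, level = 0.95) {
  est <- sum(L * beta)
  se  <- sqrt(drop(t(L) %*% V %*% L))   # Var(L' beta) = L' V L
  z   <- qnorm(1 - (1 - level) / 2)     # about 1.96 for a 95% interval
  c(estimate = est, se = se,
    lower = est - z * se, upper = est + z * se,
    p_value = 2 * pnorm(-abs(est / se)))
}

# Made-up inputs: slope = Years coefficient + interaction coefficient
beta <- c(0.2, 0.3)
V    <- matrix(c(0.01, -0.01,
                 -0.01, 0.04), nrow = 2)
res  <- wald_ci(beta, V, L = c(1, 1))
res
```

With these inputs the combined estimate is 0.5 and its variance is 0.01 + 0.04 + 2 × (-0.01) = 0.03, which shows why ignoring the covariance term would give the wrong SE.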
ANSWER: