11 GLM + GEE

Learning Goals

Explain the common model components of a generalized linear model (GLM)
Explain the ideas of working correlation models and robust standard errors
Fit GEE models to real data and interpret the output

Group Activity

Download a template RMarkdown file to start from here.

Introduction to ACTIVE study Data

For this section of class, you’ll analyze longitudinal data from a clinical trial, the Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) trial.

To get access to the data, go to https://bcheggeseth.github.io/452_fall_2023/checkpoint-4.html and click on the GitHub Classroom link. You will create an individual repository, rather than a team repository, for this checkpoint. Place the template file 11-glm-gee.Rmd in that repository's folder and open it.

source('Cleaning.R') #Open Cleaning.R for information about variables.
head(activeLong)

Explore the Data

The wide format data is named active and the long format data is named activeLong. Look at these two data sets and describe the difference between them.

ANSWER:

head(active)
head(activeLong)
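If the distinction between the two formats is hard to see in the real data, a minimal sketch on a toy data set may help. Everything here (toy_wide, toy_long, and the score_1 to score_3 columns) is made up for illustration and is not part of the ACTIVE data; the sketch assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical toy data in wide format: one row per subject,
# one column per visit
toy_wide <- data.frame(
  id = c(1, 2),
  group = c("Control", "Treated"),
  score_1 = c(10, 12),
  score_2 = c(11, 15),
  score_3 = c(9, 16)
)

# Long format: one row per subject-visit combination
toy_long <- pivot_longer(
  toy_wide,
  cols = starts_with("score_"),
  names_to = "visit",
  names_prefix = "score_",
  values_to = "score"
)
toy_long$visit <- as.integer(toy_long$visit)

dim(toy_wide)  # 2 rows, 5 columns
dim(toy_long)  # 6 rows, 4 columns
```

Comparing dim() for the two toy data frames is one quick way to see what a row represents in each format.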

The wide format data is useful for comparing the treatment groups, INTGRP, at baseline. Create a plot to compare the baseline cognitive function, MMSE_1, between the randomized treatment groups, INTGRP. Describe what you observe and whether it matches what you’d expect.

ANSWER:

The long format data is useful to fit models and look at the relationship of variables over time. Create a plot to compare the overall Memory score across Years from the study baseline, grouping lines by the subject identifier, AID, and coloring them by treatment group, INTGRP.

ANSWER:

Now, create a plot to compare the mean Memory score across Years from the study baseline by treatment group, INTGRP. You’ll first need to summarize the data within groups prior to plotting.

ANSWER:
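As a reminder of what "summarize within groups" can look like, here is a minimal sketch on made-up data. The names toy_long, group, time, and score are hypothetical stand-ins, not the ACTIVE variables, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data in long format: 4 subjects, 3 time points each
toy_long <- data.frame(
  id = rep(1:4, each = 3),
  group = rep(c("Control", "Treated"), each = 6),
  time = rep(c(0, 1, 2), times = 4),
  score = c(10, 11, 9, 12, 12, 11, 10, 13, 15, 12, 15, 17)
)

# One row per group-by-time combination, holding the mean outcome
toy_means <- toy_long %>%
  group_by(group, time) %>%
  summarize(mean_score = mean(score, na.rm = TRUE), .groups = "drop")

toy_means
```

A summarized data frame like toy_means has one row per group and time point, so it can be passed to ggplot() with geom_line() to draw one mean trajectory per group.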

Discuss Models

Consider modeling Memory as a function of Years and treatment group INTGRP. Discuss with people around you how you’d model that relationship using lm(). Make sure you think about the assumptions you are making about the relationship when you write a formula, Y~X, for lm().

ANSWER:

Discuss with people around you the potential issues of using lm() for this data. What part of the output will be valid to interpret and which part of the output will not be valid to interpret?

ANSWER:

Fit Models

Fit the model you discussed in Question 5 using lm(). Comment on the output of interest (based on your discussion above).

ANSWER:

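library(dplyr) # provides %>% and mutate(); harmless if Cleaning.R already loaded it

# Make 'Control' the reference level so coefficients compare each group to Control
activeLong <- activeLong %>% mutate(INTGRP = relevel(INTGRP, ref='Control'))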

lm(??, data = activeLong) %>% summary()

Fit the model you discussed in Question 5 using geeM::geem(). You’ll need to run install.packages('geeM') first. Compare the Std. Error from lm() with the Model SE and Robust SE when using an independent working correlation structure.

ANSWER:

library(geeM)
library(tidyr)

activeLong %>% drop_na(Memory,Years,INTGRP) %>% 
  geem(??, data = ., id = AID, corstr = 'independence') %>% #independent working correlation
  summary()
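To see where a model-based SE and a robust SE come from, here is a rough base R sketch on simulated data. It is not geeM's internal computation (degrees-of-freedom and scale details may differ), and all names and parameter values (n_subj, treat, subj_effect, the coefficients) are invented for illustration. It fits least squares, which matches the GEE point estimates for a linear model under working independence, and then computes both kinds of standard error by hand.

```r
set.seed(452)

# Simulated (hypothetical) longitudinal data: 200 subjects, 4 visits each
n_subj <- 200
n_obs <- 4
id <- rep(seq_len(n_subj), each = n_obs)
time <- rep(0:(n_obs - 1), times = n_subj)
treat <- rep(rbinom(n_subj, 1, 0.5), each = n_obs)  # constant within subject

# A subject-level random intercept induces within-subject correlation
subj_effect <- rep(rnorm(n_subj, sd = 2), each = n_obs)
y <- 1 + 0.5 * time + 1 * treat + subj_effect + rnorm(n_subj * n_obs)

# Least squares fit
X <- cbind(Intercept = 1, time = time, treat = treat)
XtX_inv <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)
resid <- as.vector(y - X %*% beta_hat)

# Model-based SE: assumes independent errors with constant variance
sigma2 <- sum(resid^2) / (nrow(X) - ncol(X))
model_se <- sqrt(diag(sigma2 * XtX_inv))

# Robust (sandwich) SE: the middle term sums, over subjects,
# the outer product of each subject's X_i' r_i
scores <- rowsum(X * resid, group = id)  # one row per subject
meat <- crossprod(scores)
robust_se <- sqrt(diag(XtX_inv %*% meat %*% XtX_inv))

round(cbind(estimate = as.vector(beta_hat), model_se, robust_se), 3)
```

The printed table puts the two standard errors side by side for each coefficient, which is the same comparison the geem() summary lets you make on the real data.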

Fit the model you fit above, but now with an exponential decay (ar1) working correlation structure. Compare the Model SE and Robust SE to each other. Then compare the difference you notice here with the difference between the Model SE and Robust SE when assuming independence with GEE (in Question 8). When the Model SE is close to the Robust SE, it indicates that the working correlation structure is close to the truth.

ANSWER:

activeLong %>%  drop_na(Memory,Years,INTGRP) %>% 
  geem(??, data = ., id = AID, corstr = 'ar1') %>% #ar1 working correlation
  summary()
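To make "exponential decay" concrete, here is a small sketch of what an ar1 working correlation matrix looks like for one subject. The value rho = 0.6 and the four equally spaced visits are assumptions chosen for illustration, not estimates from the ACTIVE data.

```r
# Hypothetical values chosen for illustration
rho <- 0.6
visits <- 1:4

# Under AR(1), the correlation between visits j and k is rho^|j - k|,
# so it decays as observations get further apart in time
ar1_corr <- rho^abs(outer(visits, visits, "-"))
round(ar1_corr, 3)

# Under working independence, the matrix is the identity:
# every pair of distinct visits is treated as uncorrelated
indep_corr <- diag(length(visits))
indep_corr
```

Comparing the two matrices shows what each working structure assumes about repeated measurements on the same subject.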

Using this GEE model with an exponential decay working correlation structure, give some general conclusions about the treatment and its impact on the Memory score over time by interpreting the coefficients and using the Robust SEs to provide uncertainty estimates.

ANSWER: