Homework 6

You can download a template file to start from here.

Github Setup

To create a shared repository (for you, your partner, and me), go to https://classroom.github.com/a/bVHiUWQN. One partner creates the team, call it long-Name1-Name2. The second partner joins the team.

The second partner:

Clone the repository to your local machine.
Copy and paste the Cleaning.R and active.rda files into this new folder/repository.
Dowload the template for HW 6 (see top of the page) and put it in this folder.
Commit and Push your changes.

The first partner:

Clone the repository to your local machine. You should see the files that your partner put in there.

For this assignment, I want you to work together with a partner.

Research Question

Based on your previous work with this data set, including your literature search, come up with one research question to address as a team.

ANSWER:

Based on your previous work with this data set, including your literature search, come up with a mean model (x’s and their functional format) that addresses your research question (accounting for potential confounders).

ANSWER (In R formula structure):

Longitudinal Models

Let’s do a quick review:

Marginal Models For models fit with GEE, we need to specify a working correlation structure, but the robust standard errors are valid even if we get that structure wrong.

Mixed Effects Models We can write a linear mixed effects model in terms of our outcome vector,

\[\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i\]

\[ = \text{Fixed effects} + \text{Random effects} + \text{Error}\]

where \(\mathbf{X}_i\) is the design matrix for the fixed effects, \(\mathbf{Z}_i\) is the design matrix for the random effects (a subset of columns of \(\mathbf{X}_i\)), \(\boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma)\), \(\mathbf{b}_i\sim N(0,\mathbf{G})\), and \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are independent.

Thus,

\[\mathbf{Y}_{i}\sim \mathcal{N}(\mathbf{X}_{i}\boldsymbol\beta,\mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma) \]

BEFORE YOU BEGIN: Decide whether you and your partner want to use Mixed Effects or GEE models.

Come up with at least 2 candidate models (different correlation structures) that you want to fit.

GEE: consider working correlation structures, describe them in words, and then fit them with geeM::geem().
Mixed Effects: start with random intercept and consider which coef could be random, describe them in words, and then fit them with lme4::lmer().

source('Cleaning.R') # Open Cleaning.R to see the data cleaning steps to create a long format dataset
library(geeM)
library(lme4)

ANSWER:

Model 1:

Model 2:

Write bullet points about what you learn about your outcome, research question, and the modeling process from fitting these two models.

ANSWER:

Inference

Hypothesis Testing for Fixed Coefficients (one at a time)

We can use this inference tool for either GEE or Mixed Effects Models.

We can test whether \(\beta_k=0\) for some \(k\).

In marginal models (GEE), the estimated covariance matrix for \(\hat{\boldsymbol\beta}\) is given by the robust sandwich estimator. The Robust SE for \(\hat{\beta}_k\) is the square rooted values of the diagonal of this matrix corresponding to \(\hat{\beta}_k\). This is what is reported in the summary output as Robust SE.

In mixed effects models, the estimated covariance matrix for \(\hat{\boldsymbol\beta}\) is based on assuming the random effects are correct.

To test the hypothesis \(H_0: \beta_k=0\) vs. \(H_A: \beta_k\not=0\), we can calculate a z-statistic,

\[z = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \]

With GEE, the geem() function provides z statistics (labeled as wald) and associated p-values for testing \(H_0: \beta_k= 0\). These p-values come from a Normal distribution because we know that is the asymptotic (as the sample size, \(n\rightarrow \infty\)) distribution for these estimates.

With mixed effects, the lmer() function provides z statistics (labeled as t-value) for the fixed effects (but not the p-values) for testing \(H_0: \beta_k= 0\). We could compare the t-values to the cut off of +/-2 because we know that the asymptotic (as the sample size, \(n\rightarrow \infty\)) distribution for these estimates is Normal if the model is correct.

#gee_mod1 %>% summary()
#lmer_mod1 %>% summary()

Looking back on the models you fit, what do you conclude about the fixed coefficients? Do you think they are zero or not?

ANSWER:

Diagnostics

For any model with a quantitative outcome and quantitative explanatory variable, we should check to see if the residuals have no pattern.

Create a residual plot for your best model so far, plotting the residuals on the y-axis against a quantitative x-variable. Comment on the residual plot. See some example code below.

ANSWER:

# Residuals = Obs - Predicted

data %>% 
  mutate(Pred = predict(mod)) %>%
  mutate(Resid = resid(mod)) %>%
  #mutate(Resid = y - Pred) %>%
  ggplot(aes(x = Pred, y = Resid)) + 
  geom_point(alpha = .2) + 
  geom_hline(yintercept = 0) + 
  geom_smooth(se = FALSE) + 
  theme_classic()

Considering all of the choices you need to make, make an argument for one final model using a variety of tools.

ANSWER:

Using that final model, what do you conclude about your original research question about brain training among the elderly?

ANSWER: