6.8 Mixed Effects

Let’s return to a continuous outcome setting. Consider the mean response for a specific individual (\(i\)th individual/unit) at a specific observation time (\(j\)th time).

\[ \mu_{ij} = E(Y_{ij}|\text{ Explanatory Variables, Time}) \]

  • If the mean is constant across observation times (a horizontal trend line), does not depend on other variables (no X’s), and is the same across all individuals, \[\mu_{ij} = \mu \]

  • If the mean changes linearly across observation times and no other variables impact the mean, and it is the same across all individuals, \[\mu_{ij} = \beta_0 + \beta_1 t_{ij}\]

  • If the mean increases linearly with a time-varying explanatory variable and is the same across all individuals, \[\mu_{ij} = \beta_0 + \beta_1 x_{ij}\]

6.8.1 Individual Intercepts

In any of the situations listed above, individuals may have their own intercept, their own starting level. We can denote that by adding \(b_i\), an individual deviation from the mean intercept,

\[\mu_{ij} = \mu + b_i\]

\[\mu_{ij} = \beta_0 + b_i + \beta_1 t_{ij}\]

\[\mu_{ij} = \beta_0 + b_i + \beta_1 x_{ij}\]

We could estimate the individual intercepts to find \(b_i\), the difference between an individual’s intercept and the overall mean intercept. If \(b_i = -2\), individual \(i\) starts 2 units lower in their outcome responses than the average.

6.8.1.1 First Approach: Separate Models

Let’s use the dental data to figure out the overall mean intercept and estimate each deviation from that intercept. The first approach is to fit a separate linear model for each person and look at the intercepts.

dental <- read.table("./data/dental.dat", col.names=c("obsno", "id", "age", "distance", "gender"))
int <- sapply(unique(dental$id),function(i) lm(distance~age, data = dental[dental$id == i,])$coef[1])
mean(int) #mean intercept
int-mean(int) #estimated b_i

However, this stratification approach also allows everyone to have their own slope for age, which we may not want to do. If there is some common growth across persons, we’d like to borrow information across all persons and fit only one model.

6.8.1.2 Second Approach: Fixed Effects Models

To estimate a model with one fixed slope for everyone that allows for individual intercepts, we fit a model often termed a fixed effects model. In this model, we include an indicator variable for every individual id to allow for individual intercepts. This can be quite difficult if we have a large sample size because we are estimating one parameter per person or unit.

lm(distance ~ age + factor(id), data = dental)

6.8.1.3 Third Approach: Random Effects Models

An alternative way for individuals to have their own intercept is to assume a probability distribution for the \(b_i\)’s, such as \(N(0,\sigma_b^2)\), and estimate the variability \(\sigma^2_b\). This is a random intercept model,

\[Y_{ij} = \beta_0 + b_i + \beta_1x_{ij} + \epsilon_{ij},\quad \quad b_i\stackrel{iid}{\sim} N(0,\sigma^2_b), \epsilon_{ij}\stackrel{iid}{\sim} N(0,\sigma^2_e),\] where \(\epsilon_{ij}\) and \(b_i\) are independent for any \(i\) and \(j\).

library(lme4)
summary(lmer(distance ~ age + (1|id), data = dental))

In the output above, you’ll see information about the standardized or scaled residuals.

Below it, you’ll see information on the random effects. In our case, we have a random intercept for each individual, which we allowed by adding + (1 | id) to the formula. The 1 indicates the intercept, the | indicates conditional on, and id refers to the variable called id. We assumed that \(b_i \sim N(0,\sigma^2_b)\), and the estimates are \(\hat{\sigma}^2_b = 4.472\) and \(\hat{\sigma}_b = 2.115\). Additionally, we have assumed that our errors are independent with constant variance, \(\hat{\sigma}^2_e = 2.049\), \(\hat{\sigma}_e = 1.432\).

Lastly, we have our fixed effects, the parameters we don’t assume are random, \(\beta_0\) and \(\beta_1\). Those estimates are \(\hat{\beta}_0 = 16.76\) and \(\hat{\beta}_1 = 0.66\).

Now with this random effects model, the addition of a random intercept has induced correlation between observations within an individual because they now share an intercept.

In fact, assuming the model is true,

\[Cov(Y_{ij},Y_{il}) = Cov(\beta_0+b_i +\beta_1x_{ij}+\epsilon_{ij},\beta_0+b_i +\beta_1x_{il}+\epsilon_{il}) \] \[= Cov(b_i +\epsilon_{ij},b_i + \epsilon_{il}) \] \[= Cov(b_i, b_i) + Cov(b_i,\epsilon_{ij}) + Cov(b_i, \epsilon_{il}) + Cov(\epsilon_{ij},\epsilon_{il}) \] \[= \sigma^2_b + 0+0+0 = \sigma^2_b \]

\[Cor(Y_{ij},Y_{il}) = \frac{Cov(Y_{ij},Y_{il}) }{\sqrt{Var(Y_{ij})Var(Y_{il})}}\] \[ = \frac{\sigma^2_b }{\sqrt{Var(\beta_0+b_i +\beta_1X_{ij}+\epsilon_{ij})Var(\beta_0+b_i +\beta_1X_{il}+\epsilon_{il})}}\] \[ = \frac{\sigma^2_b }{\sqrt{Var(b_i +\epsilon_{ij})Var(b_i +\epsilon_{il})}}\] \[ = \frac{\sigma^2_b }{\sqrt{(\sigma^2_b + \sigma^2_e)^2}}\]

\[ = \frac{\sigma^2_b }{(\sigma^2_b + \sigma^2_e)}\]

for \(j\not=l\) and this does not depend on \(j\) or \(l\) or \(j-l\). The correlation is constant for any time lags (exchangeable/compound symmetric).
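As a quick check of this formula, here is a minimal sketch using the dental data and the random intercept model fit above (the name mod.randomint is introduced here only for illustration). It extracts the estimated variance components with VarCorr() and computes the implied within-person correlation.

# A sketch: implied exchangeable correlation from the random intercept model
mod.randomint <- lmer(distance ~ age + (1|id), data = dental)
vc <- as.data.frame(VarCorr(mod.randomint))
sigma2_b <- vc$vcov[vc$grp == "id"]        # between-person variance
sigma2_e <- vc$vcov[vc$grp == "Residual"]  # error (within-person) variance
sigma2_b / (sigma2_b + sigma2_e)           # estimated correlation between observations within a person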

6.8.2 Individual Slopes

Since we have longitudinal data, we can observe each individual’s trajectory over time. Everyone has their own unique growth rate. If that growth rate can be explained with an explanatory variable, we could use an interaction term to allow for that unique growth. Otherwise, we could allow each individual to estimate their own slope.

6.8.2.1 Second Approach: Fixed Effects Models

One could use a fixed effects model and use an interaction term with the id variable to estimate each individual slope, but that is not sustainable for larger data sets.

lm(distance ~ age*factor(id), data = dental)

6.8.2.2 Third Approach: Random Effects Models

A more efficient way to allow individuals to have their own slopes is to assume a probability distribution for the slopes (and typically intercepts) by assuming the vector of individual intercept and slope is bivariate Normal with covariance matrix \(G\), \((b_{i0},b_{i1}) \sim N(0,G)\). This is a random slope and intercept model,

\[Y_{ij} = (\beta_0 + b_{i0}) + (\beta_1 + b_{i1})x_{ij} + \epsilon_{ij},\quad \quad (b_{i0},b_{i1}) \sim N(0,G), \epsilon_{ij}\stackrel{iid}{\sim} N(0,\sigma^2_e),\] where \(\epsilon_{ij}\) and \((b_{i0}, b_{i1})\) are independent of each other for any \(i\) and \(j\).

summary(lmer(distance ~ age + (age|id), data = dental))

In the output above, you’ll see information about the standardized or scaled residuals.

Below it, you’ll see information on the random effects. In our case, we have a random intercept and a random slope for age for each individual, which we allowed by adding + (age | id) to the formula. We might read (age | id) as: allow an individual slope for age (and an individual intercept), where | indicates conditional on, and id refers to the variable called id. We assumed that \((b_{i0}, b_{i1}) \sim N(0,G)\). The estimated covariance matrix \(G\) is

\[\hat{G} = \left(\begin{array}{cc} 5.41 & -0.61\times 2.32\times 0.22\\ -0.61\times 2.32\times 0.22 & 0.05 \end{array}\right)\]

Additionally, we have assumed that our errors are independent with constant variance, \(\hat{\sigma}^2_e = 1.72\), \(\hat{\sigma}_e = 1.31\).
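To connect these numbers to the output, here is a minimal sketch that pulls \(\hat{G}\) and \(\hat{\sigma}^2_e\) directly from the fit (the name mod.randomslope is introduced here for illustration and matches the name used later in this section).

# A sketch: extract the estimated covariance matrix G and the error variance
mod.randomslope <- lmer(distance ~ age + (age|id), data = dental)
(G.hat <- VarCorr(mod.randomslope)$id)  # 2x2 covariance matrix of (b_i0, b_i1)
cov2cor(G.hat)                          # off-diagonal matches the reported correlation
sigma(mod.randomslope)^2                # estimated error variance, sigma^2_e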

Lastly, we have our fixed effects, the parameters we don’t assume are random, \(\beta_0\) and \(\beta_1\). Those estimates are \(\hat{\beta}_0 = 16.76\) and \(\hat{\beta}_1 = 0.66\). You’ll notice that these estimates did not change much at all by adding a random slope. This will more often happen when modeling a continuous outcome (not when modeling a binary or count outcome).

6.8.3 Multi-level or Hierarchical Model

We can write the random intercept and slope model by creating layers of models. We can write the model for the mean as the level 1 model

\[\mu_{ij} = \beta_{i0} + \beta_{i1} X_{ij}\quad\text{ Level 1}\]

where the individual intercepts and slopes can be written as a level 2 model,

\[\beta_{i0} = \beta_{0} + b_{i0}\quad\text{ Level 2} \] and/or \[\beta_{i1} = \beta_{1} + b_{i1} \quad\text{ Level 2}\]

where \(b_{i0}\sim N(0,\tau^2_0)\), \(b_{i1}\sim N(0,\tau^2_1)\), and \(Cov(b_{i0},b_{i1})=\tau_{01}\).

This is often called a multi-level or hierarchical model in that the model is written with more than one level of models.

Combine the levels by plugging the level 2 models into the level 1 model (and adding the error term \(\epsilon_{ij}\)):

\[Y_{ij} = \beta_{0} + b_{i0} +(\beta_{1} + b_{i1} ) x_{ij} + \epsilon_{ij}\]

\[ = (\beta_{0} + \beta_{1}x_{ij}) + (b_{i1} x_{ij} +b_{i0})+ \epsilon_{ij}\]

\[ = \text{Fixed effects} + \text{Random effects} + \text{Error}\]

This composite model is often called a mixed effects model because it contains both fixed effects and random effects.

6.8.4 Mixed Effects Model

In general, we can write a mixed-effects model for our outcome vector,

\[\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i\]

\[ = \text{Fixed effects} + \text{Random effects} + \text{Error}\]

where \(\mathbf{X}_i\) is the design matrix for the fixed effects, \(\mathbf{Z}_i\) is the design matrix for the random effects (a subset of columns of \(\mathbf{X}_i\)), \(\boldsymbol\epsilon_i\sim N(0,\boldsymbol\Sigma)\), and \(\mathbf{b}_i\sim N(0,\mathbf{G})\).

\[E(\mathbf{Y}_{i}) = E(\mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i)\] \[= \mathbf{X}_{i}\boldsymbol\beta\]

\[Cov(\mathbf{Y}_{i}) = Cov(\mathbf{X}_{i}\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i)\] \[= Cov(\mathbf{Z}_i\mathbf{b}_i + \boldsymbol\epsilon_i)\]

Assuming \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are independent, then

\[\mathbf{V}_i = Cov(\mathbf{Y}_{i}) = Cov(\mathbf{Z}_i\mathbf{b}_i) + Cov(\boldsymbol\epsilon_i)\] \[= \mathbf{Z}_iCov(\mathbf{b}_i)\mathbf{Z}_i^T + \boldsymbol\Sigma\] \[= \mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma\]

Since \(\mathbf{b}_i\) and \(\boldsymbol\epsilon_i\) are almost always assumed Normal, then

\[\mathbf{Y}_{i}\sim \mathcal{N}(\mathbf{X}_{i}\boldsymbol\beta,\mathbf{Z}_i\mathbf{G}\mathbf{Z}_i^T + \boldsymbol\Sigma) \]
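As a concrete illustration, here is a minimal sketch (under the assumptions above, with \(\boldsymbol\Sigma = \sigma^2_e\mathbf{I}\)) that builds the implied marginal covariance \(\hat{\mathbf{V}}_i = \mathbf{Z}_i\hat{\mathbf{G}}\mathbf{Z}_i^T + \hat{\sigma}^2_e\mathbf{I}\) for one child, using the random slope fit mod.randomslope from earlier.

# A sketch: implied marginal covariance V_i = Z_i G Z_i^T + sigma^2_e I for one child
ages     <- sort(unique(dental$age))   # the observation times in the dental data
Zi       <- cbind(1, ages)             # random effects design matrix: intercept and age
G.hat    <- VarCorr(mod.randomslope)$id
sigma2.e <- sigma(mod.randomslope)^2
Vi <- Zi %*% G.hat %*% t(Zi) + sigma2.e * diag(length(ages))
Vi           # implied covariance matrix across observation times
cov2cor(Vi)  # implied correlation matrix (no longer exchangeable with a random slope)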

6.8.5 History

The origins of mixed effects models go back to R.A. Fisher’s work under the analysis of variance (ANOVA) paradigm in the 1910s to 1930s.

  • Earliest mixed effects model: random intercept model
  • Work continued in the ANOVA framework until the 1970s
  • Mixed effects models in a linear model framework are based on the ANOVA paradigm
  • Seminal paper by Harville in 1977
  • Highly cited mixed models paper by Laird and Ware in 1982

6.8.6 Interpretation

We typically discuss how a 1 unit increase in a variable in X impacts the mean, keeping all other variables constant.

With random effects, this means interpreting the change within a subject.

The expected response, given the subject’s random effects, is

\[E(\mathbf{Y}_{i}|\mathbf{b}_i) = \mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i \]

For the sake of simplicity, let’s consider the random intercept model, \[E(Y_{ij}|b_i) = \beta_0 +b_i+\beta_1X_{ij} \]

The expected response for value of \(X_{ij} = x\) is

\[E(Y_{ij}|b_i, X_{ij}=x) = \beta_0 +b_i+\beta_1x \]

The expected response for value of \(X_{ij} = x+1\) is

\[E(Y_{ij}|b_i, X_{ij}=x+1) = \beta_0 +b_i+\beta_1(x+1) \]

The difference in the expected response for individual \(i\) with random effect \(b_i\) is

\[E(Y_{ij}|b_i, X_{ij}=x+1) -E(Y_{ij}|b_i, X_{ij}=x) \] \[= \beta_0 +b_i+\beta_1(x+1) - (\beta_0 +b_i+\beta_1x) = \beta_1 \]

This means we can give a subject-specific interpretation!

We can also consider the overall or marginal mean, \(E(Y_{ij})\).

First, recall the definition of conditional expectation in the discrete setting, \[E(Y|B = b) = \sum_y y \cdot P(Y = y|B = b) \]

Then we can show that

\[E(Y) = E(E(Y|B)) \]

The iterated expectation also holds for continuous random variables.

Returning to the overall or marginal mean of the random intercept model, \(E(Y_{ij})\):

\[E(Y_{ij}) = E(E(Y_{ij}|b_i)) = \beta_0 + E(b_i) +\beta_1X_{ij} = \beta_0 +\beta_1X_{ij} \]

The expected response for value of \(X_{ij} = x\) is

\[E(Y_{ij}|X_{ij}=x) = \beta_0 +\beta_1x \]

The expected response for value of \(X_{ij} = x+1\) is

\[E(Y_{ij}|X_{ij}=x+1) = \beta_0 +\beta_1(x+1) \]

The difference is \[E(Y_{ij}| X_{ij}=x+1) -E(Y_{ij}| X_{ij}=x) \] \[= \beta_0 +\beta_1(x+1) - (\beta_0 +\beta_1x) = \beta_1 \]

Therefore, in the context of a linear mixed effects model, \(\beta_1\) has the interpretation of the subject-specific change as well as the population mean change.

A 1-unit increase in \(X\) leads to a \(\beta_1\) increase in the subject’s mean response, keeping all other variables constant.

A 1-unit increase in \(X\) leads to a \(\beta_1\) increase in the overall mean response, keeping all other variables constant.

6.8.7 Estimation

6.8.7.1 Maximum Likelihood Estimation

Mixed Effects Models require specifying the entire distribution. We assume probability models for the random effects and the errors, and thus we could use maximum likelihood estimation to find estimates for our slopes.

If we specify the REML argument in lmer as REML = FALSE, then the lmer function will maximize the log-likelihood function (based on Normal assumptions for the errors and random effects),

\[l = -\frac{nm}{2}\log(2\pi) - \frac{1}{2}\sum^n_{i=1} \log|\mathbf{V}_i|\] \[-\frac{1}{2}\left\{\sum^n_{i=1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta)^T\mathbf{V}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\boldsymbol\beta) \right\} \]

where \(\mathbf{V}_i\) is the composite covariance matrix, \(n\) is the number of individuals, and \(m\) is the number of observations per individual.

  • Estimates of the \(\beta\)’s are unbiased if the model is correct (random effects and mean model).
  • All parameter estimates are consistent if the model is correct.
  • All parameter estimates are asymptotically unbiased; as \(n\rightarrow\infty\), they become unbiased (if model is correct).
  • Estimates are asymptotically Normal; as \(n\rightarrow\infty\), the sampling distribution becomes more Normal.
  • Estimates of variance of random effects are biased with finite \(n\).

6.8.7.2 Restricted Maximum Likelihood Estimation

Patterson and Thompson (1971) and Harville (1974) provided technical solutions to the issues of maximum likelihood. They suggested maximizing the likelihood of observing the sample residuals rather than the sample data. This is referred to as restricted maximum likelihood or REML and is implemented in lmer by default when REML = TRUE.
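For intuition, here is a minimal sketch with the dental data (model names are illustrative) that fits the same random intercept model with REML and with ML and compares the estimates.

# A sketch: compare REML (the lmer default) and ML fits of the same model
mod.reml <- lmer(distance ~ age + (1|id), data = dental, REML = TRUE)
mod.ml   <- lmer(distance ~ age + (1|id), data = dental, REML = FALSE)
VarCorr(mod.reml)  # REML estimates of the variance components
VarCorr(mod.ml)    # ML variance estimates are typically somewhat smaller
fixef(mod.reml)    # fixed effect estimates are very similar between the two fits
fixef(mod.ml)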

REML Algorithm:

  • Estimate fixed effects using OLS.
  • Write down the likelihood of residuals in terms of residuals and variance parameters.
  • Then maximize likelihood with respect to variance parameters.

Another REML Algorithm:

  • Split the likelihood into one part about the mean and one part about the variance.
  • First, maximize the variance part to get estimates of the variance parameters
  • Then maximize the part about the mean using the estimated variance.

After some complicated mathematics, you ultimately end up maximizing the following with respect to the parameters of \(\mathbf{V}_i\),

\[l = -\frac{nm}{2}\log(2\pi) - \frac{1}{2}\sum^n_{i=1} \log|\mathbf{V}_i|\] \[-\frac{1}{2}\left\{\sum^n_{i=1}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol\beta})^T\mathbf{V}_i^{-1}(\mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol\beta})\right\}\] \[- \frac{1}{2}\log \left|\sum^n_{i=1}\mathbf{X}_i^T\mathbf{V}_i^{-1}\mathbf{X}_i\right| \]

and \(\hat{\boldsymbol\beta}= \left(\sum^n_{i=1}\mathbf{X}_i^T\mathbf{V}_i^{-1}\mathbf{X}_i\right)^{-1}\left(\sum^n_{i=1}\mathbf{X}_i^T\mathbf{V}_i^{-1}\mathbf{Y}_i\right)\) are the GLS estimates.

Advantages of REML

  • Less bias in variance estimators.
  • \(\hat{\boldsymbol\beta}\) is still unbiased as long as the model is correct.

Disadvantages of REML

  • You can only compare models that differ in their variance components with REML.
  • You cannot compare models with differing fixed effects because you didn’t maximize the full information (use ML for this).

6.8.8 Model Selection

6.8.8.1 Hypothesis Testing for Fixed Effects (one at a time)

To decide which fixed effects should be in the model, we can test whether \(\beta_k=0\).

The estimated covariance matrix for \(\hat{\boldsymbol\beta}\) is \[\widehat{Cov}(\hat{\boldsymbol\beta}) = \left\{ \sum^n_{i=1} \mathbf{X}_i^T\hat{\mathbf{V}}_i^{-1}\mathbf{X}_i \right\}^{-1}\]

The standard error for \(\hat{\beta}_k\) is the square root of the diagonal element of this matrix corresponding to \(\hat{\beta}_k\). This is what is reported in the summary output.

mod.randomslope <- lmer(distance~age + (age|id), data = dental)
lme4::fixef(mod.randomslope)
vcov(mod.randomslope)
summary(mod.randomslope)

To test the hypothesis \(H_0: \beta_k=0\) vs. \(H_A: \beta_k\not=0\), we can calculate a z-statistic,

\[z = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \]

There is debate about the appropriate distribution of this statistic, which is why p-values are not reported in the output. However, if you have an estimate that is large relative to the standard error, that indicates that it is significantly different from zero.
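A minimal sketch of this calculation with the random slope model fit above (the object names est and se are introduced here for illustration):

# A sketch: z-statistics (estimate / SE) for each fixed effect
est <- lme4::fixef(mod.randomslope)
se  <- sqrt(diag(as.matrix(vcov(mod.randomslope))))
est / se  # these match the t values reported by summary()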

To test a more involved hypothesis \(H_0: \mathbf{L}\boldsymbol\beta=0\) (for multiple rows), we can calculate a Wald statistic,

\[W^2 = (\mathbf{L}\hat{\boldsymbol\beta})^T(\mathbf{L}\widehat{Cov}(\hat{\boldsymbol\beta})\mathbf{L}^T)^{-1}(\mathbf{L}\hat{\boldsymbol\beta}) \]

Then we assume the sampling distribution is approximately \(\chi^2\) with df = # of rows of \(\mathbf{L}\) to calculate p-values (as long as \(n\) is large). Below is some example R code.

b = fixef(mod1)  # fixed effect estimates from a previously fitted lmer model (mod1)
W = vcov(mod1)   # estimated covariance matrix of the fixed effect estimates

L = matrix(c(0,0,0,1),nrow=1)  # contrast matrix; this row tests the 4th fixed effect

L%*%b
(se = sqrt(diag(L%*%W%*%t(L)))) ## model-based SE's for Lb

##95% Confidence Interval (using Asymptotic Normality)
L%*%b - 1.96*se
L%*%b + 1.96*se

##Hypothesis Testing
w2 <- as.numeric( t(L%*%b) %*% solve(L %*% W %*% t(L))%*% (L%*%b)) ## Wald statistic, approximately chi-squared
1 - pchisq(w2, df = nrow(L)) #p-value

Let’s consider the guinea pig data. In particular, we fit a model with a random intercept and fixed effects for dose, time, and the dose-by-time interaction. If we want to know whether the interaction term is necessary, we need to test whether both interaction coefficients are equal to 0.

library(dplyr) # for %>% (if not already loaded)
library(tidyr) # for gather() and separate()

pigs <- read.table("./data/diet.dat", col.names=c("id", paste("bw.",c(1,3,4,5,6,7),sep=""), "dose" ))
pigs <- pigs %>%
  gather(Tmp,bw, bw.1:bw.7) %>%
  separate(Tmp,into=c('Var','time'), remove=TRUE) 

pigs$time <- as.numeric(pigs$time)
pigs$dose <- factor(pigs$dose)
levels(pigs$dose) <- c('Zero dose', 'Low dose', 'High dose')

mod.pigs <- lmer(bw ~ dose*time + (1|id), data = pigs)
summary(mod.pigs)


b = lme4::fixef(mod.pigs)
W = vcov(mod.pigs)
(L <- matrix(c(rep(0,4),1,0,rep(0,5),1),nrow=2,byrow=TRUE)) # each row picks off one of the two interaction coefficients
w2 <-as.numeric( t(L%*%b) %*% solve(L %*% W %*% t(L))%*% (L%*%b))
1- pchisq(w2, df = nrow(L)) #We don't have enough evidence to reject null (thus, don't need interaction term)

Lastly, another option is a likelihood ratio test. Suppose we have two models with the same random effects and covariance models.

  • Full model: \(M_f\) has \(p\) columns in \(\mathbf{X}_i\) and thus \(p\) fixed effects \(\beta_0,...,\beta_{p-1}\).
  • Nested model: \(M_n\) has \(k\) columns, where \(k < p\), so the remaining \(p-k\) fixed effects are set to zero (\(\beta_l = 0\)).

If we use maximum likelihood estimation, it makes sense to compare the likelihood of two models.

\(H_0:\) nested model is true, \(H_A:\) full model is true

If we take the ratio of the likelihoods from the nested model and full model and plug in the maximum likelihood estimators, then we have another statistic.

\[D = -2\log\left(\frac{L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}{L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}})}\right) = -2 \log(L_n(\hat{\boldsymbol\beta},\hat{\mathbf{V}})) + 2\log(L_f(\hat{\boldsymbol\beta},\hat{\mathbf{V}}))\]

The sampling distribution of this statistic is approximately chi-squared with degrees of freedom equal to the difference in the number of parameters between the two models.

mod.pigsML <- lmer(bw ~ dose*time + (1|id), data = pigs, REML=FALSE)
mod.pigsML2 <- lmer(bw ~ dose+time + (1|id), data = pigs, REML=FALSE)

anova(mod.pigsML,mod.pigsML2)

(Dstat <- -2*logLik(mod.pigsML2)+2*logLik(mod.pigsML))
1-pchisq(Dstat,df = 2) #df = difference in number of parameters

6.8.8.2 Information Criteria for Choosing Fixed Effects

To use BIC to compare models with different fixed effects, you must use maximum likelihood estimation instead of REML.

mod.pigsML <- lmer(bw ~ dose*time + (1|id), data = pigs, REML=FALSE)
summary(mod.pigsML)
BIC(mod.pigsML)


mod.pigsML2 <- lmer(bw ~ dose+time + (1|id), data = pigs, REML=FALSE)
summary(mod.pigsML2)
BIC(mod.pigsML2) #This model has a lower BIC

6.8.8.3 Hypothesis Testing for Random Effects

The main way to compare models with different random effects is to compare two models with the same fixed effects. Then use a likelihood ratio test with ML to compare the two models’ fit. Again, the null hypothesis is that the smaller model (less complex model – fewer parameters) is the true model. If we see a higher log-likelihood with the more complex model, we may get a small p-value suggesting that we reject the null hypothesis in favor of the more complex model.

Below, I compare two models fit to the guinea pig data. They have the same fixed effects, but I allowed one to have a random slope and intercept while the other only has a random intercept. By comparing these two models, we find that the one with a random slope and intercept better fits the data. Thus we reject the null hypothesis that the model with only the random intercept is true in favor of the model with the random slope and intercept.

mod.pigsML2 <- lmer(bw ~ dose+time + (1|id), data = pigs, REML=FALSE)
mod.pigsML3 <- lmer(bw ~ dose+time + (time|id), data = pigs, REML=FALSE)

anova(mod.pigsML2, mod.pigsML3)

6.8.9 Predicting Random Effects

The best predictor of the random effects is

\[E(\mathbf{b}_i|\mathbf{Y}_i) = \mathbf{G}\mathbf{Z}_i^T\mathbf{V}_i^{-1}(\mathbf{Y}_i - \mathbf{X}_i\boldsymbol\beta) \]

This is the best linear unbiased predictor (BLUP).

Therefore,

\[\hat{\mathbf{b}}_i = \hat{\mathbf{G}}\mathbf{Z}_i^T\hat{\mathbf{V}}_i^{-1}(\mathbf{Y}_i - \mathbf{X}_i\hat{\boldsymbol\beta}) \]

which is called the empirical BLUP. We can get this with R’s ranef() function.

lme4::ranef(mod.randomslope)$id

hist(lme4::ranef(mod.randomslope)$id$`(Intercept)`)
hist(lme4::ranef(mod.randomslope)$id$age)

We can see here that the predicted random effects are not necessarily Normally distributed (which was an assumption we made in fitting the model).

6.8.10 Predicting Outcomes

Thus our best prediction of \(\mathbf{Y}_i\) is

\[\hat{\mathbf{Y}}_i = \mathbf{X}_i\hat{\boldsymbol\beta} +\mathbf{Z}_i\hat{\mathbf{b}}_i \]

which can be rewritten as the weighted average of the population marginal mean, \(\mathbf{X}_i\hat{\boldsymbol\beta}\) and the individual response \(\mathbf{Y}_i\),

\[\hat{\mathbf{Y}}_i = (\hat{\boldsymbol\Sigma}\hat{\mathbf{V}}_i^{-1})\mathbf{X}_i\hat{\boldsymbol\beta} +(I - \hat{\boldsymbol\Sigma}\hat{\mathbf{V}}_i^{-1})\mathbf{Y}_i\]

predict(mod.randomslope)
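A minimal sketch of what those predictions contain (object names introduced here are illustrative): by default, predict() includes the predicted random effects, while re.form = NA returns the population-level prediction \(\mathbf{X}_i\hat{\boldsymbol\beta}\) only.

# A sketch: subject-specific vs. population-level predictions
head(predict(mod.randomslope))                # X beta_hat + Z b_hat (default)
head(predict(mod.randomslope, re.form = NA))  # X beta_hat only (marginal mean)

# Reconstruct the subject-specific predictions by hand for one child
b   <- lme4::ranef(mod.randomslope)$id        # empirical BLUPs (intercept, age)
bet <- lme4::fixef(mod.randomslope)
first.id <- rownames(b)[1]                    # id of the first child in the BLUP table
child1   <- dental[dental$id == first.id, ]
(bet[1] + b[1, "(Intercept)"]) + (bet[2] + b[1, "age"]) * child1$age  # matches predict() for that child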

6.8.11 Generalized Linear Mixed Effects Models

We can generalize the linear mixed effects model by using a link function, \(g()\), to connect the conditional mean, given the random effects, to the linear predictor

\[g(E(\mathbf{Y}_{i}|\mathbf{b}_i)) = \mathbf{X}_i\boldsymbol\beta + \mathbf{Z}_i\mathbf{b}_i\] where \[ \mathbf{b}_i \sim N(0,\mathbf{G})\]

Similar to GLMs, we assume an outcome distribution from the exponential family that determines the relationship between the conditional mean and the variance,

\[Var(Y_{ij}|\mathbf{b}_{i}) = \phi v(E(Y_{ij}|\mathbf{b}_i))\]

summary(glmer(resp ~ age + smoke + (1|id), data = ohio, family = binomial))
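With the logit link, the fixed effects are log odds ratios conditional on the random intercept. As a sketch of the interpretation (assuming the ohio respiratory infection data come from the geepack package, and introducing the illustrative name mod.ohio), exponentiating a coefficient gives a subject-specific, within-child odds ratio.

# A sketch: subject-specific odds ratios from the logistic GLMM above
data(ohio, package = "geepack")  # assumed source of the ohio data
mod.ohio <- glmer(resp ~ age + smoke + (1|id), data = ohio, family = binomial)
exp(lme4::fixef(mod.ohio))  # e.g., exp(beta_age): within-child odds ratio for a
                            # 1-unit increase in age, holding the random intercept fixed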