10 Introduction to Longitudinal Data

Learning Goals

  • Explain and illustrate the differences between ordinary least squares (OLS) and generalized least squares (GLS) as they relate to longitudinal data.


Slides from today are available here.

Warm Up: Data Examples

With individuals near you,

  1. come up with examples of data you’ve worked with (or are aware of) that were collected over time on many individuals or subjects

  2. discuss the research questions that you were (or may be) interested in exploring with that data

Be prepared to share these examples with the class.

Longitudinal Notation

Consider the random variable

\[ Y_{ij} = \text{the }j\text{th outcome measurement taken on subject }i,\text{ where } i= 1,...,n, j =1,...,m_i\]

where \(n\) is the number of units/subjects and \(m_i\) is the number of observations for the \(i\)th unit/subject.

Let \(t_{ij} =\) the time at which the \(j\)th measurement on subject \(i\) was taken.

Then for the \(i\)th subject, we can organize their outcome measurements in a vector,

\[ \mathbf{Y}_i = \left(\begin{array}{c}Y_{i1}\\ Y_{i2}\\ Y_{i3}\\ \vdots\\ Y_{im_i} \end{array}\right)\]

The corresponding observation times for the \(i\)th subject are,

\[ \mathbf{t}_i = \left(\begin{array}{c}t_{i1}\\ t_{i2}\\ t_{i3}\\ \vdots\\ t_{im_i} \end{array}\right)\]

If the observation times are the same for each subject, \(\mathbf{t}_i = \mathbf{t} = (t_1,...,t_m)\) for all \(i=1,...,n\), then the data are balanced (balanced between subjects). Otherwise, the data are unbalanced.

If the spacing between consecutive observation times is the same for all subjects and across time, \(t_{i,j+1} - t_{ij} =\tau\) for all \(i=1,...,n\) and \(j=1,...,m_{i}-1\), then the data are regularly observed. Otherwise, the data are irregularly observed.
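To make these definitions concrete, here is a small Python sketch (the observation times are made up, purely for illustration) that checks whether a collection of time vectors \(\mathbf{t}_i\) is balanced and regularly observed.

```python
# A minimal sketch with hypothetical observation times for n = 3 subjects;
# the lists may have different lengths m_i.
import numpy as np

t = [np.array([0.0, 1.0, 2.0, 3.0]),
     np.array([0.0, 1.0, 2.0, 3.0]),
     np.array([0.0, 1.0, 2.5, 3.0])]   # subject 3 deviates from the others

def is_balanced(times):
    """Balanced: every subject shares the same vector of observation times."""
    return all(len(ti) == len(times[0]) and np.allclose(ti, times[0]) for ti in times)

def is_regular(times, tau=None):
    """Regularly observed: all gaps t_{i,j+1} - t_{ij} equal a common tau."""
    gaps = np.concatenate([np.diff(ti) for ti in times])
    tau = gaps[0] if tau is None else tau
    return np.allclose(gaps, tau)

print(is_balanced(t), is_regular(t))   # False False (subject 3 breaks both)
```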

We may have \(p\) explanatory variables for subject \(i\), which we can organize (with a leading column of 1s for the intercept) into the matrix

\[ \mathbf{X}_i = \left(\begin{array}{ccccc}1&x_{i11}&x_{i12}&\cdots&x_{i1p}\\ 1&x_{i21}&x_{i22}&\cdots&x_{i2p}\\ 1&x_{i31}&x_{i32}&\cdots&x_{i3p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{im_i1}&x_{im_i2}&\cdots&x_{im_ip} \end{array}\right) = \left(\begin{array}{c}\mathbf{x}^T_{i1}\\ \mathbf{x}^T_{i2}\\ \mathbf{x}^T_{i3}\\ \vdots\\ \mathbf{x}^T_{im_i} \end{array}\right)\]

Linear Models

If we have a quantitative outcome \(Y\), we could assume a linear relationship between the explanatory variables and the outcome.

\[Y_{ij} = \beta_0 + \beta_1x_{ij1} + \cdots + \beta_p x_{ijp} + \epsilon_{ij} = \mathbf{x}_{ij}^T\boldsymbol\beta + \epsilon_{ij}\]

where \(\boldsymbol\beta = (\beta_0\; \beta_1\; \beta_2\;\cdots\; \beta_p)^T\).
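As a quick numerical illustration (all values below are hypothetical choices, not from a real study), we can build \(\mathbf{X}_i\) for one subject with \(p = 2\) explanatory variables and generate \(\mathbf{Y}_i\) from this linear model in Python.

```python
# A minimal sketch: design matrix and simulated outcomes for one subject i.
import numpy as np

rng = np.random.default_rng(1)
m_i = 4                                   # number of measurements on subject i
x1 = np.array([0.0, 1.0, 2.0, 3.0])       # e.g., time since baseline (hypothetical)
x2 = np.array([1.0, 1.0, 1.0, 1.0])       # e.g., a baseline covariate repeated at each visit

X_i = np.column_stack([np.ones(m_i), x1, x2])   # rows are x_{ij}^T; first column is the intercept
beta = np.array([10.0, 2.0, -1.0])              # (beta_0, beta_1, beta_2), chosen for illustration
eps_i = rng.normal(0, 1, size=m_i)              # noise for subject i

Y_i = X_i @ beta + eps_i                        # the model Y_ij = x_{ij}^T beta + eps_ij
print(X_i.shape, Y_i)                           # (4, 3) and the 4 simulated outcomes
```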

OLS

If we use ordinary least squares (OLS) to find estimates of \(\boldsymbol\beta\), we assume

  1. \(\mathbf{X}\) are fixed (not random)
  2. \(\epsilon_{ij}\) are independent
  3. \(E(\epsilon_{ij}) = 0\) and \(Var(\epsilon_{ij}) = \sigma^2\) (constant variance)

Note: the 2nd and 3rd assumptions can be combined into a single statement about the covariance matrix, \(Cov(\boldsymbol \epsilon) = \sigma^2 I\).

Then, the estimator that minimizes the sum of squared errors can be written as

\[\hat{\boldsymbol\beta}_{OLS} = \mathbf{(X^TX)^{-1}X^TY}\] for the linear model above, now written in terms of the full data vectors and matrices,

\[\mathbf{Y} = \mathbf{X}\boldsymbol\beta +\boldsymbol\epsilon\]

where

\[ \mathbf{Y} = \left(\begin{array}{c}\mathbf{Y}_{1}\\ \mathbf{Y}_{2}\\ \mathbf{Y}_{3}\\ \vdots\\ \mathbf{Y}_{n} \end{array}\right)\]

and

\[ \mathbf{X} = \left(\begin{array}{c}\mathbf{X}_{1}\\ \mathbf{X}_{2}\\ \mathbf{X}_{3}\\ \vdots\\ \mathbf{X}_{n} \end{array}\right)\]
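Before working through the exercises below, it may help to see the stacking and the OLS formula in code. The following sketch simulates subject-level \(\mathbf{Y}_i\) and \(\mathbf{X}_i\) (with a hypothetical \(\boldsymbol\beta\) and independent noise), stacks them into \(\mathbf{Y}\) and \(\mathbf{X}\), and computes \(\hat{\boldsymbol\beta}_{OLS}\) by solving the normal equations.

```python
# A minimal sketch: stack subject-level pieces and compute (X^T X)^{-1} X^T Y.
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 4                          # n subjects, m measurements each (balanced, for simplicity)
beta = np.array([10.0, 2.0, -1.0])    # hypothetical true coefficients

X_blocks, Y_blocks = [], []
for i in range(n):
    X_i = np.column_stack([np.ones(m), np.arange(m), rng.normal(size=m)])  # intercept, time, covariate
    Y_i = X_i @ beta + rng.normal(0, 1, size=m)                            # independent noise here
    X_blocks.append(X_i)
    Y_blocks.append(Y_i)

X = np.vstack(X_blocks)        # (sum_i m_i) x (p + 1)
Y = np.concatenate(Y_blocks)   # length sum_i m_i

# solve the normal equations rather than forming the inverse explicitly
beta_hat_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat_ols)            # should be close to (10, 2, -1)
```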

  1. Sketch out on paper what \(\mathbf{Y}\) looks like in terms of \(Y_{ij}\).
Click for Answer \[ \mathbf{Y} = \left(\begin{array}{c}Y_{11}\\ Y_{12}\\ \vdots\\ Y_{1m_1}\\ Y_{21}\\ Y_{22}\\ \vdots\\ Y_{2m_2}\\ \vdots\\ Y_{n1}\\ Y_{n2}\\ \vdots\\ Y_{nm_n} \end{array}\right)\]
  2. Sketch out on paper what \(\mathbf{X}\) looks like in terms of \(x_{ijk}\).
Click for Answer \[ \mathbf{X} = \left(\begin{array}{ccccc}1&x_{111}&x_{112}&\cdots&x_{11p}\\ 1&x_{121}&x_{122}&\cdots&x_{12p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{1m_11}&x_{1m_12}&\cdots&x_{1m_1p}\\ 1&x_{211}&x_{212}&\cdots&x_{21p}\\ 1&x_{221}&x_{222}&\cdots&x_{22p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{2m_21}&x_{2m_22}&\cdots&x_{2m_2p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{n11}&x_{n12}&\cdots&x_{n1p}\\ 1&x_{n21}&x_{n22}&\cdots&x_{n2p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{nm_n1}&x_{nm_n2}&\cdots&x_{nm_np}\end{array}\right)\]
  3. Show that \(E(\hat{\boldsymbol\beta}_{OLS}) = \boldsymbol\beta\). Remember the properties of random matrices. Keep track of what assumptions you need for this to be true.
Click for Answer \[E(\widehat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{Y}) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} \] \[= \boldsymbol{\beta} \] We only use the assumption that \(E(\boldsymbol\epsilon) = \mathbf{0}\).
  4. Show that the covariance matrix \(Cov(\hat{\boldsymbol\beta}_{OLS}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\). Remember what we proved in Checkpoint 1 about a matrix \(A\) of constants. Keep track of what assumptions you need for this to be true.
Click for Answer

\[Cov(\widehat{\boldsymbol{\beta}}_{OLS}) = Cov((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y})\] \[ =(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\mathbf{Y})\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \] \[= \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \]

We use the assumption that \(Cov(\mathbf{Y}) = \sigma^2\mathbf{I}\), i.e., independence and constant variance.

SUMMARY: So the OLS coefficient estimates are reasonable (they are unbiased), but the estimated standard errors (and thus the t-values and p-values in the output) are wrong unless the observations are actually independent. Since we have correlated repeated measures, we don’t have as much “information” about the population as we would from the same number of independent observations.
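A small simulation can illustrate this. The sketch below assumes, as a hypothetical choice, an exchangeable within-subject correlation and a covariate that is constant within each subject; across repeated samples, \(\hat{\boldsymbol\beta}_{OLS}\) is approximately unbiased, but the naive OLS standard error understates the true sampling variability.

```python
# A minimal simulation sketch: OLS with correlated within-subject errors.
import numpy as np

rng = np.random.default_rng(3)
n, m = 50, 4
beta = np.array([10.0, 2.0])
rho, sigma2 = 0.7, 1.0
Sigma_i = sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))  # exchangeable within-subject covariance

def one_dataset():
    """Simulate one longitudinal data set with a between-subject covariate."""
    X_blocks, Y_blocks = [], []
    for _ in range(n):
        xi = rng.normal()                                   # constant over a subject's repeated measures
        X_i = np.column_stack([np.ones(m), np.full(m, xi)])
        eps_i = rng.multivariate_normal(np.zeros(m), Sigma_i)
        X_blocks.append(X_i)
        Y_blocks.append(X_i @ beta + eps_i)
    return np.vstack(X_blocks), np.concatenate(Y_blocks)

est, naive_se = [], []
for _ in range(2000):
    X, Y = one_dataset()
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ b
    s2 = resid @ resid / (len(Y) - X.shape[1])
    est.append(b[1])
    # naive SE assumes Cov(Y) = sigma^2 I; typically too small here
    naive_se.append(np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]))

print(np.mean(est))        # close to 2: OLS is approximately unbiased
print(np.std(est))         # actual sampling SD of the slope estimates
print(np.mean(naive_se))   # average naive SE; noticeably smaller than the actual SD here
```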

Group Work: GLS

If we use generalized least squares (GLS) to find estimates of \(\boldsymbol\beta\), we assume \(\mathbf{X}\) are fixed (not random) and that \(\boldsymbol\epsilon = (\epsilon_{11}\; \cdots\; \epsilon_{nm_n})^T\) has covariance matrix \(\boldsymbol\Sigma\). We can then transform our potentially correlated data \(\mathbf{Y}\) into independent data using the inverse of the Cholesky factor \(\mathbf{L}\) of \(\boldsymbol\Sigma = \mathbf{L}\mathbf{L}^T\), so that \[Cov(\mathbf{L}^{-1}\mathbf{Y}) = Cov(\mathbf{L}^{-1}\boldsymbol\epsilon)\] \[= \mathbf{L}^{-1}Cov(\boldsymbol\epsilon) (\mathbf{L}^{-1})^T\] \[= \mathbf{L}^{-1}\boldsymbol\Sigma (\mathbf{L}^{-1})^T\] \[= \mathbf{L}^{-1}(\mathbf{L}\mathbf{L}^T) (\mathbf{L}^{-1})^T\] \[= \mathbf{I}\]

Under the linear model, the same transformation is applied to the explanatory variables and the noise,

\[\mathbf{L}^{-1}\mathbf{Y} = \mathbf{L}^{-1}\mathbf{X}\boldsymbol\beta +\mathbf{L}^{-1}\boldsymbol\epsilon\] \[\implies \mathbf{Y}^* = \mathbf{X}^*\boldsymbol\beta +\boldsymbol\epsilon^*\]
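Numerically, the whitening step looks like the sketch below, assuming a known block-diagonal \(\boldsymbol\Sigma\) (independent subjects, exchangeable within-subject covariance, all values hypothetical): transform \(\mathbf{Y}\) and \(\mathbf{X}\) with \(\mathbf{L}^{-1}\) and run OLS on the result.

```python
# A minimal sketch of the GLS transformation via the Cholesky factor of Sigma.
import numpy as np
from scipy.linalg import block_diag, cholesky, solve_triangular

rng = np.random.default_rng(4)
n, m = 50, 4
beta = np.array([10.0, 2.0])
rho, sigma2 = 0.7, 1.0
Sigma_i = sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
Sigma = block_diag(*[Sigma_i] * n)          # block-diagonal: subjects independent of each other

# simulate one correlated data set (between-subject covariate, as before)
X = np.vstack([np.column_stack([np.ones(m), np.full(m, rng.normal())]) for _ in range(n)])
Y = X @ beta + rng.multivariate_normal(np.zeros(n * m), Sigma)

L = cholesky(Sigma, lower=True)              # Sigma = L L^T
Y_star = solve_triangular(L, Y, lower=True)  # L^{-1} Y, without forming L^{-1}
X_star = solve_triangular(L, X, lower=True)  # L^{-1} X

# OLS on the whitened data is the GLS estimator
beta_hat_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)
print(beta_hat_gls)                          # close to (10, 2)
```

Solving the triangular systems with solve_triangular avoids forming \(\mathbf{L}^{-1}\) explicitly, which is both cheaper and more numerically stable than inverting \(\mathbf{L}\).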

  1. Show that if we assume \(\boldsymbol\Sigma\) is known and fixed and use OLS on the transformed data (\(\mathbf{Y}^*\), \(\mathbf{X}^*\)), then our coefficient estimates are

\[\hat{\boldsymbol\beta}_{GLS} = \mathbf{(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y}\]

Click for Answer \[ \widehat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^{*T}\mathbf{X}^*)^{-1}\mathbf{X}^{*T}\mathbf{Y}^*\] \[= ((\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{X})^{-1}(\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y} \]
  2. Show that \(E(\hat{\boldsymbol\beta}_{GLS}) = \boldsymbol\beta\). Remember the properties of random matrices. Keep track of what assumptions you need for this to be true.
Click for Answer \[E(\widehat{\boldsymbol{\beta}}_{GLS}) = E((\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y}) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}E(\mathbf{Y}) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}E(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}\boldsymbol{\beta} + E(\boldsymbol\epsilon)) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X}\boldsymbol{\beta} \] \[= \boldsymbol{\beta} \] We only use the assumption that \(E(\boldsymbol\epsilon) = \mathbf{0}\).
  3. Show that the covariance matrix \(Cov(\hat{\boldsymbol\beta}_{GLS}) =\mathbf{(X^T\Sigma^{-1}X)^{-1}}\). Remember what we proved in Checkpoint 1 about a matrix \(A\) of constants. Keep track of what assumptions you need for this to be true.
Click for Answer \[Cov(\widehat{\boldsymbol{\beta}}_{GLS}) = Cov((\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y})\] \[ =(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} Cov(\mathbf{Y})\{(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\}^T \] \[ =(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \boldsymbol \Sigma \{(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\}^T \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \boldsymbol\Sigma \boldsymbol{\Sigma}^{-1}\mathbf{X}(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \mathbf{X}(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] We use the assumption that \(Cov(\mathbf{Y}) = \boldsymbol\Sigma\).
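Putting the GLS formulas from these exercises together, here is a short numpy sketch (again with a hypothetical known \(\boldsymbol\Sigma\)) that computes \(\hat{\boldsymbol\beta}_{GLS}\) and \(Cov(\hat{\boldsymbol\beta}_{GLS})\) directly from the closed-form expressions. For reference, statsmodels' GLS (passing the sigma argument) should reproduce the same coefficients, though only numpy and scipy are used here.

```python
# A minimal sketch of the closed-form GLS estimator and its covariance,
# with a hypothetical known Sigma (exchangeable within-subject correlation).
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(5)
n, m = 50, 4
beta = np.array([10.0, 2.0])
Sigma_i = (1 - 0.7) * np.eye(m) + 0.7 * np.ones((m, m))
Sigma = block_diag(*[Sigma_i] * n)

# simulate one correlated data set with a between-subject covariate
X = np.vstack([np.column_stack([np.ones(m), np.full(m, rng.normal())]) for _ in range(n)])
Y = X @ beta + rng.multivariate_normal(np.zeros(n * m), Sigma)

Sigma_inv = np.linalg.inv(Sigma)                 # fine at this size; avoid for very large Sigma
XtSiX = X.T @ Sigma_inv @ X
beta_hat_gls = np.linalg.solve(XtSiX, X.T @ Sigma_inv @ Y)   # (X^T Sigma^-1 X)^-1 X^T Sigma^-1 Y
cov_beta_gls = np.linalg.inv(XtSiX)                          # (X^T Sigma^-1 X)^-1

print(beta_hat_gls)                    # close to (10, 2)
print(np.sqrt(np.diag(cov_beta_gls)))  # model-based GLS standard errors
```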