6.6 Why we need special methods
6.6.1 Ordinary Least Squares
Imagine we have \(n\) units/subjects and each unit has \(m\) observations over time, so each unit contributes \(m\) rows of data. Conditional on \(p\) explanatory variables, we typically assume
\[Y_{ij} = \beta_0 + \beta_1x_{ij1}+ \cdots+ \beta_px_{ijp} + \epsilon_{ij}\] If we are using ordinary least squares, we assume \(\epsilon_{ij} \stackrel{iid}{\sim} \mathcal{N}(0,\sigma^2)\).
This can be written using vector and matrix notation,
\[\mathbf{Y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol\epsilon_i\text{ for each unit } i\]
and if we stack all of our data together,
\[\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\]
where \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0},\sigma^2\mathbf{I})\) and \(\mathbf{X}_i\) is the matrix of explanatory variable observations for subject \(i\), with \(m\) rows and \(p+1\) columns (a column of 1’s plus \(p\) columns that correspond to the \(p\) variables). So we are assuming that our errors, and thus our data, are independent and have constant variance.
We can allow each unit to have a differing number of repeated measures by writing the number of observations as \(m_i\), unique to individual \(i\).
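As a concrete (if tiny) illustration of this stacking, here is a minimal NumPy sketch with made-up dimensions and simulated values; the variable names are ours and not part of any course code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 4, 3, 2                    # units, observations per unit, explanatory variables (made up)
beta = np.array([1.0, 0.5, -0.3])    # beta_0, beta_1, beta_2 (made up for illustration)

# Each X_i: a column of 1's plus p covariate columns, so m rows and p + 1 columns
X_units = [np.column_stack([np.ones(m), rng.normal(size=(m, p))]) for _ in range(n)]

# Stacking all units: X is (n*m) x (p+1) and Y has length n*m
X = np.vstack(X_units)
Y = X @ beta + rng.normal(size=n * m)    # iid N(0, sigma^2) errors with sigma = 1
print(X.shape, Y.shape)                  # (12, 3) (12,)
```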
6.6.1.1 Estimation
To find the ordinary least squares estimate \(\widehat{\boldsymbol{\beta}}_{OLS}\), we need to minimize
\[(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).\]
Taking the derivative with respect to \(\beta_0,\beta_1,...,\beta_p\), we get a set of simultaneous equations, expressed in matrix notation as
\[ -2\mathbf{X}^T\mathbf{Y} +2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]
As long as \(\mathbf{X}^T\mathbf{X}\) has an inverse, then
\[\widehat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\ \]
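In a matrix language this formula is one line; below is a minimal NumPy sketch (simulated data, hypothetical names) that computes it by solving the normal equations and checks it against a built-in least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, p = 50, 2
X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, p))])
Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n_obs)

# beta_OLS = (X^T X)^{-1} X^T Y, computed by solving the normal equations
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent (and numerically preferable) built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_ols, beta_lstsq)    # the two should agree
```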
This estimator is unbiased in that \(E(\widehat{\boldsymbol{\beta}}_{OLS}) = \boldsymbol{\beta}\).
\[E(\widehat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{Y}) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\ \] \[= \boldsymbol{\beta}\ \]
It also has minimum variance among all linear unbiased estimators IF the errors are independent and have constant variance.
\[Cov(\widehat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\mathbf{Y})\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \] \[= \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \]
using \(Cov(\mathbf{C}\mathbf{Y}) = \mathbf{C}Cov(\mathbf{Y})\mathbf{C}^T\), which you can verify.
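To see why this identity holds, note that for a constant matrix \(\mathbf{C}\), \[Cov(\mathbf{C}\mathbf{Y}) = E\left[(\mathbf{C}\mathbf{Y} - E(\mathbf{C}\mathbf{Y}))(\mathbf{C}\mathbf{Y} - E(\mathbf{C}\mathbf{Y}))^T\right] \] \[= E\left[\mathbf{C}(\mathbf{Y} - E(\mathbf{Y}))(\mathbf{Y} - E(\mathbf{Y}))^T\mathbf{C}^T\right] \] \[= \mathbf{C}\,E\left[(\mathbf{Y} - E(\mathbf{Y}))(\mathbf{Y} - E(\mathbf{Y}))^T\right]\mathbf{C}^T = \mathbf{C}Cov(\mathbf{Y})\mathbf{C}^T \]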
SUMMARY: So the OLS estimates of our coefficients are fairly good (unbiased), but the estimated standard errors (and thus the t-values and p-values in the output) are wrong unless our data are actually independent. Since we have correlated repeated measures, we don’t have as much “information” about the population as we would from the same number of independent observations.
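One way to see this concretely is a small simulation; below is a hedged NumPy sketch (made-up sample sizes, an exchangeable within-unit correlation of 0.7, and a unit-level covariate) in which the usual iid-based formula \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\) noticeably understates the actual variability of the OLS slope.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 200, 4                      # units and repeated measures per unit (made up)
rho, sigma2 = 0.7, 1.0             # within-unit correlation and variance (made up)
beta = np.array([1.0, 0.5])

# Exchangeable within-unit covariance: constant variance, correlation rho between measures
Sigma_i = sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
L = np.linalg.cholesky(Sigma_i)

# A unit-level covariate, repeated for each of the unit's m observations
X = np.column_stack([np.ones(n * m), np.repeat(rng.normal(size=n), m)])

slopes = []
for _ in range(2000):
    eps = (L @ rng.normal(size=(m, n))).ravel(order="F")   # correlated errors within each unit
    Y = X @ beta + eps
    slopes.append(np.linalg.solve(X.T @ X, X.T @ Y)[1])

# Empirical SD of the OLS slope vs. what the iid formula sigma^2 (X^T X)^{-1} would report
print(np.std(slopes), np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1]))
```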
6.6.2 Generalized Least Squares
If you relax the assumption of constant variance, then the covariance matrix of \(\boldsymbol\epsilon_i\) and thus \(\mathbf{Y}_i\) is
\[\boldsymbol{\Sigma}_i = \left(\begin{array}{cccc} \sigma^2_1&0&\cdots&0\\ 0&\sigma^2_2&\cdots &0\\ \vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&\sigma^2_m\end{array}\right) \]
For longitudinal data, this would allow the variability to change over time.
If you relax both the assumptions of constant variance and independence, then the covariance matrix of \(\boldsymbol\epsilon_i\) and thus \(\mathbf{Y}_i\) is
\[\boldsymbol{\Sigma}_i = \left(\begin{array}{cccc} \sigma^2_1&\sigma_{12}&\cdots&\sigma_{1m}\\ \sigma_{21}&\sigma^2_2&\cdots &\sigma_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ \sigma_{m1}&\cdots&\cdots&\sigma^2_m\end{array}\right) \]
For longitudinal data, this would allow the variability to change over time and would account for dependence between repeated measures.
If we assume individual units are independent of each other (which is a fair assumption), the covariance matrix for the entire vector of outcomes (for all units and their observations over time) is written as
\[\boldsymbol\Sigma = \left(\begin{array}{cccc} \boldsymbol\Sigma_1&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\boldsymbol\Sigma_2&\cdots &\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\cdots&\cdots&\boldsymbol\Sigma_n\end{array}\right)\]
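As a sketch of what this block-diagonal structure looks like numerically (hypothetical covariance values, using SciPy’s block_diag purely for illustration):

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical per-unit covariance matrices; note the m_i can differ across units
Sigma_1 = np.array([[1.0, 0.5],
                    [0.5, 1.0]])              # unit 1: m_1 = 2 observations
Sigma_2 = np.array([[2.0, 0.3, 0.1],
                    [0.3, 2.0, 0.3],
                    [0.1, 0.3, 2.0]])         # unit 2: m_2 = 3 observations

# Covariance for the full stacked outcome vector: units are independent of each other
Sigma = block_diag(Sigma_1, Sigma_2)
print(Sigma.shape)    # (5, 5)
```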
6.6.2.1 Estimation
To find the generalized least squares estimator \(\widehat{\boldsymbol{\beta}}_{GLS}\) for either of these more general models, we need to minimize
\[(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^T\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})\]
If \(\boldsymbol{\Sigma}\) is known, taking the derivative with respect to \(\beta_0,\beta_1,...,\beta_p\), we get a set of simultaneous equations, expressed in matrix notation as
\[ -2\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y} +2\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]
As long as \(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X}\) has an inverse, then the generalized least squares estimator is \[\widehat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y}\ \]
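Pretending for a moment that \(\boldsymbol{\Sigma}\) really is known, the formula is again one line of matrix algebra; here is a minimal NumPy/SciPy sketch with a made-up exchangeable block for each unit (the outcome is pure noise, just to exercise the algebra).

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
n, m, p = 10, 3, 2                                 # units, obs per unit, covariates (made up)
X = np.column_stack([np.ones(n * m), rng.normal(size=(n * m, p))])
Y = rng.normal(size=n * m)                         # placeholder outcome, just to run the algebra

# Pretend Sigma is known: the same exchangeable block for each unit, stacked block-diagonally
Sigma_i = 0.5 * np.eye(m) + 0.5 * np.ones((m, m))  # variance 1, within-unit correlation 0.5
Sigma = block_diag(*[Sigma_i] * n)
Sigma_inv = np.linalg.inv(Sigma)

# beta_GLS = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y, using solve() instead of a second inverse
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(beta_gls)
```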
An alternative approach is to transform \(\mathbf{Y}\) and \(\mathbf{X}\) by pre-multiplying by the inverse of the Cholesky factor \(\mathbf{L}\) of \(\boldsymbol{\Sigma}\) (where \(\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T\)), and then find the OLS estimator using the transformed data.
\[\mathbf{L}^{-1}\mathbf{Y} = \mathbf{L}^{-1}\mathbf{X}\boldsymbol{\beta} + \mathbf{L}^{-1}\boldsymbol\epsilon\] \[\implies \mathbf{Y}^* = \mathbf{X}^*\boldsymbol{\beta} + \boldsymbol\epsilon^*\] \[\implies \widehat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}^{*T}\mathbf{X}^*)^{-1}\mathbf{X}^{*T}\mathbf{Y}^*\] \[= ((\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{X})^{-1}(\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y} \]
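Note why the transformation works: \(Cov(\boldsymbol\epsilon^*) = \mathbf{L}^{-1}\boldsymbol{\Sigma}(\mathbf{L}^{-1})^T = \mathbf{L}^{-1}\mathbf{L}\mathbf{L}^T(\mathbf{L}^T)^{-1} = \mathbf{I}\), so the transformed errors satisfy the OLS assumptions. The NumPy/SciPy sketch below (same made-up setup as above) checks that the Cholesky route and the direct GLS formula give the same answer.

```python
import numpy as np
from scipy.linalg import block_diag, solve_triangular

rng = np.random.default_rng(4)
n, m, p = 10, 3, 2
X = np.column_stack([np.ones(n * m), rng.normal(size=(n * m, p))])
Y = rng.normal(size=n * m)
Sigma = block_diag(*[0.5 * np.eye(m) + 0.5 * np.ones((m, m))] * n)

# Direct GLS formula
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

# Cholesky route: Sigma = L L^T, pre-multiply by L^{-1}, then OLS on the transformed data
L = np.linalg.cholesky(Sigma)
X_star = solve_triangular(L, X, lower=True)   # L^{-1} X without forming the inverse explicitly
Y_star = solve_triangular(L, Y, lower=True)   # L^{-1} Y
beta_chol, *_ = np.linalg.lstsq(X_star, Y_star, rcond=None)

print(np.allclose(beta_gls, beta_chol))       # True: the two routes agree
```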
6.6.2.2 Estimation Issues
Main Issue: \(\boldsymbol{\Sigma}\) is not known!
The solution to this problem is to iterate: estimate \(\boldsymbol{\beta}\) (with OLS, say), use the residuals from that fit to estimate \(\boldsymbol{\Sigma}\), re-estimate \(\boldsymbol{\beta}\) with GLS, and repeat until the estimates stabilize.
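One deliberately crude sketch of this alternation (sometimes called feasible GLS), assuming balanced data and an unstructured \(m \times m\) block shared across units, with no convergence checks or degrees-of-freedom corrections; all values are made up.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(5)
n, m = 50, 4
X = np.column_stack([np.ones(n * m), np.repeat(rng.normal(size=n), m)])
Sigma_true = 0.6 * np.eye(m) + 0.4 * np.ones((m, m))
Y = X @ np.array([1.0, 0.5]) + (np.linalg.cholesky(Sigma_true) @ rng.normal(size=(m, n))).ravel(order="F")

# Start with OLS (i.e., Sigma = I), then alternate: estimate beta -> estimate Sigma -> repeat
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
for _ in range(10):
    resid = (Y - X @ beta_hat).reshape(n, m)       # one row of residuals per unit
    Sigma_i_hat = resid.T @ resid / n              # crude moment estimate of the m x m block
    Sigma_inv = np.linalg.inv(block_diag(*[Sigma_i_hat] * n))
    beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

print(beta_hat)    # should be near the true (1.0, 0.5)
```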
Other Issues:
- If the longitudinal data is balanced, there are \(m(m+1)/2\) parameters in \(\boldsymbol\Sigma_i\) to estimate. That is a lot of parameters (!)
- If the longitudinal data is unbalanced, there is no common \(\boldsymbol{\Sigma}_i\) for every unit
The solution to these problems involves simplifying assumptions such as
- assuming the errors are independent (not a great solution)
- assuming the errors have constant variance (maybe not too bad)
- assuming there is some restriction/structure on the covariance function (best option; see the sketch below)
We approximate the covariance matrix in order to get better estimates of \(\boldsymbol\beta\).
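For example, one common structured choice for equally spaced longitudinal measurements is an AR(1) covariance, where only two parameters describe the whole \(m \times m\) block; the sketch below (our own illustration, not a prescription) builds such a block in NumPy. Compound symmetry (a single common correlation for all pairs of measurements) is another familiar option.

```python
import numpy as np

def ar1_block(m, sigma2, rho):
    """AR(1) covariance for m equally spaced times: sigma^2 * rho^|j - k|."""
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return sigma2 * rho ** lags

# Only two parameters (sigma^2 and rho) determine all m(m+1)/2 distinct entries
print(ar1_block(4, sigma2=2.0, rho=0.6))
```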
The two main methods for longitudinal data that we’ll learn are similar to generalized least squares in flavor.
However, we need models that are applicable not only to continuous outcomes but also to binary or count outcomes. Let’s first learn about the standard cross-sectional approaches that allow for binary and count outcomes (generalized linear models). Then, we’ll be able to talk about marginal models and mixed effects models.
SUMMARY: If we know the correlation structure of our data, we could use generalized least squares to get better estimates of our coefficients and the estimated standard errors (and thus the t-values and p-values in the output) would be more accurate.