10  Introduction to Longitudinal Data

Settling In

Sit with a NEW group of 3 people that you do not know well.

Introduce yourself

  • Name, pronunciation tips, pronouns
  • Macalester connections (e.g., majors/minors/concentrations, clubs, teams, events regularly attended)
  • How are you feeling about the semester?

Navigate to HW5 on the course website and set up the GitHub repo (individual)!

  • We’ll talk more about the data context on Thursday.

Everything on the slides is in the online manual: https://bcheggeseth.github.io/452_fall_2025/

Content Conversation 1

Good work!

  • Valuable preparation
    • Develop comfort in derivations
    • Collaborative learning (if you prepared together)
    • Communication skills (helping each other; explaining to each other)
  • Valuable experience
    • Dealing with nerves
    • Thinking and problem solving on the spot
    • Applying knowledge to a new situation
  • Feedback
    • Equity in participation; logic and clarity in explanation
    • Everyone has something to work on, develop, and grow

Mini Project 1

  • Goal: Tell a story about the time series data (energy use at Macalester)
    • The trend, seasonality, and noise models help you tell that story

Put it in the Time Series GitHub repository (the one you started for HW4), but work in a TimeSeriesReport.qmd file.

. . .



Please RENDER the qmd file to html so that I can easily read it.

  • Final version due Thursday (at class time).
  • If you haven’t already, read it top to bottom to make sure the communication flows.
  • Make sure you are making commits using your account. Don’t edit in Google Docs together and then copy over.
  • GitHub commits show your contributions.

Longitudinal Timeline

  • Introduction to Longitudinal Data (Today)
  • GLM + GEE Models for Longitudinal Data
  • Mixed Effects Models for Longitudinal Data
  • Mixed Effects v. GEE

Learning Goals

  • Explain and illustrate the differences between ordinary least squares (OLS) and generalized least squares (GLS) as they relate to longitudinal data.

Intro to Longitudinal Data

Warm Up: Data Examples

With individuals near you,

  1. come up with data examples you’ve worked with (or are aware of) in which data were collected over time on many individuals or subjects

  2. discuss the research questions that you were (or may be) interested in exploring with that data

Be prepared to share these examples with the class.

Longitudinal Notation

Consider the random variable outcome,

\[ Y_{ij} = \text{the }j\text{th outcome measurement taken on subject }i,\] \[\text{ where } i= 1,...,n, j =1,...,m_i,\]

\(n\) is the number of units/subjects and \(m_i\) is the number of observations for the \(i\)th unit/subject.

. . .



Observation times

Let \(t_{ij} =\) the time at which the \(j\)th measurement on subject \(i\) was taken.

. . .



Then for the \(i\)th subject, we can organize their outcome measurements in a vector,

\[ \mathbf{Y}_i = \left(\begin{array}{c}Y_{i1}\\ Y_{i2}\\ Y_{i3}\\ \vdots\\ Y_{im_i} \end{array}\right)\]

The corresponding observation times for the \(i\)th subject are,

\[ \mathbf{t}_i = \left(\begin{array}{c}t_{i1}\\ t_{i2}\\ t_{i3}\\ \vdots\\ t_{im_i} \end{array}\right)\]

. . .



If the observation times are the same for each subject, \(\mathbf{t}_i = \mathbf{t} = (t_1,...,t_m)\) for all \(i=1,...,n\), then the data are balanced (balanced between subjects). Otherwise, the data are unbalanced.

If the time between observations is the same for all subjects and across time, \(t_{i,j+1} - t_{ij} =\tau\) for all \(i=1,...,n\) and \(j =1,...,m_i-1\), then the data are regularly observed. Otherwise, the data are irregularly observed.

. . .



We may have \(p\) explanatory variables for subject \(i\) such that

\[ \mathbf{X}_i = \left(\begin{array}{ccccc}1&x_{i11}&x_{i12}&\cdots&x_{i1p}\\ 1&x_{i21}&x_{i22}&\cdots&x_{i2p}\\ 1&x_{i31}&x_{i32}&\cdots&x_{i3p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{im_i1}&x_{im_i2}&\cdots&x_{im_ip} \end{array}\right) = \left(\begin{array}{c}\mathbf{x}^T_{i1}\\ \mathbf{x}^T_{i2}\\ \mathbf{x}^T_{i3}\\ \vdots\\ \mathbf{x}^T_{im_i} \end{array}\right)\]
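To make the notation concrete, here is a minimal sketch in R using a small, entirely hypothetical long-format data set (made-up columns `id`, `time`, `y`). It pulls out \(\mathbf{Y}_i\) and \(\mathbf{t}_i\) for one subject and checks whether the data are balanced.

```r
# A minimal sketch (hypothetical long-format data) connecting a data frame
# to the Y_i and t_i notation above.
library(dplyr)

dat <- data.frame(
  id   = rep(1:3, times = c(4, 3, 4)),          # n = 3 subjects with m_i = 4, 3, 4
  time = c(0, 1, 2, 3,  0, 1, 3,  0, 1, 2, 3),  # observation times t_ij
  y    = rnorm(11)                              # outcomes Y_ij
)

# Outcome vector Y_i and observation times t_i for subject i = 2
Y_2 <- dat |> filter(id == 2) |> pull(y)
t_2 <- dat |> filter(id == 2) |> pull(time)

# Balanced? TRUE only if every subject has the same set of observation times
dat |>
  group_by(id) |>
  summarize(times = paste(time, collapse = ",")) |>
  summarize(balanced = n_distinct(times) == 1)
```

Here subject 2 is missing the \(t = 2\) measurement, so the check returns FALSE: these data are unbalanced.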

Failure of OLS for Correlated Data

  • Why do we care about \(E(\hat{\beta})\)?

  • Why do we care about \(Cov(\hat{\beta})\)?

Linear Models

If we have a quantitative outcome \(Y\), we could assume a linear relationship between explanatory characteristics (\(x\)’s) and the outcome.

\[Y_{ij} = \beta_0 + \beta_1x_{ij1} + \cdots + \beta_p x_{ijp} + \epsilon_{ij} = \mathbf{x}_{ij}^T\boldsymbol\beta + \epsilon_{ij}\]

where \(\boldsymbol\beta = (\beta_0\; \beta_1\; \beta_2\;\cdots\; \beta_p)^T\).

. . .



In matrix form, this linear model is written as

\[\mathbf{Y} = \mathbf{X}\boldsymbol\beta +\boldsymbol\epsilon\] where

\[ \mathbf{Y} = \left(\begin{array}{c}\mathbf{Y}_{1}\\ \mathbf{Y}_{2}\\ \mathbf{Y}_{3}\\ \vdots\\ \mathbf{Y}_{n} \end{array}\right)\]

and

\[ \mathbf{X} = \left(\begin{array}{c}\mathbf{X}_{1}\\ \mathbf{X}_{2}\\ \mathbf{X}_{3}\\ \vdots\\ \mathbf{X}_{n} \end{array}\right)\]

We have multiple ways to estimate the coefficients \(\boldsymbol\beta\) in this linear model, but we need to be careful about the assumptions we make about the data.

OLS

Assumptions

If we use ordinary least squares (OLS) to find estimates of \(\boldsymbol\beta\), we assume

  1. \(\mathbf{X}\) are fixed (not random)
  2. \(\epsilon_{ij}\) are independent
  3. \(E(\epsilon_{ij}) = 0\) and \(Var(\epsilon_{ij}) = \sigma^2\) (constant variance)

The 2nd and 3rd assumptions can be combined into a statement about the covariance matrix, \(Cov(\boldsymbol \epsilon) = \sigma^2 I\).

. . .



Definition

The OLS estimator that minimizes the sum of squared errors,

\[\hat{\boldsymbol\beta}_{OLS} = \arg\min_{\boldsymbol\beta} \mathbf{(Y - X\boldsymbol\beta)^T(Y - X\boldsymbol\beta)} = \arg\min_{\boldsymbol\beta} ||\mathbf{(Y - X\boldsymbol\beta)}||^2\]

can be written as

\[\hat{\boldsymbol\beta}_{OLS} = \mathbf{(X^T X)^{-1}X^T Y}\] for the linear model above.

Proof

Alternative ways (faster than exhaustive search) to find the minimum sum of squared errors:

  • We could try a numerical optimization algorithm such as steepest descent.
  • We could use orthogonal projections of \(\mathbf{Y}\) into the linear subspace spanned by the columns of \(\mathbf{X}\).
  • We could use multivariable calculus (find partial derivatives, set equal to 0, and solve).

For a simple linear regression, solve the following two equations for the two unknowns (\(\beta_0\) and \(\beta_1\)):

\[\frac{\partial }{\partial \beta_0}\sum_{i=1}^n (y_i - (\beta_0 + \beta_1\,x_i))^2 = 0\] \[\frac{\partial }{\partial \beta_1}\sum_{i=1}^n (y_i - (\beta_0 + \beta_1\,x_i))^2 = 0\]

For a multiple linear regression, solve the Normal Equations for the unknown coefficients,

\[(\mathbf{X}^T\mathbf{X})\boldsymbol\beta = \mathbf{X^T}\mathbf{Y}\] and thus

\[\hat{\boldsymbol\beta}_{OLS} = \mathbf{(X^T X)^{-1}X^T Y}\]
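As a quick numerical sanity check (not part of the derivation), here is a short R sketch on made-up data that builds the design matrix with `model.matrix()`, solves the Normal Equations directly, and compares the result to `lm()`. The data frame and variable names are hypothetical.

```r
# Sketch: OLS via the Normal Equations, checked against lm() (made-up data)
set.seed(452)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(50)

X <- model.matrix(~ x1 + x2, data = dat)    # design matrix with intercept column
Y <- dat$y

beta_ols <- solve(t(X) %*% X, t(X) %*% Y)   # solves (X^T X) beta = X^T Y
cbind(beta_ols, coef(lm(y ~ x1 + x2, data = dat)))   # the two columns should agree
```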



Warm Up: Notation & Theory

  1. Sketch out on paper what \(\mathbf{Y}\) looks like in terms of \(Y_{ij}\).

  2. Sketch out on paper what \(\mathbf{X}\) looks like in terms of \(x_{ijk}\).

  3. Show that \(E(\hat{\boldsymbol\beta}_{OLS}) = \boldsymbol\beta\). Remember the properties of random vectors & matrices. Keep track of what assumptions you need for this to be true.

  4. Show that the covariance matrix \(Cov(\hat{\boldsymbol\beta}_{OLS}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\) if the assumptions above hold. Remember what we proved in HW1 about a matrix \(A\) of constants. Keep track of what assumptions you need for this to be true.

. . .



SUMMARY: The OLS estimates of our coefficients are fairly good (they are unbiased), but the estimated standard errors (and the resulting t-values and p-values) rely on the data actually being independent. Since we have correlated repeated measures, we don’t have as much “information” about the population as we would with the same number of independent observations.
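To see this numerically, here is a small simulation sketch (an entirely made-up setup: a subject-level covariate plus a shared subject-level error term) suggesting how the model-based OLS standard error can understate the true sampling variability when measurements are correlated within subjects.

```r
# Sketch: with correlated repeated measures, the usual OLS standard error
# for a subject-level covariate tends to be too small (made-up simulation).
set.seed(452)
n <- 50; m <- 5                                  # 50 subjects, 5 measurements each
sim_once <- function() {
  x <- rep(rnorm(n), each = m)                   # covariate constant within subject
  e <- rep(rnorm(n, sd = 1), each = m) +         # shared subject-level noise => correlation
       rnorm(n * m, sd = 0.5)                    # observation-level noise
  y <- 1 + 2 * x + e
  coef(summary(lm(y ~ x)))["x", c("Estimate", "Std. Error")]
}
sims <- t(replicate(1000, sim_once()))
sd(sims[, "Estimate"])       # actual sampling variability of the OLS slope
mean(sims[, "Std. Error"])   # average model-based OLS SE: noticeably smaller
```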

Small Group Work

GLS

Assumptions

If we use generalized least squares (GLS) to find estimates of \(\boldsymbol\beta\), we assume

  • \(\mathbf{X}\) are fixed (not random),
  • \(\boldsymbol\epsilon = (\epsilon_{11}\; \cdots\; \epsilon_{nm_n})^T\) has \(E(\boldsymbol\epsilon) = 0\) and a known covariance matrix \(\boldsymbol\Sigma\).

. . .

Definition

The GLS estimator that minimizes the sum of squared standardized errors,

\[\hat{\boldsymbol\beta}_{GLS} = \arg\min_{\boldsymbol\beta} \mathbf{(Y - X\boldsymbol\beta)^T\boldsymbol\Sigma^{-1}(Y - X\boldsymbol\beta)}\]

can be written as

\[\hat{\boldsymbol\beta}_{GLS} = \mathbf{(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y}\] for the linear model above.
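As a concrete (hypothetical) illustration, the sketch below generates correlated data with a known block-diagonal, exchangeable \(\boldsymbol\Sigma\) and computes \(\hat{\boldsymbol\beta}_{GLS}\) and its covariance directly from the formulas above; in practice \(\boldsymbol\Sigma\) is rarely known and must be modeled or estimated.

```r
# Sketch: GLS with a *known* covariance matrix Sigma (made-up data).
set.seed(452)
n <- 20; m <- 4
x <- rnorm(n * m)

sigma_b2 <- 1; sigma_w2 <- 0.5                             # between/within-subject variances
R     <- sigma_b2 * matrix(1, m, m) + sigma_w2 * diag(m)   # one subject's exchangeable block
Sigma <- kronecker(diag(n), R)                             # block-diagonal Cov(epsilon)

# generate errors that actually have covariance Sigma (random intercept + noise)
e <- rep(rnorm(n, sd = sqrt(sigma_b2)), each = m) + rnorm(n * m, sd = sqrt(sigma_w2))
y <- 1 + 2 * x + e

X <- cbind(1, x)
Sigma_inv <- solve(Sigma)
beta_gls <- solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% y)
cov_gls  <- solve(t(X) %*% Sigma_inv %*% X)                # Cov(beta_gls) when Sigma is known
beta_gls
sqrt(diag(cov_gls))                                        # GLS standard errors
```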



Connect OLS and GLS

To see how OLS and GLS are connected, we can transform our potentially correlated data \(\mathbf{Y}\) into uncorrelated data using the inverse of the Cholesky factor \(\mathbf{L}\), where \(\boldsymbol\Sigma = \mathbf{L}\mathbf{L}^T\), so that \(Cov(\mathbf{L}^{-1}\mathbf{Y}) = \mathbf{I}\).

Proof

\[Cov(\mathbf{L}^{-1}\mathbf{Y}) = Cov(\mathbf{L}^{-1}\boldsymbol\epsilon)\] \[= \mathbf{L}^{-1}Cov(\boldsymbol\epsilon) (\mathbf{L}^{-1})^T\] \[= \mathbf{L}^{-1}\boldsymbol\Sigma (\mathbf{L}^{-1})^T\] \[= \mathbf{L}^{-1}(\mathbf{L}\mathbf{L}^T) (\mathbf{L}^{-1})^T\] \[= \mathbf{I}\]

Assuming the linear model, \(\mathbf{Y} = \mathbf{X}\boldsymbol\beta +\boldsymbol\epsilon\), we can write the transformed data, \(\mathbf{L}^{-1}\mathbf{Y}\), as a model of transformed explanatory variables and noise,

\[\mathbf{L}^{-1}\mathbf{Y} = \mathbf{L}^{-1}\mathbf{X}\boldsymbol\beta +\mathbf{L}^{-1}\boldsymbol\epsilon\] \[\implies \mathbf{Y}^* = \mathbf{X}^*\boldsymbol\beta +\boldsymbol\epsilon^*\]
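Before working through the algebra below, here is a numerical check (not a proof), re-using the hypothetical objects `Sigma`, `X`, `y`, and `beta_gls` from the GLS sketch above: plain OLS on the Cholesky-transformed data \((\mathbf{Y}^*, \mathbf{X}^*)\) reproduces \(\hat{\boldsymbol\beta}_{GLS}\), and the transformed errors are uncorrelated with unit variance.

```r
# Sketch (re-uses Sigma, X, y, beta_gls from the GLS sketch above):
# OLS on the Cholesky-transformed data reproduces the GLS estimate.
L     <- t(chol(Sigma))    # chol() returns upper-triangular U with Sigma = U^T U, so L = U^T
L_inv <- solve(L)

Y_star <- L_inv %*% y
X_star <- L_inv %*% X

beta_star <- solve(t(X_star) %*% X_star, t(X_star) %*% Y_star)  # plain OLS on (Y*, X*)
cbind(beta_star, beta_gls)                                      # identical up to rounding

# The transformed errors are uncorrelated with unit variance:
round(L_inv %*% Sigma %*% t(L_inv), 10)[1:4, 1:4]               # identity block
```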

  1. Show that if we assume \(\boldsymbol\Sigma\) is known and fixed and use OLS on the transformed data \((\mathbf{Y}^*, \mathbf{X}^*)\), then our coefficient estimates are

\[\hat{\boldsymbol\beta}_{GLS} = \mathbf{(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y}\]

  2. Show that \(E(\hat{\boldsymbol\beta}_{GLS}) = \boldsymbol\beta\). Remember the properties of random matrices. Keep track of what assumptions you need for this to be true.

  3. Show that the covariance matrix \(Cov(\hat{\boldsymbol\beta}_{GLS}) =\mathbf{(X^T\Sigma^{-1}X)^{-1}}\). Remember what we proved in HW1 about a matrix \(A\) of constants. Keep track of what assumptions you need for this to be true.

Solutions

OLS

  1.
Solution \[ \mathbf{Y} = \left(\begin{array}{c}Y_{11}\\ Y_{12}\\ \vdots\\ Y_{1m_1}\\ Y_{21}\\ Y_{22}\\ \vdots\\ Y_{2m_2}\\ \vdots\\ Y_{n1}\\ Y_{n2}\\ \vdots\\ Y_{nm_n} \end{array}\right)\]
  2.
Solution \[ \mathbf{X} = \left(\begin{array}{ccccc}1&x_{111}&x_{112}&\cdots&x_{11p}\\ 1&x_{121}&x_{122}&\cdots&x_{12p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{1m_11}&x_{1m_12}&\cdots&x_{1m_1p}\\ 1&x_{211}&x_{212}&\cdots&x_{21p}\\ 1&x_{221}&x_{222}&\cdots&x_{22p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{2m_21}&x_{2m_22}&\cdots&x_{2m_2p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{n11}&x_{n12}&\cdots&x_{n1p}\\ 1&x_{n21}&x_{n22}&\cdots&x_{n2p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{nm_n1}&x_{nm_n2}&\cdots&x_{nm_np}\end{array}\right)\]
  3.
Solution

\[E(\widehat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{Y}) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} \] \[= \boldsymbol{\beta}\ \]

We only use the following assumptions:

  • \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
  • \(\mathbf{X}\) is fixed (not random),
  • \(E(\boldsymbol\epsilon) = 0\).

We don’t use any assumptions about the variance/covariance of \(\boldsymbol\epsilon\).

  4.
Solution

\[Cov(\widehat{\boldsymbol{\beta}}_{OLS}) = Cov((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y})\] \[=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\mathbf{Y})\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon)\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\boldsymbol\epsilon)\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \] \[= \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \]

We use the following assumptions:

  • \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
  • \(\mathbf{X}\) is fixed (not random),
  • \(Cov(\boldsymbol\epsilon) = \sigma^2\mathbf{I}\).

GLS

  1.
Solution

Remember: \(\mathbf{L}\) is a square, invertible matrix, so \((\mathbf{L}^{-1})^T = (\mathbf{L}^T)^{-1}\).

\[ \widehat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^{*T}\mathbf{X}^*)^{-1}\mathbf{X}^{*T}\mathbf{Y}^*\] \[= ((\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{X})^{-1}(\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y} \]
  2.
Solution

\[E(\widehat{\boldsymbol{\beta}}_{GLS}) = E((\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y})\] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}E(\mathbf{Y})\] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}E(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}\boldsymbol{\beta} + E(\boldsymbol\epsilon)) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X}\boldsymbol{\beta} \] \[= \boldsymbol{\beta}\]

We only use the following assumptions:

  • \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
  • \(\mathbf{X}\) is fixed (not random),
  • \(E(\boldsymbol\epsilon) = 0\).

We don’t use any assumptions about the variance/covariance of \(\boldsymbol\epsilon\) (beyond treating \(\boldsymbol\Sigma\) as known and fixed).

  3.
Solution

\[Cov(\widehat{\boldsymbol{\beta}}_{GLS}) = Cov((\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y})\] \[ =(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} Cov(\mathbf{Y})\{(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\}^T \] \[ =(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \boldsymbol \Sigma \{(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\}^T \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \boldsymbol\Sigma \boldsymbol{\Sigma}^{-1}\mathbf{X}(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \mathbf{X}(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\]

We use the following assumptions:

  • \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
  • \(\mathbf{X}\) is fixed (not random),
  • \(Cov(\boldsymbol\epsilon) = \boldsymbol\Sigma\).

Wrap-Up

Finishing the Activity

  • If you didn’t finish the activity, no problem! Be sure to complete it outside of class, review the solutions in the online manual, and ask any questions on Slack or in office hours.
  • Re-organize and review your notes to help deepen your understanding, solidify your learning, and make homework go more smoothly!

Recommendations

  • When you see new notation: focus on patterns. What does the notation remind you of?
  • Focus on connecting to what you’ve learned previously: linear and logistic regression

After Class

Before the next class, please do the following:

  • Take a look at the Schedule page to see how to prepare for the next class.