10 Introduction to Longitudinal Data
Settling In
Content Conversation 1
Good work!
- Valuable preparation
  - Develop comfort in derivations
  - Collaborative learning (if you prepared together)
  - Communication skills (helping each other; explaining to each other)
- Valuable experience
  - Dealing with nerves
  - Thinking and problem solving on the spot
  - Applying knowledge to a new situation
- Feedback
  - Equity in participation; logic and clarity in explanation
  - Everyone has something to work on, develop, and grow
Mini Project 1
- Goal: Tell a story about the time series data (energy use at Macalester)
- The trend, seasonality, and noise models help you tell that story
Put it in the Time Series GitHub repository (the one you started for HW4), but work in a TimeSeriesReport.qmd file.
. . .
Please RENDER the qmd file to html so that I can easily read it.
- Final version due Thursday (at class time).
- If you haven’t already, read it top to bottom to make sure the communication flows.
- Make sure you are making commits using your account. Don’t edit in Google Docs together and then copy over.
- GitHub commits show your contributions.
Longitudinal Timeline
- Introduction to Longitudinal Data (Today)
- GLM + GEE Models for Longitudinal Data
- Mixed Effects Models for Longitudinal Data
- Mixed Effects v. GEE
Learning Goals
- Explain and illustrate the differences between ordinary least squares (OLS) and generalized least squares (GLS) as they relate to longitudinal data.
Intro to Longitudinal Data
Warm Up: Data Examples
With individuals near you,
come up with examples of data you’ve worked with (or are aware of) that were collected over time on many individuals or subjects
discuss the research questions that you were (or may be) interested in exploring with that data
Be prepared to share these examples with the class.
Longitudinal Notation
Consider the random variable outcome,
\[ Y_{ij} = \text{the }j\text{th outcome measurement taken on subject }i,\] \[\text{ where } i= 1,...,n, j =1,...,m_i,\]
\(n\) is the number of units/subjects and \(m_i\) is the number of observations for the \(i\)th unit/subject.
. . .
Observation times
Let \(t_{ij} =\) the time at which the \(j\)th measurement on subject \(i\) was taken.
. . .
Then for the \(i\)th subject, we can organize their outcome measurements in a vector,
\[ \mathbf{Y}_i = \left(\begin{array}{c}Y_{i1}\\ Y_{i2}\\ Y_{i3}\\ \vdots\\ Y_{im_i} \end{array}\right)\]
The corresponding observation times for the \(i\)th subject are,
\[ \mathbf{t}_i = \left(\begin{array}{c}t_{i1}\\ t_{i2}\\ t_{i3}\\ \vdots\\ t_{im_i} \end{array}\right)\]
. . .
If the observation times are the same for each subject, \(\mathbf{t}_i = \mathbf{t} = (t_1,...,t_m)\) for all \(i=1,...,n\), then the data are balanced (balanced between subjects). Otherwise, the data are unbalanced.
If the time between consecutive observations is the same for all subjects and across time, \(t_{i,j+1} - t_{ij} =\tau\) for all \(i=1,...,n\) and \(j =1,...,m_{i}-1\), then the data are regularly observed. Otherwise, the data are irregularly observed.
. . .
We may have \(p\) explanatory variables for subject \(i\) such that
\[ \mathbf{X}_i = \left(\begin{array}{ccccc}1&x_{i11}&x_{i12}&\cdots&x_{i1p}\\ 1&x_{i21}&x_{i22}&\cdots&x_{i2p}\\ 1&x_{i31}&x_{i32}&\cdots&x_{i3p}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{im_i1}&x_{im_i2}&\cdots&x_{im_ip} \end{array}\right) = \left(\begin{array}{c}\mathbf{x}^T_{i1}\\ \mathbf{x}^T_{i2}\\ \mathbf{x}^T_{i3}\\ \vdots\\ \mathbf{x}^T_{im_i} \end{array}\right)\]
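To make this notation concrete, here is a minimal sketch in R using a small hypothetical long-format data set (the column names `id`, `time`, `x`, `y` and all values are made up for illustration). It pulls out \(\mathbf{Y}_i\), \(\mathbf{t}_i\), and \(\mathbf{X}_i\) for one subject, plus the stacked design matrix across all subjects.

```r
# Hypothetical long-format longitudinal data (one row per observation j on subject i);
# all names and values are illustrative, not course data.
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  time = c(0, 1, 2, 0, 2, 0, 1, 2, 3),
  x    = c(5.1, 5.4, 5.9, 4.8, 5.0, 6.2, 6.0, 6.1, 6.3),
  y    = c(10.2, 11.0, 11.8, 9.5, 10.1, 12.3, 12.0, 12.4, 12.9)
)

# m_i: number of observations per subject; unequal counts => unbalanced data
table(dat$id)

# Outcome vector Y_i and observation times t_i for subject i = 1
Y_1 <- dat$y[dat$id == 1]
t_1 <- dat$time[dat$id == 1]

# Design matrix X_i for subject i = 1 (intercept column plus covariates),
# and the stacked design matrix X for all subjects
X_1 <- model.matrix(~ x + time, data = dat[dat$id == 1, ])
X   <- model.matrix(~ x + time, data = dat)
```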
Small Group Work
GLS
Assumptions
If we use generalized least squares (GLS) to find estimates of \(\boldsymbol\beta\) in the linear model \(\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon\), we assume
- \(\mathbf{X}\) is fixed (not random),
- \(\boldsymbol\epsilon = (\epsilon_{11}\; \cdots\; \epsilon_{nm_n})^T\) has mean zero and a known covariance matrix \(\boldsymbol\Sigma\).
. . .
Definition
The GLS estimator that minimizes the sum of squared standardized errors,
\[\hat{\boldsymbol\beta}_{GLS} = \arg\min_{\boldsymbol\beta} \mathbf{(Y - X\boldsymbol\beta)^T\boldsymbol\Sigma^{-1}(Y - X\boldsymbol\beta)}\]
can be written as
\[\hat{\boldsymbol\beta}_{GLS} = \mathbf{(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y}\] for the linear model above.
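As a sketch (not the course's implementation), the estimator can be computed directly with matrix algebra once \(\boldsymbol\Sigma\) is taken as known; the simulated \(\mathbf{X}\), \(\boldsymbol\Sigma\), and \(\boldsymbol\beta\) below are made-up illustrative quantities.

```r
# Sketch: the GLS formula computed directly, assuming Sigma is known.
# X, Sigma, and beta are illustrative quantities only.
set.seed(1)
n_obs <- 6
X     <- cbind(1, rnorm(n_obs))                        # design matrix with intercept
Sigma <- 0.5 ^ abs(outer(1:n_obs, 1:n_obs, "-"))       # an assumed AR(1)-style covariance
beta  <- c(2, -1)
Y     <- X %*% beta + t(chol(Sigma)) %*% rnorm(n_obs)  # errors with Cov = Sigma

Sigma_inv <- solve(Sigma)
beta_gls  <- solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% Y)
beta_gls
```

In practice \(\boldsymbol\Sigma\) is rarely known; functions such as `gls()` in the nlme package estimate the coefficients along with a parameterized correlation structure rather than taking \(\boldsymbol\Sigma\) as given.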
Connect OLS and GLS
To see how OLS and GLS are connected, we can transform our potentially correlated data \(\mathbf{Y}\) into uncorrelated data using the inverse of the Cholesky factor \(\mathbf{L}\) from the decomposition \(\boldsymbol\Sigma = \mathbf{L}\mathbf{L}^T\), so that \(Cov(\mathbf{L}^{-1}\mathbf{Y}) = \mathbf{I}\).
Proof
\[Cov(\mathbf{L}^{-1}\mathbf{Y}) = Cov(\mathbf{L}^{-1}\boldsymbol\epsilon)\] \[= \mathbf{L}^{-1}Cov(\boldsymbol\epsilon) (\mathbf{L}^{-1})^T\] \[= \mathbf{L}^{-1}\boldsymbol\Sigma (\mathbf{L}^{-1})^T\] \[= \mathbf{L}^{-1}(\mathbf{L}\mathbf{L}^T) (\mathbf{L}^{-1})^T\] \[= \mathbf{I}\]
Assuming the linear model, \(\mathbf{Y} = \mathbf{X}\boldsymbol\beta +\boldsymbol\epsilon\), we can write the transformed data, \(\mathbf{L}^{-1}\mathbf{Y}\), as a model of transformed explanatory variables and noise,
\[\mathbf{L}^{-1}\mathbf{Y} = \mathbf{L}^{-1}\mathbf{X}\boldsymbol\beta +\mathbf{L}^{-1}\boldsymbol\epsilon\] \[\implies \mathbf{Y}^* = \mathbf{X}^*\boldsymbol\beta +\boldsymbol\epsilon^*\]
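Before working through the algebra below, here is a quick numerical check of this equivalence (a sketch with simulated, made-up quantities): OLS applied to \((\mathbf{Y}^*, \mathbf{X}^*)\) reproduces the GLS estimate.

```r
# Check numerically that OLS on the Cholesky-transformed data equals GLS.
# All quantities are simulated for illustration.
set.seed(2)
n_obs <- 8
X     <- cbind(1, rnorm(n_obs))
Sigma <- 0.6 ^ abs(outer(1:n_obs, 1:n_obs, "-"))  # assumed known covariance
L     <- t(chol(Sigma))                           # lower-triangular factor, Sigma = L L^T
Y     <- X %*% c(1, 2) + L %*% rnorm(n_obs)       # errors with Cov = Sigma

# Transformed data Y* = L^{-1} Y, X* = L^{-1} X, then the OLS formula
Y_star <- solve(L, Y)
X_star <- solve(L, X)
beta_ols_star <- solve(t(X_star) %*% X_star, t(X_star) %*% Y_star)

# Direct GLS formula
Sigma_inv <- solve(Sigma)
beta_gls  <- solve(t(X) %*% Sigma_inv %*% X, t(X) %*% Sigma_inv %*% Y)

all.equal(c(beta_ols_star), c(beta_gls))          # TRUE (up to numerical error)
```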
- Show that if we assume \(\boldsymbol\Sigma\) is known and fixed and use OLS on the transformed data \((\mathbf{Y}^*, \mathbf{X}^*)\), then our coefficient estimates are
\[\hat{\boldsymbol\beta}_{GLS} = \mathbf{(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}Y}\]
Show that \(E(\hat{\boldsymbol\beta}_{GLS}) = \boldsymbol\beta\). Remember the properties of random matrices. Keep track of what assumptions you need for this to be true.
Show that the covariance matrix \(Cov(\hat{\boldsymbol\beta}_{GLS}) =\mathbf{(X^T\Sigma^{-1}X)^{-1}}\). Remember what we proved in HW1 about a matrix \(A\) of constants. Keep track of what assumptions you need for this to be true.
Solutions
OLS
Solution
\[ \mathbf{Y} = \left(\begin{array}{c}Y_{11}\\ Y_{12}\\ \vdots\\ Y_{1m_1}\\ Y_{21}\\ Y_{22}\\ \vdots\\ Y_{2m_2}\\ \vdots\\ Y_{n1}\\ Y_{n2}\\ \vdots\\ Y_{nm_n} \end{array}\right)\]
Solution
\[ \mathbf{X} = \left(\begin{array}{ccccc}1&x_{111}&x_{112}&\cdots&x_{11p}\\ 1&x_{121}&x_{122}&\cdots&x_{12p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{1m_11}&x_{1m_12}&\cdots&x_{1m_1p}\\ 1&x_{211}&x_{212}&\cdots&x_{21p}\\ 1&x_{221}&x_{222}&\cdots&x_{22p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{2m_21}&x_{2m_22}&\cdots&x_{2m_2p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{n11}&x_{n12}&\cdots&x_{n1p}\\ 1&x_{n21}&x_{n22}&\cdots&x_{n2p}\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ 1&x_{nm_n1}&x_{nm_n2}&\cdots&x_{nm_np}\end{array}\right)\]
Solution
\[E(\widehat{\boldsymbol{\beta}}_{OLS}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{Y}) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon) \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} \] \[= \boldsymbol{\beta}\ \]
We only use the following assumptions:
- \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
- \(\mathbf{X}\) is fixed (not random),
- \(E(\boldsymbol\epsilon) = 0\).
Solution
\[Cov(\widehat{\boldsymbol{\beta}}_{OLS}) = Cov((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y})\] \[=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\mathbf{Y})\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon)\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Cov(\boldsymbol\epsilon)\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}^T \] \[= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2\mathbf{I})\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \] \[= \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \]
We use the following assumptions:
- \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
- \(\mathbf{X}\) is fixed (not random),
- \(Cov(\boldsymbol\epsilon) = \sigma^2\mathbf{I}\).
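A Monte Carlo sketch of these two results (with a made-up design matrix, \(\boldsymbol\beta\), and \(\sigma\)): under independent errors, repeated simulation gives OLS estimates whose average is near \(\boldsymbol\beta\) and whose sample covariance is near \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\).

```r
# Monte Carlo check of E(beta_hat_OLS) = beta and Cov(beta_hat_OLS) = sigma^2 (X^T X)^{-1}
# under Cov(eps) = sigma^2 I. All quantities are illustrative.
set.seed(3)
n_obs <- 20
sigma <- 1.5
X     <- cbind(1, rnorm(n_obs))
beta  <- c(2, -1)

XtX_inv <- solve(t(X) %*% X)
theory  <- sigma^2 * XtX_inv                  # theoretical covariance matrix

ests <- replicate(5000, {
  Y <- X %*% beta + rnorm(n_obs, sd = sigma)  # independent errors
  c(XtX_inv %*% t(X) %*% Y)                   # OLS estimate
})

rowMeans(ests)   # approximately beta
cov(t(ests))     # approximately `theory`
theory
```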
GLS
Solution
Remember: \(\mathbf{L}\) is a square matrix.
\[ \widehat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^{*T}\mathbf{X}^*)^{-1}\mathbf{X}^{*T}\mathbf{Y}^*\] \[= ((\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{X})^{-1}(\mathbf{L}^{-1}\mathbf{X})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{-1})^T\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}^{T})^{-1}\mathbf{L}^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{L}\mathbf{L}^T)^{-1}\mathbf{Y} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y} \]
Solution
\[E(\widehat{\boldsymbol{\beta}}_{GLS}) = E((\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y})\] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}E(\mathbf{Y})\] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}E(\mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon) \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}(\mathbf{X}\boldsymbol{\beta} + E(\boldsymbol\epsilon))\] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X}\boldsymbol{\beta} \] \[= \boldsymbol{\beta}\]
We only use the following assumptions:
- \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
- \(\mathbf{X}\) is fixed (not random),
- \(E(\boldsymbol\epsilon) = 0\).
Solution
\[Cov(\widehat{\boldsymbol{\beta}}_{GLS}) = Cov((\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{Y})\] \[ =(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} Cov(\mathbf{Y})\{(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\}^T \] \[ =(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \boldsymbol \Sigma \{(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\}^T \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \boldsymbol\Sigma \boldsymbol{\Sigma}^{-1}\mathbf{X}(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Sigma}^{-1} \mathbf{X}(\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1} \] \[= (\mathbf{X}^T\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\]
We use the following assumptions:
- \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol\epsilon\),
- \(\mathbf{X}\) is fixed (not random),
- \(Cov(\boldsymbol\epsilon) = \boldsymbol\Sigma\).
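The same style of Monte Carlo sketch (again with made-up quantities) confirms both properties for GLS under correlated errors.

```r
# Monte Carlo check of E(beta_hat_GLS) = beta and Cov(beta_hat_GLS) = (X^T Sigma^{-1} X)^{-1}
# under Cov(eps) = Sigma. All quantities are illustrative.
set.seed(4)
n_obs <- 20
X     <- cbind(1, rnorm(n_obs))
beta  <- c(2, -1)
Sigma <- 1.5^2 * 0.7 ^ abs(outer(1:n_obs, 1:n_obs, "-"))  # assumed known covariance
L     <- t(chol(Sigma))
Sigma_inv <- solve(Sigma)

A      <- solve(t(X) %*% Sigma_inv %*% X) %*% t(X) %*% Sigma_inv  # maps Y to beta_hat_GLS
theory <- solve(t(X) %*% Sigma_inv %*% X)                         # theoretical covariance

ests <- replicate(5000, {
  Y <- X %*% beta + L %*% rnorm(n_obs)   # correlated errors with Cov = Sigma
  c(A %*% Y)
})

rowMeans(ests)   # approximately beta
cov(t(ests))     # approximately `theory`
theory
```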
Wrap-Up
Finishing the Activity
- If you didn’t finish the activity, no problem! Be sure to complete the activity outside of class, review the solutions in the online manual, and ask any questions on Slack or in office hours.
- Re-organize and review your notes to help deepen your understanding, solidify your learning, and make homework go more smoothly!
Recommendations
- When you see new notation: focus on patterns. What does the notation remind you of?
- Focus on connecting to what you’ve learned previously: linear and logistic regression
After Class
Before the next class, please do the following:
- Take a look at the Schedule page to see how to prepare for the next class.