3.2 Autocovariance
Remember that the covariance between two random variables, \(X\) and \(Y\), is defined as
\[Cov(X,Y) = E((X - E(X))(Y - E(Y))) = E(XY) - E(X)E(Y)\]
Covariance values can fall anywhere on the real line, \((-\infty,\infty)\).
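As a quick numerical check of the identity \(Cov(X,Y) = E(XY) - E(X)E(Y)\), here is a minimal simulation sketch (the use of numpy, the seed, and the simulated variables are illustrative assumptions, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)  # built to covary with x

# Sample analogue of Cov(X, Y) = E(XY) - E(X)E(Y)
cov_direct = np.mean(x * y) - np.mean(x) * np.mean(y)
cov_builtin = np.cov(x, y, ddof=0)[0, 1]  # divide-by-n version to match
print(cov_direct, cov_builtin)  # both near the true value 0.5
```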
3.2.1 Autocovariance Function
For a random process \(\{Y_t\}_{t\in T}\), we can summarize the covariance between two random variables in the process with different indices, \(Y_t\) and \(Y_s\), as a function of the times/spaces, \(t\) and \(s\). The autocovariance function of \(s\) and \(t\) is the covariance of the random variables at those times/spaces,
\[\Sigma_Y(t, s) = Cov(Y_{t}, Y_{s}) = E((Y_t - \mu_t)(Y_s - \mu_s)) = E(Y_sY_t) - \mu_s\mu_t\] where \(\mu_s = E(Y_s)\) and \(\mu_t = E(Y_t)\).
Note:
- For time series, \(t\) and \(s\) will be integer indices of time.
- For longitudinal data, \(t\) and \(s\) will refer to observation times.
- For spatial data, \(t\) and \(s\) will be points in space, e.g., (longitude, latitude).
Throughout this course, we’ll use the capital Greek letter Sigma, \(\Sigma\), to denote covariance. We’ll use it in a few different ways, but it will always refer to covariance.
For example, we can refer to the covariance between the first and second random variables with \(\Sigma_Y(1,2)\) and the covariance between the second and fourth with \(\Sigma_Y(2, 4)\). We’ll come back to thinking about simplifying this autocovariance function.
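To make the autocovariance function concrete, here is a small simulation sketch. Assuming many independent replications of a process (an AR(1) is used here purely for illustration; it isn't introduced in these notes), \(\Sigma_Y(t,s)\) can be estimated by the sample covariance of \(Y_t\) and \(Y_s\) across replications:

```python
import numpy as np

rng = np.random.default_rng(1)
n_reps, m, phi = 5_000, 10, 0.7

# Many replications of a stationary AR(1): Y_t = phi * Y_{t-1} + e_t
Y = np.zeros((n_reps, m))
Y[:, 0] = rng.normal(scale=1 / np.sqrt(1 - phi**2), size=n_reps)
for t in range(1, m):
    Y[:, t] = phi * Y[:, t - 1] + rng.normal(size=n_reps)

def Sigma_Y(t, s):
    """Empirical autocovariance Cov(Y_t, Y_s), with 1-indexed times."""
    return np.cov(Y[:, t - 1], Y[:, s - 1], ddof=0)[0, 1]

print(Sigma_Y(1, 2))  # near phi / (1 - phi^2) ≈ 1.37 for this AR(1)
print(Sigma_Y(2, 4))  # near phi^2 / (1 - phi^2) ≈ 0.96
```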
3.2.2 Autocorrelation Function
In general, correlation is easier to interpret than covariance since correlation values are restricted to be between -1 and 1 (inclusive).
Remember that correlation of two random variables \(X\) and \(Y\) is defined as
\[Cor(X,Y) = \frac{Cov(X,Y)}{SD(X)SD(Y)}\]
We can define an autocorrelation function as
\[\rho_Y(s,t) = \frac{\Sigma_Y(s,t)}{\sqrt{\Sigma_Y(s,s)\Sigma_Y(t,t)}} \]
We use \(\rho\) to notate theoretical correlation (in contrast to \(r\), which refers to the sample correlation coefficient).
For example, we can refer to the correlation between the first and second random variables with \(\rho_Y(1,2)\) and the correlation between the second and fourth with \(\rho_Y(2, 4)\). Again, we’ll come back to thinking about simplifying this autocorrelation function.
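As a sketch of the same idea for correlation (again with made-up simulated data), \(\rho_Y(s,t)\) is just the autocovariance rescaled by the two standard deviations:

```python
import numpy as np

rng = np.random.default_rng(2)
# Replications of a pair (Y_1, Y_2) with a known covariance matrix
Y = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[2.0, 0.8], [0.8, 1.0]], size=50_000)

def Sigma(i, j):
    """Empirical covariance between columns i and j (0-indexed)."""
    return np.cov(Y[:, i], Y[:, j], ddof=0)[0, 1]

rho_12 = Sigma(0, 1) / np.sqrt(Sigma(0, 0) * Sigma(1, 1))
print(rho_12)  # near 0.8 / sqrt(2 * 1) ≈ 0.566
```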
3.2.3 Covariance Matrix
If we have \(m\) random variables \(\mathbf{Y} = (Y_1,...,Y_m)\) of a random process, we might be interested in knowing the covariance between every pair of indexed variables. We could plug the indices into the autocovariance function. The covariance between the \(i\)th observation and the \(j\)th observation in the process would be
\[\sigma_{ij} = \Sigma_Y(i, j)\]
Among the \(m\) variables, there are \(m(m-1)/2\) unique pairwise covariances and \(m\) variances, and they can be organized into an \(m\times m\) symmetric covariance matrix,
\[\boldsymbol{\Sigma}_Y = Cov(\mathbf{Y}) = \left(\begin{array}{cccc}\sigma^2_1&\sigma_{12}&\cdots&\sigma_{1m}\\\sigma_{21}&\sigma^2_2&\cdots&\sigma_{2m}\\\vdots&\vdots&\ddots&\vdots\\\sigma_{m1}&\sigma_{m2}&\cdots&\sigma^2_m\\ \end{array} \right) \] where \(\sigma^2_i = \sigma_{ii}\). Note that \(\sigma_{ij} = \sigma_{ji}\).
The covariance matrix can also be written using the definitions of covariance and the properties of matrix algebra as
\[\boldsymbol{\Sigma}_Y = Cov(\mathbf{Y}) = E( (\mathbf{Y}-E(\mathbf{Y}))(\mathbf{Y}-E(\mathbf{Y}))^T)\] \[= E( \mathbf{Y}\mathbf{Y}^T) - \boldsymbol{\mu}\boldsymbol{\mu}^T\] where \(E(\mathbf{Y}) = \boldsymbol{\mu}\).
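Here is a minimal sketch of this matrix identity, assuming simulated replications of a 3-dimensional \(\mathbf{Y}\) (the particular mean vector and covariance matrix are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
true_Sigma = np.array([[1.0, 0.5, 0.2],
                       [0.5, 1.0, 0.5],
                       [0.2, 0.5, 1.0]])
Y = rng.multivariate_normal(mean=[1.0, 2.0, 3.0], cov=true_Sigma,
                            size=100_000)

mu_hat = Y.mean(axis=0)
# Sample analogue of E(Y Y^T) - mu mu^T, averaged over the replications
Sigma_hat = (Y.T @ Y) / len(Y) - np.outer(mu_hat, mu_hat)
print(np.round(Sigma_hat, 2))  # symmetric and close to true_Sigma
```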
3.2.3.1 Properties
For a \(p \times m\) constant matrix \(\mathbf{A}\) and \(m\) dimensional random vector \(\mathbf{Y}\),
\[Cov(\mathbf{A}\mathbf{Y})=\mathbf{A}Cov(\mathbf{Y})\mathbf{A}^T\]
This is an important property because it gives us the covariance of a linear combination of our random process values: \(\mathbf{A}\mathbf{Y}\). This is also very useful when working with regression estimates (we’ll come back to this).
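For instance, a small numerical sketch (the matrices \(\mathbf{A}\) and \(\boldsymbol{\Sigma}\) below are made-up examples):

```python
import numpy as np

Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.4],
                  [0.3, 0.4, 1.5]])
# Each row of A defines one linear combination of (Y_1, Y_2, Y_3):
# here a difference and a mean
A = np.array([[1.0, -1.0, 0.0],
              [1/3, 1/3, 1/3]])

cov_AY = A @ Sigma @ A.T  # 2 x 2 covariance matrix of A @ Y
print(cov_AY)
# e.g. Var(Y_1 - Y_2) = sigma^2_1 + sigma^2_2 - 2*sigma_12
#                     = 2 + 1 - 1.2 = 1.8, the [0, 0] entry
```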
3.2.3.2 Positive Semidefiniteness
For the autocovariance function to be valid, it must be positive semidefinite: evaluating it at any set of \(m\) indices must produce a positive semidefinite covariance matrix. A symmetric matrix, \(\boldsymbol{\Sigma}\), is positive semidefinite if and only if
\[\mathbf{x}^T\boldsymbol{\Sigma}\mathbf{x} \geq 0\text{ for all }\mathbf{x}\in \mathbb{R}^m\] or equivalently,
\[\text{The eigenvalues of }\boldsymbol{\Sigma}\text{ are all }\geq 0\]
In practice, this is important to keep in mind when modeling covariance: any covariance matrix our model produces must be positive semidefinite.
To ensure this matrix is positive semidefinite, we could use the Cholesky decomposition in the modeling process, because the decomposition exists exactly for positive definite matrices (\(>\) instead of \(\geq\) in the definitions above),
\[ \boldsymbol{\Sigma} = \mathbf{LL}^T\]
where \(\mathbf{L}\) is a lower triangular matrix (all entries above the diagonal are 0). We’ll return to this.
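Here is a short sketch of both checks, using a small made-up covariance matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.4],
                  [0.3, 0.4, 1.5]])

# Eigenvalue check: all eigenvalues >= 0 means positive semidefinite
print(np.linalg.eigvalsh(Sigma))  # all strictly positive here

# Cholesky factorization Sigma = L L^T exists only for positive
# definite matrices; numpy raises LinAlgError otherwise
L = np.linalg.cholesky(Sigma)       # L is lower triangular
print(np.allclose(L @ L.T, Sigma))  # True
```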
3.2.4 Correlation Matrix
If we have \(m\) random variables \(\mathbf{Y} = (Y_1,...,Y_m)\), we might be interested in knowing the autocorrelation between every pair of observations. The correlation between the \(i\)th observation and the \(j\)th observation would be
\[\rho_{ij} = \rho_Y(i, j)\]
These \(m(m-1)/2\) correlations could be organized into an \(m\times m\) symmetric correlation matrix,
\[\mathbf{R}_Y = Cor(\mathbf{Y}) = \left(\begin{array}{cccc}1&\rho_{12}&\cdots&\rho_{1m}\\\rho_{21}&1&\cdots&\rho_{2m}\\\vdots&\vdots&\ddots&\vdots\\\rho_{m1}&\rho_{m2}&\cdots&1\\ \end{array} \right) \] Note that \(\rho_{ij} = \rho_{ji}\).
If you have the covariance matrix \(\boldsymbol{\Sigma}_Y\), you can extract the correlation matrix,
\[\mathbf{R}_Y = \mathbf{D}^{-1}\boldsymbol{\Sigma}_Y\mathbf{D}^{-1}\] where \(\mathbf{D} = \sqrt{\text{diag}(\boldsymbol{\Sigma}_Y)}\) is the diagonal matrix with the standard deviations along its diagonal. This works because
\[\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i\sigma_j}\]
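A minimal sketch of this rescaling, with a made-up covariance matrix:

```python
import numpy as np

Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.4],
                  [0.3, 0.4, 1.5]])

# D^{-1} has the reciprocal standard deviations 1/sigma_i on its diagonal
D_inv = np.diag(1 / np.sqrt(np.diag(Sigma)))
R = D_inv @ Sigma @ D_inv  # R[i, j] = sigma_ij / (sigma_i * sigma_j)
print(np.round(R, 3))      # unit diagonal, symmetric
```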