2.5 Multivariate Normal Distribution

The normal probability density function for a random variable \(X\) with mean \(\mu\) and standard deviation \(\sigma\) is \[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)\] Note that the quadratic in the exponent can be rewritten as \[\left(\frac{x - \mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu) \] which is exactly the form that generalizes to the multivariate density below.
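As a quick sanity check, the formula can be evaluated directly in R and compared against the built-in dnorm(); the values of x, mu, and sigma below are arbitrary illustrative choices.

x <- 1.3; mu <- 0.5; sigma <- 2
(1 / sqrt(2 * pi * sigma^2)) * exp(-0.5 * ((x - mu) / sigma)^2) # formula by hand
dnorm(x, mean = mu, sd = sigma) # built-in density; should match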

The multivariate normal probability density function for a k-dimensional random vector \(\mathbf{X} = (X_1,...,X_k)\) with mean vector \(\boldsymbol\mu\) and covariance matrix \(\boldsymbol\Sigma\) is \[ f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2}|\boldsymbol\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)\right)\]
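This density can also be evaluated directly from the formula in R, using solve() for \(\boldsymbol\Sigma^{-1}\) and det() for \(|\boldsymbol\Sigma|\). The helper dmvn_manual below is written only for illustration, and the test point, mean, and covariance are arbitrary choices.

dmvn_manual <- function(x, mu, Sigma) {
  k <- length(mu)
  quad <- t(x - mu) %*% solve(Sigma) %*% (x - mu) # (x - mu)^T Sigma^{-1} (x - mu)
  as.numeric(exp(-0.5 * quad) / ((2 * pi)^(k / 2) * sqrt(det(Sigma))))
}
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
dmvn_manual(c(0.5, -0.2), mu, Sigma)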

Example - Bivariate Normal

When \(k=2\), the random vector consists of \(X_1\) and \(X_2\). Let \(\rho_{12}\) be the correlation between \(X_1\) and \(X_2\).

\[ f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}}\exp\left(-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2-2\rho_{12}\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) +\left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right]\right)\]
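A quick numerical check that this bivariate formula agrees with the general matrix form above; the test point and parameter values below are arbitrary.

mu1 <- 0; mu2 <- 0; s1 <- 1; s2 <- 1; rho <- 0.8
x1 <- 0.7; x2 <- -0.3
z1 <- (x1 - mu1) / s1; z2 <- (x2 - mu2) / s2
f_biv <- exp(-(z1^2 - 2 * rho * z1 * z2 + z2^2) / (2 * (1 - rho^2))) / (2 * pi * s1 * s2 * sqrt(1 - rho^2)) # bivariate formula
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), 2)
xv <- c(x1, x2); muv <- c(mu1, mu2)
f_mat <- exp(-0.5 * t(xv - muv) %*% solve(Sigma) %*% (xv - muv)) / (2 * pi * sqrt(det(Sigma))) # general matrix form with k = 2
c(f_biv, f_mat) # the two values should be equal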

See below for 3D plots of the bivariate density function under two correlation values (\(\rho_{12}=0\), indicating no correlation, and \(\rho_{12} = 0.8\), indicating a strong positive correlation), with standardized variance and mean, \(\sigma_1^2=\sigma^2_2 = 1\) and \(\mu_1=\mu_2=0\).

rho <- 0
bivn <- MASS::mvrnorm(5000, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), nrow = 2))
bivn.kde <- MASS::kde2d(bivn[, 1], bivn[, 2], h = 2, n = 50) # kernel density estimate of f(x1, x2); we'll discuss this function later
persp(bivn.kde, phi = 30, theta = 30, xlab = 'x1', ylab = 'x2', zlab = 'f(x1,x2)', main = expression(rho[12] == 0))

rho <- 0.8
bivn <- MASS::mvrnorm(5000, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), nrow = 2))
bivn.kde <- MASS::kde2d(bivn[, 1], bivn[, 2], h = 2, n = 50)
persp(bivn.kde, phi = 30, theta = 30, xlab = 'x1', ylab = 'x2', zlab = 'f(x1,x2)', main = expression(rho[12] == 0.8))

2.5.1 Contours of Density

The multivariate normal density function is constant on surfaces where the squared Mahalanobis distance \((\mathbf{x} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu)\) is constant; these surfaces are called contours.

The contours are ellipsoids defined by \[(\mathbf{x} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu) = c^2\] These ellipsoids are centered at \(\boldsymbol\mu\) and have axes \(\pm c\sqrt{\lambda_{i}}\mathbf{e}_i\), where \(\boldsymbol\Sigma \mathbf{e}_i = \lambda_i\mathbf{e}_i\) for \(i=1,...,k\). Thus, the \(c\sqrt{\lambda_{i}}\) are the radii (half-lengths) of the axes (the \(\lambda_i\) are eigenvalues of \(\boldsymbol\Sigma\)), and the eigenvectors \(\mathbf{e}_i\) give the directions of the axes.

Also, in this context, \(c\) is the Mahalanobis distance from a point on the contour to the mean \(\boldsymbol\mu\). We may refer to \(c\) as the radius, but note that \(c\) is the radius of the ellipse only when the ellipse is a circle, i.e., when \(\lambda_{i}=1\).
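A short sketch computing these axes with eigen(); the covariance matrix and the value of \(c\) below are arbitrary illustrative choices.

Sigma <- matrix(c(1, 0.8, 0.8, 1), 2)
eig <- eigen(Sigma)
eig$values # the eigenvalues lambda_i
eig$vectors # the eigenvectors e_i (columns), giving the axis directions
c_val <- 1.5
c_val * sqrt(eig$values) # radii (half-lengths) of the ellipse axes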

Below are contours of bivariate normal density functions, with the contour labels giving the value of the density function (height).

rho <- 0
bivn <- MASS::mvrnorm(5000, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
bivn.kde <- MASS::kde2d(bivn[, 1], bivn[, 2], h = 2, n = 50)
contour(bivn.kde, xlab = 'x1', ylab = 'x2', main = expression(rho[12] == 0), xlim = c(-4, 4), ylim = c(-4, 4))

rho <- 0.8
bivn <- MASS::mvrnorm(5000, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
bivn.kde <- MASS::kde2d(bivn[, 1], bivn[, 2], h = 2, n = 50)
contour(bivn.kde, xlab = 'x1', ylab = 'x2', main = expression(rho[12] == 0.8), xlim = c(-4, 4), ylim = c(-4, 4))

These should look familiar if you have ever used a topographic map for hiking in mountainous areas.

2.5.2 Properties of Multivariate Normal

Linear Combinations

If \(\mathbf{X}\) is distributed as a k-dimensional normal distribution with mean \(\boldsymbol\mu\) and covariance \(\boldsymbol\Sigma\), then any linear combination of its variables (where \(\mathbf{a}\) is a vector of constants), \(\mathbf{a}^T\mathbf{X} = a_1X_1+a_2X_2+\cdots+a_kX_k\), is distributed as univariate normal with mean \(\mathbf{a}^T\boldsymbol\mu\) and variance \(\mathbf{a}^T\boldsymbol\Sigma\mathbf{a}\).
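A quick simulation check of this property; the vector \(\mathbf{a}\), mean, and covariance below are arbitrary illustrative choices.

set.seed(1)
mu <- c(1, 2); Sigma <- matrix(c(2, 0.5, 0.5, 1), 2)
a <- c(3, -1)
X <- MASS::mvrnorm(1e5, mu = mu, Sigma = Sigma)
y <- X %*% a # a^T X for each simulated row
c(mean(y), a %*% mu) # sample mean vs theoretical a^T mu
c(var(y), t(a) %*% Sigma %*% a) # sample variance vs theoretical a^T Sigma a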

Chi-Squared Distribution

The chi-squared distribution with \(k\) degrees of freedom is the distribution of the sum of the squares of \(k\) independent standard normal random variables. This result is typically proven in a probability course.

Theorem: If a random vector \(\mathbf{X}\in\mathbb{R}^k\) follows a multivariate normal distribution with mean \(\boldsymbol\mu\) and covariance \(\boldsymbol\Sigma\), then \((\mathbf{X} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{X} - \boldsymbol\mu)\) follows a chi-squared distribution with \(k\) degrees of freedom.

Sketch of proof: Using the Cholesky decomposition, we get \(\boldsymbol\Sigma = \mathbf{L}\mathbf{L}^T\). Then define a random vector \(\mathbf{Y}\) such that \[\mathbf{Y} = \mathbf{L}^{-1}(\mathbf{X} - \boldsymbol{\mu})\] Show that \(\mathbf{Y}\) is a vector of independent standard normal random variables. You can use the properties of linear combinations above.

Then show that the quadratic form reduces to a sum of squares of independent standard normals, \[(\mathbf{X} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{X} - \boldsymbol\mu) = \mathbf{Y}^T\mathbf{Y} = \sum_{i=1}^k Y_i^2\]
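The theorem can also be checked by simulation: the squared Mahalanobis distances of simulated multivariate normal draws should line up with chi-squared(2) quantiles in a QQ plot. A sketch, with an arbitrary \(\boldsymbol\Sigma\), using the base mahalanobis() function.

set.seed(1)
mu <- c(0, 0); Sigma <- matrix(c(1, 0.8, 0.8, 1), 2)
X <- MASS::mvrnorm(1e4, mu = mu, Sigma = Sigma)
d2 <- mahalanobis(X, center = mu, cov = Sigma) # (x - mu)^T Sigma^{-1} (x - mu) for each row
qqplot(qchisq(ppoints(1e4), df = 2), d2, xlab = 'Chi-squared(2) quantiles', ylab = 'Squared Mahalanobis distance')
abline(0, 1) # points should fall near this line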

Extensions of Intervals to Multiple Dimensions - Ellipse Percentiles

Now that we know the contours of a multivariate normal distribution, and we know the probability distribution of the quadratic form that defines them, we can extend the idea of intervals (such as confidence intervals or intervals of uncertainty) to the multivariate setting.

The contour enclosing probability \(\alpha\) under the multivariate normal distribution consists of the values of \(\mathbf{x}\) that satisfy \[(\mathbf{x} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x} - \boldsymbol\mu) = \chi^2_k(\alpha)\] where \(\chi^2_k(\alpha)\) is the \(100\alpha\)th percentile of a chi-squared distribution with \(k\) degrees of freedom. Therefore, \[P\left((\mathbf{X} - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{X} - \boldsymbol\mu) \leq \chi^2_k(\alpha)\right) = \alpha\]

That means we can calculate ellipses that correspond to a given probability, say \(\alpha = 0.95\), where the probability is the volume under the density surface over the region enclosed by the ellipse. The univariate analogue of these ellipses is an interval such that the area under the curve equals the desired probability, say 0.95.
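Before looking at the plots, here is a quick simulation check of this coverage statement; the covariance matrix is an arbitrary illustrative choice.

set.seed(1)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2)
X <- MASS::mvrnorm(1e4, mu = c(0, 0), Sigma = Sigma)
d2 <- mahalanobis(X, center = c(0, 0), cov = Sigma) # squared Mahalanobis distances
mean(d2 <= qchisq(0.95, df = 2)) # proportion inside the 0.95 ellipse; should be near 0.95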

For a bivariate normal with mean vector \((0,0)\), variances equal to 1, and correlation equal to 0 (and then 0.8), the R code below calculates the 0.5 and 0.95 ellipses based on the theoretical mean and covariance (green lines) and estimated based on sample data (red lines).

x <- MASS::mvrnorm(500, mu = c(0, 0), Sigma = matrix(c(1, 0, 0, 1), 2))
tmp <- car::dataEllipse(x[, 1], x[, 2]) # 0.5 and 0.95 ellipses estimated from the data
c2 <- qchisq(0.5, df = 2)
car::ellipse(c(0, 0), matrix(c(1, 0, 0, 1), 2), sqrt(c2), col = 3, lty = 2) # theoretical 0.5 ellipse
c2 <- qchisq(0.95, df = 2)
car::ellipse(c(0, 0), matrix(c(1, 0, 0, 1), 2), sqrt(c2), col = 3, lty = 2) # theoretical 0.95 ellipse

x <- MASS::mvrnorm(500, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), 2))
tmp <- car::dataEllipse(x[, 1], x[, 2]) # estimated from the data
c2 <- qchisq(0.5, df = 2)
car::ellipse(c(0, 0), matrix(c(1, 0.8, 0.8, 1), 2), sqrt(c2), col = 3, lty = 2) # theoretical, using the Sigma that generated the data
c2 <- qchisq(0.95, df = 2)
car::ellipse(c(0, 0), matrix(c(1, 0.8, 0.8, 1), 2), sqrt(c2), col = 3, lty = 2)