2.2 Moments

2.2.1 Expectation

The expected value of a random variable is its long-run average, calculated as a weighted sum of the possible values (weighted by their probabilities), \[\mu = E(X) = \sum_{\text{all }x}x\cdot P(X=x) \] for a discrete random variable, and \[\mu = E(X) = \int_{-\infty}^{\infty}xf(x)dx \] for a continuous random variable, where \(f(x)\) is the probability density function.
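To make the discrete definition concrete, here is a minimal sketch in Python (assuming NumPy; the fair-die distribution is a hypothetical example, not from the text) that computes \(E(X)\) as the probability-weighted sum and checks the long-run-average interpretation by simulation.

```python
import numpy as np

# Hypothetical discrete random variable: a fair six-sided die.
x = np.array([1, 2, 3, 4, 5, 6])  # possible values of X
p = np.full(6, 1 / 6)             # P(X = x) for each value

# E(X) = sum over all x of x * P(X = x)
mu = np.sum(x * p)
print(mu)  # 3.5

# Long-run-average interpretation: the sample mean of many draws
# from X converges to E(X).
rng = np.random.default_rng(0)
draws = rng.choice(x, size=100_000, p=p)
print(draws.mean())  # close to 3.5
```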

Properties

For random variables \(X\) and \(Y\) and constant \(a\), the following are true (you can show them using the definitions of expected value),

\[E(X+a) = E(X) +a\] \[E(aX) = aE(X) \] \[E(X+Y) = E(X) +E(Y)\]
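A quick simulation can make these properties tangible (a sketch only; the exponential and normal distributions for \(X\) and \(Y\) are arbitrary assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
a = 2.5

# Arbitrary choices of X and Y, just to illustrate the properties.
X = rng.exponential(scale=3.0, size=n)  # E(X) = 3
Y = rng.normal(loc=-1.0, size=n)        # E(Y) = -1

print((X + a).mean(), X.mean() + a)         # E(X + a) = E(X) + a
print((a * X).mean(), a * X.mean())         # E(aX) = a E(X)
print((X + Y).mean(), X.mean() + Y.mean())  # E(X + Y) = E(X) + E(Y)
```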

Example Proof of \(E(X+a) = E(X) + a\)

Let’s assume \(X\) is a discrete random variable. Then, by the definition of expected value,

\[\begin{align} E(X+a) &= \sum_{\text{all }x}(x + a)P(X=x) \\ &= \sum_{\text{all }x}(xP(X=x) + aP(X=x))\\ &= \sum_{\text{all }x}xP(X=x) + \sum_{\text{all }x}aP(X=x)\\ &= E(X) + a\sum_{\text{all }x}P(X=x)\\ &= E(X) + a\cdot 1\\ &= E(X) + a \end{align}\]

Now assume \(X\) is a continuous random variable. Then, by the definition of expected value,

\[\begin{align} E(X+a) &=\int (x + a)f(x)dx \\ &= \int(xf(x) + af(x))dx\\ &= \int xf(x)dx + \int af(x)dx\\ &= E(X) + a\int f(x)dx\\ &= E(X) + a\cdot 1\\ &= E(X) + a \end{align}\]

2.2.2 Covariance and Variance

The covariance between two random variables is a measure of linear dependence: it is the average product of each variable's deviation from its mean (expected value). The theoretical covariance between two random variables, \(X\) and \(Y\), is

\[Cov(X,Y) = E((X - \mu_X)(Y-\mu_Y))\] where the means are defined as \(\mu_X = E(X)\) and \(\mu_Y = E(Y).\)

The variance is the covariance of a random variable with itself,

\[Var(X) = Cov(X,X) = E((X - \mu_X)(X-\mu_X)) = E((X - \mu_X)^2)\]
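As a numerical sanity check, the sketch below (assuming NumPy; the bivariate normal parameters are made up for illustration) estimates \(Cov(X,Y)\) directly from the definition as an average product of deviations, and confirms that \(Cov(X,X)\) is the variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Assumed example: (X, Y) bivariate normal with Var(X) = 2,
# Var(Y) = 1, and Cov(X, Y) = 1.2.
mean = [0.0, 0.0]
cov = [[2.0, 1.2],
       [1.2, 1.0]]
X, Y = rng.multivariate_normal(mean, cov, size=n).T

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], estimated by a sample average.
cov_xy = np.mean((X - X.mean()) * (Y - Y.mean()))
print(cov_xy)  # approximately 1.2

# Var(X) = Cov(X, X), the covariance of X with itself.
var_x = np.mean((X - X.mean()) ** 2)
print(var_x, np.var(X))  # both approximately 2.0
```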

Covariance of a Sequence or Series of Random Variables

In this class, we will often work with a sequence or series of random variables that are indexed or ordered. Imagine we have a series of indexed random variables \(X_1,...,X_n\), where the subscripts indicate the order. For any two of those random variables, \(X_l\) and \(X_k\), the covariance is defined as

\[Cov(X_l, X_k) = E((X_l - \mu_l)(X_k - \mu_k)) \] where \(\mu_l = E(X_l)\) and \(\mu_k = E(X_k).\)

Note: the order of the variables does not matter, \(Cov(X_l,X_k) = Cov(X_k,X_l)\), due to the commutative property of multiplication.

Notation

We’ll use the Greek letter sigma, \(\sigma\), to represent covariance. With a series of indexed random variables \(X_1,...,X_n\), we use subscripts or indexes on \(\sigma\) as shorthand for the covariance between two of those random variables, \[\sigma_{lk} = Cov(X_l, X_k)\] where \(l,k \in \{1,2,...,n\}\).

If the index is the same, \(l = k\), then the covariance of the variable with itself is the variance, a measure of the spread of a random variable.

Let us denote the variance as \[\sigma_l^2 = Cov(X_l,X_l) = Var(X_l)\]

It is the average squared distance from the mean,

\[\sigma_{l}^2 = Var(X_l) = Cov(X_l, X_l) = E((X_l - \mu_l)(X_l - \mu_l)) = E((X_l - \mu_l)^2)\]

The standard deviation (SD) of \(X_l\) is the square root of the variance, \[\sigma_l = SD(X_l) = \sqrt{\sigma_l^2}\] We often interpret the standard deviation rather than the variance because the SD is in the same units as the random variable, not squared units.

Theorem: If \(X_l\) and \(X_k\) are independent, then \(Cov(X_l, X_k) = 0\). (See the technical note below to help you prove this.) The converse is not true.
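To see why the converse fails, take \(X \sim N(0,1)\) and \(Y = X^2\): \(Y\) is completely determined by \(X\), yet \(Cov(X,Y) = E(X^3) = 0\) by symmetry. A short simulation sketch of this standard counterexample:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=1_000_000)  # X ~ N(0, 1)
Y = X ** 2                      # Y is a deterministic function of X

# The covariance is ~0 even though X and Y are clearly dependent,
# because the dependence is not linear.
print(np.mean((X - X.mean()) * (Y - Y.mean())))  # approximately 0
```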

Properties

For random variables \(X_l\), \(X_j\), \(X_k\) and constants \(a\), \(b\), and \(c\), the following are true (you can show them using the definitions),

\[Cov(aX_l, b X_k) = abCov(X_l,X_k)\] \[Cov(aX_l + c, b X_k) = abCov(X_l,X_k)\]

\[Cov(aX_l + b X_j, cX_k) = acCov(X_l,X_k) + bcCov(X_j,X_k)\]
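These bilinearity properties can also be checked by simulation (a sketch; the distributions below are arbitrary assumptions, chosen so the variables are dependent and the covariances are nonzero):

```python
import numpy as np

def cov(u, v):
    """Sample estimate of Cov(u, v) from the definition."""
    return np.mean((u - u.mean()) * (v - v.mean()))

rng = np.random.default_rng(4)
n = 1_000_000
a, b, c = 2.0, -1.5, 0.5

# A shared component Z makes X_l and X_k dependent.
Z = rng.normal(size=n)
Xl = Z + rng.normal(size=n)
Xj = rng.exponential(size=n)
Xk = Z + rng.normal(size=n)

print(cov(a * Xl, b * Xk), a * b * cov(Xl, Xk))
print(cov(a * Xl + c, b * Xk), a * b * cov(Xl, Xk))
print(cov(a * Xl + b * Xj, c * Xk),
      a * c * cov(Xl, Xk) + b * c * cov(Xj, Xk))
```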

Thus, we have the following properties of variance,

\[Var(aX_l) = Cov(aX_l, aX_l) = aaCov(X_l,X_l) = a^2Var(X_l)\]



\[\begin{align} Var(aX_l + bX_j) &= Cov(aX_l + bX_j, aX_l + bX_j)\\ &= Cov(aX_l, aX_l + bX_j) + Cov(bX_j, aX_l + bX_j)\\ &= Cov(aX_l, aX_l) + Cov(aX_l, bX_j) + Cov(bX_j, aX_l) + Cov(bX_j,bX_j) \\ &= a^2 Var(X_l) + b^2Var(X_j) + 2abCov(X_j,X_l) \end{align}\]
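The same kind of check applies to the variance of a linear combination (a sketch with assumed normal inputs sharing a common component):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
a, b = 3.0, -2.0

# A correlated pair, so the cross term 2ab Cov(X_j, X_l) is nonzero.
Z = rng.normal(size=n)
Xl = Z + rng.normal(size=n)
Xj = Z + rng.normal(size=n)

cov_lj = np.mean((Xl - Xl.mean()) * (Xj - Xj.mean()))

lhs = np.var(a * Xl + b * Xj)
rhs = a**2 * np.var(Xl) + b**2 * np.var(Xj) + 2 * a * b * cov_lj
print(lhs, rhs)  # agree up to simulation error
```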

2.2.3 Correlation

The standardized version of the covariance is the correlation. The theoretical correlation between two random variables is calculated by dividing the covariance by the product of their standard deviations,

\[Cor(X_l,X_k) = \rho_{lk} = \frac{\sigma_{lk}}{\sigma_l\sigma_k} = \frac{Cov(X_l,X_k)}{SD(X_l)SD(X_k)} = \frac{Cov(X_l,X_k)}{\sqrt{Var(X_l)}\sqrt{Var(X_k)}}\]
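Putting the pieces together, a final sketch (assuming NumPy; the shared-component construction is again an arbitrary assumption) computes the correlation from the covariance and standard deviations and compares it against NumPy's built-in estimate:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

# Dependent pair via a shared component; true correlation is 0.5 here.
Z = rng.normal(size=n)
Xl = Z + rng.normal(size=n)
Xk = Z + rng.normal(size=n)

cov_lk = np.mean((Xl - Xl.mean()) * (Xk - Xk.mean()))
rho = cov_lk / (Xl.std() * Xk.std())
print(rho)                        # correlation from the definition
print(np.corrcoef(Xl, Xk)[0, 1])  # matches NumPy's estimate
```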