2.3 Joint Probability Distributions

For a set of discrete random variables \((X_1,...,X_k)\), the joint probability mass function of \(X_1,...,X_k\) is defined as the function such that for every point \((x_1,...,x_k)\) in the \(k\)-dimensional space, \[p(x_1,...,x_k) = P(X_1 = x_1,...,X_k = x_k) \geq 0\]

If \((x_1,...,x_k)\) is not one of the possible values for the set of random variables, then \(p(x_1,...,x_k) = 0\). Since there can be at most countably many points with \(p(x_1,...,x_k)>0\) and since these points must account for all the probability, we know that \[\sum_{\text{all }(x_1,...,x_k)}p(x_1,...,x_k) = 1 \] and for a subset of points called \(A\), \[P((x_1,...,x_k)\in A ) = \sum_{(x_1,...,x_k)\in A}p(x_1,...,x_k) \]
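These two facts are easy to check numerically. Below is a minimal sketch for \(k=2\); the pmf values are made up for illustration, and \(A\) is taken to be the event \(\{x_1 = 1\}\).

```python
# A small joint pmf for two discrete random variables, stored as a
# dict mapping (x1, x2) -> p(x1, x2).  Values are hypothetical.
pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# The probabilities over all points must sum to 1.
total = sum(pmf.values())

# P((x1, x2) in A) sums p over the points in A, here A = {x1 = 1}.
A = {pt for pt in pmf if pt[0] == 1}
prob_A = sum(pmf[pt] for pt in A)
print(total, prob_A)  # ≈ 1.0 and ≈ 0.7
```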

The joint distribution of a set of continuous random variables \((X_1,...,X_k)\) is defined in terms of the cumulative joint distribution function, \[F(x_1,...,x_k) = P(X_1 \leq x_1,...,X_k \leq x_k)\]


For a set of continuous random variables \((X_1,...,X_k)\), the joint density function of \(X_1,...,X_k\) is defined as the non-negative function defined for every point \((x_1,...,x_k)\) in the \(k\)-dimensional space such that for every subset \(A\) of the space, \[P((x_1,...,x_k)\in A ) = \int\dots\int_A f(x_1,...,x_k) dx_1...dx_k\]

In order to be a joint probability density function, \(f\) must be a non-negative function and \[\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_1,...,x_k) dx_1...dx_k= 1 \]
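As a numeric sketch of this normalization condition, the snippet below approximates the double integral of a candidate density with a midpoint Riemann sum. The density \(f(x,y) = x + y\) on the unit square is a hypothetical choice made for illustration; it integrates to 1.

```python
# Riemann-sum check that a candidate joint density integrates to 1.
# Hypothetical density: f(x, y) = x + y on the unit square, 0 elsewhere.
def f(x, y):
    return x + y

n = 400                      # grid resolution per axis
h = 1.0 / n                  # cell width
total = sum(
    f((i + 0.5) * h, (j + 0.5) * h) * h * h   # midpoint rule per cell
    for i in range(n)
    for j in range(n)
)
print(total)  # ≈ 1.0
```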

2.3.0.1 Technical Note

For two discrete random variables, the covariance can be written as \[Cov(X_l,X_k) = \sum_{\text{all }x_l}\sum_{\text{all }x_k} (x_l -\mu_l)(x_k -\mu_k)p_{lk}(x_l,x_k) \] where \(p_{lk}(x_l,x_k)\) is the joint probability distribution such that \(p_{lk}(x_l,x_k) = P(X_l = x_l \text{ and }X_k = x_k)\) and \(\sum_{\text{all }x_l}\sum_{\text{all }x_k}p_{lk}(x_l,x_k)=1\).
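The double sum above translates directly into code. The joint pmf below is hypothetical; its two variables are positively associated, so the covariance comes out positive.

```python
# Covariance of two discrete random variables from their joint pmf.
# Hypothetical joint pmf over (x_l, x_k) pairs.
pmf = {
    (0, 0): 0.40, (0, 1): 0.10,
    (1, 0): 0.10, (1, 1): 0.40,
}

# Marginal means mu_l and mu_k.
mu_l = sum(x * p for (x, _), p in pmf.items())
mu_k = sum(y * p for (_, y), p in pmf.items())

# Cov(X_l, X_k) = sum over all (x_l, x_k) of (x_l - mu_l)(x_k - mu_k) p.
cov = sum((x - mu_l) * (y - mu_k) * p for (x, y), p in pmf.items())
print(cov)  # 0.15 for this pmf
```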

For two continuous random variables, the covariance can be written as \[Cov(X_l,X_k) = \int^\infty_{-\infty}\int^\infty_{-\infty} (x_l -\mu_l)(x_k -\mu_k)f(x_l,x_k)dx_ldx_k \] where \(f(x_l,x_k)\) is the joint density function such that \(\int^\infty_{-\infty}\int^\infty_{-\infty}f(x_l,x_k)dx_ldx_k=1\). We’ll see some examples of joint densities soon.
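This integral can be approximated with the same midpoint-rule approach. The density \(f(x,y) = x + y\) on the unit square is again a hypothetical choice; for it, \(\mu_l = \mu_k = 7/12\) and the covariance works out to \(-1/144\).

```python
# Riemann-sum approximation of Cov(X_l, X_k) for a continuous joint
# density.  Hypothetical density: f(x, y) = x + y on the unit square.
def f(x, y):
    return x + y

n = 200                      # grid resolution per axis
h = 1.0 / n                  # cell width
pts = [((i + 0.5) * h, (j + 0.5) * h) for i in range(n) for j in range(n)]

mu_x = sum(x * f(x, y) * h * h for x, y in pts)   # E[X_l] = 7/12
mu_y = sum(y * f(x, y) * h * h for x, y in pts)   # E[X_k] = 7/12
cov = sum((x - mu_x) * (y - mu_y) * f(x, y) * h * h for x, y in pts)
print(cov)  # ≈ -1/144 ≈ -0.00694
```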

Two random variables are said to be statistically independent if and only if \[f(x_l,x_k) = f_{l}(x_l)f_k(x_k) \] for all possible values of \(x_l\) and \(x_k\) for continuous random variables and \[P(X_l = x_l, X_k = x_k)=P(X_l=x_l)P(X_k=x_k) \] for discrete random variables.
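For a discrete pair, this factorization condition can be verified point by point: compute both marginals from the joint pmf and check that their product recovers every joint probability. The pmf below is hypothetical, built as the product of two biased coin flips, so the check succeeds.

```python
# Check statistical independence for a discrete pair: the joint pmf
# must factor as the product of its marginals at every point.
# Hypothetical pmf: two independent biased coin flips.
pmf = {
    (0, 0): 0.12, (0, 1): 0.28,
    (1, 0): 0.18, (1, 1): 0.42,
}

# Marginals p_l(x) and p_k(y), obtained by summing out the other variable.
p_l = {x: sum(p for (a, _), p in pmf.items() if a == x) for x in (0, 1)}
p_k = {y: sum(p for (_, b), p in pmf.items() if b == y) for y in (0, 1)}

independent = all(
    abs(pmf[(x, y)] - p_l[x] * p_k[y]) < 1e-9
    for (x, y) in pmf
)
print(independent)  # True for this pmf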

A finite set of random variables is said to be mutually statistically independent if and only if \[f(x_1,x_2,...,x_k) = f_{1}(x_1)\cdots f_k(x_k) \] for all possible values \((x_1,x_2,...,x_k)\) for continuous random variables and \[P(X_1 = x_1,X_2 = x_2,..., X_k = x_k)=P(X_1=x_1)P(X_2=x_2)\cdots P(X_k=x_k)\] for discrete random variables.
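The same point-by-point check extends to more than two variables: the joint pmf must equal the product of all the marginals at every point. A minimal sketch for three variables, using the hypothetical pmf of three independent fair coin flips:

```python
# Mutual independence for three discrete variables: the joint pmf
# must equal the product of all three marginals at every point.
# Hypothetical pmf: three independent fair coin flips.
from itertools import product

pmf = {pt: 0.125 for pt in product((0, 1), repeat=3)}

# Marginal probability that coordinate i takes value v.
def marginal(i, v):
    return sum(p for pt, p in pmf.items() if pt[i] == v)

mutually_independent = all(
    abs(pmf[pt] - marginal(0, pt[0]) * marginal(1, pt[1]) * marginal(2, pt[2])) < 1e-9
    for pt in pmf
)
print(mutually_independent)  # True
```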