6.6 Normal Model

We’ve already met the Normal model as a smooth version of a unimodal, symmetric histogram. Suppose \(X\) is a quantitative random variable whose value can be any real number. If \(X\) follows a Normal model with expected value \(\mu\) and variance \(\sigma^2\), then its probability density function is \[f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

Each choice of \(\mu\) and \(\sigma\) gives a different curve. Some examples are shown below.
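As a quick check of the density formula, the sketch below writes it out by hand and compares it with R's built-in `dnorm()`. The helper name `normal_density` and the particular values of \(\mu\) and \(\sigma\) are illustrative assumptions, not part of the text.

```r
# Hand-written Normal density, following the formula for f(x) above.
# `normal_density` is a hypothetical helper name used only for illustration.
normal_density <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-5, 5, by = 0.5)

# Compare with R's built-in dnorm() for two arbitrary (mu, sigma) pairs
all.equal(normal_density(x, mu = 0, sigma = 1), dnorm(x, mean = 0, sd = 1))
all.equal(normal_density(x, mu = 2, sigma = 0.5), dnorm(x, mean = 2, sd = 0.5))

# The total area under a density curve should be (numerically close to) 1
integrate(normal_density, lower = -Inf, upper = Inf, mu = 1, sigma = 2)$value
```

To draw one of these curves, a call such as `curve(dnorm(x, mean = 0, sd = 1), from = -5, to = 5)` is one option.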

If a random variable \(X\) is modeled with a Normal model, we also say that “\(X\) follows a normal distribution” or that “\(X\) is normally-distributed”.

  • In general, the center of the distribution is \(\mu\), and the standard deviation \(\sigma\), the square root of the variance, determines the spread of the distribution.

  • Let’s consider the particular Normal model with \(\mu=0\) and \(\sigma=1\), called the standard normal distribution. For this model, \(P(-1\leq X \leq 1) \approx 0.68\), which is the area under the curve between -1 and 1.

# pnorm(1) gives the area under the curve to the left of 1
# pnorm(-1) gives the area under the curve to the left of -1
pnorm(1) - pnorm(-1)
## [1] 0.6826895
  • Similarly, \(P(-2\leq X \leq 2) \approx 0.95\), which is the area under the curve between -2 and 2.

pnorm(2) - pnorm(-2)
## [1] 0.9544997
  • Finally, \(P(-3\leq X \leq 3) \approx 0.997\), which is the area under the curve between -3 and 3.

pnorm(3) - pnorm(-3)
## [1] 0.9973002

The standard normal distribution is very convenient to work with. No matter what the long-run average \(\mu\) and standard deviation \(\sigma\) are for a normally-distributed random variable \(X\), we can standardize the values to obtain z-scores by subtracting \(\mu\) and dividing by \(\sigma\):

\[\text{z-score} = \frac{X - \mu}{\sigma}\]

We typically denote z-scores with \(Z\). It turns out that if \(X\) is normally distributed, then \(Z\) follows a standard normal distribution. That is, \(\mu=0\) and \(\sigma=1\) for \(Z\). This allows us to focus solely on areas under the standard normal curve rather than on the particular normal distribution with mean \(\mu\) and standard deviation \(\sigma\).
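To see standardization at work, here is a small sketch with made-up values \(\mu=100\), \(\sigma=15\), and \(x=120\) (all assumptions chosen for illustration). The probability computed directly from the Normal model for \(X\) should agree with the probability computed from the z-score under the standard normal distribution.

```r
# Hypothetical parameters and cutoff, chosen only for illustration
mu <- 100
sigma <- 15
x <- 120

# Standardize: how many standard deviations is x above the mean?
z <- (x - mu) / sigma

# P(X <= 120), computed directly from the Normal model with mean mu and sd sigma
p_direct <- pnorm(x, mean = mu, sd = sigma)

# P(Z <= z), computed from the standard normal distribution
p_standard <- pnorm(z)

# The two areas should match
all.equal(p_direct, p_standard)
```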

Important: If a random variable can be modeled with a Normal model, then we know that:

  • About 68% of the time, the values will be within 1 standard deviation of the expected value.
  • About 95% of the time, the values will be within 2 standard deviations of the expected value.
  • About 99.7% of the time, the values will be within 3 standard deviations of the expected value.

We will call this the 68-95-99.7 rule.
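The rule can also be checked by simulation. The sketch below draws many values from a Normal model with made-up parameters (\(\mu=50\), \(\sigma=8\), and the seed are arbitrary assumptions) and computes the proportion that falls within 1, 2, and 3 standard deviations of the expected value. The simulated proportions should land close to, but not exactly on, 68%, 95%, and 99.7%.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Hypothetical parameters, chosen only for illustration
mu <- 50
sigma <- 8
x <- rnorm(100000, mean = mu, sd = sigma)

# Proportion of simulated values within k standard deviations of mu
within_k <- sapply(1:3, function(k) mean(abs(x - mu) <= k * sigma))
names(within_k) <- c("1 SD", "2 SD", "3 SD")
within_k

# Exact probabilities for comparison; these do not depend on mu or sigma
pnorm(1:3) - pnorm(-(1:3))
```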