6.10 Chapter 6 Major Takeaways

The two common ways we’ll think about probability are empirical probability (imagining a random process repeated over and over again; the probability of an event is the long-run relative frequency of that event) and theoretical probability (based on axioms and mathematical theory).
Theoretical probability is based on a few axioms (rules) that should always be true.
- Rule 1: \(0\leq P(A)\leq 1\) (probability has to be between 0 and 1)
- Rule 2: \(P(S) = 1\), where \(S\) is the sample space, the set of all possible outcomes (one of the possible outcomes has to happen)
- Rule 3: \(P(\text{not }A) = 1 - P(A)\) (if we know the chance of something happening, we also know the chance that it doesn’t happen)
- Rule 4: \(P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and } B)\) in general
- Rule 5: \(P(A\text{ and }B) = P(A \mid B)P(B) = P(B \mid A)P(A)\) in general
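The rules above can be checked on a small example. The sketch below assumes a single roll of a fair six-sided die with equally likely outcomes; the events `A` and `B` and the helper `prob` are illustrative choices, not part of the chapter.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # illustrative event: the roll is even
B = {4, 5, 6}  # illustrative event: the roll is at least 4

p_S = prob(S)                                # Rule 2: equals 1
p_not_A = 1 - prob(A)                        # Rule 3: complement
p_A_or_B = prob(A) + prob(B) - prob(A & B)   # Rule 4: addition rule
p_A_given_B = prob(A & B) / prob(B)          # conditional probability P(A | B)
p_A_and_B = p_A_given_B * prob(B)            # Rule 5: multiplication rule

print(p_not_A, p_A_or_B, p_A_given_B, p_A_and_B)
```

Using `Fraction` keeps the arithmetic exact, so the addition rule can be compared directly with counting the outcomes in `A | B`.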
A random variable (\(X\)) is a variable whose value is random.
- We can write out the probability model for a random variable by listing the possible values and the associated probabilities.
- The expected value of a random variable \(X\) is the long-run average, calculated as the sum of the values, weighted by their chances.
- The variance of a random variable \(X\) measures its variability in the long run, calculated as the long-run average of the squared deviation from the expected value.
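As a small worked example of these definitions, the sketch below computes the expected value and variance from a probability model. The model used (number of heads in two fair coin flips) is an assumption chosen for illustration.

```python
# Hypothetical probability model: X = number of heads in two fair coin flips.
# Keys are the possible values, entries are their probabilities.
model = {0: 0.25, 1: 0.50, 2: 0.25}

# Expected value: sum of the values, weighted by their chances.
expected_value = sum(x * p for x, p in model.items())

# Variance: weighted average of squared deviations from the expected value.
variance = sum((x - expected_value) ** 2 * p for x, p in model.items())

print(expected_value, variance)
```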
A set of Bernoulli Trials includes a sequence of random variables \(X_1,\ldots,X_n\) where the random variables \(X_j\)
- have only two outcomes (success = 1, failure = 0)
- are independent of each other (value of one doesn’t impact the chances for the next)
- have constant chance of success, \(p\)
The sum of those random variables, \(X = \sum_{j=1}^n X_j\), is a Binomial Random Variable, which reflects the count of successes
- The probability of exactly \(x\) successes is \(P(X = x) = \frac{n!}{x!(n-x)!} p^x(1-p)^{n-x}\)
- Expected Value of Count of Successes: \(E(X) = np\)
- Variance of Count of Successes: \(Var(X) = np(1-p)\)
- Expected Value of Proportion of Successes: \(E(X/n) = p\)
- Variance of Proportion of Successes: \(Var(X/n) = p(1-p)/n\)
- As \(n\) increases, the variance of the proportion decreases!
- As \(n\) increases, the Binomial probabilities resemble a Normal curve!
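The Binomial formulas above can be verified numerically. This is a minimal sketch with assumed values \(n = 10\) and \(p = 0.3\); the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial count with n trials and success chance p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # assumed values for illustration

pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Expected value and variance computed directly from the probability model.
mean_count = sum(x * q for x, q in enumerate(pmf))
var_count = sum((x - mean_count) ** 2 * q for x, q in enumerate(pmf))

# The proportion X/n rescales the count by 1/n, so its variance scales by 1/n^2.
var_prop = var_count / n**2

print(mean_count, n * p)                # both should match np
print(var_count, n * p * (1 - p))       # both should match np(1-p)
print(var_prop, p * (1 - p) / n)        # both should match p(1-p)/n
```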
If a random variable \(X\) can take any value on a number line (e.g. 1.01, 2.33333, …) and that variable has a symmetric, unimodal, bell-shaped distribution, then we might model it as a Normal Random Variable.
- With \(X\) following a standard normal distribution (mean = \(\mu\) = 0, standard deviation = \(\sigma\) = 1), we know that \(P(-1\leq X \leq 1) \approx 0.68\), \(P(-2\leq X \leq 2) \approx 0.95\), and \(P(-3\leq X \leq 3) \approx 0.997\).
- If \(X\) follows a normal model with \(\mu\) and \(\sigma\), then \(\frac{X-\mu}{\sigma}\) follows a standard normal distribution.
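These Normal probabilities, and the effect of standardizing, can be checked with Python's standard-library `statistics.NormalDist`. The values \(\mu = 100\), \(\sigma = 15\), and the cutoff 130 are assumptions chosen only for illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

def central_prob(k):
    """P(-k <= Z <= k) for a standard normal Z."""
    return Z.cdf(k) - Z.cdf(-k)

within = {k: central_prob(k) for k in (1, 2, 3)}
print(within)  # roughly 0.68, 0.95, 0.997

# Standardizing: a hypothetical X ~ Normal(mu=100, sigma=15).
X = NormalDist(mu=100, sigma=15)
cutoff = 130
z = (cutoff - X.mean) / X.stdev  # z-score of the cutoff

# P(X <= 130) should equal P(Z <= z).
print(X.cdf(cutoff), Z.cdf(z))
```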
A sampling distribution of a statistic tells us exactly how that statistic would vary across all possible samples of a given size.
- The central limit theorem tells us that, when the sample size \(n\) is large enough, the sampling distribution of a sample mean, \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\), is approximately Normal with mean = population mean, \(\mu\), and standard deviation = population standard deviation divided by the square root of the sample size, \(\sigma/\sqrt{n}\).
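A simulation can illustrate the central limit theorem. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1), a sample size of 50, and 5000 repeated samples. All of these are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the simulation is reproducible

mu, sigma = 1.0, 1.0   # mean and SD of an exponential population with rate 1
n = 50                 # assumed sample size
reps = 5000            # assumed number of simulated samples

# Draw many samples and record each sample mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

# The simulated sampling distribution should be centered near mu
# with spread near sigma / sqrt(n).
print(mean(sample_means), mu)
print(stdev(sample_means), sigma / sqrt(n))
```

Even though individual exponential values are strongly right-skewed, a histogram of `sample_means` would look close to bell-shaped.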
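When the sample size is small (\(n<30\)), we need to consider William Gosset’s work (Student’s \(t\) distribution) because \(\frac{\bar{X} - \mu}{s/\sqrt{n}}\), which uses the sample standard deviation \(s\) in place of \(\sigma\), is not quite Normally distributed!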
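To see why small samples need special care, the sketch below simulates the statistic \(\frac{\bar{X} - \mu}{s/\sqrt{n}}\) for samples of size 5 from a Normal population and counts how often it falls beyond \(\pm 1.96\). Under a standard normal model that would happen about 5% of the time. The sample size, number of repetitions, and population values are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)  # fixed seed so the simulation is reproducible

mu, sigma = 0.0, 1.0  # assumed population mean and SD
n = 5                 # a deliberately small sample size
reps = 20000          # assumed number of simulated samples

def t_statistic(sample, mu):
    """(sample mean - mu) / (s / sqrt(n)), using the sample SD s."""
    return (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))

t_values = [
    t_statistic([random.gauss(mu, sigma) for _ in range(n)], mu)
    for _ in range(reps)
]

# Fraction of simulated statistics beyond +/-1.96.
tail_fraction = sum(1 for t in t_values if abs(t) > 1.96) / reps
print(tail_fraction)  # noticeably larger than 0.05 when n is small
```

The extra tail area comes from estimating \(\sigma\) with \(s\), which is itself quite variable in small samples.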

Now, how does all of this apply to models and regression coefficients? Since the estimates for linear models are very similar to means, the sampling distribution of a sample regression coefficient, \(\hat{\beta}_j\), is approximately Normal with mean = population regression coefficient, \(\beta_j\), and standard deviation, \(SD(\hat{\beta}_j)\).
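The same idea can be explored by simulation for a regression slope. The sketch below assumes a simple linear model with hypothetical population values (`beta0`, `beta1`, `sigma`), fixed predictor values, and independent Normal errors. The least-squares slope is computed by hand using only the standard library.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(3)  # fixed seed so the simulation is reproducible

beta0, beta1, sigma = 2.0, 0.5, 1.0  # hypothetical population intercept, slope, error SD
x = [i / 2 for i in range(40)]       # fixed predictor values (n = 40), chosen for illustration

def ls_slope(x, y):
    """Least-squares slope estimate for simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Simulate many samples from the assumed model and record each slope estimate.
slopes = []
for _ in range(2000):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    slopes.append(ls_slope(x, y))

# With fixed x and independent constant-variance errors, SD(slope) = sigma / sqrt(Sxx).
x_bar = mean(x)
theoretical_sd = sigma / sqrt(sum((xi - x_bar) ** 2 for xi in x))

print(mean(slopes), beta1)             # centered near the population slope
print(stdev(slopes), theoretical_sd)   # spread near the theoretical SD
```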

When the sample size is small (\(n<30\)), we need to consider William Gosset’s work (Student’s \(t\) distribution) because \(\frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)}\) is not quite Normally distributed!