6.2 Theoretical Probability Rules

To understand theoretical probability, we need to define a few terms and set some rules for working with probabilities (known as axioms).

The sample space, \(S\), is the set of all possible outcomes of a random process.

  • Example: If you flip two coins (each coin has one side Heads and one side Tails), then the sample space contains four possible outcomes: Heads and Heads (HH), Heads and Tails (HT), Tails and Heads (TH), and Tails and Tails (TT). That is, \(S = \{HH,HT,TH,TT\}\).

A subset of outcomes is called an event, denoted as \(A\).

  • Example: If you flip two coins, an event \(A\) could be that exactly one of the coins lands Heads, \(A = \{HT,TH\}\).
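These two definitions can be made concrete with a few lines of Python (a sketch; the labels `H` and `T` are just a convenient encoding of the coin faces):

```python
from itertools import product

# Sample space for two coin flips: S = {HH, HT, TH, TT}
S = [''.join(flips) for flips in product('HT', repeat=2)]

# Event A: exactly one of the coins lands Heads
A = [outcome for outcome in S if outcome.count('H') == 1]

print(S)  # ['HH', 'HT', 'TH', 'TT']
print(A)  # ['HT', 'TH']
```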

For events \(A\) and \(B\) and sample space \(S\), the probability of an event \(A\), notated as \(P(A)\), follows the rules below:

  • Rule 1: \(0\leq P(A)\leq 1\) (probability has to be between 0 and 1)
  • Rule 2: \(P(S) = 1\) (one of the possible outcomes has to happen)
  • Rule 3: \(P(\text{not }A) = 1 - P(A)\) (if we know the chance of something happening, we also know the chance that it doesn’t happen)
  • Rule 4: \(P(A\text{ or }B) = P(A) + P(B)\) if \(A\) and \(B\) are disjoint events.
    • \(A\) and \(B\) are disjoint if \(A\) occurring prevents \(B\) from occurring (they can’t both happen at the same time).
  • Rule 5: \(P(A\text{ and }B) = P(A)\times P(B)\) if \(A\) and \(B\) are independent.
    • \(A\) and \(B\) are independent if \(B\) occurring doesn’t change the probability of \(A\) occurring.
  • Rule 4*: \(P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and } B)\) in general
  • Rule 5*: \(P(A\text{ and }B) = P(A \mid B)P(B) = P(B \mid A)P(A)\) in general
    • The conditional probability of A given that event B occurs, \(P(A \mid B)\), is equal to the probability of the joint event (A and B) divided by the probability of B. \[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]
    • Intuition: Given that \(B\) happened, we restrict attention to the subset of outcomes in \(S\) in which \(B\) occurs and then figure out the chance of \(A\) happening within that subset.
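The rules above can be verified directly on the two-coin example. In the Python sketch below, events are represented as sets of outcomes and `Fraction` keeps the probabilities exact; the event \(B\) (the first coin lands Heads) is introduced here purely for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips, as a set of outcomes
S = {''.join(flips) for flips in product('HT', repeat=2)}

def P(event):
    # Equally likely outcomes: P(E) = |E| / |S|
    return Fraction(len(event), len(S))

A = {o for o in S if o.count('H') == 1}  # exactly one Heads
B = {o for o in S if o[0] == 'H'}        # first coin lands Heads (illustrative)

assert P(S) == 1                           # Rule 2
assert P(S - A) == 1 - P(A)                # Rule 3
assert P(A | B) == P(A) + P(B) - P(A & B)  # Rule 4*
assert P(A & B) == (P(A & B) / P(B)) * P(B)  # Rule 5*: P(A and B) = P(A|B)P(B)

print(P(A))      # 1/2
print(P(A & B))  # 1/4
```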

For more details on theoretical probability, please see [Appendix A]. This material is optional but available for those of you who want to understand the mathematical reasoning behind the rest of the chapter.

6.2.1 Diagnostic Testing and Probability

Let’s start by taking a moment to consider a recent Washington Post article that discusses the role of probability in medical diagnostics. Before you read the whole article, consider a question.

Say that Disease X has a prevalence of 1 in 1,000 (meaning that 1 out of every 1,000 people will have it).

The test to detect Disease X has a false-positive rate of 5 percent (meaning that out of every 100 subjects who do not have Disease X, 5 will falsely test positive for it).

The test’s accuracy is 99 percent (meaning that out of every 100 who do have Disease X, 99 will correctly test positive for it).

If a patient’s test result comes back positive, what is the probability that this patient actually has the disease?

If you said the probability is 95%, then you are wrong, but almost half of the doctors surveyed in 2014 thought exactly the same thing.

We can use the rules of probability to get a sense of what the desired probability actually is. We want the probability that the patient has the disease GIVEN a positive test result, \(P(D \mid +)\), where \(D\) stands for disease and \(+\) stands for a positive test result.

Based on the definition of conditional probability, we consider only those who got a positive test result and look at the proportion of them who have the disease. In mathematical notation, that is equal to

\[P(D \mid +) = \frac{P(D \text{ and } +)}{P(+)}\]

What information were we given again?

  • The prevalence of the disease is 1 in 1,000, so \(P(D) = 1/1000\). Using Rule 3, the probability of no disease is \(P(\text{no }D) = 999/1000\). In 1000 people, 1 will actually have the disease and 999 won’t have the disease.

  • The false-positive rate is 5 percent, so given that you don’t have the disease, the probability of getting a false positive is \(P(+ \mid\text{ no } D) = 0.05\). So of the 999 that don’t have the disease, about \(0.05\times 999 = 49.95\) (about 50) of them will get a false positive test result.

  • While it is not stated directly in the Washington Post article, most medical tests have a fairly high accuracy in catching the disease. For example: \(P(+ \mid D) = 0.99\). Therefore, the 1 person who actually has the disease will most likely get a positive test result back (\(0.99\times 1 = 0.99\)).

Remember that our interest is in \(P(D \mid +)\). By the definition of conditional probability, we consider only those with positive test results (about 50 who are disease-free and 1 who has the disease). So the probability of actually having the disease GIVEN a positive test result is about \(1/51 \approx 0.019\). This is not close to 95%!

In mathematical notation, that looks like this

\[\begin{align*} P(D \mid +) &= \frac{P(D \text{ and } +)}{P(+)} &\text{Rule 5*}\\ &= \frac{P(D \text{ and } +)}{P(+ \text{ and } D) + P(+ \text{ and no } D)} &\text{2 ways you can get +}\\ &= \frac{P(+ \mid D) P(D)}{P( + \mid D) P(D) + P( + \mid \text{ no }D) P(\text{no }D)} &\text{Rule 5*}\\ &= \frac{0.99\times 1/1000}{0.99\times 1/1000 + 0.05\times 999/1000} &\text{Plug in values}\\ &= \frac{0.99\times 1}{0.99\times 1 + 0.05\times 999} &\text{Simplify and evaluate}\\ &\approx 0.019 \end{align*}\]

The third line above (\(P(D \mid +) = \frac{P(+ \mid D) P(D)}{P( + \mid D) P(D) + P( + \mid \text{ no }D) P(\text{no }D)}\)) is often called Bayes’ Rule. The important idea to take from this is that what we condition on can make a big difference in the resulting probability.
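The Bayes’ Rule calculation above can be reproduced in a few lines of Python (a sketch; the numbers are the ones from the example):

```python
# Quantities given in the example
p_D = 1 / 1000           # prevalence: P(D)
p_noD = 1 - p_D          # Rule 3: P(no D)
p_pos_given_D = 0.99     # test accuracy: P(+ | D)
p_pos_given_noD = 0.05   # false-positive rate: P(+ | no D)

# Denominator: the two ways to get a positive test result
p_pos = p_pos_given_D * p_D + p_pos_given_noD * p_noD

# Bayes' Rule: P(D | +)
p_D_given_pos = p_pos_given_D * p_D / p_pos
print(round(p_D_given_pos, 3))  # 0.019
```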

Now, take some time to read the full Washington Post article.
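Before moving on, we can also double-check the answer by simulation: generate a large population, apply the disease prevalence and the test’s error rates, and look at the fraction of positive results that come from people who actually have the disease (a sketch; the seed and population size are arbitrary choices):

```python
import random

random.seed(0)        # arbitrary seed for reproducibility
n_people = 100_000    # arbitrary simulation size

positives = 0
diseased_positives = 0
for _ in range(n_people):
    has_disease = random.random() < 1 / 1000       # prevalence: P(D)
    if has_disease:
        tests_positive = random.random() < 0.99    # P(+ | D)
    else:
        tests_positive = random.random() < 0.05    # P(+ | no D)
    if tests_positive:
        positives += 1
        diseased_positives += has_disease

# Fraction of positive tests that come from people with the disease
print(diseased_positives / positives)  # close to 0.019, nowhere near 0.95
```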

6.2.2 Court Arguments and Probability

The concept of conditional probability also plays an important role in the judicial system in the U.S. The foundation of the judicial system is the principle of “innocent until proven guilty”: decisions are supposed to be made from the point of view that the defendant is innocent. Thus, jurors are supposed to judge the chance of seeing the presented evidence assuming innocence. That is, evidence is presented to jurors as the conditional probability: \(P(\text{ evidence } \mid \text{ innocent })\).

Unfortunately, many prosecutors, whether maliciously or due to a lack of statistical knowledge, make the wrong argument by flipping the conditional probability. They mistakenly argue that it is unlikely that the person is innocent given the evidence that is presented, \(P(\text{ innocent } \mid \text{ evidence })\).

This can be dangerous. We know that \(P(\text{ evidence } \mid \text{ innocent }) \not = P(\text{ innocent } \mid \text{ evidence })\) based on the disease testing example above. Generally, \(P(A \mid B)\) is not equal to (and can be very different from) \(P(B \mid A)\).

This is known as the prosecutor’s fallacy. You can read more about it here.