---
title: "Multiple logistic regression (Notes)"
subtitle: "STAT 155"
author: "Your Name"
format:
  html:
    toc: true
    toc-depth: 2
    embed-resources: true
---


```{r setup}
#| include: false
knitr::opts_chunk$set(
  collapse = TRUE, 
  warning = FALSE,
  message = FALSE,
  error = TRUE,
  fig.height = 2.75, 
  fig.width = 4.25,
  fig.env = 'figure',
  fig.pos = 'h',
  fig.align = 'center')

# Use a color blind friendly color palette throughout doc
library(tidyverse)
cb_palette <- c("#0072B2", "#D55E00","black", "#E69F00", "#56B4E9", "#009E73", "#F0E442",  "#CC79A7")
scale_colour_discrete <- function(...) scale_colour_manual(values = cb_palette, ...)
scale_fill_discrete   <- function(...) scale_fill_manual(values = cb_palette, ...)
theme_set(theme_bw())
```


## Warm-up {-}


We've learned two foundational modeling tools: "Normal" linear regression (when Y is quantitative) and logistic regression (when Y is binary).

- Normal linear regression examples   
    - model a person's reaction time (Y) by how long they slept (X)
    - model a person's reaction time (Y) by whether or not they took a nap (X)

- logistic regression examples
    - model whether or not a person believes in climate change (Y) based on their age (X)
    - model whether or not a person believes in climate change (Y) based on their location (X)


### Example 1: Probability, odds, odds ratio review

In this and the previous activity, we'll be modeling the *success* of a mountain climber.
Consider the following example calculations:

event                    prob         odds
------------------------ ----------- ------
climber A is successful  0.50          1
climber B is successful  0.66 (2/3)    2
climber C is successful  0.75          ?


Calculate & interpret the conditional *odds* of success for climber C:   
    
Odds(success | climber C) = 
    
    
Calculate & interpret the *odds ratio* of success, comparing climber C to climber B:   

Odds(success | Climber C) / Odds(success | Climber B) = 


Calculate & interpret the *odds ratio* of success, comparing climber A to climber B:   

Odds(success | Climber A) / Odds(success | Climber B) = 


### Example 2: Success over time

Load the climbers data:

```{r}
# Load data and packages
library(tidyverse)
climbers <- read.csv("https://mac-stat.github.io/data/climbers_sub.csv") %>% 
  select(peak_name, success, age, oxygen_used, year, season)
```

To better understand how climbing success rates have changed over time, let's explore success by year:

```{r}
# Bad plot!!
# Shows how year (X) varies by success (Y), not vice versa
climbers %>% 
  ggplot(aes(x = year, y = success)) +
  geom_boxplot() 
```

```{r}
# Bad plot!! How to interpret this?!
climbers %>% 
  ggplot(aes(x = year, y = success)) +
  geom_point() 
```

```{r}
# Better plot!
climbers %>% 
  group_by(year) %>% 
  summarize(success_rate = mean(success)) %>% 
  ggplot(aes(x = year, y = success_rate)) +
  geom_point() + 
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1, size = 8))
```

```{r}
climbers %>% 
  mutate(year_bracket = cut(year, breaks = 10)) %>%
  group_by(year_bracket) %>%
  summarize(success_rate = mean(success)) %>% 
  ggplot(aes(x = year_bracket, y = success_rate)) +
  geom_point() + 
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust=1, size = 8))

```

a. Let's now *model* that relationship...

```{r}
# Build the model
success_by_year <- ___(success ~ year, data = climbers, ___)

# Get the model summary table
coef(summary(success_by_year))
```


b. Write out the model formula.

___ = -119.36215 + 0.05934year


c. Interpret both model coefficients on the *odds* scale.


```{r}
# Exponentiate the coefficients

```


### Example 3: Success by season

Next, let's explore success rates by season:

```{r}
# Visualize the relationship
# Bad plot! Shows dependence of season (X) on success (Y), not vice versa
climbers %>% 
  ggplot(aes(fill = season, x = success)) +
  geom_bar(position = "fill")
```

```{r}
# Better plot!
# x = X, fill = Y
climbers %>% 
  ggplot(aes(x = season, fill = success)) +
  geom_bar(position = "fill")
```

```{r}
# Model the relationship
success_by_season <- glm(success ~ season, data = climbers, family = "binomial")
coef(summary(success_by_season))
```

Thus the estimated model formula is:

log(odds of `success` | `season`) = -0.6492953 + 0.3801896 `seasonSpring` + 14.2153618 `seasonSummer` - 1.0453004 `seasonWinter`


a. Interpret the intercept coefficient on the *odds* scale. Choose one.  

```{r}
exp(-0.6492953)
```

- The odds of success are 0.52, i.e. success is roughly half as likely as failure.
- The odds of success are 0.52 in Autumn, i.e. success is roughly half as likely as failure in Autumn.
- The odds of success are roughly half (52%) as big in Autumn than in Summer.


b. Interpret the `seasonWinter` coefficient on the *odds* scale. Choose one.

```{r}
exp(-1.0453004)
```

- The odds of success are 0.35 in Winter.
- The odds of success are 0.35 higher in Winter than in Autumn.
- The odds of success in Winter are 35% higher than the odds of success in Autumn.
- The odds of success in Winter are 35% as high as (i.e. 65% lower than) the odds of success in Autumn.


### Example 4: Predictions

Suppose your friend plans an expedition in Winter.
Their *probability* of success is roughly 0.16:

```{r}
# By hand
log_odds <- -0.6492953 - 1.0453004*1
log_odds

odds <- exp(log_odds)
odds

prob <- odds / (odds + 1)
prob
```

```{r}
# Check our work
# Note the use of type = "response"! If we left this off we'd get a prediction on the log(odds) scale
predict(success_by_season, newdata = data.frame(season = "Winter"), type = "response")
```


a. Suppose we had to translate the probability calculation into a yes / no prediction. Yes or no: do you predict that your friend will be successful?


b. In logistic regression, we don't measure the accuracy of a prediction by a residual. Our prediction is either right or wrong! For example, suppose that your friend *does* successfully summit their peak. Was our prediction right or wrong?


## Exercises {-}

**Context:** In this activity, we'll look at data from an experiment conducted in 2001-2002 that investigated the influence of race and gender on job applications. The researchers created realistic-looking resumes and then randomly assigned a name to the resume that "would communicate the applicant's gender and race" (e.g., they they assumed the name Emily would generally be interpreted as a white woman, whereas the name  Jamal would generally be interpreted as a black man).  They then submitted these resumes to job postings in Boston and Chicago and waited to see if the  applicant got a call back from the job posting.

You can find a full description of the variables in this dataset [here](https://www.openintro.org/data/index.php?data=resume). Today, we'll focus on the following variables:

- `received_callback`: indicator that the resume got a call back from the job posting
- `gender`: inferred binary gender associated with the first name on the resume
- `race`: inferred race associated with the first name on the resume

Our research question is: **does an applicant's inferred gender and race have an effect on the chance that they receive a callback after submitting their resume for an open job posting?**

```{r}
library(tidyverse)
#library(ggmosaic)

resume <- read_csv("https://mac-stat.github.io/data/resume.csv")
```

### Exercise 1: Graphical and numerical summaries

Our research question involves three categorical variables: `received_callback` (1 = yes, 0 = no), `gender` (f = female, m = male), and `race` (Black, White). Let's start by creating a mosaic plot to visually compare inferred binary gender and callbacks:

```{r graphical-summary-two-categorical-variables}
ggplot(resume) + 
    geom_bar(aes(x = gender, fill = factor(received_callback)),position='fill') +
    scale_fill_manual("Received Callback? \n(1 = yes, 0 = no)", values = c("lightblue", "steelblue")) + 
    labs(x = "Inferred Binary Gender (f = female, m = male)", y = "Received Callback? (1 = yes, 0 = no)")

# create mosaic plot of callback vs gender
#ggplot(resume) + 
#    geom_mosaic(aes(x = product(gender), fill = received_callback)) +
#    scale_fill_manual("Received Callback? \n(1 = yes, 0 = no)", values = c("lightblue", "steelblue")) + 
#    labs(x = "Inferred Binary Gender (f = female, m = male)", y = "Received Callback? (1 = yes, 0 = no)")
```


In this activity, we're also interested in looking at the relationship between inferred race and callbacks. One way we can add a third variable to a plot is to use the `facet_grid` function, particularly when that third variable is categorical. Let's try that now:

```{r graphical-summary-three-variables}
ggplot(resume) + 
    geom_bar(aes(x = gender, fill = factor(received_callback)), position='fill')+
    facet_grid(. ~ race) +
    scale_fill_manual("Received Callback? \n(1 = yes, 0 = no)", values = c("lightblue", "steelblue")) + 
    labs(x = "Inferred Binary Gender (f = female, m = male)", y = "Received Callback? (1 = yes, 0 = no)")


# create mosaic plot of callback vs gender and race
#ggplot(resume) + 
#    geom_mosaic(aes(x = product(gender), fill = received_callback)) +
#    facet_grid(. ~ race) +
#    scale_fill_manual("Received Callback? \n(1 = yes, 0 = no)", values = c("lightblue", "steelblue")) + 
#    labs(x = "Inferred Binary Gender (f = female, m = male)", y = "Received Callback? (1 = yes, 0 = no)")
```


Here's another way of looking at the relationship between these three variables, switching the placement of `gender` and `race` in the mosaic plot:

```{r graphical-summary-three-variables-version2}
ggplot(resume) + 
    geom_bar(aes(x = race, fill = factor(received_callback)), position='fill')+
    facet_grid(. ~ gender) +
    scale_fill_manual("Received Callback? \n(1 = yes, 0 = no)", values = c("lightblue", "steelblue")) + 
    labs(x = "Inferred Race", y = "Received Callback? (1 = yes, 0 = no)")


# create mosaic plot of callback vs gender and race
#ggplot(resume) + 
#    geom_mosaic(aes(x = product(received_callback, race), fill = received_callback)) +
#    facet_grid(. ~ gender) +
#    scale_fill_manual("Received Callback? \n(1 = yes, 0 = no)", values = c("lightblue", "steelblue")) + 
#    labs(x = "Inferred Race", y = "Received Callback? (1 = yes, 0 = no)")
```

When we are comparing three categorical variables, a useful numerical summary is to calculate relative frequencies/proportions of cases falling into each category of the outcome variable, conditional on which categories of the explanatory variables they fall into. Run this code chunk to calculate the conditional proportion of resumes that did nor did not receive a callback, given the inferred gender and race of the applicant:

```{r numerical-summaries}
# corresponding numerical summaries
resume %>%
    group_by(race, gender) %>%
    count(received_callback) %>%
    group_by(race, gender) %>%
    mutate(condprop = n/sum(n))
```

Write a short description that summarizes the information you gain from these visualizations and numerical summaries. Write this summary using good sentences that tell a story and do not resemble a checklist. Don't forget to consider the context of the data, and make sure that your summary addresses our research question: *does an applicant's inferred gender or race have an effect on the chance that they receive a callback?*


### Exercise 2: Logistic regression modeling

Next, we'll fit a logistic regression model to these data, modeling the *log odds* of receiving a callback as a function of the applicant's inferred gender and race:

$$\log(Odds[ReceivedCallback = 1 \mid gender, race]) = \beta_0 + \beta_1 genderm + \beta_2 racewhite$$

Fill in the blanks in the code below to fit this logistic regression model. 

```{r logistic-model}
# fit logistic model and save it as object called "mod1"
mod1 <- glm(received_callback ~ gender + race, data = ___, family = ___)
```

Then, run the code chunk below to get the coefficient estimates and exponentiated estimates:

```{r get-coefficient-estimates-mod1}
# Original estimates
coef(mod1)

# Exponentiated estimates
exp(coef(mod1))
```

Write an interpretation of each of the exponentiated coefficients in your logistic regression model.


### Exercise 3: Interaction terms

a. Do you think it would make sense to add an interaction term (between gender and race) to our logistic regression model? Why/why not?

b. Let's try adding an interaction between gender and race. Update the code below to fit this new interaction model.

```{r logistic-model-with-interaction}
# fit logistic model and save it as object called "mod2"
mod2 <- glm(received_callback ~ ___, data = resume, family = ___)
```

Then, run the code chunk below to get the coefficient estimates and exponentiated estimates for this interaction model:

```{r get-coefficient-estimates-mod2}
# Original estimates
coef(mod2)

# Exponentiated estimates
exp(coef(mod2))
```

c. (CHALLENGE) Write out the logistic regression model formula separately for males and for females. Based on this how would we interpret the exponentiated coefficients in this model?


### Exercise 4: Prediction

We can use our models to predict whether or not a resume will receive a call back based on the inferred gender and race of the applicant. Run the code below to use the `predict()` function to predict the probability of getting a call back for four job applicants: a person inferred to be a black female, a person inferred to be black male, a person inferred to be a white female, and a person inferred to be a white male. 

```{r predict}
# set up data frame with people we want to predict for
predict_data <- data.frame(
    gender = c("f", "m", "f", "m"),
    race = c("black", "black", "white", "white")
)
print(predict_data)

# prediction based on model without interaction
mod1 %>%
    predict(newdata = predict_data, type = "response")

# prediction based on model with interaction
mod2 %>%
    predict(newdata = predict_data, type = "response")
```

Report and compare the predictions we get from `predict()`. Do they make sense to you based on your understanding of the data? Combine insights from visualizations and modeling to write a few sentences summarizing findings for our research question: *does an applicant's inferred gender and race have an effect on the chance that they receive a callback after submitting their resume for an open job posting?*


### Exercise 5: Evaluating logistic models with plots

We'll fit one more model that adds on to the interaction model to also include years of college, years of work experience, and resume quality. The code below takes our fitted models and stores the predicted probabilities in a variable called `.fitted`. Then we use boxplots to show the predicted probabilities of receiving a callback in those who actually did and did not receive a callback.

```{r}
mod3 <- glm(received_callback ~ gender*race + years_college + years_experience + resume_quality, data = resume, family = "binomial")

# mod1 predicted probabilities
mod1_output <- resume %>% 
  mutate(.fitted = predict(mod1, newdata = ., type = "response")) 

mod1_output %>% 
  ggplot(aes(x = factor(received_callback), y = .fitted)) +
  geom_boxplot()
  
# mod2 predicted probabilities
mod2_output <- resume %>% 
  mutate(.fitted = predict(mod2, newdata = ., type = "response")) 

mod2_output %>% 
  ggplot(aes(x = factor(received_callback), y = .fitted)) +
  geom_boxplot()
 
# mod3 predicted probabilities
mod3_output <-resume %>% 
  mutate(.fitted = predict(mod3, newdata = ., type = "response")) 

mod3_output %>% 
  ggplot(aes(x = factor(received_callback), y = .fitted)) +
  geom_boxplot()
  
```

a. Summarize what you learn about the ability of the 3 models to differentiate those who actually did and did not receive a callback. What model seems best, and why?

b. If you had to draw a horizontal line across each of the boxplots that vertically separates the left and right boxplots well, where would you place them?


### Exercise 6: Evaluating logistic models with evaluation metrics

Sometimes we may need to go beyond the predicted probabilities from our model and try to classify individuals into one of the two binary outcomes (received or did not receive a callback). How high of a predicted probability would we need from our model in order to be convinced that the person actually got a callback? This is the idea behind the horizontal lines that we drew in the previous exercise.

Let's explore using a probability threshold of 0.08 (8%) to make a binary prediction for each case:

- If a model's predicted probability of getting a callback is greater than or equal to 8.5%, we'll predict they got a callback.
- If the predicted probability is below 8%, we'll predict they didn't get a callback.

We can visualize this threshold on our predicted probability boxplots:

```{r}
ggplot(mod1_output, aes(x = factor(received_callback), y = .fitted)) +
    geom_boxplot() +
    geom_hline(yintercept = 0.08, color = "red")
ggplot(mod2_output, aes(x = factor(received_callback), y = .fitted)) +
    geom_boxplot() +
    geom_hline(yintercept = 0.08, color = "red")
ggplot(mod3_output, aes(x = factor(received_callback), y = .fitted)) +
    geom_boxplot() +
    geom_hline(yintercept = 0.08, color = "red")
```


Next, we can use our threshold to classify each person in our dataset based on their predicted probability of getting a callback: we'll predict that everyone with a predicted probability higher than our threshold got a callback, and otherwise they did not. Then, we'll compare our model's prediction to the true outcome (whether or not they actually did get a callback).

```{r}
# get binary predictions for mod1 and compare to truth
threshold <- 0.08
mod1_output %>%
    mutate(predictCallback = .fitted >= threshold) %>% ## predict callback if probability greater than or equal to threshold
    count(received_callback, predictCallback) ## compare actual and predicted callbacks

mod2_output %>%
    mutate(predictCallback = .fitted >= threshold) %>%
    count(received_callback, predictCallback)

mod3_output %>%
    mutate(predictCallback = .fitted >= threshold) %>%
    count(received_callback, predictCallback)
```

We can use the `count()` output to fill create contingency tables of the results. (These tables are also called **confusion matrices**.)

a. Fill in the confusion matrix for Model 3.

> Models 1 and 2: (Both models result in the same confusion matrix.)

                        Predict callback   Predict no callback    Total
---------------------- ------------------ ---------------------- -------
Actually got callback       235               157                  392
Actually did not           2200              2278                 4478
Total                      2435              2435                 4870

> Model 3: 

                        Predict callback   Predict no callback    Total
---------------------- ------------------ ---------------------- -------
Actually got callback      ____               ____                ____
Actually did not           ____               ____                ____
Total                      ____               ____                ____

b. Now compute the following evaluation metrics for the models:

Models 1 and 2:

- Accuracy: P(Predict Y Correctly)
- Sensitivity: P(Predict Y = 1 | Actual Y = 1)
- Specificity: P(Predict Y = 0 | Actual Y = 0)
- False negative rate: P(Predict Y = 0 | Actual Y = 1)
- False positive rate: P(Predict Y = 1 | Actual Y = 0)

Model 3: 

- Accuracy: P(Predict Y Correctly)
- Sensitivity: P(Predict Y = 1 | Actual Y = 1)
- Specificity: P(Predict Y = 0 | Actual Y = 0)
- False negative rate: P(Predict Y = 0 | Actual Y = 1)
- False positive rate: P(Predict Y = 1 | Actual Y = 0)

c. Imagine that we are a career center on a college campus and we want to use this model to help students that are looking for jobs. Consider the consequences of incorrectly predicting whether or not an individual will get a callback. What are the consequences of a false negative? What about a false positive? Which one is worse?


### Reflection

What are some similarities and differences between how we interpret and evaluate linear and logistic regression models?

> **Response:** Put your response here.


## Extra practice {-}


Build 2 models that you'll use below, the first of which you explored in the previous activity:

```{r}
# Load data
climbers <- read.csv("https://mac-stat.github.io/data/climbers_sub.csv") %>% 
  select(peak_name, success, age, oxygen_used, year, season)

# Build 2 models
climb_model_1 <- glm(success ~ age, climbers, family = "binomial")
climb_model_2 <- glm(success ~ age + oxygen_used, climbers, family = "binomial")
```


### Exercise 7: `climb_model_2`

a. `climb_model_1` uses only `age` as a predictor:

```{r}
# Plot climb_model_1
climbers %>% 
  ggplot(aes(y = as.numeric(success), x = age)) + 
  geom_smooth(method = "glm", se = FALSE, method.args = list(family = "binomial")) +
  labs(y = "probability of success")
```

In contrast, the *multiple* logistic regression model `climb_model_2` includes *2* predictors of `success`: `age` *and* `oxygen_used`. In comparison to the above plot of `climb_model_1`, what do you anticipate `climb_model_2` looking like?


b. Check your intuition! Summarize, in words, what you learn about the relationship of success with age and oxygen use from the below plots.

```{r}
# Plot climb_model_2
climbers %>% 
  ggplot(aes(y = as.numeric(success), x = age, color = oxygen_used)) + 
  geom_smooth(method = "glm", se = FALSE, method.args = list(family = "binomial")) +
  labs(y = "probability of success")
```

```{r}
# Just for fun zoom out!
climbers %>% 
  ggplot(aes(y = as.numeric(success), x = age, color = oxygen_used)) + 
  geom_smooth(method = "glm", se = FALSE, method.args = list(family = "binomial"),
              fullrange = TRUE) +
  labs(y = "probability of success") + 
  lims(x = c(-300, 400))
```

c. The estimated model formula is:

log(odds of `success` | `age`, `oxygen_used`) = -0.520 - 0.022 `age` + 2.896 `oxygen_usedTRUE`

```{r}
# Get the model summary table
coef(summary(climb_model_2))

# Exponentiated coefficients
exp(coef(climb_model_2))
```

Interpret the `age` coefficient on the odds scale. Don't forget to "control for..."!


d. Interpret the `oxygen_usedTRUE` coefficient on the odds scale. Don't forget to "control for..."!


### Exercise 8: Model evaluation using plots

Recall our 2 models of climbing success:

model            predictors
---------------- -----------------------
`climb_model_1`  age
`climb_model_2`  age, oxygen_used
    
Let's compare the quality of these models.
To begin, the code below calculates the probability of success for every climber in our dataset using `climb_model_1`, and stores it as the `.fitted` variable:

```{r}
climbers %>% 
  mutate(.fitted = predict(climb_model_1, newdata = ., type = "response")) %>% 
  head()
```

A *visual* way to evaluate the quality of our model is to compare the probability of success it assigns to climbers that were and were not actually successful:

```{r}
# Compare the climb_model_1 probability of success calculations for those that were & weren't successful
climbers %>% 
  mutate(.fitted = predict(climb_model_1, newdata = ., type = "response")) %>% 
  ggplot(aes(x = success, y = .fitted)) +
  geom_boxplot()
```

Similarly, check out the probability calculations using `climb_model_2`:

```{r}
# Compare the climb_model_2 probability of success calculations for those that were & weren't successful
climbers %>% 
  mutate(.fitted = predict(climb_model_2, newdata = ., type = "response")) %>% 
  ggplot(aes(x = success, y = .fitted)) +
  geom_boxplot()
```


a. Summarize what you learn from the plots about the ability of `climb_model_1` and `climb_model_2` to differentiate between those who were and were not actually successful. Comment on which model seems better, and why.

b. Focus on the plot for `climb_model_2`. Sometimes we may need to go beyond the probability calculations and make a *binary* prediction about whether a climber will succeed or fail. Suppose we used a 0.5 probability threshold: If the probability of success exceeds 0.5, then predict success. Otherwise, predict failure.
In pictures:

```{r}
climbers %>% 
  mutate(.fitted = predict(climb_model_2, newdata = ., type = "response")) %>% 
  ggplot(aes(x = success, y = .fitted)) +
  geom_boxplot() + 
  geom_hline(yintercept = 0.5, color = "red")
```

Discuss:

- What do you think of the 0.5 threshold? Would it lead to good predictions? Would it do better at predicting when climbers will succeed or when they'll fail?

- If we used the 0.5 threshold, what would the specificity be: less than 25%, between 25% & 50%, between 50% & 75%, or above 75%? Answer this using only the plot.

- If we used the 0.5 threshold, what would the sensitivity be: less than 25%, between 25% & 50%, between 50% & 75%, or above 75%? Answer this using only the plot.

- If you don't like the 0.5 threshold, what do you think would be a better option? Why?


### Exercise 9: Model evaluation using evaluation metrics

Let's explore using a probability threshold of 0.25 (25%) to make a binary prediction of success / failure for each climber in our dataset:

- If a model's probability of success is greater than or equal to 25%, we'll predict that they'll succeed.
- If the probability is below 25%, we'll predict that they'll fail.

We can visualize this threshold on our predicted probability boxplots for our 2 models:

```{r}
climbers %>% 
  mutate(.fitted = predict(climb_model_1, newdata = ., type = "response")) %>% 
  ggplot(aes(x = success, y = .fitted)) +
  geom_boxplot() + 
  geom_hline(yintercept = 0.25, color = "red")
climbers %>% 
  mutate(.fitted = predict(climb_model_2, newdata = ., type = "response")) %>% 
  ggplot(aes(x = success, y = .fitted)) +
  geom_boxplot() + 
  geom_hline(yintercept = 0.25, color = "red")
```

To evaluate the quality of the resulting predictions, we essentially want to count up how many climbers fell on the "correct" side of the threshold, both overall and within both groups (successful and unsuccessful climbers).
Let's start with `climb_model_1`.
Using the 0.25 threshold, this model predicted success for the first 6 climbers -- 5 of these predictions were correct (5 of the 6 were actually successful) and 1 was wrong:

```{r}
threshold <- 0.25
climbers %>% 
  mutate(.fitted = predict(climb_model_1, newdata = ., type = "response")) %>% 
  mutate(predictSuccess = .fitted >= threshold) %>% 
  select(success, age, oxygen_used, .fitted, predictSuccess) %>% 
  head()
```

But that's just the first 6 climbers!
We can build a `tabyl()` to compare the observed `success` to the `predictSuccess` for *all* climbers in our dataset:

```{r}
# Rows = actual / observed success (FALSE or TRUE)
# Columns = predicted success (FALSE or TRUE)
library(janitor)
climbers %>% 
  mutate(.fitted = predict(climb_model_1, newdata = ., type = "response")) %>% 
  mutate(predictSuccess = .fitted >= threshold) %>% 
  tabyl(success, predictSuccess) %>%
  adorn_totals(c("row", "col"))
```

a. Use the confusion matrix to prove that `climb_model_1` has an **overall accuracy** rate of 40%.
That is:

P(Predict Y Correctly) = 0.40

HINT: Divide the total number of correct predictions by the total number of climbers.    

```{r}

```


b. `climb_model_1` has a **sensitivity** (aka **true positive rate**) of 99%, hence a **false negative rate** of 1%

- Sensitivity: P(Predict Y = success | Actual Y = success) = 0.99   
    Among (conditioned on) successful climbers, the probability of the model correctly predicting success is 99%.

- False negative rate: P(Predict Y = failure | Actual Y = success) = 0.01
    Among (conditioned on) successful climbers, the probability of the model INcorrectly predicting failure is 1%.

**Prove** these calculations using the confusion matrix:

```{r}

```


c. `climb_model_1` has a **specificity** (aka **true negative rate**) of only 2%, hence a **false positive rate** of 98%:

- Specificity: P(Predict Y = failure | Actual Y = failure) = 0.02   
    Among (conditioned on) unsuccessful climbers, the probability of the model correctly predicting failure is only 2%.

- False positive rate: P(Predict Y = success | Actual Y = failure) = 0.98
    Among (conditioned on) unsuccessful climbers, the probability of the model INcorrectly predicting success is 98%.

**Prove** these calculations using the confusion matrix:

```{r}

```


### Exercise 10: More confusion matrices

Obtain the confusion matrix, overall accuracy, sensitivity (true positive rate), and specificity (true negative rate) for `climb_model_2` (with a 0.25 probability threshold).
NOTE: The correct metrics are outlined in the next exercise if you want to check your work.    

```{r}
# Confusion matrix

# Overall accuracy

# Sensitivity

# Specificity

```


### Exercise 11: Model comparison

We've now considered 2 models of climbing success, and evaluated their predictive performance using a 0.25 probability threshold:

model  predictors         overall  sensitivity  specificity
------ ----------------- -------- ------------ ------------
1      age                   0.40         0.99         0.02
2      age, oxygen_used      0.75         0.69         0.79


a. Fill in the blanks with the model number.

- Model ___ had the higher *overall* accuracy.    

- Model ___ was better at predicting when a climber *would* succeed.    
- Model ___ was better at predicting when a climber would *not* succeed.   


b. Imagine that *you* are planning an expedition.
Consider the consequences of incorrectly predicting whether or not you will be successful.
What are the consequences of a false negative?
What about a false positive?
Which one is worse?
With that in mind, which climbing model do you prefer?


## Done!

- Finalize your notes: (1) Render your notes to an HTML file; (2) Inspect this HTML in your Viewer -- check that your work translated correctly; and (3) Outside RStudio, navigate to your `inclass_activities` subfolder within your `stat155` folder and locate the HTML file -- you can open it again in your browser.
- Clean up your RStudio session: End the rendering process by clicking the 'Stop' button in the 'Background Jobs' pane.
- Check the solutions in the course website, at the bottom of the corresponding chapter.
- Work on homework and/or any extra practice exercises!