CONTEXT
world = supervised learning
We want to model some output variable \(y\) using a set of potential predictors \((x_1, x_2, ..., x_p)\).
task = CLASSIFICATION
\(y\) is categorical and binary
(parametric) algorithm
logistic regression
Let \(y\) be a binary categorical response variable:
\[y = \begin{cases} 1 & \; \text{ if event happens} \\ 0 & \; \text{ if event doesn't happen} \\ \end{cases}\]
Further define
\[\begin{split} p &= \text{ probability event happens} \\ 1-p &= \text{ probability event doesn't happen} \\ \text{odds} & = \text{ odds event happens} = \frac{p}{1-p} \\ \end{split}\]
Then a logistic regression model of \(y\) by \(x\) is \[\begin{split} \log(\text{odds}) & = \beta_0 + \beta_1 x \\ \text{odds} & = e^{\beta_0 + \beta_1 x} \\ p & = \frac{\text{odds}}{\text{odds}+1} = \frac{e^{\beta_0 + \beta_1 x}}{e^{\beta_0 + \beta_1 x}+1} \\ \end{split}\]
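The three scales in the model (log(odds), odds, probability) can be converted with a few lines of code. This is a minimal sketch in Python; the function names are made up for illustration and are not part of the course materials.

```python
import math

def odds_from_log_odds(log_odds):
    """Convert log(odds) to odds: odds = e^(log odds)."""
    return math.exp(log_odds)

def prob_from_odds(odds):
    """Convert odds to a probability: p = odds / (odds + 1)."""
    return odds / (odds + 1)

def prob_from_log_odds(log_odds):
    """Chain the two conversions: log(odds) -> odds -> probability."""
    return prob_from_odds(odds_from_log_odds(log_odds))

# A log(odds) of 0 corresponds to odds of 1, hence a probability of 0.5
print(prob_from_log_odds(0.0))
```

Positive log(odds) values map to probabilities above 0.5, and negative values map to probabilities below 0.5.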
Coefficient interpretation
\[\begin{split} \beta_0 & = \text{ LOG(ODDS) when } x=0 \\ e^{\beta_0} & = \text{ ODDS when } x=0 \\ \beta_1 & = \text{ unit change in LOG(ODDS) per 1 unit increase in } x \\ e^{\beta_1} & = \text{ multiplicative change in ODDS per 1 unit increase in } x \\ \end{split}\]
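To see why \(e^{\beta_1}\) acts as a multiplicative change, one can compare the odds at \(x + 1\) to the odds at \(x\) under the model above (the notation \(\text{odds}(x)\) is introduced here only for this comparison):
\[\frac{\text{odds}(x+1)}{\text{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}\]
So the odds increase with \(x\) when \(e^{\beta_1} > 1\), decrease when \(e^{\beta_1} < 1\), and do not change when \(e^{\beta_1} = 1\).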
Let’s model RainTomorrow, whether or not it rains tomorrow in Sydney, by two predictors:
Humidity9am (% humidity at 9am today)
Sunshine (number of hours of bright sunshine today)
Check out & comment on the relationship of rain with these 2 predictors:
The logistic regression model is:
log(odds of rain) = -1.01 + 0.0260 Humidity9am - 0.313 Sunshine
Let’s interpret the Sunshine coefficient of -0.313:
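One way to work toward an interpretation of the Sunshine coefficient is to exponentiate it, moving from the log(odds) scale to the odds scale. The sketch below is in Python and uses only the coefficient reported above; the variable names are illustrative.

```python
import math

# Sunshine coefficient from the fitted model, on the log(odds) scale
beta_sunshine = -0.313

# Multiplicative change in the odds of rain per extra hour of sunshine
odds_multiplier = math.exp(beta_sunshine)

# The same change expressed as a percentage
percent_change = (odds_multiplier - 1) * 100

print(round(odds_multiplier, 2))  # roughly 0.73
print(round(percent_change))      # roughly -27
```

Read this way, when controlling for Humidity9am, each additional hour of bright sunshine is associated with the odds of rain being multiplied by roughly 0.73, a decrease of about 27%.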
Suppose there’s 99% humidity at 9am today and only 2 hours of bright sunshine.
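The calculation for this day can be sketched in Python by plugging the values into the model formula and converting the result to a probability. The function name is made up for illustration, and classifying as rain when the probability is at least 0.5 is an assumption about how ties at the threshold are handled.

```python
import math

def log_odds_rain(humidity_9am, sunshine):
    """log(odds of rain) from the fitted model in these notes."""
    return -1.01 + 0.0260 * humidity_9am - 0.313 * sunshine

# 99% humidity at 9am, 2 hours of bright sunshine
log_odds = log_odds_rain(99, 2)   # about 0.938
odds = math.exp(log_odds)         # about 2.55
prob = odds / (odds + 1)          # about 0.72

# Assumed rule: predict rain when the probability is at least 0.5
prediction = "rain" if prob >= 0.5 else "no rain"

print(round(log_odds, 3), round(odds, 2), round(prob, 2), prediction)
```

Since the probability of rain is above 0.5, this day would be classified as rain.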
We used a simple classification rule above with a probability threshold of c = 0.5: predict rain if the probability of rain is at least c, and predict no rain otherwise.
Let’s translate this into a classification rule that partitions the data points into rain / no rain predictions based on the predictor values.
What do you think this classification rule / partition will look like?
Work
Identify the pairs of humidity and sunshine values for which the probability of rain is 0.5, hence the log(odds of rain) is 0.
Set the log odds to 0:
log(odds of rain) = -1.01 + 0.0260 Humidity9am - 0.313 Sunshine = 0
Solve for Humidity9am:
Move the constant and the Sunshine term to the other side:
0.0260 Humidity9am = 1.01 + 0.313 Sunshine
Divide both sides by 0.0260:
Humidity9am = (1.01 / 0.0260) + (0.313 / 0.0260) Sunshine
Humidity9am = 38.846 + 12.038 Sunshine
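The boundary line turns the rule into a direct comparison of predictor values. Because the Humidity9am coefficient is positive, the log(odds of rain) is above 0 for humidity values above the line. A minimal Python sketch follows; the function names are hypothetical, and the humidity and sunshine values in the usage lines are made-up inputs, not observations from the data.

```python
def humidity_boundary(sunshine):
    """Humidity9am at which log(odds of rain) = 0 for a given Sunshine value."""
    return 1.01 / 0.0260 + (0.313 / 0.0260) * sunshine

def classify(humidity_9am, sunshine):
    """Predict rain for points on or above the boundary line (assumed tie rule)."""
    if humidity_9am >= humidity_boundary(sunshine):
        return "rain"
    return "no rain"

# Intercept and slope of the boundary, matching the algebra above
print(round(humidity_boundary(0), 3))                         # 38.846
print(round(humidity_boundary(1) - humidity_boundary(0), 3))  # 12.038

# Made-up inputs for illustration
print(classify(99, 2))  # rain: 99 is above the boundary of about 62.9
print(classify(50, 5))  # no rain: 50 is below the boundary of about 99.0
```

The more sunshine there is today, the higher the humidity needs to be before the rule predicts rain.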
Let’s visualize the partition, hence the classification regions, defined by our classification rule:
Use our classification rule to predict rain / no rain for the following days:
No
Work on exercises 1-8 [optional 9-10] with your group.
Reflection & Review
Group Assignment
Upcoming due dates