```r
library(tidymodels)
library(tidyverse)

# STEP 1: specify a linear regression model
lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

# STEP 2: variable recipe
# Add a pre-processing step that does PCA on the predictors
# num_comp is the number of PCs to keep (we need to tune it!)
pcr_recipe <- recipe(y ~ ., data = sample_data) %>%
  update_role(data_id, new_role = "id") %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = tune())

# STEP 3: workflow
pcr_workflow <- workflow() %>%
  add_recipe(pcr_recipe) %>%
  add_model(lm_spec)

# STEP 4: estimate multiple PCR models, trying out different numbers of PCs to keep
# For the range, the biggest number you can try is the number of predictors you started with
# Put the same number in levels
set.seed(___)
pcr_models <- pcr_workflow %>%
  tune_grid(
    grid = grid_regular(num_comp(range = c(1, ___)), levels = ___),
    resamples = vfold_cv(sample_data, v = 10),
    metrics = metric_set(mae)
  )
```
20 Principal Component Regression
Unsupervised & supervised learning are friends!
Settling In
- Sit with the same group as last class
- Hand in your Quiz 2 Revisions
- Prepare to take notes
- Catch up on any announcements you’ve missed on Slack
Announcements
Quiz 3 is coming up next week!
- Format: same as Quizzes 1 and 2
- Content: cumulative, but focus on unsupervised learning
- Study Tips:
- Create a study guide using the “Learning Goals” page on the course website
- Fill out the STAT 253 Concept Maps (slides 9–11)
- Work on Group Assignment 3
- Review old CPs, HWs, and in-class exercises
- Come to office hours with questions!
Notes: PC Regression
Context
We’ve been distinguishing 2 broad areas in machine learning:
- supervised learning: when we want to predict / classify some outcome y using predictors x
- unsupervised learning: when we don’t have any outcome variable y, only features x
- clustering: examine structure among the rows with respect to x
- dimension reduction: examine & combine structure among the columns x
BUT sometimes we can combine these ideas.
Combining Forces: Clustering + Regression
- Use dimension reduction to visualize / summarize lots of features and notice interesting groups.
  - Example: many physical characteristics of penguins, many characteristics of songs, etc.
- Use clustering to identify interesting groups.
  - Example: types (species) of penguins, types (genres) of songs, etc.
- These groups might then become our \(y\) outcome variable in a future analysis.
  - Example: classify new songs as one of the “genres” we identified.
EXAMPLE: K-means clustering + Classification of news articles
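To make this combination concrete, here is a minimal sketch (not from the original notes) that runs PCA on a hypothetical feature_data data frame of numeric features, then clusters the observations in PC space; the choices of 2 PCs and 3 clusters are arbitrary placeholders.

```r
# A minimal sketch: dimension reduction + clustering.
# Assumes a hypothetical data frame feature_data with only numeric columns.
library(tidyverse)

# Step 1: PCA to compress the features (scale. = TRUE standardizes them first)
pca_results <- prcomp(feature_data, scale. = TRUE)

# Step 2: cluster the observations using only the first 2 PC scores
pc_scores <- as.data.frame(pca_results$x[, 1:2])
set.seed(253)
kmeans_results <- kmeans(pc_scores, centers = 3)

# Step 3: visualize the groups in PC space; these cluster labels could later
# serve as a y outcome in a classification model
pc_scores %>%
  mutate(cluster = factor(kmeans_results$cluster)) %>%
  ggplot(aes(x = PC1, y = PC2, color = cluster)) +
  geom_point()
```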
Dimension Reduction + Regression: Dealing with lots of predictors
Suppose we have an outcome variable \(y\) (quantitative OR categorical) and lots of potential predictors \(x_1, x_2, ..., x_p\).
Perhaps we even have more predictors than data points (\(p > n\))!
This idea of measuring lots of things on a sample is common in genetics, image processing, video processing, or really any scenario where we can grab data on a bunch of different features at once.
For simplicity, computational efficiency, avoiding overfitting, etc., it might benefit us to simplify our set of predictors.
There are a few approaches:

- variable selection (e.g., using backward stepwise selection): Simply kick out some of the predictors. NOTE: This doesn’t work when \(p > n\).
- regularization (e.g., using LASSO; sketched below): Shrink the coefficients toward / to 0. NOTE: This sorta works when \(p > n\).
- feature extraction (e.g., using PCA): Identify & utilize only the most salient features of the original predictors. Specifically, combine the original, possibly correlated predictors into a smaller set of uncorrelated predictors which retain most of the original information. NOTE: This does work when \(p > n\).
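For comparison, here is a hedged sketch of the regularization route in the same tidymodels style as the PCR code at the top of these notes; the penalty range and number of grid levels are arbitrary placeholders, and sample_data is the same hypothetical dataset.

```r
# A sketch of the regularization alternative (LASSO), not part of the
# original notes. Assumes the same sample_data used in the PCR code
# (add update_role() for data_id as in the PCR recipe if needed).
library(tidymodels)

lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%  # mixture = 1 is LASSO
  set_mode("regression") %>%
  set_engine("glmnet")

lasso_recipe <- recipe(y ~ ., data = sample_data) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_predictors())  # glmnet expects standardized predictors

lasso_workflow <- workflow() %>%
  add_recipe(lasso_recipe) %>%
  add_model(lasso_spec)

set.seed(253)
lasso_models <- lasso_workflow %>%
  tune_grid(
    grid = grid_regular(penalty(range = c(-5, 1)), levels = 30),  # 10^-5 to 10^1
    resamples = vfold_cv(sample_data, v = 10),
    metrics = metric_set(mae)
  )
```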
Principal Component Regression (PCR)
Step 1
Ignore \(y\) for now. Use PCA to combine the \(p\) original, correlated predictors \(x\) into a set of \(p\) uncorrelated PCs.Step 2
Keep only the first \(k\) PCs which retain a “sufficient” amount of information from the original predictors.Step 3
Model \(y\) by these first \(k\) PCs.
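To make the three steps concrete, here is a minimal by-hand sketch (not the course’s required approach) using base R’s prcomp() and lm(). It assumes sample_data has only numeric predictors plus the quantitative outcome y, and k = 3 is an arbitrary placeholder for the number of PCs to keep.

```r
# A by-hand sketch of the three PCR steps (not from the notes).
# Assumes sample_data has a quantitative outcome y and only numeric
# predictors (drop any id column first).
library(tidyverse)

# Step 1: PCA on the predictors only, ignoring y
pca_results <- sample_data %>%
  select(-y) %>%
  prcomp(scale. = TRUE)

# Step 2: keep only the first k PCs (k = 3 is a placeholder; in practice,
# tune k or inspect the variance explained by each PC)
k <- 3
kept_pcs <- as.data.frame(pca_results$x[, 1:k])

# Step 3: regress y on those k PCs
pcr_fit <- lm(sample_data$y ~ ., data = kept_pcs)
summary(pcr_fit)
```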
PCR vs Partial Least Squares
When combining the original predictors \(x\) into a smaller set of PCs, PCA ignores \(y\). Thus PCA might not produce the strongest possible predictors of \(y\).
Partial least squares provides an alternative.
Like PCA, it combines the original predictors into a smaller set of uncorrelated features, but considers which predictors are most associated with \(y\) in the process.
Section 6.3.2 of ISLR provides an optional overview.
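If you want to try this, recipes provides step_pls(), which (unlike step_pca()) is told about the outcome. Here is a hedged sketch, assuming the same sample_data setup as the PCR code at the top of these notes; note that step_pls() may require the mixOmics package to be installed.

```r
# A sketch of partial least squares via recipes::step_pls(), not from the
# notes. Same setup as the PCR recipe; only the feature-extraction step changes.
library(tidymodels)

pls_recipe <- recipe(y ~ ., data = sample_data) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_pls(all_predictors(), outcome = "y", num_comp = tune())

# From here, build a workflow with lm_spec and tune num_comp exactly as in
# the PCR code.
```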
Small Group Discussion
EXAMPLE 1
For each scenario below, indicate which would (typically) be preferable in modeling y by a large set of predictors x: (1) PCR; or (2) variable selection or regularization.
- We have more potential predictors than data points (\(p > n\)).
- It’s important to understand the specific relationships between y and x.
- The x are NOT very correlated.
Exercises
- Make the most of your work time in class!
- These exercises are on HW7.
- IMPORTANT: Remember to set.seed(253) on any exercises that involve randomness.
- Save at least 15 minutes to get started on Group Assignment 3.
Group Assignment 3
Before you leave class today:
- Get data on your local computers
- Start exploring the data:
- Familiarize yourself with the variables
- Create initial visualizations
- Determine if any data cleaning is needed (remove, modify, or create variables; remove or fill in missing values; remove observations)
- Make a plan:
- How to decide which features to use, how many and which algorithms to try, how to evaluate each algorithm
- Set up communication avenues for out-of-class discussions (slack channel? in-person meetings? etc.)
- Divide / delegate leadership on tasks
Future Reference
Notes: R code
Suppose we have a set of sample_data with multiple predictors x, a quantitative outcome y, and (possibly) a column named data_id which labels each data point. We could adjust this code if y were categorical.

RUN THE PCR algorithm (the full code block appears at the top of these notes)
FOLLOW-UP
Processing and applying the results is the same as for our other tidymodels algorithms!
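As a reminder, that follow-up might look like the sketch below, using standard tune and workflows functions; the object names match the code at the top of these notes, and new_data stands in for a hypothetical data frame of new observations.

```r
# A sketch of the standard tidymodels follow-up (not from the notes).
library(tidymodels)

# Compare the CV MAE across the numbers of PCs we tried
pcr_models %>% collect_metrics()
autoplot(pcr_models)

# Pick the best number of PCs and finalize the workflow
best_num_comp <- select_best(pcr_models, metric = "mae")
final_pcr_fit <- pcr_workflow %>%
  finalize_workflow(best_num_comp) %>%
  fit(data = sample_data)

# Predict for new observations (new_data is a hypothetical data frame with
# the same predictors as sample_data)
final_pcr_fit %>% predict(new_data = new_data)
```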
Solutions
Small Group Discussion
EXAMPLE 1
Solution:
- PCR: we (typically) can’t do variable selection or regularization when \(p > n\).
- Variable selection or regularization: the PCs lose the original meaning of the predictors, so they’re preferable when we need to understand the specific relationships between y and x.
- Variable selection or regularization: when the x are NOT very correlated, PCR wouldn’t simplify things much (we’d need a lot of PCs to retain the original information).
Exercises
Solutions will not be provided. These exercises are part of your homework this week.
Wrapping Up
Upcoming Due Dates:
- HW7: due TOMORROW
- Quiz 3: next Thursday
- Group Assignment 3: next Friday
- Final Learning Reflection: finals week