MSCS Events
Thursday at 11:15am - MSCS Coffee Break
Next Tuesday 11:30am - 12:50pm - MSCS faculty listening session
We’ve been distinguishing 2 broad areas in machine learning: unsupervised learning (e.g., clustering and dimension reduction, with no designated outcome) and supervised learning (e.g., regression and classification, modeling an outcome \(y\)).
BUT sometimes we can combine these ideas.
Use dimension reduction to visualize / summarize lots of features and notice interesting groups.
Example: many physical characteristics of penguins, many characteristics of songs, etc
Use clustering to identify interesting groups.
Example: types (species) of penguins, types (genres) of songs, etc
Suppose we have an outcome variable \(y\) (quantitative OR categorical) and lots of potential predictors \(x_1, x_2, ..., x_p\).
Perhaps we even have more predictors than data points (\(p > n\))!
For simplicity, computational efficiency, avoiding overfitting, etc, it might benefit us to simplify our set of predictors.
There are a few approaches:
variable selection (eg: using backward stepwise)
Simply kick out some of the predictors. NOTE: This doesn’t work when \(p > n\).
regularization (eg: using LASSO)
Shrink the coefficients toward / to 0. NOTE: This does work when \(p > n\).
feature extraction (eg: using PCA)
Identify & utilize only the most salient features of the original predictors. Specifically, combine the original, possibly correlated predictors into a smaller set of uncorrelated predictors which retain most of the original information. NOTE: This does work when \(p > n\).
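To see why feature extraction survives \(p > n\), here’s a small NumPy sketch (the data, dimensions, and noise level are made up for illustration): 20 correlated predictors on only 10 data points collapse into uncorrelated PCs, the first couple of which retain nearly all of the information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n = 10 points with p = 20 correlated predictors (p > n):
# every column is a noisy combination of 2 underlying signals.
n, p = 10, 20
signals = rng.normal(size=(n, 2))
X = signals @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))

# PCA via SVD of the centered predictor matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt.T  # principal component scores

# The PCs are uncorrelated: their covariance matrix is diagonal
cov = np.cov(pcs, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print("PCs uncorrelated:", np.allclose(off_diag, 0, atol=1e-8))

# Proportion of variance explained: the first 2 PCs capture
# almost all of the information in the 20 original predictors
var_explained = S**2 / np.sum(S**2)
print("share from first 2 PCs:", round(var_explained[:2].sum(), 3))
```

Note that this works even though an ordinary least squares fit on all 20 predictors would be impossible with only 10 observations.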
PRINCIPAL COMPONENT REGRESSION (PCR)
Step 1
Ignore \(y\) for now. Use PCA to combine the \(p\) original, correlated predictors \(x_1, x_2, ..., x_p\) into a set of \(p\) uncorrelated PCs.
Step 2
Keep only the first \(k\) PCs which retain a “sufficient” amount of information from the original predictors.
Step 3
Model \(y\) by these first \(k\) PCs.
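The 3 steps above can be sketched in NumPy as follows (simulated toy data; the dimensions, \(k = 2\), and noise levels are made-up choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: n = 50 observations, p = 6 correlated predictors,
# and an outcome y driven by the predictors' shared signal.
n, p, k = 50, 6, 2
signal = rng.normal(size=(n, 1))
X = signal @ rng.normal(size=(1, p)) + 0.5 * rng.normal(size=(n, p))
y = 3.0 * signal[:, 0] + 0.1 * rng.normal(size=n)

# Step 1: PCA on the predictors (y is ignored here)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Step 2: keep only the first k PCs
Z = Xc @ Vt[:k].T  # n x k matrix of PC scores

# Step 3: ordinary least squares of y on the k PCs (plus intercept)
Z1 = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
y_hat = Z1 @ coef

# R^2 of the PCR fit: high, despite using only k of the p predictors
r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
print("PCR R^2:", round(r2, 3))
```

In practice \(k\) is a tuning parameter, typically chosen by cross-validation rather than fixed in advance.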
When combining the original predictors \(x\) into a smaller set of PCs, PCA ignores \(y\). Thus PCA might not produce the strongest possible predictors of \(y\).
Partial least squares provides an alternative.
Like PCA, it combines the original predictors into a smaller set of uncorrelated features, but considers which predictors are most associated with \(y\) in the process.
Chapter 6.3.2 in ISLR provides an optional overview.
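To make the contrast concrete, here’s a hedged NumPy sketch of just the *first* direction each method picks (the PLS weights here follow the spirit of the first NIPALS step, weighting each predictor by its covariance with \(y\); the simulated data are made up). PCA chases the high-variance predictors even though they’re unrelated to \(y\); PLS finds the one predictor that matters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: the 1st predictor drives y; the other columns have
# high variance but no relationship to y.
n, p = 2000, 5
X = rng.normal(size=(n, p))
X[:, 1:] *= 5.0  # high-variance, irrelevant predictors
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS direction: weight each predictor by its covariance
# with y, so predictors unrelated to y get little weight.
w_pls = Xc.T @ yc
w_pls /= np.linalg.norm(w_pls)

# First PCA direction: chosen to maximize predictor variance,
# ignoring y entirely.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
w_pca = Vt[0]

# PLS loads heavily on the relevant predictor; PCA nearly ignores it.
print("PLS weight on x1:", round(abs(w_pls[0]), 3))
print("PCA weight on x1:", round(abs(w_pca[0]), 3))
```

So in a setting like this, the first PC would be a weak predictor of \(y\), while the first PLS component would be a strong one.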
For each scenario below, indicate which approach would (typically) be preferable in modeling \(y\) by a large set of predictors \(x_1, x_2, ..., x_p\):
Upcoming due dates