eval = FALSE chunks. If you don’t like these, you can do a find-and-replace to remove them, but you won’t be able to knit your document right away.
CONTEXT
world = supervised learning
We want to model some output variable \(y\) using a set of potential predictors \((x_1, x_2, ..., x_p)\).
task = regression
\(y\) is quantitative
model = linear regression
We’ll assume that the relationship between \(y\) and (\(x_1, x_2, ..., x_p\)) can be represented by
\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p + \varepsilon\]
Least Absolute Shrinkage and Selection Operator
Use the LASSO algorithm to help us regularize and select the “best” predictors \(x\) to use in a predictive linear regression model of \(y\):
\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p\]
Idea
Algorithm Criterion
Identify the model coefficients \(\hat{\beta}_1, \hat{\beta}_2, ... \hat{\beta}_p\) that minimize the penalized residual sum of squares:
\[RSS + \lambda \sum_{j=1}^p \vert \hat{\beta}_j\vert = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \vert \hat{\beta}_j\vert\]
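where \(\hat{y}_i\) is the model’s prediction of \(y_i\) and \(\lambda \ge 0\) is the penalty tuning parameter.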
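To make the criterion concrete, here is a minimal base R sketch that evaluates the penalized residual sum of squares for one candidate set of coefficients. The function name penalized_rss, the toy data, and the coefficient values are all hypothetical and chosen only for illustration. The intercept is left out of the penalty, matching the sum over \(j = 1, ..., p\) in the criterion.

```r
# Hypothetical helper: penalized RSS for a given set of coefficients and lambda.
# beta0 is the intercept (not penalized); beta holds the p slope coefficients.
penalized_rss <- function(y, X, beta0, beta, lambda) {
  y_hat <- beta0 + as.vector(X %*% beta)  # predictions for each case
  rss <- sum((y - y_hat)^2)               # residual sum of squares
  penalty <- lambda * sum(abs(beta))      # LASSO penalty on the slopes
  rss + penalty
}

# Made-up toy data: n = 4 cases, p = 2 predictors
X <- matrix(c(1, 2, 3, 4,
              0, 1, 0, 1), ncol = 2)
y <- c(2, 4, 6, 8)

# These coefficients predict y perfectly, so RSS = 0
no_penalty <- penalized_rss(y, X, beta0 = 0, beta = c(2, 0), lambda = 0)
with_penalty <- penalized_rss(y, X, beta0 = 0, beta = c(2, 0), lambda = 0.5)

no_penalty    # 0: RSS only
with_penalty  # 1: RSS of 0 plus 0.5 * (|2| + |0|)
```

The same coefficients get a worse score once \(\lambda > 0\), because their size now counts against them.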
Discuss basic understanding from the video to help each other clear up concepts.
EXAMPLE 1: LASSO vs other algorithms for building linear regression models
EXAMPLE 2: LASSO tuning
We have to pick a \(\lambda\) penalty tuning parameter for our LASSO model. What’s the impact of \(\lambda\)?
When \(\lambda\) is 0, …
As \(\lambda\) increases, the predictor coefficients ….
Goldilocks problem:
To decide between a LASSO that uses \(\lambda = 0.01\) vs \(\lambda = 0.1\) (for example), we can ….
Picking \(\lambda\)
We cannot know the “best” value for \(\lambda\) in advance. This varies from analysis to analysis.
We must try a reasonable range of possible values for \(\lambda\). This also varies from analysis to analysis.
In general, we have to use trial-and-error to identify a range that is…
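One way to set up a candidate range is to space the \(\lambda\) values evenly on a log scale, so that both very small and fairly large penalties get tried. The sketch below assumes an exponent range of -5 to 1 and 50 candidate values; both choices are illustrative assumptions, not recommendations, and would need adjusting for each analysis.

```r
# Assumed range: lambda from 10^-5 up to 10^1, evenly spaced on the log10 scale
lambda_grid <- 10^seq(-5, 1, length.out = 50)

length(lambda_grid)  # number of candidate values
range(lambda_grid)   # smallest and largest lambda tried
```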
Go to https://bcheggeseth.github.io/253_spring_2024/schedule.html
Open Rmd for today.
Go to > Exercises.
Instead of using fit_resamples to run CV, we’ll use tune_grid to tune the algorithm with CV.
set_engine('glmnet')
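The pieces named above might fit together roughly as follows. This is a sketch under stated assumptions, not the code from today's Rmd: it uses the built-in mtcars data with mpg as the outcome, 10-fold CV, MAE as the metric, and a penalty range of \(10^{-5}\) to \(10^{1}\), all chosen only for illustration. The object names (lasso_spec, lasso_recipe, and so on) are hypothetical. It requires the tidymodels and glmnet packages.

```r
library(tidymodels)  # the glmnet package must also be installed

set.seed(253)

# LASSO specification: mixture = 1 requests the LASSO penalty,
# and penalty = tune() marks lambda as the parameter to tune
lasso_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("glmnet") %>%
  set_args(mixture = 1, penalty = tune())

# Recipe: standardize the predictors so the penalty treats them comparably
lasso_recipe <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

lasso_workflow <- workflow() %>%
  add_recipe(lasso_recipe) %>%
  add_model(lasso_spec)

# tune_grid runs CV once per candidate lambda in the grid
# (penalty() ranges are given on the log10 scale)
lasso_results <- lasso_workflow %>%
  tune_grid(
    resamples = vfold_cv(mtcars, v = 10),
    grid = grid_regular(penalty(range = c(-5, 1)), levels = 30),
    metrics = metric_set(mae)
  )

# Pick the lambda with the lowest CV MAE and refit on the full data
best_penalty <- select_best(lasso_results, metric = "mae")
final_lasso <- lasso_workflow %>%
  finalize_workflow(best_penalty) %>%
  fit(data = mtcars)
```

Calling collect_metrics(lasso_results) or autoplot(lasso_results) afterwards shows how the CV error changes across the \(\lambda\) values that were tried.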
Upcoming due dates
Nothing due Thursday
Due next Tuesday: