Small Group Discussion: Model Evaluation Experiment.
Let’s build and evaluate a predictive model of an adult’s height (\(y\)) using some predictors \(x_i\) (e.g., age, hip circumference).
Each group will be given a different sample of 40 adults.
Start by predicting height (in inches) using hip circumference (in centimeters).
Evaluate the model on your sample.
Be prepared to share your answers to:
How good is your simple model?
What would happen if we added more predictors?
In-Class Activity - Part 1
Your group has 5 minutes to complete exercises 1 and 2 (choosing one of three models).
Reflection / Reactions to the Group Choices?
Now work on exercises 3 - 5.
Notes - Overfitting
When we add more and more predictors into a model, it can become overfit to the noise in our sample data:
our model captures the noise instead of the broader trend / big picture
thus it does not generalize to new data
thus it produces poor predictions and a misleading picture of the relationships in new data
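To see overfitting in miniature, here is a base-R sketch (not the course code; the data are simulated and the variable names are illustrative). We add 10 pure-noise predictors to an honest one-predictor model; the in-sample fit never gets worse, which is exactly why in-sample metrics reward overfitting.

```r
# Simulate 40 observations where y truly depends on x only
set.seed(1)
n <- 40
x <- runif(n)
y <- 2 * x + rnorm(n)
noise <- matrix(rnorm(n * 10), ncol = 10)    # 10 junk predictors
dat <- data.frame(y = y, x = x, noise)

small <- lm(y ~ x, data = dat)               # honest model
big   <- lm(y ~ ., data = dat)               # model with 10 useless extras
rss_small <- sum(residuals(small)^2)
rss_big   <- sum(residuals(big)^2)
rss_big <= rss_small                         # in-sample fit never gets worse
```

The bigger model looks better in-sample even though its extra predictors are meaningless; only evaluation on new data would reveal that.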
Notes - Overfitting Prevention
Training and Testing
In-sample metrics, i.e. measures of how well the model performs on the same sample data that we used to build it, tend to be overly optimistic and lead to overfitting.
Instead, we should build and evaluate, or train and test, our model using different data.
Notes - R Code
Split the sample data into training and test sets
# Set the random number seed
set.seed(___)

# Split the sample_data
# "prop" is the proportion of data assigned to the training set
# it must be some number between 0 and 1
data_split <- initial_split(sample_data, strata = y, prop = ___)

# Get the training data from the split
data_train <- data_split %>%
  training()

# Get the testing data from the split
data_test <- data_split %>%
  testing()
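If you are curious what the split is doing under the hood, here is a base-R sketch of an 80/20 split by random row sampling (using the built-in mtcars data for illustration). Note one difference: the tidymodels code above also stratifies on y, which plain sample() does not.

```r
# Base-R sketch of an 80/20 train/test split
set.seed(123)
n <- nrow(mtcars)                              # 32 rows in this built-in dataset
train_rows <- sample(n, size = floor(0.8 * n)) # random 80% of the row indices
data_train <- mtcars[train_rows, ]
data_test  <- mtcars[-train_rows, ]            # everything not in the training set
nrow(data_train)                               # 25 rows train, 7 rows test
```

Every row lands in exactly one of the two sets, so the test set is genuinely new data from the model's point of view.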
Notes - R Code
Build a training model
# STEP 1: model specification
lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

# STEP 2: model estimation using the training data
model_train <- lm_spec %>%
  fit(y ~ x1 + x2, data = data_train)
Notes - R Code
Use the training model to make predictions for the test data
# Make predictions
model_train %>%
  augment(new_data = data_test)
Evaluate the training model using the test data
# Calculate the test MAE
model_train %>%
  augment(new_data = data_test) %>%
  mae(truth = y, estimate = .pred)
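The same train / predict / evaluate loop can be written in base R, which makes clear what mae() computes: the average absolute gap between the true outcomes and the predictions on the held-out rows. A sketch using the built-in mtcars data (the variables here are illustrative, not the course data):

```r
# Split, fit on training rows only, then evaluate on the held-out rows
set.seed(456)
n <- nrow(mtcars)
train_rows <- sample(n, size = floor(0.8 * n))
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

fit   <- lm(mpg ~ wt + hp, data = train)   # model never sees the test rows
preds <- predict(fit, newdata = test)      # predictions for the test rows
test_mae <- mean(abs(test$mpg - preds))    # test MAE, computed by hand
test_mae
```

Because the test rows played no role in fitting, this MAE is an honest estimate of how far off the model's predictions will be on new data.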
Work through the exercises on the cars data as a group.
Same directions as before
Be kind to yourself
Be kind to each other & collaborate
Ask me questions as I move around the room.
After Class
Finishing the activity
If you didn’t finish the activity, no problem! Be sure to complete the activity outside of class, review the solutions in the online manual, and ask any questions on Slack or in office hours.
Re-organize and review your notes to help deepen your understanding, solidify your learning, and make homework go more smoothly!
An R code video, posted on today’s section on Moodle, talks through the new code. This video is OPTIONAL. Decide what’s right for you.
Continue to check in on Slack. I’ll be posting announcements there from now on.
Upcoming due dates
Tuesday, 10 minutes before your section: Checkpoint 3. There are two (short) videos to watch in advance.
Thursday 2/1: Homework 2
Start today, even if you just review the directions and scan the exercises. You will be sad if you start too late; the homework is not designed to be done in one sitting.
Using Slack, invite others to work on homework with you.