
Brianna Heggeseth

As we gather

  Sit in the randomly assigned groups. Introduce yourselves and choose a team name (you will need this later).


Small Group Discussion

  • Let’s build and evaluate a predictive model of an adult’s height (\(y\)) using some predictors \(x_i\) (e.g., age, weight, etc.).
  • Each group will be given a different sample of 40 adults.
  • Start by predicting height (in) using hip circumference (cm).
  • Evaluate the model on your sample.

Be prepared to share your answers to:

  • How good is your simple model?

  • What would happen if we added more predictors?

In-Class Activity - Part 1

Your group has 5 minutes to complete exercises 1 and 2 (choosing one of three models).

Reflection / Reactions to the Group Choices?

Now work on exercises 3 - 5.

Notes - Overfitting

When we add more and more predictors into a model, it can become overfit to the noise in our sample data:

  • our model loses the broader trend / big picture
  • thus it does not generalize to new data
  • thus it gives bad predictions for new data points and a bad understanding of the relationships among the variables
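
To see this behavior concretely, here is a minimal sketch in base R. It uses simulated data rather than the class samples: the data frame `sim_data`, the helper names `mae_vec` and `compare_mae`, and the choice of 30 predictors are all assumptions made for illustration. Only `x1` is truly related to `y`; the other predictors are pure noise. Models are built on the first 40 rows and then checked against 40 rows they have never seen.

```r
# Simulated data (an assumption, not the class data): y depends only on x1;
# the remaining 29 columns are pure noise.
set.seed(253)
n <- 80
p <- 30
x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
sim_data <- data.frame(y = 65 + 2 * x[, "x1"] + rnorm(n, sd = 3), x)

# Build models on the first 40 rows; treat the last 40 rows as new data
train <- sim_data[1:40, ]
test  <- sim_data[41:80, ]

# Mean absolute error: the average size of the prediction errors
mae_vec <- function(truth, estimate) mean(abs(truth - estimate))

# Fit y ~ x1 + ... + xk on the first 40 rows, then measure the MAE
# on those same rows and on the 40 new rows
compare_mae <- function(k) {
  model_formula <- reformulate(paste0("x", 1:k), response = "y")
  model <- lm(model_formula, data = train)
  c(
    predictors = k,
    train_mae  = mae_vec(train$y, predict(model, newdata = train)),
    test_mae   = mae_vec(test$y, predict(model, newdata = test))
  )
}

# One row per model size
results <- t(sapply(c(1, 5, 15, 30), compare_mae))
print(round(results, 2))
```

In a run like this, the error measured on the rows used to build the model typically keeps shrinking as noise predictors are added, while the error on the new rows does not improve and tends to get worse.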

Notes - Overfitting Prevention

Training and Testing

  • In-sample metrics, i.e. measures of how well the model performs on the same sample data that we used to build it, tend to be overly optimistic and can lead us to choose overfit models.
  • Instead, we should build and evaluate, or train and test, our model using different data.

Notes - R Code

Split the sample data into training and test sets

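# Load the tidymodels packages (initial_split(), linear_reg(), mae(), etc.)
library(tidymodels)

# Set the random number seed so the random split is reproducible
set.seed(___)
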
# Split the sample_data
# "prop" is the proportion of data assigned to the training set
# it must be some number between 0 and 1
data_split <- initial_split(sample_data, strata = y, prop = ___)

# Get the training data from the split
data_train <- data_split %>% 
  training()

# Get the testing data from the split
data_test <- data_split %>% 
  testing()

Notes - R Code

Build a training model

# STEP 1: model specification
lm_spec <- linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")

# STEP 2: model estimation using the training data
model_train <- lm_spec %>% 
  fit(y ~ x1 + x2, data = data_train)

Notes - R Code

Use the training model to make predictions for the test data

# Make predictions
model_train %>% 
  augment(new_data = data_test)

Evaluate the training model using the test data

# Calculate the test MAE
model_train %>% 
  augment(new_data = data_test) %>% 
  mae(truth = y, estimate = .pred)
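
As a filled-in illustration of the whole workflow, the sketch below runs the same steps on R's built-in `mtcars` data. Everything specific here is an assumption chosen for the example, not part of the class activity: `mtcars` stands in for `sample_data`, `mpg` for `y`, `wt` and `hp` for `x1` and `x2`, and the seed and `prop = 0.8` are arbitrary. The `strata` argument is left out because this dataset has only 32 rows.

```r
library(tidymodels)

# mtcars stands in for sample_data; mpg, wt, and hp stand in for y, x1, and x2
set.seed(253)

# Assign roughly 80% of the rows to the training set and the rest to the test set
car_split <- initial_split(mtcars, prop = 0.8)
car_train <- car_split %>% training()
car_test  <- car_split %>% testing()

# STEP 1: model specification
lm_spec <- linear_reg() %>% 
  set_mode("regression") %>% 
  set_engine("lm")

# STEP 2: model estimation using the training data only
car_model <- lm_spec %>% 
  fit(mpg ~ wt + hp, data = car_train)

# In-sample (training) MAE
train_mae <- car_model %>% 
  augment(new_data = car_train) %>% 
  mae(truth = mpg, estimate = .pred)

# Out-of-sample (test) MAE
test_mae <- car_model %>% 
  augment(new_data = car_test) %>% 
  mae(truth = mpg, estimate = .pred)

# Stack the two results to compare them side by side
bind_rows(train = train_mae, test = test_mae, .id = "data")
```

Comparing the two rows shows how far the in-sample MAE is from the test MAE for this particular split. With a test set this small, the test MAE can change noticeably from one seed to another.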

In-Class Activity - Part 2

