Topic 10 Unit 1-3 Review




Review & Reflection

STAT 253 is a survey course of statistical machine learning techniques and concepts. It’s important to continuously reflect on these techniques and concepts and on how they fit together. Though you won’t hand anything in or work on this in class today, you’re strongly encouraged to complete this activity. This material is designed to help you reflect upon:

  • ML concepts
    • enduring, big picture concepts
    • technical concepts
    • tidymodels code
  • Your progress toward…
    • engagement
    • collaboration
    • preparation (checkpoints)
    • exploration (homework)

Find and make a copy of the following two resources.

You’ll be given some relevant prompts below, but you should use these materials in whatever way suits you! Take notes, add more content, rearrange, etc.



STAT 253 concept maps

Mark up slides 1–4 of the concept map with respect to the prompts below. Much of this overlaps with Homework 4.

Enduring, big picture concepts

IMPORTANT to your learning: Respond in your own words.

  • When do we perform a supervised vs unsupervised learning algorithm?
  • Within supervised learning, when do we use a regression vs a classification algorithm?
  • What is the importance of “model evaluation” and what questions does it address?
  • What is “overfitting” and why is it bad?
  • What is “cross-validation” and what problem is it trying to address? (See the code sketch after this list.)
  • What is the “bias-variance tradeoff”?
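
To make the overfitting and cross-validation prompts concrete, here is a minimal tidymodels sketch, assuming a hypothetical data frame my_data with a quantitative outcome y (and the kknn package installed). It compares a very flexible model’s in-sample error to its 10-fold CV error; the CV error is the more honest estimate of how the model would do on new data.

  library(tidymodels)

  # A very flexible model: 1-nearest-neighbor regression
  knn_spec <- nearest_neighbor(neighbors = 1) %>%
    set_mode("regression") %>%
    set_engine("kknn")

  knn_wf <- workflow() %>%
    add_formula(y ~ .) %>%
    add_model(knn_spec)

  # In-sample (training) error: overly optimistic for flexible models
  knn_wf %>%
    fit(data = my_data) %>%
    augment(new_data = my_data) %>%
    rmse(truth = y, estimate = .pred)

  # 10-fold CV error: estimates performance on data the model did not see
  set.seed(253)
  knn_wf %>%
    fit_resamples(resamples = vfold_cv(my_data, v = 10)) %>%
    collect_metrics()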



Technical concepts

On page 2, identify some general themes for each model algorithm listed in the left-hand table:

  • What’s the goal?
  • Is the algorithm parametric or nonparametric?
  • Does the algorithm have any tuning parameters? What are they, how do we tune them, and how is this a Goldilocks problem? (See the tuning sketch after this list.)
  • What are the key pros & cons of the algorithm?
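
To see the Goldilocks idea in code, here is a hedged sketch of tuning the LASSO penalty by cross-validation, again assuming a hypothetical my_data with outcome y (plus the glmnet package). A penalty that is too small behaves almost like least squares and risks overfitting; one that is too big shrinks every coefficient to 0 and underfits; CV points to the in-between value.

  library(tidymodels)

  lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
    set_engine("glmnet")

  lasso_rec <- recipe(y ~ ., data = my_data) %>%
    step_normalize(all_numeric_predictors())

  lasso_wf <- workflow() %>%
    add_recipe(lasso_rec) %>%
    add_model(lasso_spec)

  set.seed(253)
  lasso_tuned <- tune_grid(
    lasso_wf,
    resamples = vfold_cv(my_data, v = 10),
    grid = grid_regular(penalty(range = c(-5, 1)), levels = 30)  # range is on the log10 scale
  )

  autoplot(lasso_tuned)  # CV error vs penalty: look for the "just right" zone
  select_by_one_std_err(lasso_tuned, metric = "rmse", desc(penalty))  # simplest model within 1 SE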

For each algorithm, you should also reflect upon the important technical concepts listed in the syllabus:

  • Can you summarize the steps of this algorithm?
  • Is the algorithm parametric or nonparametric? (addressed above)
  • What is the bias-variance tradeoff when working with or tuning this algorithm?
  • Is it important to scale / pre-process our predictors before feeding them into this algorithm? (See the recipe sketch after this list.)
  • Is this algorithm “computationally expensive”?
  • Can you interpret the technical (RStudio) output for this algorithm (e.g. CV plots)?
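
On the scaling / pre-processing question, here is a small recipe sketch (same hypothetical my_data with outcome y). Standardizing the predictors matters for algorithms that use distances (KNN) or penalize coefficient sizes (LASSO), but it does not change least squares predictions.

  library(tidymodels)

  my_rec <- recipe(y ~ ., data = my_data) %>%
    step_dummy(all_nominal_predictors()) %>%    # categorical predictors -> indicator variables
    step_normalize(all_numeric_predictors())    # center and scale: mean 0, sd 1

  # Peek at the pre-processed training data
  my_rec %>%
    prep() %>%
    bake(new_data = NULL) %>%
    head()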



Model evaluation

On page 2, do the following for each model evaluation question in the right-hand table:

  • Identify what to check or measure in order to address the question, and how to interpret it.
  • Explain the steps of the CV algorithm.
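
For the last prompt, here is a “by hand” sketch of 10-fold CV for a least squares model (hypothetical my_data with outcome y). In practice vfold_cv() + fit_resamples() handle these steps for you, but spelling them out makes the algorithm explicit.

  library(tidymodels)

  set.seed(253)
  folds <- vfold_cv(my_data, v = 10)   # step 1: randomly divide the cases into 10 folds

  fold_rmse <- map_dbl(folds$splits, function(split) {
    train_fold <- analysis(split)      # step 2: set aside 1 fold; keep the other 9 for training
    test_fold  <- assessment(split)

    fit_fold <- linear_reg() %>%       # step 3: fit the model to the 9 training folds
      set_engine("lm") %>%
      fit(y ~ ., data = train_fold)

    augment(fit_fold, new_data = test_fold) %>%   # step 4: measure error on the held-out fold
      rmse(truth = y, estimate = .pred) %>%
      pull(.estimate)
  })

  mean(fold_rmse)   # step 5: average the 10 fold-specific errors = the CV error estimate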



Algorithm comparisons

  • Use page 3 to make other observations about the Unit 1-3 modeling algorithms and their connections.
  • Use page 4 to address and compare the interpretability & flexibility of the Subset Selection (e.g. backward stepwise), LASSO, Least Squares, and GAM algorithms. Where would you place KNN on this graphic?



STAT 253 code

Check out and reflect upon some tidymodels code comparisons here. Copy, use, tweak, and add to this in whatever way suits you!
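
As one example of the kind of comparison to look for: the Unit 1-3 algorithms mostly differ in the model specification step, while the rest of the workflow (recipe, CV, tuning, fitting) looks the same. A rough sketch, with illustrative engines and parameters (glmnet, kknn, and mgcv would need to be installed):

  library(tidymodels)

  ls_spec <- linear_reg() %>%                                     # least squares
    set_engine("lm")

  lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%     # LASSO
    set_engine("glmnet")

  knn_spec <- nearest_neighbor(neighbors = tune()) %>%            # KNN
    set_mode("regression") %>%
    set_engine("kknn")

  gam_spec <- gen_additive_mod() %>%                              # GAM (with the mgcv engine, smooth
    set_mode("regression") %>%                                    # terms go in add_model(..., formula = ))
    set_engine("mgcv")

  # Each spec then drops into the same scaffolding:
  # workflow() %>% add_recipe(my_rec) %>% add_model(<one of the specs above>)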



Learning Reflections

By the end of the week, take 10–20 minutes to write in your individual Reflection Google Doc (shared with you and your professor). In this reflection, consider your preparation for each class, your engagement in the course both in and out of class, your actions toward collaboration, and your exploration of the material through the homework assignments, all with the ultimate goal of meeting my learning goals for you as well as your own learning goals.