Nonparametric Models

Brianna Heggeseth

As we gather

  • Sit in your assigned groups. You will stay in these groups for ~2.5 weeks and work on a group assignment together outside of class.
    • Introduce yourselves and learn a bit about each other!
    • Write down each group member’s name.
    • Decide how you will communicate about the assignment – email or Slack – and exchange the relevant information.
  • Open today’s Rmd.

Announcements

  • Thursday at 11:15am - MSCS Coffee Break
    • Smail Gallery

Notes - Nonparametric v. Parametric Models

CONTEXT

  • world = supervised learning
    We want to model some output variable \(y\) using a set of potential predictors (\(x_1, x_2, ..., x_p\)).

  • task = regression
    \(y\) is quantitative

  • model = nonparametric regression???

Just as in Unit 2, Unit 3 will focus on model building, but a different aspect:

  • Unit 2: how do we handle / select predictors for our predictive model of \(y\)?
  • Unit 3: how do we handle situations in which linear regression models are too rigid to capture the relationship of \(y\) vs \(x\)?

MOTIVATING EXAMPLE

Let’s build a predictive model of blood glucose level in mg/dl (\(y\)) by time in hours (\(x\)) since eating a high-carbohydrate meal.

Consider 3 linear regression models of \(y\), none of which appears to fit the data very well:

\[\begin{array}{ll} \text{linear:} & y = f(x) + \varepsilon = \beta_0 + \beta_1 x + \varepsilon \\ \text{quadratic:} & y = f(x) + \varepsilon = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \\ \text{6th order polynomial:} & y = f(x) + \varepsilon = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \beta_5 x^5 + \beta_6 x^6 + \varepsilon \\ \end{array}\]
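
To make the three candidate models concrete, here is a minimal sketch that fits each one by least squares. It is written in Python with NumPy, and the data are a synthetic stand-in (a rise after the meal followed by a slow decay, plus noise), not the course's glucose data. The helper name `fit_polynomial` is hypothetical.

```python
import numpy as np

# Synthetic stand-in for the glucose data (NOT the course data set):
# a rise after the meal followed by a slow decay, plus random noise.
rng = np.random.default_rng(0)
time = np.linspace(0, 4, 60)  # hours since the meal
glucose = 90 + 120 * time * np.exp(-1.5 * time) + rng.normal(0, 5, time.size)  # mg/dl


def fit_polynomial(x, y, degree):
    """Least-squares fit of y = b0 + b1*x + ... + b_degree * x^degree."""
    coefs = np.polyfit(x, y, deg=degree)    # coefficients, highest power first
    fitted = np.polyval(coefs, x)           # model predictions at the observed x
    rss = float(np.sum((y - fitted) ** 2))  # residual sum of squares
    return coefs, rss


# Fit the linear, quadratic, and 6th order polynomial models.
results = {degree: fit_polynomial(time, glucose, degree) for degree in (1, 2, 6)}
for degree, (coefs, rss) in results.items():
    print(f"degree {degree}: {len(coefs)} coefficients, RSS = {rss:.1f}")
```

Because the three models are nested, the residual sum of squares cannot increase as the degree grows. A smaller training error does not mean the fitted curve has the right shape, though, and that is the point of the example.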

These parametric linear regression models assume (incorrectly) that we can represent glucose over time by a formula for \(f(x)\) that depends upon a fixed set of parameters \(\beta_i\), where each predictor \(x_j\) may be a power of time such as \(x^2\):

\[y = f(x) + \varepsilon = \beta_0 + \beta_1x_1 + \cdots + \beta_p x_p + \varepsilon\]



Nonparametric models do NOT assume a parametric form for \(f(x)\), the relationship between \(y\) and \(x\). Thus they are more flexible.

Small Group Activity - Intuition

Work as a group on Exercises 1-4.


Consider W.A.I.T. Why Am/Aren’t I Talking?

  • Actively work to give everyone a chance to contribute and share.


You are tasked with coming up with a nonparametric algorithm to estimate \(f(time)\) in the equation

\[glucose = f(time) + \varepsilon\]
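
One possible answer to this task, offered as a sketch rather than the intended solution, is local averaging: predict glucose at a given time by averaging the glucose values of the \(k\) observations closest in time. The function name `knn_predict`, the choice \(k = 3\), and the data values below are all made up for illustration.

```python
import numpy as np


def knn_predict(x_train, y_train, x_new, k=5):
    """Predict y at x_new by averaging y over the k nearest training points."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    distances = np.abs(x_train - x_new)   # with one predictor, distance is a time gap
    nearest = np.argsort(distances)[:k]   # indices of the k closest observations
    return float(y_train[nearest].mean())


# Tiny made-up example (hypothetical values, not the course data).
time = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])                 # hours
glucose = np.array([90.0, 130.0, 150.0, 140.0, 120.0, 105.0, 95.0])  # mg/dl

# The 3 nearest times to 1.2 are 1.0, 1.5, and 0.5, so the prediction
# is the average of 150, 140, and 130.
prediction = knn_predict(time, glucose, x_new=1.2, k=3)
print(prediction)
```

No formula for \(f\) is assumed anywhere. The size of the neighborhood controls the flexibility: a small \(k\) follows the data closely, while setting \(k\) to the full sample size predicts the overall mean everywhere.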

Small Group Activity - Distance

Central to nonparametric modeling is the concept of using data points within some local window or neighborhood.

  • Defining a local window or neighborhood relies on the concept of distance.
    • With only one predictor, this was straightforward in our glucose example:
    • the closest neighbors at time \(x\) are the data points observed at the closest time points.
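
One common way to extend this idea to \(p\) predictors is Euclidean distance. It is assumed here for illustration only, and the exercises may use a different measure. For two observations \(a = (a_1, ..., a_p)\) and \(b = (b_1, ..., b_p)\):

\[d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_p - b_p)^2}\]

With \(p = 1\) this reduces to \(|a_1 - b_1|\), the gap in time used in the glucose example. With \(p > 1\), each squared difference is measured in the units of its own predictor, so predictors recorded on larger scales contribute more to the total.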

GOAL

Explore the idea of distance when we have more predictors, and the data-preprocessing steps we have to take in order to implement this idea in practice.


Work on Exercises 5-8.

Small Group Activity - Preprocessing

In nonparametric modeling, we don’t want our definitions of “local windows” or “neighbors” to be skewed by the scales and structures of our predictors.


It’s therefore important to create variable recipes which pre-process our predictors before feeding them into a nonparametric algorithm.
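
A minimal sketch of one such pre-processing step, standardization, shows how it can change which observation counts as the nearest neighbor. It uses plain Python with NumPy rather than the recipe tooling from the activity. The predictors (income in dollars, age in years), their values, and the helper `nearest_to_first` are hypothetical.

```python
import numpy as np

# Hypothetical predictors: column 0 = income in dollars, column 1 = age in years.
X = np.array([
    [50000.0, 25.0],  # observation 0: the point we want a neighbor for
    [50500.0, 65.0],  # observation 1: similar income, very different age
    [53000.0, 26.0],  # observation 2: similar age, somewhat different income
    [90000.0, 40.0],  # observation 3
    [20000.0, 50.0],  # observation 4
])


def nearest_to_first(data):
    """Row index of the observation closest (Euclidean) to row 0, excluding row 0."""
    distances = np.sqrt(((data[1:] - data[0]) ** 2).sum(axis=1))
    return int(np.argmin(distances)) + 1  # +1 because row 0 was skipped


# Standardize each column: subtract its mean, divide by its standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(nearest_to_first(X))         # neighbor on the raw scales
print(nearest_to_first(X_scaled))  # neighbor after standardizing
```

On the raw scales, income differences in the hundreds or thousands swamp a 40-year age gap, so observation 1 is the nearest neighbor. After standardizing, both predictors are measured in standard deviations, and observation 2 becomes the nearest.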


Work on Exercises 9-15.

Main Points

  • If the relationship between \(x\) and \(y\) is not a straight line or a polynomial (such as quadratic), we might need nonparametric methods.
  • One needs to consider the scale of variables when calculating distance between observations with more than one predictor.
  • Pre-processing steps invoke important assumptions that impact your models and predictions.

After Class

  • Finish the activity, check the solutions, and reach out with questions.
  • Continue to check in on Slack. I’m posting announcements there.

Upcoming due dates

  • Due Tuesday:
    • Homework 3 (posted)
    • Checkpoint 6 (posted)