Motivating Question

Where are we?

Within the supervised learning framework, we have a categorical response variable \(y\) and a set of potential predictors \(x\). For example:

  • y = vote / don’t vote, x = (age, party id, …)
  • y = spam / not spam, x = (# of $, # of !, …)
  • y = human / car / plant, x = (speed, shape, …)

We have the following goals:

  • Build a classification model
    We’ll use the following techniques to build classification models of \(y\) from predictors \(x\):
    • parametric techniques
      • logistic regression (with or without LASSO!)
      • support vector machines (optional)
    • nonparametric techniques
      • K Nearest Neighbors (KNN)
      • classification trees
      • random forests and bagging

  • Evaluate the quality of a classification model
    We’ll use the following metrics and tools to evaluate the quality of a classification model:
    • overall accuracy, sensitivity, & specificity
      We can estimate these metrics using in-sample and cross-validation techniques.
    • ROC (receiver operating characteristic) curves
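
To make the evaluation metrics above concrete, here is a minimal sketch of how overall accuracy, sensitivity, and specificity can be computed from a set of actual and predicted classes. It assumes a binary response with one class designated "positive" and at least one observation in each class. The function name `classification_metrics` and the tiny spam data set are hypothetical, made up purely for illustration. Sensitivity is the share of actual positives that are correctly classified; specificity is the share of actual negatives that are correctly classified.

```python
def classification_metrics(actual, predicted, positive):
    """Compute in-sample accuracy, sensitivity, and specificity.

    `positive` names the class treated as the "positive" outcome.
    Assumes both classes appear at least once in `actual`.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # true positive
        elif a == positive and p != positive:
            fn += 1  # false negative
        elif a != positive and p == positive:
            fp += 1  # false positive
        else:
            tn += 1  # true negative

    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical data: 4 spam emails and 6 non-spam emails.
actual = ["spam"] * 4 + ["not spam"] * 6
predicted = [
    "spam", "spam", "spam", "not spam",          # 3 of 4 spam caught
    "not spam", "not spam", "not spam",
    "not spam", "spam", "not spam",              # 5 of 6 non-spam kept
]

metrics = classification_metrics(actual, predicted, positive="spam")
print(metrics)
```

In this made-up example, 8 of 10 emails are classified correctly (accuracy 0.8), 3 of 4 spam emails are caught (sensitivity 0.75), and 5 of 6 non-spam emails are left alone (specificity about 0.83). Because the same data would be used to build and evaluate the model, these are in-sample estimates; a cross-validated version would compute the same quantities on held-out folds.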