4.6 Alternative Classification Models

Logistic regression is a very useful model for predicting a binary outcome, but it has limitations. It assumes a linear relationship between the explanatory variables and the log odds of success. This assumption is hard to check directly because we observe only the binary outcome, not the odds, so there is no log-odds variable that we could quickly plot against a predictor.
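
One rough workaround is to group observations into bins of a quantitative predictor and compute the empirical log odds of success within each bin; if the linearity assumption is reasonable, these values should fall roughly along a line when compared with the bin means. The following is a minimal sketch of that idea, not a formal diagnostic. The function name `empirical_log_odds`, the number of bins, the 0.5 adjustment to the counts, and the simulated data are all assumptions made for illustration.

```python
import math
import random

def empirical_log_odds(x, y, n_bins=5):
    """Split the data into equal-size bins of x and return
    (mean x, empirical log odds of y = 1) for each bin."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    result = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover observations.
        chunk = pairs[start:] if b == n_bins - 1 else pairs[start:start + size]
        successes = sum(yi for _, yi in chunk)
        failures = len(chunk) - successes
        # Adding 0.5 to each count keeps the log odds finite
        # when a bin contains only successes or only failures.
        log_odds = math.log((successes + 0.5) / (failures + 0.5))
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        result.append((mean_x, log_odds))
    return result

# Simulated data (hypothetical): the true log odds are -1 + 2x.
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(2000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-1 + 2 * xi))) else 0
     for xi in x]

for mean_x, log_odds in empirical_log_odds(x, y):
    print(f"mean x = {mean_x:6.2f}   empirical log odds = {log_odds:6.2f}")
```

Plotting the empirical log odds against the bin means gives a visual check: a clearly curved pattern would suggest that the linear form is not adequate for that predictor.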

Other methods are more flexible but also more complex. Here are some of the most popular classification methods.

  • Classification Trees predict a binary outcome by recursively partitioning the data, based on the values of the predictor variables, into groups that are increasingly similar in terms of the outcome. The variables chosen for the splits are those that best separate the outcomes, so the tree also selects the most important variables.

  • Random Forests are ensembles of classification trees that together give more stable predictions than any single classification tree.

  • Boosted Trees are classification trees that are created sequentially, with each new tree targeting the errors left by the trees before it.

  • Neural Networks are flexible models that, when used for classification, create new features from the original data and learn which of these features best predict the outcome.
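
To make the idea of recursive partitioning more concrete, here is a minimal sketch of the single step that a classification tree repeats: searching for the split on one predictor that makes the two resulting groups as similar as possible in terms of the outcome. It measures similarity with Gini impurity, which is one common choice and an assumption here; real tree software searches over all predictors, applies the step recursively, and adds stopping rules. The data (`hours`, `passed`) and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p),
    where p is the proportion of 1s. It is 0 for a pure group."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(x, y):
    """Return (threshold, weighted impurity) for the best split of the
    form x <= threshold. The threshold is None if no split improves
    on the impurity of the unsplit data."""
    values = sorted(set(x))
    best = (None, gini(y))
    # Candidate thresholds are midpoints between neighboring x values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        # Average the two group impurities, weighted by group size.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, impurity = best_split(hours, passed)
print(threshold, impurity)
```

With these made-up data the chosen split falls at 4.5 hours, since everyone above that threshold passed. A full tree would then repeat the same search within each of the two groups, and a random forest or boosted model would combine many such trees.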

Take Statistical Machine Learning to learn more about these methods. But keep in mind that a complex solution is not always necessary for the task at hand: see https://www.huffingtonpost.com/2014/02/10/klemens-torggler-evolution-door_n_4762261.html.