3.10 Selecting a Model

With a multiple linear regression model, we have many decisions to make. We need to decide which explanatory variables should be in the model and what form (indicator, interaction, or transformed) each should take. To help guide these decisions, we provide a list of helpful guidelines.

3.10.1 Satisfying conditions

Choose \(X\) variables (and their form) that:

  • have a fairly strong linear relationship with \(Y\), with roughly equal spread in \(Y\) across values of \(X\) (for quantitative \(X\))
  • don’t show a pattern in a plot of the residuals against \(X\) (for quantitative \(X\))
  • can explain large differences in average \(Y\) (for categorical \(X\))
  • capture modifying effects or different slopes (use visualizations to justify interaction terms)
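As a rough numeric stand-in for the residual plot mentioned above, the following minimal sketch fits a straight line and averages the residuals in the lower, middle, and upper thirds of \(X\). The data are made up for illustration, and the helper names `fit_line` and `mean_residual_by_third` are hypothetical, not from any library. Group means that swing from positive to negative and back suggest curvature, which points to a different form of \(X\).

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single quantitative X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_residual_by_third(x, y):
    """Average residual in the lower, middle, and upper third of X."""
    intercept, slope = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order the observations by X
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    third = len(residuals) // 3
    groups = [residuals[:third], residuals[third:2 * third], residuals[2 * third:]]
    return [sum(g) / len(g) for g in groups]


# Made-up, noise-free data with a curved relationship: Y = X^2.
x = list(range(1, 13))
y = [xi ** 2 for xi in x]

# Using X as is: the residual means go positive, negative, positive (a pattern).
raw_form = mean_residual_by_third(x, y)

# Using the transformed form X^2: the residual means are essentially zero.
squared_form = mean_residual_by_third([xi ** 2 for xi in x], y)

print("X as is:   ", [round(m, 2) for m in raw_form])
print("X squared: ", [round(m, 2) for m in squared_form])
```

With real, noisy data the group means will never be exactly zero, so this check supplements a residual plot and does not replace it.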

3.10.2 Providing a useful model

Choose \(X\) variables (and their form) that:

  • can explain a lot of variation in \(Y\) (higher R-squared)
  • provide small prediction errors (smaller standard deviation of residual)
  • have coefficients that are really different from 0 (Is the Difference Real?)
  • adjust for confounding variables to help you estimate direct causal relationships (not mediators or colliders; see Section 3.11)
  • have unique information that is not redundant (higher adjusted R-squared; see Section 3.9.5)
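To make three of these criteria concrete, here is a minimal sketch that computes R-squared, the standard deviation of the residuals, and adjusted R-squared for two candidate models. Everything in it is an assumption made for illustration: the data are simulated, the helper names `fit_ols` and `summarize` are hypothetical, the residual standard deviation is taken to be \(\sqrt{SSE/(n-k-1)}\), and adjusted R-squared is taken to be \(1 - (1 - R^2)(n-1)/(n-k-1)\), where \(k\) is the number of explanatory variables. In practice a statistics package would report these values directly.

```python
import math
import random


def fit_ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    # Design matrix rows: a leading 1 for the intercept, then each predictor.
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented matrix for (X'X) b = X'y.
    aug = [
        [sum(r[j] * r[k] for r in rows) for k in range(p)]
        + [sum(r[j] * yi for r, yi in zip(rows, y))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[j][p] / aug[j][j] for j in range(p)]


def summarize(columns, y):
    """R-squared, adjusted R-squared, and residual SD for one candidate model."""
    n, k = len(y), len(columns)
    coefs = fit_ols(columns, y)
    fitted = [
        coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns))
        for i in range(n)
    ]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    r2 = 1 - sse / sst
    return {
        "coefs": coefs,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "resid_sd": math.sqrt(sse / (n - k - 1)),
    }


# Simulated data: Y depends on x1 only; x2 is unrelated noise.
random.seed(1)
n = 50
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * a + random.gauss(0, 1) for a in x1]

model_a = summarize([x1], y)      # Y ~ x1
model_b = summarize([x1, x2], y)  # Y ~ x1 + x2

for name, m in [("x1 only", model_a), ("x1 + x2", model_b)]:
    print(f"{name}: R2={m['r2']:.4f}  adj R2={m['adj_r2']:.4f}  resid SD={m['resid_sd']:.3f}")
```

R-squared can never decrease when a variable is added, so the larger model always looks at least as good on that measure. Adjusted R-squared penalizes the extra term, which is why it is the better guide to whether a new variable carries unique information.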