3.3 “Best” fitting line

To choose the “best” fitting line, we need to find estimates of the intercept (\(\beta_0\)) and slope (\(\beta_1\)) in the model

\[ E[Y | X] = \beta_0 + \beta_1\,X \]

that give us the “best” fit to the \(n\) points on a scatterplot, \((x_i, y_i)\) for \(i = 1, \ldots, n\).

What do we mean by “best”? In general, we’d like good predictions and a model that describes the average relationship well. But to pick out a single line, we need a precise criterion, and there is more than one reasonable choice.

3.3.1 First idea

One idea of “best” is that we want the line that minimizes the sum of the residuals, \(e_i = y_i - \hat{y}_i = y_i - ( \hat{\beta}_0 + \hat{\beta}_1x_i)\). The residual is the error in our prediction: the difference between what we observe and what the line predicts.

  • Problem: We will have positive and negative residuals, and they cancel each other out when we add them together (see the sketch after this list). While a reasonable starting point, this definition of “best” won’t give us what we want; we’ll need an idea that deals with the negative signs.
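To see the cancellation concretely, here is a minimal sketch in Python (the data values are made up for illustration): any line passing through the point of means \((\bar{x}, \bar{y})\) has residuals that sum to exactly zero, so wildly different lines all look equally “best” under this criterion.

```python
import numpy as np

# Small made-up dataset (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# Any line through (xbar, ybar) has residuals that sum to zero,
# because sum(y - ybar) = 0 and sum(x - xbar) = 0.
for slope in [0.0, 2.0, -3.0]:
    intercept = ybar - slope * xbar   # force the line through (xbar, ybar)
    residuals = y - (intercept + slope * x)
    print(f"slope = {slope:5.1f}   sum of residuals = {residuals.sum():+.10f}")
```

All three lines, including one with a negative slope, achieve a residual sum of (essentially) zero, so the criterion cannot single out one line.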

3.3.2 Second idea

Another idea of “best” is that we want the line that minimizes the sum of the absolute value of the residuals, \(\sum_{i=1}^n |y_i - \hat{y}_i| = \sum_{i=1}^n |y_i - ( \hat{\beta}_0 + \hat{\beta}_1x_i)| = \sum_{i=1}^n |e_i|\).

  • Problem: This definition of “best” leads to a procedure referred to as Least Absolute Deviations (LAD). It is a valid definition, but the minimizing line is not always unique, so the procedure isn’t stable: there isn’t always one “best” line. (A numerical sketch follows this list.)
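Here is a rough numerical sketch of Least Absolute Deviations in Python, assuming SciPy is available (the data values are made up for illustration). It minimizes \(\sum_{i=1}^n |e_i|\) with a general-purpose optimizer; on some datasets, different starting values can land on different minimizing lines, which is the non-uniqueness problem noted above.

```python
import numpy as np
from scipy.optimize import minimize

# Small made-up dataset (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sum_abs_residuals(params):
    """Sum of absolute residuals for a candidate line (b0, b1)."""
    b0, b1 = params
    return np.abs(y - (b0 + b1 * x)).sum()

# Nelder-Mead is a reasonable choice here because the objective is not
# differentiable everywhere (the absolute value has a kink at zero).
result = minimize(sum_abs_residuals, x0=[0.0, 1.0], method="Nelder-Mead")
b0_hat, b1_hat = result.x
print(f"LAD fit: intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```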

3.3.3 Third idea

Lastly, another idea of “best” is that we want the line that minimizes the sum of squared residuals, \(\sum_{i=1}^n (y_i - \hat{y}_i)^2= \sum_{i=1}^n(y_i-( \hat{\beta}_0 + \hat{\beta}_1x_i))^2=\sum_{i=1}^n e_i^2\).

  • This is referred to as Least Squares, and it has a unique solution (see the sketch below). We will focus on this definition of “best” in this class. It also has some really nice mathematical properties and connections to linear algebra.
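Setting the partial derivatives of \(\sum_{i=1}^n e_i^2\) with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) equal to zero gives the closed-form least squares estimates \(\hat{\beta}_1 = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum_{i=1}^n (x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A minimal sketch in Python (the data values are made up for illustration):

```python
import numpy as np

# Small made-up dataset (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# Closed-form least squares estimates, obtained by setting the
# derivatives of the sum of squared residuals to zero
b1_hat = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
b0_hat = ybar - b1_hat * xbar

residuals = y - (b0_hat + b1_hat * x)
print(f"Least squares fit: intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
print(f"sum of squared residuals = {(residuals ** 2).sum():.3f}")
```

Unlike the first two ideas, this criterion has exactly one minimizing line (provided the \(x_i\) are not all equal), which is part of what makes least squares so convenient to work with.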