7.1 Standard Error

In Chapter 5, we talked about how our estimates might vary if we had a different sample of data. This random variability could be due to random sampling or to randomness in the data values. To get a sense of how uncertain our estimates are as guesses about the true population, we need to quantify this variability across potential samples of data.

The tricky piece is that we do not get to observe more than one sample in real life. We have to imagine all other possible samples without actually observing them.

In Chapter 5, one tool we used was bootstrapping, in which we resampled from our sample (with replacement) to mimic the other possible samples. The variation in the bootstrap estimates gives us a sense of the variation we would see across those possible samples.

In Chapter 6, another tool we used was probability, in which we use mathematical theory and assumptions to help us imagine the other possible samples. With probability, we can estimate the standard deviation of a random variable. We'll refer to this as the classical approach, as compared to the bootstrap approach.

Using either of these techniques, we can calculate the standard error (SE) of a sample estimate, which is the estimated standard deviation of that estimate across all possible random samples. In other words, the standard error is our best guess of the standard deviation of the sampling distribution, the distribution of sample estimates across all possible samples of the same size.
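To make the idea of a sampling distribution concrete, here is a small simulation sketch (not part of the flights analysis; the population, sample size, and seed are made up for illustration). We pretend we can see a whole population, draw many samples to build the sampling distribution of the sample mean, and then compare its standard deviation to the SE we could compute from a single observed sample:

```r
# Hypothetical illustration: a made-up population we can fully observe
set.seed(155)
population <- rnorm(100000, mean = 10, sd = 20)

# Sampling distribution: the sample mean across many samples of size 100
many_means <- replicate(5000, mean(sample(population, size = 100)))
sd(many_means) # sd of the sampling distribution (close to 20/sqrt(100) = 2)

# In real life we only see ONE sample; its SE approximates that sd
one_sample <- sample(population, size = 100)
sd(one_sample) / sqrt(100) # classical SE of the sample mean
```

The point of the sketch: `sd(many_means)` is the quantity we wish we knew, and the SE computed from `one_sample` is our single-sample estimate of it.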

7.1.1 Bootstrap Standard Error Estimate

We started this chapter with a question about arrival delay times in the winter compared to the summer. Based on one sample of 100 flights, we estimated that in the winter, the delay times are on average about 0.6 minutes less than in the summer. How certain are we about this number?

Let’s resample our sample and quantify our uncertainty through bootstrapping.

boot_data <- mosaic::do(1000)*( 
    flights_samp %>% # Start with the SAMPLE (not the FULL POPULATION)
      sample_frac(replace = TRUE) %>% # Generate by resampling with replacement
      with(lm(arr_delay ~ season)) # Fit linear model
)


# Bootstrap Standard Error Estimates of coefficients
boot_data %>% 
    summarize(
        se_Intercept = sd(Intercept),
        se_seasonwinter = sd(seasonwinter))
##   se_Intercept se_seasonwinter
## 1     5.331612        6.781946

The standard error for the regression coefficient for seasonwinter is 6.8 minutes using the bootstrapping technique. That tells us that, from sample to sample, the estimated difference in mean arrival delays between winter and summer typically varies by about 6.8 minutes. It also tells us that our sample estimate could be off from the true difference in the population by as much as 2*SE = 2*6.8 = 13.6 minutes (think back to the 68-95-99.7 Rule). Relative to the scale of the observed difference (about half a minute), we are very uncertain!

If the standard error were 0.1 minutes rather than 6.8 minutes, we'd be much more certain that there is about a half a minute difference in the mean arrival delays between winter and summer.

You’ll notice that the code above has exactly the same structure as the bootstrapping code in Chapter 5. We calculate the bootstrap standard error by calculating the standard deviation of the bootstrap estimates. We ran this code in Chapter 5 to summarize the spread of our bootstrap sampling distribution; we just didn’t name it the standard error yet. Also note that every time you run this code, you’ll get a slightly different estimate of the standard error because bootstrapping is a random process.

Focus on the magnitude of the standard error, and consider the number of significant digits of both the estimate and the SE. If our estimate might be off by almost 15 minutes, we shouldn't worry too much about the hundredths of a minute in our estimate.

7.1.2 Classical Standard Error Estimate

The other tool we can use to get standard errors is mathematical theory and probability. Using probability, we can write out equations that estimate the standard error. If you are interested in learning about these equations, you should take Mathematical Statistics!

For this class, you just need to know that R knows these equations and calculates the classical standard error for us.
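To give a flavor of what those equations look like in the two-group setting of this chapter: for a model with one binary predictor, the classical SE of the coefficient works out to s*sqrt(1/n1 + 1/n2), where s is the residual standard deviation and n1, n2 are the group sizes. The sketch below checks this by hand on made-up data (flights_samp is not recreated here, so the group names and numbers are illustrative only):

```r
# Made-up two-group data echoing the winter/summer comparison
set.seed(7)
d <- data.frame(
  y = c(rnorm(50, mean = 5, sd = 30), rnorm(50, mean = 6, sd = 30)),
  g = rep(c("summer", "winter"), each = 50)
)
fit <- lm(y ~ g, data = d)

# Classical SE reported by lm for the group coefficient
se_lm <- summary(fit)$coefficients["gwinter", "Std. Error"]

# The same quantity built by hand from the residual standard deviation
s <- sqrt(sum(resid(fit)^2) / (nrow(d) - 2))
se_hand <- s * sqrt(1/50 + 1/50)
all.equal(se_lm, se_hand) # the two match
```

You don't need to memorize this formula; the point is that R's std.error column comes from equations like this one.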

In fact, you’ve already seen these standard errors in the output of the fitted linear model. In the tidy output below, look for the column labeled std.error. These are the classical standard errors.

lm.delay %>% 
  tidy() #SE for each estimate is in the std.error column
## # A tibble: 2 x 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)     6.69       4.87    1.37     0.172
## 2 seasonwinter   -0.655      6.82   -0.0960   0.924

You’ll notice that for the seasonwinter regression coefficient, the equations give a standard error of 6.817 minutes, which rounds to the same value as the bootstrap standard error of 6.8 minutes.

These are two different ways of quantifying the random variability and uncertainty in our sample estimates using only ONE observed sample of data. In practice, we only need to use one of these approaches because they should give us similar values, assuming a few conditions hold.

Using either estimate of the standard error, based on the sample data, our best guess is that the mean arrival delay is about 0.6 minutes less in winter than in summer, but we might be off by about 13.6 minutes. So we could say that our best guess at the difference in mean arrival delays is \(0.6\pm 13.6\) minutes, which is an interval estimate of the true population value.
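Putting the numbers from this section together (using the classical estimate and SE from the tidy() output, with the 2*SE width motivated by the 68-95-99.7 Rule):

```r
estimate <- -0.655 # seasonwinter coefficient from the tidy() output
se <- 6.82         # classical standard error from the same output

# Interval estimate: estimate plus or minus 2 standard errors
c(lower = estimate - 2 * se, upper = estimate + 2 * se)
```

This gives an interval from about -14.3 to about 13.0 minutes.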

How “good” of a guess is the interval estimate? Is the population value in the interval or not? How might we know?